1. Introduction
Smart cities are increasingly positioned to play a crucial role in addressing future challenges, particularly in the domain of transportation management. Mobility is fundamental to human activities and enables access to the workplace, healthcare facilities, educational institutions, and public services. Urban transportation systems face unprecedented challenges due to rapid population growth and increased vehicle density. In cities characterized by high urbanization and reliance on automobiles, issues such as traffic congestion, air pollution, noise pollution, and other adverse effects related to the movement of individuals and goods are prevalent. Traffic congestion has emerged as a significant issue that adversely affects economic productivity, environmental sustainability, and the quality of life in metropolitan areas. The International Transport Forum [1] endeavors to quantify the “total costs” associated with traffic congestion, which are estimated to account for approximately 2% of GDP in various countries [2]. There is substantial evidence of economic and environmental costs associated with traffic congestion. For instance, in Greater Los Angeles, individuals spend an average of 70 h annually in traffic, resulting in the consumption of over 200 liters of fuel [3]. The rate of urbanization has accelerated in recent decades, with projections indicating that by 2050 more than half of the global population will reside in metropolitan areas [3]. In numerous countries, private vehicles have emerged as the predominant mode of transportation, contributing to increasing congestion in many urban centers [3].
The integration of information and communication technologies (ICTs) into intelligent transportation systems (ITSs) has been extensively explored, offering promising avenues for alleviating congestion through advanced forecasting capabilities. In particular, machine learning techniques have demonstrated substantial potential for analyzing complex traffic patterns and predicting congestion scenarios with greater accuracy than traditional statistical methods. These algorithms are more flexible and adaptable than conventional algorithms, which makes them suitable for the analysis of large-scale data [4,5]. Through iterative training, machine learning algorithms process extensive datasets efficiently and generate increasingly precise predictions. Consequently, machine learning forecasters have the potential to surpass and replace traditional forecasting methods. The most commonly employed algorithms for forecasting include supervised and unsupervised learning models [6,7].
In [8], the authors reviewed real-time traffic congestion predictions using various machine learning models. Similarly, [9] investigated a range of machine learning algorithms to optimize multiple aspects of traffic management systems, including signal management, flow prediction, congestion detection and management, and automatic signal detection. Furthermore, [10] provided a comprehensive overview of traffic predictions based on Artificial Intelligence (AI). Medina-Salgado et al. [11] emphasized the existing computational techniques for urban traffic flow predictions. Anirudh et al. [12] surveyed recent studies on deep learning for traffic flow prediction. Weiwei et al. [13] presented an overview of existing graph neural networks employed to address various traffic forecasting challenges, such as road traffic flow and speed forecasting, passenger flow forecasting in urban rail transit systems, and demand forecasting in transportation-sharing platforms. In the studies by [13,14], it was determined that Long Short-Term Memory, Multilayer Perceptron, and Convolutional Neural Network techniques are particularly effective in forecasting and categorizing road traffic, especially when the traffic flow variable is utilized.
Despite the expanding body of research on machine learning methodologies for traffic congestion prediction, systematic reviews that integrate these prediction models from the perspective of traffic congestion or data processing and evaluate their effectiveness within intelligent transportation systems remain scarce. This paper presents a systematic review of recent studies that employed machine learning prediction models for traffic congestion.
The remainder of this paper is organized as follows:
Section 2 describes the research methods used in this study.
Section 3 reports the results and interpretations of the systematic review.
Section 4 presents a discussion, the implications, and the future directions for researchers and practitioners.
3. Results Analysis and Interpretations
This section systematically analyzes 115 peer-reviewed studies conducted between 2010 and 2024, addressing seven research questions (Table 1) through a structured progression from technical foundations to strategic implications. The analysis commences with an examination of the machine learning paradigms (Section 3.1, RQ7), proceeds to the patterns of temporal evolution (Section 3.2, RQ1 + RQ7) and an assessment of the publication landscape (Section 3.3, RQ1), examines research contexts and methodologies (Section 3.4, RQ2 + RQ3), investigates traffic characteristics and parameter utilization (Section 3.5, RQ5 + RQ6), analyzes infrastructure implementation frameworks (Section 3.6, RQ4 + RQ6), and culminates in a comprehensive synthesis (Section 3.7) that integrates findings across all research questions to establish empirical foundations for strategic research prioritization and practical implementation guidance.
3.1. Machine Learning Paradigms and Technical Implementation Landscape (RQ7)
This section addresses Research Question 7, which pertains to machine learning models and approaches utilized in traffic congestion forecasting.
3.1.1. Fundamental Distribution Patterns in Machine Learning Approaches
An analysis of 115 studies revealed significant methodological preferences that characterize the current state of research in this domain. Figure 2 illustrates the predominant use of supervised learning approaches, which accounted for 57% of the selected studies (n = 66), thereby establishing this paradigm as the fundamental methodology within the research community. This predominance reflects the research community’s strong preference for labeled training data approaches, where historical traffic patterns inform predictive model development, indicating maturity in the data collection infrastructure and annotation capabilities.
Unsupervised learning methodologies accounted for 21% (n = 24) of the corpus and were predominantly used for pattern discovery and clustering applications in traffic flow analysis. The significant representation of unsupervised approaches indicates a growing recognition of the importance of exploratory data analysis and the identification of hidden patterns in complex traffic dynamics. Semi-supervised learning approaches constituted 14% (n = 16) of the investigations, reflecting an emerging appreciation for hybrid methodologies that leverage both labeled and unlabeled data sources to address the data annotation limitations commonly encountered in transportation domains.
Reinforcement learning constitutes a mere 8% (n = 9) of the research endeavors, indicating significant underutilization despite its theoretical appropriateness for dynamic traffic optimization scenarios. This limited adoption highlights considerable opportunities for the exploration of advanced learning paradigms, particularly given reinforcement learning’s potential for adaptive decision making in the dynamic environments characteristic of traffic systems.
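The paradigm shares reported above follow directly from the study counts. The following minimal Python sketch recomputes them; the variable names are illustrative and are not taken from the reviewed studies.

```python
# Study counts per learning paradigm, as reported in the text for Figure 2.
paradigm_counts = {
    "supervised": 66,
    "unsupervised": 24,
    "semi-supervised": 16,
    "reinforcement": 9,
}

total = sum(paradigm_counts.values())  # size of the reviewed corpus

# Share of the corpus per paradigm, rounded to whole percentages.
paradigm_shares = {
    name: round(100 * count / total) for name, count in paradigm_counts.items()
}

for name, share in paradigm_shares.items():
    print(f"{name:16s} n={paradigm_counts[name]:3d}  {share:3d}%")
```

The four counts sum to the 115 reviewed studies, and the rounded shares reproduce the reported 57%, 21%, 14%, and 8%.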
3.1.2. Task-Oriented Analysis and Problem Formulation Patterns
The distribution of machine learning tasks, as depicted in Figure 3, offers essential insights into the preferences for problem formulation within the research community. Classification problems dominated the research landscape, accounting for 42% of the studies (n = 48), reflecting the community’s primary focus on determining categorical traffic states, such as identifying congested versus free-flow conditions. This emphasis on discriminative tasks suggests a practical orientation towards meeting operational decision-making requirements in traffic management systems.
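To make the classification formulation concrete, the sketch below labels a traffic observation as congested or free-flow with a k-nearest-neighbor majority vote. It is an illustrative assumption rather than a method drawn from any reviewed study: the training points, the two features (speed and density), and the function name `classify` are all hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical labelled observations: (speed in km/h, density in veh/km) -> state.
# The values are invented for illustration and come from no reviewed study.
training = [
    ((95.0, 12.0), "free-flow"),
    ((88.0, 18.0), "free-flow"),
    ((80.0, 22.0), "free-flow"),
    ((35.0, 60.0), "congested"),
    ((22.0, 85.0), "congested"),
    ((15.0, 110.0), "congested"),
]


def classify(observation, k=3):
    """Return the majority label among the k nearest training points."""
    # Euclidean distance on raw features; a real pipeline would scale them first.
    nearest = sorted(training, key=lambda item: dist(item[0], observation))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((90.0, 15.0)))
print(classify((20.0, 90.0)))
```

In practice, the labels would come from an operational congestion definition and the features from detector or probe data; the sketch shows only the shape of the categorical prediction task.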
Regression tasks constitute 21% (n = 24) of the implementations and are predominantly utilized for the prediction of continuous variables such as travel time or traffic speed forecasting. The significant representation of regression tasks indicates a balanced focus on both categorical and continuous prediction requirements within traffic-management applications. Prediction methodologies account for 20% (n = 23) of the studies, encompassing general forecasting applications without specific mathematical formulation constraints, thereby demonstrating methodological flexibility in addressing diverse forecasting requirements.
Clustering methodologies accounted for 14% (n = 16) of the research initiatives and were predominantly employed for the identification of traffic patterns and behavioral segmentation. This moderate representation of clustering underscores the significance of unsupervised pattern discovery for comprehending complex traffic dynamics. Conversely, dimensionality reduction techniques were minimally adopted, constituting only 3% (n = 3) of the effort. This limited focus on feature space optimization, despite the inherently high-dimensional nature of traffic data, suggests potential opportunities for the advancement of feature-engineering strategies.
3.1.3. Technical Implementation and Algorithm Adoption Analysis
Figure 4 illustrates the comprehensive distribution of specific machine learning techniques, highlighting the significant technological preferences within the research community. Deep Neural Networks exhibited a predominant presence with a 47% adoption frequency (n = 54), signifying the strong inclination of the research community towards deep learning architectures for complex pattern recognition in traffic data. This predominance reflects a paradigm shift towards advanced neural architectures capable of capturing the nonlinear relationships and temporal dependencies inherent in traffic dynamics.
Graph Convolutional Networks constitute 10% (n = 12) of the implementations, underscoring the increasing recognition of the importance of spatial relationship modeling within transportation networks. The adoption of this emerging technique signifies a methodological advancement in addressing the network topology considerations that are essential for accurate traffic flow prediction. Support Vector Machines (SVMs) accounted for 6% (n = 7) of the studies, maintaining their relevance despite the rise of deep learning. This suggests their continued utility in specific application contexts that require interpretability or are characterized by limited data scenarios.
Traditional methodologies, such as the K-Nearest Neighbor (K-NN) and ARIMA models, each account for 5% (n = 6) of the implementations, indicating their status as established techniques with ongoing, albeit diminishing, usage. The remaining 27% (n = 31) comprised a variety of alternative methods, including ensemble techniques, fuzzy systems, and hybrid approaches, reflecting the methodological diversity for comparative analysis and specialized applications.
Following the foundational landscape of machine learning paradigms, this analysis investigates the evolution of technical preferences over a 14-year period, elucidating the patterns of innovation adoption and technological transformation.
3.2. Temporal Evolution and Innovation Adoption Patterns (RQ1 + RQ7)
The analysis of temporal evolution addresses both Research Question 1 (publication years) and Research Question 7 (evolution of machine learning techniques), thereby revealing interconnected patterns between chronological development and technological adoption.
3.2.1. Longitudinal Analysis of Methodological Transformation
The 14-year temporal evolution analysis identified distinct phases in the adoption of machine learning and methodological transformation within the field of traffic congestion forecasting research.
Figure 5 illustrates the progression of machine learning tasks, highlighting a substantial increase in classification studies, which expanded from minimal representation in 2010 to peak activity during 2021–2022, with 11–14 studies conducted annually. This exponential growth pattern signifies an increasing emphasis on categorical traffic state determination and applications in operational decision making.
Prediction tasks have demonstrated parallel growth trajectories, with marked acceleration observed post-2016, culminating in 10 studies by 2022. This trend reflects an increased interest in temporal forecasting applications, coinciding with the maturation of deep learning techniques. Similarly, regression tasks have shown consistent development since 2013, reaching a peak of nine studies by 2022, indicating sustained demand for continuous variable prediction capabilities. The temporal progression illustrates a shift in research from simple statistical approaches to sophisticated predictive modeling, with significant acceleration during 2016–2022, corresponding to advancements in computational infrastructure and algorithmic developments.
3.2.2. Paradigm Adoption and Technology Diffusion Analysis
Figure 6 illustrates the progression of machine learning paradigms, highlighting the exponential increase in the adoption of supervised learning from a single study in 2010 to 25 studies by 2022. This trend underscores the growing availability of labeled traffic datasets and the preference for predictive accuracy achieved through pattern-learning methodologies. The predominance of supervised learning signifies the advancement of the research community in data annotation capabilities and performance evaluation techniques.
Since 2013, there has been a consistent increase in the application of unsupervised learning, culminating in twelve studies by 2022. This trend reflects the growing recognition of its utility for pattern discovery and clustering in traffic analysis. Similarly, semi-supervised learning has seen a gradual uptake since 2012, with a marked acceleration after 2019, reaching ten studies by 2022. This suggests an increasing appreciation for hybrid methodologies that effectively address the limitations of labeled data prevalent in transportation domains.
Throughout the study period, reinforcement learning consistently maintained minimal representation, with no more than three studies conducted annually. This trend indicates persistent underutilization, despite its theoretical applicability to dynamic traffic optimization scenarios. The limited adoption of this advanced learning paradigm suggests significant opportunities for integration into future research.
3.2.3. Technical Innovation Cycles and Algorithm Evolution
As depicted in Figure 7, the analysis of technical evolution documented significant paradigm shifts in algorithm preferences over the 14-year study period. Deep Neural Networks (DNNs) have exhibited remarkable growth, transitioning from no implementation in 2010 to widespread adoption by 2022, as evidenced by 15 studies. This growth trajectory notably accelerated beginning in 2017, a period that coincided with advancements in computational hardware and improvements in the accessibility of deep learning frameworks. This adoption pattern exemplifies the rapid diffusion of technology within the research community.
Since 2016, Graph Convolutional Networks have gained prominence, culminating in peak adoption in 2022, as evidenced by eight studies. This trend underscores the increasing recognition of the importance of modeling spatial relationships within transportation networks. The pattern of their emergence signifies the development of methodological sophistication and the adoption of specialized architectures tailored to meet domain-specific requirements.
The use of traditional techniques has declined markedly. Specifically, the use of Support Vector Machines decreased from a notable presence in 2015, with three studies, to minimal adoption in 2024, with only one study. Similarly, ARIMA models were completely abandoned after 2020, following an initial adoption period from 2010 to 2016. These trends reflect a shift within the field from traditional statistical methods to more advanced machine learning techniques, highlighting the rapid cycles of technological adoption and the responsiveness of the research community to algorithmic innovations.
3.3. Publication Landscape and Research Quality Assessment (RQ1)
To address Research Question 1 concerning publication sources and temporal trends, we initially examined the trends in research output and patterns of academic dissemination, followed by an analysis of academic quality and scholarly impact.
3.3.1. Research Output Trends and Academic Dissemination Patterns
A comprehensive analysis of publications (Figure 8) reveals an exponential increase in research output over the 14-year study period. The total number of publications has increased from a single study in 2010 to a peak of 18 studies in 2022. This growth trajectory indicates a significant advancement in research interest and the maturation of the field, reflecting both technological progress and increasing demand for practical applications in traffic management systems.
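One way to quantify this growth is the compound annual growth rate implied by the two endpoints quoted above. The sketch below uses only those two figures; it does not test whether the intermediate years follow an exponential curve, and the variable names are illustrative.

```python
# Annual publication counts quoted in the text for the first and peak years.
first_year, first_count = 2010, 1
peak_year, peak_count = 2022, 18

years_elapsed = peak_year - first_year  # 12 yearly steps

# Compound annual growth rate implied by the two endpoints alone.
cagr = (peak_count / first_count) ** (1 / years_elapsed) - 1

print(f"Implied average growth: {cagr:.1%} per year")
```

Under this endpoint-only assumption, the output corresponds to an average growth of roughly 27% per year between 2010 and 2022.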
Throughout the study period, journal publications consistently demonstrated dominance, particularly in recent years (2020–2024), indicating a preference within the research community for peer-reviewed high-impact dissemination channels. Conference publications have maintained a steady contribution, with notable peaks observed between 2019 and 2021, reflecting a balance between the presentation of preliminary findings and the documentation of comprehensive research. The overall trajectory of publications illustrates the successful establishment of the field, which is characterized by a robust academic foundation and sustained research momentum.
The temporal distribution identifies three distinct evolutionary phases: the emergence period (2010–2015), characterized by minimal activity indicative of the nascent development of the field; the growth phase (2016–2019), marked by a steady increase in activity corresponding to advancements in deep learning; and the maturation period (2020–2024), distinguished by sustained high output, signifying an established research community and methodological sophistication.
3.3.2. Academic Quality and Scholarly Impact Analysis
Figure 9 illustrates the high-quality research standards within the traffic congestion forecasting research community. Notably, Q1 journals accounted for 85% (n = 98) of the total publications, a figure that significantly surpasses the average of other academic fields. This indicates a predominant emphasis on high-impact, peer-reviewed academic venues characterized by rigorous editorial standards. This distribution underscores both excellence in research quality and the strategic publication approach aimed at maximizing academic impact.
Q2 journals constituted a minor proportion, representing 4% (n = 4) of the total, suggesting a pronounced preference for publishing in top-tier channels. The distribution of conference publications revealed a stratification in quality: A-grade conferences accounted for 9% (n = 9) of the total output, indicating selective engagement with premier academic forums; B-grade conferences comprised 15% (n = 15), reflecting efforts to disseminate research more broadly; and C-grade conferences represent a minimal 1% (n = 1). This distribution of quality underscores a mature research community with established standards and a successful positioning within high-impact scholarly venues.
The studies selected for this analysis were sourced from three prominent digital libraries: IEEE Xplore, SpringerLink, and ScienceDirect. These studies were published in leading journals and conferences, as detailed in Table 7.
3.4. Research Context and Institutional Framework Analysis (RQ2 + RQ3)
3.4.1. Institutional Leadership and Organizational Distribution (RQ2)
The analysis of the research context, as depicted in Figure 10, demonstrates predominant academic leadership in the field of traffic congestion forecasting research, with academic institutions accounting for 70% (n = 81) of the total studies. This prevalence signifies a robust development of theoretical foundations and sustained scholarly inquiry within university research settings, highlighting institutional dedication to transportation research and the involvement of graduate students in systematic investigations.
Government institutions contribute 27% (n = 31) of the research output, underscoring significant public sector investment in the optimization of transportation infrastructure and policy-driven research initiatives. This governmental involvement indicates policy-level acknowledgment of the economic impacts of traffic congestion and the necessity for infrastructure investment, suggesting the potential for translating research findings into practical implementation.
Industry contributions constitute a mere 3% (n = 3) of the corpus, indicating the limited presence of private-sector research publications in academic forums. This can be attributed to proprietary concerns or the use of alternative dissemination channels. This distribution highlights the challenges related to the research–practice gap wherein academic theoretical advancements may not have direct pathways for implementation within the industry. This situation underscores the potential for enhanced collaboration between industry and academia.
3.4.2. Methodological Framework and Research Approach Analysis
Figure 11 delineates various methodological research approaches, identifying evaluation research as the predominant method, accounting for 53% (n = 61) of the total. This predominance underscores the significant focus on the empirical validation, performance assessment, and comparative analysis of machine learning techniques. It reflects the commitment of the research community to the evidence-based evaluation of methodologies and quantification of algorithm performance.
Solution proposal methodologies constituted 35% (n = 40) of the studies, indicating significant emphasis on the development of novel techniques, algorithmic innovations, and methodological advancements. This equilibrium between evaluation and innovation suggests a robust research ecosystem that encompasses both the critical assessment and creative development components.
Validation research accounted for 11% (n = 13) of the studies and included implementation verification, model generalizability assessment, and real-world application testing. Opinion papers represented a minimal 1% (n = 1), indicating the predominance of empirical research over theoretical discussion. This methodological distribution reflects a mature research community with strong empirical foundations and a balanced focus on innovation and evaluation.
3.5. Traffic Characteristics, Parameter Utilization Analysis, and Forecast Temporal Distribution (RQ5 + RQ6)
3.5.1. Congestion Type Focus and Research Distribution (RQ5)
Figure 12 illustrates the distribution of research attention across the different types of congestion, indicating that studies on recurrent congestion constituted 76% (n = 87) of the total. This predominant focus underscores the research community’s primary interest in predictable pattern-based congestion phenomena, which occur regularly owing to capacity limitations, demand patterns, or infrastructure constraints. The predictable nature of recurrent congestion facilitates the use of traditional sensor-based monitoring approaches and pattern recognition methodologies.
Research on nonrecurrent congestion constitutes 24% (n = 28) of the studies and focuses on unpredictable congestion events caused by incidents, weather conditions, special events, or emergencies. This limited representation highlights a significant research imbalance: predictable congestion receives disproportionate attention even though nonrecurrent events often result in severe traffic disruptions and economic impacts. The predominant focus on recurrent congestion likely reflects the greater availability of data, a preference for lower modeling complexity, and the appeal of predictable patterns for algorithm development.
3.5.2. Forecast Temporal Distribution (RQ5)
Figure 13 illustrates the preferences for temporal granularity in traffic congestion forecasting for the selected studies. The analysis indicated a predominance of short-term predictions, with hourly forecasting being the most common approach (36%), followed by minute-level prediction (34%). These high-frequency prediction intervals constitute 70% of all the studies, reflecting the operational demands of intelligent transportation systems that require real-time or near-real-time congestion state estimation for adaptive traffic management. Daily forecasting accounted for 16% of the studies, serving medium-term planning applications, whereas ultra-short-term prediction (every second) comprised 7% of the research focus. Long-term forecasting approaches (weekly: 2%, monthly: 1%) have received minimal attention, suggesting limited research interest in strategic planning horizons. The 4% generic category represents studies with flexible temporal frameworks adaptable to various operational contexts. This distribution pattern underscores the field’s emphasis on operational decision support rather than strategic planning, with a combined preference for sub-hourly intervals (41%), highlighting the critical importance of real-time responsiveness in modern traffic management systems. The temporal granularity preferences align with the practical requirements of dynamic route guidance, adaptive signal control, and incident response applications, which are central to an intelligent transportation infrastructure.
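The aggregate shares quoted above can be reproduced from the per-horizon percentages. The following minimal sketch uses the values reported for Figure 13; the dictionary keys and variable names are illustrative.

```python
# Percentage of studies per forecast horizon, as quoted in the text for Figure 13.
horizon_share = {
    "second": 7,
    "minute": 34,
    "hour": 36,
    "day": 16,
    "week": 2,
    "month": 1,
    "generic": 4,
}

# Sub-hourly horizons: second-level plus minute-level forecasting.
sub_hourly = horizon_share["second"] + horizon_share["minute"]

# The two most common horizons: minute-level plus hourly forecasting.
minute_and_hour = horizon_share["minute"] + horizon_share["hour"]

# Strategic horizons: weekly plus monthly forecasting.
long_term = horizon_share["week"] + horizon_share["month"]

print(sub_hourly, minute_and_hour, long_term, sum(horizon_share.values()))
```

The categories sum to 100%, the sub-hourly group matches the reported 41%, and the minute-level and hourly groups together match the reported 70%.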
3.5.3. Parameter Utilization Patterns and Data Source Analysis (RQ6)
To systematically address Research Question 6 concerning the traffic metrics utilized in congestion forecasting research, this analysis commences by establishing standardized definitions of the forecast parameters identified across the selected studies. The parameter definitions below adhere to the road dictionary specifications established by PIARC [126] and provide the basis for the subsequent utilization pattern analysis. The parameters are defined as follows.
Traffic flow (flow rate): The number of vehicles or persons passing a given point per unit of time.
Traffic speed: The distance a vehicle travels divided by the travel time.
Traffic volume: The number of vehicles or persons passing a point during a defined period.
Traffic density (traffic concentration): The number of vehicles occupying a unit length of road, carriageway, or lane at a specified time, excluding parked vehicles.
Travel time (journey time, route time): The time spent travelling between two defined points, including, when relevant, parking, walking, waiting, and changing mode.
Traffic occupancy: For a given cross section or short longitudinal section and a given time interval, the percentage of time during which the road is occupied by one or more vehicles or people. Synonyms: occupancy percentage, occupancy time.
Mileage ratio: The length in miles per vehicle.
Event-based data: Data related to events that occur in traffic.
Weather data: Data that provide information about the weather and climate of a region.
GPS data: Data provided by a geolocation system using satellite signals to identify positions on a map.
Map data: Any content, data, or information provided through a map, including, but not limited to, imagery, terrain data, and latitude and longitude coordinates.
Social media data: Any type of data related to traffic conditions that can be gathered through social media.
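The traditional parameters defined above translate directly into simple ratios. The sketch below is an illustrative assumption: the function names, units, and sample values are hypothetical and are not taken from PIARC or from any reviewed study.

```python
def flow_rate(vehicle_count, interval_hours):
    """Flow: vehicles passing a point per unit of time (veh/h)."""
    return vehicle_count / interval_hours


def density(vehicles_on_segment, segment_km):
    """Density: vehicles occupying a unit length of road (veh/km)."""
    return vehicles_on_segment / segment_km


def speed(distance_km, travel_time_hours):
    """Speed: distance travelled divided by travel time (km/h)."""
    return distance_km / travel_time_hours


def occupancy_percent(occupied_seconds, interval_seconds):
    """Occupancy: percentage of the interval during which the section is occupied."""
    return 100 * occupied_seconds / interval_seconds


# Hypothetical 15-minute sample, invented for illustration.
q = flow_rate(vehicle_count=300, interval_hours=0.25)
k = density(vehicles_on_segment=40, segment_km=2.0)
v = speed(distance_km=30.0, travel_time_hours=0.5)
occ = occupancy_percent(occupied_seconds=180, interval_seconds=900)

print(f"flow={q} veh/h, density={k} veh/km, speed={v} km/h, occupancy={occ}%")
```

The sample values were chosen so that flow equals density multiplied by speed, the standard identity for stationary traffic. That identity is not stated in the definitions above and is included here only as a consistency check.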
The comprehensive parameter analysis (Figure 14) reveals distinct utilization patterns that reflect the characteristics of the congestion types and modeling requirements. Traditional traffic flow parameters demonstrate the highest utilization in recurrent congestion research. Flow (56 studies) and speed (55 studies) were the primary data sources, followed by volume (27 studies), travel time (19 studies), and density (10 studies). These patterns underscore the predictable nature of recurrent congestion that facilitates sensor-based monitoring.
Research on nonrecurrent congestion highlights distinct parameter priorities. While flow (20 studies) and speed (19 studies) remained significant, event-based data assumed increased importance (18 studies), underscoring the incident-driven nature of nonrecurrent congestion. In addition, weather data (nine studies), GPS/map data (seven studies), and travel time (seven studies) were pertinent for monitoring dynamic conditions, suggesting the need for methodological adaptations to address unpredictable congestion scenarios.
The distribution of the parameters revealed fundamental methodological distinctions between the types of congestion. Specifically, nonrecurrent congestion necessitates diverse real-time data integration strategies, whereas recurrent congestion favors pattern-based modeling approaches. This analysis highlights the potential advancements in sensor integration and utilization of external data sources within comprehensive congestion-management systems.
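Because the two congestion types differ greatly in the number of studies, raw counts are difficult to compare directly. The sketch below normalizes the counts quoted for Figure 14 by the group sizes quoted for Figure 12. It assumes that each parameter count refers to distinct studies within the corresponding congestion type, which the text does not state explicitly.

```python
# Studies per congestion type (Figure 12) and parameter counts (Figure 14),
# as quoted in the text.
studies = {"recurrent": 87, "nonrecurrent": 28}
parameter_counts = {
    "recurrent": {
        "flow": 56, "speed": 55, "volume": 27, "travel time": 19, "density": 10,
    },
    "nonrecurrent": {
        "flow": 20, "speed": 19, "event-based": 18, "weather": 9,
        "GPS/map": 7, "travel time": 7,
    },
}

# Assumption: each count refers to distinct studies within that congestion type,
# so dividing by the group size yields a usage rate in percent.
usage_rate = {
    kind: {name: round(100 * n / studies[kind], 1) for name, n in counts.items()}
    for kind, counts in parameter_counts.items()
}

for kind, rates in usage_rate.items():
    print(kind, rates)
```

Under this assumption, event-based data appear in nearly two thirds of the nonrecurrent studies, a rate comparable to that of flow in the recurrent studies.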
3.6. Infrastructure Context and Implementation Framework Analysis (RQ4 + RQ6)
3.6.1. Vehicle Type Coverage and Research Scope Assessment (RQ4)
Figure 15 presents an analysis of the vehicle type representation across the selected studies, indicating that passenger cars constitute a predominant focus, accounting for 90% (n = 104) of vehicle-related investigations. This emphasis is attributed to the realistic representation of traffic composition and the advantages of data availability in urban settings. However, this may also result in an underrepresentation of the impact of vehicle diversity on congestion dynamics and opportunities for system optimization.
Research on alternative vehicle types remains limited, with taxi services constituting 5% (n = 6) of studies. This reflects growing interest in the integration of commercial vehicles and the assessment of ride-sharing impacts. Public transportation received only 3% (n = 3) of the research focus despite its significant potential to influence congestion and its policy relevance for sustainable transportation initiatives. Emergency vehicles and trucks each account for a mere 1% (n = 1) of the research coverage, highlighting substantial gaps in understanding their disproportionate potential impact on congestion and their operational priority requirements.
This distribution suggests research–reality alignment limitations, where the passenger car focus may oversimplify complex multimodal traffic interactions and miss optimization opportunities through comprehensive vehicle type considerations.
3.6.2. Infrastructure Distribution and Network Coverage Analysis (RQ4)
The analysis of road infrastructure, as depicted in Figure 16, indicates a balanced focus on the primary infrastructure types. Specifically, urban roadways constituted 51% (n = 59) of the studies, whereas highways represented 47% (n = 54) of the research emphasis. This nearly equal distribution suggests appropriate acknowledgment of both the complexities inherent in urban environments and the capacity challenges associated with highways in the context of congestion forecasting.
Urban roadway dominance is indicative of increasing trends in urbanization, the complexity of intersections, and challenges associated with multimodal integration, all of which necessitate advanced modeling approaches. The emphasis on highway research underscores the recognition of capacity bottlenecks, long-distance travel patterns, and the significance of corridor management in regional transportation systems.
Rural roadways account for a mere 2% (n = 2) of the representation, highlighting a significant coverage gap despite rural infrastructure constituting a substantial portion of the national transportation networks. This deficit in rural research signifies missed opportunities for the comprehensive optimization of transportation systems and suggests a potential urban bias within the research community’s focus.
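The percentages reported above follow directly from the study counts. A minimal Python sketch of that arithmetic, using only the counts given in this section (the function name `shares` is illustrative and not part of the review):

```python
# Study counts by road type as reported in this section (115 studies in total).
counts = {"urban": 59, "highway": 54, "rural": 2}


def shares(counts):
    """Return each category's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


road_shares = shares(counts)
print(road_shares)
```

The same helper can be applied to the vehicle-type counts to check that the rounded percentages are consistent with the underlying totals.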
3.6.3. Temporal Infrastructure Evolution and Data Strategy Framework (RQ4 + RQ6)
The temporal evolution analysis (Figure 17) examined the focus on different road types over the 14-year study period, highlighting the consistent predominance of urban road research. This area has shown steady growth, beginning with one study in 2010 and reaching a peak of 13 studies in 2022. In parallel, highway research exhibits a similar trajectory, albeit with a slightly delayed acceleration, culminating in 11 studies by 2021. This trend suggests increasing recognition of the importance of corridor-level congestion management and the potential for methodological integration.
Figure 18 presents a comprehensive comparison of the data strategies between traditional and external data sources across the six evaluative dimensions. Traditional data sources exhibit superior performance in the established operational dimensions of availability (95%), accuracy (85%), and maturity (90%), reflecting the proven infrastructure and institutional knowledge. However, limitations are evident in adaptation capabilities: integration (30%), real-time processing (70%), and potential (40%) indicate the need for modernization.
External data sources display various strengths: integration capability (80%), real-time processing (85%), and potential (90%), highlighting the advantages of modern technology, whereas availability (60%), accuracy (55%), and maturity (45%) reveal areas that require further development. This analysis indicates that optimal strategies require hybrid approaches that combine the reliability of traditional sources with the innovative capabilities of external sources to develop a comprehensive congestion-forecasting system.
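The comparison above can be made concrete with a small Python sketch. The scores are those reported for Figure 18; the dimension keys, the function names, the unweighted mean, and the "best case" hybrid (taking the stronger source on each dimension) are illustrative assumptions rather than the review's method.

```python
# Scores (percent) as reported for Figure 18; the dimension keys are illustrative names.
TRADITIONAL = {"availability": 95, "accuracy": 85, "maturity": 90,
               "integration": 30, "real_time": 70, "potential": 40}
EXTERNAL = {"availability": 60, "accuracy": 55, "maturity": 45,
            "integration": 80, "real_time": 85, "potential": 90}


def stronger_source(traditional, external):
    """Name the source with the higher score on each dimension."""
    return {dim: "traditional" if traditional[dim] >= external[dim] else "external"
            for dim in traditional}


def best_case_hybrid(traditional, external):
    """Assumption: a hybrid can at best match the stronger source per dimension."""
    return {dim: max(traditional[dim], external[dim]) for dim in traditional}


def mean_score(scores):
    """Unweighted mean across dimensions (equal weighting is an assumption)."""
    return sum(scores.values()) / len(scores)


leaders = stronger_source(TRADITIONAL, EXTERNAL)
hybrid = best_case_hybrid(TRADITIONAL, EXTERNAL)
print(leaders)
print(round(mean_score(TRADITIONAL), 1), round(mean_score(EXTERNAL), 1), mean_score(hybrid))
```

Under these assumptions, neither source dominates on its own, while the best-case hybrid scores higher than either, which is consistent with the argument for combining them. A real system would rarely reach this ceiling because integration itself carries costs.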
3.7. Synthesis and Critical Assessment
The period from 2010 to 2024 shows the progress of machine learning in traffic congestion forecasting. This evolution spans from initial machine learning algorithms (2010–2015), through deep learning (2016–2020), to current advanced architectures and real-world applications (2021–2024). This 14-year span ensured a comprehensive overview, while maintaining relevance for applications and research trends. The documented transition from traditional statistical methods to the predominance of deep learning, along with heightened attention to spatial modeling and real-time processing capabilities, signifies the research community’s successful adaptation to technological advancements and operational demands.
The high standards of publication quality, with 85% of journals ranked in the Q1 category, along with balanced methodological approaches comprising 53% evaluation and 35% solution proposal, demonstrate a robust academic foundation. This foundation supports the ongoing advancement of research and potential for practical implementation in the field of intelligent transportation systems.
The results are presented in Table 8, which includes the corresponding research questions for each primary section and related figures.
The analysis highlights significant gaps that necessitate strategic focus: the underrepresentation of rural infrastructure (2% coverage), limited industry collaboration (3% of studies), and the underutilization of reinforcement learning (8% adoption) present considerable opportunities to expand research and enhance practical impact. The predominant emphasis on recurrent congestion (76% of the studies) and passenger cars (90% coverage) indicates the need for diversified investigative approaches that address the complexity of comprehensive transportation systems.
4. Discussions and Implications
This systematic review presents the key findings and future research directions for machine learning-based traffic congestion forecasting. The findings are organized by research question to provide guidance for the research community. The analysis of 115 peer-reviewed studies from 2010 to 2024 reveals critical insights that advance theoretical understanding and implementation in intelligent transportation systems.
4.1. Publication Patterns and Research Evolution (RQ1)
The comprehensive analysis indicated significant growth in research output and demonstrated exceptional quality standards within the domain of traffic congestion forecasting. The selected studies were sourced from three prominent digital libraries: IEEE Xplore, SpringerLink, and ScienceDirect, with publications appearing in leading journals and conferences, as detailed in the systematic analysis. The temporal evolution analysis (Figure 9) documents exponential growth from minimal activity in 2010 to a peak output of 18 studies in 2022, indicating successful field establishment and sustained research momentum over the 14-year study period.
Established standards for publication quality have set high expectations for methodological rigor and empirical validation. The preference for journal venues indicates that the research community values comprehensive analysis and extensive evaluation. The observed exponential growth pattern suggests that the field has reached a level of maturity with sustained momentum, thereby creating opportunities for specialized research directions and interdisciplinary collaboration.
Researchers are encouraged to broaden the scope of digital library resources to encompass DOAJ, JSTOR, CORE, BASE, and Wiley Online Library, thereby ensuring comprehensive coverage of the literature. Contemporary literature review tools, such as Semantic Scholar, Scinapse, Consensus, and Perplexity, offer advanced search capabilities that facilitate systematic investigations. Nonetheless, ethical considerations associated with the use of artificial intelligence tools in systematic reviews necessitate careful attention and transparent reporting. Established publication standards indicate that future submissions should prioritize methodological innovation, thorough empirical validation, and the potential for practical implementation to sustain the quality trajectory of the field.
4.2. Research Context and Framework Analysis (RQ2)
Academic preeminence ensures methodological rigor and fosters innovation, whereas governmental involvement signifies acknowledgment of the impacts of congestion and the necessity for infrastructure development. Minimal industry participation (3%), as shown in Figure 10, highlights the gap between research and practice that necessitates intervention. Companies in the transportation and data analytics sectors present significant opportunities for the advancement of algorithms.
The predominant presence of academic leadership (70%) underscores robust theoretical foundations, yet may also indicate a potential disconnection from the challenges associated with practical implementation. The significant involvement of governmental entities (27%) offers avenues for policy-relevant research and access to public sector funding. Conversely, limited engagement with industry (3%) presents both a challenge and an opportunity to enhance the practical impact. Researchers must strike a balance between theoretical advancement and practical applicability to ensure the relevance and potential implementation of their research.
The academic community must embrace more collaborative strategies and furnish compelling justifications for governmental and industrial investments by customizing solutions to the specific requirements of companies or institutions reliant on transportation. Key stakeholders for enhanced collaboration include transportation service providers, government agencies, research institutes, and public safety agencies. These partnerships are intended to advance transportation planning, improve traffic management, and provide superior mobility solutions. The collective efforts of these sectors will facilitate the advancement of traffic congestion forecasting through machine learning, enhanced transportation planning, and improved traffic management via coordinated research–practice integration.
4.3. Methodological Approaches and Research Types (RQ3)
The prevailing focus of evaluation research underscores the importance of methodological rigor and the culture of comparative analysis within the research community, thereby ensuring evidence-based selection and assessment of algorithm performance. The significant representation of solution proposals (35%) indicates active progress in algorithmic development and the creation of novel techniques. In contrast, the limited presence of validation research (11%) (Figure 11) highlights opportunities for enhanced testing of real-world implementations.
The predominance of evaluation research, accounting for 53% of the studies, underscores the field’s emphasis on empirical validation and comparative analysis, thereby establishing a standard for rigorous performance assessment in future research. The equilibrium between evaluation and solution proposal research reflects a dynamic and healthy ecosystem. Nonetheless, the negligible contribution of opinion papers, at only 1%, suggests potential deficiencies in theoretical discourse and position statements that could offer strategic directions for the field’s advancement.
Future research should diversify methodological approaches by incorporating a greater number of opinion papers and conference studies to enhance the variety of research methodologies for predicting traffic congestion using machine learning algorithms. Opportunities for methodological advancement include the integration of cross-domain and longitudinal impact studies. Overall, future research should focus on developing novel approaches to address traffic congestion issues and provide informed opinions that guide strategic research directions and practical implementation frameworks.
4.4. Infrastructure and Vehicle Coverage Assessment (RQ4)
The balanced distribution between urban and highway coverage (47–51%) indicates appropriate acknowledgment of primary congestion issues. However, the minimal focus on rural roads (2%) suggests a potential urban-centric bias within the research community. The predominant emphasis on passenger cars (90%) (Figure 15 and Figure 16) may lead to an oversimplification of the complex interactions within multimodal traffic systems, potentially overlooking optimization opportunities that could arise from a more comprehensive consideration of various vehicle types. The observed temporal patterns reflect the responsiveness of the research community to infrastructure priorities and highlight systematic gaps in coverage.
Research in this field should encompass a variety of road types, including rural roads, busways, and tramway routes, depending on the specific transport system being studied and availability of relevant data. As transportation electrification and automation progress, it is imperative to investigate emerging vehicle categories such as motorbikes and electric scooters. Furthermore, there is an increasing need to forecast congestion across various modes of transport, particularly with the increasing popularity of multimodal systems. Expanding the scope of research in this manner will facilitate the comprehensive optimization of transportation networks and address diverse infrastructure contexts and vehicle diversity, thereby supporting more inclusive and effective transportation management strategies.
4.5. Congestion Characteristics and Temporal Patterns (RQ5)
The predominance of short-term prediction (77%) (Figure 13) reflects practical operational requirements, but may constrain strategic planning capabilities that require longer forecast horizons. The emphasis on recurrent congestion (76%) (Figure 12) suggests advantages in data availability and preferences for modeling complexity, whereas the increasing attention to nonrecurrent issues since 2019 indicates a growing recognition of the importance of this area despite methodological challenges. The evolution of temporal patterns signifies the adaptation of the research community to practical implementation needs.
Future research should focus on enhancing the accuracy and timeliness of real-time predictions to facilitate the development of models capable of swiftly adapting to evolving traffic conditions. This can be achieved by increasing the availability of real-time traffic data from traffic sensors, global positioning system (GPS) devices, and connected vehicles. Forecasting requires examination of various congestion types, including recurrent, incident-induced, work-zone, special-event, and weather-related congestion. Because nonrecurrent congestion remains comparatively underexplored, concentrating on it offers researchers the greatest scope for new results. The development of adaptive and resilient forecasting systems capable of effectively responding to unforeseen events, disruptions, and changes in traffic conditions can ensure reliable predictions in dynamic environments, particularly when addressing the increasing frequency of extreme weather events and the impacts of infrastructure aging.
4.6. Traffic Parameter Utilization and Data Strategies (RQ6)
A comprehensive parameter analysis indicated that the selected articles employed various predictive parameters contingent on the type of congestion being analyzed, as demonstrated in the comparative analysis (Figure 14). Most studies have utilized traffic flow, volume, and speed as the primary indicators for predicting traffic congestion. Parameter utilization analysis revealed distinct patterns between congestion types, reflecting both methodological requirements and data availability constraints. The data strategy comparison (Figure 18) highlights the complementary strengths of traditional sources (high availability 95%, accuracy 85%, maturity 90%) and external sources (superior integration 80%, real-time processing 85%, and future potential 90%). This analysis suggests that optimal forecasting strategies necessitate hybrid approaches that leverage the reliability of traditional sources along with the innovative capabilities of external sources.
The distribution of parameters reveals essential methodological distinctions between different types of congestion, necessitating researchers to adjust their data collection and modeling strategies accordingly. The predominance of traditional parameters in recurrent scenarios reflects the capabilities of the established infrastructure, whereas the significance of event data in nonrecurrent research underscores the need for diverse real-time data integration approaches. The strength of complementary data sources suggests that researchers should adopt hybrid strategies rather than rely on singular approaches.
Given the significant impact of traffic congestion on environmental factors, future research should integrate environmental variables such as air quality and noise levels into predictive models to address broader sustainability concerns. The application of big data analytics to identify latent patterns and trends within data represents a promising advancement, facilitating the processing and analysis of substantial volumes of traffic data from diverse sources and thereby enhancing the accuracy and reliability of traffic condition predictions. Hybrid approaches that combine the reliability of traditional monitoring infrastructure with the capabilities of modern external data sources offer optimal strategies for the development of comprehensive congestion-forecasting systems that address both the operational requirements and innovation potential.
4.7. Machine Learning Implementation and Technical Evolution (RQ7)
Temporal evolution analysis (Figure 5, Figure 6 and Figure 7) illustrates significant paradigm shifts from traditional statistical methods to the predominance of deep learning between 2016 and 2022. During this period, Deep Neural Networks have exhibited remarkable growth, transitioning from no implementation in 2010 to widespread adoption by 2022. The majority of the reviewed articles focused on supervised and unsupervised learning models, highlighting a preference for labeled data approaches while maintaining substantial applications in unsupervised pattern discovery. The analysis of machine learning tasks revealed that prediction, classification, and regression were the most prevalent approaches in the selected studies, reflecting the emphasis of the research community on both categorical and continuous prediction requirements. Technical progression indicates the research community’s successful adaptation to algorithmic innovations, while preserving methodological diversity for comparative analysis and specialized applications.
The predominance of Deep Neural Networks (47%) indicates the successful integration of advanced architectures. However, it may also indicate the potential neglect of simpler and more interpretable methodologies that could be more suitable for certain contexts. The preference for supervised learning (57%) highlights the advantages of data availability and performance, yet it suggests a possible underutilization of unsupervised learning and of reinforcement learning (8%), despite their theoretical appropriateness for dynamic optimization scenarios. These evolutionary trends imply that researchers should strive to balance the adoption of innovative techniques with the methodological suitability for specific applications.
An emerging area for advancement involves the incorporation of attention mechanisms that focus on relevant segments of input data during predictions, thereby prioritizing essential features and enhancing predictive accuracy. Future research should focus on developing more dynamic and adaptive forecasting models capable of real-time adjustment to fluctuating traffic conditions by employing feedback loops and continuous learning mechanisms to refine the predictions and adapt to traffic variations.
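To illustrate how an attention mechanism prioritizes relevant segments of the input, the following is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. It is not taken from any reviewed study; the function names and the toy feature vectors are hypothetical, and a practical model would learn the query, key, and value projections rather than receive them directly.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    keys[i] describes past time step i and values[i] holds what that step
    contributes; steps whose key matches the query get larger weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy example: three past time steps, two of which resemble the current state.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[30.0], [60.0], [30.0]]  # hypothetical speeds in km/h
weights, context = attend(query, keys, values)
print(weights, context)
```

In this toy case the two time steps that match the query receive equal and larger weights than the dissimilar one, so the resulting context value is pulled towards their speeds.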
Future research should focus on several critical areas. The first is the development of real-time prediction systems that provide immediate forecasts of traffic congestion from current data, optimizing speed and efficiency to facilitate real-time decision making in traffic management. The second is the advancement of anomaly detection techniques to enhance the early identification of unforeseen events that impact traffic flow. Efforts should also be directed towards developing temporal regression models capable of capturing temporal dependencies in traffic congestion, with the objective of predicting congestion dynamics across various timescales while maintaining computational efficiency and practical implementability.
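As one simple illustration of anomaly detection on a traffic series, the sketch below flags observations that deviate strongly from a trailing window of recent values (a rolling z-score). This is a generic baseline chosen for brevity, not a technique attributed to the reviewed studies; the function name, window length, threshold, and speed values are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(speeds, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` observations."""
    history = deque(maxlen=window)
    flagged = []
    for index, speed in enumerate(speeds):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip perfectly flat windows to avoid dividing by zero.
            if sigma > 0 and abs(speed - mu) / sigma > threshold:
                flagged.append(index)
        history.append(speed)
    return flagged


# Hypothetical speeds in km/h with one abrupt drop, e.g. caused by an incident.
speeds = [60, 61, 59, 60, 62, 20, 60, 61]
anomalies = detect_anomalies(speeds)
print(anomalies)
```

Note that the flagged value still enters the window and inflates the standard deviation for the following steps, so a deployed detector would likely exclude or down-weight flagged points.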
4.8. Integrated Analysis and Future Directions
This systematic review identified interconnected opportunities across multiple research questions, necessitating coordinated advancement strategies. The integration of advanced machine learning techniques (RQ7) with comprehensive infrastructure coverage (RQ4) and diverse data sources (RQ6) can effectively address both recurrent and nonrecurrent congestion scenarios (RQ5) through enhanced temporal prediction capabilities. The documented transition from traditional methods to deep learning (47%) established a foundation for next-generation hybrid approaches that leverage methodological diversity for specialized applications.
The academic community should be encouraged to formulate comprehensive implementation frameworks that effectively integrate theoretical advancements and operational deployment requirements. These frameworks encompass the following:
Standardized evaluation metrics: the development of uniform performance assessment protocols that facilitate systematic comparisons across methodological approaches and operational contexts.
Benchmark datasets: the development of representative datasets encompassing a wide range of geographical, temporal, and infrastructural contexts to facilitate reproducible research and practical validation.
Technology transfer mechanisms: the establishment of structured pathways to facilitate the translation of academic innovations into operational transportation management systems through collaborations with industry partners.
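One way to make a uniform assessment protocol concrete is to fix a small set of error metrics and report all of them for every model. The sketch below assumes mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE), which are common in forecasting but are not prescribed by this review; the function names and speed values are hypothetical.

```python
import math


def _paired(actual, predicted):
    """Validate the inputs and return them as (actual, predicted) pairs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return list(zip(actual, predicted))


def mae(actual, predicted):
    pairs = _paired(actual, predicted)
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


def rmse(actual, predicted):
    pairs = _paired(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def mape(actual, predicted):
    pairs = _paired(actual, predicted)
    if any(a == 0 for a, _ in pairs):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def evaluate(actual, predicted):
    """Report every metric together so that studies remain comparable."""
    return {"MAE": mae(actual, predicted),
            "RMSE": rmse(actual, predicted),
            "MAPE": mape(actual, predicted)}


observed_speeds = [50.0, 40.0, 20.0]  # toy values in km/h
forecast_speeds = [45.0, 40.0, 25.0]
report = evaluate(observed_speeds, forecast_speeds)
print(report)
```

Reporting the three metrics side by side matters because they penalize errors differently: RMSE emphasizes large misses, while MAPE weights errors at low observed speeds more heavily.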
This systematic review demonstrated that the field has reached a level of maturity and methodological sophistication adequate to support large-scale practical implementation. Concurrently, it maintains a strong research momentum to address emerging challenges in urban transportation management.
4.9. Limitations
4.9.1. Limitations of the Evidence Base
The 115 included studies demonstrated substantial heterogeneity in evaluation metrics, dataset characteristics, and experimental designs, which limited the ability to conduct direct comparative assessments of algorithm performance. Most studies (89%, n = 102) relied on simulation-based validation rather than real-world deployment validation, thus limiting the understanding of practical performance and scalability.
4.9.2. Limitations of the Review Process
Search Strategy Limitations:
Restriction to three major databases may have missed specialized publications.
English-only inclusion excluded potentially relevant research in other languages.
Gray literature exclusion may have missed important practical implementations.
Temporal boundaries (2010–2024) may have excluded foundational earlier work.
Selection and Assessment Limitations:
Quality assessment involved subjective judgments despite standardized criteria.
A 20% validation sample, while showing high agreement (κ = 0.89), left 80% with single-reviewer extraction.
Inability to access 12 full-text articles despite author contact attempts.
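Assuming the agreement statistic reported above is Cohen's kappa (a common choice for two reviewers), the following sketch shows how such a value is computed from paired decisions. The function name and the include/exclude labels are hypothetical toy data, not the review's actual extraction records.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    # Chance agreement: probability that both raters pick the same label
    # if each labelled independently at their own observed rates.
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data: two reviewers screening ten studies, disagreeing on one.
reviewer_1 = ["include"] * 5 + ["exclude"] * 5
reviewer_2 = ["include"] * 4 + ["exclude"] * 6
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

In this toy case the reviewers agree on nine of ten studies, yet the chance-corrected value is lower than the raw agreement rate, which is why kappa is usually preferred over simple percentage agreement.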
4.9.3. Implications of Limitations
These limitations suggest the need for (1) standardized evaluation frameworks and benchmark datasets, (2) real-world deployment studies and long-term validation, (3) broader search strategies, including gray literature, and (4) enhanced reporting standards specific to ML applications in transportation.
5. Conclusions
Machine learning and intelligent traffic systems are two pivotal technologies with significant potential for integration into future intelligent urban environments. Machine learning techniques have shown considerable success in predicting traffic congestion, thereby establishing a solid foundation for practical transportation management applications. This systematic review offers a comprehensive and structured examination of the application of machine learning methodologies for traffic congestion forecasting, providing empirical foundations for strategic research advancement and practical implementation guidance.
This systematic review established a foundational empirical framework for research on machine learning in traffic congestion forecasting through a comprehensive analysis of 115 peer-reviewed studies conducted over 14 years. The investigation illustrates the successful evolution of the field from experimental approaches to sophisticated methodological implementations, while identifying critical opportunities for strategic advancement. The documented progression from traditional statistical methods to the dominance of deep learning, coupled with exceptional publication quality standards and sustained research momentum, positions the field for transformative practical impact. The identified research gaps provide a strategic roadmap for coordinated investment in underrepresented areas, whereas established methodological sophistication supports large-scale operational deployment.
The research community is well-equipped to tackle the challenges of next-generation intelligent transportation through integrated approaches that combine technological innovation with a comprehensive application scope. Strategic coordination between academic excellence, governmental policy support, and enhanced industry collaboration will expedite the translation of research advancements into societal impacts. This will ultimately contribute to sustainable urban mobility solutions and the development of intelligent transportation systems that enhance both the operational efficiency and the quality of life of urban populations worldwide. The systematic review methodology demonstrated in this study provides a replicable framework for the periodic assessment of the evolving research landscape, enabling the development of adaptive strategies as machine learning technologies and transportation challenges continue to advance in the era of smart city development and sustainable urban planning.