Predicting Segment-Level Road Traffic Injury Counts Using Machine Learning Models: A Data-Driven Analysis of Geometric Design and Traffic Flow Factors
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Data Description and Feature Engineering
3.2. Machine Learning Models
3.2.1. Random Forest (RF)
3.2.2. Gradient Boosting Machines (GBM)
3.2.3. K-Nearest Neighbors (KNN)
3.3. Hyperparameter Tuning and Model Optimization
3.4. Model Evaluation and Performance Metrics
4. Results and Discussion
4.1. Model Performance Overview
4.1.1. Fatal Injury Model
4.1.2. Serious Injury Model
4.1.3. Slight Injury Model
4.2. Cross-Model Insights and Thematic Synthesis
- A comparative assessment of the three severity-specific models reveals several consistent predictive patterns, alongside notable differences in the variables contributing to the discrimination of fatal, serious-injury, and slight-injury count levels. Across all models, heavy-vehicle flows—including buses, tractor-trailers, and various truck classes—emerged as dominant predictors. Their recurrent importance reflects the strong discriminative value that freight-related and large-vehicle traffic volumes provide in distinguishing severity. These findings are consistent with previous studies indicating that heavy vehicles contribute disproportionately to crash severity due to their mass and kinetic energy, corroborating the predictive trends observed in European and North American road networks [69,70,71,72].
- Horizontal alignment features, particularly curve radius and curve direction, exhibited varying degrees of importance across severity levels. These features demonstrated a higher predictive contribution in the slight-injury model and progressively lower contributions in the serious-injury and fatal-injury models. This gradient suggests that the informational value of geometric complexity differs across severity types, enabling the models to extract distinct patterns from horizontal-alignment variability. This observation agrees with prior empirical studies that report sharper curves and alignment irregularities as key contributors to minor-injury crashes, whereas severe outcomes tend to be influenced more by traffic composition and operational context [73,74].
- Capacity utilization displayed a comparatively strong contribution in the slight-injury and serious-injury models, indicating that interactions linked to traffic demand and congestion offer meaningful predictive information for moderate-severity outcomes. This result highlights the relevance of dynamic operational states—captured indirectly through demand-related variables—in providing supplementary discriminatory signals beyond those embedded in static geometric or traffic-volume indicators [62,65,75].
- Vulnerable road users, including motorcycles, mopeds, and bicycles, were of moderate importance across all models, with a greater influence in the fatal-injury model. This pattern aligns with previous research indicating that crashes involving unprotected road users contribute disproportionately to fatalities, while functional road hierarchy is consistently associated with exposure-related severity trends [18,76].
- Road category likewise appeared as a consistently informative predictor across severity levels, suggesting that functional classification encapsulates multiple contextual attributes (such as access control, expected speed environments, and modal interactions) that collectively enhance predictive accuracy [77,78,79,80].
- Segment-level counts of articulated buses and slow-moving or agricultural vehicles exhibited notable importance in both the Random Forest and Gradient Boosting models, despite their limited prevalence in the traffic flow. Although rare, these vehicle classes carry substantial predictive information for severe crash counts. Ensemble methods can exploit such low-frequency yet informative patterns, in which a small number of high-information splits significantly enhances predictive performance [81,82]. This behavior is consistent with previous traffic safety studies, which show that even relatively infrequent vehicle types can be involved in severe crashes when operating in particular contexts. For instance, articulated buses account for 39% of bus-involved collisions with cyclists in Germany, second only to city buses [83]. Their articulated design also poses challenges for automated vehicles: in March 2023, a Cruise self-driving car rear-ended an articulated municipal bus in San Francisco because its software mispredicted the motion of the bus's rear section, despite detecting it with sensors [84]. These cases demonstrate that less common vehicle types can still contribute disproportionately to severe outcomes [85,86]. Similarly, studies of rural corridors and arterial systems in developing regions show that agricultural vehicles, although infrequent, frequently co-occur with roadway conditions predictive of high-severity crashes [87,88].
Taken together, these insights affirm the ability of the ensemble models (RF and GBM) to disentangle nuanced interactions among geometric, operational, and vehicular factors in the prediction of crash severity counts.
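
The point about rare but informative vehicle classes can be illustrated with a toy calculation. The sketch below computes the variance reduction that a regression tree would credit to a single split; it assumes an impurity-based importance measure, which the text does not specify. All numbers are synthetic and chosen purely for illustration, and the names `impurity_decrease`, `has_artic_bus`, and `coin_flip` are hypothetical rather than taken from the study.

```python
from statistics import pvariance


def impurity_decrease(y, mask):
    """Variance reduction obtained by splitting targets y with a boolean mask."""
    left = [v for v, m in zip(y, mask) if m]
    right = [v for v, m in zip(y, mask) if not m]
    if not left or not right:
        return 0.0  # a split that sends every sample one way gains nothing
    n = len(y)
    weighted_children = (
        len(left) * pvariance(left) + len(right) * pvariance(right)
    ) / n
    return pvariance(y) - weighted_children


# Synthetic data (not from the study): 100 segments, 5 of which carry
# articulated-bus traffic and record 3 injuries each; the other 95 record none.
injuries = [3.0] * 5 + [0.0] * 95
has_artic_bus = [True] * 5 + [False] * 95

# A frequent but uninformative feature that alternates regardless of the target.
coin_flip = [i % 2 == 0 for i in range(100)]

rare_gain = impurity_decrease(injuries, has_artic_bus)
common_gain = impurity_decrease(injuries, coin_flip)
print(f"rare feature gain:   {rare_gain:.4f}")
print(f"common feature gain: {common_gain:.4f}")
```

In this contrived case the rare feature separates the targets perfectly, so a single split on it removes all of the variance, whereas the frequent feature removes almost none. Real segment data would be far noisier, but the mechanism by which a low-prevalence variable can rank highly is the same.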
4.3. Policy and Planning Implications
- Dedicated freight lanes and time-of-day restrictions for heavy vehicles, particularly along high-capacity corridors and primary arterials. Similar measures implemented in Germany and the Netherlands through dedicated freight corridors have demonstrated measurable reductions in heavy-vehicle conflict points and crash rates.
- Enhanced driver visibility infrastructure, particularly at curves and intersections, where line-of-sight limitations exacerbate heavy-vehicle crash risks. Such interventions have proven effective in improving driver response times and reducing side-impact collisions on European rural arterials.
- Vehicle-type-specific speed enforcement, which has shown positive outcomes in several European contexts. For instance, differentiated speed limits for heavy vehicles on rural expressways in Sweden and Finland have effectively reduced crash frequency and injury severity by mitigating speed dispersion between heavy and light vehicles.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Al-lami, A.; Török, Á. Assessing sustainability indicators of public transportation using PAHP. Sustain. Futures 2025, 9, 100500.
2. Sipos, T.; Afework Mekonnen, A.; Szabó, Z. Spatial Econometric Analysis of Road Traffic Crashes. Sustainability 2021, 13, 2492.
3. Ötvös, V.; Török, Á. Measurement of Accident Risk and a Case Study from Hungary. Period. Polytech. Transp. Eng. 2024, 52, 159–165.
4. Al-lami, A.; Török, Á. Regional forecasting of driving forces of CO2 emissions of transportation in Central Europe: An ARIMA-based approach. Energy Rep. 2025, 13, 1215–1224.
5. Jima, D.; Sipos, T. The Impact of Road Geometric Formation on Traffic Crash and Its Severity Level. Sustainability 2022, 14, 8475.
6. Pei, Y.; Hou, L. Safety Assessment and Risk Management of Urban Arterial Traffic Flow Based on Artificial Driving and Intelligent Network Connection: An Overview. Arch. Comput. Methods Eng. 2024, 31, 2925–2943.
7. Lord, D.; Mannering, F. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transp. Res. Part A Policy Pract. 2010, 44, 291–305.
8. Mannering, F.L.; Bhat, C.R. Analytic methods in accident research: Methodological frontier and future directions. Anal. Methods Accid. Res. 2014, 1, 1–22.
9. Maji, A.; Ghosh, I. A systematic review on roundabout safety incorporating the safety assessment methodologies, data collection techniques, and driver behavior. Saf. Sci. 2025, 181, 106661.
10. Khattak, M.W.; De Backer, H.; De Winne, P.; Brijs, T.; Pirdavani, A. Analysis of Road Infrastructure and Traffic Factors Influencing Crash Frequency: Insights from Generalised Poisson Models. Infrastructures 2024, 9, 47.
11. Al-Mahamid, H.; Al-Nabulsi, D.; Torok, A. Developing safety performance functions incorporating pavement roughness using Poisson regression and Machine learning models on Jordan’s Desert Highway. Transp. Res. Interdiscip. Perspect. 2025, 34, 101659.
12. Hamdan, N.; Sipos, T. Classification of Traffic Accident Severity Using Machine Learning Models. In Proceedings of the 2nd Cognitive Mobility Conference, Budapest, Hungary, 19–20 October 2025; pp. 177–186.
13. Hamdan, N.; Sipos, T. Traffic Accidents Severity Prediction Using Support Vector Machine Models. In Proceedings of the 3rd Cognitive Mobility Conference, Budapest, Hungary, 7–8 October 2024; pp. 153–161.
14. Jamal, A.; Zahid, M.; Tauhidur Rahman, M.; Al-Ahmadi, H.M.; Almoshaogeh, M.; Farooq, D.; Ahmad, M. Injury severity prediction of traffic crashes with ensemble machine learning techniques: A comparative study. Int. J. Inj. Control Saf. Promot. 2021, 28, 408–427.
15. Yu, R.; Abdel-Aty, M. Utilizing support vector machine in real-time crash risk evaluation. Accid. Anal. Prev. 2013, 51, 252–259.
16. Altaf, I.; Kaul, A. Classifying victim degree of injury in road traffic accidents: A novel stacked DCL-X approach. Multimed. Tools Appl. 2024, 83, 66691–66723.
17. Wen, X.; Xie, Y.; Jiang, L.; Pu, Z.; Ge, T. Applications of machine learning methods in traffic crash severity modelling: Current status and future directions. Transp. Rev. 2021, 41, 855–879.
18. Macioszek, E.; Granà, A. The Analysis of the Factors Influencing the Severity of Bicyclist Injury in Bicyclist-Vehicle Crashes. Sustainability 2021, 14, 215.
19. Santos, D.; Saias, J.; Quaresma, P.; Nogueira, V.B. Machine Learning Approaches to Traffic Accident Analysis and Hotspot Prediction. Computers 2021, 10, 157.
20. Boo, Y.; Choi, Y. Comparison of mortality prediction models for road traffic accidents: An ensemble technique for imbalanced data. BMC Public Health 2022, 22, 1476.
21. Kuo, P.-F.; Hsu, W.-T.; Lord, D.; Putra, I.G.B. Classification of autonomous vehicle crash severity: Solving the problems of imbalanced datasets and small sample size. Accid. Anal. Prev. 2024, 205, 107666.
22. Almahdi, A.; Al Mamlook, R.E.; Bandara, N.; Almuflih, A.S.; Nasayreh, A.; Gharaibeh, H.; Alasim, F.; Aljohani, A.; Jamal, A. Boosting Ensemble Learning for Freeway Crash Classification under Varying Traffic Conditions: A Hyperparameter Optimization Approach. Sustainability 2023, 15, 15896.
23. Aziz, K.; Chen, F.; Khan, I.; Hussain Khahro, S.; Malik, M.A.; Ahmed Memon, Z.; Khattak, A. Road Traffic Crash Severity Analysis: A Bayesian-Optimized Dynamic Ensemble Selection Guided by Instance Hardness and Region of Competence Strategy. IEEE Access 2024, 12, 139540–139559.
24. Azimian, A.; Dimitra Pyrialakou, V.; Lavrenz, S.; Wen, S. Exploring the effects of area-level factors on traffic crash frequency by severity using multivariate space-time models. Anal. Methods Accid. Res. 2021, 31, 100163.
25. Mussone, L.; Bassani, M.; Masci, P. Analysis of factors affecting the severity of crashes in urban road intersections. Accid. Anal. Prev. 2017, 103, 112–122.
26. Manirul Islam, S.; Washington, S.; Kim, J.; Haque, M. A comprehensive analysis on the effects of signal strategies, intersection geometry, and traffic operation factors on right-turn crashes at signalised intersections: An application of hierarchical crash frequency model. Accid. Anal. Prev. 2022, 171, 106663.
27. Grigorev, A.; Mihaita, A.-S.; Chen, F.; Truong, L. Traffic Incident Duration Prediction: A Systematic Review of Techniques. J. Adv. Transp. 2024, 2024, 3748345.
28. Kitali, A.E.; Mokhtarimousavi, S.; Kadeha, C.; Alluri, P. Severity analysis of crashes on express lane facilities using support vector machine model trained by firefly algorithm. Traffic Inj. Prev. 2020, 22, 79–84.
29. Hamdan, N.; Sipos, T. Advancements in Machine Learning for Traffic Accident Severity Prediction: A Comprehensive Review. Period. Polytech. Transp. Eng. 2025, 53, 347–355.
30. Cai, Q.; Abdel-Aty, M.; Yuan, J.; Lee, J.; Wu, Y. Real-time crash prediction on expressways using deep generative models. Transp. Res. Part C Emerg. Technol. 2020, 117, 102697.
31. Chen, J.; Pu, Z.; Zheng, N.; Wen, X.; Ding, H.; Guo, X. A novel generative adversarial network for improving crash severity modeling with imbalanced data. Transp. Res. Part C Emerg. Technol. 2024, 164, 104642.
32. Al-Yarimi, F.A.M. Enhancing road safety through advanced predictive analytics in V2X communication networks. Comput. Electr. Eng. 2024, 115, 109134.
33. Yang, J.; Han, S.; Chen, Y.; Ghosh, I. Prediction of Traffic Accident Severity Based on Random Forest. J. Adv. Transp. 2023, 2023, 7641472.
34. Islam, M.K.; Reza, I.; Gazder, U.; Akter, R.; Arifuzzaman, M.; Rahman, M.M. Predicting Road Crash Severity Using Classifier Models and Crash Hotspots. Appl. Sci. 2022, 12, 11354.
35. Rahim, M.A.; Hassan, H.M. A deep learning based traffic crash severity prediction framework. Accid. Anal. Prev. 2021, 154, 106090.
36. Dadashova, B.; Arenas-Ramírez, B.; Mira-McWilliams, J.; Dixon, K.; Lord, D. Analysis of crash injury severity on two trans-European transport network corridors in Spain using discrete-choice models and random forests. Traffic Inj. Prev. 2020, 21, 228–233.
37. Yassin, S.S.; Pooja. Road accident prediction and model interpretation using a hybrid K-means and random forest algorithm approach. SN Appl. Sci. 2020, 2, 1576.
38. Atumo, E.A.; Fang, T.; Jiang, X. Spatial statistics and random forest approaches for traffic crash hot spot identification and prediction. Int. J. Inj. Control Saf. Promot. 2021, 29, 207–216.
39. Akin, D.; Sisiopiku, V.P.; Alateah, A.H.; Almonbhi, A.O.; Al-Tholaia, M.M.H.; Al-Sodani, K.A.A. Identifying Causes of Traffic Crashes Associated with Driver Behavior Using Supervised Machine Learning Methods: Case of Highway 15 in Saudi Arabia. Sustainability 2022, 14, 16654.
40. Gatera, A.; Kuradusenge, M.; Bajpai, G.; Mikeka, C.; Shrivastava, S. Comparison of random forest and support vector machine regression models for forecasting road accidents. Sci. Afr. 2023, 21, e01739.
41. Nikolaou, D.; Ziakopoulos, A.; Dragomanovits, A.; Roussou, J.; Yannis, G. Comparing Machine Learning Techniques for Predictions of Motorway Segment Crash Risk Level. Safety 2023, 9, 32.
42. Ahmed, S.; Hossain, M.A.; Ray, S.K.; Bhuiyan, M.M.I.; Sabuj, S.R. A study on road accident prediction and contributing factors using explainable machine learning models: Analysis and performance. Transp. Res. Interdiscip. Perspect. 2023, 19, 100814.
43. Wang, X.; Su, Y.; Zheng, Z.; Xu, L. Prediction and interpretive of motor vehicle traffic crashes severity based on random forest optimized by meta-heuristic algorithm. Heliyon 2024, 10, e35595.
44. Uzunov, H.V.; Matzinski, P.G.; Uzunov, V.H.; Dechkova, S.V. Comparative Analysis of the Proportional Distribution Method and the Random Forest Algorithm for Predicting Pedestrian Traffic Accident Risk. IEEE Access 2025, 13, 129828–129844.
45. Daoud, R.; Vechione, M.; Gurbuz, O.; Sundaravadivel, P.; Tian, C. Comparison of machine learning models to predict nighttime crash severity: A case study in Tyler, Texas, USA. Vehicles 2025, 7, 20.
46. AlKheder, S.; Gharabally, H.A.; Mutairi, S.A.; Mansour, R.A. An Impact study of highway design on casualty and non-casualty traffic accidents. Injury 2022, 53, 463–474.
47. Li, J.; Li, C.; Zhao, X. Optimizing crash risk models for freeway segments: A focus on the heterogeneous effects of road geometric design features, traffic operation status, and crash units. Accid. Anal. Prev. 2024, 205, 107665.
48. Vayalamkuzhi, P.; Amirthalingam, V. Influence of geometric design characteristics on safety under heterogeneous traffic flow. J. Traffic Transp. Eng. 2016, 3, 559–570.
49. Zhao, J.; Guo, Y.; Liu, P. Safety impacts of geometric design on freeway segments with closely spaced entrance and exit ramps. Accid. Anal. Prev. 2021, 163, 106461.
50. Jaber, A.; Juhász, J.; Csonka, B. An analysis of factors affecting the severity of cycling crashes using binary regression model. Sustainability 2021, 13, 6945.
51. Jaber, A.; Csonka, B. Towards a sustainable and safe future: Mapping bike accidents in urbanized context. Safety 2023, 9, 60.
52. Sánta, E.; Szűcs, P.; Patocskai, G.; Lakatos, I. Prevalence and Characteristics of Traffic Accidents Endangering Vulnerable Pedestrians in Hungary. Eng. Proc. 2024, 79, 94.
53. Cantisani, G.; Del Serrone, G.; Mauro, R.; Peluso, P.; Pompigna, A. From Radar Sensor to Floating Car Data: Evaluating Speed Distribution Heterogeneity on Rural Road Segments Using Non-Parametric Similarity Measures. Sci 2024, 6, 52.
54. Faruga, Ł.; Filapek, A.; Kraszewska, M.; Baranowski, J. Dataset for Traffic Accident Analysis in Poland: Integrating Weather Data and Sociodemographic Factors. Appl. Sci. 2025, 15, 7362.
55. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; pp. 278–282.
56. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
57. Zhang, W.; Wang, K.; Wang, S.; Jiang, Z.; Mondschein, A.; Noland, R.B. Synthesizing neighborhood preferences for automated vehicles. Transp. Res. Part C Emerg. Technol. 2020, 120, 102774.
58. Li, K.; Xu, H.; Liu, X. Analysis and visualization of accidents severity based on LightGBM-TPE. Chaos Solitons Fractals 2022, 157, 111987.
59. Gou, J.; Du, L.; Zhang, Y.; Xiong, T. A new distance-weighted k-nearest neighbor classifier. J. Inf. Comput. Sci. 2012, 9, 1429–1436.
60. Adeel, M.; Khattak, A.J.; Mishra, S.; Thapa, D. Enhancing work zone crash severity analysis: The role of synthetic minority oversampling technique in balancing minority categories. Accid. Anal. Prev. 2024, 208, 107794.
61. Alrumaidhi, M.; Farag, M.M.G.; Rakha, H.A. Comparative Analysis of Parametric and Non-Parametric Data-Driven Models to Predict Road Crash Severity among Elderly Drivers Using Synthetic Resampling Techniques. Sustainability 2023, 15, 9878.
62. Gong, X.; Bo, W.; Chen, F.; Wu, X.; Zhang, X.; Li, D.; Gou, F.; Ren, H. Safety Evaluation of Highways with Sharp Curves in Highland Mountainous Areas Using an Enhanced Stacking and Low-Cost Dataset Production Method. Sustainability 2025, 17, 5857.
63. Khan, W.A.; Moomen, M.; Rahman, M.A.; Terkper, K.A.; Codjoe, J.; Gopu, V. Predicting Crash-Related Incident Clearance Time on Louisiana’s Rural Interstate Using Ensemble Tree-Based Learning Methods. Appl. Sci. 2024, 14, 10964.
64. Vincent, A.M.; Jidesh, P. An improved hyperparameter optimization framework for AutoML systems using evolutionary algorithms. Sci. Rep. 2023, 13, 4737.
65. Alotaibi, J. Enhancing Traffic Accident Severity Prediction: Feature Identification Using Explainable AI. Vehicles 2025, 7, 38.
66. Aldhari, I.; Almoshaogeh, M.; Jamal, A.; Alharbi, F.; Alinizzi, M.; Haider, H. Severity Prediction of Highway Crashes in Saudi Arabia Using Machine Learning Techniques. Appl. Sci. 2022, 13, 233.
67. Kim, S.; Lym, Y.; Kim, K.-J. Developing Crash Severity Model Handling Class Imbalance and Implementing Ordered Nature: Focusing on Elderly Drivers. Int. J. Environ. Res. Public Health 2021, 18, 1966.
68. Skaug, L.; Nojoumian, M.; Dang, N.; Yap, A. Road Crash Analysis and Modeling: A Systematic Review of Methods, Data, and Emerging Technologies. Appl. Sci. 2025, 15, 7115.
69. Chen, M.-M.; Chen, M.-C. Modeling Road Accident Severity with Comparisons of Logistic Regression, Decision Tree and Random Forest. Information 2020, 11, 270.
70. Du, X.; Wang, G. Analysis of Operating Safety of Tractor-Trailer under Crosswind in Cold Mountainous Areas. Appl. Sci. 2022, 12, 12755.
71. Montoya-Alcaraz, M.; Mungaray-Moctezuma, A.; Calderón-Ramírez, J.; García, L.; Martinez-Lazcano, C. Road Safety Analysis of High-Risk Roads: Case Study in Baja California, México. Safety 2020, 6, 45.
72. Shahdah, U.E.; Alanazi, F.; Azam, A.; Elbany, M. Safety and Mobility Performance Comparison of Two-Plus-One and Two-Lane Two-Way Roads: A Simulation Study. Appl. Sci. 2024, 14, 4352.
73. Al-Sheikh, O.; Ghasemi, S.H.; Jalayer, M. Reliability-based analysis of horizontal curve design by evaluating the impact of vehicle automation on roadway departure crashes and safety performance. Heliyon 2024, 10, e25346.
74. Ma, Q.; Yang, H.; Wang, Z.; Xie, K.; Yang, D. Modeling crash risk of horizontal curves using large-scale auto-extracted roadway geometry data. Accid. Anal. Prev. 2020, 144, 105669.
75. Samerei, S.A.; Aghabayk, K.; Montella, A. Analyzing Pile-Up Crash Severity: Insights from Real-Time Traffic and Environmental Factors Using Ensemble Machine Learning and Shapley Additive Explanations Method. Safety 2024, 10, 22.
76. Wang, M.-H. Investigating the Difference in Factors Contributing to the Likelihood of Motorcyclist Fatalities in Single Motorcycle and Multiple Vehicle Crashes. Int. J. Environ. Res. Public Health 2022, 19, 8411.
77. Champahom, T.; Se, C.; Aryuyo, F.; Banyong, C.; Jomnonkwao, S.; Ratanavaraha, V. Crash Severity Analysis of Young Adult Motorcyclists: A Comparison of Urban and Rural Local Roadways. Appl. Sci. 2023, 13, 11723.
78. Huang, H.; Ding, X.; Yuan, C.; Liu, X.; Tang, J. Jointly analyzing freeway primary and secondary crash severity using a copula-based approach. Accid. Anal. Prev. 2023, 180, 106911.
79. Kumara, S.D.D.R.; Walgampaya, C.K. Identification of Severity Factors and Risk Areas of Southern Expressway Accidents. Eng. J. Inst. Eng. Sri Lanka 2021, 54, 61–75.
80. Wei, X.; Tian, S.; Dai, Z.; Li, P. Statistical Analysis of Major and Extra Serious Traffic Accidents on Chinese Expressways from 2011 to 2021. Sustainability 2022, 14, 15776.
81. Rajbahadur, G.K.; Wang, S.; Oliva, G.A.; Kamei, Y.; Hassan, A.E. The Impact of Feature Importance Methods on the Interpretation of Defect Classifiers. IEEE Trans. Softw. Eng. 2022, 48, 2245–2261.
82. Zhang, X.; Waller, S.T.; Jiang, P. An ensemble machine learning-based modeling framework for analysis of traffic crash frequency. Comput.-Aided Civ. Infrastruct. Eng. 2019, 35, 258–276.
83. Schindler, R.; Jeppsson, H. In-depth analysis of scenarios and injuries in crashes between cyclists and commercial vehicles in Germany. Traffic Saf. Res. 2024, 7, e000067.
84. Cummings, M.L. Identifying AI Hazards and Responsibility Gaps. IEEE Access 2025, 13, 54338–54349.
85. Liu, Q.; Zhang, C.; Gordon, T.J.; Wang, J. Dynamics and control of articulated passenger vehicles on roads. Veh. Syst. Dyn. 2025, 63, 1395–1457.
86. Useche, S.A.; Cendales, B.; Alonso, F.; Montoro, L. Multidimensional prediction of work traffic crashes among Spanish professional drivers in cargo and passenger transportation. Int. J. Occup. Saf. Ergon. 2020, 28, 20–27.
87. Franklin, R.C.; King, J.C.; Riggs, M. A Systematic Review of Large Agriculture Vehicles Use and Crash Incidents on Public Roads. J. Agromed. 2019, 25, 14–27.
88. McFalls, M.; Ramirez, M.; Harland, K.; Zhu, M.; Morris, N.L.; Hamann, C.; Peek-Asa, C. Farm vehicle crashes on public roads: Analysis of farm-level factors. J. Rural Health 2021, 38, 537–545.
89. De Santos-Berbel, C.; Ferreira, S.; Couto, A.; Lobo, A. Development of Motorway Horizontal Alignment Databases for Accurate Accident Prediction Models. Sustainability 2024, 16, 7296.
90. Jeon, H.; Benekohal, R.F. Speed and Lane Change Management Strategies for CAV in Mixed Traffic for Post-Incident Operation. Future Transp. 2025, 5, 51.
91. Pei, Y.-L.; He, Y.-M.; Ran, B.; Kang, J.; Song, Y.-T. Horizontal Alignment Security Design Theory and Application of Superhighways. Sustainability 2020, 12, 2222.
92. Wu, X.; Chen, F.; Bo, W.; Shuai, Y.; Zhang, X.; Da, W.; Liu, H.; Chen, J. Analysis of Factors Influencing Driving Safety at Typical Curve Sections of Tibet Plateau Mountainous Areas Based on Explainability-Oriented Dynamic Ensemble Learning Strategy. Sustainability 2025, 17, 7820.
93. Rehak, D.; Vlkovsky, M.; Manas, P.; Apeltauer, J.; Apeltauer, T.; Hromada, M. Sustainability of the Trans-European Transport Networks Land Infrastructure to Address Large-Scale Disasters: A Case Study in the Czech Republic. Sustainability 2025, 17, 2509.




| Authors | Description | Methods | Key Findings |
|---|---|---|---|
| Dadashova et al. (2020) [36] | Investigated crash injury severity using discrete-choice models and Random Forest on Spanish trans-European corridors. | Logistic regression and RF, with crash types disaggregated by roadway, driver, and environmental variables. | Roadway design elements (curvature, superelevation, lane width) were significant predictors. Logistic regression highlighted conditional effects by crash type, whereas RF identified critical factors across crash categories. |
| Yassin and Pooja (2020) [37] | Developed a hybrid framework integrating K-means clustering with Random Forest for crash severity prediction. | K-means was used to extract hidden features; RF was employed for classification and compared against alternative classifiers. | The hybrid approach achieved outstanding accuracy (99.86%). Driver experience, lighting conditions, driver age, and vehicle service year emerged as dominant factors in predicting severity outcomes. |
| Atumo et al. (2021) [38] | Applied spatial statistics and Random Forest to traffic crash hot spot identification. | Getis-Ord statistics for spatial clustering complemented with RF-based crash prediction using 2010–2017 data. | Identified crash hot spots on interstate routes; RF achieved validation and prediction accuracy of 76.7% and 74%, respectively. Results highlighted the spatial dependence of crash distributions and confirmed predictive robustness. |
| Akin et al. (2022) [39] | Identified behavioral causes of crashes on Saudi Arabian highways using supervised ML. | Logistic regression, RF, and KNN applied to driver error-related crashes on the highway. | RF and logistic regression achieved the highest accuracy (78.7%), with RF attaining the largest AUC (0.712). Findings revealed that traffic flow speed and lane count reduced driver error–related crashes, while higher AADT and curve sections increased risk. |
| Gatera et al. (2023) [40] | Compared Random Forest and Support Vector Machine regression models for short-term road crash forecasting. | RF and SVM were evaluated using error indices (MAE, MSE, RMSE) and R² values. | RF demonstrated superior predictive capacity (R² = 0.91) compared to SVM (R² = 0.86), reinforcing the promise of ML in traffic crash forecasting. |
| Nikolaou et al. (2023) [41] | Compared ML techniques for motorway segment crash risk prediction. | Logistic Regression, Decision Tree, RF, SVM, and kNN applied to road design and naturalistic driving datasets. | RF achieved the highest accuracy (89.3%) and superior precision-recall-F1 balance. Shapley additive explanations enhanced interpretability, highlighting RF’s effectiveness for crash risk assessment. |
| Yang et al. (2023) [33] | Proposed a Random Forest-based framework for predicting crash severity using enriched feature sets. | RF compared with BP neural network, SVM, and radial basis neural network; feature importance ranking applied to 12 variables. | RF outperformed other models, achieving higher recall (0.83) and F1 scores, with a lower false alarm rate. Results confirmed its reliability and stability in severity prediction. |
| Ahmed et al. (2023) [42] | Examined crash prediction and contributing factors using explainable ensemble machine learning models. | Random Forest (RF), Decision Jungle, AdaBoost, XGBoost, LightGBM, and CatBoost, with interpretability through SHAP analysis. | RF achieved the highest predictive accuracy (81.45%), with road category and number of vehicles identified as the most influential factors affecting injury severity. |
| Wang et al. (2024) [43] | Developed a meta-heuristic optimized Random Forest model for traffic crash severity prediction. | Compared nine meta-heuristic RF variants (e.g., CPO-RF, SSA-RF) against standard ensemble and single classifiers using U.S. crash data. | CPO-RF yielded superior accuracy (95.2%) and F1 scores exceeding 90%. Application of inverse SMOTE improved accuracy to 99.6%. Key predictors included temperature, weather, pressure, GDP, population density, and time of day. |
| Uzunov et al. (2025) [44] | Compared the proportional distribution method and Random Forest for pedestrian crash risk prediction. | Proportional risk distribution and RF applied to data derived from court cases, with risk factors quantified through expert evaluation. | Both approaches were valid; however, RF provided superior accuracy and robustness. Significant correlation between methods confirmed validity, with graphical visualizations aiding the interpretability of risk severity. |
| Daoud et al. (2025) [45] | Examined nighttime crash severity and the role of roadway illumination. | Developed and compared seven machine learning models (logistic regression, k-NN, naïve Bayes, random forest, ANN, XGBoost, LSTM) using TxDOT crash data. | The random forest model produced the most promising results by predicting severe crashes with 97.6% accuracy. |

| Variable Category | Representative Variable | Fatal | Serious | Slight |
|---|---|---|---|---|
| Track Code | Undivided | 83% | 84% | 85% |
| | Left track | 9% | 9% | 8% |
| | Right track | 8% | 7% | 7% |
| Road Category | Motorway | 18% | 16% | 14% |
| | Expressway | 4% | 5% | 5% |
| | Primary Main Road | 22% | 25% | 26% |
| | Secondary Main Road | 57% | 54% | 55% |
| Section Type | Rural | 81% | 62% | 63% |
| | Urban | 19% | 38% | 37% |
| Number of Lanes | Two lanes | 85% | 83% | 83% |
| Horizontal Alignment | Straight | 48% | 44% | 46% |
| | Right/Left | 34% | 36% | 35% |

| Category | Variable Name | Description |
|---|---|---|
| Road Attributes | Track Code | (0: undivided, 1: left track, 2: right track). |
| | Road Category | (1: motorway, 2: expressway, 3: primary main road, 4: secondary main road). |
| | Section Type | (1: rural, 2: urban). |
| | Number of Traffic Lanes | Numeric |
| | Radius of Horizontal Curve | (In meters). |
| | Direction of Horizontal Curve | (1: right, 2: left, 3: straight). |
| | Slope/Gradient | (In %). |
| | Type of Vertical Curve | Type of vertical alignment curve (e.g., crest, sag). |
| | Radius of Vertical Curve | (In meters). |
| Traffic Attributes | Capacity Utilization | Ratio of AADT to road capacity (%). |
| | Heavy Truck Traffic | Numeric |
| | Medium Heavy (2-Axle) Truck Traffic | Numeric |
| | Truck with Trailer or Semi-Trailer Traffic | Numeric |
| | Tractor-Trailer with Semi-Trailer Traffic | Numeric |
| | Light Truck Traffic | Numeric |
| | Total Bus (Single) Traffic | Numeric |
| | Bus (Articulated) Traffic | Numeric |
| | Motorcycle and Moped Traffic | Numeric |
| | Bicycle Traffic | Numeric |
| | Slow Vehicle and Agricultural Tractor Traffic | Numeric |
| Derived Variables | Severity | Target variable: crash severity counts (slight injuries, serious injuries, fatalities). |
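
The integer codes in the variable table can be made easier to inspect by mapping them back to their labels. The following is a minimal sketch, not the authors' preprocessing: the field names (`track_code`, `road_category`, `section_type`, `curve_direction`, `lanes`), the helper functions, and the example values are hypothetical, while the code-to-label mappings and the definition of capacity utilization are taken from the table.

```python
# Code-to-label mappings copied from the variable table.
TRACK_CODE = {0: "undivided", 1: "left track", 2: "right track"}
ROAD_CATEGORY = {
    1: "motorway",
    2: "expressway",
    3: "primary main road",
    4: "secondary main road",
}
SECTION_TYPE = {1: "rural", 2: "urban"}
CURVE_DIRECTION = {1: "right", 2: "left", 3: "straight"}

# Hypothetical field names; the source does not give column identifiers.
MAPPINGS = {
    "track_code": TRACK_CODE,
    "road_category": ROAD_CATEGORY,
    "section_type": SECTION_TYPE,
    "curve_direction": CURVE_DIRECTION,
}


def capacity_utilization(aadt: float, capacity: float) -> float:
    """Ratio of AADT to road capacity, expressed in percent."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * aadt / capacity


def decode_segment(record: dict) -> dict:
    """Replace integer codes with labels; other fields pass through unchanged."""
    decoded = dict(record)
    for field, mapping in MAPPINGS.items():
        if field in decoded:
            decoded[field] = mapping[decoded[field]]
    return decoded


# Illustrative segment with made-up values.
segment = {
    "track_code": 0,
    "road_category": 4,
    "section_type": 1,
    "curve_direction": 3,
    "lanes": 2,
}
print(decode_segment(segment))
print(capacity_utilization(aadt=12000, capacity=20000))  # 60.0
```

Whether the models consumed these variables as raw integer codes or in some other encoding is not stated in this section, so the sketch only covers decoding for inspection.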

| Model | Hyperparameter | Type | Search Range/Values |
|---|---|---|---|
| Random Forest | n_estimators | Discrete | {100, 200, 300} |
| | max_depth | Discrete | {10, 15, 20, 25} |
| | min_samples_split | Discrete | {2, 5, 10} |
| | min_samples_leaf | Discrete | {2, 3, 4} |
| | max_features | Categorical | {'sqrt', 'log2'} |
| K-Nearest Neighbors | n_neighbors | Discrete | {3, 5, 7} |
| Gradient Boosting | n_estimators | Discrete | {100, 150, 200} |
| | learning_rate | Continuous | {0.01, 0.05, 0.1} |
| | max_depth | Discrete | {2, 3, 5} |
| | min_samples_split | Discrete | {2, 5, 10} |
| | subsample | Continuous | {0.7, 0.9, 1.0} |
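
To give a sense of the size of the search space in the hyperparameter table, the sketch below enumerates every combination per model. It assumes an exhaustive grid over the listed values, which this section does not confirm; the names `SEARCH_SPACE`, `iter_candidates`, and `grid_size` are illustrative.

```python
from itertools import product

# Values copied from the hyperparameter table.
SEARCH_SPACE = {
    "Random Forest": {
        "n_estimators": [100, 200, 300],
        "max_depth": [10, 15, 20, 25],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [2, 3, 4],
        "max_features": ["sqrt", "log2"],
    },
    "K-Nearest Neighbors": {
        "n_neighbors": [3, 5, 7],
    },
    "Gradient Boosting": {
        "n_estimators": [100, 150, 200],
        "learning_rate": [0.01, 0.05, 0.1],
        "max_depth": [2, 3, 5],
        "min_samples_split": [2, 5, 10],
        "subsample": [0.7, 0.9, 1.0],
    },
}


def iter_candidates(grid: dict):
    """Yield one {hyperparameter: value} dict per point of the full grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_size(grid: dict) -> int:
    """Number of candidate settings in an exhaustive search over the grid."""
    return sum(1 for _ in iter_candidates(grid))


for model, grid in SEARCH_SPACE.items():
    print(f"{model}: {grid_size(grid)} candidate settings")
# Random Forest: 216 candidate settings
# K-Nearest Neighbors: 3 candidate settings
# Gradient Boosting: 243 candidate settings
```

Dictionaries of this shape can be passed as `param_grid` to scikit-learn's `GridSearchCV`. Under k-fold cross-validation the number of model fits is the candidate count multiplied by the number of folds.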

| Metric | RF (Train) | KNN (Train) | GB (Train) | RF (Test) | KNN (Test) | GB (Test) |
|---|---|---|---|---|---|---|
| Accuracy | 0.9918 | 0.8499 | 0.9895 | 0.9373 | 0.7224 | 0.9463 |
| Precision | 0.9920 | 0.8504 | 0.9897 | 0.9381 | 0.7296 | 0.9471 |
| F1-Score | 0.9918 | 0.8487 | 0.9895 | 0.9371 | 0.7168 | 0.9460 |
| MCC | 0.9878 | 0.7762 | 0.9844 | 0.9065 | 0.5899 | 0.9200 |
| G-Mean | 0.9938 | 0.8866 | 0.9922 | 0.9528 | 0.7886 | 0.9596 |
| MSE | 0.0082 | 0.2801 | 0.0149 | 0.1075 | 0.4836 | 0.0896 |
| RMSE | 0.0906 | 0.5292 | 0.1222 | 0.3278 | 0.6954 | 0.2993 |
| R² | 0.9877 | 0.5801 | 0.9776 | 0.8386 | 0.2735 | 0.8655 |

| Metric | RF (Train) | KNN (Train) | GB (Train) | RF (Test) | KNN (Test) | GB (Test) |
|---|---|---|---|---|---|---|
| Accuracy | 0.9503 | 0.9093 | 0.9109 | 0.9377 | 0.8815 | 0.8976 |
| Precision | 0.9559 | 0.9170 | 0.9155 | 0.9428 | 0.8858 | 0.9022 |
| F1-Score | 0.9512 | 0.9114 | 0.9104 | 0.9385 | 0.8830 | 0.8967 |
| MCC | 0.9438 | 0.8969 | 0.8989 | 0.9294 | 0.8648 | 0.8839 |
| G-Mean | 0.9714 | 0.9474 | 0.9483 | 0.9641 | 0.9309 | 0.9404 |
| MSE | 0.1057 | 0.4293 | 0.2364 | 0.1859 | 0.5784 | 0.2890 |
| RMSE | 0.3251 | 0.6552 | 0.4862 | 0.4311 | 0.7605 | 0.5376 |
| R² | 0.9899 | 0.9591 | 0.9775 | 0.9823 | 0.9449 | 0.9725 |

| Metric | RF (Train) | KNN (Train) | GB (Train) | RF (Test) | KNN (Test) | GB (Test) |
|---|---|---|---|---|---|---|
| Accuracy | 0.9706 | 0.9074 | 0.7425 | 0.9064 | 0.8391 | 0.7155 |
| Precision | 0.9761 | 0.9117 | 0.7362 | 0.9108 | 0.8374 | 0.7041 |
| F1-Score | 0.9718 | 0.9072 | 0.7294 | 0.9068 | 0.8352 | 0.7003 |
| MCC | 0.9683 | 0.8994 | 0.7210 | 0.8983 | 0.8252 | 0.6917 |
| G-Mean | 0.9839 | 0.9485 | 0.8515 | 0.9480 | 0.9093 | 0.8349 |
| MSE | 0.2103 | 1.0197 | 3.1294 | 0.6024 | 1.8831 | 3.3148 |
| RMSE | 0.4586 | 1.0098 | 1.7690 | 0.7761 | 1.3723 | 1.8207 |
| R² | 0.9824 | 0.9144 | 0.7374 | 0.9494 | 0.8419 | 0.7218 |
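
The performance tables report error-based measures (MSE, RMSE, R²) alongside classification measures, which suggests that the predicted count levels were also scored as ordered integers. The sketch below shows how accuracy, MSE, RMSE, and R² could be computed from integer labels under that assumption. The function name `ordinal_metrics` and the toy labels are hypothetical, and Precision, F1-Score, MCC, and G-Mean are omitted for brevity.

```python
from math import sqrt


def ordinal_metrics(y_true, y_pred):
    """Accuracy plus MSE, RMSE and R^2 computed on integer class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Share of segments whose predicted level equals the observed level.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Residual and total sums of squares on the integer-coded levels.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    # R^2 is undefined when all observed labels are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"accuracy": accuracy, "mse": mse, "rmse": sqrt(mse), "r2": r2}


# Hypothetical labels for eight segments (0, 1, 2 = ordered count levels).
y_true = [0, 1, 2, 1, 0, 2, 1, 0]
y_pred = [0, 1, 1, 1, 0, 2, 2, 0]
metrics = ordinal_metrics(y_true, y_pred)
print(metrics)
```

The identity RMSE = √MSE can be checked directly against the reported values: in the first of the three performance tables, the GB test MSE of 0.0896 gives √0.0896 ≈ 0.2993, which matches the reported RMSE.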
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hamdan, N.; Sipos, T. Predicting Segment-Level Road Traffic Injury Counts Using Machine Learning Models: A Data-Driven Analysis of Geometric Design and Traffic Flow Factors. Future Transp. 2025, 5, 197. https://doi.org/10.3390/futuretransp5040197

