This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Predicting Segment-Level Road Traffic Injury Counts Using Machine Learning Models: A Data-Driven Analysis of Geometric Design and Traffic Flow Factors

1 Department of Transport Technology and Economics, Faculty of Transportation Engineering and Vehicle Engineering, Budapest University of Technology and Economics, Megyetem rkp. 3., H-1111 Budapest, Hungary
2 KTI Hungarian Institute for Transport Sciences Nonprofit Ltd., Than Károly Street 3-5., H-1119 Budapest, Hungary
* Author to whom correspondence should be addressed.
Future Transp. 2025, 5(4), 197; https://doi.org/10.3390/futuretransp5040197
Submission received: 27 August 2025 / Revised: 5 December 2025 / Accepted: 11 December 2025 / Published: 12 December 2025

Abstract

Accurate prediction of road traffic crash severity is essential for developing data-driven safety strategies and optimizing resource allocation. This study presents a predictive modeling framework that uses Random Forest (RF), Gradient Boosting (GB), and K-Nearest Neighbors (KNN) to estimate segment-level frequencies of fatalities, serious injuries, and slight injuries on Hungarian roadways. The models integrate an extensive set of predictor variables, including roadway geometric design features, traffic volumes, and traffic composition metrics. To address class imbalance, each severity class was modeled on a resampled dataset generated with the Synthetic Minority Over-sampling Technique (SMOTE), and hyperparameters were tuned through grid-search cross-validation. For the prediction of serious- and slight-injury crash counts, the RF model performed most robustly, consistently attaining test accuracies above 0.91 and coefficient of determination (R²) values exceeding 0.95. For fatality counts, in contrast, the GB model achieved the highest accuracy (0.95), with an R² value greater than 0.87. Feature importance analysis showed that heavy vehicle flows were consistently the dominant predictors across severity levels. Horizontal alignment features primarily influenced fatal crashes, while capacity utilization was more relevant for slight and serious injuries, reflecting the roles of geometric design and operational conditions in shaping crash occurrence and severity. The proposed framework demonstrates that machine learning approaches can capture non-linear relationships in transportation safety data, and it offers a scalable, interpretable tool to support evidence-based decisions on targeted safety interventions.
Keywords: traffic crash severity; random forest; gradient boosting; k-nearest neighbors; ensemble learning; geometric design; imbalanced data
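
The abstract names SMOTE as the resampling step used to address class imbalance. As a rough illustration of the idea only, and not the authors' implementation, the sketch below creates synthetic minority-class rows by interpolating between a randomly chosen minority sample and one of its nearest minority neighbors. The function name `smote_sketch`, the toy feature values, and the parameter defaults are all assumptions made for this example; a real study would more likely rely on an established implementation such as the `SMOTE` class in the imbalanced-learn package.

```python
import random
from math import dist


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between neighbors.

    minority    : list of feature vectors (lists of floats) from the rare class
    n_synthetic : number of synthetic samples to create
    k           : number of nearest minority neighbors to choose from
    seed        : seed for the random generator, so runs are repeatable
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy data (not from the study): two made-up features per
# rare-class road segment, e.g. heavy-vehicle share and a scaled curve radius.
minority_segments = [[0.30, 0.20], [0.35, 0.25], [0.28, 0.18], [0.40, 0.30]]
new_rows = smote_sketch(minority_segments, n_synthetic=6, k=2)
```

Because every synthetic row lies on a line segment between two existing minority rows, the new points stay inside the range of the observed minority data rather than duplicating rows exactly. Resampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.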

Share and Cite

MDPI and ACS Style

Hamdan, N.; Sipos, T. Predicting Segment-Level Road Traffic Injury Counts Using Machine Learning Models: A Data-Driven Analysis of Geometric Design and Traffic Flow Factors. Future Transp. 2025, 5, 197. https://doi.org/10.3390/futuretransp5040197

Chicago/Turabian Style

Hamdan, Noura, and Tibor Sipos. 2025. "Predicting Segment-Level Road Traffic Injury Counts Using Machine Learning Models: A Data-Driven Analysis of Geometric Design and Traffic Flow Factors" Future Transportation 5, no. 4: 197. https://doi.org/10.3390/futuretransp5040197
