From Prediction to Explanation: Explainable Machine Learning for Motor Vehicle–Involved Pedestrian and Cyclist Crash Risk
Abstract
1. Introduction
1.1. Traditional Statistical Approaches to Predict Vulnerable User-Involved Crashes
1.2. Machine Learning Approaches in Crash Prediction
1.3. Feature Importance and Machine Learning Model Explainability
| ML Model | Author | Variables Used | Model Output |
|---|---|---|---|
| DT/RF | Gu et al. [48] | Roadway characteristics, traffic exposure, vehicle dynamics (speed and acceleration), and intersection control features | Crash occurrence |
| DT/RF | Wu et al. [49] | Roadway characteristics, traffic exposure, and socio-economic attributes | Crash occurrence |
| DT/RF | Yan and Shen [50] | Roadway characteristics, temporal and weather attributes, and intersection control type | Crash severity |
| DT/RF | Sum et al. [51] | Roadway characteristics, traffic exposure, environmental conditions, and vehicle involvement factors | Crash severity |
| SVM | Yu and Abdel-Aty [29] | Traffic states (speed, occupancy, flow variation) and roadway characteristics | Crash occurrence |
| SVM | You et al. [52] | Traffic states, roadway characteristics, and weather conditions | Crash occurrence |
| SVM | Basso et al. [53] | Traffic composition, vehicle mix, and roadway characteristics | Crash occurrence |
| KNN | Yang et al. [54] | Socio-economic attributes, traffic exposure, crash time, and roadway characteristics | Crash severity |
| KNN | Madushani et al. [55] | Roadway characteristics, pavement condition, lighting environment, and weather conditions | Crash severity |
| KNN | Santos et al. [56] | Roadway characteristics, intersection control, temporal attributes, and weather conditions | Crash severity |
| KNN | Haghshenas et al. [57] | Roadway characteristics, pavement condition, traffic exposure, and weather attributes | Crash severity |
| BLR | Wang et al. [58] | Roadway characteristics, construction zone attributes, traffic states, and temporal factors | Crash occurrence |
| BLR | Shiran et al. [59] | Roadway geometry, pavement condition, lighting, weather, and traffic exposure | Crash severity |
| BLR | Najafi Moghaddam Gilani et al. [60] | Roadway characteristics, traffic exposure, lighting conditions, weather, and socio-economic characteristics | Crash severity |
| BLR | Wang et al. [61] | Roadway and traffic characteristics, environmental conditions, and rider demographics | Crash severity |
2. Methods
2.1. Data Preparation
2.1.1. Data Sources
2.1.2. Study Area and Period
2.1.3. Crash Data and Intersection Classification
2.1.4. Geometric and Environmental Attributes
2.1.5. Socio-Demographic Variables
2.2. Methodological Framework
2.2.1. Binary Logistic Regression (BLR)
2.2.2. K-Nearest Neighbors (KNN)
2.2.3. Support Vector Machine (SVM)
2.2.4. Decision Tree (DT)
2.2.5. Random Forests (RF)
2.2.6. Model Evaluation Metrics
2.2.7. Feature Importance Analysis
3. Results
3.1. Models’ Performance
3.2. Feature Importance and Model Interpretation
3.2.1. Global Feature Importance
3.2.2. Local Feature Contributions: SHAP Force Plot
3.2.3. Pairwise SHAP Dependence Analysis of the Top Predictors
3.3. Socioeconomic Influence on Crash Occurrence
4. Discussion
4.1. Model Performance Comparison and Selection
4.2. Infrastructure and Traffic Exposure Effects on Crash Risk
4.3. Contextual and Socioeconomic Influences on Crash Risk
4.4. Practical Implications for Intersection Safety Management
4.5. Limitations and Scope of the Analysis
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| ML | Machine Learning |
| BLR | Binary Logistic Regression |
| KNN | K-Nearest Neighbors |
| SVM | Support Vector Machine |
| DT | Decision Tree |
| RF | Random Forest |
| PDPs | Partial Dependence Plots |
| SHAP | Shapley Additive exPlanations |
| TEV | Total Entering Vehicles |
| ITD | Idaho Transportation Department |
| GIS | Geographic Information System |
| ACS | American Community Survey |
| FHWA | Federal Highway Administration |
| MOE | Margin of Error |
| AWSC | All-Way Stop Control |
| TWSC | Two-Way Stop Control |
| ROC | Receiver Operating Characteristic |
| AUC | Area Under the Receiver Operating Characteristic Curve |
References
1. Gerike, R.; de Nazelle, A.; Wittwer, R.; Parkin, J. Special issue “walking and cycling for better transport, health and the environment”. Transp. Res. Part A Policy Pract. 2019, 123, 1–6.
2. Bleviss, D.L. Transportation is critical to reducing greenhouse gas emissions in the United States. WIREs Energy Environ. 2021, 10, e390.
3. Woodcock, J.; Edwards, P.; Tonne, C.; Armstrong, B.G.; Ashiru, O.; Banister, D.; Beevers, S.; Chalabi, Z.; Chowdhury, Z.; Cohen, A. Public health benefits of strategies to reduce greenhouse-gas emissions: Urban land transport. Lancet 2009, 374, 1930–1943.
4. World Health Organization. Road Traffic Injuries; Technical Report; World Health Organization: Geneva, Switzerland, 2023.
5. Le, H.T.; Buehler, R.; Hankey, S. Have walking and bicycling increased in the US? A 13-year longitudinal analysis of traffic counts from 13 metropolitan areas. Transp. Res. Part D Transp. Environ. 2019, 69, 329–345.
6. Jackson, S.; Raymond, P.; Taylor, S. Bicycle and Pedestrian Safety Research Project; Technical Report; Idaho Transportation Department: Boise, ID, USA, 2023.
7. Mahmoudi, J.; Xiong, C.; Yang, M.; Luo, W. Modeling the Frequency of Pedestrian and Bicyclist Crashes at Intersections: Big Data-driven Evidence From Maryland. Transp. Res. Rec. J. Transp. Res. Board 2023, 2677, 1245–1260.
8. Anastasopoulos, P.C.; Mannering, F.L. An empirical assessment of fixed and random parameter logit models using crash-and non-crash-specific injury data. Accid. Anal. Prev. 2011, 43, 1140–1147.
9. Mannering, F.L.; Shankar, V.; Bhat, C.R. Unobserved heterogeneity and the statistical analysis of highway accident data. Anal. Methods Accid. Res. 2016, 11, 1–16.
10. Yuan, J.; Abdel-Aty, M. Approach-level real-time crash risk analysis for signalized intersections. Accid. Anal. Prev. 2018, 119, 274–289.
11. Abdel-Aty, M.; Haleem, K. Analyzing angle crashes at unsignalized intersections using machine learning techniques. Accid. Anal. Prev. 2011, 43, 461–470.
12. Iranitalab, A.; Khattak, A. Comparison of four statistical and machine learning methods for crash severity prediction. Accid. Anal. Prev. 2017, 108, 27–36.
13. Osman, O.A.; Hajij, M.; Bakhit, P.R.; Ishak, S. Prediction of Near-Crashes from Observed Vehicle Kinematics using Machine Learning. Transp. Res. Rec. J. Transp. Res. Board 2019, 2673, 463–473.
14. Theofilatos, A.; Chen, C.; Antoniou, C. Comparing Machine Learning and Deep Learning Methods for Real-Time Crash Prediction. Transp. Res. Rec. J. Transp. Res. Board 2019, 2673, 169–178.
15. Pljakić, M.; Jovanović, D.; Matović, B. The influence of traffic-infrastructure factors on pedestrian accidents at the macro-level: The geographically weighted regression approach. J. Saf. Res. 2022, 83, 248–259.
16. Ma, Z.; Lu, X.; Chien, S.I.J.; Hu, D. Investigating factors influencing pedestrian injury severity at intersections. Traffic Inj. Prev. 2018, 19, 159–164.
17. Mukherjee, D.; Mitra, S. A comprehensive study on identification of risk factors for fatal pedestrian crashes at urban intersections in a developing country. Asian Transp. Stud. 2020, 6, 100003.
18. Li, L.; Yang, X.; Yin, L. Exploration of Pedestrian Refuge Effect on Safety Crossing at Signalized Intersection. Transp. Res. Rec. J. Transp. Res. Board 2010, 2193, 44–50.
19. Salmon, P.M.; Naughton, M.; Hulme, A.; McLean, S. Bicycle crash contributory factors: A systematic review. Saf. Sci. 2022, 145, 105511.
20. Prati, G.; Marín Puchades, V.; De Angelis, M.; Fraboni, F.; Pietrantoni, L. Factors contributing to bicycle–motorised vehicle collisions: A systematic literature review. Transp. Rev. 2018, 38, 184–208.
21. Boufous, S.; De Rome, L.; Senserrick, T.; Ivers, R. Risk factors for severe injury in cyclists involved in traffic crashes in Victoria, Australia. Accid. Anal. Prev. 2012, 49, 404–409.
22. Meuleners, L.B.; Fraser, M.; Johnson, M.; Stevenson, M.; Rose, G.; Oxley, J. Characteristics of the road infrastructure and injurious cyclist crashes resulting in a hospitalisation. Accid. Anal. Prev. 2020, 136, 105407.
23. Abellán, J.; López, G.; De Oña, J. Analysis of traffic accident severity using decision rules via decision trees. Expert Syst. Appl. 2013, 40, 6047–6054.
24. Das, A.; Abdel-Aty, M.; Pande, A. Using conditional inference forests to identify the factors affecting crash severity on arterial corridors. J. Saf. Res. 2009, 40, 317–327.
25. Harb, R.; Yan, X.; Radwan, E.; Su, X. Exploring precrash maneuvers using classification trees and random forests. Accid. Anal. Prev. 2009, 41, 98–107.
26. Li, Z.; Liu, P.; Wang, W.; Xu, C. Using support vector machine models for crash injury severity analysis. Accid. Anal. Prev. 2012, 45, 478–486.
27. Dong, N.; Huang, H.; Zheng, L. Support vector machine in crash prediction at the level of traffic analysis zones: Assessing the spatial proximity effects. Accid. Anal. Prev. 2015, 82, 192–198.
28. Li, X.; Lord, D.; Zhang, Y.; Xie, Y. Predicting motor vehicle crashes using support vector machine models. Accid. Anal. Prev. 2008, 40, 1611–1618.
29. Yu, R.; Abdel-Aty, M. Utilizing support vector machine in real-time crash risk evaluation. Accid. Anal. Prev. 2013, 51, 252–259.
30. Lv, Y.; Tang, S.; Zhao, H. Real-time highway traffic accident prediction based on the k-nearest neighbor method. In Proceedings of the 2009 International Conference on Measuring Technology and Mechatronics Automation; IEEE: Piscataway, NJ, USA, 2009; Volume 3, pp. 547–550.
31. Zhang, L.; Liu, Q.; Yang, W.; Wei, N.; Dong, D. An improved k-nearest neighbor model for short-term traffic flow prediction. Procedia-Soc. Behav. Sci. 2013, 96, 653–662.
32. Lu, T.; Dunyao, Z.H.U.; Lixin, Y.; Pan, Z. The traffic accident hotspot prediction: Based on the logistic regression method. In Proceedings of the 2015 International Conference on Transportation Information and Safety (ICTIS); IEEE: Piscataway, NJ, USA, 2015; pp. 107–110.
33. Rahman, R.; Bhowmik, T.; Eluru, N.; Hasan, S. Assessing the crash risks of evacuation: A matched case-control approach applied over data collected during Hurricane Irma. Accid. Anal. Prev. 2021, 159, 106260.
34. Gill, N.; Hall, P.; Montgomery, K.; Schmidt, N. A responsible machine learning workflow with focus on interpretable models, post-hoc explanation, and discrimination testing. Information 2020, 11, 137.
35. Guerra-Manzanares, A.; Nõmm, S.; Bahsi, H. Towards the integration of a post-hoc interpretation step into the machine learning workflow for IoT botnet detection. In Proceedings of the 2019 18th IEEE International Conference On Machine Learning and Applications (ICMLA); IEEE: Piscataway, NJ, USA, 2019; pp. 1162–1169.
36. Vieira, C.P.; Digiampietri, L.A. Machine Learning post-hoc interpretability: A systematic mapping study. In Proceedings of the XVIII Brazilian Symposium on Information Systems, Curitiba, Brazil, 16–19 May 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1–8.
37. Delen, D.; Tomak, L.; Topuz, K.; Eryarsoy, E. Investigating injury severity risk factors in automobile crashes with predictive analytics and sensitivity analysis methods. J. Transp. Health 2017, 4, 118–131.
38. Jiang, L.; Xie, Y.; Wen, X.; Ren, T. Modeling highly imbalanced crash severity data by ensemble methods and global sensitivity analysis. J. Transp. Saf. Secur. 2022, 14, 562–584.
39. Wen, X.; Xie, Y.; Jiang, L.; Li, Y.; Ge, T. On the interpretability of machine learning methods in crash frequency modeling and crash modification factor development. Accid. Anal. Prev. 2022, 168, 106617.
40. Toran Pour, A.; Moridpour, S.; Tay, R.; Rajabifard, A. Modelling pedestrian crash severity at mid-blocks. Transp. A Transp. Sci. 2017, 13, 273–297.
41. Danesh, T.; Ouaret, R.; Floquet, P. Interpretability in machine learning predictions: Case of Random Forest regression using Partial Dependence Plots. In Proceedings of the 18ème Congrès de la Société Française de Génie des Procédés, Toulouse, France, 7–10 November 2022.
42. Apley, D.W.; Zhu, J. Visualizing the effects of predictor variables in black box supervised learning models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2020, 82, 1059–1086.
43. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
44. Parsa, A.B.; Movahedi, A.; Taghipour, H.; Derrible, S.; Mohammadian, A.K. Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis. Accid. Anal. Prev. 2020, 136, 105405.
45. Li, X.; Shi, L.; Shi, Y.; Tang, J.; Zhao, P.; Wang, Y.; Chen, J. Exploring interactive and nonlinear effects of key factors on intercity travel mode choice using XGBoost. Appl. Geogr. 2024, 166, 103264.
46. Dong, S.; Khattak, A.; Ullah, I.; Zhou, J.; Hussain, A. Predicting and analyzing road traffic injury severity using boosting-based ensemble learning models with SHAPley Additive exPlanations. Int. J. Environ. Res. Public Health 2022, 19, 2925.
47. Hasan, A.S.; Jalayer, M.; Das, S.; Kabir, M.A.B. Application of machine learning models and SHAP to examine crashes involving young drivers in New Jersey. Int. J. Transp. Sci. Technol. 2024, 14, 156–170.
48. Gu, Y.; Liu, D.; Arvin, R.; Khattak, A.J.; Han, L.D. Predicting intersection crash frequency using connected vehicle data: A framework for geographical random forest. Accid. Anal. Prev. 2023, 179, 106880.
49. Wu, D.; Zhang, Y.; Xiang, Q. Geographically weighted random forests for macro-level crash frequency prediction. Accid. Anal. Prev. 2024, 194, 107370.
50. Yan, M.; Shen, Y. Traffic accident severity prediction based on random forest. Sustainability 2022, 14, 1729.
51. Sum, S.; Se, C.; Champahom, T.; Jomnonkwao, S.; Sinha, S.; Ratanavaraha, V. A Random Forest and SHAP-based analysis of motorcycle crash severity in Thailand: Urban-Rural and Day-Night perspectives. Transp. Eng. 2025, 21, 100369.
52. You, J.; Wang, J.; Guo, J. Real-time crash prediction on freeways using data mining and emerging techniques. J. Mod. Transp. 2017, 25, 116–123.
53. Basso, F.; Basso, L.J.; Pezoa, R. The importance of flow composition in real-time crash prediction. Accid. Anal. Prev. 2020, 137, 105436.
54. Yang, L.; Aghaabbasi, M.; Ali, M.; Jan, A.; Bouallegue, B.; Javed, M.F.; Salem, N.M. Comparative analysis of the optimized KNN, SVM, and ensemble DT models using Bayesian optimization for predicting pedestrian fatalities: An advance towards realizing the sustainable safety of pedestrians. Sustainability 2022, 14, 10467.
55. Madushani, J.S.; Sandamal, R.K.; Meddage, D.P.P.; Pasindu, H.R.; Gomes, P.A. Evaluating expressway traffic crash severity by using logistic regression and explainable & supervised machine learning classifiers. Transp. Eng. 2023, 13, 100190.
56. Santos, D.; Saias, J.; Quaresma, P.; Nogueira, V.B. Machine learning approaches to traffic accident analysis and hotspot prediction. Computers 2021, 10, 157.
57. Haghshenas, S.S.; Guido, G.; Vitale, A.; Astarita, V. Assessment of the level of road crash severity: Comparison of intelligence studies. Expert Syst. Appl. 2023, 234, 121118.
58. Wang, J.; Song, H.; Fu, T.; Behan, M.; Jie, L.; He, Y.; Shangguan, Q. Crash prediction for freeway work zones in real time: A comparison between Convolutional Neural Network and Binary Logistic Regression model. Int. J. Transp. Sci. Technol. 2022, 11, 484–495.
59. Shiran, G.; Imaninasab, R.; Khayamim, R. Crash severity analysis of highways based on multinomial logistic regression model, decision tree techniques, and artificial neural network: A modeling comparison. Sustainability 2021, 13, 5670.
60. Najafi Moghaddam Gilani, V.; Hosseinian, S.M.; Ghasedi, M.; Nikookar, M. Data-Driven Urban Traffic Accident Analysis and Prediction Using Logit and Machine Learning-Based Pattern Recognition Models. Math. Probl. Eng. 2021, 2021, 9974219.
61. Wang, Z.; Huang, S.; Wang, J.; Sulaj, D.; Hao, W.; Kuang, A. Risk factors affecting crash injury severity for different groups of e-bike riders: A classification tree-based logistic regression model. J. Saf. Res. 2021, 76, 176–183.
62. Lowry, M.B.; Ward, C.R. Development of a Methodology to Evaluate the Highway Safety Improvement Program; Technical Report; Idaho Transportation Department: Boise, ID, USA, 2023.
63. Elsayed, A.; Smith, S.; Abdel-Rahim, A.; Chang, K. Impact of the COVID-19 Pandemic on Travel Mode Choices and Fatal Crash Rates; Technical Report; Center for Safety Equity in Transportation: Fairbanks, AK, USA, 2025.
64. Awad, M.; Khanna, R. Support Vector Machines for Classification. In Efficient Learning Machines; Apress: Berkeley, CA, USA, 2015; pp. 39–66.
65. Ghosh, S.; Dasgupta, A.; Swetapadma, A. A study on support vector machine based linear and non-linear pattern classification. In Proceedings of the 2019 International Conference on Intelligent Sustainable Systems (ICISS); IEEE: Piscataway, NJ, USA, 2019; pp. 24–28.
66. Luna, J.M.; Gennatas, E.D.; Ungar, L.H.; Eaton, E.; Diffenderfer, E.S.; Jensen, S.T.; Simone, C.B.; Friedman, J.H.; Solberg, T.D.; Valdes, G. Building more accurate decision trees with the additive tree. Proc. Natl. Acad. Sci. USA 2019, 116, 19887–19893.
67. Syam, N.; Kaul, R. Random forest, bagging, and boosting of decision trees. In Machine Learning and Artificial Intelligence in Marketing and Sales: Essential Reference for Practitioners and Data Scientists; Emerald Publishing Limited: Leeds, UK, 2021; pp. 139–182.
68. Chang, I.; Park, H.; Hong, E.; Lee, J.; Kwon, N. Predicting effects of built environment on fatal pedestrian accidents at location-specific level: Application of XGBoost and SHAP. Accid. Anal. Prev. 2022, 166, 106545.
69. Yuan, C.; Li, Y.; Huang, H.; Wang, S.; Sun, Z.; Wang, H. Application of explainable machine learning for real-time safety analysis toward a connected vehicle environment. Accid. Anal. Prev. 2022, 171, 106681.
70. Liu, C.; Zhao, M.; Li, W.; Sharma, A. Multivariate random parameters zero-inflated negative binomial regression for analyzing urban midblock crashes. Anal. Methods Accid. Res. 2018, 17, 32–46.
71. Wu, P.; Chen, T.; Wong, Y.D.; Meng, X.; Wang, X.; Liu, W. Exploring key spatio-temporal features of crash risk hot spots on urban road network: A machine learning approach. Transp. Res. Part A Policy Pract. 2023, 173, 103717.
72. Koepsell, T.; McCloskey, L.; Wolf, M.; Moudon, A.V.; Buchner, D.; Kraus, J.; Patterson, M. Crosswalk markings and the risk of pedestrian–motor vehicle collisions in older pedestrians. JAMA 2002, 288, 2136–2143.
73. Deliali, A.; Fournier, N.; Christofa, E.; Knodler, M. Investigating the Safety Impact of Segment- and Intersection-Level Bicycle Treatments on Bicycle–Motorized Vehicle Crashes. Transp. Res. Rec. J. Transp. Res. Board 2023, 2677, 1315–1330.
74. Younes, H.; Noland, R.B.; Von Hagen, L.A.; Meehan, S. Pedestrian-and bicyclist-involved crashes: Associations with spatial factors, pedestrian infrastructure, and equity impacts. J. Saf. Res. 2023, 86, 137–147.
75. Roll, J.; McNeil, N. Race and income disparities in pedestrian injuries: Factors influencing pedestrian safety inequity. Transp. Res. Part D Transp. Environ. 2022, 107, 103294.








| Variable Category and Name | Description | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Target Variable: Crash Occurrence | 1 = Crash; 0 = No crash | 0.63 | 0.48 | 0 | 1 |
| Traffic Volume: Total Entering Vehicles | Estimated total vehicles entering the intersection | 42,608 | 47,358 | 400 | 352,360 |
| Roadway and Geometric Design Variables: | | | | | |
| Major Road Type | 1 = Interstate; 2 = Freeway; 3 = Principal Arterial; 4 = Minor Arterial; 5 = Major Collector; 6 = Minor Collector; 7 = Local | 5.10 | 1.71 | 1 | 7 |
| Minor Road Type | Same classification as major approach | 5.36 | 1.52 | 2 | 7 |
| Major Road Lanes | Number of lanes on major approach | 1.41 | 0.66 | 0 | 5 |
| Minor Road Lanes | Number of lanes on minor approach | 1.08 | 0.31 | 0 | 4 |
| Major Right-Turn Lane | 1 = Yes; 0 = No | 0.10 | 0.30 | 0 | 1 |
| Major Left-Turn Lane | 1 = Yes; 0 = No | 0.26 | 0.44 | 0 | 1 |
| Minor Right-Turn Lane | 1 = Yes; 0 = No | 0.14 | 0.34 | 0 | 1 |
| Minor Left-Turn Lane | 1 = Yes; 0 = No | 0.20 | 0.40 | 0 | 1 |
| Major Crosswalk | 1 = Present; 0 = Absent | 0.38 | 0.49 | 0 | 1 |
| Minor Crosswalk | 1 = Present; 0 = Absent | 0.41 | 0.49 | 0 | 1 |
| Intersection Control | 1 = Signal; 2 = AWSC; 3 = TWSC; 4 = Roundabout; 5 = Uncontrolled | 2.78 | 1.19 | 1 | 5 |
| Pavement Marking | 1 = Yes; 0 = No | 0.63 | 0.48 | 0 | 1 |
| Intersection Lighting | 1 = Yes; 0 = No | 0.89 | 0.32 | 0 | 1 |
| Pedestrian and Cyclist Activity Variables: | | | | | |
| Pedestrian and Cyclist Volume Level | 1 = Low; 2 = Medium; 3 = High | 1.97 | 0.71 | 1 | 3 |
| Socio-Demographic and Economic Variables: | | | | | |
| Population ≤ 18 years (%) | Population under 18 years old | 21.45 | 7.54 | 1.0 | 42.7 |
| Population ≥ 65 years (%) | Population aged 65 or older | 14.75 | 6.30 | 0.7 | 47.3 |
| Dependent-Age Population (%) | Population ≤ 18 or ≥ 65 years | 36.19 | 9.44 | 2.2 | 63.6 |
| Median Household Income (USD) | Household income over the past 12 months | 65,648 | 24,064 | 19,500 | 192,802 |
| Total Households | Total households per census tract | 1726 | 542 | 505 | 3847 |
| Housing and Household Characteristics: | | | | | |
| Renter-Occupied Units (%) | Housing units occupied by renters | 36.96 | 19.93 | 0 | 87.1 |
| Owner-Occupied Units (%) | Housing units occupied by owners | 55.42 | 22.00 | 2.9 | 99.1 |
| Households with No Vehicle (%) | Households reporting no vehicle ownership | 6.07 | 6.28 | 0 | 27.9 |
| Commuting and Travel Behavior Variables: * | | | | | |
| Drive Alone to work (%) | Workers commuting alone by car | 73.10 | 10.32 | 33.7 | 93.3 |
| Carpool to work (%) | Workers commuting by shared carpool | 8.50 | 4.74 | 0 | 33.5 |
| Public Transit to work (%) | Workers using public transit | 0.84 | 1.48 | 0 | 10.2 |
| Walk or Bike to work (%) | Workers walking or cycling to work | 5.75 | 6.42 | 0 | 36.3 |
| Work at Home (%) | Workers working remotely | 10.70 | 6.48 | 0 | 48.3 |
| Commute Time Variables: | | | | | |
| Commute 5–9 min (%) | Workers commuting 5–9 min | 18.19 | 9.48 | 0.6 | 53.1 |
| Commute 10–14 min (%) | Workers commuting 10–14 min | 22.16 | 9.35 | 0.2 | 63.2 |
| Commute 15–19 min (%) | Workers commuting 15–19 min | 19.52 | 7.55 | 0 | 49.8 |
| Commute 20–24 min (%) | Workers commuting 20–24 min | 12.97 | 7.61 | 0.1 | 45.7 |
| Education and School Enrollment Variables: | | | | | |
| Private School Enrollment (K–12) (%) | Students enrolled in private schools (K–12) | 11.01 | 10.39 | 0 | 86.0 |
| Private School Enrollment (5–8) (%) | Students enrolled in private schools (grades 5–8) | 9.51 | 13.80 | 0 | 100 |
| Private School Enrollment (9–12) (%) | Students enrolled in private schools (grades 9–12) | 10.18 | 14.16 | 0 | 100 |
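
As a reading aid for the descriptive statistics above, the following minimal sketch (not part of the original study) shows two quantities that follow directly from the reported mean of the binary target variable: the standard deviation implied by a 0/1 variable with mean 0.63, and the accuracy that a hypothetical classifier would reach by always predicting the more frequent class. The function names are illustrative assumptions.

```python
import math


def bernoulli_sd(p: float) -> float:
    """Population standard deviation of a 0/1 variable whose mean is p."""
    return math.sqrt(p * (1.0 - p))


def majority_baseline_accuracy(p: float) -> float:
    """Accuracy of a hypothetical rule that always predicts the more frequent class."""
    return max(p, 1.0 - p)


# Mean of the target variable (share of intersections with a crash), from the table.
crash_share = 0.63

print(round(bernoulli_sd(crash_share), 2))                # 0.48, matching the reported SD
print(round(majority_baseline_accuracy(crash_share), 2))  # 0.63
```

The second value is a useful floor when reading accuracy figures: any model evaluated on data with this class balance should be compared against roughly 63%, not 50%.
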
| Symbol | Description | Equation No. |
|---|---|---|
| $\beta_i$ | Model coefficient (weight) for feature i. | Equation (1) |
| $\beta_0$ | Intercept or bias term in the linear models. | Equation (1) |
| $p_i$ | Predicted probability of observation i in the logistic regression. | Equation (1) |
| $k$ | Number of nearest neighbors in the KNN model. | Equation (2) |
| $d$ | Euclidean distance between the two observations. | Equation (2) |
| $n$ | Number of input features in the dataset. | Equation (2) |
| $\mathbf{w}, b$ | Weight vector and bias term defining the separating hyperplane in SVM. | Equation (3) |
| $I(x \in R_l)$ | Indicator function (1 if x belongs to region $R_l$, 0 otherwise). | Equation (4) |
| $c_l$ | Constant prediction value within region $R_l$ in the Decision Tree. | Equation (4) |
| $R_l$ | Partitioned region l in the feature space of the Decision Tree. | Equation (4) |
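
The quantities described in the symbol table above can be illustrated with a short sketch. The paper's Equations (1) to (4) are not reproduced in this text, so the functions below use the standard textbook forms of each model and should be read as assumptions: in particular, the SVM is shown with a linear decision function only, and all function names, argument names, and toy inputs are hypothetical.

```python
import math
from collections import Counter


def blr_probability(x, beta, beta0):
    """Logistic regression: p = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i)))."""
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def euclidean_distance(a, b):
    """Euclidean distance between two observations with n features each."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(x, train_X, train_y, k):
    """Majority vote among the k training observations closest to x."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean_distance(x, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def svm_decision(x, w, b):
    """Linear SVM: class 1 if w . x + b >= 0, else class 0 (linear kernel assumed)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0


def tree_predict(x, regions):
    """Decision tree as a sum over regions of c_l * I(x in R_l).

    `regions` is a list of (membership_function, constant) pairs that
    partition the feature space, so exactly one indicator equals 1.
    """
    return sum(c * (1 if inside(x) else 0) for inside, c in regions)


# Toy inputs, invented for illustration only (not data from the study).
train_X = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = [0, 0, 1, 1]
regions = [(lambda x: x[0] < 1, 0), (lambda x: x[0] >= 1, 1)]

print(blr_probability([0.0], [1.0], 0.0))       # 0.5
print(knn_predict([5, 6], train_X, train_y, 3))  # 1
print(svm_decision([2, 1], [1, -1], -0.5))       # 1
print(tree_predict([2], regions))                # 1
```
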
| Metric | Description | Equation |
|---|---|---|
| Accuracy | Overall proportion of correctly classified instances among all predictions (TP, TN, FP, and FN denote true-positive, true-negative, false-positive, and false-negative counts). | $\frac{TP + TN}{TP + TN + FP + FN}$ |
| Precision | Proportion of correctly predicted positive cases out of all predicted positives. | $\frac{TP}{TP + FP}$ |
| Recall (sensitivity) | Proportion of actual positives correctly identified by the model. | $\frac{TP}{TP + FN}$ |
| F1-Score | Harmonic mean of precision and recall, balancing both metrics under class imbalance. | $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$ |
| ROC | Graphical representation of the trade-off between the true-positive rate (TPR) and false-positive rate (FPR) across classification thresholds. | $TPR = \frac{TP}{TP + FN}$, $FPR = \frac{FP}{FP + TN}$ |
| AUC | Scalar metric summarizing the ROC curve; represents the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative instance. Values range from zero to one. | |
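
The metric definitions in the table above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' evaluation code: the function names and the toy confusion-matrix counts and scores are hypothetical, the formulas are the standard ones, and the AUC function uses the pairwise ranking interpretation given in the table (ties counted as one half). It assumes all denominators are nonzero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_by_ranking(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


# Toy counts and scores, invented for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=45, fn=10)
print({name: round(value, 4) for name, value in metrics.items()})
print(round(auc_by_ranking([0.9, 0.8, 0.4], [0.5, 0.3]), 4))
```

The pairwise form of AUC is quadratic in the number of observations, so it is suited to small examples; library implementations compute the same quantity from sorted scores.
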
| Model | Accuracy % | Precision % | Recall % | F1-Score % | AUC |
|---|---|---|---|---|---|
| BLR | 90.99 | 94.69 | 90.93 | 92.77 | 0.9609 |
| KNN | 90.45 | 93.35 | 91.50 | 92.42 | 0.9107 |
| SVM | 90.45 | 94.64 | 90.09 | 92.31 | 0.9492 |
| DT | 88.47 | 89.37 | 92.92 | 91.11 | 0.8681 |
| RF | 89.37 | 92.00 | 91.22 | 91.61 | 0.9639 |
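
As a consistency check on the performance table above, the sketch below recomputes each model's F1-score from its reported precision and recall and identifies the top model per metric. The values are copied from the table; the dictionary layout and function names are illustrative assumptions, not part of the original analysis.

```python
# Reported metrics copied from the table (percentages; AUC on a 0-1 scale).
reported = {
    "BLR": {"accuracy": 90.99, "precision": 94.69, "recall": 90.93, "f1": 92.77, "auc": 0.9609},
    "KNN": {"accuracy": 90.45, "precision": 93.35, "recall": 91.50, "f1": 92.42, "auc": 0.9107},
    "SVM": {"accuracy": 90.45, "precision": 94.64, "recall": 90.09, "f1": 92.31, "auc": 0.9492},
    "DT":  {"accuracy": 88.47, "precision": 89.37, "recall": 92.92, "f1": 91.11, "auc": 0.8681},
    "RF":  {"accuracy": 89.37, "precision": 92.00, "recall": 91.22, "f1": 91.61, "auc": 0.9639},
}


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


def best_model(metric):
    """Name of the model with the highest reported value for `metric`."""
    return max(reported, key=lambda name: reported[name][metric])


for name, row in reported.items():
    recomputed = f1_from_precision_recall(row["precision"], row["recall"])
    print(name, round(recomputed, 2), row["f1"])

print(best_model("accuracy"), best_model("recall"), best_model("auc"))  # BLR DT RF
```

The recomputed F1 values agree with the reported ones to two decimals. The table also shows that no single model leads on every metric: BLR has the highest accuracy, precision, and F1-score, DT the highest recall, and RF the highest AUC.
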
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Elsayed, A.; Abdel-Rahim, A.; Prescott, L. From Prediction to Explanation: Explainable Machine Learning for Motor Vehicle–Involved Pedestrian and Cyclist Crash Risk. Infrastructures 2026, 11, 77. https://doi.org/10.3390/infrastructures11030077

