Vulnerable road users (VRUs) account for a large share of the fatalities and injuries on European Union roads. Addressing VRU safety, particularly in urban areas, therefore requires identifying the factors that affect injury severity so that they can inform countermeasures. This paper aims to identify the risk factors that affect the injury severity of a VRU involved in a motor vehicle crash. For that purpose, it presents a comparative evaluation of two machine learning classifiers (decision tree and logistic regression) combined with three resampling techniques (undersampling, oversampling and synthetic oversampling), comparing results on imbalanced and balanced datasets. The analysis uses crash records involving VRUs from three cities in Portugal over six years (2012–2017). The main conclusion is that oversampling techniques improve the ability of the classifiers to identify risk factors. For pedestrians, the analysis showed that road markings, road conditions and luminosity affect injury severity. For cyclists, age group and temporal variables (month, weekday and time period) proved relevant for predicting injury severity in a crash.
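To make the resampling step concrete, the following is a minimal sketch of random undersampling and random oversampling on a class-imbalanced label set. It is an illustration only and not the implementation used in the study: the function names, the toy records and the class labels `"slight"` and `"severe"` are all hypothetical, and synthetic oversampling is not shown.

```python
import random
from collections import Counter


def _group_by_class(records, labels):
    """Group records by their class label, preserving input order."""
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    return by_class


def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen minority records until all classes match the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(records, labels)
    target = max(len(recs) for recs in by_class.values())
    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        # Draw with replacement to fill the gap up to the majority class size.
        extra = [rng.choice(recs) for _ in range(target - len(recs))]
        out_records.extend(recs + extra)
        out_labels.extend([lab] * target)
    return out_records, out_labels


def random_undersample(records, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(records, labels)
    target = min(len(recs) for recs in by_class.values())
    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        # Draw without replacement so no record is repeated.
        out_records.extend(rng.sample(recs, target))
        out_labels.extend([lab] * target)
    return out_records, out_labels


# Hypothetical toy data: 8 records of one class, 2 of the other.
records = list(range(10))
labels = ["slight"] * 8 + ["severe"] * 2

over_records, over_labels = random_oversample(records, labels)
under_records, under_labels = random_undersample(records, labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

In a classifier comparison of this kind, resampling is normally applied to the training split only, so that the evaluation data keeps the original class distribution.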
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.