1. Introduction
Road traffic injury (RTI) imposes a substantial human and economic burden on countries worldwide. More than 1.5 million road users died in 2018 and an additional 50 million were injured or permanently disabled, with vulnerable road users (i.e., pedestrians and motorcyclists) particularly affected [1]. Addressing the global RTI epidemic is of utmost importance to reduce its associated morbidity and mortality. Understanding the key factors that increase the likelihood of sustaining severe and fatal road injury is critical to curtailing the impact of this major public health problem. Existing literature has examined major intrinsic and extrinsic factors associated with the increased risk of fatal road crashes. Rolison et al. (2018) assessed the human factors and identified drivers’ characteristics (e.g., age, gender, safety measures adopted, risk-taking behavior) that influence the severity of crash outcomes [2]. Other studies examined the role of environmental factors (e.g., road types, nighttime travel, weather conditions) in amplifying the risk of exposure to fatal road injuries. Altwaijri et al. (2011) investigated additional contributing factors, including excessive speed, single-vehicle crashes, wet surfaces, and dark road lighting conditions [3].
Ample research has explored the use of novel methodologies and statistical models to analyze and understand key elements influencing injury severity outcomes and the increased risk of fatal injuries. Statistical and regression models, in addition to feature-based artificial intelligence models, present non-traditional techniques for examining the multiple variables that contribute to the occurrence of road fatalities. These models have been adopted to investigate underlying factors that can contribute to preventing or reducing severe and fatal RTI and to guide injury prevention efforts. Keall et al. (2004) and Zajac and Ivan (2002) employed logit-based models to detect the impact of drinking and driving on the injury severity outcome [4,5]. Lack of safety measures and restraint system use (e.g., seat belt and helmet use) was also identified by Bedard et al. (2002) and Valent et al. (2002) as a major predictor of RTI severity and fatality using multivariate logistic regression models [6,7]. Yau (2004) used stepwise logistic regression to investigate human and environmental factors associated with severe injuries and demonstrated that drivers’ gender is a predictive human factor, while road lighting conditions, geographic district, time of crash, and vehicle age are environmental predictors of the injury severity outcome [8]. Al-Ghamdi et al. (2002) adopted a logistic regression model to identify the increased risk of fatal injuries associated with road types, such as non-intersection roads compared to intersection roads [9]. The study further explored the use of regression models to examine all variables that significantly contribute to road crash fatalities, including speed, running a red light, following too closely, wrong-way driving, and failure to yield. Tay (2011) presented a multinomial logit model approach to assess pedestrian–vehicle crash severity [10]. Abdel-Aty (2004) compared various models to predict crash injury severity levels and concluded that driver gender, excess speed, use of a seatbelt, vehicle type, and rural or urban crash location are major factors affecting the crash injury severity level [11]. Abdel-Aty (2003) and Xie et al. (2009) generated ordered probit models and Bayesian ordered probit (BOP) models to analyze the driver’s injury severity in road crashes [12,13]. El Tayeb et al. (2015) applied association rule mining algorithms to examine the effect of multiple crash factors on the severity and fatality of road crashes in Dubai, and concluded that male drivers, private vehicles, and the month of December are key factors that strongly predicted crash outcomes [14]. Bigham (2014) adopted data mining algorithms, such as logistic regression and classification and regression tree (CART), to underline the influence of human factors on escalating the severity of road crashes [15]. Results revealed the effect of driver’s license possession, seat belt use, gender, and age on the severity of road injuries. Studies carried out in Turkey by Aci (2018) and Akgungor et al. (2009) developed severity models using machine learning methods and artificial neural network models to estimate the number of road injuries and to assess their severe outcomes; they demonstrated that ‘cloudiness’ and ‘high volume of traffic’ were major predictors of the increased occurrence and severity of road crashes in cities [16,17].
Nevertheless, most regression models have inherent limitations, mainly in the assumed form (linear or nonlinear) of the relationship between the explanatory variables and the variable under investigation. Violating any of these assumptions may bias the analysis and introduce errors. The overarching objective of this study was to propose a model that helps us gain a deep understanding of the key variables that significantly contribute to fatal road injuries. The aim of this study was two-fold: first, to develop a machine learning-based intelligent model that can accurately classify and rank variables influencing the occurrence of fatal crashes; and second, to adopt the proposed machine learning model to investigate the relationship of fatal road crashes with a set of input feature variables. Using the proposed model, the input feature variables were ranked by the strength of their association with the occurrence of fatal road injuries. Knowledge gained from the model outcome can be used to advance our understanding and awareness of major contributing factors associated with fatal crashes and serves as a first step toward addressing these critical elements and improving road safety.
2. Data Description
The primary source of data used in this study was the national road crash database obtained from the Lebanese Road Accidents Platform (LRAP). The LRAP database compiles national traffic data by crowdsourcing road crashes reported on social media in Lebanon, consolidated mainly from three credible sources (Traffic Management Authority, Civil Defense, and Lebanese Red Cross), with the intention of studying crash characteristics and contributing factors [18]. The database encompasses 8482 crash records spanning a 4-year period from February 2015 to February 2019. As the objective of our study was to identify the variables that significantly contribute to fatal road injuries, the data were refined and parsed for the needed information. Nine variables were selected as input variables in this study and one variable, ‘fatality occurrence’, was selected as the output variable (fatal, non-fatal). We obtained the data in digital form and coded the attributes describing the details and outcome of each road crash, including the crash date and time (i.e., month, weekday, hour), location, type (i.e., vehicle, motorcycle, truck, bike, pedestrian), road type (i.e., motorway, primary, secondary, tertiary), injury severity level (no apparent injury, minor injury, serious injury), and the number of fatalities. To account for the spatial feature of the crash data, K-means clustering was applied to all crash events and the cluster ID was recorded as an input variable, ‘Spatial Cluster ID’, with values ranging from 1 to 10.
Table 1 lists the selected input and output features along with their corresponding ranges. Road characteristics were extracted from the Lebanese roads shapefile available from OpenStreetMap.
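The derivation of the ‘Spatial Cluster ID’ variable can be illustrated with a minimal sketch of K-means (Lloyd's algorithm). It assumes that each crash is represented by a (longitude, latitude) pair and that plain squared Euclidean distance on raw coordinates is an acceptable approximation over a small country; neither detail is stated in the text. The function name `kmeans_cluster_ids`, the seed, and the sample coordinates are hypothetical. Only the number of clusters (10) and the 1-based ID range come from the description above.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans_cluster_ids(points, k=10, iters=100, seed=0):
    """Assign each point a cluster ID in 1..k using Lloyd's algorithm.

    points: list of (lon, lat) tuples (assumed input representation).
    Returns (ids, centroids), where ids[i] is the 1-based cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct records
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return [lab + 1 for lab in labels], centroids

# Hypothetical coordinates: two well-separated groups, clustered with k=2.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
demo_ids, demo_centroids = kmeans_cluster_ids(demo_points, k=2)
```

With the full dataset, the call would use `k=10` and the resulting IDs would be stored as a categorical input feature. K-means results depend on initialisation, so a fixed seed (or several restarts) is advisable for reproducibility.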
5. Discussion
The increase in the frequency and severity of road crashes necessitates the effective analysis and abstraction of relevant risk factors associated with fatal road injuries. This study adopted a unique methodological approach to propose a model that synthesizes the strengths of multiple data mining techniques in order to perform a robust analysis of the multiple factors associated with fatal road injuries. Data mining is an analytical technique widely adopted in scientific analysis; it integrates techniques from multiple disciplines, including database technology, statistics, machine learning, and spatial data analysis, to provide a rich analysis and to allow the extraction of valuable knowledge embedded in datasets. Identifying the key factors contributing to fatal injury constitutes the basis for informing road safety policies and designing effective preventive countermeasures.
The study findings align with existing studies and reveal that ‘crash type’, mainly the ‘vehicle–pedestrian’ and the ‘truck–motorcycle’ types, was the most significant factor associated with the increased risk of road fatalities. This is expected, as pedestrians and motorcyclists are particularly vulnerable road users [32]. Studies in many low- and middle-income countries (LMICs) indicate that pedestrians disproportionately account for the majority of road fatalities [1], particularly because a large number of individuals commute by walking, the most affordable mode of transportation. With lax safety policies and the absence of road infrastructure and regulations (e.g., designated and lighted crosswalks, road safety design, and traffic management), many pedestrians are forced to share the roads with vehicles, which places them at a greater risk of being hit or killed on the road. Many LMICs reported high rates of pedestrian fatalities, reaching up to 85% of total road victims [33,34,35,36,37]. This is mainly due to pedestrians’ low tolerance to high-impact vehicle collisions, which often result in severe bodily damage and fatal injuries. In some instances, pedestrians’ unsafe behaviors tend to exacerbate the risk of fatal injuries. According to the Committee of Land Transport in the Global Safety Organization [38], unsafe behaviors, namely the lack of compliance with existing road safety regulations and laws (e.g., ignoring traffic signals, crossing midblock), increased pedestrians’ risk of injuries. Concerted efforts from multiple entities, including road engineers, law enforcement officers, and non-governmental agencies, should be mobilized to encourage safe behaviors and increase pedestrians’ safety on the roads.
The ‘truck–motorcycle’ type was the second strongest predictor of fatal road crashes within the ‘crash type’ category. Given the large size difference between the two transport modes, the risk of fatal injury substantially increases when a truck–motorcycle crash occurs. Moreover, the lack of motorcycling infrastructure on many roads poses a great risk to motorcycle riders [39], as motorcycles are often not visible to truck drivers, particularly on narrow shared roads, at intersections, or when the motorcycle is caught in the truck’s blind spot. Another major contributor to fatal injuries arises when trucks attempt to turn left while motorcycle riders keep travelling straight, leading to high-impact collisions and fatal injury [40]. The implementation of sensors or assistance systems that alert truck drivers to dangerous situations or warn them of blind spot occupancy can substantially reduce fatal collisions. Moreover, wearing protective clothing, using helmets, and abiding by road safety regulations are measures that considerably reduce fatalities among motorcyclists on the road.
Our analysis demonstrated that the injury severity level is a critical factor in determining the fatality outcome of the road injury. High injury severity levels among individuals involved in road crashes are strongly correlated with death outcomes; road victims with reportedly high injury severity scores (i.e., ISS 51–75) have low survival rates compared to victims with a low ISS of 40 or less [41]. Several studies confirmed the strong association between fatal road crashes and severe head trauma, particularly among pedestrians and motorcyclists [32,42]. Head injuries are considered severe injuries and carry a 4–5 times higher risk of road death. Reduced fatality among victims with a high ISS is mostly attributed to the enhanced medical services provided at the crash site, coupled with advancements in medical technologies and treatment at local trauma centers and medical facilities. Future work should focus on improving regional medical services and access to these services by mapping fatal road crash sites relative to the locations of local emergency centers.
Geographic location and environmental factors were also highly associated with fatal crashes [43]. This study shows that ‘spatial cluster ID’ was another major contributor to fatal injuries. Fatal crashes mainly clustered in densely populated residential locations and in areas inhabited by families of high socioeconomic status. This suggests that areas populated by families of high socioeconomic status tend to have a high number of privately owned vehicles and consequently an excess volume of traffic. With the increased number of private vehicles per household and less reliance on public transport, the vehicle miles traveled (VMT) increase and, ultimately, so does the probability of fatal road crashes.
Another predictor of road fatality was the ‘hour of road crash’ factor. This finding agrees with existing literature and confirms the high correlation between early AM crash time and fatality outcome [7,37]. Crashes happening between 1 and 6 AM result in a disproportionately high rate of death compared to crashes occurring during the day [7,37]. According to a study conducted in the United Kingdom, injury severity, defined as the number of deadly crashes per 100 crashes, is higher during nighttime than daytime [44]. The heightened number of fatal crashes in the early AM hours is associated with multiple factors, including drivers’ fatigue and sleepiness, alcohol consumption, and reduced visibility at night, particularly on non-lit roads. Sleepiness and fatigue are critical conditions that diminish drivers’ cognitive performance and risk judgment, especially following a long working day [45]. Advanced technologies and sensors should be embedded in vehicles to help detect the driver’s sleepiness and alert the vehicle’s occupants. Impaired driving regulations should be strictly enforced by the police. Placing radar detectors and performing alcohol tests on drivers at late hours would help to mitigate the occurrence of fatal injuries. Clear visibility is a major protective factor; the absence of a properly lit road hinders drivers’ visibility at night and consequently reduces their information processing capabilities and slows their reaction time [44], which translates into longer stopping distances and increases the severity of crash outcomes.
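The severity measure quoted above (deadly crashes per 100 crashes) can be computed per time window with a short sketch. The record layout `(hour, is_fatal)`, the function name `fatal_per_100`, and the sample records are hypothetical and are not drawn from the LRAP data; the 1 to 6 AM window follows the range mentioned in the text.

```python
def fatal_per_100(crashes, night_start=1, night_end=6):
    """Return fatal crashes per 100 crashes for 'night' and 'day' windows.

    crashes: iterable of (hour, is_fatal) pairs, hour in 0..23.
    'night' covers night_start <= hour < night_end; everything else is 'day'.
    A window with no crashes maps to None rather than a rate.
    """
    totals = {"night": 0, "day": 0}
    fatals = {"night": 0, "day": 0}
    for hour, is_fatal in crashes:
        window = "night" if night_start <= hour < night_end else "day"
        totals[window] += 1
        if is_fatal:
            fatals[window] += 1
    return {
        w: (100.0 * fatals[w] / totals[w]) if totals[w] else None
        for w in totals
    }

# Hypothetical records: 2 night crashes (1 fatal), 4 day crashes (1 fatal).
sample = [(2, True), (3, False), (14, False), (15, False), (16, True), (17, False)]
rates = fatal_per_100(sample)
```

Comparing the two rates, rather than raw fatality counts, controls for the fact that far fewer crashes happen at night than during the day.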
The results of the analyses further showed that crashes occurring on specific days of the week are more strongly correlated with fatal outcomes. According to our study, Fridays and Sundays are associated with an increased probability of fatal crashes. Fridays and Sundays correspond to the beginning and end of weekends in Lebanon, with increased leisure travel and heavy traffic commuting to and exiting major cities. Many studies reported high rates of fatal road injuries occurring on weekends; approximately 38% to 56% of fatal road crashes occur from Friday to Sunday [34,46,47]. Typically, fatal road injuries are more prevalent on weekends, major holidays, and Fridays leading to long weekends [48]. To curtail the number of fatal road crashes, enforcement should be rigorously implemented to enhance safety and reduce fatalities on the road.
Another main contributing factor for fatal road crashes was ‘road type’. Compared to urban and rural roads, highways were specifically correlated with fatal crashes. A number of studies demonstrated that fatal injuries mostly occur on intercity roads and highways outside of the urban setting, leading to an increased frequency and fatality of road crashes [7,42,49]. The increased crash fatality on highways is due to multiple reasons, mainly road infrastructure, speed limit variation, and road lighting conditions. The lack of safety in road design and infrastructure presents additional potential hazards for severe injuries, including crashes occurring at sharp curves, non-lit intersections, and slippery pavement surfaces [50].
This demonstrative model has several strengths and limitations. One of the major advantages offered by this model is its ability to predict the occurrence of fatal road crashes and to identify the interplay of several contributing elements that increase the likelihood of these fatal crashes. An advanced pre-processing technique was first used to account for the dataset’s skewness. The model was developed based on hybrid ensemble machine learning, which proved to be highly accurate. The model also has limitations. Some variables, such as sex and blood alcohol level, were not available in the dataset; therefore, the model predictions should be interpreted in light of the underlying assumptions and the variables available in the dataset. Future research questions remain to be answered, particularly concerning how sensitive the classification is to crash parameters.
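The pre-processing technique used to handle the dataset’s skewness is not named in this section. As a stand-in only, the sketch below shows random oversampling, one common way to balance a rare class such as ‘fatal’ before training. It assumes that "skewness" refers to class imbalance in the output variable; the function name `random_oversample` and the sample rows are hypothetical, and this is not presented as the method used in the study.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size.

    rows: list of feature records; labels: parallel list of class labels.
    Returns new (rows, labels) lists; the inputs are left unmodified.
    NOTE: stand-in technique, assumed for illustration only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):  # no-op for the majority class
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical records: four non-fatal crashes and one fatal crash.
demo_rows = [{"hour": 14}, {"hour": 9}, {"hour": 18}, {"hour": 11}, {"hour": 3}]
demo_labels = ["non-fatal", "non-fatal", "non-fatal", "non-fatal", "fatal"]
bal_rows, bal_labels = random_oversample(demo_rows, demo_labels)
```

Any resampling of this kind should be applied to the training split only, so that duplicated records do not leak into the evaluation data and inflate the reported accuracy.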