
An Injury-Severity-Prediction-Driven Accident Prevention System

Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ 08028, USA
Department of Computer Science, Rowan University, Glassboro, NJ 08028, USA
Author to whom correspondence should be addressed.
Sustainability 2022, 14(11), 6569;
Submission received: 31 March 2022 / Revised: 4 May 2022 / Accepted: 6 May 2022 / Published: 27 May 2022


Traffic accidents are inevitable events that occur unexpectedly and unintentionally. Therefore, analyzing traffic data is essential to prevent fatal accidents. Traffic data analysis provides insights into the significant factors and driver behavioral patterns that cause accidents. Combining these patterns and a prediction model into an accident prevention system can help reduce and prevent traffic accidents. This study applied various machine learning models, including neural networks, ordinal regression, decision trees, support vector machines, and logistic regression, to obtain a robust prediction model for injury severity. The trained model provides timely and accurate predictions of accident occurrence and injury severity using real-world traffic accident datasets. We proposed an informative negative data generator that uses feature weights derived from multinomial logit regression to balance the non-fatal accident data. Our aim is to resolve the bias in favor of the majority class and to improve performance. We evaluated the overall and class-level performance of the machine learning models based on accuracy and mean squared error (MSE) scores. Neural networks with three hidden layers outperformed the other models, with MSE scores of 0.254 ± 0.038 and 0.173 ± 0.016 on the two datasets. A neural network, which provides more accurate and reliable results, should be integrated into the accident prevention system.

1. Introduction

Complex traffic environments with unpredictable events threaten the safety of pedestrians, passengers, and drivers. With the increase in population and vehicles, traffic accidents have become a major concern for transportation safety. Traffic accidents increase visible and hidden costs, including physical and psychological health issues and insurance costs, and affect the economy [1]. A system that predicts potential accidents and injuries helps to improve transportation safety and reduce costs. The automotive industry focuses on developing and improving sensor-based, data-driven intelligent vehicle technologies to maintain a safe traffic environment. These technologies include functionalities such as determining following distance and perceiving on-road objects [2]. Developing advanced transportation safety systems with the timely prediction of potential traffic accidents and possible injury severity in an intelligent vehicle would help ensure road, driver, and passenger safety [2,3]. Designing safe roundabouts contributes to addressing traffic congestion and increasing pedestrian and road safety [4]. Roundabout intersections have a small number of collision points due to their geometry [5,6], which reduces the severity of injuries in accidents. In addition, these intersections maintain the flow of traffic by reducing the time loss at inlets [5] and prevent accidents caused by congestion. However, designing safe roundabout intersections alone is not enough to create sustainable transportation. Therefore, we propose an accident prevention and alerting system supported by a robust prediction model selected after exhaustive experiments. The system predicts traffic accident occurrence and injury severity based on driving status and environmental conditions to establish advanced transportation safety. However, injury severity prediction is a challenging problem because the data are imbalanced and consist mainly of accident records.
Traditional machine learning (ML) algorithms focus on maximizing the overall accuracy on the whole dataset and tend to perform poorly on imbalanced data due to a lack of information on negative or positive samples [7]. Since ML models are biased in favor of the majority class, achieving a good prediction model under imbalanced learning is crucial in advanced transportation safety systems. Traditional sampling techniques require minority-class instances to be present in order to balance the data. Since traffic data do not include negative, i.e., non-accident, data, traditional sampling techniques are impractical for this domain. To overcome these challenges, we propose a negative data generation scheme based on feature weights derived from multinomial logistic regression using positive instances, i.e., accident data.
Another challenge in injury severity problems is the ordinality of classes. Many studies have been conducted to determine accident risks using ML algorithms [8,9,10]. However, these studies assume that injury severity levels are nominal, and none of them used ordinal regression (OR) algorithms, which demonstrate better results than conventional ML algorithms in classification problems where the class order is essential [11]. Since the injury severity level of an accident is usually ordinal, i.e., it ranges from non-fatal injury to fatal injury, we use ordinal regression algorithms in the search for a reliable prediction model on which a robust accident prevention system for intelligent vehicles can be built. A prevention and alerting system will detect accident-prone situations and dangerous human driving behaviors to decrease the likelihood of traffic accidents and potential injury severity. In particular, the warnings issued by this system allow intelligent vehicles and drivers to take timely precautions by applying safety maneuvers such as decreasing vehicle speed, automatic braking, automatic lane keeping/control, and precise maneuvering in complex traffic environments [2,3,12].
Deep learning (DL) models have achieved impressive performance in various domains such as autonomous vehicle systems with advances in computing power and technologies [13,14]. Since neural networks (NN) have become a powerful technique in finding complex patterns in high dimensional datasets and providing high prediction accuracy, they can provide robust and reliable predictions in ordinal datasets, as well [15,16]. In this study, we compared the performance of NNs, OR models, decision tree (DT), support vector machines (SVM), and logistic regression (LR) to have a robust prediction model in the accident prevention system. We also examined the effect of different hyperparameters and architectures on injury prediction performance.
The contributions of this paper are as follows:
  • A new framework to generate non-accident data from accident instances using the factors that contribute most to traffic accidents. This ensures a more balanced dataset and improves the predictive model in accident prevention systems for intelligent vehicles.
  • A robust and more accurate NN prediction model for estimating injury severity compared to ordinal regression and other methods. With the NN, we overcome the disadvantages of ordinal regression models (i.e., low robustness and the inability to deal with multicollinearity).
The rest of the paper is organized as follows: Section 2 covers the literature review of accident risks and injury severity prediction models with commonly used methods. Section 3 presents the methodology including an overview of the proposed accident prevention system, used methods, and data generation process. Section 4 describes the experimental detail and discusses the experimental results and comparison. Section 5 concludes the paper with future work.

2. Literature Review

Accident data commonly include weather conditions, road conditions, temporal factors, and driving behaviors. ML models have been extensively applied [8,9,10,12,17,18,19,20,21,22] to assess injury severity and determine critical factors in motor vehicle accidents because of their ability to capture non-linear relationships such as those seen in traffic data. Yuan et al. [9] applied various machine learning algorithms, including support vector machine, decision tree, neural network, and random forest (RF), to classify accident and non-accident classes. Additionally, they used informative negative sampling to balance the binary classification problem. Zhu et al. [20] proposed a machine learning-based framework to detect driver injury patterns using NN and RF. Pradhan et al. [21] modeled traffic accident severity using NN and SVM methods based on actually reported causes with seven explanatory features. Their results indicated that linear SVM has the highest accuracy. In Delen’s research [22], the contributing factors of injury severity are examined using NN, SVM, DT, and LR by modeling the problem as a binary classification. Liao et al. [23] studied injury severity prediction in autonomous vehicles for emergency decision making. They used SVM with three types of kernels and compared their results with ordered-logit and NN algorithms. Table 1 lists all the classes used for accident classification and the proposed ML algorithms. The most common ML algorithms (DT, SVM, k-Nearest Neighbor, NN, RF, etc.) are frequently applied in assessing injury severity in accidents [9,10,17,18,19,20,21,22]. Studies that propose new and/or modified approaches for assessing injury severity mainly compare them with NNs because of their learning power [24]. Among these studies, only Yuan et al. [9] focus on resolving the imbalanced data problem by using informative data sampling to create non-accident data. The other studies use only accident data to predict injury severity levels using various ML models.
However, none of these studies investigate the effect of different NN architectures on the prediction performance for accident and injury severity.
Ordinal regression models are used when the classes represent levels of an inherent order [3,11,25,26,27,28]. Some examples of the applications include evaluating disease severity in plants [26], healthcare applications [11], and assessing credit-rating agencies [25]. This study provides a comprehensive comparison with commonly used ML algorithms in accident risk assessment and ordinal regression models [3].
We conducted exhaustive experiments to determine the best NN architecture and hyperparameter configuration for injury severity prediction. We also compared our classification results with four different ordinal regression models [3], and three commonly used methods, namely DT, Linear SVM, and logistic regression, for two different real-world fatal accident datasets. Since most accident datasets only have positive instances, we also proposed a new negative data generation process to overcome the challenges in imbalanced learning and traditional sampling techniques.

3. Methodology

3.1. Overview of Accident Prevention and Alert System

This section provides an overview of the prevention and alerting system framework and details of the prediction model [3]. The details of the prevention system and prediction model framework are shown in Figure 1. The prevention system takes inputs including driver information, GPS data, weather and road situation, and historical accident records. The inputs are used to create predictions and accident risks, then it receives the risks and provides alert messages to warn drivers to take precautions such as reducing speed, keeping a safe following distance, etc. Integrating such a reliable prediction model into intelligent vehicles helps decrease the likelihood of accidents and injury severities, and improve the safety of vulnerable road users, drivers, and passengers. Towards this end, the prediction model is critical to the overall system. The study mainly focuses on developing a robust prediction model to determine and integrate the best classification method into the system.

3.2. Ordinal Regression Models

Ordinal regression models, developed by McCullagh, exploit the ordinal nature of data by defining various stochastic ordering paradigms [29]. These methods eliminate the need to assign scores to classes or to otherwise assume cardinality instead of ordinality [29]. Ordinal regression is a supervised learning problem where the class labels have an inherent order [27]. Ordinal regression algorithms benefit from this order information to improve classification performance [3,24]. Ordinal regression is applied in areas where human-sourced data are important and output variables cannot be measured with high sensitivity [24,25]. The accident datasets used in this paper present such characteristics. In this paper, ordinal regression methods are divided into two main groups: threshold-based and regression-based methods [3]. Threshold-based methods have two different approaches based on how the threshold is applied: logistic all-threshold (AT) and logistic immediate-threshold (IT) [3]. The regularization parameter is set to 50 for the threshold-based methods. Regression-based methods include ordinal ridge and least absolute deviation (LAD). For the LAD method, the ε parameter is set to 0.001, the tolerance to 0.0001, and the regularization parameter to 10 in this study. For the ordinal ridge method, the regularization parameter and the tolerance are set to 10 and 0.0001, respectively. More information can be found in Alicioglu et al. [3]. Equation (1) shows the general ordinal regression model [29], where γ_j(x) = p_1(x) + … + p_j(x) is the cumulative probability of the first j classes, β is a vector of regression coefficients, and θ_j = log κ_j, where κ_j(x) = γ_j(x)/(1 − γ_j(x)) is the odds.
$$\mathrm{OR} = \log\left[\frac{\gamma_j(x)}{1-\gamma_j(x)}\right] = \theta_j - \beta^{T}x, \qquad 1 \le j < k \qquad (1)$$
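To make the model concrete, the following minimal sketch turns thresholds and coefficients into class probabilities. It assumes the standard cumulative-logit (proportional odds) form of Equation (1), log[γ_j(x)/(1 − γ_j(x))] = θ_j − β^T x; the function names and the numeric values are hypothetical and are not taken from the paper's experiments.

```python
import math


def sigmoid(z):
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def class_probabilities(x, beta, thetas):
    """Return [P(y = 1 | x), ..., P(y = k | x)] for k ordered classes.

    thetas holds the k - 1 increasing thresholds theta_1 < ... < theta_{k-1}.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))          # beta^T x
    gammas = [sigmoid(theta - eta) for theta in thetas]  # cumulative gamma_j(x)
    gammas.append(1.0)                                   # gamma_k(x) = 1
    probs, previous = [], 0.0
    for gamma in gammas:
        probs.append(gamma - previous)                   # p_j = gamma_j - gamma_{j-1}
        previous = gamma
    return probs


# Hypothetical inputs, for illustration only.
probs = class_probabilities(x=[0.5, -1.0], beta=[0.8, 0.3], thetas=[-1.0, 0.0, 1.5])
```

Because the thresholds are increasing, the cumulative probabilities are increasing as well, so every class probability is positive and the probabilities sum to one.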

3.3. Neural Network

Artificial neural networks are supervised machine learning models inspired by the learning mechanism of the human brain [30]. A NN contains more than one computational layer: input layer, hidden layers, and output layer [31]. These layers transmit the information to the consecutive layers. Neurons in these layers are associated with adaptable weights and bias. Input layers forward data with randomly initialized weights to the hidden layers which perform nonlinear transformation with activation functions [32]. The hidden layer uses the output of previous layers as input and transmits its output to the next layers [15]. The last hidden layer passes the information to the output layer to create network outputs [31]. The selection of the hyperparameters of neural networks such as the number of layers, neurons, activation functions, and training algorithm depends on the structure and complexity of the tasks. These hyperparameters affect the learning performance of neural networks [33]. Considering a single-hidden layer network, the output function is obtained as follows (2):
$$y = \sigma(w^{T}x + b)$$
Here, σ is the activation function, x is an n-dimensional input vector, w is the weight vector, b is the bias, and y is the output of the network. The rectified linear unit (ReLU), logistic sigmoid, and hyperbolic tangent (Tanh) functions are commonly used activation functions. Forwarding all the information from the input layer to the output layer through the activation functions is called a forward pass. The optimization algorithm then measures the error by comparing the prediction with the ground truth value and traces back through each layer to update the weights and biases. This process is called backpropagation, in which the algorithm aims to minimize the loss by updating the computational units. In our experiments, we adopted Stochastic Gradient Descent (SGD), which trains a neural network by selecting random examples to estimate the gradients instead of calculating the gradients over every example [34]. In addition, the Adam algorithm [35] was used to train the neural network.
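The forward pass and a single SGD update can be sketched for one neuron as follows. This is an illustration only: the squared-error loss, the learning rate, and all numeric values are assumptions made for the example and do not reproduce the networks trained in this study.

```python
import math


def relu(z):
    return max(0.0, z)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(w, x, b, activation):
    """Forward pass of one neuron: y = activation(w^T x + b)."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)


def squared_error(y, target):
    return 0.5 * (y - target) ** 2


def sgd_step(w, x, b, target, lr):
    """One SGD update for a tanh neuron under the squared-error loss."""
    y = neuron_output(w, x, b, math.tanh)
    grad_z = (y - target) * (1.0 - y * y)       # dL/dz, using tanh'(z) = 1 - tanh(z)^2
    new_w = [wi - lr * grad_z * xi for wi, xi in zip(w, x)]
    new_b = b - lr * grad_z
    return new_w, new_b


# Hypothetical weights, input, and target.
w, x, b, target = [0.2, -0.4], [1.0, 2.0], 0.1, 1.0
loss_before = squared_error(neuron_output(w, x, b, math.tanh), target)
w, b = sgd_step(w, x, b, target, lr=0.1)
loss_after = squared_error(neuron_output(w, x, b, math.tanh), target)
```

With these values, the update moves the output toward the target, so the loss after the step is lower than before it.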

3.4. Negative Data Generator

Random sampling, oversampling, and under-sampling techniques are commonly used to mitigate the imbalanced class problem. However, these techniques require both majority and minority classes in the datasets. Since fatal accident datasets lack a negative class, applying traditional random sampling techniques in these fatal accident datasets is impracticable. Therefore, we proposed a naïve data generator. We created negative instances (non-accident data) for datasets to be used in training based on the information on positive instances. The weights used in the negative data generation are obtained by Multinomial Logistic Regression (MLR) [36].
The weight of a feature reflects its degree of importance. We generated negative samples by creating values that were mostly outside the value ranges of the important features. For instance, the intersection feature, which is an important feature, mostly takes values from 1 to 3 (i.e., no intersection and four-way intersection) for all classes. Therefore, the negative data for this feature should mostly fall outside this range. For the less important features, as determined by the weight values, the existing ranges are used to create negative samples. Thus, the negative samples are created so that most, but not all, of their values lie outside the range of the positive instances, as determined by the feature distribution of the positive instances [3,9]. Figure 2 describes the steps to create non-accident data using the most important features. First, we obtain the feature weights of the positive class (accident data) using MLR [36]. These weights are then ranked in descending order to determine the top 10 important features. We determine the distribution of these 10 features and then create negative samples mostly outside of their current distribution. For the less important features, based on the output of MLR, we randomly create the feature distribution of the negative class using the value ranges of the positive data. Finally, we assign a new label to the negative class and combine it with the current dataset. The description of the accident data and the feature distributions of the most important features are presented in Appendix A and Appendix B.
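The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the feature weights are taken as given (in the paper they come from MLR), features are integer-coded, and the parameters `outside_fraction` and `margin`, which control how often and how far values fall outside the positive range, are hypothetical.

```python
import random


def generate_negative_samples(positives, weights, n_samples, top_k=10,
                              outside_fraction=0.9, margin=3,
                              negative_label=5, seed=0):
    """Create (sample, label) pairs for a synthetic non-accident class.

    positives: list of dicts mapping feature name -> integer code.
    weights:   dict mapping feature name -> importance weight.
    """
    rng = random.Random(seed)
    # Rank features by absolute weight and keep the top_k most important ones.
    ranked = sorted(weights, key=lambda f: abs(weights[f]), reverse=True)
    important = set(ranked[:top_k])
    # Observed value range of every feature in the positive (accident) data.
    ranges = {f: (min(p[f] for p in positives), max(p[f] for p in positives))
              for f in weights}
    negatives = []
    for _ in range(n_samples):
        sample = {}
        for feature, (low, high) in ranges.items():
            if feature in important and rng.random() < outside_fraction:
                # Mostly outside the positive range for important features.
                sample[feature] = rng.randint(high + 1, high + margin)
            else:
                # Inside the positive range otherwise.
                sample[feature] = rng.randint(low, high)
        negatives.append((sample, negative_label))
    return negatives


# Hypothetical toy data: surface condition ranges over 1..2 in the positives.
positives = [{"surface_condition": 1, "age": 20}, {"surface_condition": 2, "age": 60}]
weights = {"surface_condition": 0.9, "age": 0.1}
negatives = generate_negative_samples(positives, weights, n_samples=50,
                                      top_k=1, outside_fraction=1.0)
```

In this toy call, every generated surface condition value lies in 3 to 5, while age stays within the positive range of 20 to 60.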

4. Experimental Results

4.1. Data Description

Experiments are performed using two different real-world accident datasets. Motor vehicle accident data used for accident risk analysis are retrieved from the US National Highway Traffic Safety Administration website, particularly the Fatality Analysis Reporting System [37] and UK Transport for Greater Manchester website [38]. The US dataset contains accident records from 2015 to 2016 for the states of California, Florida, Georgia, North Carolina, and Texas, where the highest number of accident records were found in the US. The UK dataset contains accident records for 2018. Both datasets went through preprocessing procedures by removing instances that have missing, incorrect, or undefined values in the explanatory variables. To avoid bias in the training process, post-crash-related features such as the number of fatalities are removed from datasets. We also applied a standardization process for both datasets to rescale the features due to the differences among their value ranges. Negative/non-accident samples are created by the proposed data generator for the US dataset. The newly generated class has 8104 instances labeled as “5”. The US data have 30,484 entries, six classes, and 17 features related to driving conditions [3]. The UK data have 14,593 entries and 10 features related to driving conditions [3]. Table 2 summarizes the information about injury severity levels of accidents. The classes range from no apparent/slight injury level to fatal injury level. All experiments are conducted using Python libraries.
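The preprocessing described above can be sketched as two small steps. The paper does not state which rescaling was used, so z-score standardization is assumed here, and missing values are represented as `None` purely for illustration.

```python
import statistics


def drop_incomplete(rows):
    """Remove instances that contain a missing value."""
    return [row for row in rows if all(value is not None for value in row)]


def standardize(rows):
    """Rescale every column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical rows: [age, travel speed].
rows = [[25.0, 40.0], [38.0, None], [52.0, 65.0], [61.0, 30.0]]
clean = drop_incomplete(rows)
scaled = standardize(clean)
```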

4.2. Feature Extraction and Negative Data Generation

For the US dataset, among all 17 driving-related features, we picked only high-impact features for negative sampling to create non-accident data. The weights of the features are obtained using multinomial logistic regression. The top five features and their corresponding weights are listed in order in Table 3. The most crucial feature, from the minor injury severity level to fatal injury, is alcohol. Surface type, surface condition, person type, age, and sex are also common among these levels. For accidents with low injury severities, such as non-fatal and possible injury, light condition, intersection type, and the number of traffic lanes are among the important features.
Table 3 identifies the critical factors for the five accident classes of the US dataset. For the non-accident data generation process, the combined range of values of the top ten features is examined. In order of importance, these features are alcohol, person type, intersection type, sex, light condition, lane, surface type, surface condition, holiday, and age. For instance, the value of the surface condition feature ranges from one to two for all classes. Thus, the surface condition values for the non-accident class are drawn randomly from three to five. Similar approaches are applied to the other top-ten features to generate random values for the non-accident class. For the less significant features, the values are randomly chosen from the combined value range of the five classes.
We created side-by-side histograms of the feature distributions of the positive and negative data for the ten most important features. Figure 3a–d show example distributions for the age, alcohol, intersection, and light condition features. Figure 3a depicts the distribution of the intersection variable for both classes. As indicated before, we mostly used values outside the positive instance range to create negative data. While the intersection variable in the negative data includes some of the values found in the positive data (i.e., 1–3), most of its values are outside that range. For example, the value six indicates a roundabout intersection, and due to their geometry, roundabout intersections cause fewer collisions [4,5,6]. Our negative (non-accident) data generation therefore favors the values/categories that cause fewer collisions. Similarly, the light condition variable contains feature values mostly outside the positive range. The values four and five indicate dusk and dawn, which are around 8 pm and 5 am, respectively. Most accidents happen during rush hours (6–10 am and 3–7 pm) and in daylight. Since dusk and dawn are outside rush hours and the number of vehicles may be lower than in regular traffic, the likelihood of an accident is lower than at other times. For the age and alcohol involvement variables, the negative and positive classes have similar distributions. The mean and standard deviation of the age variable are 38.35 ± 20.06 for accident data and 43.37 ± 16.33 for non-accident (negative) data. All categorical variables in the US dataset are encoded as integers by the US National Highway Traffic Safety Administration [37]. The descriptions of the features are provided in Appendix A. The distributions of the other variables are illustrated in Appendix B.

4.3. Experimental Results, Comparisons, and Discussion

This section presents the experimental results on the accident datasets. In all experiments, 10-fold cross-validation is applied to reduce the effect of randomness. The dataset is randomly divided into 80% for training and 20% for testing. Different architectures were created by varying the NN hyperparameters, and the top eight NN architectures are presented in Table 4. An adaptive learning rate was used. The SGD and Adam algorithms were used as solvers. The Tanh and ReLU activation functions were adopted for the hidden layers. After various experiments, architectures with three and five hidden layers and different numbers of neurons were used. To examine and compare the performance of the different neural network architectures along with the ordinal regression models and three existing methods, mean squared error (MSE) and class accuracy (ACC) are used as performance evaluation criteria. The results are shown in Table 4 as MSE values, and the best scores are shown in bold. Blue-colored rows and white-colored rows present the MSE values for the US and UK datasets, respectively. The architecture with three hidden layers of seventeen neurons each, the hyperbolic tangent activation function, and the Adam solver provided the best MSE score, which is 0.254 ± 0.038 for the US dataset (third architecture) and 0.173 ± 0.016 for the UK dataset (third architecture).
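The two evaluation criteria can be computed as in the sketch below, which treats the ordinal class labels as integers. Reporting a score as mean ± standard deviation over folds is an assumption about how the ± values were obtained, and the labels and fold scores shown are hypothetical.

```python
import statistics


def mean_squared_error(y_true, y_pred):
    """MSE between integer-coded ordinal labels; distant errors cost more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted instances for each true class."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {label: correct[label] / totals[label] for label in totals}


def summarize(fold_scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.fmean(fold_scores), statistics.stdev(fold_scores)


# Hypothetical labels and fold scores.
y_true = [0, 1, 2, 2, 4]
y_pred = [0, 1, 1, 2, 2]
mse = mean_squared_error(y_true, y_pred)
acc = class_accuracy(y_true, y_pred)
mean_score, std_score = summarize([0.20, 0.30, 0.25])
```

Unlike plain accuracy, MSE on ordinal labels penalizes predicting class 2 for a class 4 instance four times as heavily as an error of one level, which is why it suits ordered severity classes.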
Increasing the number of hidden layers adversely affected MSE scores. Our results show that it is not necessary to have too many hidden layers in the neural network (seventh and eighth architectures) to obtain good prediction performance. Therefore, the experiments were diversified into three hidden layers using different optimizers and activation functions. Throughout the experiments, it is observed that the hyperbolic tangent activation function provided better performance.
The bagging method, also called bootstrap aggregating, is an ensemble meta-algorithm proposed in [39] to improve the performance of weak classifiers. In the current study, the bagging method is also applied to the ordinal regression models on both datasets. The performance of the bagging method on the ordinal regression models is shown in Table 5 as MSE scores. Logistic all-threshold and ordinal ridge achieved the best MSE scores for the US and UK datasets. Specifically, the bagging method provided a 2.1% and 2.4% improvement compared to the no-bagging method for both datasets. The comparison among ordinal regression models indicates that the best MSE was obtained on the UK dataset.
Comprehensive comparisons among NNs, ordinal regression, and three existing methods, namely decision tree, Linear SVM, and logistic regression, are presented in Table 5. Table 5 indicates that NN outperformed the ordinal regression algorithms and the other methods with lower MSE and higher accuracy scores. For the US dataset, both the best and the worst NN architectures outperformed the other methods. DT has the second-best MSE score for the UK dataset. The third architecture with three hidden layers presents the best MSE scores and the highest accuracy values. In the US dataset, the ordinal regression algorithms have the worst performance among all methods. The feature values of different classes in the US dataset overlap each other. As a result, the ordinal regression algorithms are not successful in distinguishing them and predicting injury severity classes. Specifically, the logistic IT method has the worst MSE for the US dataset since the algorithm predicted only three of the six classes. This algorithm failed to classify injury severities as a multiclass problem, which is not desired in an advanced transportation safety system. When comparing the performance of the machine learning models on the two datasets, we infer that the UK data yield better prediction results since they have distinguishable feature values that reduce the overlap among classes and increase model performance. Our experiments support that NNs are more robust than the other existing methods and the ordinal regression models for accident prevention systems, as NN provides higher accuracy values per class and lower MSE scores on both datasets despite their differences.
In Pradhan and Sameen’s study [21], similar approaches are used to predict injury severity using real-world data containing 1138 observations with seven explanatory variables. Their linear SVM method outperformed the deep neural network and other SVM models with a 71.34% accuracy score. In comparison, our study obtained higher accuracy scores using two different real-world datasets that contain more observations (30,484 and 14,593, respectively) with seventeen and ten explanatory variables. In addition, we provided more consistent and unbiased predictions by removing post-crash-related features such as collision type and number of fatalities.
Confusion matrices for the neural network and the best-performing ordinal regression algorithms are shown in Figure 4 and Figure 5 for the US and the UK data, respectively. These matrices show how often classes are confused with each other. For example, while the serious class (class 1) is often confused with the fatal injury class (class 2), with 625 misclassified instances, the ordinal ridge algorithm identified slight injury (class 0) well, as seen in Figure 4 left. The confusion matrix in Figure 5 right shows that NN performed better by successfully classifying the slight and fatal injury levels. Figure 5 shows that class 0 (no apparent injury), class 4 (fatal injury), and class 5 (no accident) were predicted well by the logistic all-threshold method. The other classes were confused with each other. The NN confusion matrix indicated that all classes were predicted well, with a high number of true positive instances.
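The bagging procedure can be illustrated with the sketch below: each model is trained on a bootstrap resample and the ensemble prediction is the majority vote. The nearest-class-mean base learner is a toy stand-in chosen to keep the example self-contained; it is not one of the ordinal regression models used in the study, and the data and the number of estimators are hypothetical.

```python
import random
from collections import Counter


def bootstrap(data, rng):
    """Draw len(data) instances from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_nearest_mean(data):
    """Toy base learner on one feature: predict the class with the closest mean."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def bagging_fit(data, fit, n_estimators=25, seed=0):
    """Train one model per bootstrap resample."""
    rng = random.Random(seed)
    return [fit(bootstrap(data, rng)) for _ in range(n_estimators)]


def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data with three ordered classes.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (1.0, 1), (1.1, 1), (1.2, 1),
        (2.0, 2), (2.1, 2), (2.2, 2)]
models = bagging_fit(data, fit_nearest_mean)
low_prediction = bagging_predict(models, 0.15)
high_prediction = bagging_predict(models, 2.05)
```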
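A confusion matrix of this kind can be built as in the following sketch, where rows are true classes and columns are predicted classes. The labels are hypothetical and are not the study's predictions.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts instances of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def most_confused_pair(matrix):
    """Return (true_class, predicted_class, count) for the largest off-diagonal cell."""
    best = (None, None, 0)
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p and count > best[2]:
                best = (t, p, count)
    return best


# Hypothetical labels for three classes.
y_true = [0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 2, 2, 2, 1, 2]
matrix = confusion_matrix(y_true, y_pred, n_classes=3)
pair = most_confused_pair(matrix)
```

The diagonal holds the correctly classified instances, and the largest off-diagonal cell identifies the pair of classes that a model confuses most often.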

5. Conclusions

In this paper, we proposed an accident prevention system, which is a significant matter in developing advanced transportation safety. Given that severe injuries cause death or disability, predicting accident risks and taking timely precautions could reduce casualties and increase safety in society. To provide a robust prediction model, we investigated the use of a deep neural network and the effect of its hyperparameters in estimating injury severity. We also generated non-accident data based on positive instances by using feature weights. Hence, we overcome the disadvantages of traditional sampling techniques and imbalanced learning by proposing a naïve data generator. Experimental results on two real-world datasets from the US National Highway Traffic Safety Administration and UK Transport for Greater Manchester are used to demonstrate the feasibility and robustness of our proposed framework. The study also analyzed the effect of data distribution and quality on model performance. The differences between the datasets in terms of the number of classes and features and the distinguishability of the explanatory variables affected model performance. All models achieved better performance on the UK dataset than on the US dataset. The US dataset has many overlapping instances and feature values that belong to different classes, whereas the UK dataset has more distinguishable feature values that ease the classification problem.
We investigated the effect of the hyperparameters of NNs on prediction performance. We analyzed the number of hidden layers, the number of hidden neurons, the activation functions, and the optimizers. Our results show that it is unnecessary to have too many hidden layers in the NN to obtain good prediction performance on injury severity (e.g., three hidden layers are sufficient). An increase in the number of hidden layers caused overfitting, in which the model learns details and noise in the training set, and this decreased the models’ performance. The Tanh activation function and the Adam optimizer also showed better performance than the other activation functions and optimizers. Moreover, our comprehensive empirical performance comparison shows that NN outperforms four variants of ordinal regression and existing methods based on the MSE and accuracy measures on both datasets. Hence, an NN risk prediction model with three hidden layers can be added to the proposed accident prevention and alert system in intelligent vehicles to alert drivers and trigger safety functions to reduce the risk of accidents.
The proposed prediction framework can be integrated into an accident prevention and alert system to be used by drivers. Additionally, we identified significant factors and patterns that cause road accidents. In future work, these patterns, together with driver behavior patterns, can support real-time alert messages to reduce accidents, inform a better design of autonomous vehicles, and enhance advanced transportation safety. Future work should also include the integration of object detection into the system to alert drivers of inevitable events, especially in blind spots.

Author Contributions

Conceptualization, G.A., B.S. and S.S.H.; methodology, G.A., B.S. and S.S.H.; software, G.A.; validation, G.A.; formal analysis, G.A.; investigation, G.A. and B.S.; resources, B.S. and S.S.H.; data curation, G.A. and B.S.; writing—original draft preparation, G.A.; writing—review and editing, G.A., B.S. and S.S.H.; visualization, G.A. and B.S.; supervision, B.S. and S.S.H. All authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The US dataset was retrieved from the National Highway Traffic Safety Administration. Available online: (accessed on 18 February 2019). The UK dataset was retrieved from UK Transport for Greater Manchester. Available online: (accessed on 20 December 2019).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Description of independent variables for the US dataset.
Variable | Codes
Atmospheric Condition | 1: Clear; 6: Severe crosswinds
Holiday Related | 0: No Holiday; 1: New Year; 2: M. Luther King; 3: JR Day; 4: President’s Day; 5: Memorial Day; 6: Independence Day; 7: Labor Day; 8: Veterans Day
Light Condition | 1: Daylight
Intersection Type | 1: Not intersection; 5: Traffic circle
Traffic Lane | 1–7: Actual number of lanes in a road
Age | 001–120: Actual ages
Person Type | 1: Driver
Travel Speed | 000–151: Reported speed up to 151 mph; 998: Not Reported
Vehicle Make | 01–94: Actual make; 97: Not reported; 98: Other make; 99: Unknown make
Alcohol Involvement | 0: No
Surface Condition | 1: Dry
Surface Type | 1: Concrete

Appendix B

Figure A1. The distribution of the most important variables of the US dataset. (a) Lane, (b) Person Type, (c) Holiday, (d) Surface Type, (e) Surface Condition, (f) Sex.


References

  1. National Center for Statistics and Analysis. 2015 Motor Vehicle Crashes: Overview. Traffic Saf. Facts Res. Note 2016, 2016, 1–9.
  2. Han, S.; Wang, X.; Xu, L.; Sun, H.; Zheng, N. Frontal object perception for intelligent vehicles based on radar and camera fusion. In Proceedings of the 35th Chinese Control Conference (CCC), Chengdu, China, 27–29 July 2016.
  3. Alicioglu, G.; Sun, B.; Ho, S.S. Assessing accident risk using ordinal regression and multinomial logistic regression data generation. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020.
  4. Severino, A.; Pappalardo, G.; Curto, S.; Trubia, S.; Olayode, I.O. Safety Evaluation of Flower Roundabout Considering Autonomous Vehicles Operation. Sustainability 2021, 13, 10120.
  5. Macioszek, E. Roundabout Entry Capacity Calculation—A Case Study Based on Roundabouts in Tokyo, Japan, and Tokyo Surroundings. Sustainability 2020, 12, 1533.
  6. Macioszek, E. The Comparison of Models for Critical Headways Estimation at Roundabouts. In Contemporary Challenges of Transport Systems and Traffic Engineering Lecture Notes in Networks and Systems; Macioszek, E., Sierpiński, G., Eds.; Springer: Cham, Switzerland, 2017; Volume 2.
  7. Thabtah, F.A.; Hammoud, S.; Kamalov, F.; Gonsalves, A. Data imbalance in classification: Experimental evaluation. Inf. Sci. 2020, 513, 429–441.
  8. Mujalli, R.O.; Oña, J.D. A method for simplifying the analysis of traffic accidents injury severity on two-lane highways using Bayesian networks. J. Saf. Res. 2011, 42, 317–326.
  9. Yuan, Z.; Zhou, X.; Yang, T.; Tamerius, J. Predicting traffic accidents through heterogeneous urban data: A case study. In Proceedings of the International Workshop on Urban Computing (KDD), Halifax, NS, Canada, 13–17 August 2017.
  10. Jeong, H.; Jang, Y.; Bowman, P.J.; Masoud, N. Classification of motor vehicle crash injury severity: A hybrid approach for imbalanced data. Accid. Anal. Prev. 2018, 120, 250–261.
  11. Pérez-Ortiz, M.; Gutiérrez, P.A.; García-Alonso, C.R.; Salvador-Carulla, L.; Salinas-Perez, J.A.; Hervás-Martínez, C. Ordinal classification of depression spatial hot-spots of prevalence. In Proceedings of the 11th International Conference on Intelligent Systems Design and Applications, Cordoba, Spain, 22–24 November 2011.
  12. Aci, C.; Ozden, C. Predicting the severity of motor vehicle accident injuries in Adana-Turkey using machine learning methods and detailed meteorological data. Int. J. Intell. Syst. Appl. Eng. 2018, 6, 72–79.
  13. Wang, Y.; Ho, I.W. Joint Deep Neural Network Modelling and Statistical Analysis on Characterizing Driving Behaviors. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018.
  14. Kahng, M.; Andrews, P.Y.; Kalro, A.; Chau, D. ActiVis: Visual Exploration of Industry-Scale Deep Neural Network Models. IEEE Trans. Vis. Comput. Graph. 2018, 24, 88–97.
  15. Chatzimparmpas, A.; Martins, R.M.; Jusufi, I.; Kucher, K.; Rossi, F.; Kerren, A. The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations. Comput. Graph. Forum 2020, 39, 713–756.
  16. Azodi, C.B.; Tang, J.; Shiu, S. Opening the Black Box: Interpretable Machine Learning for Geneticists. Trends Genet. 2020, 36, 442–455.
  17. Çodur, M.Y.; Tortum, A. An Artificial Neural Network Model for Highway Accident Prediction: A Case Study of Erzurum, Turkey. Promet-Traffic Transp. 2015, 27, 217–225.
  18. Chong, M.; Abraham, A.; Paprzycki, M. Traffic accident analysis using machine learning paradigms. Informatica 2005, 29, 89–98.
  19. Iranitalab, A.; Khattak, A.J. Comparison of four statistical and machine learning methods for crash severity prediction. Accid. Anal. Prev. 2017, 108, 27–36.
  20. Zhu, M.; Li, Y.; Wang, Y. Design and experiment verification of a novel analysis framework for recognition of driver injury patterns: From a multi-class classification perspective. Accid. Anal. Prev. 2018, 120, 152–164.
  21. Pradhan, B.; Sameen, M.I. Modeling Traffic Accident Severity Using Neural Networks and Support Vector Machines. In Laser Scanning Systems in Highway and Safety Assessment; Springer: Cham, Switzerland, 2020; pp. 111–117.
  22. Delen, D.; Tomak, L.; Topuz, K.; Eryarsoy, E. Investigating injury severity risk factors in automobile crashes with predictive analytics and sensitivity analysis methods. J. Transp. Health 2017, 4, 118–131.
  23. Liao, Y.; Zhang, J.; Wang, S.; Li, S.; Han, J. Study on Crash Injury Severity Prediction of Autonomous Vehicles for Different Emergency Decisions Based on Support Vector Machine Model. Electronics 2018, 7, 381.
  24. Zeng, Q.; Huang, H. A stable and optimized neural network model for crash injury severity prediction. Accid. Anal. Prev. 2014, 73, 351–358.
  25. Fernández-Navarro, F.; Campoy-Muñoz, P.; Paz-Marin, M.L.; Hervás-Martínez, C.; Yao, X. Addressing the EU sovereign ratings using an ordinal regression approach. IEEE Trans. Cybern. 2013, 43, 2228–2240.
  26. Landschoot, S.; Waegeman, W.; Audenaert, K.; Haesaert, G.; Baets, B.D. Ordinal regression models for predicting deoxynivalenol in winter wheat. Plant Pathol. 2013, 62, 1319–1329.
  27. Gao, X.; Feng, Y. Penalized weighted least absolute deviation regression. Stat. Its Interface 2018, 11, 79–89.
  28. Xia, F.; Zhou, L.; Yang, Y.; Zhang, W. Ordinal regression as multiclass classification. Int. J. Intell. Control. Syst. 2007, 12, 230–236.
  29. Zahid, F.M.; Ramzan, S. Ordinal ridge regression with categorical predictors. J. Appl. Stat. 2012, 39, 161–171.
  30. Aggarwal, C.C. Neural Networks and Deep Learning; Springer: Cham, Switzerland, 2018.
  31. Haykin, S. Neural Networks and Learning Machines, 3rd ed.; Prentice Hall: New York, NY, USA, 2009.
  32. Kalogirou, S.A. Solar Energy Engineering, 2nd ed.; Elsevier: Amsterdam, The Netherlands; Academic Press: Cambridge, MA, USA, 2014.
  33. Ripley, B.D. Pattern Recognition and Neural Networks; Cambridge University Press: Cambridge, UK, 2007.
  34. Bottou, L. Stochastic Gradient Descent Tricks. In Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science; Montavon, G., Orr, G.B., Müller, K.R., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7700, pp. 421–436.
  35. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR, San Diego, CA, USA, 7–9 May 2015.
  36. Williams, R. Generalized Ordered Logit/Partial Proportional Odds Models for Ordinal Dependent Variables. Stata J. 2006, 6, 58–82.
  37. National Highway Traffic Safety Administration. Available online: (accessed on 18 February 2019).
  38. UK Transport for Greater Manchester. Available online: (accessed on 20 December 2019).
  39. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
Figure 1. The detailed framework of the prevention system and prediction model.
Figure 2. Negative (Non-accident) data generation process.
Figure 3. The distribution of variables for accident and non-accident data. (a) Intersection, (b) Light Condition, (c) Age, (d) Alcohol Involvement.
Figure 4. The UK dataset confusion matrices. (a) Ordinal Ridge, (b) NN.
Figure 5. The US dataset confusion matrices. (a) Logistic All-threshold, (b) NN.
Table 1. Summary of research related to accident risk assessment.
StudiesClass DescriptionsAlgorithms
[8]Slight Injured
Killed or Seriously Injured
Bayesian Networks
No Accident
[10]Fatal Injury
Incapacitating Injury
Non-Incapacitating Injury
Possible Injury
No Injury
Logistic Regression (LR)
Gradient Boosting Model
Naïve Bayes
[12]Non-Fatal Injury
Fatal Injury
Naïve Bayes
[17]No Injury
Possible Injury
Non-Incapacitating Injury
Incapacitating Injury
Fatal Injury
Hybrid DT-Artificial NN
[18]Property Damage Only
Possible Injury
Visible Injury
Fatal Injury
Multinomial Logit
[20]No Injury
Possible Injury
Evident Injury
Fatal Injury
Table 2. Description of injury severity levels in accident datasets.
Class | US Accident Dataset (2015–2016): Injury Severity | # of Accidents | UK Accident Dataset (2018): Injury Severity | # of Accidents
Class 0 | No apparent | 6405 (21.0%) | Slight | 8381 (57.4%)
Class 1 | Possible | 2697 (8.84%) | Serious | 4541 (31.1%)
Class 2 | Minor | 2967 (9.73%) | Fatal | 1671 (11.5%)
Class 3 | Serious | 1812 (5.95%) | |
Class 4 | Fatal | 8499 (27.8%) | |
Class 5 | No accident | 8104 (26.5%) | |
Table 3. The top five features and corresponding weights of the US dataset.
Non-Fatal Injury | Possible Injury | Minor Injury | Major Injury | Fatal Injury
Light condition: 0.166 | Person type: 0.264 | Alcohol: 0.262 | Alcohol: 0.490 | Alcohol: 0.918
Lane: 0.161 | Intersection type: 0.213 | Person type: 0.259 | Person type: 0.442 | Surface type: 0.099
Intersection type: 0.064 | Sex: 0.189 | Surface condition: 0.122 | Surface type: 0.127 | Age: 0.013
Holiday: 0.016 | Lane: 0.081 | Surface type: 0.099 | Sex: 0.106 | Vehicle make: 0.005
Accident hour: 0.012 | Surface condition: 0.032 | Accident hour: 0.004 | Surface condition: 0.022 | Surface condition: 0.002
Table 4. Classification results of different neural network architectures, with MSE listed separately for the US and UK datasets.
Architecture | Hidden Layers | Neurons | Solver | Activation Function | MSE, US dataset (a) | MSE, UK dataset (a)
1 | 3 | 17 each | SGD | ReLU | 0.264 ± 0.040 | 0.252 ± 0.078
2 | 3 | 17 each | SGD | Tanh | 0.258 ± 0.053 | 0.297 ± 0.081
3 | 3 | 17 each | Adam | Tanh | 0.254 ± 0.038 | 0.173 ± 0.016
4 | 3 | 50 each | SGD | Tanh | 0.283 ± 0.044 | 0.208 ± 0.054
5 | 3 | 100, 50, 25 | Adam | Tanh | 0.368 ± 0.026 | 0.176 ± 0.027
6 | 3 | 100, 50, 25 | SGD | Tanh | 0.311 ± 0.037 | 0.183 ± 0.034
7 | 5 | 25, 50, 50, 50, 100 | SGD | Tanh | 0.283 ± 0.035 | 0.236 ± 0.051
8 | 5 | 100 each | Adam | Tanh | 0.339 ± 0.030 | 0.175 ± 0.024
(a) Mean Squared Error ± Standard Deviation.
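Table 5. Classification results and comparison of machine learning algorithms. The Class 0 to Class 5 columns report class accuracy.
Data | Method | MSE | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5
US Dataset | NN #3 (Best) | 0.254 ± 0.038 | 0.963 * | 0.974 | 0.977 * | 0.820 | 0.979 * | 1.000 *
US Dataset | NN #5 (Worst) | 0.368 ± 0.026 | 0.923 | 0.978 * | 0.977 | 0.834 * | 0.970 | 0.999
US Dataset | OR: Ordinal Ridge | NB: 1.177 ± 0.097; B: 1.158 ± 0.094 | 0.178 | 0.262 | 0.289 | 0.426 | 0.238 | 0.703
US Dataset | OR: LAD | NB: 1.193 ± 0.106; B: 1.174 ± 0.102 | 0.332 | 0.255 | 0.321 | 0.426 | 0.237 | 0.829
US Dataset | OR: Logistic IT | NB: 1.793 ± 0.189; B: 1.686 ± 0.184 | 0.927 | 0.000 | 0.000 | 0.000 | 0.917 | 0.997
US Dataset | OR: Logistic AT | NB: 0.948 ± 0.135; B: 0.928 ± 0.132 | 0.701 | 0.269 | 0.220 | 0.195 | 0.762 | 0.993
US Dataset | DT | 0.472 ± 0.136 | | | | | |
US Dataset | Linear SVM | 0.797 ± 0.067 | | | | | |
US Dataset | LR | 0.773 ± 0.043 | | | | | |
UK Dataset | NN #3 (Best) | 0.173 ± 0.016 | 0.833 * | 0.658 * | 0.969 | | |
UK Dataset | NN #2 (Worst) | 0.297 ± 0.081 | 0.829 | 0.556 | 0.895 | | |
UK Dataset | OR: Ordinal Ridge | NB: 0.372 ± 0.025; B: 0.363 ± 0.022 | 0.620 | 0.534 | 0.771 | | |
UK Dataset | OR: LAD | NB: 0.585 ± 0.035; B: 0.501 ± 0.092 | 0.451 | 0.141 | 0.974 * | | |
UK Dataset | OR: Logistic IT | NB: 0.438 ± 0.062; B: 0.426 ± 0.059 | 0.624 | 0.272 | 0.890 | | |
UK Dataset | OR: Logistic AT | NB: 0.396 ± 0.022; B: 0.387 ± 0.023 | 0.620 | 0.430 | 0.831 | | |
UK Dataset | DT | 0.205 ± 0.052 | | | | | |
UK Dataset | Linear SVM | 0.387 ± 0.071 | | | | | |
UK Dataset | LR | 0.430 ± 0.038 | | | | | |
NB: No Bagging, B: Bagging, * Indicates the best class accuracy.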
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Alicioglu, G.; Sun, B.; Ho, S.S. An Injury-Severity-Prediction-Driven Accident Prevention System. Sustainability 2022, 14, 6569.
