# Risk Levels Classification of Near-Crashes in Naturalistic Driving Data

## Abstract

## 1. Introduction

## 2. Related Work

## 3. Methodology

#### 3.1. The Driving Experiment and Collected Dataset

#### 3.2. Data Preprocessing

#### 3.2.1. Near-Crash Extraction

[…] m/s², longitudinal: −1.5 m/s²) [33]. Our study defined a near-crash using three driving variables: deceleration, braking pressure, and time headway, as in [4,34]. In naturalistic driving experiments, a near-crash is detected when at least one of the following three thresholds is met: an acceleration under −0.4 m/s², a time headway below 0.6 s, or a braking pressure above 10 MPa. In addition, the collected near-crashes were validated by checking the recorded videos at the corresponding timestamps to confirm whether a near-crash had actually occurred. Finally, several near-crash-related variables were appended to the variables described in Section 3.1. Table 2 lists these variables in detail.
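The threshold rule above can be sketched as a simple screening function. This is a minimal illustration, not the authors' extraction pipeline: the `Sample` record and its field names are hypothetical, the numeric thresholds are the ones quoted in the text, and the braking-pressure value is assumed to be in the same unit as the recorded signal. Candidates flagged this way would still need the video validation described above.

```python
from dataclasses import dataclass

# Thresholds quoted in the text (the braking-pressure unit follows the recorded signal).
ACCELERATION_MAX = -0.4    # m/s^2: accelerations below this flag a candidate
TIME_HEADWAY_MIN = 0.6     # s: headways below this flag a candidate
BRAKE_PRESSURE_MIN = 10.0  # braking pressures above this flag a candidate


@dataclass
class Sample:
    """One time step of driving data (hypothetical field names)."""
    acceleration: float      # longitudinal acceleration, m/s^2
    time_headway: float      # s
    braking_pressure: float  # same unit as BRAKE_PRESSURE_MIN


def is_near_crash_candidate(sample: Sample) -> bool:
    """Return True if at least one of the three thresholds is met."""
    return (
        sample.acceleration < ACCELERATION_MAX
        or sample.time_headway < TIME_HEADWAY_MIN
        or sample.braking_pressure > BRAKE_PRESSURE_MIN
    )
```

A single violated threshold is enough, so a hard-braking sample with a comfortable headway is still flagged.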

#### 3.2.2. Near-Crash Categorization

#### 3.2.3. Feature Selection

[…] $(X_i, y_i)$, $i = 1, 2, \ldots, n$, which are generated as follows:

[…]$^{2}$), the function $g: \mathbb{R} \to \mathbb{R}$ denotes a non-linear mapping function, which is not known a priori, and $X_i \in \mathbb{R}^{p}$ are feature vectors.

#### 3.2.4. Normalization

[…] $x$, $x_{min}$, $x_{max}$, and $x_{norm}$ are the original, minimum, maximum, and normalized values from the dataset (training dataset), respectively.
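The symbols above suggest min-max scaling, with the minimum and maximum taken from the training data. The sketch below assumes that form, mapping values to the range [0, 1]; the function names are illustrative, and the constant-feature fallback is an added assumption rather than something stated in the text.

```python
def fit_min_max(train_values):
    """Return (x_min, x_max) computed on the training data only."""
    return min(train_values), max(train_values)


def min_max_normalize(x, x_min, x_max):
    """Scale x to [0, 1] relative to the training minimum and maximum."""
    if x_max == x_min:
        # Assumption: a constant feature carries no information, so map it to 0.
        return 0.0
    return (x - x_min) / (x_max - x_min)


# Usage: fit on the training split, then apply the same bounds to any split.
x_min, x_max = fit_min_max([2.0, 4.0, 10.0])
scaled = [min_max_normalize(v, x_min, x_max) for v in [2.0, 4.0, 10.0]]
```

Reusing the training bounds for validation and test data avoids leaking information from those splits into the scaling step.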

#### 3.3. Classification Models

#### 3.3.1. Support Vector Machine (SVM)

[…] $X_i \in \mathbb{R}^{n}$, for $i = 1, 2, \ldots, n$, which denote a set of near-crash-related variables, and the output is defined as $y_i \in \mathbb{R}^{n}$, which represents the risk levels of the near-crashes. In addition, the hyperplane for the outputs can be drawn as a set of points $X$ following Equation (5).

[…] $(X_i, y_i)$, the model needs to solve the following optimization problem [40]:

#### 3.3.2. Random Forest (RF)

#### 3.3.3. Multi-Layer Perceptron (MLP)

#### 3.3.4. Ordinal Probit Model (OP)

[…] $X_i \cdot \beta_i + \varepsilon_i$

[…] $X_i$ is a vector of input variables, $\beta_i$ is a vector of regression coefficients, and $\varepsilon_i$ is an error term that follows a logistic distribution with a mean of zero and a variance of $\pi^{2}/3$.

#### 3.3.5. Multinomial Logit Model (MNL)

[…]$_{i}$ is the probability of a near-crash being labeled with risk level (output) $i$, $\beta_i$ is the vector of estimated coefficients for output risk level $i$, and $X_i$ is an input vector. The $\beta_i$ coefficients can be estimated by the maximum likelihood approach.
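To make the role of the per-level coefficient vectors concrete, the sketch below assumes the standard multinomial logit form, in which the probability of each risk level is a softmax over linear scores. The source equation is not reproduced here, so the function name, the argument names `x` and `betas`, and the absence of an intercept term are all illustrative assumptions.

```python
import math


def mnl_probabilities(x, betas):
    """Return one probability per risk level for input vector x.

    betas holds one coefficient vector per risk level (assumed layout).
    """
    # Linear score for each risk level: the dot product of its coefficients with x.
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

In practice the coefficients would come from a maximum likelihood fit, as the text states; here they are simply passed in.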

#### 3.3.6. Long Short-Term Memory (LSTM)

[…]$_{t}$. The calculations in these layers during training are performed as follows [47]:

[…] $W_i$, $W_f$, and $W_c$ denote the weights of the input gate, the forget gate, and the output gate, respectively. In addition, the memory cell vector $c_t$ and the candidate value $\tilde{c}_t$ are calculated as follows:
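For orientation, one time step of a textbook LSTM cell can be sketched as below. This follows the commonly used formulation rather than the exact equations of [47], which are not reproduced here. The bias terms, the dictionary layout of `params`, and the weight name `W_o` for the output gate (with `W_c` used for the candidate value) are assumptions of this sketch.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, z, b):
    """Compute W @ z + b, with W given as a list of rows."""
    return [sum(w * zj for w, zj in zip(row, z)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """Advance one time step and return (h_t, c_t)."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    i_t = [sigmoid(v) for v in affine(params["W_i"], z, params["b_i"])]  # input gate
    f_t = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]  # forget gate
    o_t = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]  # output gate
    c_tilde = [math.tanh(v) for v in affine(params["W_c"], z, params["b_c"])]  # candidate
    # Memory cell: keep part of the old cell state and add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # Hidden state: the output gate applied to the squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Each weight matrix has one row per hidden unit and one column per entry of the concatenated vector, so a cell with 2 hidden units and 1 input feature uses 2 × 3 matrices.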

#### 3.3.7. Gated Recurrent Unit (GRU)

[…] $z_t$ and reset gate $r_t$. The reset gate ($r_t$) applies the sigmoid function to decide how much of the previous information to reset and multiplies the result by the previous hidden state. The update gate ($z_t$) combines the roles of the forget and input gates of the LSTM model and determines the rate at which the current and previous information are updated. In the update gate, the sigmoid output determines the amount of information kept at the current node, and its complement ($1 − z_t$) is multiplied by the hidden-layer information from the most recent time step. The output value is obtained by weighting the hidden layer's value at the previous unit and the information at the present unit, using the following equations [49]:

[…]$_{t}$ is the input vector at time $t$, and $W_z$, $U_z$, $W_r$, $U_r$, $W_H$, and $U_H$ are the weight matrices for the nodes in the GRU. The remaining notation is the same as in the LSTM.
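The gate description above matches the widely used GRU formulation, which can be sketched for a single time step as follows. This is an illustration under assumptions, not the exact equations of [49]: bias terms are omitted, the weight names mirror the symbols in the text, and the `params` dictionary layout is hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Compute W @ v, with W given as a list of rows."""
    return [sum(w * vj for w, vj in zip(row, v)) for row in W]


def gru_step(x_t, h_prev, params):
    """Advance one time step and return the new hidden state h_t."""
    # Update gate z_t and reset gate r_t, both squashed by the sigmoid.
    z_t = [sigmoid(a + b) for a, b in
           zip(matvec(params["W_z"], x_t), matvec(params["U_z"], h_prev))]
    r_t = [sigmoid(a + b) for a, b in
           zip(matvec(params["W_r"], x_t), matvec(params["U_r"], h_prev))]
    # The reset gate scales the previous hidden state before it enters the candidate.
    gated = [r * h for r, h in zip(r_t, h_prev)]
    h_tilde = [math.tanh(a + b) for a, b in
               zip(matvec(params["W_H"], x_t), matvec(params["U_H"], gated))]
    # (1 - z_t) weights the previous hidden state; z_t weights the candidate.
    return [(1.0 - z) * h + z * g for z, h, g in zip(z_t, h_prev, h_tilde)]
```

Compared with the LSTM, the GRU keeps a single state vector and two gates, which is why it has fewer weight matrices per unit.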

## 4. Models Comparison and Results

#### 4.1. Experimental Settings

#### 4.2. Evaluation Metrics

#### 4.3. Results

#### 4.3.1. Clustering Results

#### 4.3.2. Feature Selection Results

#### 4.3.3. Model Comparison

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. WHO. Road Traffic Injuries. 2020. Available online: https://www.who.int/en/news-room/fact-sheets/detail/road-traffic-injuries (accessed on 12 August 2021).
2. Rezapour, M.; Ksaibati, K. Application of multinomial and ordinal logistic regression to model injury severity of truck crashes, using violation and crash data. J. Mod. Transp. **2018**, 26, 268–277.
3. Wang, J.Z.; Li, Y.; Yu, X.; Kodaka, C.; Li, K. Driving risk assessment using near-crash database through data mining of tree-based model. Accid. Anal. Prev. **2015**, 84, 54–64.
4. Naji, H.; Xue, Q.; Lyu, N.; Wu, C.; Zheng, K. Evaluating the driving risk of near-crash events using a mixed-ordered logit model. Sustainability **2018**, 10, 2868.
5. Iranitalab, A.; Khattak, A. Comparison of four statistical and machine learning methods for crash severity prediction. Accid. Anal. Prev. **2017**, 108, 27–36.
6. Theofilatos, A.; Yannis, G.; Antoniou, C.; Chaziris, A.; Sermpis, D. Time series and support vector machines to predict powered-two-wheeler accident risk and accident type propensity: A combined approach. J. Transp. Saf. Secur. **2018**, 10, 471–490.
7. Al Mamlook, R.E.; Abdulhameed, T.Z.; Hasan, R.; Al-Shaikhli, H.I.; Mohammed, I.; Tabatabai, S. Utilizing Machine Learning Models to Predict the Car Crash Injury Severity among Elderly Drivers. In Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, IL, USA, 31 July–1 August 2020; pp. 105–111.
8. Duong, T.H.; Qiao, F.; Yeh, J.-H.; Zhang, Y. Prediction of Fatality Crashes with Multilayer Perceptron of Crash Record Information System Datasets. In Proceedings of the 2020 IEEE 19th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), Beijing, China, 26–28 September 2020; pp. 225–229.
9. Mokhtarimousavi, S.; Anderson, J.C.; Hadi, M.; Azizinamini, A. A temporal investigation of crash severity factors in worker-involved work zone crashes: Random parameters and machine learning approaches. Transp. Res. Interdiscip. Perspect. **2021**, 10, 100378.
10. Princess, P.J.B.; Silas, S.; Rajsingh, E.B. Classification of Road Accidents Using SVM and KNN. In Advances in Artificial Intelligence and Data Engineering; Springer: Berlin/Heidelberg, Germany, 2021; pp. 27–41.
11. Xie, J.; Zhu, M. Maneuver-based driving behavior classification based on random forest. IEEE Sens. Lett. **2019**, 3, 1–4.
12. Mokhtarimousavi, S. A time of day analysis of pedestrian-involved crashes in California: Investigation of injury severity, a logistic regression and machine learning approach using HSIS data. Inst. Transp. Eng. ITE J. **2019**, 89, 25–33.
13. Wang, Y.; Xu, W.; Zhang, Y.; Qin, Y.; Zhang, W.; Wu, X. Machine learning methods for driving risk prediction. In Proceedings of the 3rd ACM SIGSPATIAL Workshop on Emergency Management Using, Redondo Beach, CA, USA, 7–10 November 2017; pp. 1–6.
14. Chandrasiri, N.P.; Nawa, K.; Ishii, A. Driving skill classification in curve driving scenes using machine learning. J. Mod. Transp. **2016**, 24, 196–206.
15. Peppes, N.; Alexakis, T.; Adamopoulou, E.; Demestichas, K. Driving Behaviour Analysis Using Machine and Deep Learning Methods for Continuous Streams of Vehicular Data. Sensors **2021**, 21, 4704.
16. Candefjord, S.; Muhammad, A.S.; Bangalore, P.; Buendia, R. On Scene Injury Severity Prediction (OSISP) machine learning algorithms for motor vehicle crash occupants in US. J. Transp. Health **2021**, 22, 101124.
17. Yang, K.; Wang, X.; Quddus, M.; Yu, R. Deep Learning for Real-Time Crash Prediction on Urban Expressways. In Proceedings of the Transportation Research Board 97th Annual Meeting, Washington, DC, USA, 7–11 January 2018.
18. Li, P.; Abdel-Aty, M.; Yuan, J. Real-time crash risk prediction on arterials based on LSTM-CNN. Accid. Anal. Prev. **2020**, 135, 105371.
19. Yuan, J.; Abdel-Aty, M.; Gong, Y.; Cai, Q. Real-time crash risk prediction using long short-term memory recurrent neural network. Transp. Res. Rec. **2019**, 2673, 314–326.
20. Jiang, F.; Yuen, K.K.R.; Lee, E.W.M. Long short-term memory networks-based Framework for Traffic Crash Detection with Traffic Data. In Proceedings of the Transportation Research Board (TRB) 99th Annual Meeting, Washington, DC, USA, 12–16 January 2020.
21. Yu, R.; Wang, Y.; Zou, Z.; Wang, L. Convolutional neural networks with refined loss functions for the real-time crash risk analysis. Transp. Res. Part C Emerg. Technol. **2020**, 119, 102740.
22. Zhao, J.; Liu, P.; Xu, C.; Bao, J. Understand the impact of traffic states on crash risk in the vicinities of Type A weaving segments: A deep learning approach. Accid. Anal. Prev. **2021**, 159, 106293.
23. Dingus, T.A.; Klauer, S.G.; Neale, V.L.; Petersen, A.; Lee, S.E.; Sudweeks, J.; Perez, M.A.; Hankey, J.; Ramsey, D.; Gupta, S.; et al. The 100-Car Naturalistic Driving Study, Phase II-Results of the 100-Car Field Experiment; United States Department of Transportation, National Highway Traffic Safety Administration: Washington, DC, USA, 2006.
24. Guo, F.; Klauer, S.G.; Hankey, J.M.; Dingus, T.A. Near-Crashes as Crash Surrogate for Naturalistic Driving Studies. J. Transp. Res. Board **2010**, 2147, 66–74.
25. Tarko, A.P. Surrogate Measures of Safety, in Safe Mobility: Challenges, Methodology and Solutions; Emerald Publishing Limited: Bingley, UK, 2018.
26. Osman, O.A.; Hajij, M.; Bakhit, P.R.; Ishak, S. Prediction of near-crashes from observed vehicle kinematics using machine learning. Transp. Res. Rec. J. Transp. Res. Board **2019**, 2673, 463–473.
27. Seacrist, T.; Douglas, E.C.; Hannan, C.; Rogers, R.; Belwadi, A.; Loeb, H. Near crash characteristics among risky drivers using the SHRP2 naturalistic driving study. J. Saf. Res. **2020**, 73, 263–269.
28. Naji, H.A.; Xue, Q.; Zheng, K.; Lyu, N. Investigating the significant individual historical factors of driving risk using hierarchical clustering analysis and quasi-poisson regression model. Sensors **2020**, 20, 2331.
29. Perez, M.A.; Sudweeks, J.D.; Sears, E.; Antin, J.; Lee, S.; Hankey, J.M.; Dingus, T.A. Performance of basic kinematic thresholds in the identification of crash and near-crash events within naturalistic driving data. Accid. Anal. Prev. **2017**, 103, 10–19.
30. Kong, X.; Das, S.; Zhang, Y. Mining patterns of near-crash events with and without secondary tasks. Accid. Anal. Prev. **2021**, 157, 106162.
31. Guo, F.; Fang, Y. Individual driver risk assessment using naturalistic driving data. Accid. Anal. Prev. **2013**, 61, 3–9.
32. Wu, K.-F.; Jovanis, P.P. Defining and screening crash surrogate events using naturalistic driving data. Accid. Anal. Prev. **2013**, 61, 10–22.
33. Zheng, Y.; Wang, J.; Li, X.; Yu, C. Driving risk assessment using cluster analysis based on naturalistic driving data. In Proceedings of the IEEE, International Conference on Intelligent Transportation Systems, Qingdao, China, 8–11 October 2014; pp. 2584–2589.
34. Naji, H.A.; Lyu, N.; Wu, C.; Zhang, H. Examining contributing factors on driving risk of naturalistic driving using K-means clustering and ordered logit regression. In Proceedings of the 2017 4th International Conference on Transportation Information and Safety (ICTIS), Banff, AB, Canada, 8–10 August 2017; pp. 1189–1195.
35. Wu, C.; Sun, C.; Chu, D.; Huang, Z.; Ma, J.; Li, H. Clustering of several typical behavioral characteristics of commercial vehicle drivers based on GPS data mining: Case study of highways in China. Transp. Res. Rec. J. Transp. Res. Board **2016**, 2581, 154–163.
36. Constantinescu, Z.; Marinoiu, C.; Vladoiu, M. Driving Style Analysis Using Data Mining Techniques. Int. J. Comput. Commun. Control. **2009**, 5, 654–663.
37. Samarasinghe, T.; Gunawardena, T.; Mendis, P.; Sofi, M.; Aye, L. Dependency Structure Matrix and Hierarchical Clustering based algorithm for optimum module identification in MEP systems. Autom. Constr. **2019**, 104, 153–178.
38. Krakovska, O.; Christie, G.; Sixsmith, A.; Ester, M.; Moreno, S. Performance comparison of linear and non-linear feature selection methods for the analysis of large survey datasets. PLoS ONE **2019**, 14, e0213584.
39. Zhang, Y.; Guo, W.; Ray, S. On the consistency of feature selection with lasso for non-linear targets. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 183–191.
40. Zhang, J.; Li, Z.; Pu, Z.; Xu, C. Comparing prediction performance for crash injury severity among various machine learning and statistical methods. IEEE Access **2018**, 6, 60079–60087.
41. Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. **2012**, 67, 93–104.
42. Taud, H.; Mas, J. Multilayer Perceptron (MLP). In Geomatic Approaches for Modeling Land Change Scenarios; Springer: Berlin/Heidelberg, Germany, 2018; pp. 451–455.
43. Chen, F.; Song, M.; Ma, X. Investigation on the injury severity of drivers in rear-end collisions between cars using a random parameters bivariate ordered probit model. Int. J. Environ. Res. Public Health **2019**, 16, 2632.
44. Anarkooli, A.J.; Hosseinpour, M.; Kardar, A. Investigation of factors affecting the injury severity of single-vehicle rollover crashes: A random-effects generalized ordered probit model. Accid. Anal. Prev. **2017**, 106, 399–410.
45. Vajari, M.A.; Aghabayk, K.; Sadeghian, M.; Shiwakoti, N. A multinomial logit model of motorcycle crash severity at Australian intersections. J. Saf. Res. **2020**, 73, 17–24.
46. Saleh, K.; Hossny, M.; Nahavandi, S. Driving behavior classification based on sensor data fusion using LSTM recurrent neural networks. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 1–6.
47. Bani-Salameh, H.; Sallam, M.; Al Shboul, B. A Deep-Learning-Based Bug Priority Prediction Using RNN-LSTM Neural. E-Inform. Softw. Eng. J. **2021**, 15, 29–45.
48. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv **2014**, arXiv:1412.6980.
49. Onyekpe, U.; Palade, V.; Kanarachos, S.; Christopoulos, S.-R. A Quaternion Gated Recurrent Unit Neural Network for Sensor Fusion. Information **2021**, 12, 117.
50. Hung, P.D.; Lien, N.T.T.; Ngoc, N.D. Customer segmentation using hierarchical agglomerative clustering. In Proceedings of the 2019 2nd International Conference on Information Science and Systems, Tokyo, Japan, 16–19 March 2019; pp. 33–37.
51. Assi, K. Traffic Crash Severity Prediction—A Synergy by Hybrid Principal Component Analysis and Machine Learning Models. Int. J. Environ. Res. Public Health **2020**, 17, 7598.
52. Alkheder, S.; Taamneh, M.; Taamneh, S. Severity prediction of traffic accident using an artificial neural network. J. Forecast. **2017**, 36, 100–108.

**Figure 6.** A cluster dendrogram of near-crash levels: (1) minimal, (2) slight, (3) moderate, (4) serious, and (5) severe.

Variable | Symbol | Type | Details |
---|---|---|---|
Driving Behavior (Vehicle Status) | | | |
Beginning Speed | Begin_Sp | continuous | Vehicle velocity once a near-crash happens (m/s) |
Average of Deceleration | Avg_Dec | continuous | Average deceleration (m/s²) |
Average of Speed | Avg_Sp | continuous | Average speed (m/s) |
Time Headway Average | Avr_THW | continuous | Average time headway (s) |
Braking Pressure Average | Avr_Br | continuous | Average braking pressure (MPa) |
Minimum Deceleration | Min_Dec | continuous | Minimum deceleration (m/s²) |
Minimum Time Headway | MinTHW | continuous | Minimum time headway (s) |
Max Braking Pressure | Max_Br | continuous | Maximum braking pressure (MPa) |
Kinetic Energy | Eneg | continuous | Vehicle kinetic energy |
Road Condition | | | |
Wet | Wet | nominal | 1. Wet; 2. Dry |
Road Type | R_ty | nominal | 1. Expressway; 2. Freeway; 3. Urban Expressway; 4. Urban Road |
Lane Numbers | Lane_Nu | nominal | 1. 1; 2. 2; 3. 3; 4. 4; 5. 5 |
Speed Limit | Sp_lim | nominal | 1. 40–60; 2. 80; 3. 100–120 |
Road Congestion | congested | nominal | 1. Yes; 0. No |
Weather | Weather | nominal | 1. Sunny; 2. Rain; 3. Cloud |
Light | Light | nominal | 1. Light; 2. Dark |
Time Variables | | | |
Peak Hour | Peak_hrs | nominal | 1. Yes; 2. No |
Weekend | Weekend | nominal | 1. Yes; 2. No |
Time of Day | Time_day | nominal | 1. 6:00–12:00; 2. 12:00–18:00; 3. 18:00–24:00 |
Driver Inputs | | | |
Education Level | Edu_lev | nominal | 1. Less than graduate; 2. Graduate; 3. Post-graduate and above |
Age | Age | nominal | 1. Less than 23; 2. 23–45; 3. More than 45 |
Gender | Gender | categorical | 1. Male; 2. Female |
Driving Miles | Dri_miles | continuous | Driving miles (miles) |
Driving Experience | Dri_years | continuous | Driving license (years) |

Factor | Symbol | Type | Details |
---|---|---|---|
Near-Crash Type | NC_type | nominal | 1. Subject-Vehicle (Head) vs. Object-Vehicle (Head); 2. Subject-Vehicle (Head) vs. Object-Vehicle (Tail); 3. Subject-Vehicle (Head) vs. Object-Vehicle (Side); 4. Subject-Vehicle (Side) vs. Object-Vehicle (Side); 5. Subject-Vehicle (Side) vs. Object-Vehicle (Tail); 6. Conflict with Pedestrian; 7. Parts of Road; 8. Others |
Near-Crash Reason | NC_reason | nominal | 1. Head-vehicle abruptly halted; 2. Traffic Signals; 3. Traffic Jam; 4. Road Repairs; 5. Road changes; 6. Pedestrians; 7. Subject-Vehicle turned-off; 8. Object-Vehicle turned-off; 9. Others |

SVM | RF | MLP | LSTM | GRU |
---|---|---|---|---|
Penalty = 0.25 | Max depth = 20, Estimators = 30, class_weight: ‘balanced’, decision: entropy | hidden_layers = 4, epochs = 50, batch_size = 256 | learning rate = 0.0012, LSTM unit number = 16, hidden_layers: 50, units: 100, epochs: 100, batch_size: 512 | Hidden layer = 20, learning rate = 0.001, epochs = 100, batch_size: 512 |

Number | Level | Near-Crash Events | Percentage (%) |
---|---|---|---|
1 | Minimal | 86 | 5% |
2 | Slight | 411 | 26.5% |
3 | Moderate | 882 | 52.8% |
4 | Serious | 215 | 12.9% |
5 | Severe | 46 | 2.8% |

Factor | Coefficients | Factor | Coefficients |
---|---|---|---|
Driving Behavior Features | | Time Features | |
Beginning Speed | - | Time of Day | |
Average of Deceleration | −0.0018 | 1. 6:00–12:00 | - |
Average of Speed | 0.0124 | 2. 12:00–18:00 | −0.0561 |
Time Headway Average | - | 3. 18:00–24:00 | 0.0171 |
Braking Pressure Average | - | Weekend | |
Min Deceleration | - | 0. No | 0.0165 |
Min Time Headway | - | 1. Yes | 0.0354 |
Max Braking Pressure | 0.0298 | Peak Hour | |
Vehicle Kinetic Energy | −0.0103 | 1. Yes | - |
Road Features | | 2. No | - |
Road Type | | Near-Crash Features | |
1. Expressway ^{a} | - | Near-Crash Reason | |
2. Freeway | 0.0408 | 1. Head-vehicle abruptly halted | −0.0192 |
3. Urban Expressway | - | 2. Traffic Signals | - |
4. Urban Road | −0.0092 | 3. Traffic Jam ^{a} | −0.0358 |
Road Congestion | | 4. Road Repairs | 0.0154 |
0. Yes | −0.0158 | 5. Road changes | - |
1. No | - | 6. Pedestrians | - |
Wet | | 7. Subject-Vehicle turned-off | −0.0483 |
1. Wet | - | 8. Object-Vehicle turned-off | - |
2. Dry | - | Near-Crash Type | |
Light | | 1. Subject-Vehicle (Head) vs. Object-Vehicle (Head) ^{a} | - |
1. Light | - | 2. Subject-Vehicle (Head) vs. Object-Vehicle (Tail) | −0.0141 |
2. Dark | 0.0174 | 3. Subject-Vehicle (Head) vs. Object-Vehicle (Side) | - |
Weather | | 4. Subject-Vehicle (Side) vs. Object-Vehicle (Side) | - |
1. Sunny | - | 5. Subject-Vehicle (Side) vs. Object-Vehicle (Tail) | −0.0045 |
2. Rain | 0.0244 | 6. Conflict with Pedestrian | - |
3. Cloud | - | 7. Parts of Road | - |
Driver Features | | | |
Age | | Education Level | |
1. Less than 23 | −0.0218 | 1. Less than graduate | - |
2. 23–45 ^{a} | - | 2. Graduate ^{a} | - |
3. More than 45 | −0.0373 | 3. Post-graduate and above | - |
Gender | | Driving Mileage | −0.0293 |
1. Male | - | Driving Experience (years) | −0.0164 |
2. Female | 0.0172 | | |

^{a} Base reference of a categorical variable; - non-significant variable

Model (by risk level of near-crashes) | Minimal | Slight | Moderate | Serious | Severe | Average |
---|---|---|---|---|---|---|
Support Vector Machine (SVM) | 0.89 | 0.93 | 0.89 | 0.65 | 0.76 | 0.83 |
Random Forest (RF) | 0.85 | 0.82 | 0.84 | 0.81 | 0.77 | 0.82 |
Multi-Layer Perceptron (MLP) | 0.84 | 0.89 | 0.76 | 0.95 | 0.97 | 0.88 |
Ordinal Probit Model (OP) | 0.80 | 0.72 | 0.90 | 0.82 | 0.80 | 0.81 |
Multinomial Logit Model (MNL) | 0.71 | 0.77 | 0.76 | 0.84 | 0.80 | 0.78 |
Long Short-Term Memory (LSTM) | 0.93 | 0.94 | 0.85 | 0.93 | 0.96 | 0.96 |
Gated Recurrent Unit (GRU) | 0.96 | 0.87 | 0.75 | 0.94 | 0.98 | 0.91 |

Reference | Method | Accuracy |
---|---|---|
Wang et al. [3] | Classification Regression Tree (CART) | 66.1% |
Alkheder et al. [52] | K-means clustering based NN | 74.6% |
Assi et al. [51] | Fuzzy c-means clustering based SVM | 74% |
 | Fuzzy c-means clustering based NN | 71% |
Mokhtarimousavi et al. [9] | Cuckoo Search based SVM | 89.4% |
Osman et al. [26] | AdaBoost | 95% |
Our Study | Long Short-Term Memory (LSTM) | 96% |
 | Gated Recurrent Unit (GRU) | 91% |

Model (by risk level of near-crashes) | Minimal | Slight | Moderate | Serious | Severe |
---|---|---|---|---|---|
Support Vector Machine (SVM) | 0.76 | 0.82 | 0.75 | 0.82 | 0.85 |
Random Forest (RF) | 0.84 | 0.87 | 0.78 | 0.82 | 0.87 |
Multi-Layer Perceptron (MLP) | 0.92 | 0.85 | 0.76 | 0.92 | 0.89 |
Ordinal Probit Model (OP) | 0.76 | 0.70 | 0.73 | 0.87 | 0.74 |
Multinomial Logit Model (MNL) | 0.71 | 0.72 | 0.72 | 0.76 | 0.76 |
Long Short-Term Memory (LSTM) | 0.95 | 0.95 | 0.88 | 0.92 | 0.91 |
Gated Recurrent Unit (GRU) | 0.94 | 0.92 | 0.91 | 0.95 | 0.91 |

Model (by risk level of near-crashes) | Minimal | Slight | Moderate | Serious | Severe |
---|---|---|---|---|---|
Support Vector Machine (SVM) | 0.78 | 0.87 | 0.84 | 0.73 | 0.66 |
Random Forest (RF) | 0.81 | 0.90 | 0.85 | 0.75 | 0.66 |
Multi-Layer Perceptron (MLP) | 0.91 | 0.90 | 0.85 | 0.80 | 0.75 |
Ordinal Probit Model (OP) | 0.68 | 0.84 | 0.81 | 0.81 | 0.68 |
Multinomial Logit Model (MNL) | 0.64 | 0.81 | 0.80 | 0.78 | 0.62 |
Long Short-Term Memory (LSTM) | 0.93 | 0.92 | 0.91 | 0.86 | 0.79 |
Gated Recurrent Unit (GRU) | 0.92 | 0.82 | 0.82 | 0.81 | 0.78 |

Model (by risk level of near-crashes) | Minimal | Slight | Moderate | Serious | Severe |
---|---|---|---|---|---|
Support Vector Machine (SVM) | 0.77 | 0.85 | 0.82 | 0.78 | 0.74 |
Random Forest (RF) | 0.82 | 0.88 | 0.83 | 0.78 | 0.75 |
Multi-Layer Perceptron (MLP) | 0.91 | 0.87 | 0.83 | 0.85 | 0.81 |
Ordinal Probit Model (OP) | 0.72 | 0.75 | 0.73 | 0.84 | 0.73 |
Multinomial Logit Model (MNL) | 0.67 | 0.75 | 0.76 | 0.77 | 0.65 |
Long Short-Term Memory (LSTM) | 0.94 | 0.94 | 0.92 | 0.89 | 0.85 |
Gated Recurrent Unit (GRU) | 0.93 | 0.87 | 0.86 | 0.88 | 0.84 |

Model | Average Accuracy | Average Recall | Average Precision | Average F1-Measure |
---|---|---|---|---|
Support Vector Machine (SVM) | 83% | 0.81 | 0.78 | 0.79 |
Random Forest (RF) | 82% | 0.84 | 0.79 | 0.81 |
Multi-Layer Perceptron (MLP) | 88% | 0.88 | 0.84 | 0.86 |
Ordinal Probit Model (OP) | 81% | 0.77 | 0.76 | 0.77 |
Multinomial Logit Model (MNL) | 78% | 0.72 | 0.73 | 0.72 |
Long Short-Term Memory (LSTM) | 96% | 0.93 | 0.88 | 0.91 |
Gated Recurrent Unit (GRU) | 91% | 0.93 | 0.83 | 0.88 |

Model | Training Loss | Validation Loss | Training Time (s) | Testing Time (s) |
---|---|---|---|---|
Support Vector Machine (SVM) | 0.010 | 0.010 | 7.31 | 2.07 |
Random Forest (RF) | 0.000 | 0.000 | 8.40 | 2.59 |
Multi-Layer Perceptron (MLP) | 0.007 | 0.006 | 10.27 | 3.21 |
Ordinal Probit Model (OP) | 0.004 | 0.002 | 3.22 | 2.52 |
Multinomial Logit Model (MNL) | 0.011 | 0.09 | 4.51 | 3.31 |
Long Short-Term Memory (LSTM) | 0.005 | 0.006 | 11.76 | 3.44 |
Gated Recurrent Unit (GRU) | 0.004 | 0.003 | 11.68 | 3.22 |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Naji, H.A.H.; Xue, Q.; Lyu, N.; Duan, X.; Li, T.
Risk Levels Classification of Near-Crashes in Naturalistic Driving Data. *Sustainability* **2022**, *14*, 6032.
https://doi.org/10.3390/su14106032
