Factors, Prediction, and Explainability of Vehicle Accident Risk Due to Driving Behavior through Machine Learning: A Systematic Literature Review, 2013–2023

Javier Lacherre; José Luis Castillo-Sequera; David Mauricio

doi:10.3390/computation12070131

¹ Faculty of Systems Engineering and Informatics, National University of San Marcos, Lima 15081, Peru

² Department of Computer Sciences, Polytechnic School, University of Alcala, 28871 Alcala de Henares, Spain

* Author to whom correspondence should be addressed.

Computation 2024, 12(7), 131; https://doi.org/10.3390/computation12070131

This article belongs to the Section Computational Engineering


Abstract

Road accidents are on the rise worldwide, causing 1.35 million deaths per year and encouraging the search for solutions. Autonomous vehicles are a promising proposal in this regard, although fully automated driving is still far from being an achievable reality. Therefore, efforts have focused on predicting and explaining the risk of accidents using real-time telematics data. This study aims to analyze the factors, machine learning algorithms, and explainability methods most used to assess the risk of vehicle accidents based on driving behavior. A systematic review of the literature published between 2013 and July 2023 on factors, prediction algorithms, and explainability methods for predicting the risk of traffic accidents was carried out. Factors were categorized into five domains, and the most commonly used predictive algorithms and explainability methods were determined. We selected 80 articles from journals indexed in the Web of Science and Scopus databases and identified 115 factors within the domains of environment, traffic, vehicle, driver, and management, with speed and acceleration being the most extensively examined. Regarding machine learning advances in accident risk prediction, we identified 22 base algorithms, with convolutional neural network and gradient boosting being the most commonly used. For explainability, we found six methods, with random forest feature importance being the predominant choice. This study categorizes the factors affecting road accident risk, presents key prediction algorithms, and outlines methods to explain risk assessment based on driving behavior, while also considering vehicle weight (light versus heavy vehicles).

Keywords:

machine learning; prediction algorithms; risk assessment; road accident

1. Introduction

There are around 1.35 million deaths worldwide per year due to vehicle accidents [1]; in Europe, 60% of such deaths occur on two-lane roads [2]. In this regard, the United Nations has set 17 Sustainable Development Goals (SDGs) for the year 2030, of which SDG 3, “Good health and well-being”, includes the aim of reducing deaths and injuries from traffic accidents worldwide by 50% [3]. One potential option is the implementation of autonomous vehicles. Nevertheless, fully automated driving remains a distant prospect and is unlikely to be achieved in the foreseeable future [4]; moreover, extensive research is still needed, especially in the field of prediction.

Since 1975, research has addressed the prediction of vehicle accident risk (VAR). Chipman and Morgan [5] studied factors such as demerit points, age, gender, license class, and accident history. Their findings highlighted demerit points as the key factor influencing future accident risk, which offers an opportunity to prevent accidents when linked with driving behavior (DB). Over time, extensive research has led to modifications and regulations in the environmental, vehicular, traffic, driver, and management domains aimed at reducing risk, such as deceleration devices and central protection barriers on roads [2]. Additionally, mechanisms have been implemented for collision prevention, pedestrian identification, lane change alerts, and detection of driver distraction and drowsiness with feedback to the driver, among other capabilities [6]. These advancements prompted the adoption of safety manuals, such as the Highway Safety Manual, which includes a widely used VAR prediction model [7,8]; however, this model does not consider DB in its statistical analysis [9].

The problem addressed in this article is DB and its impact on the incidence of traffic accidents. DB refers to the actions and responses of a driver in various driving scenarios over the journey from an origin to a destination, taking into account factors such as travel time [10]. DB can be categorized into groups with similar patterns, which facilitates the estimation of driving risk levels [11]; these groups include normal, drowsy, and aggressive behaviors [12].
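Grouping drivers "with similar patterns" is typically an unsupervised clustering task. As a minimal sketch only, assuming two hypothetical per-trip features (harsh accelerations and harsh brakings per 100 km), the code below runs plain Lloyd's k-means with two clusters. The features, the data, and the simple "first k points" initialization are illustrative assumptions; separating drowsy behavior would likely require other inputs, such as the eye and heart rate factors discussed in Section 3, and practical work would normally use a library implementation such as scikit-learn's `KMeans` with k-means++ initialization.

```python
import math
from typing import List, Sequence

def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means: return a cluster index for every point."""
    # Simple deterministic start: the first k points serve as initial centroids
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels

# Hypothetical per-trip features: (harsh accelerations, harsh brakings) per 100 km
trips = [(0.5, 0.8), (1.0, 0.6), (0.7, 1.1),
         (6.0, 7.5), (7.2, 6.8), (6.5, 8.1)]
labels = kmeans(trips, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster indices themselves carry no meaning; calling cluster 1 "aggressive" is a judgment made afterwards by inspecting its centroid, which is how such groupings are usually turned into risk levels.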

In the VAR context, DB is of utmost importance, as it is the leading cause of traffic accidents, accounting for more than 70% of accidents in certain countries, such as Peru [13]. Accordingly, vehicle accident risk due to driving behavior (DBVAR) refers to the probability that a traffic accident occurs as a result of drivers' actions behind the wheel, which can increase the chances of an accident and endanger road safety. Identifying this risk is fundamental to protecting human lives, promoting road safety, reducing the costs associated with traffic accidents, and developing effective safety policies.
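One way to state this definition formally, in notation of our own rather than the authors', is as a conditional probability given a vector x of driving-behavior features, with Y = 1 denoting an accident (or risky event). Logistic regression (LR), which related reviews list among the most used algorithms, is shown only as one possible estimator; σ is the logistic function, and w and b are learned parameters.

```latex
\mathrm{DBVAR}(\mathbf{x}) = P(Y = 1 \mid \mathbf{x}),
\qquad
\widehat{P}(Y = 1 \mid \mathbf{x})
  = \sigma\!\left(\mathbf{w}^{\top}\mathbf{x} + b\right)
  = \frac{1}{1 + \exp\!\left(-(\mathbf{w}^{\top}\mathbf{x} + b)\right)}
```

Other ML algorithms discussed in this review replace the logistic model with a more flexible estimator of the same probability.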

Research has been conducted to determine factors for predicting traffic incidents using machine learning (ML) methods. Xu et al. [14] found a strong correlation between aggressive DB and driver, vehicle, and environmental aspects. Similarly, Li et al. [15] considered environment, vehicle, driver, and traffic factors, while Niu et al. [16] and Yang et al. [17] also included the management domain. Studying each of these factors matters not only for building better models but also for mitigating the risk of accidents and their consequences and for improving road safety [18].

Regarding prediction, different artificial intelligence algorithms have been used to predict the risk of vehicle accidents. Geng et al. [2] presented a modeling framework for evaluating truck safety on two-lane rural roads using extreme gradient boosting (XGBoost), achieving an accuracy of 96.67%. Peng et al. [19] noted that long short-term memory (LSTM) networks are suitable for extracting meaningful information from continuous vehicle signals, such as accelerations and decelerations; they applied LSTM to DBVAR prediction and achieved 93.5% accuracy.

Various algorithms have also been used to explain DBVAR. Masello et al. [20] applied Shapley additive explanations (SHAP) and found that the speed limit was a highly relevant factor for the riskiest events. Similarly, Al-refai et al. [21], using random forest (RF) feature importance, found that the most significant predictors of DBVAR were the vehicle's mean speed, its instantaneous speed, and its longitudinal acceleration.
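SHAP attributes a prediction to its input features using Shapley values. The minimal sketch below computes them exactly by enumerating every feature coalition, with absent features set to a reference (baseline) value. The risk-score function, its weights, the feature names, and the all-zero baseline are hypothetical; exact enumeration is only feasible for a handful of features, which is why libraries such as the `shap` package rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Set

Features = Dict[str, float]

def shapley_values(f: Callable[[Features], float], x: Features,
                   baseline: Features) -> Features:
    """Exact Shapley values of f at x; absent features take baseline values."""
    names = list(x)
    n = len(names)

    def v(coalition: Set[str]) -> float:
        # Value of a coalition: evaluate f with non-members set to the baseline
        return f({k: (x[k] if k in coalition else baseline[k]) for k in names})

    phi: Features = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += w * (v(s | {i}) - v(s))
        phi[i] = total
    return phi

# Hypothetical risk score with an interaction between speeding and night driving
def risk_score(z: Features) -> float:
    return (0.04 * z["speed_over_limit"] + 0.10 * z["harsh_brakes"]
            + 0.02 * z["speed_over_limit"] * z["night"])

trip = {"speed_over_limit": 20.0, "harsh_brakes": 3.0, "night": 1.0}
reference = {"speed_over_limit": 0.0, "harsh_brakes": 0.0, "night": 0.0}
phi = shapley_values(risk_score, trip, reference)
print({k: round(val, 3) for k, val in phi.items()})
# {'speed_over_limit': 1.0, 'harsh_brakes': 0.3, 'night': 0.2}
```

The attributions sum to the difference between the trip's score and the reference score (1.5), and the interaction term is split equally between speeding and night driving, which illustrates how SHAP can single out a factor such as speed for a risky event.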

The volume of research on DBVAR has motivated several state-of-the-art reviews. Bouhsissin et al. [22] reviewed 93 articles published between 2015 and 2022 and found that ML algorithms predominated (60%), followed by deep learning (DL) algorithms (34.87%) and statistical methods (5.15%). The most-used algorithms were support vector machine (SVM), logistic regression (LR), LSTM, artificial neural network (ANN), k-nearest neighbors (KNN), RF, and convolutional neural network (CNN), and 39 relevant factors were identified. Paredes et al. [23] analyzed 27 articles published between 2015 and 2020 and found 17 ML algorithms, among which Bayesian algorithms and decision trees stood out; they also identified 21 relevant factors, of which the most used, in line with Bouhsissin et al. [22], were acceleration, deceleration, and speed. Elassad et al. [24] reviewed 82 articles from 2009 to 2019 and analyzed factors and prediction aspects: they identified 14 general factors grouped into the driver, vehicle, and environment dimensions and found that SVM, neural network (NN), Bayesian learners (BL), and ensemble learners (EL) were the four most used algorithms, present in 72% of the selected studies. Finally, Silva et al. [25] studied prediction and explainability in relation to accident frequency and severity classification, based on 26 articles from 2003 to 2020. They found that the main techniques were KNN and decision tree (DT), although ANN was the most suitable for predicting accident frequency, and they highlighted the road environment, human behavior, accident characteristics, and vehicle-related elements as the main contributors to explaining accident causes.

These studies reveal a wealth of knowledge that needs to be inventoried, analyzed, and classified. However, ML studies tend to evaluate risk based on accident frequency and DBVAR without differentiating between light and heavy vehicles or considering factors related to vehicle trip management, such as the estimated delay in reaching the destination or whether a heavy vehicle is loaded or empty. Furthermore, current approaches focus on factors that explain the frequency or severity of accidents but do not identify the factors contributing to DBVAR. This gap is important because regulations increasingly mandate devices that record trajectory and safety data. Analyzing these data, making predictions in real time, and explaining their causes could significantly reduce the number of accidents.

This study aims to systematically review the main developments related to the factors, prediction, and explainability of DBVAR based on ML, and to answer the following research question: Which factors, ML advances for prediction, and explainability methods have been investigated in relation to DBVAR?

The main contributions of this article are as follows:

Providing a comprehensive catalog of traffic accident risk factors, classified into five dimensions;
Identifying the various prediction algorithms, data sets used, and performance metrics employed in the analysis;
Compiling the various studies utilizing multiple methods to explain factors contributing to DBVAR;
Providing the reader with a wide range of bibliographic references that they can utilize to delve deeper into understanding the models based on ML that facilitate prediction and explanation of DBVAR.

This article is organized into five sections, as follows. In Section 2, the methodology followed for the systematic review of the literature is presented. Section 3 presents the results, focused on answering the research questions, the discussion of which is presented in Section 4. Finally, the conclusions follow in Section 5.

2. Methodology

For this article, a systematic literature review was carried out following the procedure applied by Silva et al. [25] and Shiguihara et al. [26] to ensure scientific rigor. It consisted of the following phases:

Planning: The research questions are defined, the search steps are established, primary studies are identified in indexed databases, and the inclusion/exclusion criteria for selecting articles are set.
Development: Primary studies are selected according to the plan, their quality is evaluated, and the data are extracted and synthesized.
Results: Publication statistics are presented in Section 2.3, and the research questions are answered in Section 3.

2.1. Planning

Three research questions were formulated to identify the developments concerning the factors, prediction, and explainability of DBVAR:

RQ1: What are the factors considered in predicting DBVAR?
RQ2: What are the advances of ML in DBVAR prediction?
RQ3: What advances in explainable artificial intelligence (XAI) exist for DBVAR prediction?

To address the research questions, we reviewed primary publications in journals indexed in the Scopus and Web of Science (WoS) databases, using the following search string:

(“vehicle accident risk” OR “car accident risk” OR “car following” OR “driving behavior” OR “driving style” OR “driver behavior” OR “driving risk” OR “driver risk” OR “road safety”) AND ((factors OR features OR causes) OR (predicti* OR forecast* OR progno*) OR (explainability OR explainable OR interpretabl* OR xai)) AND (“machine learning” OR “deep learning” OR lstm).
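Because the three parenthesized subgroups in the middle clause are joined by OR, the string reduces to an AND of three OR-groups. As a rough illustration only, the sketch below checks whether a title or abstract satisfies that logic. It is not how Scopus or WoS evaluate queries (their field handling, stemming, and phrase rules differ), and treating a trailing `*` as "any word continuation" is our assumption. Like the literal string, it does not match British spellings such as "behaviour".

```python
import re
from typing import List

# The review's search string as an AND of OR-groups (terms are case-insensitive).
# A trailing "*" is treated as a truncation wildcard, e.g. predicti* -> prediction.
QUERY_GROUPS: List[List[str]] = [
    ["vehicle accident risk", "car accident risk", "car following",
     "driving behavior", "driving style", "driver behavior",
     "driving risk", "driver risk", "road safety"],
    ["factors", "features", "causes", "predicti*", "forecast*", "progno*",
     "explainability", "explainable", "interpretabl*", "xai"],
    ["machine learning", "deep learning", "lstm"],
]

def compile_term(term: str):
    """Compile one term into a word-bounded, case-insensitive regex."""
    if term.endswith("*"):
        pattern = re.escape(term[:-1]) + r"\w*"
    else:
        pattern = re.escape(term)
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)

COMPILED = [[compile_term(t) for t in group] for group in QUERY_GROUPS]

def matches_query(text: str) -> bool:
    """True when every AND-group has at least one matching term in the text."""
    return all(any(p.search(text) for p in group) for group in COMPILED)

print(matches_query("Predicting risky driving behavior with deep learning"))  # True
print(matches_query("Driving style and fuel consumption in urban traffic"))   # False
```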

As shown in Table 1, the string was applied in the “TITLE-ABS-KEY” field for Scopus and the “Topic” field for WoS, covering the period from January 2013 to July 2023. The search was further limited to journals ranked in the SCImago Journal Rank (SJR). Finally, the inclusion and exclusion criteria in Table 2 were applied.

Table 1. Database search string.

Table 2. Inclusion and exclusion criteria.

2.2. Development

The candidate primary studies retrieved by the search were screened against the inclusion and exclusion criteria detailed in Table 2. This required a preliminary review of each study's content to determine its relevance to the present review, that is, whether it addressed the factors, prediction, or explainability of DBVAR using ML. Most works were discarded because they addressed unrelated topics, such as driver identification, energy consumption, autonomous vehicles, vehicles with fewer than four wheels, racing cars, pollution, level of accident severity, traffic studies, or time and cost optimization. Figure 1 summarizes the process and the activities carried out to select or reject studies.
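In the review, relevance was judged by reading each study's content. Purely as an illustration of how screening decisions could be recorded reproducibly, the sketch below encodes the period, language, and topic exclusions mentioned in this article as an explicit check. The `Record` type, its fields, the reviewer-assigned `topic` label, and the cut-off of 31 July 2023 for "July 2023" are assumptions, and the full criteria of Table 2 are not reproduced.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Record:
    """Hypothetical bibliographic record (fields are illustrative assumptions)."""
    title: str
    published: date
    language: str  # ISO 639-1 code, e.g. "en"
    topic: str     # topic label assigned by a reviewer after reading the study

# Review period; "July 2023" is assumed to end on 31 July 2023
PERIOD_START, PERIOD_END = date(2013, 1, 1), date(2023, 7, 31)

# Unrelated topics named in the text as the main reasons for discarding works
EXCLUDED_TOPICS = {
    "driver identification", "energy consumption", "autonomous vehicles",
    "vehicles with fewer than four wheels", "racing cars", "pollution",
    "level of accident severity", "traffic study", "time and cost optimization",
}

def exclusion_reason(record: Record) -> Optional[str]:
    """Return the first exclusion reason that applies, or None if the record passes."""
    if not (PERIOD_START <= record.published <= PERIOD_END):
        return "outside period"
    if record.language != "en":
        return "not in English"
    if record.topic in EXCLUDED_TOPICS:
        return f"unrelated topic: {record.topic}"
    return None

candidates = [
    Record("Risky driving prediction with LSTM", date(2021, 3, 1), "en", "driving risk"),
    Record("EV energy use in cities", date(2020, 6, 1), "en", "energy consumption"),
]
for rec in candidates:
    print(rec.title, "->", exclusion_reason(rec) or "kept for full-text review")
```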

Figure 1. Systematic review process according to PRISMA [27].

2.3. Results

Potentially eligible studies and selected studies

The systematic review search conducted in Scopus and WoS resulted in 1674 articles, of which 80 were selected (see Table 3).

Table 3. Potentially eligible studies and selected studies.

Trend of studies by year

The number of publications in the aspects of factors, prediction, or explainability of DBVAR showed exponential growth both in potential articles (see Figure 2a) and in selected articles (see Figure 2b). This could be explained by the increasing number of traffic accidents and the introduction of ML technologies for accident prediction and explainability.

Figure 2. Number of publications per year: (a) potentially eligible and (b) selected studies.

Study trends across different countries

Figure 3 shows the distribution of studies by the authors' country of affiliation; China and the United States together account for 45% of the studies.

Figure 3. Studies by authors’ country of affiliation.

Articles selected by journal quality factor

Regarding journal quality, 60% (48) of the articles were published in Q1 journals and 35% (28) in Q2 journals; thus, 95% of the articles fell within the top two quartiles (see Figure 4), which supports the quality of the selected studies.

Figure 4. Articles by quality factor.

Articles selected by journal

As shown in Figure 5, the two most frequent journals, Accident Analysis and Prevention and IEEE Access, are both ranked in Q1 and together account for 25% of the publications. In addition, 27 other journals, grouped under “Others”, each contributed a single article.

Figure 5. Articles by journal.

3. Results

This section addresses the research questions posed in Section 2.1 based on the selected studies.

A. RQ1: What are the factors considered in predicting DBVAR?

DB encompasses a driver's actions, awareness, and adherence to road regulations. The factors considered here can directly affect a driver's behavior or prompt changes in it, and understanding them helps improve safety standards [28]. In this context, 115 factors were found in 48 studies. They were classified using three of the four categories of Silva et al. [25], with two changes: traffic-related factors were separated from the environment category, and a management category was added. The accident category (characteristics of the type of accident that occurred) was excluded because it describes an outcome rather than a risk and therefore does not correspond to DBVAR. The resulting categories were as follows:

(1) Environment: the surroundings and their geographical distribution.
(2) Traffic: the vehicles surrounding the one being studied.
(3) Vehicle: features of the vehicle, whether stationary or moving.
(4) Driver: the person driving the vehicle.
(5) Management: efficient control and coordination of the vehicle fleet and its drivers.
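To make the five-domain taxonomy concrete, the sketch below stores it as a lookup table and groups a model's input features by domain, for example, to check which domains a given study covers. Only the example factors named in this section are included (the full lists are in Tables 4–8), management is left empty because its factors are not named in the text, and the feature names and the "unmapped" bucket are illustrative assumptions.

```python
from typing import Dict, List

# Five-domain taxonomy with the example factors named in the text (full lists: Tables 4-8)
FACTOR_DOMAINS: Dict[str, List[str]] = {
    "environment": ["weather", "date-time", "slope"],
    "traffic": ["distance between two vehicles", "time to collision", "traffic density"],
    "vehicle": ["speed", "acceleration", "steering angle"],
    "driver": ["heart rate", "eye"],
    "management": [],  # Table 8 factors are not named in the text, so none are listed
}

def group_by_domain(features: List[str]) -> Dict[str, List[str]]:
    """Group a model's input features by domain; unknown names go to 'unmapped'."""
    lookup = {factor: domain for domain, factors in FACTOR_DOMAINS.items()
              for factor in factors}
    grouped: Dict[str, List[str]] = {domain: [] for domain in FACTOR_DOMAINS}
    grouped["unmapped"] = []
    for feature in features:
        grouped[lookup.get(feature, "unmapped")].append(feature)
    return grouped

print(group_by_domain(["speed", "weather", "time to collision", "lane id"]))
```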

Environment factors: A total of 20 factors were found in 23 articles; weather was the most used (9 articles), followed by date–time (8) and slope (5) (see Table 4).

Table 4. Environmental factors used in DBVAR.

Traffic factors: A total of 17 factors were identified in 25 articles; the most studied were the distance between two vehicles, time to collision, and traffic density, in 13, 10, and 9 studies, respectively (see Table 5).

Table 5. Traffic factors used in DBVAR.
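Time to collision (TTC) is commonly computed under a constant-speed assumption as the gap between two vehicles divided by their closing speed. The sketch below shows that textbook form; the review does not state how each study computed TTC, so the bumper-to-bumper gap and the constant-speed assumption are ours.

```python
import math

def time_to_collision(gap_m: float, v_follower: float, v_leader: float) -> float:
    """Constant-speed TTC in seconds: gap divided by closing speed; inf if not closing."""
    closing_speed = v_follower - v_leader  # m/s
    if closing_speed <= 0:
        return math.inf  # the follower is not gaining on the leader
    return gap_m / closing_speed

# Example: 30 m gap, follower at 25 m/s (90 km/h), leader at 20 m/s (72 km/h)
print(time_to_collision(30.0, 25.0, 20.0))  # 6.0
print(time_to_collision(30.0, 20.0, 25.0))  # inf
```

Returning infinity when the gap is opening keeps the function total, so lower values always mean higher risk when TTC is used as a model input.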

Vehicle factors: A total of 44 factors were identified in 39 articles; the most used were speed, acceleration, and steering angle, in 27, 23, and 9 studies, respectively (see Table 6).

Table 6. Vehicle factors used in DBVAR.
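Speed and acceleration usually come from telematics streams sampled at a fixed rate. As a minimal sketch, assuming speeds in m/s at a constant sampling interval, the code below derives longitudinal acceleration by finite differences and flags harsh braking and acceleration events. The ±3 m/s² thresholds are hypothetical placeholders, since the reviewed studies use various definitions, and real GPS or CAN data would normally be smoothed first.

```python
from typing import List, Tuple

def accelerations(speeds_ms: List[float], dt_s: float) -> List[float]:
    """Finite-difference longitudinal acceleration (m/s^2) between consecutive samples."""
    return [(b - a) / dt_s for a, b in zip(speeds_ms, speeds_ms[1:])]

def harsh_events(speeds_ms: List[float], dt_s: float,
                 brake_thr: float = -3.0, accel_thr: float = 3.0) -> List[Tuple[int, str]]:
    """Flag intervals whose acceleration crosses the (hypothetical) thresholds."""
    events: List[Tuple[int, str]] = []
    for i, a in enumerate(accelerations(speeds_ms, dt_s)):
        if a <= brake_thr:
            events.append((i, "harsh_braking"))       # interval from sample i to i+1
        elif a >= accel_thr:
            events.append((i, "harsh_acceleration"))
    return events

speeds = [20.0, 20.5, 21.0, 17.0, 16.5, 20.0]  # 1 Hz samples, m/s
print(accelerations(speeds, 1.0))  # [0.5, 0.5, -4.0, -0.5, 3.5]
print(harsh_events(speeds, 1.0))   # [(2, 'harsh_braking'), (4, 'harsh_acceleration')]
```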

Driver factors: A total of 25 factors were identified in 18 studies; the most used were heart rate and eye, each in 4 studies, and together they appeared in 39% of the 18 studies (see Table 7).

Table 7. Driver factors used in DBVAR.

Management factors: A total of nine were identified in two studies (see Table 8).

Table 8. Management factors used in DBVAR.

In addition, two DBVAR prediction studies used four variables describing the accident as prediction targets (see Table 9).

Table 9. Variables that describe the accident used in DBVAR prediction.

B. RQ2: What are the advances of ML in DBVAR prediction?

Statistical and ML prediction methods estimate probable future outcomes, such as DB or traffic accidents, from observed events [30]. These models take the factors as inputs; however, the reasoning behind a given output is often opaque, and it is not possible to determine which factors contributed most to the result [65]. For this reason, they are called “closed box” techniques, and additional explainability techniques are needed to understand them fully.

To address this question, we examined 76 studies and identified 22 core algorithms, of which CNN and gradient boosting (GB) were the most common, appearing in 19 and 15 studies, respectively. These algorithms were used either alone or in hybrid models, and the highest reported accuracy (100%) was achieved by XGBoost (see Table 10). Table 11 shows that four studies focused on heavy vehicles and two on rural roads. Moreover, Table 12 shows that driving style was the aspect of DB most frequently analyzed (45 studies).

Table 10. Algorithms used in the DBVAR.

Table 11. Studies applied in heavy vehicles and rural roads.

Table 12. Studies applied by type of driving risk.

C. RQ3: What advances in XAI exist for DBVAR prediction?

XAI enables an adequate interpretation of the prediction process [17]: it uses models to analyze the importance of, and dependence among, the factors that contribute to a result [40]. In this way, confidence in and transparency of the predictions can be improved, so that they can reasonably be applied in the field of transportation safety [14].

To answer this question, we found 18 studies that used six methods to explain the factors contributing most to DBVAR. RF and GB feature importance were the most used (in 50% of the studies), followed by SHAP (33%). The studies mainly focused on explaining DB in accident risk, and China and the United States were the main countries in which they were conducted (see Table 13).

Table 13. Methods used in the explainability of the DBVAR.
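RF and GB feature importance scores are often impurity-based (for example, the `feature_importances_` attribute of scikit-learn tree ensembles). A related, model-agnostic alternative is permutation importance, available in scikit-learn as `sklearn.inspection.permutation_importance`: shuffle one feature column and measure how much a score drops. The plain-Python sketch below implements that idea; the rule-based "model", the two features, and the data are hypothetical, chosen so that only the first feature matters.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict: Callable[[Row], int], X: List[Row], y: List[int],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(y, [predict(r) for r in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
            drops.append(base - accuracy(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical model: a trip is risky when speed exceeds the limit by more than 10 km/h
def risk_model(row: Row) -> int:
    return 1 if row[0] > 10 else 0

# Columns: [km/h over the limit, radio volume]; the second column is irrelevant by design
X = [[2, 5], [25, 1], [4, 9], [18, 3], [0, 7], [30, 2], [6, 4], [15, 8]]
y = [risk_model(r) for r in X]
imp = permutation_importance(risk_model, X, y)
print(imp)  # the irrelevant second feature scores exactly 0.0
```

Unlike impurity-based scores, permutation importance works with any fitted model, which is one reason it is often used alongside SHAP.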

4. Discussion

The result of this systematic literature review is a catalog of factors, prediction algorithms, and methods used to explain the importance of the factors. Researchers can use these results to understand progress in the field and to develop new approaches for reducing accident risk, protecting human lives, promoting road safety, reducing the costs associated with traffic accidents, and developing effective safety policies. The relevance of this information is supported by the fact that 95% of the studies were published in Q1 or Q2 journals, which indicates high-quality sources. The research questions are discussed below.

4.1. About Factors

In this study, the factors were classified into five dimensions (vehicle, environment, traffic, driver, and management), of which vehicle was the most studied. Speed, acceleration, and distance between two vehicles stood out as the most-used factors because they directly affect the driver's ability to control the vehicle in risky situations; they also influence the severity of an accident. Another crucial factor is geographical location, obtained through the Global Positioning System (GPS), because it allows other related elements, such as the geographical environment, to be inferred. The increasing availability of low-cost sensors and cameras in vehicles is driving a trend toward greater real-time data acquisition, which can improve model precision. At present, China leads research on DBVAR factors, probably owing to the growth, leadership, and expansion of its automotive industry.

Some studies considered the accident domain; however, it describes outcomes rather than causes. Because these studies tend to focus on characterizing accidents, their variables were not considered factors, although they could serve as prediction targets. It is also worth noting that none of the identified factors was associated with trip management, such as delivery delays, the driver's experience on the route, or whether the vehicle was loaded. Therefore, management-related factors should be considered to evaluate commercial vehicles and to improve the understanding of vehicle accidents as a whole.

4.2. About Prediction

In this study, the algorithms, vehicle types, and road types used in accident risk prediction research were identified. A notable development is the growing use of models that combine convolutional and recurrent neural networks, taking advantage of the diversity of sensors integrated into vehicles, which generate both signals and images. CNN, alone or combined with other models, is the most commonly used algorithm, as such combinations exploit the individual strengths of each model, for example, feature extraction from images by the CNN and sequential data processing by the recurrent component. MobileNetV3 achieved the highest accuracy for detecting distracted driving, showing strong real-time pattern recognition, whereas XGBoost achieved the best accuracy for driving style recognition. Overall, there is a trend toward deep learning, possibly due to the availability of larger volumes of data, advances in hardware and software, and its ability to achieve better overall performance. Regarding evaluation metrics, accuracy is one of the main indicators in most studies; however, as observed in Table 10, eight different metrics were identified, and not all studies used the same ones. Additionally, only 2.6% of the studies focused on rural roads and 5% on heavy vehicles. To improve precision in this area, future work could explore transformer neural networks and dynamic Bayesian networks, which can capture long-term relationships in time series data. Alerts could also be issued to drivers and fleet managers based on the estimated risk level so that they can take preventive action.
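Accuracy alone can be misleading when risky trips are rare. The sketch below computes four common binary metrics from a confusion matrix, with 1 meaning "risky"; the eight metrics listed in Table 10 are not named in the text, so this selection, the counts, and the convention of reporting 0.0 when a ratio is undefined are our assumptions.

```python
from typing import Dict, Sequence

def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1, with 1 = risky (the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical test set: 95 safe trips and 5 risky ones; the model always predicts "safe"
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

The trivial model reaches 95% accuracy while detecting no risky trips at all, which is why reporting recall or F1 alongside accuracy makes results easier to compare across studies.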

Moreover, Table 10 shows that the most commonly used models, present in 80% of the studies, are the multilayer perceptron (MLP), CNN, GB, LSTM, and RF. This could be explained by the fact that MLP is one of the pioneering models in ML, while the other four stand out for their high accuracy, averaging 92.94%. Furthermore, cross-referencing the factors in Tables 4–9 with the models in Table 10 reveals that the common factors influencing the performance of the most used models are speed, acceleration, and heading angle.

4.3. About Explainability

In this study, six explainability methods were identified in 18 studies; the most used was RF feature importance, which highlighted environment-related factors such as road shape, road network, and weather. The increasing adoption of deep learning algorithms has underscored the importance of understanding and trusting model decisions, driving the use of explainability methods to identify influential risk factors that might not be obvious to humans. Although the reviewed research barely addressed management factors, their importance should also be studied from an explainability perspective. Furthermore, other established methods, such as local interpretable model-agnostic explanations (LIME), could provide good results in this context.

5. Conclusions

In this study, we conducted a systematic literature review on DBVAR through ML. Of the 1674 articles identified, 80 research papers were selected after analysis, revealing advances in the field with respect to factors, prediction, and explainability. We identified 115 factors across 48 studies, 22 prediction algorithms across 76 studies, and 6 explainability methods across 18 studies that elucidated the influence of certain factors on prediction outcomes. Unlike other reviews on DBVAR, this work jointly considered three crucial aspects: influencing factors, accident prediction, and explainability. Regarding factors, we identified five dimensions: environment (20 factors), traffic (17 factors), vehicle (44 factors), driver (25 factors), and management (9 factors). In particular, speed, acceleration, and distance between two vehicles were the most-studied factors. Among ML advances, CNN and GB emerged as the most commonly employed algorithms, and there is a growing trend toward deep learning and hybrid models for enhanced precision. Notably, XGBoost achieved the highest accuracy (100%) on a DBD data set of Turkish origin. Most studies focused on light vehicles, with limited research on heavy vehicles and rural roads. Regarding explainability, the most-used method was RF feature importance. Additionally, the most studied models were MLP, CNN, GB, LSTM, and RF, and the common factors influencing their performance were speed, acceleration, and heading angle.

This study had some limitations that should be considered. Only studies in English were included, and only the WoS and Scopus databases were used as sources of information. Based on our findings, future research should focus on developing practices and strategies to address DBVAR factors in order to reduce the occurrence of traffic accidents, as well as extending this study to include other languages and additional databases.

Author Contributions

Conceptualization, J.L.; methodology, J.L.; formal analysis, J.L.; investigation, J.L.; resources, J.L.; writing—original draft preparation, J.L.; writing—review and editing, J.L. and D.M.; supervision, D.M. and J.L.C.-S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] World Health Organization. Global Status Report on Road Safety: Time for Action; World Health Organization: Geneva, Switzerland, 2018.
[2] Geng, Z.; Ji, X.; Cao, R.; Lu, M.; Qin, W. A Conflict Measures-Based Extreme Value Theory Approach to Predicting Truck Collisions and Identifying High-Risk Scenes on Two-Lane Rural Highways. Sustainability 2022, 14, 11212.
[3] Naciones Unidas. La Agenda 2030 y los Objetivos de Desarrollo Sostenible: Una Oportunidad para América Latina y el Caribe; Comisión Económica para América Latina y el Caribe (CEPAL): Santiago de Chile, Chile, 2018; ISBN 978-92-1-058643-6.
[4] Kashevnik, A.; Shchedrin, R.; Kaiser, C.; Stocker, A. Driver Distraction Detection Methods: A Literature Review and Framework. IEEE Access 2021, 9, 60063–60076.
[5] Chipman, M.L.; Morgan, P. The Role of Driver Demerit Points and Age in the Prediction of Motor Vehicle Collisions. J. Epidemiol. Community Health 1975, 29, 190–195.
[6] Celaya-Padilla, J.M.; Galván-Tejada, C.E.; Lozano-Aguilar, J.S.A.; Zanella-Calzada, L.A.; Luna-García, H.; Galván-Tejada, J.I.; Gamboa-Rosales, N.K.; Rodriguez, A.V.; Gamboa-Rosales, H. “Texting & Driving” Detection Using Deep Convolutional Neural Networks. Appl. Sci. 2019, 9, 2962.
[7] AASHTO. Highway Safety Manual; American Association of State Highway and Transportation Officials: Washington, DC, USA, 2010.
[8] Das, S.; Tsapakis, I.; Khodadadi, A. Safety Performance Functions for Low-Volume Rural Minor Collector Two-Lane Roadways. IATSS Res. 2021, 45, 347–356.
[9] Tola, A.M.; Demissie, T.A.; Saathoff, F.; Gebissa, A. Crash Distribution Dataset: Development and Validation for the Undivided Rural Roads in Oromia, Ethiopia. Transp. Telecommun. J. 2022, 23, 11–24.
[10] Halim, Z.; Sulaiman, M.; Waqas, M.; Aydın, D. Deep Neural Network-Based Identification of Driving Risk Utilizing Driver Dependent Vehicle Driving Features: A Scheme for Critical Infrastructure Protection. J. Ambient Intell. Humaniz. Comput. 2023, 14, 11747–11765.
[11] Shi, X.; Wong, Y.D.; Li, M.Z.-F.; Palanisamy, C.; Chai, C. A Feature Learning Approach Based on XGBoost for Driving Assessment and Risk Prediction. Accid. Anal. Prev. 2019, 129, 170–179.
[12] Yi, D.W.; Su, J.Y.; Liu, C.J.; Quddus, M.; Chen, W.H. A Machine Learning Based Personalized System for Driving State Recognition. Transp. Res. Part C Emerg. Technol. 2019, 105, 241–261.
[13] Observatorio Nacional de Seguridad Vial. Boletín Estadístico de Siniestralidad Vial, 2021. Available online: https://www.onsv.gob.pe/post/boletin-estadistico-de-siniestralidad-vial-2021/ (accessed on 20 May 2022).
[14] Xu, W.; Wang, J.; Fu, T.; Gong, H.; Sobhani, A. Aggressive Driving Behavior Prediction Considering Driver’s Intention Based on Multivariate-Temporal Feature Data. Accid. Anal. Prev. 2022, 164, 106477.
[15] Li, D.; Wang, Y.; Xu, W. A Deep Multichannel Network Model for Driving Behavior Risk Classification. IEEE Trans. Intell. Transp. Syst. 2023, 24, 1204–1219.
[16] Niu, Y.; Li, Z.M.; Fan, Y.X. Analysis of Truck Drivers’ Unsafe Driving Behaviors Using Four Machine Learning Methods. Int. J. Ind. Ergon. 2021, 86, 103192.
[17] Yang, C.; Chen, M.Y.; Yuan, Q. The Application of XGBoost and SHAP to Examining the Factors in Freight Truck-Related Crashes: An Exploratory Analysis. Accid. Anal. Prev. 2021, 158, 106153.
[18] Zhong, S.; Fu, X.; Lu, W.; Tang, F.; Lu, Y. An Expressway Driving Stress Prediction Model Based on Vehicle, Road and Environment Features. IEEE Access 2022, 10, 57212–57226.
[19] Peng, L.; Wang, Y.; Zhang, F.; Zhang, J.; Li, Z. Evaluation of Emergency Driving Behaviour and Vehicle Collision Risk in Connected Vehicle Environment: A Deep Learning Approach. IET Intell. Transp. Syst. 2021, 15, 584–594.
[20] Masello, L.; Castignani, G.; Sheehan, B.; Guillen, M.; Murphy, F. Using Contextual Data to Predict Risky Driving Events: A Novel Methodology from Explainable Artificial Intelligence. Accid. Anal. Prev. 2023, 184, 106997.
[21] Al-refai, G.; Elmoaqet, H.; Ryalat, M. In-Vehicle Data for Predicting Road Conditions and Driving Style Using Machine Learning. Appl. Sci. 2022, 12, 8928.
[22] Bouhsissin, S.; Sael, N.; Benabbou, F. Driver Behavior Classification: A Systematic Literature Review. IEEE Access 2023, 11, 14128–14153.
[23] Paredes, J.J.; Yepes, S.F.; Salazar-Cabrera, R.; de la Cruz, Á.P.; Molina, J.M.M. Intelligent Collision Risk Detection in Medium-Sized Cities of Developing Countries, Using Naturalistic Driving: A Review. J. Traffic Transp. Eng. (Engl. Ed.) 2022, 9, 912–929.
Elassad, Z.E.A.; Mousannif, H.; Moatassime, H.A.; Karkouch, A. The Application of Machine Learning Techniques for Driving Behavior Analysis: A Conceptual Framework and a Systematic Literature Review. Eng. Appl. Artif. Intell. 2020, 87, 103312. [Google Scholar] [CrossRef]
Silva, P.B.; Andrade, M.; Ferreira, S. Machine Learning Applied to Road Safety Modeling: A Systematic Literature Review. J. Traffic Transp. Eng. (Engl. Ed.) 2020, 7, 775–790. [Google Scholar] [CrossRef]
Shiguihara, P.; De Andrade Lopes, A.; Mauricio, D. Dynamic Bayesian Network Modeling, Learning, and Inference: A Survey. IEEE Access 2021, 9, 117639–117648. [Google Scholar] [CrossRef]
Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. Syst. Rev. 2021, 10, 89. [Google Scholar] [CrossRef]
Alkinani, M.H.; Khan, W.Z.; Arshad, Q. Detecting Human Driver Inattentive and Aggressive Driving Behavior Using Deep Learning: Recent Advances, Requirements and Open Challenges. IEEE Access 2020, 8, 105008–105030. [Google Scholar] [CrossRef]
Elassad, Z.E.A.; Mousannif, H.; Moatassime, H.A. A Real-Time Crash Prediction Fusion Framework: An Imbalance-Aware Strategy for Collision Avoidance Systems. Transp. Res. Part C Emerg. Technol. 2020, 118, 102708. [Google Scholar] [CrossRef]
Shangguan, Q.; Fu, T.; Wang, J.; Luo, T.; Fang, S. An Integrated Methodology for Real-Time Driving Risk Status Prediction Using Naturalistic Driving Data. Accid. Anal. Prev. 2021, 156, 106122. [Google Scholar] [CrossRef]
Zhao, H.; Li, X.; Cheng, H.; Zhang, J.; Wang, Q.; Zhu, H. Deep Learning-Based Prediction of Traffic Accidents Risk for Internet of Vehicles. China Commun. 2022, 19, 214–224. [Google Scholar] [CrossRef]
Hu, Z.; Zhou, J.; Zhang, E. Improving Traffic Safety through Traffic Accident Risk Assessment. Sustainability 2023, 15, 3748. [Google Scholar] [CrossRef]
Wang, J.; Xu, W.; Fu, T.; Jiang, R. Recognition of Trip-Based Aggressive Driving: A System Integrated With Gaussian Mixture Model Structured of Factor-Analysis, and Hierarchical Clustering. IEEE Trans. Intell. Transp. Syst. 2022, 23, 20442–20451. [Google Scholar] [CrossRef]
Ghandour, R.; Potams, A.J.; Boulkaibet, I.; Neji, B.; Barakeh, Z.A. Driver Behavior Classification System Analysis Using Machine Learning Methods. Appl. Sci. 2021, 11, 10562. [Google Scholar] [CrossRef]
Khodairy, M.A.; Abosamra, G. Driving Behavior Classification Based on Oversampled Signals of Smartphone Embedded Sensors Using an Optimized Stacked-LSTM Neural Networks. IEEE Access 2021, 9, 4957–4972. [Google Scholar] [CrossRef]
Li, D.C.; Lin, M.Y.-C.; Chou, L.-D. Macroscopic Big Data Analysis and Prediction of Driving Behavior with an Adaptive Fuzzy Recurrent Neural Network on the Internet of Vehicles. IEEE Access 2022, 10, 47881–47895.
Arumugam, S.; Bhargavi, R. Road Rage and Aggressive Driving Behaviour Detection in Usage-Based Insurance Using Machine Learning. Int. J. Softw. Innov. 2023, 11, 1–29.
Kanwal, K.; Rustam, F.; Chaganti, R.; Jurcut, A.D.; Ashraf, I. Smartphone Inertial Measurement Unit Data Features for Analyzing Driver Driving Behavior. IEEE Sens. J. 2023, 23, 11308–11323.
Shangguan, Q.; Fu, T.; Wang, J.; Fang, S.; Fu, L. A Proactive Lane-Changing Risk Prediction Framework Considering Driving Intention Recognition and Different Lane-Changing Patterns. Accid. Anal. Prev. 2022, 164, 106500.
Nikolaou, D.; Ziakopoulos, A.; Dragomanovits, A.; Roussou, J.; Yannis, G. Comparing Machine Learning Techniques for Predictions of Motorway Segment Crash Risk Level. Safety 2023, 9, 32.
Guo, H.; Xie, K.; Keyvan-Ekbatani, M. Modeling Driver’s Evasive Behavior during Safety–Critical Lane Changes: Two-Dimensional Time-to-Collision and Deep Reinforcement Learning. Accid. Anal. Prev. 2023, 186, 107063.
Lyu, N.; Wang, Y.; Wu, C.; Peng, L.; Thomas, A.F. Using Naturalistic Driving Data to Identify Driving Style Based on Longitudinal Driving Operation Conditions. J. Intell. Connect. Veh. 2022, 5, 17–35.
Abdelrahman, A.E.; Hassanein, H.S.; Abu-Ali, N. Robust Data-Driven Framework for Driver Behavior Profiling Using Supervised Machine Learning. IEEE Trans. Intell. Transp. Syst. 2022, 23, 3336–3350.
Wang, H.; Wang, X.; Han, J.; Xiang, H.; Li, H.; Zhang, Y.; Li, S. A Recognition Method of Aggressive Driving Behavior Based on Ensemble Learning. Sensors 2022, 22, 644.
Liu, Z.; Ren, S.; Peng, M. Identification of Driver Distraction Based on SHRP2 Naturalistic Driving Study. Math. Probl. Eng. 2021, 2021, 6699327.
Zhao, L.; Xu, T.; Zhang, Z.; Hao, Y. Lane-Changing Recognition of Urban Expressway Exit Using Natural Driving Data. Appl. Sci. 2022, 12, 9762.
van der Wall, H.E.C.; Doll, R.J.; van Westen, G.J.P.; Koopmans, I.; Zuiker, R.G.; Burggraaf, J.; Cohen, A.F. The Use of Machine Learning Improves the Assessment of Drug-Induced Driving Behaviour. Accid. Anal. Prev. 2020, 148, 105822.
Yurtsever, E.; Yamazaki, S.; Miyajima, C.; Takeda, K.; Mori, M.; Hitomi, K.; Egawa, M. Integrating Driving Behavior and Traffic Context Through Signal Symbolization for Data Reduction and Risky Lane Change Detection. IEEE Trans. Intell. Veh. 2018, 3, 242–253.
Wang, K.; Yang, Y.; Wang, S.; Shi, Z. Research on Car-Following Model Considering Driving Style. Math. Probl. Eng. 2022, 2022, 7215697.
Rahman, M.J.; Beauchemin, S.S.; Bauer, M.A. Predicting Driver Behaviour at Intersections Based on Driver Gaze and Traffic Light Recognition. IET Intell. Transp. Syst. 2020, 14, 2083–2091.
Misra, A.; Samuel, S.; Cao, S.; Shariatmadari, K. Detection of Driver Cognitive Distraction Using Machine Learning Methods. IEEE Access 2023, 11, 18000–18012.
Malik, M.; Nandal, R.; Dalal, S.; Jalglan, V.; Le, D.N. Driving Pattern Profiling and Classification Using Deep Learning. Intel. Autom. Soft Comput. 2021, 28, 887–906.
Seo, H.; Shin, J.; Kim, K.-H.; Lim, C.; Bae, J. Driving Risk Assessment Using Non-Negative Matrix Factorization with Driving Behavior Records. IEEE Trans. Intell. Transp. Syst. 2022, 23, 20398–20412.
Lattanzi, E.; Freschi, V. Machine Learning Techniques to Identify Unsafe Driving Behavior by Means of In-Vehicle Sensor Data. Expert Syst. Appl. 2021, 176, 114818.
Kadri, N.; Ellouze, A.; Ksantini, M.; Turki, S.H. New LSTM Deep Learning Algorithm for Driving Behavior Classification. Cybern. Syst. 2023, 54, 387–405.
Zhang, X.; Yan, X. Predicting Collision Cases at Unsignalized Intersections Using EEG Metrics and Driving Simulator Platform. Accid. Anal. Prev. 2023, 180, 106910.
Tran, D.; Do, H.M.; Sheng, W.H.; Bai, H.; Chowdhary, G. Real-Time Detection of Distracted Driving Based on Deep Learning. IET Intell. Transp. Syst. 2018, 12, 1210–1219.
Nakano, K.; Chakraborty, B. Real-Time Distraction Detection from Driving Data Based Personal Driving Model Using Deep Learning. Int. J. Intell. Transp. Syst. Res. 2022, 20, 238–251.
Panagopoulos, G.; Pavlidis, I. Forecasting Markers of Habitual Driving Behaviors Associated with Crash Risk. IEEE Trans. Intell. Transp. Syst. 2020, 21, 841–851.
Shi, L.; Qian, C.; Guo, F. Real-Time Driving Risk Assessment Using Deep Learning with XGBoost. Accid. Anal. Prev. 2022, 178, 106836.
Fan, X.; Wang, F.; Song, D.; Lu, Y.; Liu, J. GazMon: Eye Gazing Enabled Driving Behavior Monitoring and Prediction. IEEE Trans. Mob. Comput. 2021, 20, 1420–1433.
Albadawi, Y.; AlRedhaei, A.; Takruri, M. Real-Time Machine Learning-Based Driver Drowsiness Detection Using Visual Features. J. Imaging 2023, 9, 91.
Alotaibi, M.; Alotaibi, B. Distracted Driver Classification Using Deep Learning. Signal Image Video Process. 2020, 14, 617–624.
Taherisadr, M.; Asnani, P.; Galster, S.; Dehzangi, O. ECG-Based Driver Inattention Identification during Naturalistic Driving Using Mel-Frequency Cepstrum 2-D Transform and Convolutional Neural Networks. Smart Health 2018, 9–10, 50–61.
Haque, M.M.; Sarker, S.; Dewan, M.A.A. Driving Maneuver Classification from Time Series Data: A Rule Based Machine Learning Approach. Appl. Intell. 2022, 52, 16900–16915.
Hu, J.; Zhang, X.; Maybank, S. Abnormal Driving Detection with Normalized Driving Behavior Data: A Deep Learning Approach. IEEE Trans. Veh. Technol. 2020, 69, 6943–6951.
Jahan, I.; Uddin, K.M.A.; Murad, S.A.; Miah, M.S.U.; Khan, T.Z.; Masud, M.; Aljahdali, S.; Bairagi, A.K. 4D: A Real-Time Driver Drowsiness Detector Using Deep Learning. Electronics 2023, 12, 235.
Khan, T.; Choi, G.; Lee, S. EFFNet-CA: An Efficient Driver Distraction Detection Based on Multiscale Features Extractions and Channel Attention Mechanism. Sensors 2023, 23, 3835.
Abosaq, H.A.; Ramzan, M.; Althobiani, F.; Abid, A.; Aamir, K.M.; Abdushkour, H.; Irfan, M.; Gommosani, M.E.; Ghonaim, S.M.; Shamji, V.R.; et al. Unusual Driver Behavior Detection in Videos Using Deep Learning Models. Sensors 2023, 23, 311.
Huang, C.; Wang, X.; Cao, J.; Wang, S.; Zhang, Y. HCF: A Hybrid CNN Framework for Behavior Detection of Distracted Drivers. IEEE Access 2020, 8, 109335–109349.
Li, T.; Zhang, Y.; Li, Q.; Zhang, T. AB-DLM: An Improved Deep Learning Model Based on Attention Mechanism and BiFPN for Driver Distraction Behavior Detection. IEEE Access 2022, 10, 83138–83151.
Aljohani, A.A. Real-Time Driver Distraction Recognition: A Hybrid Genetic Deep Network Based Approach. Alex. Eng. J. 2023, 66, 377–389.
Lin, Y.C.; Cao, D.X.; Fu, Z.H.; Huang, Y.M.; Song, Y.Y. A Lightweight Attention-Based Network towards Distracted Driving Behavior Recognition. Appl. Sci. 2022, 12, 4191.
Kabir, M.F.; Roy, S. Real-Time Vehicular Accident Prevention System Using Deep Learning Architecture. Expert Syst. Appl. 2022, 206, 117837.
Hossain, M.U.; Rahman, M.A.; Islam, M.M.; Akhter, A.; Uddin, M.A.; Paul, B.K. Automatic Driver Distraction Detection Using Deep Convolutional Neural Networks. Intell. Syst. Appl. 2022, 14, 200075.
Lin, P.-W.; Hsu, C.-M. Innovative Framework for Distracted-Driving Alert System Based on Deep Learning. IEEE Access 2022, 10, 77523–77536.
Xiao, W.; Liu, H.; Ma, Z.; Chen, W.; Sun, C.; Shi, B. Fatigue Driving Recognition Method Based on Multi-Scale Facial Landmark Detector. Electronics 2022, 11, 4103.
Ezzouhri, A.; Charouh, Z.; Ghogho, M.; Guennoun, Z. Robust Deep Learning-Based Driver Distraction Detection and Classification. IEEE Access 2021, 9, 168080–168092.
Xue, Q.; Gao, K.; Xing, Y.; Lu, J.; Qu, X. A Context-Aware Framework for Risky Driving Behavior Evaluation Based on Trajectory Data. IEEE Intell. Transp. Syst. Mag. 2023, 15, 70–83.
Fan, Y.; Gu, F.; Wang, J.; Wang, J.; Lu, K.; Niu, J. SafeDriving: An Effective Abnormal Driving Behavior Detection System Based on EMG Signals. IEEE Internet Things J. 2022, 9, 12338–12350.
Zhang, Y.; Chen, Y.; Gao, C. Deep Unsupervised Multi-Modal Fusion Network for Detecting Driver Distraction. Neurocomputing 2021, 421, 26–38.
Boucetta, Z.; Fazziki, A.E.; Adnani, M.E. Integration of Ensemble Variant CNN with Architecture Modified LSTM for Distracted Driver Detection. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 440–458.
Safarov, F.; Akhmedov, F.; Abdusalomov, A.B.; Nasimov, R.; Cho, Y.I. Real-Time Deep Learning-Based Drowsiness Detection: Leveraging Computer-Vision and Eye-Blink Analyses for Enhanced Road Safety. Sensors 2023, 23, 6459.
Jardin, P.; Moisidis, I.; Kartal, K.; Rinderknecht, S. Adaptive Driving Style Classification through Transfer Learning with Synthetic Oversampling. Vehicles 2022, 4, 1314–1331.
Wang, H.; Chen, J.; Huang, Z.; Li, B.; Lv, J.; Xi, J.; Wu, B.; Zhang, J.; Wu, Z. FPT: Fine-Grained Detection of Driver Distraction Based on the Feature Pyramid Vision Transformer. IEEE Trans. Intell. Transp. Syst. 2023, 24, 1594–1608.
Ping, P.; Huang, C.; Ding, W.; Liu, Y.; Miyajima, C.; Takeda, K. Distracted Driving Detection Based on the Fusion of Deep Learning and Causal Reasoning. Inf. Fusion 2023, 89, 121–142.
Jahangiri, A.; Berardi, V.J.; MacHiani, S.G. Application of Real Field Connected Vehicle Data for Aggressive Driving Identification on Horizontal Curves. IEEE Trans. Intell. Transp. Syst. 2018, 19, 2316–2324.
Siddiqui, H.U.R.; Saleem, A.A.; Brown, R.; Bademci, B.; Lee, E.; Rustam, F.; Dudley, S. Non-Invasive Driver Drowsiness Detection System. Sensors 2021, 21, 4833.
Zhang, Y.; Chen, Y.; Gu, X.; Sze, N.N.; Huang, J. A Proactive Crash Risk Prediction Framework for Lane-Changing Behavior Incorporating Individual Driving Styles. Accid. Anal. Prev. 2023, 188, 107072.
Cai, B.; Di, Q. Different Forecasting Model Comparison for Near Future Crash Prediction. Appl. Sci. 2023, 13, 759.
Singh, G.; Bansal, D.; Sofat, S. A Smartphone Based Technique to Monitor Driving Behavior Using DTW and Crowdsensing. Pervasive Mob. Comput. 2017, 40, 56–70.

Figure 1. Systematic review process according to PRISMA [27].

Figure 2. Number of publications per year: (a) potentially eligible and (b) selected studies.

Figure 3. Studies by authors’ country of affiliation.

Figure 4. Articles by quality factor.

Figure 5. Articles by journal.

Table 1. Database search strings.

Database	Search String
Scopus	TITLE-ABS-KEY ((“vehicle accident risk” OR “car accident risk” OR “car following” OR “driving behavior” OR “driving style” OR “driver behavior” OR “driving risk” OR “driver risk” OR “road safety”) AND ((factors OR features OR causes) OR (predicti* OR forecast* OR progno*) OR (explainability OR explainable OR interpretabl* OR xai)) AND (“machine learning” OR “deep learning”))
WoS	(“vehicle accident risk” OR “car accident risk” OR “car following” OR “driving behavior” OR “driving style” OR “driver behavior” OR “driving risk” OR “driver risk” OR “road safety”) AND ((factors OR aspects OR causes) OR (predicti* OR forecast*) OR (explainability OR explainable OR interpretable OR xai)) AND (“machine learning” OR “deep learning”) (Topic)
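Both strings in Table 1 combine three concept groups with AND, where each group is an OR of phrases and a trailing `*` truncates a word stem. As a rough illustration of that Boolean structure only (not how Scopus or WoS actually index and match records, which also involves field handling and stemming rules not modeled here), the Scopus string can be approximated as below. The group constants and helper functions are hypothetical names introduced for this sketch.

```python
import re

# Hypothetical encoding of the Scopus string in Table 1: three concept groups
# joined by AND, each group an OR of phrases. A trailing "*" on a word acts as
# a truncation wildcard ("predicti*" matches "prediction", "predictive", ...).
CONTEXT_TERMS = [
    "vehicle accident risk", "car accident risk", "car following",
    "driving behavior", "driving style", "driver behavior",
    "driving risk", "driver risk", "road safety",
]
TOPIC_TERMS = [
    "factors", "features", "causes",
    "predicti*", "forecast*", "progno*",
    "explainability", "explainable", "interpretabl*", "xai",
]
METHOD_TERMS = ["machine learning", "deep learning"]


def term_pattern(term: str) -> re.Pattern:
    """Compile a (possibly wildcarded) phrase into a whole-word regex."""
    parts = []
    for word in term.lower().split():
        if word.endswith("*"):
            parts.append(re.escape(word[:-1]) + r"\w*")  # stem + any suffix
        else:
            parts.append(re.escape(word))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")


def matches_group(text: str, terms: list[str]) -> bool:
    """OR semantics: at least one term of the group appears in the text."""
    return any(term_pattern(t).search(text) for t in terms)


def matches_query(title: str, abstract: str, keywords: str) -> bool:
    """AND semantics over the three groups, searched in title+abstract+keywords."""
    text = " ".join([title, abstract, keywords]).lower()
    return all(
        matches_group(text, group)
        for group in (CONTEXT_TERMS, TOPIC_TERMS, METHOD_TERMS)
    )


print(matches_query("Predicting crash risk from driving behavior",
                    "We apply deep learning to telematics data.", ""))  # True
```

Because phrases are matched exactly, spelling variants such as "driving behaviour" are missed by this sketch unless they are added as separate terms.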

Table 2. Inclusion and exclusion criteria.

Inclusion Criteria	Exclusion Criteria
CI1: Studies that answer the research questions (factors, prediction models, or explainability)	CE1: Studies aimed at cost reduction
CI2: Primary studies	CE2: Studies not related to vehicular transportation
CI3: Studies that report metrics evaluating the quality of the predictive models	CE3: Studies that do not report test results
CI4: Studies written in English	CE4: Studies not published as journal articles

Table 3. Potentially eligible studies and selected studies.

Source	Potentially Eligible Studies	Selected Studies
Scopus	1115	52
WoS	559	28
Total	1674	80 ^a

^a Total after removing 26 WoS studies that duplicated Scopus records.
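The review reports that 26 WoS records duplicated Scopus records but does not state how duplicates were detected. A common approach, sketched here purely as an assumption, is to keep every record from the first database and add a record from the second only if its normalized DOI (or, when no DOI is available, its normalized title) has not been seen. The record layout (dicts with `doi` and `title` keys) and all function names are hypothetical.

```python
import re


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def record_key(record: dict) -> str:
    """Prefer the DOI as identity; fall back to a punctuation-free title."""
    if record.get("doi"):
        return "doi:" + normalize_doi(record["doi"])
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return "title:" + title


def merge_without_duplicates(primary: list[dict], secondary: list[dict]):
    """Keep all primary records; add secondary ones not already present.

    Returns (merged_records, number_of_secondary_duplicates_removed).
    """
    seen = {record_key(r) for r in primary}
    merged = list(primary)
    removed = 0
    for record in secondary:
        key = record_key(record)
        if key in seen:
            removed += 1
        else:
            seen.add(key)
            merged.append(record)
    return merged, removed


# Toy records with placeholder DOIs (not real publications).
scopus = [{"doi": "10.0000/example.1", "title": "Study one"}]
wos = [
    {"doi": "https://doi.org/10.0000/EXAMPLE.1", "title": "Study One"},
    {"doi": "", "title": "Study two"},
]
merged, removed = merge_without_duplicates(scopus, wos)
print(len(merged), removed)  # 2 1
```

One limitation of this design: if one database lists a DOI and the other does not, the two keys differ, so a fuller screening pipeline would also compare titles across both key types.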

Table 4. Environmental factors used in DBVAR.

ID	Factor	Description	#	Studies
01	Weather	Atmospheric conditions affecting visibility and traction, increasing accident risk.	9	[14,15,16,20,29,30,31,32,33]
02	Date–time	Specific time and date of travel, influencing traffic congestion and driver fatigue.	8	[30,31,32,34,35,36,37,38]
03	Slope	Terrain inclination impacting vehicle speed and control.	5	[2,14,15,18,20]
04	Lane	Vehicle position on the road, influencing collision risk.	5	[20,34,39,40,41]
05	Road condition	Pavement quality and obstacles compromising safety.	4	[15,16,18,20]
06	Meteorological conditions	Atmospheric elements such as rain and snow, affecting visibility and vehicle adherence.	4	[15,20,29,31]
07	Light conditions	Level of available light impacting visibility and driver reaction time.	4	[20,30,31,32]
08	Road type	Road design affecting speed and maneuverability.	3	[12,20,42]
09	Road obstruction	Roadside obstacles posing hazards.	3	[16,41,43]
10	Curve type	Shape and degree of road curves, affecting the driver’s control of the vehicle.	3	[14,18,20]
11	Segment length	Distance between road reference points.	3	[18,40,42]
12	Curve radius	Measure of road curve curvature.	2	[2,40]
13	Road safety	Presence of safety measures on the road.	1	[18]
14	Number of lanes	Quantity of lanes available on the road.	1	[40]
15	Weekday	Day of the week of travel.	1	[30]
16	Road measurements	Specific road data.	1	[40]
17	Crosswalk	Designated pedestrian crossing areas.	1	[31]
18	Population density	Number of individuals living in a specific area.	1	[17]
19	Employment density	Concentration of workplaces in a given area.	1	[17]
20	Land use	Utilization of land along the road.	1	[17]

Table 5. Traffic factors used in DBVAR.

ID	Factor	Description	#	Studies
01	Distance between two vehicles	Space separating two vehicles on the road, influencing collision likelihood.	13	[2,11,14,30,33,34,35,39,42,44,45,46,47]
02	Time to collision	Estimated time before a collision between vehicles, assuming current speeds and trajectories remain unchanged.	10	[11,14,29,33,34,35,42,46,48,49]
03	Traffic density	Volume of vehicles on the road, impacting accident frequency.	9	[2,14,17,18,20,30,31,34,35]
04	Overspeeding	Driving at a speed exceeding legal limits, elevating accident risk.	5	[14,15,20,31,40]
05	Speed difference between two vehicles	Variation in velocity between two vehicles, affecting collision potential.	4	[30,33,39,45]
06	Road signals	Signs indicating traffic regulations or hazards, influencing driver actions and accident likelihood.	3	[15,17,20]
07	Time headway	Time interval between vehicles, affecting collision risk.	3	[42,46,48]
08	Accident risk level	Degree of vulnerability to vehicular accidents, influenced by various factors.	3	[11,31,40]
09	Average speed	Mean velocity of vehicles, affecting accident probability.	2	[2,40]
10	Density by vehicle type	Distribution of vehicle types on the road, impacting accident dynamics.	2	[2,17]
11	Non-compliance with regulations	Failure to adhere to traffic laws, elevating accident risk.	2	[16,43]
12	Lateral distance with objects	Distance between vehicles and roadside objects, affecting collision probability.	1	[44]
13	Acceleration difference between two vehicles	Variation in acceleration rates between vehicles, influencing collision potential.	1	[39]
14	Surrounding vehicle present	Presence of neighboring vehicles, affecting driving dynamics and accident risk.	1	[15]
15	Lack of laws	Absence or lax enforcement of traffic regulations, increasing accident likelihood.	1	[16]
16	Traffic light status	State of traffic signals, influencing driver behavior and accident risk.	1	[50]
17	Vehicle in front with high beams	Leading vehicle using high beam headlights, impacting visibility and accident risk.	1	[16]
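Several traffic factors in Table 5 (distance between two vehicles, speed difference, time to collision, time headway) are derived quantities of a leader-follower pair. The sketch below assumes straight-line car following at constant speeds and uses common textbook definitions: TTC = bumper-to-bumper gap / (v_follower - v_leader), defined only while the follower is closing in, and time headway = front-to-front spacing / v_follower. Individual studies in the review may define these slightly differently; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CarFollowingState:
    """Snapshot of a follower/leader pair in the same lane (SI units)."""
    spacing_m: float         # follower front bumper to leader front bumper
    leader_length_m: float   # turns spacing into a bumper-to-bumper gap
    v_follower_mps: float
    v_leader_mps: float


def gap_m(s: CarFollowingState) -> float:
    """Bumper-to-bumper distance between the two vehicles."""
    return s.spacing_m - s.leader_length_m


def speed_difference_mps(s: CarFollowingState) -> float:
    """Positive when the follower is faster, i.e., closing in."""
    return s.v_follower_mps - s.v_leader_mps


def time_to_collision_s(s: CarFollowingState) -> float:
    """Gap divided by closing speed; infinite if the gap is not shrinking."""
    closing = speed_difference_mps(s)
    if closing <= 0:
        return math.inf
    return gap_m(s) / closing


def time_headway_s(s: CarFollowingState) -> float:
    """Front-to-front spacing divided by follower speed."""
    if s.v_follower_mps <= 0:
        return math.inf
    return s.spacing_m / s.v_follower_mps


state = CarFollowingState(spacing_m=35.0, leader_length_m=5.0,
                          v_follower_mps=25.0, v_leader_mps=20.0)
print(time_to_collision_s(state), time_headway_s(state))  # 6.0 1.4
```

Returning infinity when the follower is not faster keeps TTC monotone ("larger is safer"), which is convenient when it is used as a model feature or thresholded.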

Table 6. Vehicle factors used in DBVAR.

ID	Factor	Description	#	Studies
01	Speed	The rate at which a vehicle is traveling, measured in distance per unit of time, directly impacting the vehicle’s ability to respond to hazards and increasing the severity of potential collisions.	27	[10,11,14,15,21,29,30,33,35,36,37,39,42,43,44,45,47,48,50,51,52,53,54,55,56,57,58]
02	Acceleration	The rate of change of velocity of a vehicle over time, either increasing or decreasing, crucial for determining the vehicle’s ability to adjust its speed and navigate safely through traffic, influencing collision potential.	23	[11,12,14,15,19,21,30,33,34,36,38,39,42,44,45,48,51,53,55,56,58,59,60]
03	Steering angle	Degree of wheel rotation, affecting vehicle trajectory.	9	[14,19,29,47,48,50,51,54,57]
04	Vehicle GPS position	Global position coordinates, crucial for navigation and accident location determination.	8	[11,12,19,31,32,35,37,41]
05	Heading angle	Direction of vehicle travel, crucial for navigation and collision avoidance.	7	[33,35,37,48,51,58,59]
06	Braking	Deceleration of the vehicle, critical for collision avoidance.	6	[10,15,40,43,53,58]
07	Pedals position	Position of accelerator and brake pedals, impacting vehicle speed control.	6	[29,48,51,52,54,57]
08	Lane number	Assigned lane on the road, influencing collision risk during lane changes.	5	[11,34,47,51,58]
09	Yaw angle	Angle of rotation around the vertical axis, affecting vehicle stability.	5	[12,29,35,44,55]
10	RPM	Engine speed, influencing vehicle acceleration and control.	5	[21,29,51,52,57]
11	Coolant temperature	Temperature of the engine coolant, impacting vehicle performance.	4	[21,29,36,52]
12	Vehicle type	Classification of the vehicle, influencing handling characteristics and collision dynamics.	3	[11,29,39]
13	Lane change	Change in lane position, increasing collision risk due to potential blind spots.	3	[33,51,53]
14	Vehicle length	Length of the vehicle, influencing maneuverability and collision severity.	3	[2,11,19]
15	Balancing angle	Angle of vehicle balance, affecting stability and risk of rollover.	3	[12,35,55]
16	Engine load	Demand placed on the engine, impacting vehicle performance and stability.	3	[21,52,54]
17	Fuel	Fuel remaining in the tank, critical for propulsion and determining vehicle range.	3	[21,29,57]
18	Turn type	The specific maneuver a vehicle intends to execute, such as a left turn, right turn, or U-turn, crucial for anticipating traffic flow and collision avoidance.	3	[10,16,53]
19	Jerk	Rate of change of acceleration, affecting passenger comfort and vehicle control.	2	[11,30]
20	Pitch angle	Angle of vehicle tilt, influencing stability and collision risk.	2	[12,55]
21	Brake temperature	Temperature of the braking system, affecting braking efficiency and collision avoidance.	2	[16,29]
22	Altitude	Elevation above sea level, influencing engine performance and vehicle handling.	2	[21,37]
23	Traveled distance	Distance covered by the vehicle, impacting fatigue and collision risk.	2	[29,36]
24	Brake failure	Malfunction of the braking system, increasing collision risk.	2	[16,19]
25	Directional	Vehicle direction of travel, crucial for collision avoidance and navigation.	2	[10,57]
26	Suspension height	Height of the vehicle suspension, affecting stability and collision risk.	2	[19,29]
27	Vehicle width	Width of the vehicle, impacting maneuverability and collision risk.	1	[19]
28	Harsh accelerations	Abrupt increases or decreases in speed, impacting passenger comfort and vehicle control.	1	[40]
29	Clutch	Mechanism for engaging and disengaging the engine from the transmission, crucial for vehicle control.	1	[15]
30	Wheel angle	Angle of the vehicle wheels, influencing steering and collision risk.	1	[51]
31	G force in all three axes	Forces acting on the vehicle in three-dimensional space, impacting vehicle stability and control.	1	[29]
32	Oil	Lubricant for the vehicle engine, crucial for engine function and longevity.	1	[29]
33	Water pressure	Pressure in the vehicle cooling system, impacting engine temperature regulation.	1	[29]
34	Air pressure	Pressure in the vehicle tires, crucial for tire performance and vehicle stability.	1	[21]
35	Tires	Contact points between the vehicle and the road, crucial for traction and vehicle control.	1	[29]
36	Damaged rear-view mirror	Impaired visibility to the rear of the vehicle, increasing collision risk.	1	[16]
37	Overspeed alarm	Warning system for exceeding speed limits, crucial for collision avoidance.	1	[16]
38	Loaded with hazardous material	Transporting dangerous substances, increasing collision risk and potential for environmental damage.	1	[16]
39	Damaged windshield wiper	Impaired visibility in adverse weather conditions, increasing collision risk.	1	[16]
40	Gear	Transmission setting, impacting vehicle speed and acceleration.	1	[10]
41	Transmission	Mechanism for transferring engine power to the wheels, crucial for vehicle propulsion.	1	[19]
42	Reverse	Gear setting for backward vehicle movement, crucial for maneuvering and collision avoidance.	1	[10]
43	Horn	Audible warning device, crucial for communication and collision avoidance.	1	[10]
44	Vehicle exterior light	Illumination for visibility in low-light conditions, crucial for collision avoidance.	1	[16]
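Acceleration, jerk, and harsh-acceleration events in Table 6 are often derived from sampled speed rather than measured directly. A minimal sketch, assuming uniformly sampled speed in m/s and forward finite differences; the 3.0 m/s² threshold in the usage lines is a hypothetical cutoff, not a value taken from the reviewed studies, and the function names are illustrative.

```python
def finite_difference(values: list[float], dt_s: float) -> list[float]:
    """Forward difference (x[i+1] - x[i]) / dt; output is one sample shorter."""
    return [(b - a) / dt_s for a, b in zip(values, values[1:])]


def harsh_event_indices(accel_mps2: list[float], threshold_mps2: float) -> list[int]:
    """Indices where the magnitude of acceleration reaches the threshold."""
    return [i for i, a in enumerate(accel_mps2) if abs(a) >= threshold_mps2]


speed_mps = [10.0, 12.0, 14.0, 17.0, 17.0, 13.0]   # sampled once per second
accel = finite_difference(speed_mps, dt_s=1.0)      # [2.0, 2.0, 3.0, 0.0, -4.0]
jerk = finite_difference(accel, dt_s=1.0)           # [0.0, 1.0, -3.0, -4.0]
print(harsh_event_indices(accel, threshold_mps2=3.0))  # [2, 4]
```

Differencing amplifies sensor noise, and jerk (a second difference) amplifies it further, so real telematics pipelines usually smooth or filter the speed signal before computing these features.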

Table 7. Driver factors used in DBVAR.

ID	Factor	Description	#	Studies
01	Heart rate	Pulse rate indicating stress or fatigue levels affecting driving performance.	4	[29,37,51,59]
02	Eye	Eye movements and tracking, influencing attention and reaction times.	4	[50,51,61,62]
03	Head	Head position and movement, indicating focus and awareness.	3	[50,61,62]
04	Age	Driver’s age, impacting reflexes and driving abilities.	3	[14,16,42]
05	Distraction	Level of attentional diversion from driving tasks.	3	[14,16,63]
06	Electrocardiogram (ECG)	Heart activity measurement, indicating stress or health issues.	2	[18,64]
07	Electrodermal Activity	Skin conductance reflecting stress or arousal levels.	2	[51,59]
08	Breathing frequency	Rate of breathing, indicating stress or fatigue.	2	[32,59]
09	Gender	A driver attribute used to analyze differences in driving behavior and accident risk.	2	[14,42]
10	Driving experience	Duration of driving practice, affecting skill and accident risk.	2	[16,42]
11	Driver’s mood	Emotional state impacting focus and decision-making.	2	[16,34]
12	Electroencephalogram (EEG)	Brain activity measurement indicating alertness levels.	1	[56]
13	Temperature	Body temperature, affecting comfort and concentration.	1	[51]
14	Sleep	Amount of rest influencing alertness and reaction times.	1	[14]
15	Driver video	Visual monitoring of driver behavior and attention.	1	[18]
16	Educational background	Education level influencing knowledge and adherence to traffic rules.	1	[16]
17	Birthplace	Origin of driver, potentially affecting driving habits and risk perception.	1	[16]
18	Driver’s license type	Classification of the license, indicating permitted vehicle types and driver qualifications.	1	[16]
19	Extreme excitement	High arousal levels impacting decision-making and control.	1	[16]
20	Unaware of road conditions	Lack of awareness about current road status, increasing accident risk.	1	[16]
21	Perinasal perspiration	Sweat around the nose indicating stress or discomfort.	1	[59]
22	Face	Facial expressions reflecting emotions and attention levels.	1	[61]
23	Mouth	Mouth movements indicating speech or stress levels.	1	[62]
24	Reaction time	Speed of response to stimuli, crucial for accident avoidance.	1	[45]
25	Driving time	Duration of driving, impacting fatigue and alertness.	1	[15]

Table 8. Management factors used in DBVAR.

ID	Factor	Description	#	Studies
01	Driver evaluation	Assessment of a driver’s performance and skills, impacting safety and accident risk.	1	[16]
02	Overtime work	Extended work hours contributing to driver fatigue and increased accident risk.	1	[16]
03	Vehicle management	Oversight of vehicle operations, ensuring safety and reducing accident likelihood.	1	[16]
04	Safety training	Programs aimed at improving driver safety and reducing risky behaviors.	1	[16]
05	Drivers’ care	Measures ensuring driver well-being, impacting alertness and accident risk.	1	[16]
06	Workload	Amount of work assigned to drivers, affecting fatigue and focus.	1	[16]
07	Units monitoring	Surveillance of vehicles to ensure compliance with safety standards.	1	[16]
08	Distance to destination point	Remaining distance influencing driver fatigue and decision-making.	1	[17]
09	Density of warehousing facilities	The concentration of storage locations in an area, influencing traffic patterns and accident risks through truck and delivery vehicle flow, potentially increasing congestion and interactions with other traffic.	1	[17]

Table 9. Variables describing the accident, used in DBVAR prediction.

Studies	Factor
[31]	Severity, number of accidents
[32]	Accident type, accident causes

Table 10. Algorithms used in the DBVAR.

Studies	Algorithm ^a	Data Set	Study Area	Result
[54]	ANN: Backpropagation Levenberg–Marquardt	D.B.D.	Turkey	acc = 90.00%
[52]	ANN	Own	India	acc = 99.00%
[66]	SdsAE	D.B.D.	Turkey	acc = 98.33%
[67]	CNN: 4D	MRL Eye	Czech Republic	acc = 97.53%
[6]	CNN: Inception v3	Own	Mexico	acc = 92.80%
[68]	CNN: EFFNet-CA	SF3D	USA	acc = 99.58%
[69]	CNN	SF3D	USA	acc = 95.00%
[70]	CNN: HCF	SF3D	USA	acc = 96.74%
[58]	CNN	Own	Japan	acc = 83.00%
[71]	CNN: BiFPN	DMD	-	acc = 95.60%
[64]	CNN: DCNN	Own	USA	acc = 95.51%
[72]	CNN: DenseNet + GA	SF3D	USA	acc = 99.80%
[57]	CNN: GoogleNet	Own	-	acc = 89.00%
[73]	CNN: LWANet (VGG16)	SF3D	USA	acc = 99.37%
[74]	CNN: MobileNet	COCO	USA	acc = 90.00%
[75]	CNN: MobileNetV2	SF3D	USA	acc = 99.68%
[76]	CNN: MobileNetV3	3D KITTI	Germany	acc = 99.95%
[77]	CNN: MSFLD	HNUFDD	China	acc = 99.13%
[78]	CNN: VGG-19	AUCD2	Greece	acc = 95.77%
[61]	CNN + LSTM	Own	-	acc = 94.00%
[60]	CNN-GRU + XGBoost	SHRP 2	USA	acc = 97.50%
[15]	DMNM	NavInfo	-	acc = 99.00%
[47]	GB	BEBO	The Netherlands	acc = 81.00%
[10]	GB	Own	Pakistan	acc = 97.00%
[34]	GB	UAH-DriveSet	Spain	acc = 67.00%
[16]	GB: GBDT	Own	China	acc = 80.00%
[79]	GB: LightGBM	HighD	Germany	acc = 97.58%
[80]	GRU	Own	China	acc = 93.94%
[11]	GB: XGboost	NGSIM	USA	acc = 89.00%
[2]	GB: XGboost	Own	China	acc = 96.66%
[59]	GB: XGboost	SIM 1	USA	acc = 89.24%
[38]	GB:XGBoost	D.B.D.	Turkey	acc = 100.00%
[33]	GMM: HC + FA	SH-NDS	China	acc = 87.00%
[50]	AIO-HMM	RoadLab	Canada	acc = 86.40%
[39]	LSTM	HighD	Germany	acc = 97.00%
[19]	LSTM	Own	-	acc = 93.50%
[44]	LSTM: Ensemble Classifier	Own	China	acc = 90.50%
[14]	LSTM + HMM	SH-NDS	China	acc = 84.00%
[32]	LSTM: BCDU-Net	Own	China	acc = 98.48%
[81]	ConvLSTM: UMMFN	Own	China	acc = 97.79%
[63]	ResNet + HRNN + Inception	SF3D	USA	acc = 96.23%
[82]	EV-CNN + LSTM	SF3D	USA	acc = 93.68%
[35]	LSTM: Stacked-LSTM	UAH-DriveSet	Spain	acc = 99.47%
[55]	LSTM: Stacked-LSTM	UAH-DriveSet	Spain	acc = 94.00%
[45]	LSTM-NN	SHRP 2	USA	acc = 88.00%
[83]	MediaPipe face mesh	Own	-	acc = 95.80%
[84]	MLP	Own	Germany	acc = 87.00%
[56]	MLP	Own	China	acc = 88.00%
[42]	MLP	Own	China	acc = 69.60%
[30]	MLP	SH-NDS	China	acc = 89.20%
[85]	MLP + CNN + Transformer	SF3D	USA	acc = 99.91%
[86]	ResNet: TSD-DLN	AUCD2	Greece	acc = 89.50%
[62]	RF	NTHUDDD	Taiwan	acc = 99.00%
[40]	RF	Own	Greece	acc = 89.30%
[51]	RF	Own	Canada	acc = 91.78%
[46]	RF	Own	China	acc = 93.00%
[37]	RF	Own	India	acc = 98.00%
[43]	RF	SHRP 2	USA	acc = 90.00%
[21]	RF	Traffic, Driving Style and Road Surface Condition	Italy	acc = 95.00%
[12]	RF	UAH-DriveSet	Spain	acc = 91.60%
[87]	RF	SPMD	USA	acc = 92.77%
[31]	RF	UK Car Accident 2015	United Kingdom	acc = 99.00%
[65]	Sequential Covering	D.B.D.	Turkey	acc = 96.25%
[88]	SVM	Own	Pakistan	acc = 87.00%
[41]	DDPG	SPMD	USA	RMSE = 0.4254
[18]	GB: LightGBM	Own	China	RMSE = 0.004
[89]	GB: LightGBM	HighD	Germany	RMSE = 0.0447
[20]	GB: XGBoost	Own	Germany	RMSE = 0.0463
[17]	GB: XGBoost	SWITRS	USA	RMSE = 4.058
[36]	LSTM	Own	Taiwan	RMSE = 0.733
[90]	NB	Freeway-USA	USA	RMSD = 0.7
[29]	MCS: BL + KNN + SVM + MLP	Own	Morocco	F1 = 93.56%
[91]	DTW	Own	India	dr = 100
[53]	NMF	Own	South Korea	drs = 72.9
[49]	K-Means	NGSIM	USA	TTCi = 3.1602
[48]	sHDP-HMM/NPYLM-K-Means	NUDrive corpus	USA	ROC AUC = 0.953

Accuracy = acc; root mean squared deviation = RMSD; F1 score = F1; root mean square error = RMSE; detection rate = dr; mean absolute percentage error = MAPE; time to collision = TTC; driving risk score = drs; area under the curve = AUC; receiver operating characteristic = ROC. ^a Acronyms for the algorithms used in DBVAR: multi-classifier system (MCS), Bayesian learning (BL), multi-layer perceptron (MLP), EfficientNet with channel attention (EFFNet-CA), stacked denoising sparse AutoEncoders (SdsAE), hybrid CNN framework (HCF), bi-directional feature pyramid network (BiFPN), deep CNN (DCNN), genetic algorithm (GA), lightweight attention-based network (LWANet), multi-scale facial landmark detector (MSFLD), gated recurrent unit (GRU), deep deterministic policy gradient (DDPG), deep multichannel network model (DMNM), dynamic time warping (DTW), gradient boosting (GB), gradient boosting decision trees (GBDT), Gaussian mixture model (GMM), hierarchical clustering (HC), factor analysis (FA), hidden Markov model (HMM), auto-regressive input output HMM (AIO-HMM), convolutional LSTM (ConvLSTM), bi-directional ConvLSTM U-Net (BCDU-Net), unsupervised multi-modal fusion network (UMMFN), hierarchical recurrent neural network (HRNN), ensemble variant CNN (EV-CNN), non-negative matrix factorization (NMF), temporal–spatial double-line DL network (TSD-DLN), negative binomial (NB), and sticky hierarchical Dirichlet process hidden Markov model (sHDP-HMM).
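The Result column of Table 10 mixes metrics with different meanings and scales (accuracy, RMSE/RMSD, F1, ROC-based scores, TTC), so rows that report different metrics cannot be ranked against one another. As a reading aid, the following stdlib-only Python sketch implements common textbook definitions of these metrics. It is not taken from any reviewed study; it assumes binary labels (1 = risky) for accuracy, F1, and AUC, and constant vehicle speeds for TTC. All function names and toy data are hypothetical.

```python
import math


def accuracy(y_true, y_pred):
    """Share of predictions equal to the true label (Table 10 reports it as a percentage)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (risky) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_pred):
    """Root mean square error for continuous risk scores (RMSD uses the same formula)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined when a true value is zero."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def time_to_collision(gap_m, v_follower, v_leader):
    """Seconds until the follower reaches the leader at constant speeds (m, m/s)."""
    closing_speed = v_follower - v_leader
    return gap_m / closing_speed if closing_speed > 0 else math.inf


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0]  # toy trip labels, 1 = risky
    preds = [1, 0, 0, 1, 1]
    print(f"acc = {100 * accuracy(labels, preds):.2f}%")
    print(f"F1 = {100 * f1_score(labels, preds):.2f}%")
    print(f"RMSE = {rmse([0.2, 0.5, 0.9], [0.1, 0.5, 0.7]):.4f}")
    print(f"AUC = {roc_auc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]):.2f}")
    print(f"TTC = {time_to_collision(30.0, 20.0, 15.0):.1f} s")
```

Accuracy and F1 depend on a classification threshold, whereas AUC compares raw scores across all thresholds, which is one reason the same model can look quite different under each metric.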

Table 11. Studies applied to heavy vehicles and rural roads.

Vehicle/Road	Studies
Heavy vehicles	[2,16,17,53]
Type of rural road	[2,40]

Table 12. Studies by type of driving risk.

DB	#	Studies
Lane change	4	[39,41,48,89]
Distraction	22	[6,45,51,57,58,61,63,64,68,70,71,72,73,74,75,76,77,80,81,82,85,86]
Driving style	45	[2,10,11,12,14,15,16,17,19,20,21,29,30,31,32,33,34,35,36,37,38,40,42,43,44,46,47,49,50,52,53,54,55,56,59,60,65,66,69,79,80,84,87,90,91]
Stress	1	[18]
Drowsiness	4	[62,67,83,88]

Table 13. Methods used in the explainability of the DBVAR.

Studies	Method	Explanation	# Factors	Country
[15]	SHAP	DB	20	China
[89]	SHAP	Lane change	6	Germany
[18]	SHAP	Driving stress	22	China
[40]	SHAP	Accident risk	10	Greece
[17]	SHAP	Injuries in accident	16	USA
[20]	SHAP	Driving risk	26	Germany
[90]	RF feature importance	Driving risk	6	USA
[21]	RF feature importance	Driving style	7	Italy
[31]	RF feature importance	Accident risk	16	United Kingdom
[37]	RF feature importance	DB	5	India
[16]	RF feature importance	DB	15	China
[87]	RF feature importance	Aggressive/risk DB on horizontal curves	23	USA
[39]	GB feature importance	Lane change	7	Germany
[47]	GB feature importance	Driving under the influence of different substances	36	The Netherlands
[11]	GB feature importance	DB	3	USA
[10]	Laplacian score	DB	14	Pakistan
[51]	ExtraTrees	Driver distraction	19	Canada
[14]	Average attention weight	Aggressive DB	8	China
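SHAP, the first explainability method listed in Table 13, attributes a single prediction to its input factors using Shapley values. The stdlib-only Python sketch below computes exact Shapley values by enumerating every coalition of features, treating a feature that is "absent" from a coalition as taking a baseline value; this single-baseline value function is one common simplification, not the procedure of any reviewed study. The risk function `toy_risk`, its coefficients, and the reference trip are hypothetical. Exact enumeration needs on the order of n·2^(n−1) model calls, so it is only practical for a handful of factors; for the 20 or more factors used by several Table 13 studies, estimators from the `shap` library such as `TreeExplainer` or `KernelExplainer` would be used instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature of instance x for a callable model.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        # Coalition members keep their observed value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for combo in combinations(others, size):
                coalition = set(combo)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_risk(z):
    """Hypothetical risk score: speed (km/h), harsh-braking events, night-driving flag."""
    speed, harsh_brakes, night = z
    return 0.01 * speed + 0.3 * harsh_brakes + 0.2 * night * harsh_brakes


if __name__ == "__main__":
    trip = [110.0, 4.0, 1.0]
    reference = [80.0, 1.0, 0.0]  # hypothetical "typical trip" baseline
    phi = shapley_values(toy_risk, trip, reference)
    for name, contribution in zip(["speed", "harsh_brakes", "night"], phi):
        print(f"{name:>12}: {contribution:+.3f}")
    # Efficiency property: contributions sum to f(trip) - f(reference).
    print(f"sum = {sum(phi):.3f}, gap = {toy_risk(trip) - toy_risk(reference):.3f}")
```

Because the night-by-braking interaction is split between the two features it involves, the night flag receives credit even though it has no standalone term; this sharing of interaction effects is what distinguishes Shapley-based attributions from impurity-based RF or GB feature importance, which rank factors globally rather than explaining an individual trip.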

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
