Evaluation of Tree-Based Machine Learning Algorithms for Accident Risk Mapping Caused by Driver Lack of Alertness at a National Scale

Abstract: Drivers' lack of alertness is one of the main reasons for fatal road traffic accidents (RTA) in Iran. Accident-risk mapping with machine learning algorithms in the geographic information system (GIS) platform is a suitable approach for investigating the occurrence risk of these accidents by analyzing the role of effective factors. This approach helps to identify high-risk areas, even in unnoticed and remote places, and to prioritize accident-prone locations. This paper aimed to evaluate tuned machine learning algorithms of bagged decision trees (BDTs), extra trees (ETs), and random forest (RF) in mapping the risk of accidents caused by drivers' lack of alertness (due to drowsiness, fatigue, and reduced attention) at the national scale of Iran's roads. Accident points and eight effective criteria, namely distance to the city, distance to the gas station, land use/cover, road structure, road type, time of day, traffic direction, and slope, were applied in modeling, using GIS. The time factor was utilized to represent drivers' varied alertness levels. The accident dataset included 4399 RTA records from March 2017 to March 2019. The performance of all models was cross-validated with five folds and three metrics: mean absolute error, mean squared error, and area under the curve of the receiver operating characteristic (ROC-AUC). The results of cross-validation showed that BDT and RF, with an AUC of 0.846, were slightly more accurate than ET, with an AUC of 0.827. The importance of modeling features was assessed by using the Gini index, and the results revealed that the road type, distance to the city, distance to the gas station, slope, and time of day were the most important, while land use/cover, traffic direction, and road structure were the least important. The proposed approach can be improved by applying the traffic volume in modeling, and it helps decision-makers take necessary actions by identifying the factors that are important to road safety.


Introduction
According to the World Health Organization (WHO) report in 2018 on road safety, Iran has a higher rate of road traffic fatalities per person than the global average [1]. Iran is one of the countries in West Asia with a relatively unsafe road situation. Road traffic injuries have always been one of the leading causes of death among Iranians [2]. According to the latest statistics of the Iran Road Maintenance and Transportation Organization (IRMTO) in 2020, from March 2019 to March 2020, 159,735 road traffic accidents (RTA) occurred in Iran, killing 16,947 and injuring 347,307 people [3]. The high rate of RTA in Iran, along with significant economic damage and loss of life [4,5], necessitates research into the problem.
Because RTA causes significant financial damage to countries and leads to many deaths every year [6], its modeling has always been a hot topic. In general, the available traffic accident models fall into three categories: severity modeling [7–9], risk modeling [10,11], and frequency modeling [12,13]. These models aid in the prediction of road safety conditions by explaining the influence of several elements on accident occurrence [10]. Since many spatial and non-spatial factors influence traffic accidents [10,14,15], RTA modeling requires methods to extract knowledge from multi-dimensional data. Machine learning provides suitable methods for this purpose.
Machine learning algorithms are computational methods that learn from input data to achieve impressive results [16]. These algorithms contain two kinds of parameters: model parameters, which are tuned during the learning process, and hyper parameters, which should be tuned before modeling to improve the learning process as far as possible [17]. Machine learning algorithms have an excellent ability to analyze complex relationships in data [18] and have gained popularity in various fields, due to their good accuracy, robustness, efficiency, simplicity, and computation speed [19]. Accident analysis is also a field in which machine learning has been successful [20]. Machine learning methods are practical in predicting accident occurrence and identify nonlinear relationships between effective factors and RTA risk better than traditional statistical algorithms [21,22]. These methods can work with high degrees of freedom, without needing traditional hypotheses, and are more robust to outliers [23]. A short review of the most common machine learning applications for accident analysis is given in Table 1. Silva et al. [24] presented more details about applying machine learning to road-safety modeling.
Table 1. Summary of machine learning applications in accident analysis.

| Paper | Machine Learning Application |
|---|---|
| Farhangi et al. [10] | Accident-risk modeling and mapping |
| Lee et al. [25] | Accident severity prediction |
| Mestri et al. [26] | Identification of accident-prone locations |
| Al-dogom et al. [21] | Spatio-temporal analysis for accidents prediction |
| Fan et al. [27] | Identification of accident black spots and analyzing their characteristics |
| Rovšek et al. [28] | Identifying the critical risk factors of accident injury severity |
| Taamneh et al. [29] | Accident modeling and prediction |
| Kumar and Toshniwal [30] | Characterizing road accident locations |
| Tao et al. [31] | Creating a diagnostic model between driving violation behaviors and accident morphologies |
| Zheng et al. [32] | Accident frequency modeling |
| Wang et al. [33] | Driving risk assessment using near-crash database |
| Beshah et al. [34] | Pattern recognition and knowledge discovery from accident data |
| Das and Abdel-Aty [35] | Combined frequency-severity accident analysis |
| Chang and Chen [36] | Establishing the empirical relationships between accidents and road geometric |

Comparing the machine learning algorithms in modeling, we see that ensemble methods usually perform more precisely than a single algorithm. The main idea of ensemble learning is to weigh multiple single estimators and merge them to enhance predictive performance [37]. Ensemble learning algorithms, such as bagged decision trees (BDT), extra trees (ET), and random forest (RF), are a series of decision trees. The benefits of decision tree and ensemble learning make BDT, ET, and RF algorithms easy to understand and precise [38]. However, each of these ensemble algorithms has a different structure that improves its performance; for example, BDT helps reduce variance and error in big data, RF contains a large number of independent trees [39], and ET provides fast accurate predictions [40].
One of the primary steps in road safety modeling is the identification, selection, and preparation of influential factors. For this purpose, the geographic information system (GIS) is a suitable platform. This system provides the most demanding tools required to analyze RTA and road design that can be noteworthy in achieving road safety [41], manages different types of databases [42], includes data analysis methods [43], provides a suitable platform for big data management [10], and has been widely used as the base platform of many road safety studies so far [44]. Generally, the application of GIS in road safety analysis includes spatial modeling of accident risk [45,46], spatial and spatiotemporal analysis of accidents [47,48], extraction of accident hotspots [44,49], preparation of accident-risk maps [10,50], identification of spatiotemporal patterns of accidents [51], spatiotemporal clustering of road accidents [52], and exploration of the relationships between effective factors and accident rates [53,54]. Researchers often combine GIS with other analysis methods. Machine learning is one of these methods and has been utilized in road-safety assessments in various GIS-based research [21], due to its popularity as a robust and data-driven family of prediction tools [23].
The combination of machine learning and GIS provides a suitable platform for road safety analysis. Table 2 presents the recent literature on combined machine learning and GIS in the road-safety analysis. In the field of accident modeling with GIS and machine learning, it should be noted that influential factors on accidents might influence the occurrence of each accident type differently, and it is essential to have models of accidents with a specific cause [10]. In general, the causes of accident occurrence fall into several categories, including driver, environment, road, and vehicle [10]. Drivers play an essential role in RTA occurrence in Iran [55]. An epidemiological analysis from 1996 to 2014 revealed that influential factors on driver alertness, including lack of attention, drowsiness, and fatigue, had been the most common risk factors associated with RTA in Iran [56]. This increases the necessity of investigating accidents caused by driver lack of alertness and influential factors.
Table 2. Review of the recent literature on combined machine learning and GIS in the road-safety analysis.

| Paper | Aim | Summary | Study Area | Hyper Parameters Tuning |
|---|---|---|---|---|
| Afolabi et al. [57] | Proactively predicting traffic accident | Ensemble machine learning algorithms of lightGBM, catboost, and lightGBM + catboost were used to predict the occurrence of accidents accurately at a given segment for every hour ranging. Data processing and visualization were performed with GIS. | Cape Town, South Africa | No |
| Al-Aamri et al. [58] | Mapping road traffic crash hotspots | The network-based analysis and KDE identified traffic crash hotspots in GIS. Random forest was used to classify the crash hot and cold zones and evaluate the role of effective factors. | Muscat Governorate, Oman | No |
| Roland et al. [59] | Modeling and predicting the vehicle accident occurrence | The multi-layer perceptron model used different spatial attributes to inform local law enforcement officers of high likelihood accident hotspots for any given day. Manipulating spatial information into desired formats was performed with GIS. | Chattanooga City, Tennessee | No |
| Farhangi et al. [10] | Drowsy accidents risk modeling and mapping | Drowsy accidents occurrence risk was modeled with RF, SVM, and decision tree. The preparation and preprocessing of spatial factors and accident-risk mapping were performed in GIS. | Qazvin Province, Iran | No |
| Liu [60] | Classification of the accident severity using large-scale data | Traffic accident severity was classified with SGD linear, K-nearest neighbors, decision tree, RF, and XGBoost algorithms based on various influential factors in large-scale data. Data processing was performed with GIS. | California State, United States | No |
|  | Adopting machine learning and spatial analysis for driver risk assessment | Driving violation hotspots along two expressways developed in GIS and K-nearest neighbors, SVM, and CN2 rule inducer algorithms assessed risk based on the characteristics of hotspots well. | Luzhou City, China | No |
| Zhu et al. [62] | Identification of potential traffic accident hotspots on accident data | First, spatial analysis in GIS was used to identify traffic accident hotspots. Then, logistic regression and RF algorithms identified influencing factors on the creation of the hot spots. | Beijing city, China | No |

Drivers' lack of alertness is due to pre-driving situations or the impacts of different factors while driving [10]. Road type [63], characteristics of road geometry, driving environment [64], time of day [65], and driving duration [66] are some of the most important factors that influence driver performance by affecting driver alertness level while driving. Understanding the influential factors on driver alertness makes it possible to extract valuable patterns between them and accident occurrence with machine learning.
Although various techniques can detect drivers' alertness to prevent accident occurrence through monitoring driver psychological signals, driver behavior, and vehicle-based parameters [67–69], they are not common yet. Hence, accident-risk mapping with machine learning algorithms in the GIS environment is an excellent approach for achieving road safety. This approach identifies high-risk areas, even in unnoticed and remote places, and clarifies the role of influential factors on accidents. Identifying high-risk areas can be helpful in planning the placement of road emergency services, roadside rest areas, and warning signs, and understanding the role of influential factors helps to predict the safety situation in other areas. In particular, the location of emergency services is vital since driver lack of alertness increases the risk of fatal and injury accidents [70], building roadside rest areas helps to decrease the number of accidents caused by driver lack of alertness [71], and the existence of warning signs may increase driver alertness [72]. Accident-risk mapping with machine learning algorithms in the GIS environment can give better results by choosing a large study area (which makes the results more generalizable [73]), tuning hyper parameters (which enhances the learning process of machine learning algorithms), and focusing on RTA with a specific cause, but previous works often did not consider all of these.
The present study aimed to evaluate tuned ensemble machine learning algorithms of BDT, ET, and RF in mapping the risk of accidents caused by driver lack of alertness (due to drowsiness, fatigue, and reduced attention) in the GIS platform. The modeling was performed at the national scale of Iran's roads to cover a wide range of factors, and the time factor was utilized to represent drivers' varied alertness levels. Accident-risk maps and the kernel density estimation (KDE) were used to prioritize accident-prone locations in the study area. The effectiveness of the models' hyper parameter tuning was evaluated, and the models' performance was cross-validated and compared.

Methodology
For spatial prediction of accident risk, this study was conducted in four steps. First, a spatial dataset of accident points in the study area was created. Second, eight effective criteria, including distance to the city, distance to the gas station, land use/cover, road structure, road type, time of day, traffic direction, and slope, were selected for modeling. The values/weights of these eight criteria were normalized in the range of [0, 1] before modeling. Third, the K-fold method was used to train and validate three machine learning algorithms of BDT, ET, and RF with five folds. In each iteration, the mean absolute error (MAE), mean squared error (MSE), and area under the curve of the receiver operating characteristic (ROC-AUC) metrics were calculated, and their means were taken over the folds. The tuning process of these three algorithms was performed with a random search method. Fourth, predicted accident risk in different time classes was mapped in GIS. Then, using the accident-risk maps, accident-prone locations were prioritized with KDE. The steps for conducting the research are summarized in Figure 1.

Study Area
Iran, with an approximate area of 1,648,195 km², is located between east longitudes of 44°2′50″ and 63°19′2″ and north latitudes of 25°3′31″ and 39°46′37″ (Figure 2). This country, with a geopolitically strategic location, is a regional middle power, and its gross domestic product ranks among the top 30 in the world. Iran's climate is changeable, and its topography is mostly mountainous, apart from two central salt deserts (the Dasht-e Lut and the Dasht-e Kavir) in the middle. Iran's population is estimated to be over 80 million people, according to a 2016 report by the Statistical Centre of Iran [74]. Out of 291,014 km of roads under the supervision of IRMTO, about 71% are rural roads, 13% are secondary roads, 9% are primary roads, 6% are highways, and 1% are freeways [3]. Moreover, these roads contain 379 tunnels with an approximate total length of 199 km and 355,306 bridges.

Accident Dataset
The accident dataset was prepared by the IRMTO and included 4399 RTA records from March 2017 to March 2019. These accidents were caused by driver lack of alertness and resulted in 4889 damaged vehicles, 5158 injuries, and 797 deaths. To train and validate the algorithms, a dataset balanced between accident and non-accident points was needed. Therefore, 4399 road points where no accident had occurred were randomly selected as accident-free road points. These accident-free road points and the RTA records formed 8798 data records to train and cross-validate the machine learning algorithms. In each iteration of cross-validation with five folds, 1760 of these data records (about one fifth) were used for validation. Figure 3 shows the distribution of these 8798 data records at different times of day on Iran's roads. In Figure 3, the number of accident points in map A, map B, map C, and map D is 203, 1684, 1490, and 1022, respectively, which together form the 4399 RTA records, and the number of accident-free road points in map E is 4399. A classification map of the number of accidents per province and a distribution map of accident points for the provinces with the most accident points are shown in Figure 4.
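The construction of the balanced dataset can be illustrated with a minimal sketch. The function name `build_balanced_dataset`, the point identifiers, and the assumption that the candidate road points have already been filtered to exclude accident locations are all hypothetical and are not taken from the study.

```python
import random

def build_balanced_dataset(accident_points, candidate_road_points, seed=0):
    """Pair every accident point (label 1) with one randomly drawn accident-free point (label 0).

    Assumption: candidate_road_points contains only road points where no accident occurred.
    """
    rng = random.Random(seed)
    # Draw as many accident-free points as there are accidents, without replacement.
    accident_free = rng.sample(candidate_road_points, len(accident_points))
    records = [(p, 1) for p in accident_points] + [(p, 0) for p in accident_free]
    rng.shuffle(records)
    return records

# Hypothetical identifiers standing in for spatial points.
accidents = ["a1", "a2", "a3"]
road_candidates = ["r%d" % i for i in range(1, 11)]
records = build_balanced_dataset(accidents, road_candidates)
n_positive = sum(label for _, label in records)
```

With 4399 accident records, the same procedure would yield the 8798 records described above, half of them positive.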

Effective Criteria Dataset
Identification of effective criteria is a primary step in modeling. According to the reviewed literature, factors related to road characteristics, road geometry, driving environment, time of day, and driving duration can be considered to affect driver alertness [63–66]. Experts chose eight effective criteria for spatial modeling: distance to the city, distance to the gas station, land use/cover, road structure, road type, time of day, traffic direction, and slope. ArcGIS 10.3 software was used to estimate spatial criteria (Figure 5). OpenStreetMap (OSM) data from 2019 provided the research road layer with its attributes. This road layer and distance analysis were used to compute distance to the city and distance to the gas station, and road layer attributes defined road structure, road type, and traffic direction. Detailed effective criteria are given in Table 3.

1. Distance to the city: In Iran, 60% of RTA occurs within 30 km of cities. Several types of traffic flows occur at the city entrance/exit areas, which result in different driver performance. This causes heterogeneous vehicular traffic and, consequently, accidents [75]. Since distance to the city correlates with high accident risk at the city entrance/exit areas and with driving duration, it was selected for modeling.
2. Distance to the gas station: Refueling the vehicle interrupts long periods of driving. Besides, most of the roadside gas stations in Iran have facilities such as supermarkets, coffee shops, and parking, and many drivers rest at these places. This prevents the driver from losing alertness and reduces the risk of accidents. To capture this effect, the distance to the gas station was selected as a modeling criterion.
3. Land use/cover: The surrounding environment of the road influences the level of driver alertness. Each land-use/cover type has a different visual diversity, which can make the road environment monotonous or absorbing. Thus, land use/cover was selected as an effective criterion for spatial modeling. The land-use/cover map was prepared from Landsat 8 satellite images in ENVI 5.1 software with the maximum likelihood classification method. Experts weighted all land-use/cover types (Table 3). With these weights, a map of this criterion with a pixel size of 30 × 30 m was prepared.
4. Road structure: Generally, three structure types (bridge, tunnel, and normal road) were present in the study area. According to Iran's Highway Geometric Design Code (No. 415), each of these structure types has its own set of requirements, such as speed limits, lighting conditions, curvature, and slope limitations, all of which affect driving conditions [76]. Since a change in the driving situation can influence driver alertness, road structure type was chosen for spatial modeling. According to experts, bridges were labeled 1, normal roads were labeled 0.5, and tunnels were labeled 0 in modeling.
5. Road type: There are specific standards for the construction of each road type. These standards define limitations for geometric road characteristics such as slope, curvature, speed limit, and road width [76]. As the geometry and characteristics of the road affect the level of driver alertness, the road type was chosen as an effective criterion in modeling. To apply the effect of this criterion in modeling, experts assigned a weight to each road type, as listed in Table 3.
6. Time of day: Time of day is a factor associated with driver alertness [65]. Accident data records were divided into four classes (Figure 3) based on a typical circadian rhythm with peak alertness at around 8:00 p.m. and 10 a.m. [77], and all classes were weighted (Table 3). The detailed classes and the number of accidents per class are listed in Table 4.
7. Traffic direction: Traffic flow affects the level of driver alertness [78]. Drivers are used to getting visual information about traffic situations, and this behavior is associated with their alertness level [63]. In general, there are two types of traffic direction (one-way and two-way), which create different conditions for both traffic flow and visual traffic situations. In modeling, one-way roads were labeled 1, and two-way roads were labeled 0.
8. Slope: Road geometry is significantly affected by the terrain slope. Increasing the slope makes the road geometry more varied [10]. Moreover, a changing slope makes drivers more careful to control their speed [45]. These considerations led to the choice of slope as an effective criterion in modeling. The global multi-resolution terrain elevation data (GMTED 2010), with a resolution of 225 m, were used to obtain slope values.

Normalization is a preprocessing approach in machine learning in which the data are scaled or transformed so that modeling features contribute equally [79]. Before applying predictive models, we normalized all features in the range of [0, 1] using Equation (1).

$$X_{norm} = \frac{X - X_{min}}{X_{Max} - X_{min}} \qquad (1)$$

In Equation (1), $X$ is the original feature value, and $X_{min}$ and $X_{Max}$ are the minimum and maximum values of the feature, respectively.
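The min-max normalization of Equation (1) can be sketched in a few lines of Python. The function name and the sample distances are hypothetical, and the handling of a constant feature is an assumption that the text does not address.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical distances to the nearest city, in kilometres.
distance_to_city_km = [2.0, 10.0, 30.0, 50.0]
normalized = min_max_normalize(distance_to_city_km)
```

The smallest value maps to 0, the largest to 1, and all other values fall proportionally in between.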

Bagged Decision Trees Algorithm
Bagging is an ensemble machine learning algorithm that helps to enhance the performance of unstable models when data are high-dimensional [80]. In ensemble learning, a group of estimators is used; that is, each estimator creates its own data model for prediction, and at the end, samples are predicted by voting or averaging between the models' predictions [81]. Bagging can use different estimators as the base predictive model, but the decision tree is often chosen. In this ensemble algorithm, each base estimator is trained with a random subset of samples [82].
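The bagging idea described above (random subsets of samples, then averaging the estimators' votes) can be illustrated with a minimal sketch. To keep it short, the base estimator here is a one-feature threshold rule rather than a full decision tree; this simplification, the function names, and the sample data are assumptions made for illustration only.

```python
import random

def fit_stump(samples):
    """Fit a one-feature threshold rule: predict 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def fit_bagging(samples, n_estimators, rng):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    stumps = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(samples) for _ in samples]
        stumps.append(fit_stump(bootstrap))
    return stumps

def predict_risk(stumps, x):
    """Average the 0/1 votes of all stumps to obtain a score in [0, 1]."""
    return sum(x >= t for t in stumps) / len(stumps)

rng = random.Random(0)
# Hypothetical (normalized feature, accident label) pairs.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
stumps = fit_bagging(data, n_estimators=25, rng=rng)
low, high = predict_risk(stumps, 0.05), predict_risk(stumps, 0.95)
```

Because every stump sees a slightly different bootstrap sample, the thresholds differ between estimators, and the averaged vote behaves like a graded risk score rather than a hard label.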

Extra Trees Algorithm
The decision tree is a practical machine learning algorithm [83]. This method is popular due to its high learning speed, its independence from domain knowledge, its ability to handle multidimensional datasets, and its construction of understandable models [10]. A decision tree structure is top-down, with a root, nodes, branches, and leaves [84]. The extra tree and the classic decision tree differ in the way they are built. In the extra tree algorithm, to split the samples of a node into two groups, random splits are drawn, and the best division among them is chosen [40].
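The split rule described above (draw random splits, keep the best of the draws) can be sketched as follows. Scoring the candidate splits with the Gini impurity, drawing one threshold per feature, and all names and data are assumptions for illustration; the source does not specify these details.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_impurity(rows, feature, threshold):
    """Size-weighted Gini impurity of the two groups produced by a split."""
    left = [y for x, y in rows if x[feature] < threshold]
    right = [y for x, y in rows if x[feature] >= threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def extra_tree_split(rows, rng):
    """Draw one random threshold per feature and keep the best of those draws."""
    n_features = len(rows[0][0])
    candidates = []
    for feature in range(n_features):
        values = [x[feature] for x, _ in rows]
        threshold = rng.uniform(min(values), max(values))
        candidates.append((split_impurity(rows, feature, threshold), feature, threshold))
    return min(candidates)

rng = random.Random(42)
# Hypothetical rows: ((feature 0, feature 1), label). Feature 1 is uninformative.
rows = [((0.1, 0.5), 0), ((0.2, 0.5), 0), ((0.8, 0.5), 1), ((0.9, 0.5), 1)]
impurity, feature, threshold = extra_tree_split(rows, rng)
```

A classic decision tree would instead scan every possible threshold of each feature; drawing thresholds at random is what makes this variant faster to build.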

Random Forest Algorithm
RF is a supervised machine learning algorithm that builds its data model with an ensemble learning technique. The base estimators in this ensemble algorithm are decision trees trained with randomly selected samples and sample features [85]. This learning technique has performed well in large-scale problems and in problems where the number of variables exceeds the number of observations [86]. In RF, all trees contribute to the result, and samples are predicted by averaging or voting between the trees' predictions [10].
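The two sources of randomness in RF, randomly selected samples and randomly selected features, together with the averaging of tree outputs, can be sketched as below. The function names and the choice of three candidate features out of eight are hypothetical; the tuned values used in the study are not reproduced here.

```python
import random

def draw_tree_inputs(n_samples, n_features, max_features, rng):
    """Pick bootstrap rows (with replacement) and a random feature subset (without replacement)."""
    row_indices = [rng.randrange(n_samples) for _ in range(n_samples)]
    feature_indices = sorted(rng.sample(range(n_features), max_features))
    return row_indices, feature_indices

def average_votes(tree_predictions):
    """Combine the trees' 0/1 predictions into a single score in [0, 1]."""
    return sum(tree_predictions) / len(tree_predictions)

rng = random.Random(1)
# Eight criteria as in this study; max_features=3 is an illustrative value.
row_indices, feature_indices = draw_tree_inputs(
    n_samples=10, n_features=8, max_features=3, rng=rng)
score = average_votes([1, 0, 1, 1])
```

Restricting each split to a feature subset decorrelates the trees, which is the main structural difference from bagged decision trees.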

K-Fold Cross-Validation
The K-fold cross-validation method helps to understand how the prediction of a model will generalize to independent data. This method clarifies the effectiveness of a predictive model in practice [87]. K-fold subdivides the data into K subsets. In each iteration, one of the data subsets is utilized for validation, while the other K-1 data subsets are used for training. This procedure is repeated K times, so that all data are used for both training and validation. In the end, the average over the K validation runs gives the final estimate [88].
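The K-fold procedure can be sketched with plain index lists. The function name and the shuffling seed are hypothetical; the record count of 8798 and K = 5 follow the dataset described in this paper.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one record.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=8798, k=5))
validation_sizes = [len(validation) for _, validation in folds]
```

Each record appears in exactly one validation fold, and the validation folds hold 1760 or 1759 records each.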

Validation Metrics
The performance of the three machine learning algorithms was assessed using MAE, MSE, and ROC-AUC metrics. MAE is the mean absolute error between the actual values and the estimated values. This metric is less sensitive to large errors [10] and is suitable for evaluating average model performance. MSE is the mean of the squared errors between the actual values and the estimated values. This metric penalizes outliers and does not weight all errors equally [89]. MAE and MSE are defined with Equations (2) and (3), respectively.

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (2)$$

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \qquad (3)$$

In Equations (2) and (3), $y_i$ is the predicted value, $\hat{y}_i$ is the actual sample value, and $n$ is the number of samples.
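A minimal sketch of the two error metrics in Equations (2) and (3) follows; the function names and sample values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2 over all samples."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical labels (1 = accident, 0 = accident-free) and predicted risks.
actual = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.1]
mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
```

When labels and predictions both lie in [0, 1], every individual error is at most 1, so each squared error is no larger than the corresponding absolute error and MSE cannot exceed MAE.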
The y-axis of the ROC curve represents sensitivity, whereas the x-axis represents 1 − specificity (the false positive rate). The trade-off between sensitivity and specificity at several cut-off points evaluates the model performance [10]. Sensitivity and specificity are probabilistic metrics in the range of [0, 1], computed using Equations (4) and (5), respectively. Sensitivity measures the probability of a correct prediction of positive samples, and specificity measures the probability of a correct prediction of negative samples [90]. The area under the ROC curve is called AUC; it can be interpreted as the probability that the model ranks a randomly chosen positive sample above a randomly chosen negative one. AUC is in the range of [0, 1], and a higher AUC indicates better model performance [91].

$$Sensitivity = \frac{TP}{TP + FN} \qquad (4)$$

$$Specificity = \frac{TN}{TN + FP} \qquad (5)$$

In Equations (4) and (5), TP is the number of correctly predicted positive samples, FP is the number of incorrectly predicted positive samples, TN is the number of correctly predicted negative samples, and FN is the number of incorrectly predicted negative samples [92].
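Sensitivity, specificity, and AUC can be computed directly from labels and predicted scores, as in the sketch below. The pairwise-ranking formulation of AUC is one standard way to obtain the area under the ROC curve and is not necessarily the routine used in the study; the names and sample data are hypothetical.

```python
def sensitivity_specificity(labels, scores, cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and predicted risks.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
sens, spec = sensitivity_specificity(labels, scores, cutoff=0.5)
auc = roc_auc(labels, scores)
```

In this example, two of three positives and two of three negatives are classified correctly at a cut-off of 0.5, and eight of the nine positive-negative pairs are ranked correctly.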

Results
With the Python scikit-learn library in Anaconda software, we created the BDT, ET, and RF algorithms. At this stage, the hyper parameters that experts considered most important for each algorithm were tuned by using the random search method [93] with 100 iterations. Table 5 shows the hyper parameters used for the optimization of the BDT, ET, and RF algorithms. In this table, the base estimator is the decision tree, and max_samples is the ratio of samples needed for training each tree. Moreover, max_features for BDT is the ratio of features needed for training each tree, and for ET and RF, it is the ratio of features to consider when looking for the best split. Furthermore, min_samples_split is the minimum number of samples required to split an internal node, min_samples_leaf is the minimum number of samples needed to be at a leaf node, and max_depth is the maximum depth of trees.

RF used the most estimators. The minimum and the mean tree depth of this ensemble algorithm were 23 and 28.93, respectively. The ET algorithm had four fewer estimators than RF, and its trees were often deeper than those of the other two algorithms. The minimum and the mean tree depth of the ET algorithm were 27 and 31.08, respectively. BDT had the fewest estimators, and its trees were usually shallower. The maximum, minimum, and mean depth of the BDT trees were 45, 19, and 26.19, respectively.

Spatial prediction of accident risk caused by driver lack of alertness was performed with the BDT, ET, and RF algorithms in the range of [0, 1]. These predictions were classified into five classes with equal intervals: very low risk (0-0.2), low risk (0.2-0.4), medium risk (0.4-0.6), high risk (0.6-0.8), and very high risk (0.8-1). Figures 6-8 present the mapped accident risk by BDT, ET, and RF, respectively. Four different risk maps are given in these figures; each relates to a circadian alertness condition (Table 4).
Map A, map B, map C, and map D indicate when circadian alertness is at its peak, reduced, slightly impaired, and dangerously low, respectively. In all prepared maps, the mean of estimated accident risk increases as circadian alertness goes from peak level to slightly impaired. However, the mean of calculated accident risk when circadian alertness is dangerously low is unacceptable. By merging the prepared risk maps, areas that had the highest accident risk at all times of day were identified. The Khalij Fars, Zanjan-Ghazvin, Tabriz-Zanjan, Amirkabir, Saveh-Hamedan, Tehran-Saveh, and Karaj-Ghazvin freeways had the highest accident risk in all estimated risk maps, respectively.
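The equal-interval classification of predicted risk into five classes can be sketched as follows. The text does not say which class a boundary value such as 0.2 belongs to; assigning it to the higher class is an assumption of this sketch, as are the names used.

```python
RISK_CLASSES = ["very low", "low", "medium", "high", "very high"]

def risk_class(risk):
    """Map a predicted risk in [0, 1] to one of five equal-width classes."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    # Each class spans 0.2; a risk of exactly 1.0 is kept in the top class.
    index = min(int(risk * 5), len(RISK_CLASSES) - 1)
    return RISK_CLASSES[index]

# Hypothetical predicted risks for three road points.
classes = [risk_class(r) for r in (0.1, 0.5, 0.79)]
```

The same function can be applied to every road point of a risk map before it is symbolized in GIS.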

Identification and Prioritizing of Accident-Prone Locations
By applying the KDE on accident-risk maps, the most significant accident-prone locations were identified. Accordingly, we first merged the accident-risk maps to obtain the mean accident risk at each road point. Then we applied the KDE and classified the results into two classes with the natural breaks (Jenks) method. Overall, 237 points were identified in the study area, as shown in Figure 9. These points must be given priority in taking the necessary steps.
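The idea of weighting a kernel density surface by mean accident risk can be illustrated with a small sketch. A Gaussian kernel is used here for brevity; GIS packages may use a different kernel function, and the bandwidth, coordinates, and risk values are all hypothetical.

```python
import math

def kernel_density(query, points, weights, bandwidth):
    """Weighted 2-D Gaussian kernel density at `query`."""
    total = 0.0
    for (x, y), w in zip(points, weights):
        squared_distance = (query[0] - x) ** 2 + (query[1] - y) ** 2
        total += w * math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    # Normalize so that the density integrates to 1 over the plane.
    return total / (2.0 * math.pi * bandwidth ** 2 * sum(weights))

# Hypothetical road points (projected coordinates in km) and their mean predicted risks.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
mean_risk = [0.90, 0.80, 0.85, 0.30]
dense = kernel_density((0.3, 0.3), points, mean_risk, bandwidth=1.0)
sparse = kernel_density((10.0, 10.0), points, mean_risk, bandwidth=1.0)
```

Locations surrounded by several high-risk points receive a higher density than isolated low-risk points, which is what allows the density surface to be split into priority and non-priority classes.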

Correlation between Accident Risks at Different Times of Day
To understand how estimated accident risk changed at different times of day, the Pearson coefficient was calculated. Figure 10 shows the correlation between accident risk in the four risk maps of the BDT, ET, and RF models. High correlation values indicate that although the time factor changes the accident risk, it does not significantly change the accident occurrence pattern. Map D had the lowest correlations with the other maps. Completely different traffic and light conditions at 05:00-06:00 compared to other times of day might cause this.
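The Pearson coefficient between two risk maps can be computed over the risks predicted at the same road points, as in this sketch. The function name and the risk values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical risks at the same four road points in two time classes:
# the second map is uniformly riskier but has the same spatial pattern.
risk_map_a = [0.10, 0.30, 0.50, 0.70]
risk_map_b = [0.20, 0.40, 0.60, 0.80]
r_same_pattern = pearson(risk_map_a, risk_map_b)
r_opposite = pearson(risk_map_a, list(reversed(risk_map_a)))
```

The example shows why a high correlation is compatible with a change in risk level: shifting every risk by the same amount leaves the coefficient at 1.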

Spatial Features' Importance
The Gini method [94] was used to evaluate the importance of features in the spatial risk models (Figure 11). The features' importance was similar across these tree models. Distance to the gas station (0.214), distance to the city (0.207), road type (0.200), and slope (0.199) had significantly higher scores than the average score (0.125); time (0.090) was close to the average score; and traffic direction (0.053), land use/cover (0.032), and road structure (0.005) had the lowest scores by a wide margin.
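Gini-based importance rests on how much each split reduces node impurity, with the per-feature totals normalized to sum to 1. The sketch below shows both steps; the function names and raw totals are hypothetical. Because normalized importances sum to 1, the average over eight features is 1/8 = 0.125, which matches the average score quoted above.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def impurity_decrease(parent, left, right):
    """Reduction in Gini impurity achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)

def normalize_importances(raw):
    """Rescale summed impurity decreases so that they add up to 1."""
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

# A perfect split of a balanced node removes all impurity.
decrease = impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1])

# Hypothetical summed decreases per feature over a whole model.
importances = normalize_importances(
    {"road type": 0.5, "slope": 0.3, "traffic direction": 0.2})
```

In a tree ensemble, the decreases are accumulated over every split that uses a feature and then averaged over the trees before normalization.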

Models Validation
Cross-validation of the models was performed with MAE, MSE, and ROC-AUC metrics. Figure 12 shows the MAE and MSE values for the training and test data. Although RF had the lowest errors in almost all cases, no large difference was observed between the test errors of the three risk models.
Using the 1760 data records and five folds, we drew ROC graphs for cross-validation of the accident-risk models (Figure 13). For plotting the ROC graphs, accident points were labeled positive (1), and accident-free road points were labeled negative (0). The ratio of positive samples to negative samples in this cross-validation was 0.5125, 0.4790, 0.4994, 0.5011, and 0.5023, respectively. The detailed results of all ROC graphs are listed in Table 6. According to Table 6, the greatest AUC for BDT was 0.850, with a standard error of 0.00906 and a 95% confidence interval of 0.826 to 0.861. The highest observed AUC for ET was 0.846, with a standard error of 0.00903 and a 95% confidence interval of 0.828 to 0.862. The corresponding values for RF were 0.851, 0.00886, and 0.834 to 0.868. On average, RF was the most accurate model, followed by BDT and ET, with only slight differences between them. SD values were also calculated to assess how sensitive the models' accuracy was to the distribution of the training and test data. No significant dependency was observed for any model.

Evaluation of Hyper Parameters Tuning
The accuracy of the BDT, ET, and RF algorithms was validated with ROC-AUC before and after hyper parameter tuning to understand the effectiveness of model optimization with the random search method. Table 7 indicates the results of ROC-AUC for the ensemble algorithms before hyper parameter tuning. Hyper parameter tuning improved the models' accuracy, but the improvement was not significant. It can be concluded that random search is a limited method for tuning hyper parameters, and more effective strategies should be used for this purpose.
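The random search procedure can be sketched independently of any particular model. In the sketch below, the search space ranges are hypothetical (the tuned values of the study are not reproduced), and `toy_score` is only a stand-in for a cross-validated ROC-AUC, not a real model evaluation; the 100 iterations follow the setting reported in this paper.

```python
import random

# Hypothetical ranges for illustration only.
SEARCH_SPACE = {
    "n_estimators": list(range(10, 201, 10)),
    "max_depth": list(range(5, 51, 5)),
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

def random_search(score_fn, space, n_iter, seed=0):
    """Evaluate n_iter randomly drawn configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Stand-in objective; in practice this would be a cross-validated ROC-AUC."""
    return -abs(params["max_depth"] - 30) - abs(params["n_estimators"] - 100) / 10

best_params, best_score = random_search(toy_score, SEARCH_SPACE, n_iter=100)
```

Because configurations are drawn independently, random search may never sample the best region of the space within a fixed budget, which is one reason guided strategies can find better settings with the same number of evaluations.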

Discussion
In this study, eight effective criteria were used for spatial modeling of accident risk caused by drivers' lack of alertness. Three tree-based ensemble machine learning algorithms with different structures, namely BDT, ET, and RF, were trained with actual accident points. They estimated the accident risk on Iranian roads when typical circadian alertness is at its peak, reduced, slightly impaired, and dangerously low.
Based on the ROC curves, RF, BDT, and ET, in that order, gave the most accurate predictions, with only slight differences between them. No significant difference was observed between the AUC values of the risk models in the five-fold cross-validation. The SD values calculated for the AUC showed that the risk models had no significant dependency on the distribution of the train and test data. These results can be explained by the way the trees of each algorithm are built. Unlike RF, which uses a subset of the features to split a tree node and divide the samples into two groups, BDT uses all features for this purpose. Therefore, the trees in RF are more independent of one another, and RF usually performs better than BDT [95]. Unlike BDT and RF, which use the bootstrap technique to develop each decision tree, the ET algorithm fits each tree on the whole sample [96]. In the ET algorithm, to create a tree node, an attempt is made to select the best feature for separating the samples into two groups [40]. Therefore, ET tends to produce deeper trees to cover all samples, which led to its lower accuracy and higher SD value in the cross-validation with ROC-AUC.
The structure of the BDT, ET, and RF algorithms also explains the MAE and MSE values. ET adapts closely to the train data and uses all samples to build each tree. Therefore, its train error was the same in every iteration of the cross-validation, and the zero SD values showed that the distribution of the samples did not affect the training process of the ET algorithm. However, as BDT and RF trees do not learn from all samples, they had some train error in all iterations. On the test data, ET had larger errors than BDT and RF, which shows that BDT and RF generalize better to new data. Finally, the MSE values were much lower than the MAE values, which indicated that the predictions contained no outliers.
In general, all three algorithms performed at the same level of accuracy. Previous works also observed that tree-based ensemble learning algorithms had almost similar performance in accident modeling [13,97]. Overall, it can be concluded that tree-based ensemble algorithms are helpful in the field of road safety analysis even when working with large-scale data, as they result in a robust prediction with reduced variance [97].
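The structural differences described above map onto a few constructor arguments in scikit-learn. The sketch below is one possible configuration, not necessarily the one used in the study: the number of trees and the `"sqrt"` feature subset are assumptions. Note that scikit-learn's extra trees additionally draw split thresholds at random for each candidate feature and keep the best of those random splits.

```python
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier

# BDT: every tree is grown on a bootstrap sample and may use ALL features
# at each split (max_features=None on the base tree).
base_tree = DecisionTreeClassifier(max_features=None)
bdt = BaggingClassifier(base_tree, n_estimators=100, bootstrap=True, random_state=0)

# RF: bootstrap sample per tree, random SUBSET of features at each split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            bootstrap=True, random_state=0)

# ET: no bootstrap, so every tree is fitted on the whole training sample.
et = ExtraTreesClassifier(n_estimators=100, max_features="sqrt",
                          bootstrap=False, random_state=0)

structure = {
    "BDT": {"bootstrap": bdt.bootstrap, "max_features": base_tree.max_features},
    "RF": {"bootstrap": rf.bootstrap, "max_features": rf.max_features},
    "ET": {"bootstrap": et.bootstrap, "max_features": et.max_features},
}
for name, settings in structure.items():
    print(name, settings)
```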
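The train and test error pattern discussed above can be reproduced on synthetic data. The sketch assumes that MAE and MSE are computed between the 0/1 labels and the predicted risk (class probability), which is an assumption about the study's setup; the data, noise level, and number of trees are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data with 5% label noise (flip_y) so that perfect fits are not trivial.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           flip_y=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

models = {
    # ET: fully grown trees fitted on the whole training sample.
    "ET": ExtraTreesClassifier(n_estimators=100, bootstrap=False, random_state=0),
    # RF: each tree sees only a bootstrap sample of the training data.
    "RF": RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0),
}

errors = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    for split, (X_s, y_s) in {"train": (X_tr, y_tr), "test": (X_te, y_te)}.items():
        risk = model.predict_proba(X_s)[:, 1]  # predicted risk in [0, 1]
        errors[name, split] = (mean_absolute_error(y_s, risk),
                               mean_squared_error(y_s, risk))
        mae, mse = errors[name, split]
        print(f"{name} {split:5s} MAE = {mae:.3f}  MSE = {mse:.3f}")
```

Under this setup, every individual error lies between 0 and 1, so its square cannot exceed its absolute value and MSE is never larger than MAE; the size of the gap between the two is what carries information about large individual errors.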
The mean of the estimated accident risk in each prepared risk map was also calculated to validate the maps. The mean risk rose as expected as circadian alertness decreased, except that when circadian alertness was dangerously low, the mean accident risk was lower than expected. Circadian alertness is considered dangerously low only at 05:00 to 06:00, and at this time of day traffic volume is low and the drivers on the road are usually professional drivers, who can drive longer with high alertness [98,99]. Low traffic volume results in low accident frequency, and since BDT, ET, and RF learned only from the accident point data, the mean accident risk at 05:00 to 06:00 was predicted to be low. However, freeways, which often have high traffic volume at all hours of the day, had a high level of accident risk in all prepared maps. This is consistent with the observation that driver alertness decreases more rapidly on freeways [100].
Using the estimated accident-risk maps, the most significant accident-prone locations were prioritized with KDE. Compared to previous methods that considered only the spatial density of accident points [101,102], the result of this approach is more reliable and generalizable because the accident-prone locations are obtained according to the impact of various factors on the occurrence of accidents.
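One way to combine estimated risk with KDE is to weight each road point by its predicted risk, so that dense clusters of high-risk points dominate the density surface. The sketch below uses `scipy.stats.gaussian_kde` with its `weights` argument; the coordinates, risk values, and the weighting scheme itself are assumptions for illustration and may differ from the procedure used in the study.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical road points: a tight high-risk cluster around (2, 2) ...
hot_points = rng.normal(loc=[2.0, 2.0], scale=0.2, size=(50, 2))
hot_risk = np.full(50, 0.9)
# ... and scattered low-risk points over a 5 x 5 study area.
background_points = rng.uniform(0.0, 5.0, size=(200, 2))
background_risk = np.full(200, 0.1)

points = np.vstack([hot_points, background_points])
risk = np.concatenate([hot_risk, background_risk])

# gaussian_kde expects data of shape (n_dimensions, n_points).
kde = gaussian_kde(points.T, weights=risk)

# Evaluate the weighted density on a regular grid and locate its maximum.
axis = np.linspace(0.0, 5.0, 51)
xx, yy = np.meshgrid(axis, axis)
density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
row, col = np.unravel_index(np.argmax(density), density.shape)
peak = (xx[row, col], yy[row, col])

print(f"Highest-priority location near x = {peak[0]:.1f}, y = {peak[1]:.1f}")
```

Ranking grid cells by this weighted density gives a priority order that reflects both how many points fall in an area and how risky the model estimates them to be.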
In decreasing order of average importance, the modeling features were distance to the gas station, distance to the city, road type, slope, time of day, traffic direction, land use/cover, and road structure. Distance to the city and distance to the gas station had similar importance. In line with previous research [10], distance-based factors are often good features in modeling, as they cover an extensive range of values, and driving duration is associated with distance. Besides, roadside gas stations in Iran often provide suitable facilities for drivers to rest, which increased the importance of the distance to the gas station criterion. This finding is in line with previous work that confirmed the influence of roadside rest areas in reducing accident risk [71]. Every road type has its own design standards. Road type defines the geometric, traffic, structural, and many other characteristics of the road [76], so this criterion affects a wide range of road attributes, and its high importance was expected. Slope was the next most important modeling feature. In designing any road, earthworks depend on terrain slope [10]. Therefore, slope directly influences the road characteristics, which in turn affect driver alertness [64]. Alertness is controlled by the body clock and has different levels at different times of day [103]. Therefore, the time of day was an essential feature in this modeling. However, the RTA frequency, which affected the learning process of the risk models, is associated with traffic volume, and the different traffic volumes at different times of day influenced the importance of this criterion. Traffic direction and road structure did not have a wide enough range of values to help the predictions, and their importance was low. The low impact of the traffic situation on driver alertness was observed in other research, too [65]. Land use/cover was another feature of low importance, although many land-use/cover types were observed in the study area. This observation is in line with Ahlström et al.'s [104] experimental finding that the visual characteristics of the road environment have little impact on driver alertness.

Conclusions
This study aimed to use three machine learning algorithms for spatial modeling of accident risk caused by drivers' lack of alertness. Accident risk was mapped with the BDT, ET, and RF algorithms on Iranian roads in different circadian alertness situations. The performance of the risk models was cross-validated with different metrics. The validation results showed that all three algorithms, namely BDT, ET, and RF, performed similarly, with no significant difference in accuracy. Nevertheless, in a strict comparison, BDT trained faster because it is easier to tune, the training process of ET depended less on the distribution of the samples, and RF was more accurate. Hyper parameter tuning was performed with the random search method and slightly increased the accuracy of the machine learning algorithms. It is recommended that the effectiveness of other optimization methods for accident-risk modeling be investigated in future work.
The mean estimated accident risk in different circadian alertness situations was investigated. The lack of traffic volume data was the main limitation of this study. Contrary to expectation, dangerously low alertness at 05:00 to 06:00 did not result in higher accident risk, owing to the particular traffic conditions at that time. Consequently, RTA risk modeling that uses accident points without normalizing accident frequency by traffic volume might reduce the quality of the predictions and of the feature importance evaluation.
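The effect of normalizing accident frequency by traffic volume can be seen in a small worked example. All counts and volumes below are made up for illustration and are not taken from the study's data; the rate per 10,000 vehicles is one common choice of normalization, assumed here.

```python
# Hypothetical hourly accident counts and traffic volumes (vehicles per hour).
hours = ["05:00-06:00", "12:00-13:00", "18:00-19:00"]
accidents = [12, 40, 55]
vehicles = [2_000, 20_000, 25_000]

# Raw frequency versus accidents per 10,000 vehicles.
counts = dict(zip(hours, accidents))
rates = {h: 10_000 * a / v for h, a, v in zip(hours, accidents, vehicles)}

for h in hours:
    print(f"{h}: {counts[h]:3d} accidents, {rates[h]:.1f} per 10,000 vehicles")

riskiest_by_count = max(counts, key=counts.get)
riskiest_by_rate = max(rates, key=rates.get)
print(f"Riskiest hour by raw count: {riskiest_by_count}")
print(f"Riskiest hour by rate:      {riskiest_by_rate}")
```

With these made-up numbers, the early-morning hour has the fewest accidents but the highest rate per vehicle, which is the kind of reversal that models trained on raw accident points cannot capture.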
Freeways were identified as the riskiest road type when the driver is not alert. The risk of accidents on freeways was predicted to be high at all times of day, and the Khalij Fars, Zanjan-Ghazvin, Tabriz-Zanjan, Amirkabir, Saveh-Hamedan, Tehran-Saveh, and Karaj-Ghazvin freeways had the highest estimated accident risk, in that order. In general, risk mapping of RTA can help decision-makers take the necessary actions by clarifying the impact of different factors on road safety, identifying risky areas, and prioritizing accident-prone locations.

Data Availability Statement: The data used in the current study are not publicly available due to integrity and legal reasons but are available from the corresponding author on reasonable request.