Applied Sciences
  • Article
  • Open Access

9 February 2022

Comprehensive Analysis of Traffic Accidents in Seoul: Major Factors and Types Affecting Injury Severity

1 Department of Computer Science and Engineering, Kongju National University, Cheonan 31080, Korea
2 Department of Urban Systems Engineering, Kongju National University, Cheonan 31080, Korea
3 Intelligent Convergence Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon 34129, Korea
* Authors to whom correspondence should be addressed.
This article belongs to the Special Issue Focus on Traffic Safety: From Artificial Intelligence Approaches to Other Advances

Abstract

Accident and fatality rates of traffic accidents worldwide are steadily increasing every year; thus, considerable effort has been made to prevent traffic accidents and prepare countermeasures. This study aims to identify the major factors and types that affect the severity of traffic accidents in Seoul by utilizing the Seoul Metropolitan Government’s traffic accident dataset. To achieve this, we perform a comprehensive analysis using both supervised and unsupervised machine learning techniques. From the experiments, we derived several critical factors that affect the severity of traffic accidents via supervised learning methods (i.e., ensemble-based and regression-based algorithms) and discovered dominant accident types via unsupervised learning methods (i.e., clustering-based algorithms). One of our primary findings is that, contrary to common expectation, environmental factors such as weather, season, and day of the week do not significantly affect the severity of traffic accidents in Seoul. Moreover, all methods highlight the importance of pedestrian-related factors, implying that more meticulous institutional measures for pedestrians are needed to reduce the impact of serious traffic accidents in Seoul.

1. Introduction

Traffic accidents have emerged as a serious social problem, as the number of car registrations has increased rapidly owing to global economic growth and improvements in living standards [1,2,3]. According to the report published by the World Health Organization (WHO) [4] in 2018, nearly 1.35 million people worldwide die in traffic accidents every year, implying that one person dies in a traffic accident every 24 s, an increase of 100,000 people compared to 2015. In addition, according to the Centers for Disease Control and Prevention (CDC) [5], the cost of medical and productivity losses associated with deaths from car accidents in one year exceeds $63 billion. Therefore, it is necessary to identify the major factors and types of traffic accidents so that accidents can be prevented in advance based on the results.
Along these lines, a number of related studies and policies are being carried out abroad. However, there is still a lack of understanding of the major causes and mechanisms of serious traffic accidents in Seoul. Seoul is the largest city in South Korea, with various modes of transportation used by almost 10 million citizens and vehicles every day, implying that traffic accidents there cause tremendous social and economic losses.
The results of traffic accident data analysis may vary depending on the characteristics of the local traffic environment. Thus, it is necessary to focus on the intrinsic properties of Seoul for a deeper understanding of the causes and mechanisms of traffic accidents in the city. Furthermore, traffic accidents are caused by a combination of various factors such as human error, road conditions, and the environment, so a comprehensive analysis of traffic accident datasets is needed. Additionally, no single method always yields the best results in all cases; therefore, various methodologies with different philosophies should be used for such a complex analysis.
In this study, we aim to identify the significant factors and types that affect the severity of traffic accidents by focusing on the case of Seoul. To this end, we analyzed large-scale traffic accident data from Seoul covering various factors, using three widely used machine learning techniques: ensemble-based, regression-based, and clustering-based methodologies. Throughout the analysis, we found that the severity of traffic accidents is mainly determined by pedestrian-related variables, not by driver-related variables, which differs from the results reported in previous studies [6,7,8,9,10]. We attribute this to a unique characteristic of Seoul: its daily traffic volume of almost 10 million vehicles [11] has inevitably fostered a vehicle-oriented transportation environment.
This paper makes the following contributions.
  • We analyzed a set of features that affect the number of traffic accidents by classifying the features into three main factors (human, road, and environment) with a focus on Seoul, the capital of the Republic of Korea.
  • We unveiled the significant features that affect the severity of traffic accidents by exploiting various machine learning approaches: ensemble, regression, and clustering-based analytics.
  • By performing further qualitative analysis, we suggest that establishing more preventive measures against pedestrian accidents would be an adequate approach to reduce the number of fatal injuries due to traffic accidents in Seoul.
The remainder of this paper is organized as follows: Section 2 reviews previous research. Section 3 describes the characteristics of the dataset we used for the analysis. Section 4 introduces three methods that we adopted, and Section 5 shows our findings based on the analysis using these methods. Finally, Section 6 concludes the study.

3. Characteristics of Dataset

The dataset we used includes 362,298 cases of traffic accidents that occurred between 2010 and 2018 in Seoul, provided by a public data portal [22]. The characteristics of the dataset are summarized in Table 2.
Table 2. Characteristics of the dataset.
In the pre-processing step, data with unknown values (i.e., null) were removed. Additionally, extremely low-frequency attributes (i.e., less than 0.01%) were removed because it is challenging to develop a good model if the data distribution is imbalanced.
The classification criteria for slight injury and serious injury are as follows: “slight injury” refers to an injury that requires treatment for more than five days but less than three weeks due to a traffic accident. In contrast, “serious injury” refers to an injury that requires treatment for at least three weeks due to a traffic accident. Further, “death” is defined as death within 30 days from the time of a traffic accident. In this study, serious injuries and deaths were both treated as serious accidents, including life-threatening cases [23].
Because all of the data are categorical, we used one-hot encoding to transform them into a vector space model so that conventional machine learning algorithms could be applied. However, one-hot encoding can dramatically increase the number of variables, resulting in poor classification performance. Therefore, we grouped the attribute values to reduce the number of attributes. For example, days of occurrence were grouped by season, and accident occurrence times were grouped into dawn (0:00–6:00), day (6:00–18:00), and night (18:00–24:00). Further, ages were grouped into underage (0–18 years), youth (19–34 years), middle-aged (35–49 years), old-aged (50–64 years), and elderly (≥65 years). After pre-processing, the classification criteria and distribution ratios for each variable are presented in separate tables for human factors, road factors, and environmental factors (Table 3, Table 4 and Table 5). Even after grouping, the number of variables remains large.
Table 3. Categories and frequencies of variables (human factor).
Table 4. Categories and frequencies of variables (road factor).
Table 5. Categories and frequencies of variables (environmental factor).
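The grouping and one-hot encoding steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`group_hour`, `group_age`, `one_hot`) and the variable names `time` and `victim_age` are hypothetical, and only the bin boundaries are taken from the text.

```python
# Hypothetical helpers; only the bin boundaries come from the text above.

def group_hour(hour):
    """Map an hour of day (0-23) to dawn, day, or night."""
    if 0 <= hour < 6:
        return "dawn"
    if 6 <= hour < 18:
        return "day"
    if 18 <= hour < 24:
        return "night"
    raise ValueError(f"hour out of range: {hour}")


def group_age(age):
    """Map an age in years to one of the five age groups."""
    if age < 0:
        raise ValueError(f"age out of range: {age}")
    if age <= 18:
        return "underage"
    if age <= 34:
        return "youth"
    if age <= 49:
        return "middle-aged"
    if age <= 64:
        return "old-aged"
    return "elderly"


def one_hot(records):
    """One-hot encode a list of dicts holding categorical values.

    Returns (columns, rows): columns are "variable=value" names in sorted
    order, and each row is a list of 0/1 indicators.
    """
    columns = sorted({f"{k}={v}" for rec in records for k, v in rec.items()})
    index = {name: i for i, name in enumerate(columns)}
    rows = []
    for rec in records:
        row = [0] * len(columns)
        for k, v in rec.items():
            row[index[f"{k}={v}"]] = 1
        rows.append(row)
    return columns, rows


# Two made-up accident records, grouped first and then encoded.
records = [
    {"time": group_hour(2), "victim_age": group_age(70)},
    {"time": group_hour(14), "victim_age": group_age(25)},
]
columns, rows = one_hot(records)
```

Grouping before encoding keeps the number of indicator columns small: an hour variable would otherwise produce 24 columns instead of 3.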

3.1. Human Factors

Table 3 shows the categories and frequencies of human factors. Human factors are classified into six categories: accident type, violation of law, perpetrator’s gender, perpetrator’s age, victim’s gender, and victim’s age. First, in the category of accident type, crossing has the most significant influence on the severity of traffic accidents when considering the frequency and ratio of serious injuries. Second, in the category of violation of law, speeding shows the highest ratio of serious injuries at 78.7% despite its low frequency. Third, the perpetrator’s gender and age have little effect on the severity of the traffic accident, whereas the victim’s gender and age matter more: severity is higher when the victim is a woman or an older adult.

3.2. Road Factors

Table 4 shows the categories and frequencies of road factors. Road factors are classified into four categories: road surfaces, road types, perpetrator’s vehicle types, and victim’s vehicle types. First, the condition of the road surface does not have a significant effect on the severity of traffic accidents. Second, when the road type is a crosswalk, accidents with more serious injuries occur. Furthermore, in the category of perpetrator’s vehicle type, the proportion of accidents with serious injury is highest for heavy equipment, followed by specialty vehicles, while in the category of victim’s vehicle type, the pedestrian shows the highest ratio of accidents with serious injury at 49.8%.

3.3. Environment Factors

Table 5 shows the categories and frequencies of environmental factors. Environmental factors are classified into four categories: season, time, day of the week, and weather. Examining the day of the week, Sunday appears to have fewer traffic accidents than other days. Further, across time, day of the week, and weather, there is little difference in the ratio of serious injury accidents. In particular, contrary to common expectation, the proportion of serious injuries on snowy or rainy days is not higher than that on other days. Thus, based on these observations, environmental factors seem to have little connection with the severity of traffic accidents compared to other factors.

4. Analytical Methods

Widely used analytical methodologies, including ensemble-based and regression-based classification, were applied to investigate the relationship between the dependent variable (i.e., the severity of traffic accidents) and the independent variables (i.e., human, road, and environmental factors). We also adopted clustering to group the data and characterize each group so that we could discover dominant patterns of severe traffic accidents. In this work, we used eXtreme Gradient Boosting (XGBoost) because of its robustness against overfitting, which is critical for classification problems [24]; logistic regression because of its suitability for handling categorical data [25]; and DBSCAN because it does not require the number of clusters to be specified in advance [26].

4.1. XGBoost

XGBoost [24] is an ensemble algorithm that combines multiple decision trees; it is a boosting-based model that improves on the overfitting behavior, speed, and stability of earlier tree-based models. XGBoost trains decision trees sequentially on the training data, and its objective function is defined as follows.
$$Obj(\theta) = \sum_{i=1}^{m} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{K} \Omega(f_k), \qquad \theta = \{f_1, f_2, \ldots, f_K\} \tag{1}$$
Here, $i$ indexes the $i$-th sample in the dataset, $m$ is the total number of samples fed into the $k$-th tree, and $K$ is the total number of trees. $y_i$ is the class label, and $\hat{y}_i$ is the predicted label. $l$ is the loss function and $\Omega$ is the regularization term.
XGBoost adopts an additive strategy to improve the value of the objective function by adding a new decision tree to the previous ones at each iteration. When the $t$-th tree is constructed, the predicted value $\hat{y}_i^{(t)}$ can be formulated as follows.
$$\hat{y}_i^{(t)} = \sum_{k=1}^{t-1} f_k(x_i) + f_t(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i) \tag{2}$$
According to Equations (1) and (2), the objective function can be formulated as follows.
$$Obj(\theta)^{(t)} = \sum_{i=1}^{m} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{3}$$
If the tree contains a total of $T$ leaf nodes, indexed by $j$, and the weight of leaf node $j$ is $w_j$, then the regularization term $\Omega(f)$ is defined as follows.
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{4}$$
Here, $\gamma$ and $\lambda$ are penalty factors.
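To make the additive prediction and the regularized objective concrete, the following toy sketch evaluates them for two hand-written decision stumps. It is an illustration only, not the XGBoost library or the authors' code: the stump functions, feature names, and sample values are made up, and the binary logistic loss is an assumed choice for $l$, which the text does not specify.

```python
import math


def logistic_loss(y, score):
    """Assumed loss l for a label y in {0, 1} and a raw additive score."""
    return math.log1p(math.exp(-score)) if y == 1 else math.log1p(math.exp(score))


def additive_score(trees, x):
    """Additive prediction: the sum of the outputs of all trees built so far."""
    return sum(tree(x) for tree in trees)


def tree_penalty(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum_j w_j^2 for one tree."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(samples, labels, trees, leaf_weights_per_tree, gamma, lam):
    """Data term plus the penalty of every tree in the ensemble."""
    data_term = sum(
        logistic_loss(y, additive_score(trees, x)) for x, y in zip(samples, labels)
    )
    penalty = sum(tree_penalty(ws, gamma, lam) for ws in leaf_weights_per_tree)
    return data_term + penalty


# Two hypothetical stumps, each with two leaves.
def stump_pedestrian(x):
    return 0.5 if x["victim_is_pedestrian"] else -0.5


def stump_speeding(x):
    return 0.2 if x["speeding"] else -0.2


samples = [
    {"victim_is_pedestrian": True, "speeding": False},
    {"victim_is_pedestrian": False, "speeding": False},
]
labels = [1, 0]  # 1 = serious accident, 0 = slight accident (made-up data)
trees = [stump_pedestrian, stump_speeding]
leaf_weights = [[0.5, -0.5], [0.2, -0.2]]

total = objective(samples, labels, trees, leaf_weights, gamma=1.0, lam=1.0)
```

Larger values of `gamma` penalize trees with many leaves, while larger values of `lam` shrink the leaf weights; both push the model toward simpler trees.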

4.2. Logistic Regression

Regression analysis is a statistical technique for predicting the value of a dependent variable from independent variables by modeling the relationship between them. It is used to analyze the relevance of the independent variables to the dependent variable. A typical multiple linear regression (MLR) formula is given by Equation (5).
$$p_{MLR}(y_i \mid x_i) = v + w^T x_i, \qquad i = 1, 2, \ldots, m \tag{5}$$
where $X = [x_1, \ldots, x_m]^T \in \mathbb{R}^{m \times n}$ is a set of training data and $Y = [y_1, \ldots, y_m] \in \mathbb{R}^m$ is a set of labels. $w \in \mathbb{R}^n$ is the vector of weights and $v$ is the intercept. $p_{MLR}(y_i \mid x_i)$ is the predicted value of $y_i$ when the independent variable takes the value $x_i$. The output of a linear regression is unbounded, so the value of $p_{MLR}(y_i \mid x_i)$ can extend to infinity. If the dependent variable is a binary categorical variable, linear regression therefore does not properly represent the relationship between the independent and dependent variables.
Therefore, logistic regression (LR) [25] can be used instead of linear regression if the dependent variable is binary (i.e., $y_i \in \{-1, +1\}$). Using logistic regression, the value of the dependent variable can be represented as a value between zero and one. Logistic regression is given by Equation (6).
$$p_{LR}(y_i \mid x_i) = \frac{\exp\left((v + w^T x_i)\, y_i\right)}{1 + \exp\left((v + w^T x_i)\, y_i\right)} \tag{6}$$
When the independent variable takes the value $x_i$, the predicted value $p_{LR}(y_i \mid x_i)$ can be interpreted as a probability between 0 and 1.
The average logistic loss function is the negative log-likelihood of the logistic model averaged over all samples.
$$l_{avg}(w, v) = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp\left(-y_i (w^T x_i + v)\right)\right) \tag{7}$$
The model parameters $w$ and $v$ are determined by maximum likelihood estimation, that is, by minimizing the average logistic loss function.
$$\text{minimize} \quad l_{avg}(w, v) \tag{8}$$
By adding a weight-regularizing term to the average logistic loss function, which is a standard technique for preventing overfitting, we can keep the weights from growing too large and improve the generalization performance of the model.
$$\text{minimize} \quad l_{avg}(w, v) + R(C) \tag{9}$$
Here, $R(C)$ is the regularization function, which can take different forms depending on the regularization method. The $l_1$-regularized logistic regression problem is
$$\text{minimize} \quad l_{avg}(w, v) + \frac{1}{C} \sum_{i=1}^{n} |w_i| \tag{10}$$
The $l_2$-regularized logistic regression problem is
$$\text{minimize} \quad l_{avg}(w, v) + \frac{1}{C} \sum_{i=1}^{n} w_i^2 \tag{11}$$
where $C$ is a regularization parameter that adjusts the balance between the magnitude of the weight vector, measured by the $l_1$-norm or $l_2$-norm, and the average logistic loss.
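The logistic model and its regularized loss can be written out directly in a few lines. The sketch below is a plain-Python illustration of the formulas in this section for labels in {-1, +1}; the function names and the toy data are assumptions, and no optimizer is included.

```python
import math


def linear_score(x, w, v):
    """v + w^T x for one sample."""
    return v + sum(wi * xi for wi, xi in zip(w, x))


def p_lr(x, y, w, v):
    """Probability of label y in {-1, +1} under the logistic model."""
    z = linear_score(x, w, v) * y
    return 1.0 / (1.0 + math.exp(-z))  # equals exp(z) / (1 + exp(z))


def avg_logistic_loss(X, Y, w, v):
    """Average negative log-likelihood over all samples."""
    m = len(X)
    return sum(
        math.log1p(math.exp(-y * linear_score(x, w, v))) for x, y in zip(X, Y)
    ) / m


def regularized_loss(X, Y, w, v, C=1.0, penalty="l1"):
    """Average logistic loss plus (1 / C) times the l1 or l2 penalty on w."""
    if penalty == "l1":
        reg = sum(abs(wi) for wi in w)
    elif penalty == "l2":
        reg = sum(wi * wi for wi in w)
    else:
        raise ValueError(f"unknown penalty: {penalty}")
    return avg_logistic_loss(X, Y, w, v) + reg / C


# Toy data: two one-hot encoded samples with labels +1 (serious) and -1 (slight).
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [1, -1]
w = [1.0, -2.0]
v = 0.0

base = avg_logistic_loss(X, Y, w, v)
with_l1 = regularized_loss(X, Y, w, v, C=1.0, penalty="l1")
with_l2 = regularized_loss(X, Y, w, v, C=1.0, penalty="l2")
```

A smaller `C` gives the penalty more weight, so the fitted coefficients are pulled more strongly toward zero; the $l_1$ penalty tends to set some of them exactly to zero.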

4.3. DBSCAN

DBSCAN [26] is an unsupervised learning method that groups data with similar characteristics by clustering the dense regions of the data. Let $D$ be the database, and let each point $p, q \in D$ be a $d$-dimensional vector. $N_{Eps}(p) = \{q \in D \mid dist(p, q) \le Eps\}$ is the set of points within radius $Eps$ of point $p$. If $p \in N_{Eps}(q)$ and $|N_{Eps}(q)| \ge MinPts$, then point $q$ is a core point and point $p$ is directly density-reachable from point $q$. In other words, $q$ is classified as a core point if at least $MinPts$ points lie within radius $Eps$ of $q$. If there is a chain of points $p_1, \ldots, p_n$ with $p_1 = q$ and $p_n = p$ such that each $p_{i+1}$ is directly density-reachable from $p_i$, then point $p$ is density-reachable from point $q$. If there is a point $o$ from which both $p$ and $q$ are density-reachable, then $p$ and $q$ are density-connected. Given clusters $C_i$ within $D$, the noise is defined as $noise = \{p \in D \mid \forall i : p \notin C_i\}$, that is, the set of points that do not belong to any cluster [26].
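The definitions above translate into a short procedure: find a core point, then grow its cluster through everything that is density-reachable from it. The following is a minimal, quadratic-time sketch for illustration only. It assumes Euclidean distance (the text does not state which distance was used) and is not the implementation used in the experiments.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point
    return labels


# Two dense groups of four points each and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
labels = dbscan(points, eps=1.5, min_pts=3)
```

On this toy input the two squares form two clusters and the isolated point is left as noise, without the number of clusters ever being specified.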

5. Results

We conducted an experimental analysis of the Seoul Metropolitan Government’s traffic accident dataset with two focuses: (i) critical factors affecting the severity of traffic accidents (Section 5.1) and (ii) representative types of traffic accidents (Section 5.2). All experiments were performed on a PC with an AMD Ryzen 7 2700X eight-core 3.7 GHz CPU and 32 GB of RAM, running Windows 10. All algorithms were implemented in Python. In the data preprocessing step, we used one-hot encoding to handle the categorical data. For supervised learning methods, the ratio between the training set and the test set is 75/25. All results in this section are statistically significant, since the p-values are less than the typical significance level of 0.01. The source code of all experiments is fully available at https://github.com/hyunchul1357/traffic-accident-analysis (accessed on 13 January 2022).

5.1. Factor Analysis through XGBoost and Logistic Regression

In XGBoost, three hyper-parameters must first be optimized to prevent overfitting and increase accuracy. Table 6 lists the results of the hyper-parameter optimization. The highest accuracy, 68.95%, is achieved when the learning rate is 0.1, the depth of the tree is 3, and the number of weak learners is 200. However, the difference in accuracy across hyper-parameter settings is not large. In general, hyper-parameter tuning is performed to prevent overfitting or underfitting so that the model captures the true trends in the dataset. We attribute the small effect of hyper-parameter optimization to the dataset having a clear tendency.
Table 6. Effects of hyper-parameters of XGBoost.
Table 7 lists the top five independent variables after training XGBoost.
Table 7. Importance of independent variables.
Comparing Table 7 with Table 3, Table 4 and Table 5 shows that the victim’s vehicle type = pedestrian, violation of law = signal violation, and victim’s vehicle type = two-wheeler are considered important variables in judging serious accidents, while victim’s vehicle type = passenger car and perpetrator’s vehicle type = passenger car are considered important variables in determining slight accidents. In particular, the victim’s vehicle type = pedestrian was chosen as the most important variable in determining the severity of traffic accidents.
In logistic regression, we need to choose L1 or L2 regularization and optimize the C value, which controls the strength of the regularization. Table 8 shows the results of the hyper-parameter optimization. Based on the experimental results, the C value was set to 1 with L1 regularization. As with XGBoost, the difference in accuracy across hyper-parameter settings is not large in logistic regression. This again supports the clear trend of the dataset.
Table 8. Effects of hyper-parameters of Logistic Regression.
Table 9 shows the top 10 regression coefficients. A higher value means a higher influence on the severity of traffic accidents. The result shows that whether the perpetrator is speeding has the most significant impact, and that serious traffic accidents are also strongly associated with victims who are on a motorized bicycle, on foot, or on a bicycle, and with elderly victims. Additionally, when the perpetrator’s vehicle type is a two-wheeler, it has a high impact on the severity of traffic accidents, supporting the claim that motorcyclists are more likely to be seriously injured in a traffic crash than people in passenger cars. In fact, the death rate for two-wheelers constantly increased in Seoul from 2010 to 2018 due to the increase in the number of single households and the need for delivery services, although the total rate of death by traffic accidents slowly decreased during the same period.
Table 9. Ten Variables leading to serious traffic accidents.
It is worth noting that the perpetrator’s violation of law = signal violation, victim’s vehicle type = pedestrian, and victim’s vehicle type = two-wheeler variables are derived by not only logistic regression but also XGBoost as critical variables affecting the severity of traffic accidents. This demonstrates the necessity to prepare countermeasures against the perpetrators’ signal violations and accidents involving two-wheeled vehicles and pedestrians.
Table 10 shows the bottom 10 regression coefficients. A lower value means a higher influence on slight traffic accidents. The results show that the backup collision is most closely related to slight accidents, followed by victim’s vehicle type = passenger car and accident type = passing on the edge of the road. In general, the rate of slight accidents appears to be high because the vehicle speed is not high when backing up or passing along the edge of the road. In addition, in Table 4, when the victim’s vehicle type is a passenger car, many slight accidents occur, and a similar trend appears in the regression analysis result.
Table 10. Ten variables leading to slight traffic accidents.

5.2. Cluster Analysis through DBSCAN

In DBSCAN, there are two input parameters: (i) eps and (ii) MinPts. We determined these hyper-parameters following the heuristic suggested in the literature [26,27]. MinPts is set to approximately twice the dimensionality, thus 30 for a 14-dimensional space. Then, the value of eps is estimated by plotting the distance to the (MinPts−1)th nearest neighbor for each of the sampled points, sorted in descending order, and finding the distance at the “elbow” of the curve. As a result of DBSCAN, we derived three major clusters corresponding to 97.8% of the entire dataset.
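The sorted k-distance curve used to choose eps can be computed as sketched below. This is an illustration under assumptions: Euclidean distance, a brute-force neighbor search, and a hypothetical helper name `k_distances`. Locating the elbow is left to visual inspection of the resulting curve, as in the heuristic described above.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending.

    The point itself is excluded from its own neighbor list.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)


# Heuristic from the text: MinPts is about twice the dimensionality
# (2 * 14 = 28, rounded to 30 in the paper), and k = MinPts - 1.
dim = 14
min_pts = 2 * dim
k_for_paper = min_pts - 1

# Small one-dimensional toy example with one far-away point.
toy_points = [(0,), (1,), (2,), (3,), (20,)]
curve = k_distances(toy_points, k=2)
```

In the toy curve, the isolated point produces one very large value followed by a sharp drop; an eps just below that drop would leave the isolated point as noise while keeping the dense points together.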
Table 11 shows the results of arranging the modes for each variable in each cluster after clustering. Given the high proportion of variables within the accident type = side collision and road type = at an intersection, cluster 1 appears to be a cluster for side collision accidents within intersections. Further, cluster 2 appears to be a cluster for rear-end collision accidents, given that the proportion of accident type = rear-end collision is high. In addition, cluster 3 is considered to be a cluster for pedestrian accidents, given that the proportion of the victim’s vehicle type = pedestrian is 100%. In cluster 3, violation of law = non-compliance with safe-driving obligation was 92.1%, which was much higher than that of the other two clusters. It can be inferred that non-compliance with safe-driving obligation leads to many pedestrian accidents. Overall, the clustering results show that environmental factors do not significantly influence traffic accidents given the distribution ratio. This supports the previous results, in which environmental factors such as weather do not significantly impact the occurrence of traffic accidents in Seoul. In addition, it is also interesting to note that in all clusters, the gender of the victim is predominantly male.
Table 11. Characteristics of three major clusters.
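A per-cluster summary of this kind, giving the most frequent value of each variable and its share within the cluster, can be computed as sketched below. The helper name `cluster_modes`, the variable names, and the records are made up for illustration; noise points are assumed to carry the label -1.

```python
from collections import Counter, defaultdict


def cluster_modes(records, labels, noise_label=-1):
    """For each cluster, map every variable to (most common value, share)."""
    by_cluster = defaultdict(list)
    for rec, lab in zip(records, labels):
        if lab != noise_label:
            by_cluster[lab].append(rec)

    summary = {}
    for lab, recs in by_cluster.items():
        summary[lab] = {}
        for var in recs[0]:
            value, count = Counter(r[var] for r in recs).most_common(1)[0]
            summary[lab][var] = (value, count / len(recs))
    return summary


# Made-up records and cluster labels, with one noise point.
records = [
    {"accident_type": "side collision", "victim_vehicle": "passenger car"},
    {"accident_type": "side collision", "victim_vehicle": "two-wheeler"},
    {"accident_type": "crossing", "victim_vehicle": "pedestrian"},
    {"accident_type": "crossing", "victim_vehicle": "pedestrian"},
    {"accident_type": "rear-end collision", "victim_vehicle": "passenger car"},
]
labels = [0, 0, 1, 1, -1]
summary = cluster_modes(records, labels)
```

A share close to 1.0 for a variable indicates that the cluster is essentially defined by that value, which is how a cluster can be read as, for example, a pedestrian-accident cluster.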

6. Conclusions and Discussion

In this study, we used a traffic accident dataset covering accidents in Seoul from 2010 to 2018 to identify the major factors and types that affect the severity of traffic accidents. To obtain a good classification, infrequent or skewed data were removed or re-grouped during pre-processing, and the data were then analyzed using XGBoost, logistic regression, and DBSCAN, which are representative methodologies widely used in the field. In the XGBoost results, cases where the victim was a pedestrian, the perpetrator violated a signal, or the victim was riding a two-wheeled vehicle were found to be important variables in judging a serious traffic accident. In addition, cases where the victim’s or the perpetrator’s vehicle type was a passenger car had a significant influence in judging a slight accident. In logistic regression, the top and bottom 10 variables were analyzed according to the regression coefficient values to identify factors affecting the severity of traffic accidents. As a result, the variables most strongly associated with serious traffic accidents were, in order, speeding by the perpetrator, two-wheelers or motorized bicycles, elderly victims, pedestrians, and bicycles. In contrast, environmental factors did not significantly affect traffic accidents. The clustering analysis yielded three major clusters, represented by side collisions within intersections, rear-end collisions, and pedestrian accidents. Considering the three methodologies as a whole, environmental factors such as season, day of the week, and weather were found to have no significant effect on the severity of traffic accidents. On the other hand, it is worth noting that pedestrian-related variables appear in common across all three approaches, which suggests establishing more preventive measures against pedestrian accidents in order to reduce fatal injuries from traffic accidents in Seoul.
In practice, actual traffic accidents are caused by a combination of more specific and diverse factors than the variables in the dataset used in this study. For example, the effect of a variable such as speeding may vary depending on how fast the vehicle was actually traveling. Factors such as the driver’s vision or seat belt use may also affect the outcome. If data including more diverse information become available, a more specific analysis will be possible; thus, an active open data policy is needed. Notwithstanding the aforementioned limitations, our study still provides important insights into the unique and important features of traffic conditions in Seoul for furthering the city’s traffic safety.

Author Contributions

Conceptualization, J.K. and K.H.; methodology, J.K., K.H., I.K. and H.J.; software, H.J.; validation, J.K., I.K. and H.J.; investigation, H.J.; resources, J.K. and H.J.; data curation, H.J.; writing—original draft preparation, J.K. and H.J.; writing—review and editing, J.K., K.H. and I.K.; visualization, J.K., K.H. and H.J.; supervision, J.K. and K.H.; project administration, J.K. and K.H.; funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the research grant of the Kongju National University in 2021 and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A4A1031509).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available at Public Data Portal, http://www.data.go.kr (accessed on 18 March 2021).

Acknowledgments

The authors would like to thank the reviewers and editors for helping to improve this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, J.; Bi, J.; Zhang, H.Y.; Li, F.Y.; Zhou, J.B.; Liu, B.B. Evolvement of the relationship between environmental pollution accident and economic growth in China. China Environ. Sci. 2010, 30, 571–576.
  2. Law, T.H.; Noland, R.B.; Evans, A.W. Factors associated with the relationship between motorcycle deaths and economic growth. Accid. Anal. Prev. 2009, 41, 234–240.
  3. Kopits, E.; Cropper, M. Traffic fatalities and economic growth. Accid. Anal. Prev. 2009, 37, 169–178.
  4. World Health Organization (WHO). Global Status Report on Road Safety 2018. Available online: https://www.who.int/publications/i/item/9789241565684 (accessed on 2 November 2020).
  5. Centers for Disease Control and Prevention. WISQARS Injury CENTER. Available online: https://www.cdc.gov/injury/wisqars/ (accessed on 23 April 2021).
  6. Salgado, M.S.L.; Colombage, S.M. Analysis of fatalities in road accidents. Forensic Sci. Int. 1988, 36, 91–96.
  7. Jacobs, G.D.; Sayer, I. Road accidents in developing countries. Accid. Anal. Prev. 1983, 15, 33–353.
  8. Mohamed, E.A. Predicting causes of traffic road accidents using multi-class support vector machines. Comput. Commun. 2014, 11, 441–447.
  9. Shanks, N.J.; Ansari, M.; Ai-Kalai, D. Road traffic accidents in Saudi Arabia. Public Health 1994, 108, 27–34.
  10. Yang, B.M.; Kim, J.H. Road traffic accidents and policy interventions in Korea. Inj. Contr. Saf. Promot. 2003, 10, 89–94.
  11. Seoul Urban Solution Agency. Seoul Transportation, Report: Safe and Convenient Seoul Transportation that Puts People First. Available online: http://susa.or.kr/sites/default/files/resources/%5BSeoul%20Urban%20Solutions%5D%5BTransportation%5DSeoul%20Public%20Transportation%28English%29.pdf (accessed on 7 March 2021).
  12. Chong, M.; Abraham, A.; Paprzycki, M. Traffic accident analysis using machine learning paradigms. Informatica 2005, 29, 89–98.
  13. Feng, M.; Zheng, J.; Ren, J.; Liu, Y. Towards Big Data Analytics and Mining for UK Traffic Accident Analysis, Visualization & Prediction. In Proceedings of the 2020 12th International Conference on Machine Learning and Computing, Shenzhen, China, 15–17 February 2020.
  14. De Oña, J.; Mujalli, R.O.; Calvo, F.J. Analysis of traffic accident injury severity on Spanish rural highways using Bayesian networks. Accid. Anal. Prev. 2011, 43, 402–411.
  15. Chen, H.; Zhao, Y.; Ma, X. Critical factors analysis of severe traffic accidents based on Bayesian network in China. J. Adv. Transp. 2020, 4, 8878265.
  16. Dong, B.; Ma, X.; Chen, F.; Chen, S. Investigating the differences of single-vehicle and multivehicle accident probability using mixed logit model. J. Adv. Transp. 2018, 2018, 2702360.
  17. Champahom, T.; Jomnonkwao, S.; Chatpattananan, V.; Karoonsoontawong, A.; Ratanavaraha, V. Analysis of rear-end crash on Thai highway: Decision tree approach. J. Adv. Transp. 2019, 2019, 2568978.
  18. Ahmed, L.A. Using logistic regression in determining the effective variables in traffic accidents. Appl. Math. Sci. 2017, 11, 2047–2058.
  19. Bhin, M.Y.; Son, S.K. Analysis of factors influencing traffic accident severity according to gender of bus drivers. J. Korean Soc. Transp. 2018, 36, 440–451.
  20. Lim, Y.J.; Moon, H.J.; Kang, P.K. Analysis on factors of traffic accident on roads having width of less than 9 meters. J. Korean Inst. Intell. Transp. Syst. 2014, 13, 96–106.
  21. Kim, T.H.; Kim, E.K.; Rho, J.H. Analysis of Old Driver’s Accident Influencing Factors Considering Human Factors. J. Korean Soc. Saf. 2009, 24, 69–77.
  22. Public Data Portal. Available online: http://www.data.go.kr/ (accessed on 18 March 2021).
  23. The Road Traffic Authority. Available online: http://taas.koroad.or.kr/sta/acs/exs/wordArngPopup.do (accessed on 11 January 2021).
  24. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  25. Peng, C.J.; Lee, K.L.; Ingersoll, G.M. An introduction to logistic regression analysis and reporting. J. Educ. Res. 2002, 96, 3–14.
  26. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 1996, 96, 226–231.
  27. Sander, J.; Ester, M.; Kriegel, H.P.; Xu, X. Density-based clustering in spatial databases: The algorithm gdbscan and its applications. Data Min. Knowl. Disc. 1998, 2, 169–194.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
