Decision Tree Ensemble Method for Analyzing Traffic Accidents of Novice Drivers in Urban Areas
Abstract
1. Introduction
2. Data and Methods
2.1. Accident Data
2.2. Preprocessing Step
2.3. Decision Trees
2.4. Information Root Node Variation
2.5. Split Criteria
2.5.1. Info-Gain Ratio (IGR)
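As a point of reference, the sketch below implements the standard C4.5 gain ratio, IGR(C, X) = [H(C) − H(C | X)] / H(X), where H is Shannon entropy, C is the class variable (for example, severity) and X is a candidate split feature. It is not necessarily the authors' implementation. The function names are hypothetical. The choice of log base 2 is an assumption that does not affect the ratio, because the base cancels between numerator and denominator.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain_ratio(feature: Sequence[Hashable], classes: Sequence[Hashable]) -> float:
    """IGR(C, X) = [H(C) - H(C | X)] / H(X) for one candidate split feature X."""
    n = len(classes)
    # Partition the class labels by the value the feature takes.
    groups = defaultdict(list)
    for x, c in zip(feature, classes):
        groups[x].append(c)
    # H(C | X) = sum over x of P(X = x) * H(C | X = x)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(classes) - conditional
    split_info = entropy(feature)  # H(X): penalises features with many values
    return gain / split_info if split_info > 0 else 0.0


# Toy example (invented data): X separates the two classes perfectly.
print(info_gain_ratio([1, 1, 2, 2], ["slight", "slight", "fatal", "fatal"]))  # 1.0
```

Dividing by H(X) counteracts the plain information gain's preference for features with many values.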
2.5.2. Imprecise Info-Gain (IIG)
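The sketch below illustrates an imprecise info-gain under the usual imprecise Dirichlet model (IDM) definition. The IDM parameter s = 1 is an assumption. The credal set K(C) contains every distribution with p(c) ≥ n_c / (N + s). Its upper entropy H* is the entropy of the maximum-entropy member of that set, found by spreading the extra mass s over the least frequent classes. The criterion is then IIG(C, X) = H*(K(C)) − Σ_x P(X = x) H*(K(C | X = x)). The function names are hypothetical.

```python
import math
from typing import Hashable, List, Sequence


def idm_max_entropy(counts: List[int], s: float = 1.0) -> List[float]:
    """Maximum-entropy distribution in {p : p_c >= n_c / (N + s), sum p_c = 1}.

    The extra mass s raises the smallest counts to a common level L with
    sum_c max(0, L - n_c) = s ("water filling").
    """
    total = sum(counts)
    order = sorted(counts)
    level = 0.0
    for k in range(1, len(order) + 1):
        # Level reached when the k smallest counts are raised together.
        level = (sum(order[:k]) + s) / k
        if k == len(order) or level <= order[k]:
            break
    return [max(n, level) / (total + s) for n in counts]


def upper_entropy(counts: List[int], s: float = 1.0) -> float:
    """H*(K): Shannon entropy (base 2) of the maximum-entropy IDM distribution."""
    return -sum(p * math.log2(p) for p in idm_max_entropy(counts, s) if p > 0)


def imprecise_info_gain(feature: Sequence[Hashable], classes: Sequence[Hashable],
                        class_values: Sequence[Hashable], s: float = 1.0) -> float:
    """IIG(C, X) = H*(K(C)) - sum_x P(X = x) * H*(K(C | X = x))."""
    n = len(classes)

    def counts(labels: Sequence[Hashable]) -> List[int]:
        # Keep every class state, even if it is absent from this subset.
        return [sum(1 for c in labels if c == v) for v in class_values]

    result = upper_entropy(counts(classes), s)
    for x in set(feature):
        subset = [c for fx, c in zip(feature, classes) if fx == x]
        result -= len(subset) / n * upper_entropy(counts(subset), s)
    return result


# Toy example (invented data): a perfect split on only four records.
print(round(imprecise_info_gain([1, 1, 2, 2], [1, 1, 2, 2], [1, 2]), 4))  # 0.0817
```

On such a small sample the imprecise gain (about 0.08) is far below the precise gain of 1 bit. In credal decision trees, a non-positive IIG is commonly used as a condition to stop splitting.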
2.5.3. Approximate Nonparametric Predictive Inference Model (A-NPIM)
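The sketch below covers the maximum-entropy step for A-NPIM. It assumes, as we understand the approximate model, that its credal set is given by the intervals [max(0, (n_c − 1)/N), min(1, (n_c + 1)/N)] for each class c. Shannon entropy is separable and strictly concave, so on such a box intersected with the probability simplex the maximiser has the form p_c = clip(t, l_c, u_c) for one common threshold t. We find t by bisection. The function names are hypothetical. Presumably, the resulting upper entropy takes the place of the IDM-based H* in an info-gain criterion.

```python
import math
from typing import List, Tuple


def anpim_intervals(counts: List[int]) -> List[Tuple[float, float]]:
    """Assumed A-NPIM bounds [max(0, (n_c - 1)/N), min(1, (n_c + 1)/N)], N > 0."""
    total = sum(counts)
    return [(max(0.0, (n - 1) / total), min(1.0, (n + 1) / total)) for n in counts]


def max_entropy_on_intervals(bounds: List[Tuple[float, float]]) -> List[float]:
    """Maximum-entropy distribution with lower_c <= p_c <= upper_c, sum p_c = 1.

    KKT conditions give p_c = clip(t, lower_c, upper_c) for a common t; the sum
    is non-decreasing in t, so t is located by bisection on [0, 1].
    Assumes feasibility: sum(lower) <= 1 <= sum(upper).
    """
    def clipped(t: float) -> List[float]:
        return [min(max(t, lo), up) for lo, up in bounds]

    low, high = 0.0, 1.0
    for _ in range(100):  # more halvings than double precision can resolve
        mid = (low + high) / 2
        if sum(clipped(mid)) < 1.0:
            low = mid
        else:
            high = mid
    return clipped((low + high) / 2)


def anpim_upper_entropy(counts: List[int]) -> float:
    """Upper entropy (base 2) of the assumed A-NPIM credal set."""
    probs = max_entropy_on_intervals(anpim_intervals(counts))
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Toy example (invented counts): one class seen three times, two never seen.
print(max_entropy_on_intervals(anpim_intervals([3, 0, 0])))  # approx. [2/3, 1/6, 1/6]
```

Unlike the IDM sketch, here every class also has an upper bound. Because of these upper bounds, the generic clip-and-bisect solver is used instead of one-sided water filling.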
2.6. Selection of the Best Rules
- Support (S): Let us consider a rule of type ‘IF A THEN B’ ($A \rightarrow B$). We define support as the fraction of the data set in which both A and B are present. In other words, it is the probability that the antecedent and the consequent occur together: $S(A \rightarrow B) = P(A \cap B)$.
- Probability (Pr): It is the probability that the consequent is present given that the antecedent is present. If we have a rule $A \rightarrow B$, then $Pr(A \rightarrow B) = P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, where $P(A)$ is the probability of $A$.
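Both measures reduce to simple counting. The sketch below makes three assumptions for illustration. Each accident record is a dictionary from feature codes to integer values. The consequent is SEV = 2. The helper names and toy records are hypothetical and not taken from the study's data set.

```python
from typing import Dict, List, Tuple

Record = Dict[str, int]


def satisfies(record: Record, conditions: Dict[str, int]) -> bool:
    """True when the record takes every feature = value pair in `conditions`."""
    return all(record.get(feature) == value for feature, value in conditions.items())


def support_and_probability(records: List[Record], antecedent: Dict[str, int],
                            consequent: Dict[str, int]) -> Tuple[float, float]:
    """S(A -> B) = P(A and B);  Pr(A -> B) = P(B | A) = P(A and B) / P(A)."""
    n_a = sum(1 for r in records if satisfies(r, antecedent))
    n_ab = sum(1 for r in records
               if satisfies(r, antecedent) and satisfies(r, consequent))
    support = n_ab / len(records) if records else 0.0
    probability = n_ab / n_a if n_a else 0.0
    return support, probability


# Invented toy records (not from the accident data).
toy_records = [
    {"SP_IN": 1, "ACT_TY": 2, "SEV": 2},
    {"SP_IN": 1, "ACT_TY": 2, "SEV": 1},
    {"SP_IN": 3, "ACT_TY": 1, "SEV": 1},
    {"SP_IN": 1, "ACT_TY": 2, "SEV": 2},
]
s, pr = support_and_probability(toy_records, {"SP_IN": 1, "ACT_TY": 2}, {"SEV": 2})
print(f"S = {s:.0%}, Pr = {pr:.1%}")  # S = 50%, Pr = 66.7%
```

Expressed as percentages, these appear to be the quantities reported in the S and Pr columns of the rule tables.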
3. Results and Discussion
3.1. Procedure to Obtain the Rule Set
3.2. Some Remarks about the Rules Related to Accidents in Intersections
3.3. Some Remarks about the Rules Related to Accidents That Do Not Occur in Intersections
3.4. Summary of the Rules Obtained
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
Feature | Value | Meaning | N_I (at intersections) | N_Ni (not at intersections) |
---|---|---|---|---|
SE: Season (the accident occurred in) | 1 | Winter (between December and February) | 7108 | 6446 |
SE | 2 | Spring (between March and May) | 7378 | 6599 |
SE | 3 | Summer (between June and August) | 7318 | 6594 |
SE | 4 | Autumn (between September and November) | 8204 | 7692 |
F_T: Fringe time (the accident happened between) | 1 | 0:01 and 6:00 | 2489 | 2433 |
F_T | 2 | 6:01 and 12:00 | 75,112 | 6780 |
F_T | 3 | 12:01 and 18:00 | 11,510 | 10,406 |
F_T | 4 | 18:01 and 0:00 | 8897 | 7712 |
S_W: Section Week (the accident occurred) | 1 | On Monday | 4302 | 3962 |
S_W | 2 | On Friday | 4939 | 4745 |
S_W | 3 | Between Tuesday and Thursday | 13,598 | 12,473 |
S_W | 4 | On Saturday or Sunday | 7169 | 6151 |
I_L: Degree of involvement (vehicles involved in the accident) | 1 | One vehicle | 3510 | 6733 |
I_L | 2 | More than one vehicle | 26,498 | 20,598 |
TR_N_INT: Traced no intersection (road layout where a non-intersection accident occurred) |  | Missing value |  | 5039 |
TR_N_INT | 1 | Straight line |  | 20,470 |
TR_N_INT | 2 | Curve |  | 1822 |
I_T: Intersection type (the intersection of the accident was) | 1 | T or X | 7028 |  |
I_T | 2 | X or + | 17,882 |  |
I_T | 3 | A roundabout | 4587 |  |
I_T | 4 | An exit or entrance link | 511 |  |
PR: Priority (priority was indicated by) |  | Missing value | 18,134 |  |
PR | 1 | Traffic officer directions | 18 |  |
PR | 2 | A traffic light | 3286 |  |
PR | 3 | A stop signal | 2838 |  |
PR | 4 | A pedestrian crossing signal | 3596 |  |
PR | 5 | Road signs | 1218 |  |
PR | 6 | General rule | 591 |  |
PR | 7 | Other | 327 |  |
RO_SU: Road Surface (the surface was) |  | Missing value | 2032 | 55 |
RO_SU | 1 | Dry and clean | 25,326 | 23,982 |
RO_SU | 2 | Wet | 58 | 98 |
RO_SU | 3 | Snowy/Frozen | 2364 | 2863 |
RO_SU | 4 | Oily | 34 | 71 |
RO_SU | 5 | Other state | 194 | 262 |
LUM: Luminosity (the accident happened) | 1 | In broad daylight | 19,701 | 18,278 |
LUM | 2 | In twilight | 1280 | 1377 |
LUM | 3 | During the night with sufficient road illumination | 8965 | 7483 |
LUM | 4 | During the night with insufficient road illumination | 62 | 193 |
WE_CO: Weather conditions (the accident occurred with) |  | Missing value | 2460 | 430 |
WE_CO | 1 | Good weather conditions | 25,489 | 24,370 |
WE_CO | 2 | Fog | 61 | 96 |
WE_CO | 3 | Light rain | 1727 | 2122 |
WE_CO | 4 | Adverse weather conditions | 271 | 313 |
RES_VIS: Restricted Visibility (cause of restricted visibility during the accident) |  | Missing value | 22,512 | 20,003 |
RES_VIS | 1 | No restrictions | 6588 | 6588 |
RES_VIS | 2 | Buildings | 29 | 54 |
RES_VIS | 3 | Land configuration | 57 | 77 |
RES_VIS | 4 | Weather conditions | 60 | 69 |
RES_VIS | 5 | Glare | 43 | 52 |
RES_VIS | 6 | Dust | 37 | 30 |
RES_VIS | 7 | Other restrictions | 416 | 458 |
PAV: Pavements (pavements at the accident scene) |  | Missing value | 262 | 575 |
PAV | 0 | Yes | 28,982 | 25,400 |
PAV | 1 | No | 764 | 1356 |
ACT_TY: Accident type (the accident was) | 1 | A collision | 25,114 | 19,621 |
ACT_TY | 2 | Running over a pedestrian | 1984 | 3225 |
ACT_TY | 3 | An overturn | 500 | 615 |
ACT_TY | 4 | A road exit | 1500 | 2004 |
ACT_TY | 5 | Another kind of accident | 910 | 1866 |
AG_FR: Age group (the driver was) | 1 | 20 years old or younger | 7373 | 6609 |
AG_FR | 2 | 21–27 years old | 10,673 | 9695 |
AG_FR | 3 | 28–59 years old | 11,401 | 10,528 |
AG_FR | 4 | 60+ years old | 561 | 499 |
SEX (the driver was) |  | Missing value | 37 | 26 |
SEX | 1 | A man | 22,306 | 20,338 |
SEX | 2 | A woman | 7935 | 6967 |
MAN: Maneuvers (the driver was) |  | Missing value | 1342 | 1354 |
MAN | 1 | Following the road | 7511 |  |
MAN | 2 | Overtaking | 530 | 726 |
MAN | 3 | Turning | 2410 | 761 |
MAN | 4 | Entering from another road or access | 714 | 249 |
MAN | 5 | Crossing an intersection | 2217 | 653 |
MAN | 6 | Driving in reverse | 85 | 247 |
MAN | 7 | Doing an abrupt gear shift | 379 | 880 |
MAN | 8 | Doing another maneuver | 14,820 | 13,687 |
SP_IN: Speed Infraction (the driver was driving) |  | Missing value | 8801 | 7694 |
SP_IN | 1 | Too fast | 920 | 1325 |
SP_IN | 2 | Too slow | 19 | 24 |
SP_IN | 3 | At an appropriate speed | 20,268 | 18,288 |
DR_INFR: Driving Infraction (the driver) |  | Missing value | 3219 | 4660 |
DR_INFR | 0 | Did not commit any traffic violation | 14,014 | 12,817 |
DR_INFR | 1 | Failed to observe a traffic sign | 3328 | 1028 |
DR_INFR | 2 | Drove on the wrong side of the road, or partially invaded it | 165 | 267 |
DR_INFR | 3 | Was overtaking in a zone where overtaking is forbidden | 152 | 188 |
DR_INFR | 4 | Failed to keep a safe distance | 454 | 1166 |
DR_INFR | 5 | Committed another type of infraction | 8676 | 7205 |
OL_VEH: Old vehicle (the vehicle was) |  | Missing value | 13,932 | 9248 |
OL_VEH | 1 | Two years old or less | 2661 | 2968 |
OL_VEH | 2 | Three or more years old | 13,415 | 15,115 |
VEH_TY: Vehicle type (the vehicle was) |  | Missing value | 52 | 19 |
VEH_TY | 1 | A motorbike or equivalent | 10,900 | 9109 |
VEH_TY | 2 | A car or equivalent | 18,516 | 17,556 |
VEH_TY | 3 | A heavy vehicle | 416 | 518 |
VEH_TY | 4 | Another type of vehicle | 123 | 129 |
AN: Anomaly (was there any malfunction in the vehicle?) |  | Missing value | 5484 | 7602 |
AN | 1 | No | 24,281 | 19,478 |
AN | 2 | Yes | 243 | 251 |
O_L: Number of passengers (how many people were in the vehicle) | 1 | Only the driver | 20,610 | 17,329 |
O_L | 2 | Two | 4346 | 3807 |
O_L | 3 | More than two | 5052 | 6195 |
SEV: Severity (the severity of the accident was) | 1 | Slight injury | 27,889 | 25,164 |
SEV | 2 | Fatal injury | 2109 | 2167 |
NR | A1 | A2 | A3 | A4 | S (support) | Pr (probability) |
---|---|---|---|---|---|---|
2 | F_T = 1 | SP_IN = 1 | VEH_TY = 1 | DR_INFR = 5 | 0.1033% | 68.8% |
3 | WE_CO = 1 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.23% | 59.5% |
4 | RO_SU = 1 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.2166% | 58.03% |
5 | MAN = 1 | PAV = 0 | SP_IN = 1 | ACT_TY = 2 | 0.1466% | 57.89% |
6 | RO_SU = 1 | ACT_TY = 5 | DR_INFR = 5 | MAN = 1 | 0.1166% | 57.38% |
7 | DR_INFR = 5 | SP_IN = 1 | ACT_TY = 2 |  | 0.2432% | 57.03% |
8 | I_L = 1 | DR_INFR = 5 | SP_IN = 1 | ACT_TY = 2 | 0.2266% | 56.66% |
9 | LUM = 1 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.1733% | 56.52% |
10 | I_T = 2 | DR_INFR = 1 | MAN = 8 | O_L = 1 | 0.1033% | 56.36% |
11 | SP_IN = 1 | DR_INFR = 5 | ACT_TY = 2 | RES_VIS = 1 | 0.2% | 55.55% |
12 | MAN = 8 | SP_IN = 1 | DR_INFR = 1 |  | 0.166% | 50.5% |
13 | ACT_TY = 2 | DR_INFR = 1 | SP_IN = 1 | SEX = 1 | 0.17% | 46.79% |
14 | AG_FR = 2 | PAV = 0 | SP_IN = 1 | ACT_TY = 2 | 0.16% | 46.6% |
15 | VEH_TY = 1 | ACT_TY = 2 | MAN = 1 | SP_IN = 1 | 0.2% | 44.03% |
16 | LUM = 1 | PAV = 0 | SP_IN = 1 | ACT_TY = 2 | 0.33% | 38.17% |
17 | AN = 1 | PAV = 0 | SP_IN = 1 | ACT_TY = 2 | 0.45% | 37.6% |
18 | WE_CO = 1 | PAV = 0 | SP_IN = 1 | ACT_TY = 2 | 0.423% | 37.02% |
19 | F_T = 3 | PAV = 0 | ACT_TY = 2 | SP_IN = 1 | 0.1766% | 36.8% |
20 | RES_VIS = 1 | PAV = 0 | ACT_TY = 2 | SP_IN = 1 | 0.397% | 34.9% |
21 | AG_FR = 3 | PAV = 0 | SP_IN = 1 | ACT_TY = 2 | 0.1866% | 32.75% |
22 | AG_FR = 2 | ACT_TY = 2 | SP_IN = 1 | I_T = 2 | 0.11% | 31.73% |
23 | SEX = 1 | DR_INFR = 1 | ACT_TY = 1 | PR = 1 | 0.213% | 17.2% |
24 | RO_SU = 1 | ACT_TY = 1 | DR_INFR = 1 | PR = 1 | 0.29% | 16.86% |
25 | PR = 1 | DR_INFR = 1 | LUM = 1 |  | 0.24% | 16.78% |
26 | WE_CO = 1 | ACT_TY = 1 | DR_INFR = 1 | PR = 1 | 0.29% | 16.6% |
27 | AN = 1 | ACT_TY = 1 | DR_INFR = 1 | PR = 1 | 0.296% | 16.45% |
28 | ACT_TY = 1 | DR_INFR = 1 | PR = 1 |  | 0.296% | 16.42% |
29 | PAV = 0 | DR_INFR = 1 | ACT_TY = 1 | PR = 1 | 0.296% | 16.42% |
30 | S_W = 3 | ACT_TY = 1 | DR_INFR = 1 | PR = 1 | 0.11% | 14.86% |
31 | RES_VIS = 1 | ACT_TY = 1 | DR_INFR = 1 | PR = 1 | 0.246% | 14.77% |
32 | LUM = 1 | ACT_TY = 1 | DR_INFR = 1 | PR = 1 | 0.183% | 14.66% |
NR | A1 | A2 | A3 | A4 | S (support) | Pr (probability) |
---|---|---|---|---|---|---|
1 | SP_IN = 1 | ACT_TY = 2 | DR_INFR = 5 | MAN = 8 | 0.117% | 71.11% |
2 | S_W = 3 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.314% | 64.18% |
3 | F_T = 4 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.245% | 62.03% |
4 | TR_N_INT = 1 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.574% | 59.25% |
5 | LUM = 3 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.179% | 59.03% |
6 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | RO_SU = 1 | 0.56% | 58.85% |
7 | MAN = 1 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.38% | 58.76% |
8 | SEX = 1 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.46% | 58.6% |
9 | AN = 1 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.59% | 58.06% |
10 | DR_INFR = 5 | ACT_TY = 2 | SP_IN = 1 | O_L = 1 | 0.49% | 58% |
11 | LUM = 1 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.377% | 57.54% |
12 | I_L = 1 | DR_INFR = 5 | ACT_TY = 2 | SP_IN = 1 | 0.56% | 57.52% |
13 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | RES_VIS = 1 | 0.5% | 56.61% |
14 | VEH_TY = 2 | ACT_TY = 2 | DR_INFR = 5 | SP_IN = 1 | 0.465% | 55.95% |
15 | ACT_TY = 2 | SP_IN = 1 | DR_INFR = 1 | PAV = 0 | 0.135% | 54.41% |
16 | VEH_TY = 1 | SP_IN = 1 | DR_INFR = 5 | F_T = 1 | 0.113% | 53.45% |
17 | S_W = 1 | ACT_TY = 2 | SP_IN = 1 | PAV = 0 | 0.143% | 43.82% |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Moral-García, S.; Castellano, J.G.; Mantas, C.J.; Montella, A.; Abellán, J. Decision Tree Ensemble Method for Analyzing Traffic Accidents of Novice Drivers in Urban Areas. Entropy 2019, 21, 360. https://doi.org/10.3390/e21040360