Data Analysis of Two-Vehicle Accidents Based on Machine Learning
Abstract
Featured Application
1. Introduction
2. Materials and Methods
2.1. Non-Public Parameters
2.1.1. Continuous Variables
2.1.2. Classification Variables
2.2. Common Parameters
3. Machine Learning-Based Dangerous Scenario Analysis
3.1. Data Preprocessing
3.2. Significance Analysis
3.2.1. Factor Analysis
3.2.2. Variable Significance
3.2.3. Significance of Variables Obtained from Other Machine Learning Methods
3.3. Cluster Analysis
3.3.1. Data Processing
3.3.2. Clustering
4. Results
4.1. Hierarchical Clustering
4.2. Clustering Results
5. Discussion
5.1. Scenario-Level Interpretation
5.2. Weather Effects
5.3. Cross-Method Triangulation and Method Sensitivity
- (1) The most significant factors identified by all three methods (ANN, DT, and RF) were vehicle parameters, with B vehicle type having a greater impact on occupant injury than A vehicle type.
- (2) The clustering results indicate that lighting, the number of lanes, B vehicle type, the speed of vehicle A, and precipitation have a greater effect on occupant injury.
- (3) The variable significance obtained from the factor analysis showed that the first common factor, consisting of accident pattern, on-site road environment, road classification, and A vehicle speed, had the greatest impact.
- (4) The factor analysis method is more sensitive to small samples in the data than the machine learning methods.
5.4. Scope and Transferability Beyond Two-Vehicle Interactions
5.5. Limitations
5.6. Layered Scenario Definition and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
ANN | artificial neural network |
SVM | support vector machines |
DT | decision trees |
LR | logistic regression |
DJ | decision jungle |
RF | random forest |
ML | machine learning |
CIDAS | China In-Depth Accident Study |
Vehicle Type | Number of A Cases | Number of B Cases | Personnel Injuries | Number of Injuries in Vehicle A | Number of Injuries in Vehicle B | Steering Type | Number of A Cases | Number of B Cases |
---|---|---|---|---|---|---|---|---|
Car | 414 | 333 | No injuries | 277 | 320 | No steering | 291 | 319 |
Truck | 69 | 146 | Minor injuries | 132 | 114 | Left turn | 84 | 59 |
Bus | 10 | 14 | Serious injuries | 34 | 32 | Right turn | 61 | 61 |
Total | 493 | 493 | Deaths | 50 | 20 | Right lane change | 17 | 18 |
| | | | Total | 493 | 486 | Not applicable | 13 | 9 |
| | | | | | | Left lane change | 11 | 7 |
| | | | | | | Unknown | 16 | 20 |
| | | | | | | Total | 493 | 493 |
Parameter | Parameter Values | Number of Cases |
---|---|---|
Street light status | No street lights | 218 |
| | Street lights off | 173 |
| | Street lights on | 102 |
Precipitation condition | No | 442 |
| | Rain | 41 |
| | Snow | 10 |
Time of day | Daytime | 286 |
| | Evening | 162 |
| | Dusk | 45 |
Road surface | Good | 434 |
| | Other | 30 |
| | Potholes | 29 |
Road surface condition | Dry | 394 |
| | Damp | 33 |
| | Wet | 37 |
| | Snow-covered | 21 |
| | Icy | 7 |
| | Other | 1 |
Number of lanes in the direction of travel | 1 | 120 |
| | 2 | 203 |
| | 3 | 119 |
| | 4 | 39 |
| | 5 | 9 |
| | 6 | 3 |
Visibility | No fog | 486 |
| | <2000 m | 3 |
| | <100 m | 1 |
| | <200 m | 1 |
| | <500 m | 1 |
| | <1000 m | 1 |
Road classification | Other | 174 |
| | National highway | 96 |
| | County road | 77 |
| | High speed | 73 |
| | Provincial road | 44 |
| | Township road | 29 |
On-site road environment | Straight | 230 |
| | Cross intersection | 118 |
| | General intersection | 99 |
| | Curve | 39 |
| | Roundabouts | 2 |
| | Ramp | 2 |
| | Gated intersections | 1 |
| | Other | 2 |
Accident pattern | Side impact | 213 |
| | Rear-end collision | 146 |
| | Head-on collision | 56 |
| | Collision with parked vehicle | 32 |
| | Same-direction sideswipe | 29 |
| | Collision with fixed object | 8 |
| | Opposite-direction sideswipe | 5 |
| | Multi-vehicle collision | 2 |
| | Other | 2 |
Test | Statistic | Value |
---|---|---|
KMO measure of sampling adequacy | | 0.652 |
Bartlett’s test of sphericity | Approximate chi-square | 1465.815 |
| | Degrees of freedom | 120 |
| | Significance | <0.001 |
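
The degrees of freedom reported for Bartlett's test can be checked from the number of variables alone. The sketch below uses the textbook form of the statistic, with 493 cases and 16 variables read from the tables in this article. The function name and the log-determinant value are hypothetical, since the source does not report the determinant of the correlation matrix.

```python
def bartlett_sphericity(log_det_r, n_cases, n_vars):
    """Return (chi-square, degrees of freedom) for Bartlett's test of sphericity.

    log_det_r is the natural log of the determinant of the correlation matrix.
    """
    # One degree of freedom per distinct off-diagonal correlation.
    df = n_vars * (n_vars - 1) // 2
    # Textbook approximation: -[(n - 1) - (2p + 5) / 6] * ln|R|
    chi_square = -((n_cases - 1) - (2 * n_vars + 5) / 6) * log_det_r
    return chi_square, df


# log_det_r = -3.0 is a hypothetical value used only for illustration.
chi_square, df = bartlett_sphericity(log_det_r=-3.0, n_cases=493, n_vars=16)
print(df, round(chi_square, 1))
```

With 16 variables, the formula gives 16 × 15 / 2 = 120 degrees of freedom, matching the table.
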
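Component | Initial Eigenvalues: Total | Initial Eigenvalues: % of Variance | Initial Eigenvalues: Cumulative % | Extraction Sums of Squared Loadings: Total | Extraction Sums of Squared Loadings: % of Variance | Extraction Sums of Squared Loadings: Cumulative % | Rotation Sums of Squared Loadings: Total | Rotation Sums of Squared Loadings: % of Variance | Rotation Sums of Squared Loadings: Cumulative % |
---|---|---|---|---|---|---|---|---|---|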
1 | 2.923 | 18.271 | 18.271 | 2.923 | 18.271 | 18.271 | 2.465 | 15.408 | 15.408 |
2 | 1.803 | 11.267 | 29.538 | 1.803 | 11.267 | 29.538 | 1.652 | 10.322 | 25.730 |
3 | 1.676 | 10.472 | 40.011 | 1.676 | 10.472 | 40.011 | 1.521 | 9.503 | 35.233 |
4 | 1.285 | 8.034 | 48.045 | 1.285 | 8.034 | 48.045 | 1.114 | 6.961 | 42.195 |
5 | 1.198 | 7.488 | 55.533 | 1.198 | 7.488 | 55.533 | 1.078 | 6.740 | 48.934 |
6 | 1.120 | 6.997 | 62.530 | 1.120 | 6.997 | 62.530 | 1.065 | 6.654 | 55.588 |
7 | 1.030 | 6.439 | 68.970 | 1.030 | 6.439 | 68.970 | 1.058 | 6.614 | 62.202 |
8 | 0.919 | 5.747 | 74.716 | 0.919 | 5.747 | 74.716 | 1.034 | 6.465 | 68.667 |
9 | 0.759 | 4.743 | 79.459 | 0.759 | 4.743 | 79.459 | 1.027 | 6.420 | 75.087 |
10 | 0.730 | 4.563 | 84.022 | 0.730 | 4.563 | 84.022 | 1.020 | 6.377 | 81.464 |
11 | 0.596 | 3.727 | 87.748 | 0.596 | 3.727 | 87.748 | 1.005 | 6.284 | 87.748 |
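
The variance percentages in the table follow directly from the eigenvalues: each eigenvalue is divided by the number of variables (16 here, counted from the parameters in the component matrix). The sketch below recomputes them from the rounded eigenvalues in the table, so small differences in the last digit are expected. The function name is illustrative and not taken from the source.

```python
# Initial eigenvalues copied from the table (rounded to three decimals).
EIGENVALUES = [2.923, 1.803, 1.676, 1.285, 1.198, 1.120,
               1.030, 0.919, 0.759, 0.730, 0.596]
N_VARIABLES = 16  # number of parameters entering the factor analysis


def variance_table(eigenvalues, n_variables):
    """Return rows of (component, eigenvalue, % of variance, cumulative %)."""
    rows = []
    cumulative = 0.0
    for component, value in enumerate(eigenvalues, start=1):
        percent = 100.0 * value / n_variables
        cumulative += percent
        rows.append((component, value, percent, cumulative))
    return rows


rows = variance_table(EIGENVALUES, N_VARIABLES)
# Number of components whose eigenvalue exceeds 1.
above_one = sum(1 for value in EIGENVALUES if value > 1.0)

for component, value, percent, cumulative in rows:
    print(f"{component:2d}  {value:.3f}  {percent:6.3f}  {cumulative:7.3f}")
```
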
Parameter | Component 1 | Component 2 | Component 3 | Component 4 | Component 5 | Component 6 | Component 7 | Component 8 | Component 9 | Component 10 | Component 11 |
---|---|---|---|---|---|---|---|---|---|---|---|
Accident pattern | 0.865 | | | | | | | | | | |
On-site road environment | 0.856 | | | | | | | | | | |
Road classification | −0.699 | | | | | | | | | | |
A speed | −0.629 | | | | | | | | | | |
Time of day | | 0.897 | | | | | | | | | |
Street light status | | 0.883 | | | | | | | | | |
Precipitation condition | | | 0.887 | | | | | | | | |
Road surface condition | | | 0.842 | | | | | | | | |
B vehicle type | | | | 0.897 | | | | | | | |
B speed | | | | | 0.958 | | | | | | |
B steering type | | | | | | 0.952 | | | | | |
Road surface | | | | | | | 0.968 | | | | |
Number of lanes in the direction of travel | | | | | | | | 0.962 | | | |
A vehicle type | | | | | | | | | 0.967 | | |
A steering type | | | | | | | | | | 0.975 | |
Visibility | | | | | | | | | | | 0.995 |
DT: Independent Variable | DT: Normalized Importance | RF: Independent Variable | RF: Normalized Importance | ANN: Independent Variable | ANN: Normalized Importance |
---|---|---|---|---|---|
B Vehicle Type | 100.0% | A Speed | 100.0% | B Vehicle Type | 100.0% |
B Steering Type | 32.8% | B Speed | 97.9% | A Vehicle Type | 82.6% |
A Vehicle Type | 26.5% | B Vehicle Type | 57.7% | B Speed | 77.6% |
Number of Lanes in the Direction of Travel | 17.9% | Number of Lanes in the Direction of Travel | 50.0% | B Steering Type | 73.5% |
Visibility | 15.8% | Accident Pattern | 49.6% | Number of Lanes in the Direction of Travel | 64.4% |
B Speed | 14.3% | B Steering Type | 45.0% | Road Surface Condition | 59.2% |
Accident Pattern | 9.5% | A Steering Type | 43.9% | Accident Pattern | 51.9% |
On-Site Road Environment | 8.8% | On-Site Road Environment | 40.3% | A Steering Type | 51.3% |
A Speed | 6.6% | Street Lights | 27.7% | On-Site Road Environment | 50.9% |
A Steering Type | 4.7% | Road Surface Condition | 27.6% | Precipitation | 48.7% |
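
One simple way to compare the three rankings is to ask which variables appear among the top five of every method. The sketch below does this with the first five rows of each column in the table above. The top-five cutoff and the function name are assumptions made for illustration; the source does not describe this procedure.

```python
# Top five variables per method, copied in rank order from the table.
# "Number of lanes" abbreviates "Number of Lanes in the Direction of Travel".
DT_TOP = ["B vehicle type", "B steering type", "A vehicle type",
          "Number of lanes", "Visibility"]
RF_TOP = ["A speed", "B speed", "B vehicle type",
          "Number of lanes", "Accident pattern"]
ANN_TOP = ["B vehicle type", "A vehicle type", "B speed",
           "B steering type", "Number of lanes"]


def shared_top_variables(*rankings):
    """Return variables present in every ranking, in the order of the first."""
    shared = set(rankings[0])
    for ranking in rankings[1:]:
        shared &= set(ranking)
    return [name for name in rankings[0] if name in shared]


shared = shared_top_variables(DT_TOP, RF_TOP, ANN_TOP)
print(shared)
```

Under this cutoff, only B vehicle type and the number of lanes are shared by all three methods. A different cutoff would change the overlap, so the result is a reading aid rather than a finding.
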
Parameter | Parameter Value | I | II | III | IV | V |
---|---|---|---|---|---|---|
Precipitation condition | No | 5.7% | 62.7% | 6.8% | 9.7% | 3.6% |
Visibility | No fog | 5.1% | 57.2% | 6.2% | 9.5% | 3.1% |
Road surface | Good | 5.1% | 60.4% | 6.9% | 10.1% | 0.0% |
A vehicle type | Car | 6.0% | 67.6% | 7.2% | 0.0% | 4.1% |
| | Truck | 0.0% | 0.0% | 0.0% | 66.7% | 0.0% |
Time of day | Daytime | 4.9% | 55.2% | 8.4% | 8.7% | 4.2% |
| | Evening | 6.2% | 61.1% | 1.2% | 9.3% | 2.5% |
B vehicle type | Car | 6.9% | 48.6% | 8.1% | 13.8% | 2.4% |
| | Truck | 0.7% | 74.7% | 1.4% | 0.0% | 5.5% |
Street light status | No street light | 1.4% | 60.1% | 5.5% | 13.8% | 6.4% |
| | On | 9.8% | 52.0% | 4.9% | 6.9% | 0.0% |
| | Off | 6.9% | 55.5% | 7.5% | 5.2% | 1.7% |
B steering type | No steering | 2.2% | 67.4% | 0.9% | 13.2% | 2.5% |
| | Right turn | 12.7% | 36.7% | 21.5% | 1.3% | 5.1% |
| | Left turn | 10.6% | 31.8% | 13.6% | 0.0% | 7.6% |
A steering type | No steering | 3.8% | 68.7% | 0.3% | 8.6% | 3.1% |
| | Right turn | 5.1% | 44.9% | 6.4% | 11.5% | 7.7% |
| | Left turn | 8.4% | 32.6% | 23.2% | 11.6% | 1.1% |
Number of lanes in the direction of travel | 1 | 0.0% | 45.8% | 19.2% | 5.8% | 1.7% |
| | 2 | 12.3% | 48.8% | 2.5% | 11.8% | 7.4% |
| | 3 | 0.0% | 79.0% | 0.0% | 10.1% | 0.0% |
On-site road environment | Straight | 0.0% | 63.0% | 3.5% | 13.5% | 7.4% |
| | General intersection | 24.2% | 37.4% | 6.1% | 1.0% | 0.0% |
| | Cross intersection | 0.0% | 63.6% | 12.7% | 5.9% | 0.0% |
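
A table of this shape can be produced by cross-tabulating a parameter against the cluster labels. The sketch below assumes that each cell is the share of cases with a given parameter value that falls into a scenario, which is consistent with the row sums staying at or below 100%. The six toy cases and the function name are hypothetical and are not taken from the accident data.

```python
from collections import Counter, defaultdict


def cluster_shares(values, labels):
    """For each parameter value, return the percentage of its cases per cluster."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    shares = {}
    for value, by_cluster in counts.items():
        total = sum(by_cluster.values())
        shares[value] = {label: 100.0 * n / total
                         for label, n in by_cluster.items()}
    return shares


# Hypothetical toy data: vehicle type of each case and its assigned scenario.
vehicle_types = ["Car", "Car", "Car", "Car", "Truck", "Truck"]
scenarios = ["II", "II", "I", "IV", "IV", "II"]

shares = cluster_shares(vehicle_types, scenarios)
for value, by_cluster in shares.items():
    print(value, {label: round(pct, 1) for label, pct in sorted(by_cluster.items())})
```
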
Parameter | Hierarchical I | Hierarchical II | Hierarchical III | Hierarchical IV | Hierarchical V | k-Means I | k-Means II | k-Means III | k-Means IV | k-Means V |
---|---|---|---|---|---|---|---|---|---|---|
A vehicle type | Car | Car | Car | Truck | Car | Car | Car | Car | Car | Car |
B vehicle type | Car | Truck | Car | Car | Truck | Car | Car | Car | Car | Car |
On-site road environment | Intersection | Intersection | Cross intersection | Straight | Straight | Cross intersection | Straight | Straight | Straight | Intersection |
Lighting conditions | Street light at night | No street light at night | Daytime | No street light at night | Daytime | Street light at night | Street light at night | Daytime | Daytime | Daytime |
Precipitation | No | No | No | No | No | No | No | No | No | No |
Road surface | Good | Good | Good | Good | Good | Good | Good | Good | Good | Good |
Number of lanes in the direction of travel | 2 | 3 | 1 | 2 | 2 | 1 | 2 | 2 | 2 | 2 |
Visibility | No fog | No fog | No fog | No fog | No fog | No fog | No fog | No fog | No fog | No fog |
A steering type | Left turn | No steering | Left turn | Left turn | Right turn | No steering | No steering | No steering | No steering | No steering |
B steering type | Right turn | No steering | Right turn | No steering | Left turn | No steering | No steering | No steering | No steering | No steering |
Parameter | I | II | III | IV | V |
---|---|---|---|---|---|
Speed of A (km/h) | 30–42 | 40–100 | 36–60 | 51–85 | 60–110 |
Speed of B (km/h) | 45–70 | 30–75 | 30–60 | 12–70 | 30–80 |
Share of all cases | 5% | 57% | 6% | 9% | 4% |
Injury rate | 28% | 52.9% | 30% | 10.9% | 41.2% |
Injury rate, Wilson 95% CI | 14.3–47.6% | 47.2–58.8% | 16.7–47.9% | 5.0–24.0% | 21.9–61.3% |
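
The Wilson score interval in the last row can be computed from an injury count and a scenario size. The sketch below uses the standard Wilson formula. The counts of 7 injury cases out of 25 are hypothetical: they are chosen only because they give a 28% rate, as the source does not list the per-scenario counts.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Return the (lower, upper) Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denominator = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return center - half_width, center + half_width


# Hypothetical counts: 7 injury cases in a scenario of 25 cases (28%).
low, high = wilson_interval(7, 25)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside 0–100% and behaves reasonably for the small scenario sizes involved here, which is presumably why it suits scenario-level rates.
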
Precipitation Condition | Slight Injury | ≥Serious Injury | Total Number of Cases | Percentage of Serious Injuries | Percentage of Injuries |
---|---|---|---|---|---|
No | 116 | 73 | 442 | 16.5% | 42.7% |
Rain | 13 | 10 | 41 | 24.4% | 56.1% |
Snow | 3 | 1 | 10 | 10.0% | 40.0% |
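
The two percentage columns can be recomputed from the counts in the same rows. The sketch below assumes that the third count column is the number of cases under each precipitation condition (442, 41, and 10, matching the precipitation counts reported earlier) and that both percentages use it as the denominator. The function name is illustrative.

```python
# (slight injuries, serious-or-worse injuries, cases) copied from the table.
WEATHER_ROWS = {
    "No": (116, 73, 442),
    "Rain": (13, 10, 41),
    "Snow": (3, 1, 10),
}


def injury_rates(slight, serious_or_worse, cases):
    """Return (% serious-or-worse, % any injury) among all cases."""
    serious_pct = 100.0 * serious_or_worse / cases
    any_injury_pct = 100.0 * (slight + serious_or_worse) / cases
    return serious_pct, any_injury_pct


rates = {name: injury_rates(*row) for name, row in WEATHER_ROWS.items()}
for name, (serious_pct, any_pct) in rates.items():
    print(f"{name}: serious {serious_pct:.2f}%, any injury {any_pct:.2f}%")
```

The no-precipitation overall rate evaluates to about 42.76%, which the table reports as 42.7%; the other entries agree with the table to one decimal place.
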
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gao, D.; Chen, J.; Luo, T.; Liu, Z.; Cao, L.; Chen, Z.; Wu, J. Data Analysis of Two-Vehicle Accidents Based on Machine Learning. Appl. Sci. 2025, 15, 9819. https://doi.org/10.3390/app15179819