Research on Accident Severity Prediction of New Energy Vehicles Based on Cost-Sensitive Fuzzy XGBoost

Huang, Shubing; Yin, Xiaoxuan; Wang, Chongming; Wang, Kun

doi:10.3390/su17125408

Open Access Article


by Shubing Huang ¹,*, Xiaoxuan Yin ²,*, Chongming Wang ³ and Kun Wang ¹

¹ Traffic Management Research Institute of the Ministry of Public Security, Wuxi 214151, China
² National Engineering Research Center for Electric Vehicles, Beijing Institute of Technology, Beijing 100081, China
³ The Center for E-Mobility and Clean Growth, Coventry University, Coventry CV1 5FB, UK
* Authors to whom correspondence should be addressed.

Sustainability 2025, 17(12), 5408; https://doi.org/10.3390/su17125408

Submission received: 25 April 2025 / Revised: 6 June 2025 / Accepted: 7 June 2025 / Published: 11 June 2025

(This article belongs to the Section Sustainable Transportation)


Abstract

With the increasing acceptance of green, low-carbon, and sustainable development principles, the growing number of new energy vehicles (NEVs) has raised public concern over the traffic safety risks associated with these vehicles. To assist traffic management authorities in efficiently allocating rescue resources, this paper proposes a severity prediction method for NEV accidents based on Cost-sensitive Fuzzy XGBoost (CFXGBoost). First, chi-square filtering and a wrapper method are used to extract 20 key features strongly correlated with accident severity. Then, a fuzzy neural network is employed to combine fuzzy inference results with the original features, forming an extended feature set. These features are used as inputs to the XGBoost model to predict the severity of NEV traffic accidents. Finally, the proposed approach is validated using traffic accident datasets from multiple provinces and cities. Results show that the FXGBoost model achieves a prediction accuracy of 0.92 and outperforms other models in terms of precision, recall, and F1 score, demonstrating its effectiveness in accurately predicting the severity of NEV-related traffic accidents.

Keywords:

new energy vehicle accident; accident severity prediction; fuzzy XGBoost; chi-square filtering

1. Introduction

As the concept of sustainable development and enhanced ecological protection gains increasing acceptance, the use of new energy vehicles (NEVs) in daily travel has become a crucial approach for promoting societal and environmental progress. NEVs are defined as vehicles that use energy sources other than conventional fossil fuels, including battery electric vehicles, hybrid electric vehicles, fuel cell vehicles, and other types powered by clean energy technologies [1]. As the penetration rate of NEVs continues to rise, the frequency of NEV-related accidents has increased correspondingly. Consequently, the safety performance of NEVs has attracted growing attention from researchers and industry professionals alike [2]. Because NEVs operate differently from traditional fuel vehicles, accidents involving NEVs tend to have more severe consequences, primarily in the following aspects. First, in most operational scenarios, NEVs are propelled by electric motors, which makes them substantially quieter than traditional internal combustion engine (ICE) vehicles [3]. This difference in acoustic signatures can introduce safety hazards for drivers as well as other road users, who may be less aware of the presence of quieter NEVs [4]. Second, NEV batteries are more prone to catching fire after collisions or rollovers than fuel vehicles, leading to more severe traffic accidents [5]. Third, the high starting torque and rapid acceleration of NEV motors increase the risk of accidents [6]. Finally, some NEVs adopt a single-pedal mode [7], which can lead to driver errors in emergencies. Accident severity is the key information that traffic management authorities use to allocate rescue resources. Therefore, predicting the severity of traffic accidents is crucial for aiding traffic authorities in conducting emergency rescue operations and minimizing casualties and property losses.

In recent years, considerable research has been conducted both domestically and internationally on the prediction of traffic accident severity. These studies generally focus on two key aspects: the identification of contributing factors (causal analysis) and the development of predictive models. Causal analysis aims to determine the factors that significantly influence accident severity by utilizing data from established accident databases. These datasets typically contain information on crash characteristics, human factors, vehicle types, and environmental conditions. For instance, George analyzed the influence of various vehicle categories on the severity of accidents, including passenger cars, mopeds, motorcycles, buses, and trucks. The findings of this research indicated that good weather conditions and nighttime crashes are often associated with increased severity [8]. Ditcharoen highlighted vehicle speed as the most critical factor affecting crash severity, followed by driver-related variables. Other relevant factors included vehicle type, weather conditions, alcohol impairment, and driver fatigue [9]. Azhar conducted a study using 2014 crash data involving heavy vehicles in Malaysia and applied Classification and Regression Tree (CART) and Random Forest (RF) algorithms to classify and predict injury severity among heavy vehicle drivers. The results identified collision type, driver error, number of vehicles involved, driver age, lighting conditions, and vehicle type as key predictors of injury outcomes [10]. Wang, applying the Apriori algorithm based on association rules, identified several critical factors influencing accident severity for vulnerable road users. These included regional zoning, the density of restaurants and shopping POIs (Points of Interest), life-service POI density, accident causes, and modes of transport [11]. 
Common methodologies employed in the causal analysis of accident severity include factor analysis, analysis of variance (ANOVA), the Apriori algorithm, and Logit-based models. Li employed factor analysis to rank the importance of characteristics affecting traffic accidents and identified key influencing factors [12]. Sun conducted qualitative studies on influencing features through variance analysis, identifying significant factors [13]. Refs. [14,15,16] utilized the Apriori algorithm to mine 10 strong association rules related to traffic accident factors and analyzed these rules in depth. Alkheder combined decision trees, Bayesian networks, and support vector machines to comprehensively analyze risk factors related to traffic accident severity [17]. Eboli and Theofilatos used the Logit model to reveal the impact of vehicle type, speeding, and traffic flow on collision severity [18,19]. However, the studies mentioned above primarily focus on traditional internal combustion engine vehicles, with limited research on the causal analysis of accidents involving new energy vehicles (NEVs). Additionally, these studies often fail to comprehensively consider the combined impact of the human–vehicle–road–environment system on traffic accident severity.

Accident severity prediction builds upon causal analysis, using algorithms such as machine learning models or neural networks to establish relationships between accident data and severity levels, enabling the prediction of accident outcomes. Zhang developed the FA-RF model to predict accident severity and proposed real-time communication rules based on social robots [20]. Yang proposed a multi-task DNN framework to predict the severity of injuries, fatalities, and property damage [21]. Ospina constructed a hybrid algorithm combining genetic algorithms and simulated annealing to detect the severity of motorcycle accidents [22]. Wang developed a data-driven model based on vehicle kinematic features to achieve precise and near-real-time predictions of occupant injury severity [23]. Erzurum explored the applicability of one-class classification models in traffic accident prediction [24]. Hossain combined random forests and classification and regression tree models to predict highway traffic accidents, considering the spatial heterogeneity of accidents; their results showed significant differences in risk factors across road sections [25]. Liu summarized traffic flow parameters into stability coefficients to reduce input variables and used support vector machines, random forests, and logistic regression to conduct real-time accident risk predictions, improving prediction accuracy [26]. Guo incorporated users' risky driving behavior data from Amap navigation, applied the SMOTE algorithm to generate accident samples, and used logistic regression to predict accident risks, achieving promising results [27]. Yang extracted eight key accident features and four additional factors of lesser importance based on the random forest algorithm, achieving high-accuracy predictions of accident severity [28].
Alhaek proposed a deep learning-based approach to predict accident severity, using Convolutional Neural Networks (CNN) to extract spatial features from high-dimensional data and Bidirectional Long Short-Term Memory (BiLSTM) networks to capture the temporal dependencies among various influencing factors [29]. However, the studies discussed above do not consider the influence of uncertainty in accident characteristics on the accuracy of severity predictions, which limits the reliability of their results.

The shortcomings of these studies can be primarily attributed to two key factors: First, the widespread adoption of NEVs is relatively recent, resulting in insufficient traffic accident data. Second, NEV-specific technologies, such as battery and motor systems, increase the complexity of accident analysis. Existing accident prediction models do not fully account for the characteristics of NEVs, leading to some bias in accident severity predictions. With the implementation of national standards for NEVs, a solid foundation has been laid for studying the severity of NEV accidents. The “Technical Specification of Remote Services and Management System for Electric Vehicles—Part 3: Communication Protocol and Data Format” (GB/T32960.3—2016) [30] mandates the establishment of a three-tier monitoring platform (national, government, and enterprise levels) for NEVs to enable real-time data collection and transmission. This has, to some extent, supplemented the datasets needed for analyzing accident severity.

However, in real-world driving environments, the occurrence probabilities of accidents with different severity levels (e.g., minor, serious, and fatal) vary significantly. This leads to a highly imbalanced class distribution in the accident datasets collected under this standard. Such imbalance tends to cause predictive models to overfit the frequent minor-accident samples while substantially reducing their ability to detect rare but high-risk fatal events, which limits the models' generalization ability and practical value. Furthermore, the complex coupling and uncertainty among vehicle operating parameters and road and environmental conditions (e.g., speed, road type, driving environment) obscure the relationship between accident severity and key features, reducing both the discriminative accuracy and the robustness of the model. Together, data imbalance and feature uncertainty significantly constrain the accuracy and reliability of accident severity prediction.

Therefore, to address the challenges of data imbalance and feature coupling uncertainty, this paper proposes a severity prediction method for NEV accidents based on Cost-sensitive Fuzzy XGBoost (CFXGBoost), which integrates fuzzy neural networks and the XGBoost algorithm to achieve accurate accident severity prediction. The main contributions of this paper are as follows:

(1): Based on the traffic accident data from various provinces, 20 feature parameters strongly related to traffic accident severity were extracted using chi-square filtering and wrapper methods.
(2): A Cost-sensitive Fuzzy XGBoost method is proposed, which integrates a fuzzy neural network with the XGBoost algorithm to accurately model highly coupled and uncertain accident features. A cost-sensitive mechanism is employed to construct loss functions tailored to different levels of accident severity. This approach significantly enhances the accuracy and robustness of predicting uncertain accident severity.

The structure of this paper is as follows: Section 2 describes the accident data and analyzes the causes of accident severity; Section 3 presents feature selection and the proposed Cost-sensitive Fuzzy XGBoost algorithm; Section 4 presents experimental validation and results analysis; and Section 5 provides conclusions.

2. Materials

2.1. Accident Experimental Data

This paper utilized over 410,000 accident records from selected provinces, involving more than 440,000 vehicles. These data provide detailed records of both direct and indirect information related to NEV accidents, including accident time, location, road conditions, lighting conditions, vehicle operating state, driver information, and the determined causes of the accidents.

2.2. Analysis of the Causes of Accident Severity

The road traffic system is a complex, spatiotemporally dynamic system, and the factors influencing traffic safety are interrelated and multifaceted. Therefore, to comprehensively assess the causes of accident severity, this paper categorizes the main factors affecting traffic safety into four aspects based on the accident data: human, vehicle, road, and driving environment.

2.2.1. Human Factors

Drivers collect external information through vision and hearing during driving and react based on their driving experience and habits. Age reflects differences in reaction ability and physical function among drivers. In terms of sex, male drivers are more prone to carelessness and irritability, which can lead to more serious accidents. Driving experience is reflected by the driver's years of driving. Additionally, statistics show that left-hand driving is more likely to result in minor accidents. Therefore, this paper selects the driver's age, sex, driving experience, and left-hand driving as features closely related to driving behavior [11]. The specific feature descriptions are shown in Table 1.

2.2.2. Vehicle Factors

New energy vehicles are equipped with battery and motor systems, which can lead to secondary hazards such as fire and explosion during accidents. Different types of vehicles have varying impacts on accident severity. For example, hydrogen fuel cell vehicles may face higher risks under certain conditions due to the flammability of hydrogen. Older vehicles, affected by component aging and outdated technology, are more prone to severe accidents. High-speed vehicles, with higher collision energy and shorter available reaction times, are more likely to be involved in severe accidents. Therefore, this paper selects vehicle type, vehicle age, speed, data warehousing (data entry) time, risk type, geographic coordinates (latitude and longitude), travel direction, motor function, battery type, battery overheating, and autonomous driving as vehicle-related factors [31]. The specific descriptions are shown in Table 2, with alarm types provided in Appendix A.

2.2.3. Road Factors

The type of road a vehicle travels on also affects the severity of accidents. For instance, rural areas often lack comprehensive traffic safety facilities, making rural roads more prone to severe accidents. Additionally, intersections are common sites for more severe accidents. Therefore, this paper selects urban or rural area, road type, junction, and speed limits as road factors [32]. The specific descriptions are shown in Table 3, with road types provided in Appendix A.

2.2.4. Driving Environment Factors

The time of day in the driving environment significantly impacts accident severity. Typically, accidents are more likely to be severe at night when drivers are fatigued. Additionally, the season is a crucial factor; autumn and winter often bring foggy conditions that impair driver visibility and lead to severe traffic accidents. This paper selects crash date, crash hour, crash season, light condition, and road condition as driving environment factors. Their specific descriptions are shown in Table 4 [33].

3. Methods

3.1. Feature Selection

To reduce the dimensionality of the model and avoid issues such as overfitting, this paper employs chi-square filtering and the wrapper method to extract feature parameters for accident severity. The Boruta algorithm [34] is chosen as the wrapper method.

Chi-square filtering tests the association between each independent variable and the dependent variable. It compares the observed frequencies of a feature's categories across severity levels with the frequencies expected if the two were independent, and eliminates variables unrelated to the target. Chi-square filtering returns both the chi-square statistic and the p-value. A larger chi-square value indicates a stronger association with the target feature, while a smaller value suggests a weaker association. The p-value is compared against a significance level, typically set to 0.1 or 0.05. If a feature has a large chi-square value and a p-value below the significance level, the null hypothesis of “no association between the two features” is rejected, indicating a high probability of correlation. Chi-square filtering calculates the chi-square values and p-values for all feature parameters and ranks them in ascending order of chi-square value. Partial results of the chi-square filtering are shown in Table 5.

In this paper, the significance level is set to 0.05. As shown in Table 5, features such as “longitude” and “travel direction” have low chi-square values and p-values greater than 0.05, indicating a weak correlation with accident severity. Therefore, these two features are removed.
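The chi-square screening step can be sketched as follows. This is a minimal illustration, assuming categorical (already discretized) features stored as integer codes in a NumPy array and using `scipy.stats.chi2_contingency` for the test of independence; the function name `chi_square_filter` is hypothetical and not taken from the authors' code. scikit-learn's `sklearn.feature_selection.chi2` is an alternative that scores non-negative feature values directly.

```python
import numpy as np
from scipy.stats import chi2_contingency


def chi_square_filter(X, y, names, alpha=0.05):
    """Chi-square test of independence between each feature column and the severity label.

    Returns (results, kept): results is a list of (name, chi2, p) sorted in ascending
    order of chi2, as in Table 5; kept lists the features with p < alpha.
    """
    classes = np.unique(y)
    results = []
    for j, name in enumerate(names):
        col = X[:, j]
        # Contingency table: one row per observed category, one column per severity level
        table = np.array([[np.sum((col == v) & (y == c)) for c in classes]
                          for v in np.unique(col)])
        stat, p, _, _ = chi2_contingency(table)
        results.append((name, float(stat), float(p)))
    results.sort(key=lambda r: r[1])
    kept = [name for name, _, p in results if p < alpha]
    return results, kept
```

A feature whose category counts are identical across severity levels yields a chi-square value of 0 and a p-value of 1, so it is dropped, while a feature that tracks the label closely is kept.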

After the initial feature selection using chi-square filtering, the wrapper method is employed to further refine the feature set. The Boruta algorithm determines the relevance of each variable by comparing the importance of the actual features with that of randomized probes (shadow features), and it aims to find all relevant features rather than a minimal subset.

Boruta is a wrapper-based feature selection method that uses a random forest classifier to evaluate feature relevance. It generates shadow features by randomly permuting the original features and combines them with the original set to form an extended feature set. Feature importance is evaluated using a random forest, and each original feature's importance is compared with the highest importance among the shadow features using Z-scores. Features that are significantly more important than the best shadow feature are retained, while those that are significantly less important are removed. This process is repeated until every feature is confirmed or rejected or the maximum number of iterations is reached. The retained features are then used in subsequent modeling.

In this study, Boruta was implemented using the BorutaPy library with a RandomForestClassifier as the base estimator. To ensure reproducibility and stable performance, the key parameters are set as follows: n_estimators = 500, max_depth = 25, max_iter = 100, alpha = 0.05, and random_state = 42. The filtering results are shown in Table 6.
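For readers who want to see the shadow-feature logic explicitly, the sketch below implements a simplified, single-stage version of the Boruta idea with scikit-learn's `RandomForestClassifier`. It is not BorutaPy: it uses impurity-based importances instead of Z-scores and omits the iterative removal of rejected features, the multiple-testing correction, and the resolution of tentative features. The helper names `simple_boruta` and `binom_tail_ge`, the forest settings, and the binomial decision rule are illustrative assumptions; in the study itself, BorutaPy with the parameters listed above performs the full procedure.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def binom_tail_ge(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def simple_boruta(X, y, n_iter=20, alpha=0.05, seed=42):
    """Simplified Boruta-style screening (hypothetical helper, not BorutaPy).

    Each round appends one permuted "shadow" copy of every column, fits a random
    forest, and records a hit for each real feature whose importance exceeds the
    best shadow importance. Hit counts are then compared with Binomial(n_iter, 0.5).
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for it in range(n_iter):
        # Permuting a column keeps its distribution but breaks any link to y
        shadow = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        # max_features=None lets every split consider all columns (a choice for this small demo)
        rf = RandomForestClassifier(n_estimators=50, max_depth=8, max_features=None,
                                    random_state=seed + it)
        rf.fit(np.hstack([X, shadow]), y)
        imp = rf.feature_importances_  # impurity-based importances, not Z-scores
        hits += imp[:n_features] > imp[n_features:].max()
    decisions = []
    for h in hits.tolist():
        if binom_tail_ge(h, n_iter) < alpha:             # significantly many hits
            decisions.append("confirmed")
        elif binom_tail_ge(n_iter - h, n_iter) < alpha:  # P(X <= h), by symmetry at p = 0.5
            decisions.append("rejected")
        else:
            decisions.append("tentative")
    return hits, decisions
```

The binomial comparison reflects the reasoning behind Boruta: a feature that is no better than random noise should beat the best shadow feature in roughly half of the rounds or fewer, so consistently winning (or consistently losing) is evidence of relevance (or irrelevance).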

According to Table 6, “Data Entry Time”, “Longitude”, “Latitude”, and “Speed Limit” were rejected. Combined with the chi-square filtering results, five features (“Data Entry Time”, “Longitude”, “Latitude”, “Travel Direction”, and “Speed Limit”) are removed, leaving the 20 features used for modeling.

The feature selection process is illustrated in Figure 1. The target variable is the severity of the traffic accident, classified into three levels: Level 0 represents a minor accident, Level 1 a serious accident, and Level 2 a fatal accident.

3.2. Accident Severity Prediction Based on FXGBoost

XGBoost (Extreme Gradient Boosting) is an efficient and flexible gradient-boosted decision tree algorithm, widely used in machine learning and data science [35]. Its structure is shown in Figure 2. Based on the gradient boosting framework, XGBoost improves prediction accuracy by adding tree models incrementally, each fitted to correct the errors of the current ensemble. It excels in processing speed and prediction accuracy on large-scale datasets.

The main structure of XGBoost includes two key components:

(1): CART Regression Trees: XGBoost uses Classification and Regression Tree (CART) regression trees to build multiple tree models for classification prediction.
(2): Gradient Boosting Algorithm: XGBoost optimizes the parameters of these tree models through the gradient boosting algorithm.

The objective function of XGBoost consists of two parts:

(1): Model Error: This represents the difference between the actual values and the predicted values.
(2): Regularization Term: This measures the structural error of the model, controlling its complexity to reduce the risk of overfitting.
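The two parts can be written explicitly. The formulation below is the standard regularized XGBoost objective, included as a reference summary rather than as part of the authors' method. Here $f_k$ is the $k$-th of $K$ trees, $J$ is the number of leaves of a tree, $w_t$ are its leaf weights, $\gamma$ and $\lambda$ are regularization hyperparameters, and $g_i$, $h_i$ are the first and second derivatives of the loss $l$ with respect to the current prediction $\hat{y}_i$.

```latex
\mathrm{Obj} = \sum_{i=1}^{N} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma J + \frac{1}{2} \lambda \sum_{t=1}^{J} w_t^{2}

% Second-order expansion: gradient and Hessian sums over the samples I_t in leaf t
G_t = \sum_{i \in I_t} g_i, \qquad H_t = \sum_{i \in I_t} h_i,
\qquad
w_t^{*} = -\frac{G_t}{H_t + \lambda}

% Gain of splitting a node into left (L) and right (R) children
\mathrm{Gain} = \frac{1}{2} \left[ \frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda}
  - \frac{\left(G_L + G_R\right)^{2}}{H_L + H_R + \lambda} \right] - \gamma
```

The $\lambda$ term shrinks leaf weights toward zero, and $\gamma$ acts as a minimum gain a split must achieve, which is how the regularization term controls model complexity.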

XGBoost employs parallel processing: the data are stored in in-memory blocks so that split finding during tree construction can run across multiple threads, which accelerates model training. In addition, XGBoost handles missing values by automatically learning, for each split, the default direction in which samples with missing values are sent, simplifying the data preprocessing workflow.
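The missing-value handling described above can be illustrated with a small exact-greedy split search on a single feature. This is a simplified sketch, not XGBoost's implementation: it assumes squared-error loss at an initial prediction of 0 (so $g_i = \hat{y}_i - y_i$ and $h_i = 1$), applies the standard split-gain formula with regularization parameters `lam` and `gamma`, and tries sending all rows with a missing value (NaN) to the left child and to the right child, keeping the default direction with the larger gain. All function names are hypothetical.

```python
import math


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf weight -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Gain of splitting a node into (GL, HL) and (GR, HR)."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def best_split_with_missing(x, g, h, lam=1.0, gamma=0.0):
    """Scan thresholds on one feature; try routing NaN rows left and right.

    Returns (gain, threshold, default_direction) of the best candidate split.
    """
    present = sorted((xi, gi, hi) for xi, gi, hi in zip(x, g, h) if not math.isnan(xi))
    G_miss = sum(gi for xi, gi in zip(x, g) if math.isnan(xi))
    H_miss = sum(hi for xi, hi in zip(x, h) if math.isnan(xi))
    G_tot = sum(t[1] for t in present)
    H_tot = sum(t[2] for t in present)
    best = (-math.inf, None, None)
    GL = HL = 0.0
    for k in range(len(present) - 1):
        GL += present[k][1]
        HL += present[k][2]
        if present[k][0] == present[k + 1][0]:
            continue  # no threshold between equal values
        thr = (present[k][0] + present[k + 1][0]) / 2
        GR, HR = G_tot - GL, H_tot - HL
        candidates = (("left", GL + G_miss, HL + H_miss, GR, HR),
                      ("right", GL, HL, GR + G_miss, HR + H_miss))
        for direction, gl, hl, gr, hr in candidates:
            gain = split_gain(gl, hl, gr, hr, lam, gamma)
            if gain > best[0]:
                best = (gain, thr, direction)
    return best


x = [1.0, 2.0, 3.0, 4.0, float("nan")]
y = [0.0, 0.0, 10.0, 10.0, 10.0]
g = [0.0 - yi for yi in y]  # g = y_hat - y with y_hat = 0
h = [1.0] * len(y)
print(best_split_with_missing(x, g, h))  # (37.5, 2.5, 'right')
```

Because the row with the missing value has a large target, the search learns to route missing values to the right child together with the large-valued rows, whose optimal leaf weight is then $-(-30)/(3 + 1) = 7.5$.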

Accident feature parameters are affected by the coupling of various factors and are therefore complex and variable. For instance, road conditions are influenced by climate, road material, and terrain. To better handle such fuzzy rules and uncertain information, a severity prediction method for new energy vehicle accidents based on Cost-sensitive Fuzzy XGBoost is proposed, integrating fuzzy neural networks with the XGBoost algorithm. Its network architecture is shown in Figure 3. The algorithm takes the feature parameters selected in Table 6 as inputs and outputs the accident severity class: minor, severe, or fatal.

Based on the XGBoost algorithm, the fuzzy inference results are used as additional features and input into XGBoost, leveraging its powerful gradient-boosting capabilities for modeling. First, an appropriate membership function is defined for the numerical features, and the original features are transformed into fuzzy values through these functions to generate a fuzzy feature set. The generalized bell-shaped membership function is widely used in fuzzy system modeling and pattern recognition. The membership function is calculated as follows:

f(x; a, b, c) = \frac{1}{1 + \left| \frac{x - c}{a} \right|^{2b}}

(1)

where x represents the input feature; c is the center of the function, set to the sample mean of the feature; a is the width parameter, which controls the width of the function and is set to the sample standard deviation; and b is a constant shape parameter that controls the steepness of the curve's sides.
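Eq. (1) with the parameter choices above can be sketched in NumPy as follows. The value b = 2 is an assumption for illustration, since the text only states that b is a constant, and the helper names are hypothetical. A quick sanity check of the formula: the membership equals 1 at x = c and 0.5 at x = c ± a for any b.

```python
import numpy as np


def gbell(x, a, b, c):
    """Generalized bell membership function, Eq. (1)."""
    return 1.0 / (1.0 + np.abs((np.asarray(x, dtype=float) - c) / a) ** (2 * b))


def fuzzify_column(values, b=2.0, eps=1e-12):
    """Fuzzify one numeric feature with c = sample mean and a = sample standard deviation.

    b = 2.0 is an illustrative assumption; eps guards against a constant column.
    """
    values = np.asarray(values, dtype=float)
    c = values.mean()
    a = values.std() + eps
    return gbell(values, a, b, c)


speeds = np.array([30.0, 50.0, 70.0])  # hypothetical vehicle speeds (km/h)
print(fuzzify_column(speeds))  # approximately [0.308, 1.0, 0.308]
```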

The Fuzzy Neural Network (FNN) adaptively generates fuzzy rules and applies fuzzy inference to the fuzzified features to produce fuzzy outputs. The fuzzy inference results are combined with the original features to construct an extended feature set, which is then input into XGBoost for training. Through gradient boosting, XGBoost learns how to weight the combined features, improving the model's ability to capture complex interactions among features.
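The feature-extension step can be sketched as a fixed-parameter fuzzy rule layer. The paper does not specify the FNN's rule structure or training, so everything below is an assumption for illustration: two bell-shaped fuzzy sets ("low" and "high") per feature, centered one standard deviation below and above the mean; one rule per combination of sets; product firing strengths normalized across rules, as in the first layers of a TSK/ANFIS-style network; and no training of the membership parameters. Because this grid partitioning yields 2^d rules, the sketch only makes sense for a handful of numeric features such as speed or vehicle age. The extended matrix returned by `extend_features` (a hypothetical name) would then be passed to an XGBoost classifier such as `xgboost.XGBClassifier`.

```python
import itertools

import numpy as np


def gbell(x, a, b, c):
    """Generalized bell membership function, Eq. (1)."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))


def fuzzy_rule_features(X, b=2.0, eps=1e-12):
    """Hypothetical fixed-parameter fuzzy rule layer (not the authors' trained FNN)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0) + eps
    # Two fuzzy sets per feature: "low" centered at mu - sd, "high" centered at mu + sd
    memberships = np.stack([gbell(X, sd, b, mu - sd), gbell(X, sd, b, mu + sd)], axis=2)
    n, d, _ = memberships.shape
    rules = list(itertools.product((0, 1), repeat=d))  # 2**d rules, e.g. (low, high)
    strengths = np.ones((n, len(rules)))
    for r, combo in enumerate(rules):
        for j, s in enumerate(combo):
            strengths[:, r] *= memberships[:, j, s]  # product t-norm
    return strengths / strengths.sum(axis=1, keepdims=True)  # normalized firing strengths


def extend_features(X, b=2.0):
    """Original features followed by the normalized rule firing strengths."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, fuzzy_rule_features(X, b)])


X = np.array([[30.0, 1.0], [50.0, 5.0], [70.0, 9.0]])  # hypothetical speed, vehicle age
E = extend_features(X)
print(E.shape)  # (3, 6)
```

Keeping the original columns alongside the rule strengths lets the trees still split on crisp values when those are more informative than the fuzzy ones.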

3.3. Cost-Sensitive Loss Function Design Based on XGBoost

In the task of predicting the severity of new energy vehicle accidents, the distribution of accident severity levels is highly imbalanced. In particular, severe accidents represent a small minority of cases. When standard loss functions are used, machine learning models tend to be biased toward the majority classes (e.g., minor accidents), resulting in poor detection performance on critical severe accident cases.

To address the class imbalance, a cost-sensitive learning mechanism is introduced into the XGBoost framework by assigning different weights $w = \{w_0, w_1, w_2\}$ in the loss function to minor, severe, and fatal accidents, respectively. Additionally, to mitigate the risk of gradient explosion caused by excessively large class weights, the weight values are normalized, and an early stopping strategy is employed to enhance training stability and generalization performance. The weighted cross-entropy loss function is defined as:

\zeta = -\sum_{i=1}^{N} w_{y_i} \log \left( \frac{e^{f_{y_i}}}{\sum_{j} e^{f_j}} \right)

(2)

where $f_j$ denotes the predicted logit score for class $j$, $y_i$ is the true label of the $i$-th sample, and $w_{y_i}$ is the penalty weight assigned to that label. Based on the empirical class distribution and the critical importance of severe accidents in real-world applications, the weights are empirically set as $w_0 = 1$, $w_1 = 2$, and $w_2 = 5$.
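Eq. (2), together with the first- and second-order derivatives that gradient boosting needs with respect to the logits, can be sketched in NumPy. The weights $w_0 = 1$, $w_1 = 2$, $w_2 = 5$ come from the text; the diagonal Hessian approximation and the function name are assumptions, and the weight normalization and early stopping mentioned above are not shown. With the `xgboost` Python package, a comparable weighting can be obtained without a custom objective by passing per-sample weights $w_{y_i}$ through the `sample_weight` argument of `XGBClassifier.fit`.

```python
import numpy as np


def weighted_softmax_ce(logits, labels, class_weights):
    """Weighted softmax cross-entropy, Eq. (2), with gradient and diagonal Hessian.

    logits: (n, k) array of f_j scores; labels: (n,) integer classes;
    class_weights: (k,) weights such as [1, 2, 5].
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    w = np.asarray(class_weights, dtype=float)[labels]  # w_{y_i} for each sample
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.sum(w * np.log(probs[np.arange(n), labels]))
    onehot = np.zeros_like(probs)
    onehot[np.arange(n), labels] = 1.0
    grad = w[:, None] * (probs - onehot)        # d loss / d f_j
    hess = w[:, None] * probs * (1.0 - probs)   # diagonal approximation
    return loss, grad, hess


loss, grad, hess = weighted_softmax_ce(np.zeros((2, 3)), [0, 2], [1.0, 2.0, 5.0])
```

Per-sample weighting scales both the gradient and the Hessian of each sample by $w_{y_i}$, so a misclassified fatal-accident sample produces a gradient five times larger than an otherwise identical minor-accident sample.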

4. Results and Discussion

4.1. Evaluation Indicators for Predicting Accident Severity

To measure the prediction accuracy of the algorithm, accuracy, precision, recall, and F1 score are selected as evaluation metrics. Accuracy is the proportion of correctly predicted accident severities among all accidents. Precision calculates the proportion of correctly predicted positive samples out of all samples predicted as positive, to evaluate the accuracy of the model’s prediction of positive samples.

P = \frac{TP}{TP + FP}

(3)

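where $TP$ (true positives) denotes the number of samples of a given severity level that are correctly predicted as that level, and $FP$ (false positives) denotes the number of samples of other levels that are incorrectly predicted as that level.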

Recall is used to evaluate the model’s ability to predict all positive samples, calculated as the ratio of correctly predicted positive samples to the total actual positive samples.

R = \frac{TP}{TP + FN}

(4)

where $FN$ (false negatives) is the number of samples of a given severity level that are incorrectly predicted as another level.

F1 score combines both precision and recall, providing a more comprehensive evaluation of the performance of the model.

F1 = \frac{2 \times P \times R}{P + R}

(5)
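Eqs. (3) to (5) are computed per severity level by treating that level as the positive class and the other two as negative. A minimal pure-Python sketch is shown below (the function name is hypothetical); macro-averaging the per-class values is one common way to summarize them, although the text does not state which averaging was used for Table 8.

```python
def severity_metrics(y_true, y_pred, classes=(0, 1, 2)):
    """Accuracy, per-class precision/recall/F1 (Eqs. (3)-(5)), and macro-averaged F1."""
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(classes)
    return accuracy, per_class, macro_f1


# Hypothetical labels: 0 = minor, 1 = serious, 2 = fatal
accuracy, per_class, macro_f1 = severity_metrics([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 2, 2])
```

Under class imbalance, the per-class recall of the fatal level is usually more informative than overall accuracy, because a model that rarely predicts "fatal" can still score high accuracy.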

4.2. Analysis of Experimental Results

The traffic accident data is divided into training and test sets in an 80:20 ratio. The training set is employed to develop the model, while the test set is used to evaluate its predictive performance. The distribution characteristics of the training and testing datasets are shown in Table 7.
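The text specifies only an 80:20 ratio. Given the class imbalance discussed in Section 3.3, one common option, assumed here rather than taken from the paper, is to split each severity level separately so that the rare fatal class appears in both sets in the same proportion. A minimal pure-Python sketch follows (the function name and seed are hypothetical); scikit-learn's `train_test_split(..., stratify=y)` provides the same behavior.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class separately (hypothetical helper)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)  # this class's share of the test set
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 80 + [1] * 15 + [2] * 5  # hypothetical minor / serious / fatal counts
train_idx, test_idx = stratified_split(labels)
```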

Figure 4 shows the XGBoost training loss curves. The blue curve represents the standard cross-entropy loss, while the red curve corresponds to the cost-sensitive loss. The cost-sensitive loss clearly improves the convergence speed of the model. After around 200 boosting rounds, the loss stabilizes and its rate of decrease becomes negligible, indicating a stable training process.

Experimental results demonstrate that, compared to the standard cross-entropy loss, the Cost-Sensitive Loss leads to faster loss reduction, smaller fluctuations, and more stable convergence. Furthermore, the model achieves notable improvements in key performance metrics such as F1-score and Recall for minority classes. This approach greatly enhances the reliability and practical applicability of the model, particularly in extreme traffic scenarios.

The comparative experiments in this paper use K-Nearest Neighbor (KNN) [36], Random Forest (RF) [37], Decision Tree (DT) [38], Bayesian Network (BN) [39], AdaBoost [40], and XGBoost [35] as baselines. To ensure a fair comparison, all baseline algorithms are trained and tested on the same datasets as the proposed method. The experimental results are shown in Table 8.

The results in Table 8 show that XGBoost improves the prediction accuracy of accident severity by 0.05 and 0.03 compared to KNN and RF, respectively, and achieves better performance across all levels of accident severity. This is likely because XGBoost uses gradient boosting with second-order optimization, which enables more accurate modeling of complex, non-linear relationships than single models such as DT or distance-based methods such as KNN. The proposed method extends XGBoost by incorporating a fuzzy neural network and combining the original features with fuzzy features, which mitigates the accuracy loss caused by uncertainty in accident features. The proposed method improves the prediction accuracy for minor, severe, and fatal accidents to 0.93, 0.90, and 0.89, respectively, outperforming all baseline methods in Table 8.

Moreover, because the accident severity dataset is class-imbalanced, precision, recall, and F1-score are also reported. As shown in Table 8, the proposed method outperforms the other methods on all three metrics. This gain is primarily attributed to the cost-sensitive loss function and its class weighting, which improve the model's ability to recognize the minority severity levels.

The consistently high performance of FXGBoost suggests that it is better equipped to capture complex, non-linear, and imprecise patterns in traffic accident data, especially for new energy vehicles. These findings indicate that FXGBoost is robust and well suited for deployment in real-world traffic safety systems, where early and accurate severity assessment is critical.

4.3. Feature Contribution Analysis

Analyzing feature contributions can further identify the key factors influencing traffic accident severity and reveal how these factors impact outcomes. This understanding can support road safety and police departments in understanding the causes of accident severity and implementing effective early warning measures. In the FXGBoost model, the contribution of each feature to accident severity prediction can be calculated by

M_{b,j} = \frac{\sum_{i=1}^{T} \delta_{b}[i] \, M_{b,j}[i]}{\sum_{j'=1}^{D} \sum_{i=1}^{T} \delta_{b}[i] \, M_{b,j'}[i]}

(6)

where $M_{b,j}$ represents the weight value of the $j$-th attribute column in the $b$-th sample, and $\delta$ represents the weight.
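Eq. (6) is a weighted sum normalized so that the contributions of all D features add up to one. The sketch below evaluates it for a single sample b, assuming (as an interpretation, since the text does not define the index) that [i] ranges over the T trees of the ensemble; the inputs `M` and `delta` and the function name are hypothetical. As a library-level analogue, xgboost's `Booster.get_score(importance_type=...)` reports per-feature importances such as `"weight"`, `"gain"`, or `"cover"`, which are related to, but not the same as, Eq. (6).

```python
def feature_contributions(M, delta):
    """Evaluate Eq. (6) for one sample b (hypothetical inputs).

    M[i][j]  : value of feature j in term i (assumed here to be tree i of T trees)
    delta[i] : weight of term i
    Returns the normalized contribution of each of the D features (sums to 1).
    """
    n_terms, n_features = len(M), len(M[0])
    numerators = [sum(delta[i] * M[i][j] for i in range(n_terms)) for j in range(n_features)]
    total = sum(numerators)
    return [v / total for v in numerators]


print(feature_contributions([[2.0, 0.0], [1.0, 1.0]], [1.0, 2.0]))  # approximately [0.667, 0.333]
```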

The contribution of the 20 selected features to FXGBoost is shown in Figure 5.

According to Figure 5, the feature “Light Condition” exhibits the highest contribution to the prediction of traffic accident severity. Other influential features, in descending order of importance, include “Vehicle Speed”, “Road Condition”, and “Battery Overheating”. These results indicate that both environmental factors (such as “Lighting” and “Road Conditions”) and vehicle-related factors (such as “Speed” and “Battery Status”) play a critical role in determining the severity of traffic accidents. By identifying the key factors contributing to severe accidents (such as “Vehicle Type”, “Battery Characteristics”, and “Driving Conditions”), regulators can develop targeted preventive measures. For example, high-risk vehicle profiles revealed by the model can guide stricter safety standards, more rigorous technical inspections, or differentiated insurance pricing.

It is important to note that environmental factors are inherently dynamic and influenced by a wide range of external variables, including “Weather”, “Time of Day”, and “Geographic Context”. Such variability introduces a significant degree of uncertainty and fuzziness, which traditional models based on crisp logic may struggle to interpret effectively. To address this challenge, the proposed model incorporates a fuzzy inference mechanism, which enables the transformation of ambiguous environmental indicators into fuzzy sets. This approach improves the model’s capacity to recognize vague patterns, handle uncertain information, and generalize across complex and dynamic conditions. As a result, the proposed model becomes more robust and better suited for real-world applications where traffic scenarios are often characterized by imperfect, imprecise, or incomplete data.

5. Conclusions

To support the low-carbon and sustainable development of new energy vehicles (NEVs) and to help traffic management agencies allocate emergency response resources efficiently, this study proposes a severity prediction method for NEV-related traffic accidents based on the Cost-sensitive Fuzzy XGBoost (CFXGBoost) algorithm. First, chi-square filtering and wrapper methods are used to extract 20 key features strongly correlated with accident severity from a dataset of over 410,000 accident records from multiple provinces. Then, a fuzzy neural network is integrated with the XGBoost algorithm to predict accident severity, addressing challenges such as multi-factor coupling and the complex, dynamic nature of accident scenarios. Finally, the model’s performance is evaluated on accident datasets from several provinces. The results show that FXGBoost achieves a prediction accuracy of 0.92 and outperforms the compared models on all evaluation metrics. The proposed method can offer traffic management authorities a reliable tool for assessing and responding to accident severity, supporting the sustainable development of new energy vehicles.

Future work will focus on optimizing the FXGBoost model and integrating dynamic factors such as real-time traffic data, weather conditions, and social events to enable real-time prediction of accident severity. The model’s applicability will also be extended by incorporating data from a wider range of geographic regions.

Author Contributions

Conceptualization, S.H.; methodology, S.H.; validation, S.H. and K.W.; formal analysis, X.Y. and C.W.; data curation, S.H.; writing—original draft preparation, S.H. and X.Y.; writing—review and editing, X.Y.; visualization, K.W. and X.Y.; supervision, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2022YFE0207800.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request due to restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Alarm Type.

Code	Description
0	On-board energy storage device overcharge alarm
1	Drive motor temperature alarm
2	High voltage interlock status alarm
3	Drive motor controller temperature alarm
4	DC-DC status alarm
5	Brake system alarm
6	DC-DC temperature alarm
7	Insulation alarm
8	Traction battery consistency difference alarm
9	Chargeable energy storage system mismatch alarm
10	SOC jump alarm
11	SOC high alarm
12	Single battery cell undervoltage alarm
13	Single battery cell overvoltage alarm
14	SOC low alarm
15	On-board energy storage device undervoltage alarm
16	On-board energy storage device overvoltage alarm
17	Battery high-temperature alarm
18	Temperature difference alarm

Table A2. Road Type.

Code	Description
10	Expressway
11	Class I highway
12	Class II highway
13	Class III highway
14	Class IV highway
19	Substandard highway
21	Urban expressway
22	General urban road
25	Self-built road within a work unit or residential community
26	Public parking lot
27	Public garden
29	Other roads

References

Yuan, X.; Liu, X.; Zuo, J. The development of new energy vehicles for a sustainable future: A review. Renew. Sustain. Energy Rev. 2015, 42, 298–305.
Hu, G.; Xiao, W.; Schwebel, D.C.; Yang, L.; Cheng, P.; Zhao, S.; Zhao, M. Characteristics of media-reported road traffic crashes related to new energy vehicles in China. J. Saf. Res. 2025, 92, 48–54.
Ma, X.; Niu, Z.; Zhang, Z.; Sun, S.; Li, Y. Research on the influence factors of accident severity of new energy vehicles based on ensemble learning. Front. Energy Res. 2023, 11, 1329688.
Zhang, D.W.; Dong, X.C.; Lei, Y.; Li, H.H.; Luo, J.; Zhang, C.L.; Zhao, C.Y.; Tang, K.W. Analysis of the severity of traffic accidents between new energy vehicles and pedestrians. J. Saf. Environ. 2024, 24, 1061–1069.
Li, Z.Y.; Liu, T.Y. Research on the protection of thermal runaway accidents of new energy vehicle batteries. Sci. Technol. Innov. 2023, 24, 120–122, 125.
Luan, C.Y.; Yang, X.L.; Liu, Y.; Zhu, Y.F. Research on the development status and trend of China’s new energy vehicle drive motor industry. Automob. Parts 2024, 16, 28–31.
Wu, G.; Du, R.; Li, J.; Zhou, H.; Dang, X.; Wei, J.T. Analysis of the control principle of the single pedal driving mode of electric vehicles. Automob. Electr. Appl. 2024, 5, 6–8.
George, Y.; Athanasios, T.; George, P. Investigation of road accident severity per vehicle type. Transp. Res. Procedia 2017, 25, 2076–2083.
Ditcharoen, A.; Chhour, B.; Traikunwaranon, T.; Aphivongpanya, N.; Maneerat, K.; Ammarapala, V. Road traffic accidents severity factors: A review paper. In Proceedings of the 2018 5th International Conference on Business and Industrial Research (ICBIR), Bangkok, Thailand, 17–18 May 2018; pp. 339–343.
Azhar, A.; Roslan, A.; Ariff, N.M.; Abu Bakar, M.A. Classification of driver injury severity for accidents involving heavy vehicles with decision tree and random forest. Sustainability 2022, 14, 4101.
Jiao, P.; Sun, X.; Lu, H.; Wang, J.; Ma, S.; Ji, L. Analyzing the risk factors of traffic accident severity using a combination of random forest and association rules. Appl. Sci. 2023, 13, 8559.
Li, Y.J. Analysis and Severity Prediction of Influencing Factors of Road Traffic Accidents in China; Jilin University: Changchun, China, 2020.
Sun, C.; Han, G.G.; Zhang, J.F. Analysis of the causes of expressway accidents and countermeasures based on analysis of variance and multiple linear regression models. Traffic Transp. 2021, 37, 7.
Feng, X.F.; Xu, S.; Yuan, J. Analysis of the causes of traffic accidents of new energy vehicles based on association rules. J. Chin. People’s Public Secur. Univ. (Nat. Sci. Ed.) 2024, 30, 37–43.
Han, T.Y.; Lv, K.G.; Xu, J.C.; Li, X.; Qiao, J. Analysis and prediction of traffic accident injuries based on APRIORI-TAN. J. Saf. Sci. Technol. China 2021, 17, 50–56.
He, L.Y. Research on Road Traffic Accident Analysis Based on Apriori and XGBoost Algorithms; Soochow University: Soochow, China, 2024.
Alkheder, S.; Alrukaibi, F.; Aiash, A. Risk analysis of traffic accidents’ severities: An application of three data mining models. ISA Trans. 2020, 106, 213–220.
Eboli, L.; Forciniti, C. The severity of traffic crashes in Italy: An explorative analysis among different driving circumstances. Sustainability 2020, 12, 856.
Theofilatos, A.; Yannis, G. Exploring crash injury severity on urban motorways by applying finite mixture models. Transp. Res. Procedia 2019, 41, 480–487.
Zhang, C.H.; Li, Y.J.; Li, T. A road traffic accidents prediction model for traffic service robot. Libr. Hi Tech. 2022, 40, 1031–1048.
Yang, Z.; Zhang, W.; Feng, J. Predicting multiple types of traffic accident severity with explanations: A multi-task deep learning framework. Saf. Sci. 2022, 146, 105522.
Ospina, H.M.; Quintana, J.L.A.; Lopez, V.F.J.; Garcia, S.B.; Barrero, L.H.; Sana, S.S. Extraction of decision rules using genetic algorithms and simulated annealing for prediction of severity of traffic accidents by motorcyclists. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 10051–10072.
Wang, Q.; Gan, S.; Chen, W.; Li, Q.; Nie, B. A data-driven, kinematic feature-based, near real-time algorithm for injury severity prediction of vehicle occupants. Accid. Anal. Prev. 2021, 156, 10614.
Erzurum, C.Z.I.; Kamisli, O.Z. Prediction of fatal traffic accidents using one-class SVMs: A case study in Eskisehir, Turkey. Int. J. Crashworthiness 2022, 27, 1433–1443.
Hossain, M.; Muromachi, Y. Understanding crash mechanisms and selecting interventions to mitigate real-time hazards on urban expressways. Transp. Res. Rec. J. Transp. Res. Board 2011, 2213, 53–62.
Liu, X.L.; Shan, J.; Liu, T.Z.; Rao, C.; Liu, T. Real-time risk prediction of highway traffic accidents based on the traffic flow stability coefficient. J. Transp. Inf. Saf. 2022, 4, 40.
Guo, M.; Zhao, X.H.; Yao, Y.; Wu, D.Y.; Su, Y.L.; Bi, C.F. Research on accident risk based on driving behavior and traffic operation status. J. South China Univ. Technol. (Nat. Sci. Ed.) 2022, 50, 29–38.
Yang, J.; Han, S.; Chen, Y. Prediction of traffic accident severity based on random forest. J. Adv. Transp. 2023, 2023, 7641472.
Liang, W.; Li, T.; Alhaek, F.; Javed, M.H.; Rajeh, T.M. Learning spatial patterns and temporal dependencies for traffic accident severity prediction: A deep learning approach. Knowl.-Based Syst. 2024, 286, 111406.
GB/T32960.3; Technical Specification of Remote Services and Management System for Electric Vehicles—Part 3: Communication Protocol and Data Format. Chinese Standard: Beijing, China, 2016.
Obasi, I.C.; Benson, C. Evaluating the effectiveness of machine learning techniques in forecasting the severity of traffic accidents. Heliyon 2023, 9, 1–10.
Ding, C.; Chen, P.; Jiao, J. Non-linear effects of the built environment on automobile-involved pedestrian crash frequency: A machine learning approach. Accid. Anal. Prev. 2018, 112, 116–126.
Chen, F.; Chen, S.; Ma, X. Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. J. Saf. Res. 2018, 65, 153–159.
Kursa, M.B.; Rudnicki, W.R. Feature selection with boruta package. J. Stat. Softw. 2010, 36, 1–13.
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Taunk, K.; De, S.; Verma, S.; Swetapadma, A. A brief review of nearest neighbor algorithm for learning and classification. In Proceedings of the 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 15–17 May 2019; pp. 1255–1260.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Kotsiantis, S.B. Decision trees: A recent overview. Artif. Intell. Rev. 2013, 39, 261–283.
Goan, E.; Fookes, C. Bayesian neural networks: An introduction and survey. Case Stud. Appl. Bayesian Data Sci. CIRM Jean-Morlet Chair 2018, 2020, 45–87.
Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.

Figure 1. Feature Selection Process.

Figure 2. XGBoost Network Architecture.

Figure 3. Fuzzy XGBoost Network Architecture.

Figure 4. FXGBoost Training Loss Curve.

Figure 5. FXGBoost Feature Contribution.

Table 1. Human factors.

Serial Number	Feature Name	Feature Description
1	Driver Age (years)	1: Under 25; 2: 26–35; 3: 36–45; 4: 46–55; 5: over 55
2	Driver Sex	1: Male; 2: female
3	Driver Experience	0: No driver’s license; 1: 5 years or less; 2: 6–10 years; 3: 11–15 years; 4: 16–20 years; 5: more than 20 years
4	Left-Hand Drive	1: Yes; 2: no

Table 2. Vehicle factors.

Serial Number	Feature Name	Feature Description
1	Vehicle Type	1: Battery electric vehicle; 2: plug-in hybrid electric vehicle; 3: fuel cell electric vehicle; 4: hybrid electric vehicle
2	Vehicle Age	1: Under 3 years; 2: 4–6 years; 3: 7–9 years; 4: over 10 years
3	Vehicle Speed (km/h)	1: Under 60; 2: 61–80; 3: 81–100; 4: 101–120; 5: over 121
4	Data Warehousing Time	1: 0:00–6:00; 2: 6:00–12:00; 3: 12:00–18:00; 4: 18:00–24:00
5	Risk Type	-
6	Longitude	-
7	Latitude	-
8	Travel Direction	1: North; 2: west; 3: south; 4: east
9	Motor Functioning Properly	1: Yes; 2: no
10	Battery Type	1: Nickel cobalt manganese/nickel cobalt aluminum; 2: lithium iron phosphate; 3: lithium manganese oxide; 4: lithium cobalt oxide
11	Battery Overheating	1: Yes; 2: no
12	Autonomous Driving	1: Yes; 2: no

Table 3. Road factors.

Serial Number	Feature Name	Feature Description
1	Urban or Rural Area	1: Urban; 2: rural
2	Road Type	-
3	Junction	1: Yes; 2: no
4	Speed Limit (km/h)	1: Under 20; 2: 20–40; 3: 40–60; 4: 60–80; 5: 80–100; 6: 100–120

Table 4. Driving environment factors.

Serial Number	Feature Name	Feature Description
1	Crash Date	1: 1st; 2: 2nd; …
2	Crash Hour	1: 0:00–6:00; 2: 6:00–12:00; 3: 12:00–18:00; 4: 18:00–24:00
3	Crash Season	1: Spring; 2: summer; 3: autumn; 4: winter
4	Light Condition	1: Daylight; 2: streetlight; 3: vehicle lights; 4: no light
5	Road Condition	1: Dry; 2: not dry
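The categorical codes in Table 4 can be derived from a crash timestamp. The sketch below is a hypothetical encoder, not the article’s preprocessing code. It assumes half-open six-hour intervals for “Crash Hour”, a meteorological month-to-season mapping for “Crash Season”, and reads “Crash Date” as the day of the month; the table states none of these choices explicitly. “Data Warehousing Time” in Table 2 uses the same four intervals, so `crash_hour_code` would apply to it as well.

```python
from datetime import datetime

# Assumed month-to-season mapping (meteorological seasons, Northern Hemisphere);
# Table 4 lists the season codes but not which months belong to each season.
SEASON_BY_MONTH = {
    3: 1, 4: 1, 5: 1,      # spring
    6: 2, 7: 2, 8: 2,      # summer
    9: 3, 10: 3, 11: 3,    # autumn
    12: 4, 1: 4, 2: 4,     # winter
}


def crash_hour_code(ts: datetime) -> int:
    """Table 4 'Crash Hour': 1: 0:00-6:00, 2: 6:00-12:00, 3: 12:00-18:00, 4: 18:00-24:00.

    Assumes half-open intervals, so a crash at exactly 6:00 gets code 2.
    """
    return ts.hour // 6 + 1


def crash_season_code(ts: datetime) -> int:
    """Table 4 'Crash Season': 1: spring, 2: summer, 3: autumn, 4: winter."""
    return SEASON_BY_MONTH[ts.month]


def crash_date_code(ts: datetime) -> int:
    """Table 4 'Crash Date', read here (an assumption) as the day of the month."""
    return ts.day


record = datetime(2023, 7, 15, 6, 0)
print(crash_hour_code(record), crash_season_code(record), crash_date_code(record))  # -> 2 2 15
```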

Table 5. Partial results of Chi-square filtering.

Serial Number	Feature Name	Chi-Square Value	p Value
1	Longitude	0.1687	0.5248
2	Direction of Travel	2.1548	0.1461
3	Road Type	18.1347	0.0045
4	Crash Hour	26.1684	0.0002
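Table 5 lists chi-square statistics and p values used to screen features against accident severity. To illustrate how such values can be obtained, the following self-contained Python sketch runs Pearson’s chi-square test of independence on a feature-by-severity contingency table. The counts are hypothetical, and the 0.05 significance level is an assumed threshold, not one restated in the tables. In practice a library routine such as `scipy.stats.chi2_contingency` would normally replace the hand-written p value below.

```python
import math
from typing import List, Tuple


def chi_square_statistic(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    (rows: categories of one feature, columns: severity classes)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_p_value(stat: float, dof: int, max_terms: int = 1000) -> float:
    """Upper-tail p value 1 - P(dof/2, stat/2), using the power series for the
    regularized lower incomplete gamma function P. Adequate for moderate
    statistics; very small p values lose precision in the final subtraction."""
    if stat <= 0:
        return 1.0
    a, x = dof / 2.0, stat / 2.0
    term = 1.0 / a
    series = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        series += term
        if term < series * 1e-16:
            break
    lower = series * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Hypothetical counts for a binary feature (rows) vs. minor/serious/fatal (columns)
table = [[520, 60, 20],
         [480, 40, 30]]
stat, dof = chi_square_statistic(table)
p = chi_square_p_value(stat, dof)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.4f}, keep at alpha = 0.05: {p < 0.05}")
```

For two degrees of freedom the upper tail reduces to exp(-stat/2), which gives a quick sanity check on the series.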

Table 6. Results of Chi-square filtering.

Serial Number	Feature Name	Z-Score	Filter Results
1	Driver Age	8.38	True
2	Driver Sex	7.86	True
3	Driver Experience	9.74	True
4	Left-Hand Drive	8.82	True
5	Vehicle Type	8.09	True
6	Vehicle Age	9.62	True
7	Vehicle Speed	9.77	True
8	Data Warehousing Time	2.39	False
9	Risk Type	5.73	True
10	Longitude	1.61	False
11	Latitude	3.08	False
12	Travel Direction	7.02	True
13	Motor Functioning Properly	8.87	True
14	Battery Type	8.58	True
15	Battery Overheating	8.17	True
16	Autonomous Driving	6.00	True
17	Urban or Rural Area	5.70	True
18	Road Type	5.18	True
19	Junction	7.95	True
20	Speed Limit	1.94	False
21	Crash Date	5.88	True
22	Crash Hour	7.89	True
23	Crash Season	7.36	True
24	Light Condition	8.43	True
25	Road Condition	6.92	True

Table 7. Distribution of samples by accident severity.

Dataset	Minor Accident	Serious Accident	Fatal Accident	Total
Training set	268,136	40,356	19,723	328,215
Test set	67,284	9846	4923	82,054
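Table 7 shows a strong class imbalance: minor accidents make up about 82% of the training set (268,136 of 328,215 records), while fatal accidents make up about 6%. One common, generic way to turn such imbalance into misclassification costs is inverse-frequency class weighting. The sketch below computes these weights from the Table 7 training counts. It is an assumed scheme for illustration, and not necessarily the cost-sensitive loss designed in Section 3.3.

```python
from typing import Dict

# Training-set counts per severity class, copied from Table 7
TRAIN_COUNTS: Dict[str, int] = {"Minor": 268_136, "Serious": 40_356, "Fatal": 19_723}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by n / (k * n_c): rarer classes get larger weights,
    and each class's weighted total becomes n / k."""
    n = sum(counts.values())
    k = len(counts)
    return {label: n / (k * n_c) for label, n_c in counts.items()}


total = sum(TRAIN_COUNTS.values())
weights = inverse_frequency_weights(TRAIN_COUNTS)
for label, n_c in TRAIN_COUNTS.items():
    print(f"{label}: share = {n_c / total:.3f}, weight = {weights[label]:.3f}")
```

Per-class weights like these could, for example, be expanded to one weight per training sample and passed to XGBoost through the `sample_weight` argument of `XGBClassifier.fit` or the `weight` argument of `xgboost.DMatrix`. A custom cost-sensitive objective, as in the article, is a different and more flexible mechanism.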

Table 8. Comparison experiment results.

Model	Precision	Recall	F1-Score	Accuracy (Minor Accident)	Accuracy (Severe Accident)	Accuracy (Fatal Accident)	Accuracy (Total)
KNN	0.81	0.80	0.80	0.83	0.81	0.84	0.83
RF	0.82	0.84	0.83	0.86	0.85	0.83	0.85
DT	0.83	0.80	0.81	0.81	0.80	0.80	0.81
BN	0.86	0.84	0.85	0.86	0.85	0.84	0.85
Adaboost	0.84	0.85	0.84	0.83	0.80	0.82	0.83
XGBoost	0.87	0.84	0.85	0.88	0.88	0.86	0.88
Proposed method	0.90	0.91	0.90	0.93	0.90	0.89	0.92
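If the F1-score column is the harmonic mean of the reported precision and recall (an assumption; with macro averaging, the mean of per-class F1 scores can differ slightly), each F1 value in Table 8 can be checked against the other two columns. The short Python check below recomputes F1 = 2PR/(P + R) for every model and compares it with the reported value after rounding to two decimals.

```python
# (precision, recall, reported F1-score) per model, copied from Table 8
TABLE_8 = {
    "KNN": (0.81, 0.80, 0.80),
    "RF": (0.82, 0.84, 0.83),
    "DT": (0.83, 0.80, 0.81),
    "BN": (0.86, 0.84, 0.85),
    "Adaboost": (0.84, 0.85, 0.84),
    "XGBoost": (0.87, 0.84, 0.85),
    "Proposed method": (0.90, 0.91, 0.90),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for model, (p, r, reported) in TABLE_8.items():
    computed = f1_score(p, r)
    consistent = abs(round(computed, 2) - reported) < 1e-9
    print(f"{model}: computed {computed:.4f}, reported {reported:.2f}, consistent: {consistent}")
```

With the values above, every row agrees with the reported F1-score to two decimals.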

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Huang, S.; Yin, X.; Wang, C.; Wang, K. Research on Accident Severity Prediction of New Energy Vehicles Based on Cost-Sensitive Fuzzy XGBoost. Sustainability 2025, 17, 5408. https://doi.org/10.3390/su17125408
