1. Introduction
According to the World Health Organization (WHO), an estimated 1.19 million people were killed in traffic accidents worldwide in 2021. Beyond the tragic loss of life, traffic accidents impose far-reaching socioeconomic burdens: the WHO further reports that road traffic injuries cause annual losses of approximately 1–3% of global GDP, with low- and middle-income countries bearing over 90% of these casualties despite having only 60% of the world’s vehicles [1]. In China specifically, 2021 saw a total of 273,098 road traffic accidents, leading to 62,218 deaths, 281,447 injuries, and direct property damage totaling 1.45 billion yuan (Chinese currency). For expressways, which are critical arteries of modern transportation, the inherent high-speed nature amplifies risk; in South Korea, each expressway accident averages 0.11 deaths and 2.56 injuries, compared to 0.027 deaths and 1.57 injuries on ordinary roads [2]. This stark contrast underscores the urgency of focusing on expressway traffic safety, as even minor collisions on high-speed roads can escalate into catastrophic outcomes, making the study of expressway accident occurrence mechanisms a critical priority.
In recent decades, the transportation safety field has witnessed the growing development of models aimed at uncovering relationships between traffic accidents and their influencing factors. Early studies predominantly relied on statistical regression techniques, with many based on the generalized linear model (GLM) framework. Prominent examples include negative binomial models [3,4,5], SARIMA models [6], Poisson models and their extensions [7,8], Tobit models and related variants [9,10,11,12,13], as well as finite mixture and Markov switching models [14,15,16]. GLMs are widely adopted due to their theoretical robustness and their capacity to produce interpretable coefficients for each variable, helping researchers determine both the direction and magnitude of factor effects. However, these models also exhibit significant limitations: they impose substantial data requirements, necessitating strict compliance with distributional assumptions and adequate sample sizes. When data do not satisfy these conditions, such as in cases of limited samples or deviations from presumed distributions, GLMs may yield biased estimates or fail to converge [17], thus restricting their practical utility in real-world traffic safety contexts.
In recent years, artificial intelligence (AI) models [18] have emerged as a promising alternative, including neural networks [19,20,21], Bayesian networks [22], classification and regression trees [23], and support vector machines [17]. In contrast to traditional regression methods, AI models require no prior assumptions about model structure, enabling them to capture complex nonlinear relationships among variables and effectively model accident frequency [24]. Nonetheless, AI models are often criticized as “black-box” approaches [22], which considerably limits their applicability in traffic safety studies. Their lack of transparency makes it difficult to discern how specific explanatory variables (e.g., traffic density, road curvature) affect accident rates. Even with sensitivity analyses, comprehensively quantifying each factor’s influence remains challenging, thereby impeding the translation of model findings into actionable safety policies. Furthermore, neural networks, a common class of AI models, are susceptible to overfitting, particularly when trained on limited or noisy datasets, undermining their generalizability to new scenarios or road segments. In response, researchers have made continuous improvements to neural network methodologies, and the SHAP (SHapley Additive exPlanations) framework, rooted in game theory, has become a valuable tool for interpreting black-box models [25]. It has been successfully applied in traffic accident analysis [26,27,28], helping to balance performance with interpretability.
Another shortcoming in the existing literature is the insufficient treatment of risk factors. Many studies either neglect the effects of traffic flow and road segment attributes [29] or examine these factors in isolation rather than in combination [30,31]. Investigations rarely consider the synergistic effects of these variables on traffic safety. This fragmented perspective overlooks the complex, interdependent nature of real-world expressway accidents, where multiple elements often jointly contribute to elevated risk. Moreover, some studies depend on data that are difficult to acquire [32], such as high-resolution real-time sensor measurements or detailed driver behavior metrics, which limits the reproducibility and scalability of the research. Consequently, there is a growing demand for methodologies that can utilize commonly available data (e.g., routine traffic counts, fundamental road geometry information) to analyze accident rates. Such an approach would greatly expand the applicability of accident rate models and reduce barriers to further research.
This study aims to predict expressway accident rates using a novel two-layer stacking model. The key contributions of this research are as follows.
- (1) Unlike previous studies that faced challenges in data acquisition, our model achieves accurate predictions using easily accessible data, overcoming a significant barrier in traffic accident research.
- (2) Balanced Model Interpretability and Performance: While earlier research relied on statistical regression or opaque “black-box” AI models, our approach integrates neural networks and decision trees through a linear model. By employing SHAP, we enhance the interpretability of the model, providing clear insights into variable importance.
- (3) Comprehensive Factor Analysis: Previous studies often examined road characteristics and traffic flow in isolation. Our model uniquely combines these factors, offering a holistic analysis of their combined impact on accident rates.
In summary, a stacking model that leverages neural networks, decision trees, and linear models to predict expressway accident rates using readily available data is developed. The model’s effectiveness is validated through comparisons with standalone neural network and decision tree models, demonstrating superior fitting and predictive performance. Additionally, SHAP values are used to elucidate the influence of various factors on accident outcomes, providing actionable insights for traffic safety improvements.
Accordingly, the remainder of this paper is organized as follows. Section 2 describes the collected data. The detailed implementation of the proposed stacking model and methods is described in Section 3, and Section 4 presents the analysis of the results. Finally, the conclusions and recommendations for future research are presented in Section 5.
2. Data Preparation and Preliminary Analysis
2.1. Sample Description
The study area was a section of a major expressway in Southern China: a 154-km tolled expressway with 14 toll stations.
To demonstrate the proposed stacking model and compare it with benchmark models, data were collected from several sources, primarily the Department of Transportation and traffic management authorities. Accident records for the period from 2016 to 2019 were extracted for this study. The original accident data contained information such as the time, location, and weather of each accident. The road section characteristic data included the tunnel length and width, the bridge length and width, and the categories of tunnels and bridges, among other information, while the traffic data comprised the monthly traffic volume of each vehicle type.
With these data, the next step was to divide the study area into manageable roadway sections. Because the study relies on easily available data, sections were delimited according to the layout of the toll stations to minimize the amount of additional data to be collected. The area was divided into 13 road sections based on road characteristics and research objectives. Based on information from the official website of the Meteorological Bureau, the number of sunny days in each month was calculated for each road section.
2.2. Data Preprocessing
For regression analysis, predictor variables were derived from the raw data. First, the number of traffic accidents that occurred in each section per month was counted. The monthly traffic volume of each road segment was then calculated. The monthly traffic accident rate of each section is defined as the number of accidents per unit of traffic flow and segment length. For ease of calculation, the obtained value was multiplied by a scaling coefficient, and the calculation formula is as follows:

$$R_{ij} = k \times \frac{N_{ij}}{Q_{ij} \times L_{j}}$$

where $R_{ij}$ [numbers/(vehicle × km)] is the accident rate of the $i$-th month on section $j$; $N_{ij}$ is the number of accidents in the $i$-th month on section $j$; $Q_{ij}$ is the monthly traffic volume in the $i$-th month on section $j$; $L_{j}$ is the length of segment $j$ in kilometers; $k$ is the scaling coefficient; and $i$ is the statistical month, where January 2016 is the first month and December 2019 is the 48th month.
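As a minimal sketch of the accident-rate definition above (accidents per unit of traffic volume and segment length, multiplied by a scaling coefficient), the following Python function computes the rate for one section in one month. The function name, the argument names, and the `scale` values are illustrative assumptions; the value of the coefficient actually used in this study is not stated here.

```python
def accident_rate(n_accidents, monthly_volume, length_km, scale=1.0):
    """Accidents per vehicle-kilometre for one section in one month.

    `scale` stands in for the scaling coefficient mentioned in the text;
    the default of 1.0 is an assumption, not the value used in the study.
    """
    if monthly_volume <= 0 or length_km <= 0:
        raise ValueError("traffic volume and segment length must be positive")
    exposure = monthly_volume * length_km  # vehicle-kilometres travelled
    return scale * n_accidents / exposure


# Hypothetical example: 6 accidents, 300,000 vehicles, a 12 km segment,
# expressed per million vehicle-kilometres.
rate = accident_rate(6, 300_000, 12.0, scale=1e6)
```

Normalizing by both volume and length makes sections of different length and traffic load comparable, which a raw accident count would not.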
In addition, the ratio of tunnels and bridges, the variances of the tunnel and bridge widths, and the proportions of various types of tunnels and bridges in each road section were calculated according to the available road section characteristic data. Finally, according to the number of sunny days, the percentage of sunny days for each section was calculated.
Table 1 reports the definitions and descriptive statistics of the variables used for model development. The definitions of the vehicle types mentioned in Table 1 are provided in Table 2.
2.3. Predictor Variables
The expressway accident rate is influenced by many factors, and these factors have different degrees of impact. It is crucial to measure the impact of each factor and determine the appropriate factors for the design and construction of a reasonable expressway accident rate model. If variables that have a weak correlation with the expressway accident rate are introduced into the model, the prediction accuracy may be reduced. Therefore, a correlation analysis of the factors related to the expressway accident rate was conducted, and the Pearson correlation coefficient was used to measure the correlations among these factors. The absolute value of the Pearson correlation coefficient lies between 0 and 1, and the greater the absolute value, the stronger the linear correlation. The sign of the correlation coefficient indicates the direction of the correlation between variables. The correlation coefficients between the influencing factors of the expressway traffic accident rate are shown in Figure 1.
It is generally believed that the Pearson coefficient
indicates a correlation between variables. Therefore, variables with coefficients of correlation with the expressway accident rate
were selected as model inputs (independent variables).
Table 3 reports the independent variables of the proposed model and their coefficients of correlation with the accident rate.
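The correlation-based screening described in this subsection can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name, the synthetic data, the column names, and in particular the `threshold` value are hypothetical, since the cut-off applied in this study is not reproduced here.

```python
import numpy as np


def select_by_pearson(X, y, names, threshold):
    """Return {name: r} for columns of X whose |Pearson r| with y exceeds threshold."""
    selected = {}
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) > threshold:
            selected[name] = float(r)
    return selected


# Synthetic illustration: two informative columns and one pure-noise column.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

# threshold=0.3 is an arbitrary choice made only for this example.
kept = select_by_pearson(X, y, ["f0", "f1", "noise"], threshold=0.3)
```

Because the Pearson coefficient only measures linear association, a screening of this kind can discard variables whose effect is strongly nonlinear; that caveat applies to any threshold.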
3. Methodology
The stacking method was used to combine three neural network models and three tree-based models into a two-layer stacking model for predicting the traffic accident rate of specific road sections. The proposed model consists of seven components: a recurrent neural network (RNN), a long short-term memory (LSTM) neural network, a gated recurrent unit (GRU) neural network, a gradient boosting regression tree (GBRT), an extreme gradient boosting (XGBoost) model, a random forest (RF) model, and linear regression. The first six components constitute the first layer of the stacking model, and linear regression forms the second layer. The structure of the proposed stacking model is shown in Figure 2.
3.1. Model Specification
3.1.1. Neural Network Models
The widely used RNN, LSTM, and GRU neural networks form part of the first-layer model. All three are recurrent architectures that retain information from earlier time steps, and each comprises an input layer, a hidden layer, and an output layer. The LSTM and GRU models were proposed to address the vanishing gradient problem of the standard RNN and its resulting difficulty in learning long-term dependencies. The GRU is a simplified variant of the LSTM with fewer gates. At this level of description, the RNN, LSTM, and GRU models therefore share the same overall structure, as shown in Figure 3.
3.1.2. Regression Tree Models
Gradient boosting regression tree (GBRT), extreme gradient boosting (XGBoost), and random forest (RF) are all ensemble learning models built from decision trees. RF consists of many independently trained trees: it builds a bagging ensemble of trees and further introduces random feature selection during training.
Figure 4 shows the simplified structure of the RF model.
Both GBRT and XGBoost are boosting-based tree ensembles. GBRT constructs trees sequentially, with each tree attempting to correct the errors of the trees before it. XGBoost is an improved implementation of traditional gradient boosting and performs well in regression problems.
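The sequential error-correcting idea behind boosting can be illustrated with a short sketch. It assumes squared-error loss, shallow scikit-learn trees, synthetic data, and arbitrary settings for `learning_rate` and `n_trees`; it is a bare-bones illustration and not the GBRT or XGBoost configuration used in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic one-dimensional regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate, n_trees = 0.1, 100          # arbitrary example settings
prediction = np.full_like(y, y.mean())     # start from the mean response
trees, train_mse = [], []
for _ in range(n_trees):
    residual = y - prediction              # errors left by the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction += learning_rate * tree.predict(X)   # shrunken correction step
    trees.append(tree)
    train_mse.append(float(np.mean((y - prediction) ** 2)))
```

A random forest differs in that its trees are fitted independently on bootstrap samples and averaged, rather than each tree being fitted to the residuals of its predecessors.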
3.1.3. Linear Regression
Linear regression is a statistical method that quantifies the dependence of one variable on one or more other variables. In this study, linear regression is defined as

$$\hat{y} = \beta_0 + \sum_{k=1}^{6} \beta_k x_k$$

where $\hat{y}$ is the accident rate predicted by the stacking model, $\beta_0$ is a constant term, and $\beta_1, \ldots, \beta_6$ are the coefficients applied to the predictions $x_1, \ldots, x_6$ of the six first-layer models.
To address the potential issue of high multicollinearity among the meta-features extracted in the first layer, which may adversely impact model performance, this study utilizes Ridge regression—a regularized linear regression technique—to mitigate its effects.
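The stabilizing effect of Ridge regression on collinear meta-features can be sketched with the closed-form solution. The helper name `ridge_fit`, the synthetic "base model" outputs, and the `alpha` values below are assumptions chosen for illustration; the regularization strength used in this study is not specified here.

```python
import numpy as np


def ridge_fit(Z, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept.

    Z: (n_samples, n_models) matrix of first-layer predictions.
    Returns (intercept, coefficients).
    """
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    gram = Zc.T @ Zc + alpha * np.eye(Z.shape[1])
    coef = np.linalg.solve(gram, Zc.T @ yc)
    return y_mean - z_mean @ coef, coef


# Two nearly identical "base model" outputs: a strongly collinear pair.
rng = np.random.default_rng(1)
truth = rng.normal(size=100)
Z = np.column_stack([truth + rng.normal(scale=0.05, size=100),
                     truth + rng.normal(scale=0.05, size=100)])

_, coef_small = ridge_fit(Z, truth, alpha=1e-8)   # close to ordinary least squares
_, coef_ridge = ridge_fit(Z, truth, alpha=1.0)    # regularised solution
```

The penalty term `alpha * I` keeps the Gram matrix well conditioned, so the weights assigned to near-duplicate base models cannot grow large in opposite directions to cancel each other.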
3.2. Stacking Model
A large number of studies have shown that the prediction accuracy of models can be improved via ensembling. There are three methods of model integration, namely proportional addition, voting, and learning. Proportional addition linearly combines the outputs of different models. Neural network methods tend to be unstable because they are prone to overfitting, which leads to fluctuations in the predictions for new samples. In contrast, tree models have a simple algorithm, a fast training speed, and strong stability. Combining these two kinds of methods can therefore improve overall accuracy and stability. In this study, they are combined by the stacking ensemble learning method, in which the combination is learned by a second-level model.
The stacking ensemble learning method was first proposed by Wolpert [33]. The stacking method, also known as stacked generalization, is an ensemble method based on hierarchical model combinations. Specifically, the method divides the learning process into two parts, namely level 1 and level 2 models. The original dataset is used as the training data for the level 1 models, and the outputs of the level 1 models are then used as the input features of the level 2 model. The stacking method has been applied in many fields, such as medical treatment and quality management, but it is rarely used in the field of traffic accident rate modeling and prediction. In this study, multiple models based on neural networks and decision trees are combined by the stacking method to establish an integrated model with superior performance.
To accurately predict the expressway accident rate, a two-layer stacking model is proposed. The level 1 models include three neural network models (RNN, LSTM, GRU) and three decision tree models (GBRT, XGBoost, RF), while the level 2 model is the linear regression model.
Figure 5 displays the framework of the proposed model.
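The two-level idea can be sketched in a few lines with scikit-learn. This is a simplified illustration, not the implementation used in this study: it uses only two tree-based level 1 models (the three recurrent networks and XGBoost are omitted for brevity), synthetic data, default hyperparameters, and an arbitrary three-way split.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                       # synthetic predictors
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=300)

# Three disjoint parts: level 1 training, level 2 training, final validation.
X1, y1 = X[:200], y[:200]
X2, y2 = X[200:250], y[200:250]
X3, y3 = X[250:], y[250:]

level1 = [GradientBoostingRegressor(random_state=0),
          RandomForestRegressor(n_estimators=100, random_state=0)]
for model in level1:
    model.fit(X1, y1)


def meta_features(models, X):
    """Stack each level 1 model's predictions as one column."""
    return np.column_stack([m.predict(X) for m in models])


# Level 2 learns how to weight the level 1 predictions, using rows that the
# level 1 models did not see during training.
level2 = Ridge(alpha=1.0).fit(meta_features(level1, X2), y2)
y_pred = level2.predict(meta_features(level1, X3))
rmse = float(np.sqrt(np.mean((y3 - y_pred) ** 2)))
```

Fitting the level 2 model on rows held out from level 1 training matters: predictions on rows the base models have already seen are over-optimistic, and a meta-model trained on them would overweight the most overfitted base model.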
The Implementation of the Proposed Model
The inputs of the level 1 models are the variables reported in Table 3, and the output is the expressway accident rate predicted by each of the level 1 models. The input of the level 2 model is a dataset composed of the prediction results of the level 1 models and the ground truth, and the output is the final prediction result. Specifically, for the level 1 models, the training set includes 90% of the data for 2016–2018, and the validation set includes the remaining 2016–2018 data and the 2019 data; the training set thus includes 421 samples, while the validation set includes 203 samples. For the level 2 model, the prediction results of the level 1 models and the ground truth constitute the training and validation sets. The training set for the level 2 model consists of the 2016–2018 data in the level 1 validation set together with the corresponding level 1 predictions, while the validation set consists of the 2019 data together with the level 1 predictions for 2019; the level 2 training set contains 47 samples, and the level 2 validation set contains 156 samples. The modeling process is shown in Figure 6.
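The sample counts quoted above follow directly from 13 sections observed over 48 months, as the short sketch below reproduces. It assumes that the 90% share of the 2016–2018 data is drawn at random, which is not stated explicitly; the variable names and the random seed are illustrative.

```python
import random

N_SECTIONS, N_MONTHS = 13, 48          # 13 sections, Jan 2016 to Dec 2019
samples = [(s, m) for s in range(1, N_SECTIONS + 1)
           for m in range(1, N_MONTHS + 1)]

early = [sm for sm in samples if sm[1] <= 36]   # months 1-36: 2016-2018
late = [sm for sm in samples if sm[1] > 36]     # months 37-48: 2019

random.Random(0).shuffle(early)                 # assumption: random 90% draw
n_l1_train = int(0.9 * len(early))

level1_train = early[:n_l1_train]
level2_train = early[n_l1_train:]   # 2016-2018 rows held out from level 1
level2_valid = late                 # all 2019 rows
level1_valid = level2_train + level2_valid
```

With 468 section-months in 2016–2018, the 90% share gives 421 level 1 training samples and leaves 47 for level 2 training; the 156 section-months of 2019 form the level 2 validation set, so the level 1 validation set has 47 + 156 = 203 samples.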
3.3. Evaluation Metrics
Two measures are used to comprehensively evaluate the performance of the proposed model, namely the root mean square error (RMSE) and the mean absolute error (MAE).
The RMSE is the square root of the mean squared difference between the predicted and actual values, and it penalizes large errors more heavily than small ones. The lower the RMSE value, the higher the prediction accuracy. The MAE is the mean of the absolute differences between the predicted values obtained by the model and the actual values.
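The two measures can be calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ and $\hat{y}_i$ respectively indicate the ground truth and the estimated value of the accident rate for sample $i$, and $n$ is the number of samples.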
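For reference, the two metrics can be computed as in the sketch below. The function names and the three-sample example are illustrative and are not values from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Illustrative values only: the errors are 1, 0, and -2.
observed = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]
example_rmse = rmse(observed, predicted)   # sqrt((1 + 0 + 4) / 3)
example_mae = mae(observed, predicted)     # (1 + 0 + 2) / 3
```

The RMSE is never smaller than the MAE on the same data, and the gap widens when a few errors are much larger than the rest, so reporting both gives a rough indication of how unevenly the errors are distributed.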
4. Results and Discussion
This section is divided into three parts. In the first subsection, the stability of the model under different training and validation sets is verified, while the second subsection describes the prediction performance results of the seven models. The last subsection presents the importance of each feature for the proposed stacking model.
4.1. Evaluation of Model Robustness to Training Data Size
To evaluate the robustness of the model architecture against variations in training data size, we designed multiple sets of independent experiments. The core workflow was consistent across all experiments, as outlined below:
Level 1 Model Training: All Level 1 base models (e.g., LSTM, GBRT) were trained using a specific percentage of the total dataset (case1: 90%, case2: 80%, case3: 70%, case4: 60%).
Level 2 Model Training: The meta-feature matrix output by the Level 1 models was used as input, and the corresponding true labels served as the target to train the Level 2 linear meta-model. This process resulted in the construction of a complete Stacking ensemble model.
Model Validation: Finally, the fully trained Stacking model was evaluated for performance using a dataset that was not used during the model training phase.
The results of these four cases are presented in Table 4 and Figure 7. Despite variations in the proportion of training data used for the Level 1 models, the model performance (e.g., RMSE) remained highly consistent, which indicates that the model is stable with respect to training data size.
4.2. Comparison of Predictive Performance
To thoroughly evaluate the superiority of the proposed model, this study introduced a variety of benchmark methods for comparative experiments. The types of the selected benchmark methods are consistent with those of the base models in the first layer.
Figure 8 presents the prediction results obtained using these methods. To make a fair comparison, all the benchmark methods were fine-tuned with the same input data and the same number of training epochs.
Figure 8a–g, respectively, present the results of the individual RNN, LSTM, GRU, GBRT, XGBoost, and RF models and the proposed two-layer stacking model.
As mentioned in Section 3.3, two metrics, namely the RMSE and MAE, were used to evaluate the model performance. The comparison results are reported in Table 5. Furthermore, to verify whether the performance advantage of the stacking model over the base models is statistically significant, a t-test was conducted in this study, and the results are shown in Table 6. The findings indicate that the stacking model significantly outperforms all base models in terms of both RMSE and MAE (p < 0.05).
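As a rough illustration of how such a comparison can be set up, the sketch below computes the statistic of a paired t-test on matched error values. It assumes a paired design (the same evaluation runs scored for both models), which is not stated explicitly; the function name and the numbers are hypothetical, and the p-value would then be read from a t distribution with the returned degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """t statistic and degrees of freedom for paired differences a - b."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)      # sample standard deviation
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical per-run RMSE values for a base model and the stacked model.
base_rmse = [4.0, 5.0, 6.0, 5.5]
stack_rmse = [3.0, 3.0, 3.0, 3.5]
t_stat, dof = paired_t_statistic(base_rmse, stack_rmse)
```

A positive statistic here means the base model's errors are larger on average; pairing removes run-to-run variation that affects both models alike, which an unpaired test would count as noise.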
To comprehensively evaluate the model performance, this study further compared the proposed stacking model with the ARIMA model (a classical time-series statistical model). The results demonstrate that:
- (1) In terms of the RMSE, the stacking model significantly outperforms the ARIMA model (p < 0.0001), with an average improvement of 13.6043;
- (2) In terms of the MAE, the stacking model also significantly outperforms the ARIMA model (p < 0.0001), with an average improvement of 5.8531.
To further optimize the model performance and mitigate overfitting, this study first tuned the parameters of each base model in the first layer using grid search, with the RMSE as the primary evaluation metric for identifying the optimal parameter combinations. Subsequently, 5-fold cross-validation was adopted to evaluate the performance of both the optimized base models and the final stacking model, making the evaluation results more reliable. As shown in Table 7, the proposed model consistently outperformed all base models in terms of RMSE even after cross-validation.
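Grid search scored by cross-validated RMSE can be sketched with scikit-learn as follows. The parameter grid, the synthetic data, and the use of shuffled folds are assumptions made for this example; the grids and fold construction actually used for each base model are not listed here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))                       # synthetic predictors
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.2, size=150)

# Illustrative grid only; real grids are model-specific.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",   # scikit-learn maximises scores
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

best_params = search.best_params_
best_cv_rmse = -search.best_score_    # flip the sign back to an RMSE
```

For data ordered in time, shuffled folds let a model train on months that follow the ones it is tested on; a forward-chaining splitter such as scikit-learn's `TimeSeriesSplit` is a common alternative when that is a concern.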
The results indicate that the proposed stacking model achieves better approximation performance than the benchmark models in predicting the expressway traffic accident rate. This is likely because the stacking model combines the advantages of both neural network and decision tree methods. These findings support the feasibility of the proposed model.
4.3. The Importance of Each Feature
To gain a deeper insight into the contribution of each input factor to the stacking model’s predictions and to identify the key influencing factors, this study employed the SHAP value analysis method for feature importance evaluation. As shown in Figure 9, the analysis reveals that PT (the proportion of tunnels) is the most critical factor in predicting the expressway traffic accident rate, with its SHAP value being substantially higher than those of the other independent variables. In contrast, the feature importance of ATW (the average width of tunnels) is the lowest. It is therefore reasonable to conclude that the proportion of tunnels has a significant impact on the accident rate, while the average width of tunnels has the least impact. Consequently, relevant stakeholders can develop targeted strategies focused on such key factors (e.g., the proportion of tunnels) to further enhance road safety.
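The game-theoretic quantity that SHAP approximates can be computed exactly for a small model, which helps in reading attribution plots of this kind. The sketch below is a toy illustration and not the SHAP library or the analysis pipeline of this study: the function names, the three-feature model, and the all-zero baseline are assumptions.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input.

    Features absent from a coalition take their baseline value. The cost
    grows factorially with the number of features, so this suits toy cases.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]              # add feature i to the coalition
            value = predict(current)
            phi[i] += value - previous     # marginal contribution of i
            previous = value
    return [p / len(orders) for p in phi]


def toy_model(features):
    """Hypothetical model with one interaction term between a and c."""
    a, b, c = features
    return 3.0 * a + 1.0 * b + 2.0 * a * c


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions always sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is shared equally between the two features involved. Global importance rankings are usually obtained by averaging the absolute attributions over many samples.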
5. Conclusions and Discussion
A prediction model for the expressway accident rate was developed using readily accessible data and evaluated against benchmark models. To demonstrate this process, data from thirteen sections of an expressway in Southern China were utilized. The collected data included crash records, traffic flow statistics, and road characteristic information, all of which were easily obtainable and practical for real-world applications. A two-layer stacking expressway accident rate prediction model was proposed, combining neural network and tree-based models through the stacking method. The two-layer design enables the model to effectively integrate a wide range of explanatory variables derived from multi-source data. The first layer (level 1 models) is composed of three neural network models (RNN, LSTM, and GRU) and three decision tree models (GBRT, XGBoost, and RF). The second layer (level 2 model) is the linear regression model. The level 1 models were also used as benchmark methods for comparison with the proposed model. The comparative analyses suggest that, in general, the proposed model outperformed the benchmark methods in the accident rate prediction task, achieving lower RMSE and MAE values.
The stability of the model was validated across different training and validation sets. Hyperparameter optimization was conducted via grid search, and 5-fold cross-validation was employed to mitigate overfitting. Additionally, the SHAP method was employed to analyze feature importance, providing insights into the factors that most strongly influence the model’s predictions. The results indicate that PT is the most critical factor in expressway accident rate modeling, followed by BWV, while ATW has the least impact. These findings are valuable for identifying the primary factors affecting expressway traffic accident rates and can offer theoretical support for developing targeted policies to reduce accident rates. Ultimately, such measures can help minimize casualties and property losses, enhancing overall road safety.
Although the proposed stacking model demonstrates significant potential for predicting expressway accident rates, this study has several limitations that need to be addressed:
- (a) Robustness and Generalization Ability: Although the performance of the proposed stacking model is superior, its robustness and generalization ability have not been thoroughly investigated, which may restrict its broader application.
- (b) Feature Selection: While the SHAP method was used to analyze feature importance, the selection of explanatory variables was based on easily available data. Incorporating additional relevant features, such as weather conditions, driver behavior, or real-time traffic dynamics, could enhance the model’s predictive accuracy.
- (c) Temporal and Spatial Variability: The model may not fully capture temporal variations (e.g., seasonal changes) or spatial heterogeneity (e.g., differences in road design or traffic patterns) that could influence accident rates. Incorporating time-series analysis or spatial modeling techniques might address this limitation.
Addressing these limitations in future studies could further enhance the model’s performance, applicability, and practical utility for improving expressway safety. In addition, future work should explore ways to reduce the model’s training time while verifying that its predictive performance is maintained.