Predicting and Analyzing Road Traffic Injury Severity Using Boosting-Based Ensemble Learning Models with SHAPley Additive exPlanations

Road traffic accidents are one of the world’s most serious problems, as they result in numerous fatalities and injuries, as well as economic losses each year. Assessing the factors that contribute to the severity of road traffic injuries has proven to be insightful. The findings may contribute to a better understanding of and potential mitigation of the risk of serious injuries associated with crashes. While ensemble learning approaches are capable of establishing complex and non-linear relationships between input risk variables and outcomes for the purpose of injury severity prediction and classification, most of them share a critical limitation: their “black-box” nature. To develop interpretable predictive models for road traffic injury severity, this paper proposes four boosting-based ensemble learning models, namely a novel Natural Gradient Boosting, Adaptive Gradient Boosting, Categorical Gradient Boosting, and Light Gradient Boosting Machine, and uses a recently developed SHapley Additive exPlanations analysis to rank the risk variables and explain the optimal model. Among four models, LightGBM achieved the highest classification accuracy (73.63%), precision (72.61%), and recall (70.09%), F1-scores (70.81%), and AUC (0.71) when tested on 2015–2019 Pakistan’s National Highway N-5 (Peshawar to Rahim Yar Khan Section) accident data. By incorporating the SHapley Additive exPlanations approach, we were able to interpret the model’s estimation results from both global and local perspectives. Following interpretation, it was determined that the Month_of_Year, Cause_of_Accident, Driver_Age and Collision_Type all played a significant role in the estimation process. According to the analysis, young drivers and pedestrians struck by a trailer have a higher risk of suffering fatal injuries. The combination of trailers and passenger vehicles, as well as driver at-fault, hitting pedestrians and rear-end collisions, significantly increases the risk of fatal injuries. This study suggests that combining LightGBM and SHAP has the potential to develop an interpretable model for predicting road traffic injury severity.


Introduction
Road traffic injuries (RTIs) are a leading cause of morbidity and death on a global scale. Understanding the sequence of events preceding a road traffic injury is critical for enhancing road safety and developing interventions that reduce the prevalence and severity of road trauma. By 2030, RTIs are expected to overtake cancer as the sixth leading cause of how the varying effects of risk variables affect the severity of road traffic injuries using advanced boosting-based ensemble models.
Traditional statistical approaches, while having rigor and well-defined functional forms, frequently necessitate strict model assumptions about the relationship between independent and dependent variables [26]. The model estimation outputs may be skewed or incorrect if the assumptions are violated. Similarly, combining modern data sources frequently produces complex data sets with a large number of dimensions that are difficult to model using traditional statistical techniques. Machine learning methods, on the other hand, are highly adaptable, require few or no prior assumptions about the data, and can handle missing values, noise, and outliers [27]. In the field of traffic safety research, machine learning techniques have received a lot of attention. It has become one of the most widely used and fascinating tools for estimating the severity of injuries in road traffic accidents [28][29][30][31][32][33][34][35][36][37][38][39][40][41]. Tree-based ensemble learning models have been used in a variety of fields, including landslide susceptibility mapping [42], software bug prediction [43], and bioinformatics [44].
While high predictive performance is desirable for machine learning models, it is more critical to understand the underlying mechanism by which accidents occur, especially for the purpose of developing accident prevention countermeasures. As a result, in a number of machine learning-based traffic safety studies, sensitivity analysis was used to quantify the effects of risk factors on the severity of injuries [45,46]. To summarise, two types of sensitivity analysis exist: partial dependence plots (PDP) and local sensitivity analysis (LSA) [47]. However, both the LSA and the PDP are concerned about the assumption of independence. In practice, the assumption of independent risk variables may fail to hold true, resulting in erroneous estimates of the safety effect.
To address this issue, several researchers used the Shapley Additive exPlanations (SHAP) approach [48], which estimates the variable' total, main, and interaction effects. Hu et al. [49], for example, utilised SHAP to assess the outcomes of a convolutional neural network (CNN) model in order to assess critical variables that affect the likelihood of crashes at road intersections. The SHAP analysis was used to identify critical factors in E. coli concentration data that could be used to predict beach water quality [50]. Parsa et al. [51] used SHAP to investigate the effect of various variables on the occurrence of highway accidents. Additionally, SHAP analysis is used to rank different variables in order to identify failure modes in reinforced concrete columns [52].
In terms of data, the database's quality and breakdown dictate the statistical and machine learning approaches used and the reliability of the findings, which can assist and guide policy makers in developing strategic plans and implementing initiatives to strengthen traffic safety and thus make the roads safer. It should be noted that any approach used is constrained by the database's limitations. Nonetheless, there is a disparity among countries, and even among local jurisdictions, in terms of data uniformity. For instance, Casado-Sanz et al. [53] conducted an international comparison of various risk variables compiled from international databases. Table 1 demonstrates an international comparison of risk variables in major databases and guidelines of various countries (USA, Australia, New Zealand, and databases, and EU directive requirements) and situates Pakistan within the context of road traffic safety. In this study, we aim to address the underappreciated issue of model interpretability for predicting injury severity by developing predictive and transparent booting-based ensemble learning models. To accomplish this, we used and compared four boosting-based ensemble learning models: Novel Natural Gradient Boosting (NGBoost) [54], Categorical Boosting (CatBoost) [55], Light Gradient Boosting Machine (LightGBM) [56], and Adaptive Boosting (AdaBoost) [57]. Additionally, we broadened our comparison by including commonly used machine learning models in studies of traffic safety. We then used SHAP to conduct variable importance analyses on the optimal model in order to predict the most important risk variables for injury severity, assess their robustness, and dissect their interactions with other risk variables. The SHAP yields significant findings that can be used to guide the implementation of cost-effective traffic safety countermeasures.
The rest of this paper is organized as follows: The following section summarizes the study's methodology and data, followed by descriptions of the modelling processes, namely NGBoost, CatBoost, LightGBM, and AdaBoost, as well as Shapley Additive exPlanations. Section 3 discusses the findings and the optimal model interpretation. Finally, Section 4 summarizes the findings and makes additional research recommendations.

Methodology
The entire operational framework proposed in this study is depicted in Figure 1. To begin with, the original accident dataset is preprocessed by eliminating redundant and erroneous information. The ensemble learning models based on boosting are initially developed on the basis of data partitioning into training, validation, and testing data sets. The training data set is used to build the classification model, the validation data set is used to fine-tune the hyperparameters of the models, and the test data set is used to evaluate the models' performance. Once the optimal model with the best performance is identified, SHAP approach is utilized to establish additive attributes that are then employed to determine the importance of variables for injury severity and the contributions of various risk variables to each severity mode. used to fine-tune the hyperparameters of the models, and the test data set is used to evaluate the models' performance. Once the optimal model with the best performance is identified, SHAP approach is utilized to establish additive attributes that are then employed to determine the importance of variables for injury severity and the contributions of various risk variables to each severity mode.

Study Route
The data for this study come from the records of traffic accidents that occurred on urban and rural segments of National Highway N-5 (Peshawar to Rahim Yar Khan) between 2015 and 2019. The N-5 is a two-lane divided two-way highway that connects Torkham in Pakistan's Khyber Pakhtunkhwa province to Karachi in Sindh province, connecting major cities along its alignment as illustrated in Figure 2. It is Pakistan's longest highway, measuring 1819 km (1310 miles) in length, and passes through three provinces: Sindh, Punjab, and Khyber Pakhtunkhwa. It is a major arterial connecting the north and south of the country and carries the most of the country's traffic. Most heavy vehicles use this route to transport freight from Karachi's seaport to upcountry cities. The maximum speed limit for light transport vehicles (LTV), which includes passenger cars, pickup trucks, and vans, ais 100 kilometres per hour. The maximum speed limit for heavy transport vehicles, which includes buses, trucks, and trailers, is 90 kilometres per hour.

Study Route
The data for this study come from the records of traffic accidents that occurred on urban and rural segments of National Highway N-5 (Peshawar to Rahim Yar Khan) between 2015 and 2019. The N-5 is a two-lane divided two-way highway that connects Torkham in Pakistan's Khyber Pakhtunkhwa province to Karachi in Sindh province, connecting major cities along its alignment as illustrated in Figure 2. It is Pakistan's longest highway, measuring 1819 km (1310 miles) in length, and passes through three provinces: Sindh, Punjab, and Khyber Pakhtunkhwa. It is a major arterial connecting the north and south of the country and carries the most of the country's traffic. Most heavy vehicles use this route to transport freight from Karachi's seaport to upcountry cities. The maximum speed limit for light transport vehicles (LTV), which includes passenger cars, pickup trucks, and vans, ais 100 km per hour. The maximum speed limit for heavy transport vehicles, which includes buses, trucks, and trailers, is 90 km per hour.

Data Description
Pakistan's National Highways and Motorway Police (NH & MP) are tasked with ensuring the safety and security of the country's highways and motorways. It also keeps track of road traffic crashes by filling out a crash investigation form at the crash scene as soon as one occurs. The NH & MP utilizes a four-page crash form called "Microcomputer Accident Analysis Proforma" (MAAP) for the purpose of recording crash minutiae. The MAAP tracks each accident on National Highway N-5 using a set of twenty-two distinct variables pertaining to the environment, the crash, the roadway, the vehicle, and the drivers. Similarly, MAAP depicts injury severity levels according to four categories: fatal in-

Data Description
Pakistan's National Highways and Motorway Police (NH & MP) are tasked with ensuring the safety and security of the country's highways and motorways. It also keeps track of road traffic crashes by filling out a crash investigation form at the crash scene as soon as one occurs. The NH & MP utilizes a four-page crash form called "Microcomputer Accident Analysis Proforma" (MAAP) for the purpose of recording crash minutiae. The MAAP tracks each accident on National Highway N-5 using a set of twenty-two distinct variables pertaining to the environment, the crash, the roadway, the vehicle, and the drivers. Similarly, MAAP depicts injury severity levels according to four categories: fatal injury, major injury, minor injury, and no injury (property damage only (PDO)). In our study, we classified PDO, minor, and major injuries as "non-fatal injuries", and the rest as "fatal injuries" as part of a binary classification problem. Table 2 below summarizes these variables with their description and occurrence frequencies.

Boosting-Based Ensemble Learning Classification Models for Injury Severity
In this research, we have employed four advanced boosting-based ensemble learning models, that is, novel Natural Gradient Boosting (NGBoost), Categorical Boosting (Cat-Boost), Light Gradient Boosting Machine (LightGBM) and Adaptive Boosting (AdaBoost) to accurately predict injury severity using various risk variables. Below, we explain these four boosting-based ensemble models.

Natural Gradient Boosting (NGBoost)
NGBoost is a supervised natural gradient descent (NGD)-based boosting algorithm that was recently developed [54]. It can be used for classification as well as probabilistic regression [58]. It is composed of three fundamental components: a base learner, a parametric probability distribution, and a scoring rule. The conceptual framework of NGBoost is illustrated in Figure 3.

Boosting-Based Ensemble Learning Classification Models for Injury Severity
In this research, we have employed four advanced boosting-based ense models, that is, novel Natural Gradient Boosting (NGBoost), Categorical B Boost), Light Gradient Boosting Machine (LightGBM) and Adaptive Boostin to accurately predict injury severity using various risk variables. Below, we four boosting-based ensemble models.

Natural Gradient Boosting (NGBoost)
NGBoost is a supervised natural gradient descent (NGD)-based boost that was recently developed [54]. It can be used for classification as well a regression [58]. It is composed of three fundamental components: a base le metric probability distribution, and a scoring rule. The conceptual framewor is illustrated in Figure 3. In Figure 3, χ inp denotes the given input risk variables, e k is the base learner (Decision Tree), k is the boosting iteration, τ is the predicted target, and θ denotes the target distribution parameters. The NGBoost model creates a conditional probability distribution function P θ τ χ inp of each predicted output in the range of 0 to 1. The higher the value of the aforementioned function, the more likely it is to correctly predict the data class, and vice versa. The framework made use of boosting to construct a series of decision trees (DTs) with minimal loss during model training. In other words, each DT learned from the preceding tree and improved the next tree in order to enhance the performance of the model. Additionally, it should be noted that this is the first paper, to our knowledge, that discusses the use of the NGBoost model for predicting the severity of road traffic injuries.

Categorical Boosting (CatBoost)
The CatBoost is a novel version of the gradient-boosting decision tree algorithm. It has strong learning capabilities for dealing with extremely nonlinear data [55]. The Gradient Boosting Decision Tree (GBDT) algorithm combines numerous decision trees to create a high-accuracy model, and the progress can be expressed as Equation (1): where x denotes the variable vector, T denotes the number of trees, θ t (t = 1, 2, . . . , T) denotes a learned parameter, and f t (x, θ t ) denotes the learned decision trees that are learned. Given a set of training samples D ={(x k , y k )} n 1 , where n denotes the total number of samples in training data, x k (k = 1, 2, . . . , n) is the sample data points, and y k indicates true sample label. In order to learn the model in Equation (1), the Equation (2) objective function is required to be minimized: where y k denotes the predicted sample label, L represent the loss function, which is actually the difference between y k and y k , and Ω represent the regular function, which is employed to penalize the complexity of f t . It is defined as Equation (3): where α denotes a penalty parameter, which controls the number of leaf nodes q, β represent the regularization parameter, and ω represents the weight coefficient. Let ζ represent the loss function negative gradient, then the objective function is minimized in the direction of ζ is given by Equation (4): By and large, conventional GBDT algorithms exhibit prediction offset, impairing the model's generalization ability. CatBoost was proposed with two major enhancements to address this shortcoming [59]: (1) the ordered boosting technique was used to obtain an unbiased gradient estimation and minimise the prediction offset; (2) the oblivious tree technique has been used to improve the model's reliability and prediction speed. Additionally, to improve the strategy's handling of categorical variables, the greedy targetbased statistics strategy was strengthened by incorporating prior terms into the CatBoost algorithm, which is composed of three major steps: (1) all sample datasets are ordered randomly; (2) similar samples are chosen and the average label for similar samples is calculated; and (3) the variables in each sample are digitised by adding the prior term and its associated weight coefficients. The strategy for optimising greedy target-based statistics is expressed in Equation (5): where x i k denotes the kth sample's ith category variable, x i k denotes the corresponding variable, P denotes the increased prior value, and a denotes the weight coefficient a > 0. Prior values can be used to effectively reduce noise introduced by low-frequency variables and avoid the over-fitting phenomenon.

Light Gradient Boosting Machine (LightGBM)
LightGBM is a variation of the gradient boosting decision tree (GBDT) created by Microsoft and Peking University researchers [56] to overcome the efficiency and scalability challenges associated with GBDT when used to solve problems with high-dimensional input variables and huge data sets. Because it is based on decision tree algorithms, it splits the tree leaf-wise, whereas other boosting methods divide the tree level-wise. When growing on the same leaf, the leaf-wise method reduces loss more than the level-wise strategy, resulting in much higher classification accuracy than any of the known boosting algorithms. Tree development is shown in Figure 4 for both LightGBM and severe gradient boosting. LightGBM employs two novel techniques: exclusive feature bundling (EFB) and gradient-based one-side sampling (GOSS).
where i k x denotes the th k sample's th i category variable, i k x denotes the correspon ing variable, P denotes the increased prior value, and a denotes the weight coeffici a > 0 . Prior values can be used to effectively reduce noise introduced by low-frequen variables and avoid the over-fitting phenomenon.

Light Gradient Boosting Machine (LightGBM)
LightGBM is a variation of the gradient boosting decision tree (GBDT) created Microsoft and Peking University researchers [56] to overcome the efficiency and scalab ity challenges associated with GBDT when used to solve problems with high-dimensio input variables and huge data sets. Because it is based on decision tree algorithms, it sp the tree leaf-wise, whereas other boosting methods divide the tree level-wise. When gro ing on the same leaf, the leaf-wise method reduces loss more than the level-wise strate resulting in much higher classification accuracy than any of the known boosting al rithms. Tree development is shown in Figure 4 for both LightGBM and severe gradi boosting. LightGBM employs two novel techniques: exclusive feature bundling (EFB) a gradient-based one-side sampling (GOSS). Suppose a dataset with {x 1 , x 2 , . . . , x n } and {y 1 , y 2 , . . . , y n } as independent and dependent variables, respectively. The sum of the outputs of a set of decision tree models (e t (x)) is the predicted value of GBDT (Γ(x)), as given by Equation (6): The number of decision trees is T. Finding an approximation functionΓ that minimizes the loss function Φ(y, Γ(x)) is required to fit a GBDT model, as shown in Equation (7): Rather than using information gain to split the internal nodes of each tree as the traditional GBDT does, LightGBM splits the internal nodes using the GOSS method. Specifically, samples with higher absolute gradient values (i.e., top α × 100%) are chosen as subset A, whereas samples with lower absolute gradient values are chosen at random to form subset B (i.e., β × 100%). As a result, the samples are divided based on the variance gain V j (d) on A ∪ B as Equation (8): where, In each iteration, g i illustrates the negative gradient of the loss function for light GBM outputs. Apart from GOSS sampling, LightGBM employs the EFB technique to expedite the training process without compromising classification accuracy. Numerous applications incorporate mutually exclusive features of high-dimensional and sparse input (i.e., these features cannot be non-zero). EFB has the ability to combine these features into a single feature bundle. This algorithm can be used to make feature histograms from both these feature bundles and individual features.
In summary, LightGBM is a novel approach to ensemble learning that utilises GOSS to partition internal nodes based on variance gain and EFB to reduce the dimension of input variables. Additionally, as a decision tree-based model, LightGBM is resistant to multicollinearity. As a result, it is easy to add correlated independent variables to the LightGBM model.

Adaptive Boosting (AdaBoost)
The AdaBoost algorithm's basic idea is to make classifications by combining a series of weak learners using a weighted majority vote (or sum). It updates the data on a regular basis, taking into account the misclassifications of previous weak learners. This algorithm's basic steps can be summarised as follows [57].
Given a set of training data D ={(x k , y k )} n 1 , where n denotes the total number of data points in training data, x k (k = 1, 2, . . . , n) is the kth data point, and y k indicates true data point label. A strong classifier E(x) is generated by the following steps: • Initially, all the data points are assigned some equal weights W i.e.,: • For iterations j = 1, 2, . . . , J, Train weak learner using distribution W j , Get weak hypothesis e j and Select e j with low weighted error i.e., ξ j = Pr k∼Wj e j (x k ) j y k . Choose • Update, k = 1, 2, . . . , n to obtain W j+1 k as Equation (9): where Z j is a normalization factor (chosen such that W j+1 k will be a distribution).

•
After learning process and weight optimization, the final strong classifier is obtained (Equation (10)), which is based on a linear combination of all the weak classifiers:

Hyperparameter Tuning
Tuning hyperparameters is an essential step in the training of boosting-based ensemble learning models. This helps to improve the model's generalization performance, avoid overfitting, and reduce the complexity of the models. In this study, the classification accuracy of the model is used as a performance metric to perform tuning of hyperparameters. There are several hyperparameter tuning techniques, such as GridSearch, Random Search, and Bayesian Optimization [60]. The GridSearch and Random Search CV techniques iteratively traverse the entire space of available hyperparameter values without regard to previous results and thus become time-consuming for large parameter spaces.
In contrast, when deciding which hyperparameter set to evaluate next, Bayesian optimization considers previous evaluations. By making informed parameter combinations, it is able to focus on those areas of the parameter space that it believes will yield the most promising validation scores. This method usually requires fewer iterations to arrive at the best set of hyperparameter values [61,62]. The HyperOpt, which is an open-source Python library for Bayesian Optimization, was used in this research to tune the hyperparameters of four boosting-based ensemble learning models.

Model Evaluation
The confusion matrix (contingency table) and its associated parameters (classification accuracy, precision, recall, and F1-score), as well as the area under the receiver operating characteristic (ROC) curve, are used in this study to evaluate and select the optimal ensemble model. The confusion matrix enables us to assess the performance of an ensemble model. In the confusion matrix, the predicted class instances are represented by a column, the actual class instances by a row, and the accurate prediction by the diagonal [63]. Figure 5 illustrates the confusion matrix, which can be used to calculate a variety of metrics.
ters. There are several hyperparameter tuning techniques, such as GridSearch, Rando Search, and Bayesian Optimization [60]. The GridSearch and Random Search CV tec niques iteratively traverse the entire space of available hyperparameter values witho regard to previous results and thus become time-consuming for large parameter space In contrast, when deciding which hyperparameter set to evaluate next, Bayesian o timization considers previous evaluations. By making informed parameter combinatio it is able to focus on those areas of the parameter space that it believes will yield the m promising validation scores. This method usually requires fewer iterations to arrive at t best set of hyperparameter values [61,62]. The HyperOpt, which is an open-source Pyth library for Bayesian Optimization, was used in this research to tune the hyperparamet of four boosting-based ensemble learning models.

Model Evaluation
The confusion matrix (contingency table) and its associated parameters (classificati accuracy, precision, recall, and F1-score), as well as the area under the receiver operati characteristic (ROC) curve, are used in this study to evaluate and select the optimal ense ble model. The confusion matrix enables us to assess the performance of an ensemble mod In the confusion matrix, the predicted class instances are represented by a column, the actu class instances by a row, and the accurate prediction by the diagonal [63]. Figure 5 illustra the confusion matrix, which can be used to calculate a variety of metrics.  Equations (11)-(15) show the expression for calculating performance metrics. The expression for calculating TPR, on the other hand, is the same as for recall in Equation (13): Recall or TPR = TP TP + FN (13) To gain a better understanding of how well a model performed, an area under the receiver operating characteristic (AUC-ROC) curve is also used [64]. Although the confusion matrix provides a detailed analysis of the model's performance based on predictions for each category, the AUC is sometimes preferred because it presents a comparison based on a single value. The TPR is plotted against the FPT to create the ROC. AUC values range from 0 (completely incorrect) to 1 (perfectly correct).

Model Interpretation
Lundberg and Lee [48] proposed SHAP (SHapley Additive exPlanations) as a new method for interpreting machine learning black-box models. This is a framework for estimating each variable's contribution. It is based on local explanations and game theory [65].
If x i,j is the jth variable of the ith sample, SHAP i is the model's predicted contribution value for the ith sample, and SHAP base is the model's baseline, i.e., the average value of the target variable for all samples in the model, then the SHAP value satisfies Equation (16): where shap(x i,j ) is the SHAP value of x i,j and m is the dimension of the variables, and shap(x i,j can be thought of as the value of the jth variable in the ith sample's contribution to the final predicted value y. When shap(x i,j is greater than zero, it means that this variable has a positive impact on the predicted value. If shap(x i,j ) < 0 is true, the variable has a negative impact on the predicted value. The SHAP value has the advantage of reflecting not only the influence of the variable, but also their positive and negative effects. The SHAP value possesses three properties: local accuracy, absence, and consistency.
• Local accuracy is defined as the sum of all variable contributions equal to the model's output. This property fulfills a fundamental requirement of the additive explanatory framework, given by Equation (17): where shap j represents the SHAP value corresponding to variable j, Λ j is an indicator function that takes the value 1 when the variable appears, and 0 otherwise. • Absence i.e., the contribution of missing variable is zero. Not that a characteristic value in the structured data is empty, but that a characteristic is not observed in the sample, that is, i.e., if the model structure alters but the degree to which a particular variable influences the output increases or remains constant, the contribution of that variable to the whole will also enhance or remain constant.
Finally, the model mapped by the sigmoid function with SHAP i predicts the probability of the ith sample is given by Equation (18): where p i < 0.5, then the sample classification is equal to 0, otherwise it is 1. Based on this SHAP framework, this paper explains optimal models among the four boosting-based ensemble models in terms of injury severity.

Results
In this study, four boosting-based ensemble learning models are used to predict road traffic accident injury severity. Python 3.6, a free and open-source programming language, was used for this purpose. To train the models, we used the Python packages NGBoost, CatBoost, LightGBM, and AdaBoost, as well as the Scikit-learn library. To begin with, National Highway N-5 accident data were obtained from the National Highway and Motorway Police. For model development, we partitioned the National Highway N-5 accident data into the following three subsets: 20% of the data was used for testing and withheld for model performance evaluation, while the remaining 80% was divided into training and validation subsets. The training data set was used to develop the models, and the validation data set was used to estimate the model's performance while tuning its hyperparameters. Table 3 shows the optimal hyperparameters for four boosting-based models, obtained via Bayesian Optimization.

Performance Assessment
Four performance indicators are used to assess the developed boosting-based ensemble models: classification accuracy, recall, precision, and F1-score. These indicators are deduced from a confusion matrix generated for binary classification problems ( Figure 6). Additionally, we focused on the area under the receiver operating characteristic curve as an evaluation metric, as it is an aggregate measure. We conducted a performance comparison of the estimation results for testing dataset using the confusion matrix (Table 4) and the area under the receiver operating characteristic curve. The results of boosting-based ensemble learning models are also compared with existing models (Artificial Neural Network [66] and Logit Model [67]) that have been widely used in the analysis of road traffic injury severity.
Classification accuracy, precision, recall, F1-score, and AUC-ROC values for the Cat-Boost model were 67.34%, 67.32%, 55.87%, 51.28%, and 0.683, respectively, using the testing dataset. The classification accuracy, precision, recall, F1-score, and AUC-ROC values for the LightGBM model were 73.63%, 72.61%, 70.09%, 70.81%, and 0.713, respectively, using the testing dataset. The novel NGBoost model, which was used for the first time in the prediction and classification of road traffic injury severity, achieved classification accuracy (61.37%), precision (61.33%), recall (54.71%), F1-score (49.04%), and AUC-ROC (0.588), respectively. Similarly, AdaBoost model achieved classification accuracy (66.87%), precision (61.21%), recall (59.17%), F1-score (60.11%), and AUC-ROC (0.619), respectively. Comparing the performance of the models based on the confusion matrix with their corresponding matrices in Table 4 and AUC-ROC revealed that the Light GBM method provided the optimal estimation performance. deduced from a confusion matrix generated for binary classification problems ( Figure 6). Additionally, we focused on the area under the receiver operating characteristic curve as an evaluation metric, as it is an aggregate measure. We conducted a performance comparison of the estimation results for testing dataset using the confusion matrix (Table 4) and the area under the receiver operating characteristic curve. The results of boosting-based ensemble learning models are also compared with existing models (Artificial Neural Network [66] and Logit Model [67]) that have been widely used in the analysis of road traffic injury severity.    Numerous methods exist for determining the relative importance of variables. Certain tree-based models, such as random forest, assign variable importance automatically, and the assignment scheme has effects on the results [68]. However, variable significance is not synonymous with variable contribution. Variable importance indicates which variable has the most significant impact on a model's performance. Beyond identifying influential variables, the variable contributions provide an intuitive explanation for the considered output (fatal or non-fatal injuries). Two analyses are conducted in this study to determine the importance of each variable and its contribution to model estimation: the model-based variable importance approach and the SHAP summary plot approach.
The significance of each variable in the LightGBM model was first assessed. The Light-GBM model's trees are built following the steps outlined in Section 2.3.3. Let us consider the variable set x 1 , x 2 , . . . , x m . The variable importance score F imp Sc i is then calculated as Equation (19) based on the number of times each ith variable is used to split the training data across all trees: where W i represents the weight of each variable, and x i represents the ith variable in variable set. Figure 7a shows the best variable importance score of the National Highway N-5 accidents data variable used by the LightGBM algorithm. However, the result does not indicate how much each variable contributes to the likelihood of fatal injuries. Comparing the performance of the models based on the confusion matrix with their corresponding matrices in Table 4 and AUC-ROC revealed that the Light GBM method provided the optimal estimation performance.

Global Variable Interpretation
Numerous methods exist for determining the relative importance of variables. Certain tree-based models, such as random forest, assign variable importance automatically, and the assignment scheme has effects on the results [68]. However, variable significance is not synonymous with variable contribution. Variable importance indicates which variable has the most significant impact on a model's performance. Beyond identifying influential variables, the variable contributions provide an intuitive explanation for the considered output (fatal or non-fatal injuries). Two analyses are conducted in this study to determine the importance of each variable and its contribution to model estimation: the model-based variable importance approach and the SHAP summary plot approach.
The significance of each variable in the LightGBM model was first assessed. The LightGBM model's trees are built following the steps outlined in Section 2.3.3. Let us consider the variable set 1 2 m x ,x ,...,x . The variable importance score imp i F Sc is then calculated as Equation (19) based on the number of times each th i variable is used to split the training data across all trees: where i W represents the weight of each variable, and i x represents the th i variable in variable set. Figure 7a shows the best variable importance score of the National Highway N-5 accidents data variable used by the LightGBM algorithm. However, the result does not indicate how much each variable contributes to the likelihood of fatal injuries. The SHAP summary assessment was implemented to allow for a more comprehensive model analysis. From the SHAP summary plot, we estimated a quantitative value that aggregated the Shapely values and expressed the model contributions of variables (see Figure 7b). The vertical axis is used to arrange the input variables in ascending order of their influence, beginning with the most influential variable. The horizontal axis represents the SHAP value, and the color scale indicates the variable's significance level; blue to pinkish-red indicates low to high significance. The greater the number of data points within a given range of SHAP values, the stronger the correlation between the input variable and injury severity. The SHAP summary assessment was implemented to allow for a more comprehensive model analysis. From the SHAP summary plot, we estimated a quantitative value that aggregated the Shapely values and expressed the model contributions of variables (see Figure 7b). The vertical axis is used to arrange the input variables in ascending order of their influence, beginning with the most influential variable. The horizontal axis represents the SHAP value, and the color scale indicates the variable's significance level; blue to pinkish-red indicates low to high significance. The greater the number of data points within a given range of SHAP values, the stronger the correlation between the input variable and injury severity.
As per SHAP analysis, the most important input variable in determining the injury severity is Month_of_Year, which is ranked first in the Summary Plots, followed by Cause_of_Accident, Driver_Age and Collision_Type. The significant variables are almost consistent with the variable importance obtained by the LightGBM-based model importance approach. However, the order is different due to a difference in the evaluation procedure. It is revealed that data from older drivers have negative SHAP values, which become more negative as the driver's age increases. It confirms that drivers under the age of 30 are more likely to encounter fatal injuries. The results are consistent with past literature [69][70][71]. When comparing the months of winter to the months of spring and summer (March, June, and July), the variable Month_of_Year reveals that the majority of fatal injuries happen in the spring and summer (March, June, and July). The Cause_of_Accident, such as road surface distress, low visibility, and wrong side overtaking, are less likely to cause fatal injuries compared to bicycle rider at-fault and driver at-fault. The head-on collisions, hitting animal on road result in non-fatal injuries with SHAP values near zero or negative compared to rear-end collision, rollover and hitting obstacle on road that results in fatal injuries. Similarly, trailers and passenger cars are more prone to serious accidents.

Local Variable Interpretation
The SHAP explanatory force plot is shown in Figure 8 for two cases chosen at random from the actual estimate findings. On the graph, the base value (0.5181) represents the mean of the optimal LightGBM model estimations for the training data set. If the model's output value is greater than the base value, fatal injuries occur (i.e., a lower value than the base value). When the output of the model is greater than the base value, non-fatal injuries occur (that is, a higher value than the base value). The blue arrows indicate the magnitude of the effect of input variables that increase the likelihood of fatal injuries (increased possibility of death in accidents). The effect of input variables on the occurrence of fatal injuries is indicated by red arrows (increased possibility of no death in accidents). The area occupied by variables in each arrow indicates the extent to which that variable has an effect. Consider two examples of estimated values from the training dataset that the LightGBM model correctly classified. One instance has an estimated value that is greater than the base value, while another has an estimated value that is less than the base value. Figure 8a depicts a situation in which the estimated value (0.020) is less than the base value (0.5181). In this instance, the three variables that could potentially contribute to the fatal injuries are Driver_Age and Collision_Type. In particular, the significantly higher Driver_Age = 7 index compared to other low indexes reflects the higher expected occurrence of non-fatal injuries. Similarly, the Collision_Type = 7 index (Hit Pedestrian) reflects a higher probability of fatal injuries. Similarly, the SHAP explanatory force plot is shown in Figure 8b for another randomly selected and correctly classified instance in which the estimated value (0.800) is greater than the base value (0.5181). It demonstrates that both the month of the year and the cause of the accident played a role in predicting non-fatal injuries. To be more specific, the Month_of_Year = 9 index (September) and the Cause_of_Accident = 3 index (Tire burst) reflect a lower number of fatal injuries. However, Collision_Type = 4 index (rollover) are more prone to the occurrence of fatal injuries. sibility of death in accidents). The effect of input variables on the occurrence of fatal injuries is indicated by red arrows (increased possibility of no death in accidents). The area occupied by variables in each arrow indicates the extent to which that variable has an effect. Consider two examples of estimated values from the training dataset that the LightGBM model correctly classified. One instance has an estimated value that is greater than the base value, while another has an estimated value that is less than the base value.

Variable Interaction Analysis
The SHAP interaction plots were analysed to determine how the input variables used to estimate the optimal LightGBM model interacted in terms of their contributions (see Figure 9). The interaction analysis examined trends in Month_of_Year, Cause_of_Accident, Driver_Age, Collision_Type and Vehicle_Type. Nonetheless, other variable interactions could be evaluated as well. In Figure 9a, the scatter plots of the red and blue points illustrate the fluctuation in the Collision_Type and Collision_Type SHAP values. The SHAP value for Collision_Type is greater when Collision_Type = Hit Pedestrian, and the pattern is consistent throughout the year. This is self-evident, as the majority of locations along National Highway N-5 lack designated pedestrian crossings, increasing the likelihood that pedestrians will be struck by vehicles when they recklessly cross the road. In the majority of cases, jaywalking results in deadly injuries.
Young drivers are more likely to sustain fatal injuries than older drivers. However, drivers over the age of 50 are more likely to be involved in non-fatal crashes (Figure 9b). Young drivers are more prone to be distracted by various activities, such as ringing cell phones, texting, using a GPS device and listening to music, and these activities are connected with decreased driving performance, increased driver reaction times, decreased vehicle control, and an increased chance of a crash. Additionally, older drivers have more experience and are more likely to have non-fatal injuries. There is a possibility that young drivers are more prone to fatal injuries because they engage in risky behaviour, such as over speeding and not obeying traffic signals and signs. Sometimes, they cannot perceive the potential hazards in the surroundings and choose incorrect behaviour.
The scatter plot of Vehicle_Type and the SHAP value for Vehicle_Type are shown in Figure 9c. It demonstrates that the majority of Vehicle_Type = Trailers and Passenger Cars are more likely to result in fatal injuries as a result of rear-end crashes, rollovers, or hitting pedestrians. The trailer's involvement in the majority of fatal injuries could be explained by the fact that National Highway N-5 is used for long-haul journeys by large trailers transporting goods from Karachi's seaport to the countryside. Speed limit violations, braking failure, overloading, and driver distraction due to inattention or dozing can all contribute to trailer rollover accidents. Surprisingly, as indicated by the lower SHAP value, motorcycle accidents due to skidding result in non-fatal injuries. Surprisingly, the age group 26-30 is more likely to suffer fatal injuries while driving heavy vehicles compared to driving light vehicles (Figure 9d), consistent with a previous study [72].  The SHAP value for Month_of_Year = July is relatively high in the graph, showing that the majority of serious accidents with fatal injuries for heavier vehicles, such as trucks, trailers, and tractors, occur during July (Figure 9e). This result is also in line with past literature [73,74], which also highlighted that summer is more prone to crashes due to more vehicles traveling. However, in Pakistan this could be because National Highway The SHAP value for Month_of_Year = July is relatively high in the graph, showing that the majority of serious accidents with fatal injuries for heavier vehicles, such as trucks, trailers, and tractors, occur during July (Figure 9e). This result is also in line with past literature [73,74], which also highlighted that summer is more prone to crashes due to more vehicles traveling. However, in Pakistan this could be because National Highway N-5 is a major arterial that traverses through the Punjab region of Pakistan, which is known for its foggy winters, heavy monsoon precipitation, and sweltering summers. July has a maximum temperature of about 45 • C and sometimes reaches 50 • C. Increased tire pressures may result in a tire burst during July's severe heat. The monsoon season's torrential downpours in July may contribute to severe injuries due to wet pavements that reduce a heavy vehicle's traction and manoeuvrability. Similarly, with Month_of_Year = September, the SHAP value is relatively low. Additionally, one can also note that truck, trailer, and tractor accidents on National Highway N-5 are nearly evenly distributed throughout the year in compared other Vehicle_Type.
Additionally, Figure 9f provides valuable insight into the involvement of Vehicle_Type in accidents and the causes of those accidents. When Bicycle Rider at-Fault and Driver at-Fault (such as wrong side overtake, traffic signal violation etc.) of trailers and passenger cars are more likely to be involved in fatal collisions The reason for this might be that bicycle riders in Pakistan do not wear helmets and often ride in the wrong direction due to the absence of bicycle-specific traffic control regulations.

Conclusions
In this research, boosting-based ensemble learning models were used in conjunction with SHAP analysis to identify critical risk variables and quantify their effects on the severity of road traffic accident injuries using the National Highway N-5 accident dataset (2015-2019). Accurate models typically capture a complete picture of the underlying relationship between injury severity and risk factors. In this research, LightGBM outperforms CatBoost, novel NGBoost, AdaBoost as well as widely used ANN and logit model in terms of predictive classification accuracy, precision, recall, F1-Score, and AUC. The newly introduced LightGBM model provides another viable option for modelling injury severity.
The lack of transparency and interpretability of machine learning models is frequently chastised. This has an effect on the widespread acceptance of models for modelling in traffic and transportation safety, although these models are more flexible and frequently more accurate than traditional predictive methods. To address the interpretability issue associated with LightGBM, the SHAP analysis was used to estimate its output in order to identify significant risk variables and quantify their impact on injury severity. The SHAP analysis's results can be used to rank a risk variable's overall significance. More importantly, they can be used to investigate both the individual effects of risk variables (e.g., how specific impacts may fluctuate in response to changes in the risk variable's value) and their interaction effects.
The analysis revealed that the top four important variables that are more likely to affect injury severity are Month_of_Year, Cause_of_Accident, Driver_Age and Collision_Type. However, Type_of_Day and Work_Zone variables have the most negligible impact on the injury severity. Young drivers are more likely than older drivers to sustain fatal injuries. Improved young driver education programs, stricter driving requirements, stricter driving tests, and equipping parents with sufficient knowledge to train and educate drivers could all contribute to a reduction in the overall fatal crash rate. The month of July accounts for a large number of fatal injuries, while September accounts for a greater number of non-fatal injuries. The rainy season in the Punjab region, in addition to the high temperatures of July, may account for tire bursts and skidding, probably due to a reduction in tire-to-wet pavement friction. Extra care in driving is required during the rainy days of the monsoon (July to September) with low vehicle speed and care for runoff. It has also been observed that weekend crashes are more likely to cause fatal injuries. This finding may be due to the high traffic volume on National Highway N-5. People mostly travel to their nearby hometowns or villages on weekends, causing heavy congestion. In addition, people also travel to Sunday markets on weekends. In some cases, heavy vehicles, such as trailers, are involved in fatal crashes when drivers are drowsy at the wheel or, in some cases, crashes with bicycles. There is a need to provide bicycle-specific traffic regulations on National Highway N-5. Moreover, helmet usage should be made compulsory.
The strategy outlined in this paper can be used to conduct a large-scale analysis of traffic accidents and serves as a useful tool for policymakers and researchers engaged in traffic safety. This paper discussed only injury severity as determined by boosting ensemble in conjunction with SHAP analysis. Additional research could be conducted by using various other machine learning techniques with various additional risk factors. Additionally, future research could expand the dataset to increase its accuracy, and data-balancing techniques, such as SMOTE and ADAYSN, could be introduced to treat imbalanced data.