Feature Selection for Accident Severity Modeling: A WCFR-Based Analysis on the U.S. Accidents Dataset

Alobidan, Yasser Abdulrahim; Li, Alice; Soh, Ben; Almudayni, Ziyad

doi:10.3390/electronics15061308


¹ Department of Computer Science and Information Technology, La Trobe University, Melbourne 3086, Australia

² La Trobe Business School, La Trobe University, Melbourne 3086, Australia

³ College of Computer Science and Engineering, University of Ha’il, Ha’il 55476, Saudi Arabia

* Author to whom correspondence should be addressed.

Electronics 2026, 15(6), 1308; https://doi.org/10.3390/electronics15061308

Submission received: 31 December 2025 / Revised: 12 March 2026 / Accepted: 13 March 2026 / Published: 20 March 2026

(This article belongs to the Special Issue AI Technologies and Smart City)


Abstract

Traffic accidents are among the leading causes of injury worldwide, highlighting the urgent need to better understand the factors that contribute to accident occurrence and severity in order to improve road safety and reduce injuries and fatalities. This study analyzes the U.S. Accidents dataset, comprising data collected from 2016 to 2023, to identify the key determinants of accident severity and to evaluate feature-selection techniques for predictive modeling. To this end, several feature-selection methods are examined, including L1-regularized logistic regression, minimum redundancy maximum relevance (mRMR), conditional mutual information maximization (CMIM), ReliefF, and tree-based importance measures; these are compared with the Weighted Conditional Feature Relevance (WCFR) method. The selected feature subsets are then evaluated using three machine learning models: logistic regression, random forest, and XGBoost. Experimental results show that WCFR consistently outperforms the competing methods, achieving higher validation accuracy (up to approximately 0.84) and Macro-F1 scores (up to approximately 0.55), while using fewer features and maintaining model interpretability. These results indicate that WCFR is particularly effective for accident severity modeling and highlight its potential as a robust feature selection strategy for large-scale transportation safety analytics and future severity prediction studies.

Keywords:

traffic accident severity; feature selection; WCFR; mutual information; machine learning; road safety analytics

1. Introduction

Traffic accidents represent a major public health and economic challenge worldwide. In the United States alone, millions of crashes occur annually, resulting in thousands of fatalities across different age groups and significant economic losses. The growing volume of recorded accidents provides an opportunity to analyze accident patterns and identify key factors influencing accident severity. However, turning the resulting large, high-dimensional datasets into usable knowledge requires effective feature selection and robust predictive modeling techniques. Identifying the most relevant accident factors enhances model interpretability while also improving computational efficiency.

Traffic characteristics in the United States vary substantially across regions due to differences in climate conditions, roadway infrastructure, traffic density, and driving behavior. For example, northern states experience higher accident rates associated with snow and ice, whereas southern and coastal states are more frequently affected by rainfall-, fog-, and hurricane-related hazards. In addition, urban areas are typically characterized by congestion, complex road networks, and dense intersections, while rural regions often involve higher-speed roadways, which tend to result in more severe crash outcomes. These regional differences introduce significant heterogeneity, as well as feature redundancy and noise, into large-scale accident datasets.

In this context, the U.S. Accidents dataset is particularly well suited for evaluating advanced feature-selection methods such as WCFR, as it integrates multi-regional, multi-modal, and high-dimensional traffic information collected from diverse geographic areas and environmental conditions. This diversity creates a challenging environment for assessing WCFR’s ability to balance feature relevance and redundancy under varying data distributions. Accordingly, this study addresses two key research questions: identifying the most influential factors affecting accident severity and evaluating the stability and robustness of feature-selection methods under heterogeneous real-world conditions.

It is noteworthy that the US study [1], summarized in Table 1, includes information on accident severity measured using the Maximum Abbreviated Injury Scale (MAIS), which quantifies the most severe injury sustained by a person in a crash. In addition, accidents classified as Property-Damage-Only (PDO) are included, representing crashes that result in damage to vehicles or property but do not cause any reported bodily injury. Together, these measures provide a comprehensive view of both injury-related and non-injury accident outcomes, enabling more detailed analysis of factors influencing accident severity. Figure 1 reflects the economic cost borne by different parts of society.

Table 1. Incidence summary 2019: total police-reported and unreported injuries [1].

Severity	Police-Reported	Not Police-Reported	Total	Percent Unreported
Vehicles
Injury Vehicles	2,424,916	1,136,998	3,561,914	31.9%
PDO Vehicles	7,773,120	11,515,019	19,288,139	59.7%
Total Vehicles	10,198,036	12,652,017	22,850,053	55.4%
People in Injury Crashes
MAIS0	2,349,202	2,176,700	4,525,901	48.1%
MAIS1	2,561,954	1,313,311	3,875,265	33.9%
MAIS2	310,848	116,271	427,119	27.2%
MAIS3	132,222	8945	141,167	6.3%
MAIS4	19,285	0	19,285	0.0%
MAIS5	7187	0	7187	0.0%
Fatal	36,500	0	36,500	0.0%
Total	5,417,198	3,615,226	9,032,424	40.0%
Total Injuries	3,067,996	1,438,526	4,506,523	31.9%
Crashes
PDO	4,390,169	6,503,550	10,893,719	59.7%
Injury	2,223,724	1,042,663	3,266,387	31.9%
Fatal	33,621	0	33,621	0.0%
Total Crashes	6,647,514	7,546,213	14,193,727	53.2%

Figure 1. Components of comprehensive costs [1].

Selecting the most informative accident factors, a process known as feature selection, has been widely recognized as a crucial step in data-driven accident analysis because it isolates the variables that contribute most to high-severity crashes. Traditional feature-selection methods generally fall into three categories: filter-based ranking, wrapper-based search, and embedded regularization approaches, each with its own advantages and limitations. Filter methods, such as mutual information and ReliefF, are computationally efficient and well suited for high-dimensional data. In contrast, wrapper and embedded methods, including L1-regularized regression and tree-based models, can capture nonlinear relationships between variables but may suffer from instability and strong dependence on the chosen learning algorithm. These challenges motivate the need for feature selection strategies that can simultaneously and statistically evaluate both the relevance and redundancy of accident-related factors.

To address this need, conditional mutual information (CMI)-based techniques have proven to be particularly effective, as they quantify the amount of information a candidate feature contributes beyond what is already captured by the selected features. Building on this idea, ref. [2] proposed the Weighted Conditional Feature Relevance (WCFR) method, which introduces a data-driven weighting mechanism to dynamically balance feature relevance and redundancy within the feature set. This is achieved by adjusting the contribution of each term according to the variability (dispersion) of the feature. As a result, WCFR aims to produce more stable and more discriminative feature subsets, especially in complex, noisy, and highly correlated domains.

Although a growing body of research has applied machine learning techniques to accident severity prediction, many existing studies primarily focus on improving predictive accuracy rather than systematically analyzing the role of feature selection in identifying the most influential accident factors. Moreover, conventional feature-selection methods often rely on simple relevance measures or embedded model coefficients, which may struggle to balance feature relevance and redundancy in high-dimensional and heterogeneous accident datasets. As a result, the stability and interpretability of the selected feature subsets remain limited, particularly when dealing with complex real-world traffic data.

In this study, we investigate the effectiveness of advanced feature-selection techniques for accident severity modeling using the large-scale U.S. Accidents dataset. In particular, we evaluate the WCFR method, which extends traditional conditional mutual information-based feature selection by incorporating a dispersion-aware weighting mechanism that balances feature relevance and redundancy. By systematically comparing WCFR with several widely used feature selection approaches, this study aims to determine whether dispersion-aware information-theoretic feature selection can improve both predictive performance and feature stability in accident severity prediction tasks.

The selected features from these methods will be evaluated through multiple machine learning algorithms, such as Logistic Regression, Random Forest, and XGBoost, in order to validate their predictive power and generalization ability. Beyond comparative performance, this paper aims to provide interpretive insights into the major environmental, infrastructural, and temporal factors influencing accident outcomes in the United States.

The contributions of this paper are:

1. We present an end-to-end analytical framework for feature selection and model validation on a large-scale traffic accident dataset.
2. We implement and assess the Weighted Conditional Feature Relevance (WCFR) method in a real-world safety context, highlighting its stability and interpretability advantages.
3. We establish a methodological and empirical baseline for future research aimed at enhancing WCFR and extending its applicability to broader transportation safety analytics.

By integrating rigorous data analysis with feature selection evaluation, this work offers both theoretical and practical contributions to intelligent transportation systems, machine learning, and public safety research. The primary objective of this study is not to maximize predictive accuracy but to identify the main factors and relationships associated with accident severity, thereby supporting engineering understanding and data-driven safety decision-making.

2. Materials and Methods

The dataset used in this study is the U.S. Accidents dataset [3], which is considered one of the most comprehensive open datasets for traffic accident analysis in the United States. It contains over 7.7 million accident records collected from multiple sources, including insurance companies, traffic sensors, law enforcement reports, and weather APIs. Each record corresponds to a single accident event and includes more than 40 features describing location, time, weather conditions, road surface, visibility, and accident severity. The most important attribute in this evaluation is severity, which serves as the target (label) variable. It is encoded on an ordinal scale ranging from 1 (lowest) to 4 (highest), representing the extent of the accident’s impact on traffic flow and road safety.

The dataset also exhibits pronounced temporal patterns across multiple years. Accident frequency varies from year to year and shows clear seasonal trends related to weather conditions, daylight duration, and travel demand, with higher occurrence typically observed during winter and peak travel periods. These temporal variations introduce non-stationarity into the data, which can affect model training and evaluation. However, by leveraging multi-year data spanning diverse temporal conditions, the adopted dataset enables a thorough assessment of model robustness and the ability of feature-selection methods, including WCFR, to generalize across different temporal distributions.

2.1. Adopted Phases

A structured experimental pipeline was adopted to ensure a fair and unbiased evaluation of the models. To prevent data leakage, the dataset was first divided into training and testing sets using an 80/20 split, where 80% of the data were used for model training and 20% were reserved exclusively for final evaluation. The test set was kept completely unseen during all model development stages, including feature selection and hyperparameter tuning.

Prior to model training, the raw accident dataset was preprocessed to ensure data quality and consistency. Records containing missing values in the target variable (severity) were removed. For the input features, variables with a high proportion of missing values were discarded. The remaining missing entries were handled using simple imputation strategies: numerical features were imputed using the median value, while categorical features were imputed using the most frequent category (mode). To avoid information leakage, these imputation parameters were computed only from the training data and then applied to the test data.

Within the training set, a 5-fold cross-validation procedure was employed for model development and model selection. In each fold, the training set of the fold was used to perform preprocessing and feature selection, while the remaining fold served as the validation subset. This strategy ensures that validation data remain unseen during feature selection and model training. Feature selection was performed exclusively on the training data within the cross-validation process. In each fold of the 5-fold cross-validation, the feature-selection methods were applied only to the training set of the fold, and the resulting ranked feature subset was used to train the classification models. The validation set of the fold was then used to evaluate model performance using the same selected features. Consequently, the feature subset was recomputed independently within each cross-validation fold, ensuring that no information from the validation data influenced the feature ranking. After model selection was completed, the final models were trained using the training data and evaluated once on the held-out test set.

Overall, this pipeline guarantees that preprocessing, feature selection, and model training are strictly confined to the training data, while the test set is used only once for the final unbiased performance assessment.
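The fold-wise protocol described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The names `kfold_indices`, `run_cv`, and `variance_selector` are hypothetical, the interleaved fold assignment is an assumption (the text does not state whether folds were shuffled or stratified), and the variance-based selector is only a stand-in for any of the ranking methods.

```python
def kfold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) pairs; fold membership is interleaved (assumption)."""
    folds = [list(range(start, n_samples, n_folds)) for start in range(n_folds)]
    for held_out in range(n_folds):
        val_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, val_idx


def variance_selector(X_train, y_train, k=1):
    """Hypothetical stand-in for a feature ranker: keep the k highest-variance columns."""
    n_cols = len(X_train[0])

    def variance(j):
        col = [row[j] for row in X_train]
        mu = sum(col) / len(col)
        return sum((v - mu) ** 2 for v in col) / len(col)

    return sorted(range(n_cols), key=variance, reverse=True)[:k]


def run_cv(X, y, selector, n_folds=5):
    """Recompute the feature subset inside every fold, using training rows only."""
    subsets = []
    for train_idx, _val_idx in kfold_indices(len(X), n_folds):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # The validation rows never reach the selector, so they cannot leak.
        subsets.append(selector(X_train, y_train))
    return subsets


# Toy data: column 0 alternates 0/1, column 1 spans 0..9.
X = [[i % 2, i] for i in range(10)]
y = [i % 2 for i in range(10)]
fold_subsets = run_cv(X, y, variance_selector)
print(fold_subsets)
```

The per-fold subsets returned here are the objects on which a stability measure can later be computed.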

Categorical variables are further encoded using one-hot encoding to avoid introducing artificial ordinal relationships. Binary features are mapped directly to {0, 1}, and numerical features are normalized using z-score standardization, defined as:

x^{'} = (x - μ) / σ

(1)

where μ and σ denote the mean and standard deviation computed from the training data. These preprocessing steps are applied consistently across all feature-selection methods and classification models, ensuring a fair and unbiased comparison of the experimental results.
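As an illustration of training-only preprocessing, the sketch below fits the median (for imputation) and μ, σ (for Equation (1)) on training values and reuses them on test values. The function names are hypothetical, missing values are represented by `None`, and the population standard deviation is assumed because the text does not specify which estimator was used.

```python
from statistics import mean, median, pstdev


def fit_numeric_preprocessor(train_values):
    """Learn the imputation median and the z-score parameters from training data only."""
    observed = [v for v in train_values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in train_values]
    mu = mean(filled)
    sigma = pstdev(filled) or 1.0  # guard against constant columns (assumption)
    return fill, mu, sigma


def transform_numeric(values, fill, mu, sigma):
    """Impute with the training median, then apply x' = (x - mu) / sigma."""
    return [((fill if v is None else v) - mu) / sigma for v in values]


train = [10.0, 12.0, None, 14.0, 12.0]
test = [None, 16.0]

fill, mu, sigma = fit_numeric_preprocessor(train)
train_z = transform_numeric(train, fill, mu, sigma)
test_z = transform_numeric(test, fill, mu, sigma)  # no statistic is refit on test data
print(fill, mu, round(sigma, 4), [round(v, 3) for v in test_z])
```

Because the test values are transformed with the training parameters, a missing test entry maps to the training median and therefore to a standardized value of zero in this toy example.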

The next phase involves feature engineering and feature selection, which are applied to construct an informative input feature set using L1-regularized logistic regression (LR), mRMR, CMIM, ReliefF, tree-based importance, and the proposed WCFR method. Subsequently, the selected feature subsets are used to train multiple classification models (LR, Random Forest (RF), and XGBoost) in order to validate the quality of the features extracted by each selection method. This experimental design allows us to determine which feature selection approach is most effective by evaluating and comparing the models using standard performance metrics, primarily Macro-F1 score and Accuracy.

Finally, the classification models were trained on the selected features from the training set and evaluated on the selected features from the test set. This procedure ensured a stable experimental pipeline and avoided information leakage between training and testing (Figure 2).

2.2. Features Selection Adopted Approaches

Let the dataset be $D = \{(x_{i}, y_{i})\}_{i = 1}^{n}$, where $x_{i} \in \mathbb{R}^{d}$ and $y_{i} \in \{1, \dots, C\}$. Feature selection aims to choose a subset $S \subset \{1, \dots, d\}$ that maximizes predictive information while reducing redundancy and noise [4].

2.2.1. L1-Regularized Logistic Regression (Embedded)

L1-regularized logistic regression performs embedded feature selection by shrinking many coefficients to zero [5,6]. The logistic model is:

p (y = 1 ∣ x) = σ (β_{0} + x^{⊤} β), σ (z) = \frac{1}{1 + e^{- z}}

(2)

and the L1-regularized objective is:

min_{β_{0}, β} - \sum_{i = 1}^{n} [y_{i} log p_{i} + (1 - y_{i}) log (1 - p_{i})] + λ {∥ β ∥}_{1}

(3)

where $p_{i} = σ (β_{0} + x_{i}^{⊤} β)$ and $λ > 0$ controls sparsity. The selected features are those with $β_{j} \neq 0$.

2.2.2. mRMR (Filter) [7,8]

mRMR selects features with maximum relevance to the class label and minimum redundancy with already selected features. The incremental score equation:

J_{mRMR} (f; S) = I (f; y) - \frac{1}{| S |} \sum_{s \in S} I (f; s)

(4)

Furthermore, the selection rule:

f^{*} = arg max_{f \notin S} J_{mRMR} (f; S)

(5)
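To make Equations (4) and (5) concrete, the following sketch runs greedy mRMR on a toy discrete dataset using plug-in (count-based) mutual information estimates. It is an illustrative assumption rather than the implementation used in this study: the function names and the toy features `a`, `a_dup`, and `b` are hypothetical, continuous features would need to be discretized first, and the first feature is chosen by relevance alone because the redundancy term is undefined for an empty S.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Plug-in estimate of I(a; b) in nats for two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[v])) for (x, v), c in pab.items())


def mrmr_select(features, y, k):
    """Greedy mRMR over a dict mapping feature name -> discrete column."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:

        def score(f):
            relevance = mutual_information(features[f], y)
            if not selected:
                return relevance  # empty S: relevance only (assumption)
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


y = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],      # informative about y
    "a_dup": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of "a"
    "b": [0, 1, 0, 1, 0, 1, 0, 1],      # uninformative, nearly independent of "a"
}
chosen = mrmr_select(features, y, 2)
print(chosen)
```

In this toy case the duplicate is rejected at the second step because its redundancy with `a` outweighs its relevance, which is the behavior the redundancy term in Equation (4) is meant to produce.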

2.2.3. CMIM (Filter) [7,9]

CMIM selects features that remain informative about y even after conditioning on any selected feature. The CMIM selection criterion:

f^{*} = arg max_{f \notin S} min_{s \in S} I (f; y ∣ s)

(6)

In addition, the conditional mutual information:

I (f; y ∣ s) = \sum_{f, y, s} p (f, y, s) log (\frac{p (f, y ∣ s)}{p (f ∣ s) p (y ∣ s)})

(7)

Here, the “max–min” strategy encourages complementary features, reducing redundancy beyond pairwise relevance.
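The conditional mutual information in Equation (7) and the max-min rule in Equation (6) can be illustrated with a small XOR-style example. This is a sketch under stated assumptions: counts are used as plug-in probability estimates, the function names and toy features are hypothetical, and `cmim_next` assumes that at least one feature has already been selected.

```python
from collections import Counter
from math import log


def conditional_mutual_information(f, y, s):
    """Plug-in estimate of I(f; y | s) in nats for discrete sequences."""
    n = len(f)
    c_fys = Counter(zip(f, y, s))
    c_fs = Counter(zip(f, s))
    c_ys = Counter(zip(y, s))
    c_s = Counter(s)
    # p(f,y|s) / (p(f|s) p(y|s)) simplifies to c_fys * c_s / (c_fs * c_ys).
    return sum(
        (c / n) * log(c * c_s[sv] / (c_fs[(fv, sv)] * c_ys[(yv, sv)]))
        for (fv, yv, sv), c in c_fys.items()
    )


def cmim_next(features, y, selected):
    """Return the candidate maximizing the minimum I(f; y | s) over selected s."""
    candidates = [f for f in features if f not in selected]
    return max(
        candidates,
        key=lambda f: min(
            conditional_mutual_information(features[f], y, features[s])
            for s in selected
        ),
    )


features = {
    "f1": [0, 0, 1, 1],
    "f1_copy": [0, 0, 1, 1],
    "f2": [0, 1, 0, 1],
}
y = [0, 1, 1, 0]  # y = f1 XOR f2

gain_f2 = conditional_mutual_information(features["f2"], y, features["f1"])
gain_copy = conditional_mutual_information(features["f1_copy"], y, features["f1"])
next_feature = cmim_next(features, y, ["f1"])
print(round(gain_f2, 4), round(gain_copy, 4), next_feature)
```

Once `f1` is known, its copy adds no information about y, whereas `f2` becomes fully informative; this is the complementarity that conditioning on selected features is designed to reward.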

2.2.4. ReliefF

ReliefF [10,11] scores features by comparing each sample with its nearest neighbors from the same class (hits) and from different classes (misses). Let $H_{k} (i)$ be the k nearest hits of sample i and $M_{k}^{c} (i)$ be its k nearest misses from class $c \neq y_{i}$. For the multi-class case (severity is multi-class in this study), the weight update is:

W_{j} \leftarrow W_{j} - \frac{1}{m k} \sum_{h \in H_{k} (i)} Δ_{j} (x_{i}, h) + \frac{1}{m k} \sum_{c \neq y_{i}} \frac{P (c)}{1 - P (y_{i})} \sum_{m \in M_{k}^{c} (i)} Δ_{j} (x_{i}, m)

(8)

After repeating this process for m sampled instances, the features are ranked according to $W_{j}$, where larger values indicate higher importance.
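The update in Equation (8) can be sketched in plain Python as shown below. Several choices are assumptions made for illustration because the text does not specify them: the per-feature difference is taken as the range-normalized absolute difference, neighbors are found with the sum of these differences as the distance, and every instance is visited once (so the number of sampled instances equals the dataset size) instead of drawing a random sample.

```python
def relieff(X, y, k=1):
    """Multi-class ReliefF weights for numeric features (illustrative sketch)."""
    n, d = len(X), len(X[0])
    ranges = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        # Assumed difference function: range-normalized absolute difference.
        return abs(a[j] - b[j]) / ranges[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    prior = {c: y.count(c) / n for c in set(y)}
    w = [0.0] * d
    m = n  # every instance is used once (assumption)
    for i in range(n):

        def nearest(cls):
            pool = [X[t] for t in range(n) if t != i and y[t] == cls]
            return sorted(pool, key=lambda r: dist(X[i], r))[:k]

        hits = nearest(y[i])
        misses = {c: nearest(c) for c in prior if c != y[i]}
        for j in range(d):
            w[j] -= sum(diff(j, X[i], h) for h in hits) / (m * k)
            for c, near in misses.items():
                scale = prior[c] / (1.0 - prior[y[i]])
                w[j] += scale * sum(diff(j, X[i], r) for r in near) / (m * k)
    return w


# Toy data: column 0 separates the three classes, column 1 is noise.
X = [(0.0, 0.3), (0.1, 0.9), (0.5, 0.2), (0.6, 0.8), (1.0, 0.25), (0.9, 0.85)]
y = [0, 0, 1, 1, 2, 2]
weights = relieff(X, y, k=1)
print([round(v, 3) for v in weights])
```

The class-separating column receives a positive weight because it differs more across misses than across hits, while the noise column is penalized.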

2.2.5. Tree-Based Importance from Gradient Boosting

Tree ensembles provide embedded importance measures based on how frequently and how effectively a feature is used to split the data [12,13,14]. In gradient boosting (and XGBoost), a common measure is total gain attributed to splits on feature j. The gain-based importance equation:

Imp (j) = \sum_{t = 1}^{T} \sum_{\substack{\text{splits } s \in t \\ feature (s) = j}} Gain (s)

(9)

Features are ranked by Imp(j), and the top-k are retained. This approach captures nonlinearity and interactions naturally [12,13], and the “Gain/Cover/Frequency” definitions are documented in XGBoost tooling [14].
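Equation (9) amounts to summing split gains per feature across all trees. The sketch below shows that aggregation on a hypothetical representation in which each tree is a list of (feature, gain) pairs; the feature names and gain values are invented for illustration and this is not the XGBoost API.

```python
from collections import defaultdict


def gain_importance(trees):
    """Sum Gain(s) over every split s of every tree, grouped by split feature."""
    importance = defaultdict(float)
    for splits in trees:
        for feature, gain in splits:
            importance[feature] += gain
    return dict(importance)


def top_k(importance, k):
    """Rank features by total gain and keep the k largest."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Hypothetical two-tree ensemble.
trees = [
    [("Distance", 4.0), ("Hour", 1.5)],
    [("Distance", 2.0), ("Humidity", 0.5)],
]
importance = gain_importance(trees)
print(importance, top_k(importance, 2))
```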

2.2.6. Weighted Conditional Feature Relevance

WCFR is the primary method used in this paper. It extends conditional-information feature selection by adding an adaptive weight that penalizes highly dispersed, and therefore potentially noisy, variables while balancing relevance and redundancy. Let S be the currently selected set. Relevance is defined using the conditional mutual information $I (f; y ∣ S)$ [2], and redundancy using the average mutual information with the already selected features. The WCFR score is:

J_{WCFR} (f; S) = α_{f} I (f; y ∣ S) - (1 - α_{f}) \frac{1}{| S |} \sum_{s \in S} I (f; s)

(10)

A practical dispersion-aware choice for $α_{f}$, which penalizes large standard deviations, is the adaptive weight (dispersion penalty):

α_{f} = \frac{1}{1 + \frac{sd (f)}{{median}_{j} (sd (f_{j}))}}

(11)

where $sd (f)$ is the standard deviation of feature f, and the denominator uses the median standard deviation across all features for robust normalization.

The adaptive weight $α_{f}$ is designed to automatically regulate the trade-off between feature relevance and redundancy without introducing additional tunable hyperparameters. By normalizing the standard deviation of each feature using the median dispersion, the weighting scheme remains robust and stable in the presence of extreme values and skewed feature distributions. Features with higher dispersion, which are more likely to be noisy, receive lower weights, thereby reducing their influence during the selection process, while more stable features are emphasized.
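A short worked example of the adaptive weight in Equation (11) is given below. The function name and the three toy columns are hypothetical, and the population standard deviation is assumed; the sketch also assumes the median standard deviation is non-zero.

```python
from statistics import median, pstdev


def dispersion_weights(columns):
    """alpha_f = 1 / (1 + sd(f) / median_j sd(f_j)) for each named column."""
    sds = {name: pstdev(values) for name, values in columns.items()}
    med = median(sds.values())  # assumed non-zero
    return {name: 1.0 / (1.0 + sd / med) for name, sd in sds.items()}


columns = {
    "low": [0, 1, 0, 1],   # sd = 0.5
    "mid": [0, 2, 0, 2],   # sd = 1.0 (the median dispersion)
    "high": [0, 8, 0, 8],  # sd = 4.0
}
alpha = dispersion_weights(columns)
print({name: round(a, 3) for name, a in alpha.items()})
```

A feature whose dispersion equals the median receives a weight of 0.5, so relevance and redundancy are balanced equally in Equation (10); more dispersed features shift weight toward the redundancy penalty.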

To examine the sensitivity of WCFR to this adaptive weighting mechanism, we conducted a parameter sensitivity analysis by evaluating the selected feature subsets and classification performance under varying dispersion conditions. The empirical results demonstrate that WCFR maintains stable performance across a wide range of feature variances. This confirms that the proposed weighting formulation provides a data-driven balance between relevance and redundancy and effectively eliminates the need for manual parameter tuning when dealing with heterogeneous features.

The final feature subset is obtained by greedy forward selection:

S \leftarrow S \cup \{arg max_{f \notin S} J_{WCFR} (f; S)\}

(12)

until the desired number of features k is reached.
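The greedy procedure in Equation (12) can be sketched end to end for discrete features as follows. This is an illustrative reading of Equations (10) and (12), not the reference implementation from [2]: mutual information is estimated from counts, conditioning on S is done by treating the selected features jointly as one variable, the redundancy term is set to zero while S is empty, and the weights in `alpha` are supplied by hand instead of being computed from Equation (11). All names are hypothetical.

```python
from collections import Counter
from math import log


def mi(a, b):
    """Plug-in estimate of I(a; b) in nats."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[v])) for (x, v), c in pab.items())


def cmi(f, y, s):
    """Plug-in estimate of I(f; y | s) in nats."""
    n = len(f)
    fys, fs, ys, ss = Counter(zip(f, y, s)), Counter(zip(f, s)), Counter(zip(y, s)), Counter(s)
    return sum(
        (c / n) * log(c * ss[sv] / (fs[(fv, sv)] * ys[(yv, sv)]))
        for (fv, yv, sv), c in fys.items()
    )


def wcfr_select(features, y, alpha, k):
    """Greedy forward selection maximizing the WCFR score at each step."""
    selected = []
    while len(selected) < min(k, len(features)):
        # Joint state of the selected features; a constant when S is empty,
        # in which case I(f; y | S) reduces to I(f; y).
        joint = list(zip(*(features[s] for s in selected))) if selected else [()] * len(y)

        def score(f):
            relevance = cmi(features[f], y, joint)
            redundancy = (
                sum(mi(features[f], features[s]) for s in selected) / len(selected)
                if selected
                else 0.0  # empty S: no redundancy term (assumption)
            )
            return alpha[f] * relevance - (1.0 - alpha[f]) * redundancy

        best = max((f for f in features if f not in selected), key=score)
        selected.append(best)
    return selected


features = {
    "f1": [0, 0, 1, 1, 0, 0, 1, 1],
    "f1_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "f2": [0, 1, 0, 1, 0, 1, 0, 1],
}
y = [0, 0, 0, 1, 0, 0, 0, 1]  # y = f1 AND f2
alpha = {"f1": 0.6, "f1_copy": 0.5, "f2": 0.5}  # illustrative weights
subset = wcfr_select(features, y, alpha, 2)
print(subset)
```

Joint conditioning becomes unreliable with count-based estimates as S grows, since the number of joint states increases quickly; practical implementations typically need an approximation at that point.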

2.3. Stability Evaluation of Feature Selection

In addition to predictive performance, the stability of the selected feature subsets was evaluated to assess the robustness of each feature-selection method under sampling variation. Stability is a critical property of feature selection algorithms, as reliable methods should consistently identify similar sets of informative features when trained on different subsets of the data. This is particularly relevant in high-dimensional traffic accident datasets, where small variations in the training data may lead to different feature rankings.

To quantify stability, a 5-fold cross-validation procedure was employed. In each fold, the dataset was partitioned into training and validation subsets, and feature selection was performed strictly on the training set of the fold to prevent information leakage. For each iteration, the Top-10 ranked features produced by each feature-selection method were recorded. This process resulted in five feature subsets per method, corresponding to the five cross-validation folds.

The similarity between feature subsets obtained from different folds was measured using the Jaccard index, which quantifies the overlap between two sets of selected features. For two feature subsets A and B, the Jaccard similarity is defined as:

J (A, B) = \frac{| A \cap B |}{| A \cup B |}

(13)

where $| A \cap B |$ represents the number of features common to both subsets and $| A \cup B |$ represents the total number of unique features across the two subsets. The Jaccard index ranges from 0 to 1, where 0 indicates no overlap between feature subsets and 1 indicates identical selections.

For each feature-selection method, the pairwise Jaccard similarity was computed across all combinations of folds, and the average similarity value was reported as the overall stability score. A higher average Jaccard score indicates that a method consistently selects similar features across different training subsets, reflecting greater robustness and reliability in the feature selection process.
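The stability score described above can be computed as in the following sketch, which averages Equation (13) over all pairs of folds. The function names and the three toy subsets are hypothetical; with five folds there would be ten pairs.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def stability(fold_subsets):
    """Average pairwise Jaccard similarity across all fold combinations."""
    pairs = list(combinations(fold_subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature subsets selected in three folds.
folds = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "B", "C"}]
score = stability(folds)
print(round(score, 4))
```

Here two of the three pairs share two of four distinct features (0.5) and one pair is identical (1.0), giving an average of 2/3.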

By evaluating stability alongside classification performance, this study provides a more comprehensive assessment of the proposed and baseline feature-selection methods, ensuring that the selected features are not only predictive but also consistently identified across different data samples.

2.4. The Adopted Classification Algorithms

The selected features from each feature-selection method are evaluated using three supervised learning algorithms: LR, RF, and XGBoost [13,15].

2.4.1. Logistic Regression

Logistic regression is used for multi-class classification with C classes. It uses a softmax model with probabilities defined as follows:

p (y = c ∣ x) = \frac{exp (β_{c 0} + x^{⊤} β_{c})}{\sum_{r = 1}^{C} exp (β_{r 0} + x^{⊤} β_{r})}

(14)

In addition, negative log-likelihood (with optional L2 penalty) is used:

min_{{β_{c}}} - \sum_{i = 1}^{n} log p (y_{i} ∣ x_{i}) + λ \sum_{c = 1}^{C} {∥ β_{c} ∥}_{2}^{2}

(15)
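For illustration, the softmax probabilities of Equation (14) can be evaluated as in the sketch below. The coefficients are invented for a three-class toy case, and subtracting the maximum score before exponentiation is a standard numerical-stability step that does not change the result.

```python
from math import exp


def softmax_probs(x, intercepts, coefs):
    """p(y = c | x) for each class c, given per-class intercepts and coefficient vectors."""
    scores = [b0 + sum(xj * bj for xj, bj in zip(x, b)) for b0, b in zip(intercepts, coefs)]
    top = max(scores)
    exps = [exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters for C = 3 classes and two features.
probs = softmax_probs(
    x=[1.0, 2.0],
    intercepts=[0.0, 0.0, 0.0],
    coefs=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
)
print([round(p, 4) for p in probs])
```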

2.4.2. Random Forest

Random forest is an ensemble of decision trees trained on bootstrap samples, with a random subset of features considered at each split [15]. For classification, the forest prediction is a majority vote:

\hat{y} (x) = arg max_{c \in {1, \dots, C}} \sum_{b = 1}^{B} 1 {h_{b} (x) = c}

(16)

where $h_{b} (x)$ is the class predicted by tree b and B is the number of trees. Random forests handle nonlinear relationships and interactions, are generally robust to outliers, and typically need limited preprocessing.
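The majority vote in Equation (16) reduces to counting the per-tree predictions, as in this minimal sketch. The function name and the five votes are hypothetical; ties are broken in favor of the class encountered first, which is an assumption.

```python
from collections import Counter


def forest_predict(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


votes = [2, 3, 2, 4, 2]  # severity classes predicted by B = 5 hypothetical trees
prediction = forest_predict(votes)
print(prediction)
```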

2.4.3. Extreme Gradient Boosting (XGBoost)

XGBoost is a regularized gradient-boosted tree model optimized for accuracy and speed [13]. The prediction is an additive ensemble of regression trees:

{\hat{y}}_{i} = \sum_{t = 1}^{T} f_{t} (x_{i}), f_{t} \in F

(17)

2.4.4. Evaluation Metrics

To evaluate the performance of the classification algorithms, we adopted Accuracy and Macro-F1 as the main evaluation metrics. These metrics are computed from the confusion matrix, which summarizes the numbers of correctly and incorrectly classified instances. They are widely used in optimization and classification tasks, particularly for comparing models under different feature selection strategies [16].

The adopted evaluation metrics are [17]:

1. Accuracy: Accuracy measures the overall proportion of correctly classified samples:

$Accuracy = \frac{1}{n} \sum_{i = 1}^{n} 1 (y_{i} = {\hat{y}}_{i})$

(18)

For binary classification (with true positives $TP$, true negatives $TN$, false positives $FP$, and false negatives $FN$), it can also be written as:

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$

(19)

It is simple and intuitive, but it can be misleading for imbalanced datasets because it may remain high even when minority classes are poorly predicted. The implementation and standard definition are consistent with common ML libraries [18].
2. Macro-F1 Score: The F1-score combines precision and recall using their harmonic mean. For each class $c \in \{1, \dots, C\}$, define:

${Precision}_{c} = \frac{TP_{c}}{TP_{c} + FP_{c}}$

(20)

${Recall}_{c} = \frac{TP_{c}}{TP_{c} + FN_{c}}$

(21)

$F1_{c} = \frac{2 \, {Precision}_{c} \, {Recall}_{c}}{{Precision}_{c} + {Recall}_{c}}$

(22)

Macro-F1 averages the F1-score equally across all classes (each class has the same weight, regardless of its frequency):

$\text{Macro-F1} = \frac{1}{C} \sum_{c = 1}^{C} F1_{c}$

(23)

Macro-F1 is especially appropriate when class imbalance exists, because it penalizes poor performance on minority classes more than Accuracy does. This definition matches standard software implementations of macro-averaged F1 [19].
In addition to Macro-F1, class-wise precision and recall were examined to better assess model behavior on minority severity levels. Precision reflects the reliability of predicted high-severity accidents, whereas recall measures the model’s ability to correctly identify severe accident cases from the feature vectors. Reporting these metrics provides deeper insight into predictive performance under class imbalance, particularly for high-severity accidents, which occur less frequently but are of greater safety significance.
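The contrast between Accuracy and Macro-F1 under class imbalance can be seen in a small worked example based on Equations (18) and (20) to (23). The sketch is written from the definitions above in plain Python; treating an undefined precision, recall, or F1 as zero is an assumption, and the labels are invented.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0  # zero when undefined (assumption)
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Imbalanced toy case: eight severity-2 and two severity-4 accidents,
# with a model that always predicts severity 2.
y_true = [2] * 8 + [4] * 2
y_pred = [2] * 10
acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(acc, round(mf1, 4))
```

The always-majority model reaches an accuracy of 0.8 but a Macro-F1 of about 0.44, because the minority class contributes an F1 of zero to the average.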

2.5. Experimental Setup and Computational Settings

To improve the reproducibility of the experiments, Table 2 summarizes the computational environment and the main model configurations used in this study.

Table 2. Experimental setup with computational values.

Category	Setting
Computational Platform	Google Colaboratory (Colab) Pro environment
Processor	GPU-based execution (NVIDIA Tesla T4) in high-RAM Colab runtime
Memory	Approximately 10–16 GB RAM available
Programming Language	Python 3
Libraries Used	Scikit-learn (v1.5.0), Extreme Gradient Boosting (XGBoost) (v2.0.3), NumPy (v1.26.4), Pandas (v2.2.2) [20]
Cross-Validation Strategy	5-fold cross-validation
Selected Feature Subset Size	Top-10 features (based on sensitivity analysis)
Logistic Regression	L2 regularization, solver = LBFGS, maximum iterations = 1000
Random Forest	Number of trees = 100, split criterion = Gini impurity
Extreme Gradient Boosting (XGBoost)	Number of estimators = 100, learning rate = 0.1, maximum tree depth = 6
ReliefF Parameter	Number of nearest neighbors (k) = 10
Evaluation Metrics	Accuracy and Macro-F1 score

3. Extracted Results

The results are organized into four steps, each representing a distinct phase of the analysis:

1. Baseline phase: apply the three algorithms to the original data without any feature engineering.
2. Feature selection phase: apply the adopted feature-selection methods and report the top 10 features extracted by each.
3. Classification phase: apply the three algorithms to the features extracted by each feature-selection method.
4. Comparison phase: determine which feature-selection method yields the best values.

3.1. Baseline Phase

Figure 3 shows the results obtained by applying the three adopted algorithms to the dataset without any feature-selection method; these results serve as the baseline. The bar chart compares Macro-F1 (how well the model performs across classes, treating each class equally) and Accuracy for the models. XGBoost performs best on both metrics (Macro-F1 0.476, Accuracy 0.779), LogReg ranks second (0.406, 0.651), and Random Forest shows the lowest performance (0.342, 0.458). The gap between Accuracy and Macro-F1 suggests that the task is imbalanced; accuracy may appear high even when minority classes are poorly predicted. Macro-F1 was initially lower because of class imbalance and was subsequently improved to the level reported here.

3.2. Feature Selection Phase

In this phase, all adopted feature-selection methods were applied, and the results are shown in Table 3, comparing the Top-10 selected features for each selection method: LogReg, MI, mRMR, ReliefF, tree-based XGBoost importance, and WCFR.

Each method ranks features according to its own scoring criterion [21]. To ensure consistency and a fair comparison across methods [22,23,24,25], the Top-10 features were selected for each method based solely on their ranking scores, without applying any explicit performance-based thresholds [26]. Filter-based methods (MI, mRMR, CMIM, ReliefF, and WCFR) ranked features using relevance–redundancy criteria, whereas embedded methods, such as logistic regression and tree-based/XGBoost models, relied on coefficient magnitudes and gain-based importance measures, respectively. We also evaluated the models using the Top-5 and Top-15 features derived from all adopted feature-selection methods. The Top-5 feature sets produced weak models, showing high loss on the testing data and low classification accuracy. Conversely, the Top-15 feature sets required substantially longer training and testing times without yielding meaningful improvements in classification performance. The Top-10 cutoff was therefore chosen as a practical compromise between retaining informative features and limiting redundancy and model complexity. Table 3 presents the resulting feature rankings for each method.

Among the compared methods, WCFR exhibits one of the highest importance values for its top-ranked feature and generally achieves higher scores across the ranking positions than the other methods. This makes it easier to visually compare which features each method prioritizes and how strongly each rank is associated with model performance.

3.3. Classification Algorithms Based on Features Selection

A closer examination of the class-wise results shows that high-severity accident categories, which constitute minority classes in the dataset, remain more challenging to predict than low-severity cases [27]. Models trained using WCFR-selected features exhibit improved recall for high-severity accidents compared with other feature-selection methods, indicating a stronger ability to detect severe outcomes. Although precision for these classes is generally lower due to class imbalance, the improved recall suggests that WCFR helps reduce false negatives for critical accident cases, which is especially important for safety-oriented applications.
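The class-wise view discussed above can be made concrete with a small sketch that computes precision and recall per class. The labels and predictions below are toy placeholders (2 for a majority severity level, 4 for a minority one), not results from this study, and `per_class_precision_recall` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def per_class_precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[int, Tuple[float, float]]:
    """Return {label: (precision, recall)} for every observed label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_count = Counter(y_true)   # actual instances per class
    pred_count = Counter(y_pred)   # predicted instances per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        result[label] = (precision, recall)
    return result


# Toy data: eight majority-class cases and two minority-class cases.
y_true = [2, 2, 2, 2, 2, 2, 2, 2, 4, 4]
y_pred = [2, 2, 2, 2, 2, 2, 4, 4, 4, 2]

metrics = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in metrics.items():
    print(f"class {label}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case the minority class has lower precision than recall, because a handful of false alarms from the much larger majority class is enough to outnumber its true positives.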

3.4. Sensitivity to Number of Selected Features

To justify the choice of k = 10, we evaluated classification performance using Top-5, Top-10, and Top-15 selected features. Figure 4 illustrates the comparison in terms of Macro-F1 for the Top-5 feature subsets across feature-selection methods and classifiers.

As shown in Table 4, restricting the feature subset to only five variables leads to a substantial reduction in classification performance across all models and feature-selection methods. The resulting Macro-F1 scores remain significantly lower than those obtained using larger feature subsets, indicating that very small feature sets fail to capture the complex relationships underlying accident severity. When the number of features increases to ten, model performance improves substantially and becomes relatively stable across classifiers.

Increasing the subset size further to fifteen features results in marginal performance gains, while introducing additional computational cost due to the larger number of variables involved in training and inference. These results suggest that using the Top-10 features provides an effective balance between predictive performance and computational efficiency.

Since WCFR combined with XGBoost achieved the best overall performance among all evaluated configurations, the Top-15 feature subset was examined only for this strongest model due to computational considerations. As shown in Table 5, increasing the number of features from 10 to 15 yields only a small improvement in Macro-F1 (from 0.548 to 0.559). However, this improvement comes at the cost of increased training time and model complexity. Therefore, the Top-10 feature configuration was selected as the final setting because it achieves near-optimal performance while maintaining lower computational overhead.
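The size of these gains can be checked with simple arithmetic. The sketch below uses only the WCFR with XGBoost Macro-F1 values reported in Tables 4 and 5 and in the text (0.239 for Top-5, 0.548 for Top-10, 0.559 for Top-15); the helper `gain` is a hypothetical name introduced for illustration.

```python
from typing import Tuple


def gain(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain, relative gain) when moving from before to after."""
    absolute = after - before
    return absolute, absolute / before


# Macro-F1 of WCFR with XGBoost for each subset size, as reported in the paper.
macro_f1 = {5: 0.239, 10: 0.548, 15: 0.559}

abs_5_10, rel_5_10 = gain(macro_f1[5], macro_f1[10])
abs_10_15, rel_10_15 = gain(macro_f1[10], macro_f1[15])

print(f"Top-5  -> Top-10: +{abs_5_10:.3f} Macro-F1")
print(f"Top-10 -> Top-15: +{abs_10_15:.3f} Macro-F1 ({rel_10_15:.1%} relative)")
```

Moving from 10 to 15 features adds 50% more features for an absolute gain of about 0.011 in Macro-F1, whereas moving from 5 to 10 features adds about 0.309.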

4. Discussion

Figure 5 illustrates the impact of different feature-selection methods on identifying severity-related factors and their validation performance across models. Figure 5a presents the accuracy achieved by Logistic Regression, Random Forest, and XGBoost across the feature-selection methods LogReg, CMIM, MI, mRMR, ReliefF, WCFR, and tree-based importance. Figure 5b presents the corresponding Macro-F1 scores for the same models and selection methods. Overall, features selected using WCFR consistently achieve the highest performance across all three models, yielding superior validation accuracy and Macro-F1 compared with the other methods. This indicates that WCFR produces a more informative and robust feature subset for accident severity classification.

From a traffic safety perspective, many of the features selected by WCFR align well with established domain knowledge on the determinants of accident severity. Temporal variables (e.g., time of day and day of week) reflect variations in traffic volume, driver behavior, and visibility conditions, while location-related features capture differences in roadway design, traffic density, and regional driving environments. In addition, weather and visibility attributes are closely associated with reduced road friction and impaired driver perception, both of which are well-known contributors to severe accidents. Overall, the selected feature subsets indicate that WCFR effectively prioritizes variables with strong real-world relevance, many of which are also identified as important by other feature-selection methods, further supporting the interpretability and practical validity of the proposed approach.

In addition to feature ranking quality, the number of selected features plays a critical role in model performance. Empirical evaluations across varying feature subset sizes show that both accuracy and Macro-F1 score increase rapidly when moving from very small subsets, and then gradually plateau as more features are added. This trend is particularly evident for WCFR, which achieves strong and stable performance using a relatively small number of features, highlighting its ability to select compact yet highly informative feature subsets. These observations support the use of Top-k feature selection as an effective trade-off between predictive performance and computational efficiency (Table 6).

In addition, the results show an improvement compared with model performance without feature selection. Feature reduction also accelerates both training and validation, as the models process fewer features.

In terms of computational efficiency, WCFR offers a practical advantage for large-scale accident data, although it incurs a higher computational cost during the feature selection stage due to the need to compute conditional information measures, which require additional time and computational resources.

Compared with simpler filter methods, such as MI, mRMR, and CMIM, WCFR involves greater upfront computation. However, the resulting feature ranking enables more effective dimensionality reduction, which in turn leads to shorter training times and lower computational cost for downstream classification models. As a result, this initial overhead is partially offset by improved efficiency in the subsequent modeling stage.
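To give a sense of where this upfront cost comes from, the sketch below shows a plug-in (count-based) estimate of the conditional mutual information I(X; Y | Z) for discrete variables. It is a generic textbook estimator, not the WCFR criterion itself, and the function name and toy inputs are assumptions made for illustration. Each call passes over all samples, and a ranking procedure that conditions on other features typically needs many such calls.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def conditional_mutual_information(
    x: Sequence[Hashable], y: Sequence[Hashable], z: Sequence[Hashable]
) -> float:
    """Plug-in estimate of I(X; Y | Z) in bits from discrete samples."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y and z must have the same length")
    n = len(x)
    if n == 0:
        raise ValueError("at least one sample is required")
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), count in n_xyz.items():
        # p(z) p(x,y,z) / (p(x,z) p(y,z)), written with raw counts.
        ratio = (n_z[zi] * count) / (n_xz[(xi, zi)] * n_yz[(yi, zi)])
        total += (count / n) * log2(ratio)
    return total


# Toy check: X carries one bit about Y that Z does not explain.
x = [0, 1, 0, 1]
z = [0, 0, 1, 1]
cmi_informative = conditional_mutual_information(x, x, z)
# If Y is already fully determined by Z, X adds nothing once Z is known.
cmi_redundant = conditional_mutual_information(x, z, z)
print(cmi_informative, cmi_redundant)
```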

Despite these improvements, class imbalance remains a fundamental challenge in accident severity prediction, as high-severity crashes account for only a small proportion of the dataset. This imbalance biases models toward majority classes, often resulting in higher overall accuracy but reduced sensitivity to severe outcomes. The use of Macro-F1, together with class-wise precision and recall, partially mitigates this issue by emphasizing balanced performance across classes. Moreover, the observed gains in Macro-F1 and recall for high-severity accidents when using WCFR indicate that effective feature selection can improve minority-class discrimination without modifying the class distribution. This naturally motivates future work to further address this challenge by integrating cost-sensitive learning or resampling strategies in combination with WCFR.
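The gap between accuracy and Macro-F1 under class imbalance can be illustrated with a small sketch. The data are synthetic (95 low-severity and 5 high-severity cases, with a model that always predicts the majority class) and are not drawn from the dataset; `macro_f1` is written out by hand here for transparency, and in practice a library routine such as scikit-learn's `f1_score` with `average="macro"` serves the same purpose.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Synthetic imbalance: class 0 = low severity, class 1 = high severity.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a model that never predicts the minority class

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, macro-F1={mf1:.3f}")
```

The always-majority model reaches 0.95 accuracy but a Macro-F1 below 0.5, because the minority class contributes an F1 of zero to the unweighted average.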

4.1. Feature Selection Stability

To complement predictive performance, the stability of the feature-selection methods was evaluated using the Jaccard similarity measure across five cross-validation folds. Stability analysis assesses whether a feature-selection method consistently identifies similar features when trained on different subsets of the data. This is an important property for feature selection algorithms, as stable methods provide more reliable and interpretable feature subsets.

Table 7 presents the average Jaccard stability scores and their corresponding standard deviations for each evaluated method. The mean stability score represents the average pairwise Jaccard similarity between the feature subsets selected across the five folds, while the standard deviation reflects the variability of these similarities and indicates how sensitive each method is to sampling variations in the data.
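The stability score described above can be sketched as the mean pairwise Jaccard similarity between the feature subsets selected in each fold. The fold subsets below are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which variant was used.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Iterable, Sequence, Tuple


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two feature subsets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def stability(subsets: Sequence[Iterable[str]]) -> Tuple[float, float]:
    """Mean and standard deviation of all pairwise Jaccard similarities."""
    if len(subsets) < 2:
        raise ValueError("need at least two subsets to compare")
    sims = [jaccard(a, b) for a, b in combinations(subsets, 2)]
    return mean(sims), pstdev(sims)


# Hypothetical subsets selected in three folds (five folds give ten pairs).
fold_subsets = [
    {"Source", "Pressure(in)", "Distance(mi)"},
    {"Source", "Pressure(in)", "Bump"},
    {"Source", "Pressure(in)", "Distance(mi)"},
]

mean_stability, std_stability = stability(fold_subsets)
print(f"mean={mean_stability:.3f}, std={std_stability:.3f}")
```

A score of 1 means every fold selected exactly the same features, while a score near 0 means the subsets barely overlap.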

As shown in Table 7, the proposed WCFR method achieved the highest average stability score (0.67), indicating that it consistently selects similar feature subsets across different training folds. This suggests that the ranking mechanism used in WCFR is less sensitive to variations in the sampled training data, thereby improving the robustness of the selected features.

The Tree-Based importance method also demonstrates relatively strong stability, with a mean Jaccard score of 0.65. However, WCFR exhibits a slightly lower standard deviation (0.04), indicating more consistent feature selection behavior across folds. In contrast, methods such as MI and CMIM show lower stability scores and higher variability, suggesting that their selected feature subsets are more sensitive to changes in the training data.

Overall, these results demonstrate that WCFR not only improves classification performance but also provides more stable feature selection compared with the baseline methods. High stability is particularly valuable in accident severity analysis, as it indicates that the identified features represent consistent patterns within the dataset rather than artifacts of specific data samples. Consequently, the stable feature subsets produced by WCFR enhance the reliability and interpretability of the resulting predictive models.

4.2. Severity-Related Factors and Their Implications for Accident Risk Reduction

Beyond identifying influential predictors through feature selection, it is important to interpret how these variables relate to accident severity and how they can inform strategies for reducing severe outcomes. The variables most consistently selected by WCFR and other feature-selection methods reveal patterns linking temporal conditions, environmental factors, and roadway characteristics to the likelihood of severe accidents. Temporal variables, particularly time of day and day of week, demonstrate a clear relationship with accident severity.

Accidents occurring during late-night and early-morning periods tend to exhibit higher severity levels. This pattern is consistent with well-documented risk factors such as driver fatigue, reduced traffic visibility, and a higher probability of impaired driving during these hours. In practical terms, this relationship suggests that targeted interventions during high-risk periods, such as increased law enforcement patrols, fatigue-awareness campaigns, and adaptive traffic signal timing, may help reduce the severity of accidents rather than only their frequency.

Environmental conditions also play a significant role in shaping accident outcomes. Weather-related variables and visibility conditions are strongly associated with severity escalation. Reduced visibility caused by fog, heavy rain, or poor lighting can impair drivers’ ability to perceive hazards and react promptly, increasing the likelihood that a collision results in serious injury or major vehicle damage. These findings highlight the value of weather-responsive traffic management systems. For example, dynamic speed limits, road condition alerts, and real-time driver warning systems can help drivers adjust their behavior under adverse conditions, thereby reducing the severity of potential accidents.

Roadway and infrastructural characteristics further influence accident severity. Locations associated with higher travel speeds, limited traffic control, or less forgiving road design tend to experience more severe accident outcomes. Higher speeds increase the kinetic energy involved in collisions, which directly contributes to greater injury severity. This relationship underscores the importance of infrastructure-based safety interventions, including improved road signage, speed-calming measures, enhanced lighting, and redesigned intersections. Such measures can lower operating speeds and improve driver awareness, ultimately mitigating the impact of collisions when they occur.

Taken together, these findings demonstrate that the identified variables are not only useful predictors within the classification models but also provide meaningful insights into the mechanisms that influence accident severity. The consistent selection of temporal, environmental, and infrastructural variables across multiple feature-selection methods suggests that these factors reflect structural patterns in real-world traffic safety conditions rather than artifacts of a particular model. While the present study does not establish causal relationships, the observed associations offer practical guidance for policymakers and traffic management authorities seeking to reduce the severity of road accidents through targeted interventions, environmental monitoring, and infrastructure improvements.

4.3. Study Limitations

While the proposed methodology demonstrates strong performance and stability, several limitations of this study should be mentioned.

1. First, the analysis relies on the U.S. Accidents Dataset, which, although large and widely used, presents certain limitations. The dataset is compiled from multiple heterogeneous sources, including traffic sensors, insurance reports, and weather APIs, which may introduce inconsistencies, missing values, and reporting biases. In addition, some potentially important factors related to accident severity, such as driver behavior, vehicle characteristics, and detailed road design attributes, are not fully represented in the dataset.
2. Second, the analysis focuses on predictive modeling rather than causal inference. The feature selection and machine learning models used in this study identify statistical associations between features and accident severity, but they do not establish causal relationships. Therefore, while the selected features provide useful insights into factors correlated with accident outcomes, further research using causal inference techniques would be required to determine whether these variables directly cause changes in accident severity.
3. Third, the generalization of the results to other regions or datasets should be considered with caution. The models and feature selection results were evaluated using a dataset covering accidents in the United States, which reflects specific roadway infrastructure, traffic regulations, and environmental conditions. Accident patterns may differ across countries or regions with different transportation systems. Consequently, future studies should validate the proposed WCFR-based feature selection framework on additional datasets from other geographic contexts.
4. Finally, although WCFR improves predictive performance relative to the other feature-selection methods, it requires the computation of conditional mutual information measures, which can be computationally intensive for very large datasets. While the resulting dimensionality reduction reduces the cost of subsequent model training, optimizing the efficiency of WCFR for large-scale or real-time applications remains an area for future research.

5. Conclusions

The applied feature-selection methodologies, combined with the three classifiers, demonstrate that feature selection has a strong and direct impact on accident severity analysis. Methods such as MI, mRMR, CMIM, and ReliefF provide moderate improvements compared with the full feature set. However, their effectiveness is not consistent across models, particularly with respect to Macro-F1, which reflects performance on minority classes.

In contrast, WCFR achieves more stable and balanced results, consistently delivering higher validation accuracy and Macro-F1 scores across all three classifiers. This indicates its ability to identify informative and interpretable features even in heterogeneous data. From a practical perspective, the features selected by WCFR can support traffic safety analysis by highlighting key temporal, environmental, and infrastructural factors that contribute to accident severity. Such insights can assist transportation authorities in prioritizing interventions aimed at reducing high-severity accidents, including improved road design, targeted traffic management during high-risk periods, and enhanced safety measures under adverse weather conditions.

In summary, this study addresses a research gap by applying dispersion-aware conditional mutual information-based feature selection to large-scale, real-world traffic accident data. While prior studies have primarily focused on predictive performance, this work emphasizes uncovering hidden relationships among accident factors. For future work, this methodology could be extended by integrating causal analysis and cost-sensitive learning to further enhance the practical impact of WCFR in traffic safety applications.

Author Contributions

Conceptualization, Y.A.A.; Methodology, B.S.; Software, Y.A.A.; Validation, Y.A.A.; Formal analysis, Y.A.A.; Investigation, Y.A.A. and B.S.; Resources, Y.A.A., A.L., B.S. and Z.A.; Data curation, Y.A.A. and A.L.; Writing—original draft, Y.A.A. and B.S.; Writing—review & editing, Y.A.A., A.L. and B.S.; Visualization, Y.A.A.; Supervision, Y.A.A., A.L., B.S. and Z.A.; Project administration, Y.A.A. and A.L.; Funding acquisition, Y.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available on Kaggle at https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents (accessed on 22 December 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Blincoe, L.; Miller, T.R.; Wang, J.-S.; Swedler, D.; Coughlin, T.; Lawrence, B.; Guo, F.; Klauer, S.G.; Dingus, T. The Economic and Societal Impact of Motor Vehicle Crashes, 2019 (Revised); Report No. DOT HS 813 403; National Highway Traffic Safety Administration: Washington, DC, USA, 2023. Available online: https://rosap.ntl.bts.gov/view/dot/78698/dot_78698_DS1.pdf (accessed on 22 December 2025).
2. Zhou, H.; Wang, X.; Zhang, Y. Feature Selection Based on Weighted Conditional Mutual Information. Appl. Comput. Inform. 2020, 20, 55–68.
3. Moosavi, S. US Accidents (2016–2023) Dataset. Kaggle (Dataset). 2023. Available online: https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents (accessed on 22 December 2025).
4. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
5. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
6. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22.
7. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006; Available online: https://books.google.com/books/about/Elements_of_Information_Theory.html?id=VWq5GG6ycxMC (accessed on 22 December 2025).
8. Peng, H.; Long, F.; Ding, C. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
9. Fleuret, F. Fast Binary Feature Selection with Conditional Mutual Information. J. Mach. Learn. Res. 2004, 5, 1531–1555.
10. Kononenko, I. Estimating Attributes: Analysis and Extensions of RELIEF. In Proceedings of the European Conference on Machine Learning (ECML 1994); Springer: Berlin/Heidelberg, Germany, 1994; Available online: https://link.springer.com/chapter/10.1007/3-540-57868-4_57 (accessed on 22 December 2025).
11. Robnik-Šikonja, M.; Kononenko, I. Theoretical and Empirical Analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69.
12. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
13. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16); ACM: New York, NY, USA, 2016; pp. 785–794. Available online: https://dl.acm.org/doi/10.1145/2939672.2939785 (accessed on 22 December 2025).
14. XGBoost Contributors. xgb.importance: Feature Importance from Fitted XGBoost Model (Documentation Page). Available online: https://xgboost.readthedocs.io/en/latest/r_docs/R-package/docs/reference/xgb.importance.html (accessed on 22 December 2025).
15. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
16. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
17. Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. J. Mach. Learn. Technol. 2011, 2, 37–63.
18. scikit-learn. sklearn.metrics.accuracy_score (Documentation Page). Available online: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html (accessed on 22 December 2025).
19. scikit-learn. sklearn.metrics.f1_score (Documentation Page). Available online: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html (accessed on 22 December 2025).
20. Python Software Foundation. Python Package Index (PyPI). Available online: https://pypi.org/ (accessed on 22 December 2025).
21. Brown, G.; Pocock, A.; Zhao, M.; Luján, M. Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection. J. Mach. Learn. Res. 2012, 13, 27–66.
22. Vergara, J.R.; Estévez, P.A. A Review of Feature Selection Methods Based on Mutual Information. Neural Comput. Appl. 2013, 14.
23. Rahman, M.; Islam, M. Predicting Accident Severity Using Machine Learning. Unpublished Manuscript. 2018. Available online: https://www.researchgate.net/publication/380452778_Predicting_Accident_Severity_using_Machine_Learning (accessed on 22 December 2025).
24. Li, Z.; Zhang, X.; Zhang, Y. Feature Selection with Conditional Mutual Information Considering Interaction. Symmetry 2019, 11, 858.
25. Behboudi, N.; Moosavi, S.; Ramnath, R. Recent Advances in Traffic Accident Analysis and Prediction: A Comprehensive Review of Machine Learning Techniques. arXiv 2024, arXiv:2406.13968.
26. Hu, Z.; Zhang, L. Enhancing Traffic Accident Severity Prediction: Feature Identification and Model Interpretability. Smart Cities 2024, 7, 38.
27. Zhang, Y.; Wang, T. Research on Traffic Accident Severity Level Prediction Model Based on Improved Machine Learning. Systems 2025, 13, 31.

Figure 2. Methodology flow of steps.

Figure 3. Baseline Phase, LogReg, RF and XGBoost algorithms without feature-selection methods.

Figure 4. Validation Macro-F1 scores obtained using Top-5 selected features for each feature-selection method and classification model.

Figure 5. Validation Performance by feature-selection method and model.

Table 3. Feature-selection methods output.

LogReg Feature	LogReg Acc	MI Feature	MI Acc	mRMR Feature	mRMR Acc	CMIM Feature	CMIM Acc	ReliefF Feature	ReliefF Acc	Tree/XGBoost Feature	Tree/XGBoost Acc	WCFR Feature	WCFR Acc
State	0.63	End_Lng	0.68	End_Lng	0.72	End_Lng	0.74	Start_Time_dow	0.76	City	0.80	Source	0.83
County	0.59	Source	0.64	Weather_Timestamp_month	0.67	Description	0.69	End_Time_dow	0.71	Airport	0.74	Pressure(in)	0.77
ID	0.56	End_Lat	0.60	Source	0.63	Street	0.64	Weather_Timestamp_dow	0.66	County	0.69	Country	0.71
Street	0.52	Description	0.55	Street	0.58	ID	0.59	Source	0.61	State	0.63	Precipitation(in)	0.65
Wind_Direction	0.48	ID	0.51	Visibility(mi)	0.53	Distance(mi)	0.54	ID	0.56	Street	0.58	End_Time_dow	0.59
Station	0.45	Distance(mi)	0.47	Description	0.49	End_Lat	0.50	End_Lng	0.50	Zipcode	0.52	Distance(mi)	0.54
Airport_Code	0.41	Street	0.43	End_Time_dow	0.44	Start_Time_month	0.45	Traffic_Signal	0.45	Weather	0.47	Amenity	0.48
City	0.37	Weather_Condition	0.38	Wind_Speed(mph)	0.39	Start_Time_hour	0.40	End_Time_hour	0.40	ID	0.41	Bump	0.42
Railway	0.34	Wind_Chill(F)	0.34	Station	0.35	Wind_Chill(F)	0.35	Start_Lng	0.35	Station	0.36	Crossing	0.36
Weather_Condition	0.30	End_Time_month	0.30	Roundabout	0.30	Pressure(in)	0.30	Zipcode	0.30	Give_Way	0.30	Give_Way	0.30

Table 4. Macro-F1 performance using Top-5 selected features.

Feature Selection	LogReg Macro-F1	RandomForest Macro-F1	XGBoost Macro-F1
CMIM	0.150	0.152	0.162
LogReg	0.115	0.198	0.187
MI	0.110	0.132	0.185
ReliefF	0.151	0.163	0.180
TreeXGB	0.196	0.198	0.161
WCFR	0.179	0.230	0.239
mRMR	0.128	0.153	0.218

Table 5. Performance using Top-15 features (WCFR with XGBoost).

Experiment	Macro-F1
WCFR with XGBoost	0.559

Table 6. Validation accuracy and Macro-F1 for each feature-selection method and model.

FS	LogReg Accuracy	RandomForest Accuracy	XGBoost Accuracy	LogReg Macro-F1	RandomForest Macro-F1	XGBoost Macro-F1
CMIM	0.796281	0.809216	0.820534	0.224025	0.266636	0.347199
LogReg	0.781730	0.791431	0.785772	0.296587	0.264643	0.254699
MI	0.797898	0.822150	0.827001	0.235634	0.308148	0.426015
ReliefF	0.805174	0.802749	0.803557	0.272320	0.267464	0.304218
TreeXGB	0.782538	0.790622	0.785772	0.303592	0.264217	0.253019
WCFR	0.827396	0.837001	0.837001	0.493888	0.519737	0.547562
mRMR	0.809216	0.827001	0.822959	0.286995	0.312322	0.319197

Table 7. Average Jaccard stability scores across 5 folds.

Feature-Selection Method	Mean Stability	Standard Deviation
LogReg	0.61	0.08
MI	0.54	0.10
mRMR	0.63	0.07
CMIM	0.57	0.09
ReliefF	0.60	0.06
Tree-Based	0.65	0.05
WCFR	0.67	0.04

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
