Recognizing Risk Driving Behaviors with an Improved Crested Porcupine Optimizer and XGBoost

Su, Juan; Shen, Tong; Tang, Fuli; You, Xue; He, Qingling; Lu, Xiaojuan; Li, Yikang; Luo, Shenglin

doi:10.3390/su18062804


by Juan Su ¹, Tong Shen ², Fuli Tang ¹, Xue You ¹, Qingling He ³,*, Xiaojuan Lu ³,*, Yikang Li ³ and Shenglin Luo ³

¹ School of Economics and Management, Qingdao Institute of Technology, Jiaozhou 266300, China

² School of Art and Design, Xi’an University of Technology, Xi’an 710048, China

³ School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou 730070, China

* Authors to whom correspondence should be addressed.

Sustainability 2026, 18(6), 2804; https://doi.org/10.3390/su18062804

Submission received: 29 December 2025 / Revised: 22 February 2026 / Accepted: 10 March 2026 / Published: 12 March 2026

(This article belongs to the Collection Accident Prevention and Risk Management for Safe and Sustainable Transportation)


Abstract

The effective recognition of risky driving behaviors holds technical potential for supporting accident prevention and sustainable transportation. However, existing intelligent algorithms for optimizing deep learning models in this field often suffer from slow convergence and high errors. This study proposes a novel hybrid model (ICPO-XGBoost) for risky driving behavior classification. The improved crested porcupine optimizer (ICPO) was developed using logistic-tent composite mapping for population initialization, a hybrid mechanism combining refraction opposition-based learning and Cauchy mutation to avoid local optima, and an adaptive variable spiral search with inertia weight to balance global and local search. The ICPO was then employed to optimize the hyperparameters of the XGBoost classifier. The ICPO demonstrated superior optimization accuracy and convergence speed compared to benchmark algorithms. The ICPO-XGBoost model achieved accuracy, precision, recall, and F1 scores of 96.2%, 95.4%, 95.8%, and 95.6%, respectively, for classifying and identifying risky driving behaviors. Compared to various benchmark models, these results represent increases of 12.7–24.8%, 14.8–31.8%, 14.9–31.0%, and 15.0–32.4%, respectively. For specific driving behavior categories (normal driving, slow driving, short-distance tailgating, sudden acceleration/deceleration, frequent lane changing, and forced lane changing), the precision, recall, and F1 scores of the ICPO-XGBoost model fell within the ranges of 84.8–99.2%, 87.5–100.0%, and 86.2–99.2%, respectively. Compared to benchmark models, these metrics show increases of 1.5–75.8%, 5.8–68.1%, and 3.3–72.6%, respectively. Notably, the model significantly improved accuracy in identifying sudden acceleration/deceleration behaviors. 
The results of this model facilitate the classification and early warning of risky driving behaviors, thereby reducing the frequency of such behaviors, lowering the risk of traffic accidents, and enhancing road traffic safety.

Keywords:

traffic safety; risky driving behavior; crested porcupine optimizer (CPO); hybrid improvement strategy; XGBoost; classification and recognition

1. Introduction

Road traffic injuries and fatalities have become a major global concern, resulting in more than 1.2 million deaths annually and imposing significant socioeconomic costs worldwide [1]. As vehicle ownership and usage increase in developing countries, the number of deaths due to road traffic accidents is projected to rise to 2.4 million by 2030 unless effective measures are implemented at the levels of human factors, vehicles, roads, and the environment, which would make road traffic accidents the seventh leading cause of death worldwide. In China, the growing demand for motorized travel, the increase in mixed traffic volumes, and the weak safety awareness among road users have particularly exacerbated traffic safety issues [2]. According to statistics, in 2024, China’s total car ownership reached 352.68 million vehicles, and the number of licensed motor vehicle drivers reached 523.2 million, representing increases of 7.2% and 4.3%, respectively, compared to 2023. In the same year, a total of 150,054 road traffic accidents involving motor vehicles occurred in China, resulting in direct economic losses of 10.4345 billion yuan, which marked a 10.7% increase compared to the previous year [3]. As operators of vehicles, participants in road traffic, and perceivers of environmental conditions, drivers play a dominant role in the road traffic system [4]. Drivers’ misjudgment of risk in the road traffic environment and improper operations are prone to cause abnormal vehicle trajectories, leading to road traffic accidents [5]. The key to identifying and warning against risky driving behaviors is therefore to calibrate, fit, and determine the parameters of regression and classification recognition models based on the characteristics of vehicle driving trajectories, so that an accurate classification and recognition model can be built. The outcomes of this research can lower the occurrence rate of risky driving behaviors and improve the level of road traffic safety.

Research on driving behavior classification and recognition has primarily relied on analyzing vehicle trajectory characteristics, including lateral and longitudinal velocity [6], acceleration/deceleration patterns, and lateral offset [7]. The methodological framework in this domain encompasses three main categories: mathematical statistics models, machine learning approaches, and deep learning architectures. Within mathematical statistics, representative models include the multi-objective recognition (MOR)-based driving risk identification model [8], the risk index-based behavior classification model [9], and the Gaussian mixture hidden Markov model (GMM-HMM) [10]. Machine learning models such as k-means clustering [11], back propagation neural network [12], support vector machine [13], and random forest [14] offer higher recognition accuracy compared to the aforementioned models. However, they struggle to overcome difficulties in determining kernel function parameters under large data sample conditions, and their optimization accuracy, convergence rate, and generalization capabilities need improvement [15].

By exploring driving behavior data features, such as driver personal attributes, driving habits, road section accident rates, and lane-changing duration under large data sample conditions [16], driving behavior recognition models based on long short-term memory (LSTM) [17], multi-channel convolutional neural network and long short-term memory (MCT-CNN-LSTM) [18], XGBoost [19], and transformer [20] have been constructed. Compared with traditional machine learning models, these approaches demonstrate superior capability in capturing nonlinear relationships and complex correlations within driving behavior data, ultimately enhancing model accuracy and robustness [21]. Driving behavior recognition models, such as particle swarm optimized-least squares support vector machine (PSO-LSSVM) [22], genetic algorithm-convolutional neural network (GA-CNN) [23], and salp swarm algorithm-back propagation (SSA-BP) [24], address a major drawback of deep learning—excessive reliance on manual parameter tuning. By mitigating this issue, they improve the generalization ability of predictive models.

The crested porcupine optimizer (CPO) is characterized by its simple structural principle and limited number of model parameters. Compared with other existing intelligent algorithms, the CPO algorithm offers advantages in higher solution accuracy and faster convergence speed [25], leading to its widespread application in engineering problems such as path planning [26], traffic accident severity identification [27], and mechanical fault classification and detection [28]. However, existing research suggests that the circular selection strategy employed during the initialization phase of the CPO algorithm tends to reduce the diversity and quality of the population. Furthermore, limitations in the individual position update mechanisms during the perception and physiological defense phases diminish the algorithm’s ability to balance the relationship between global search and local exploitation. These defects in the optimization mechanism of the CPO algorithm increase its risk of falling into local optima, thereby compromising its solution accuracy and convergence rate. Existing research on the improvement of the crested porcupine optimizer (CPO) primarily includes two aspects: first, employing strategies such as sine chaotic mapping [29], circle chaotic mapping [30], and the good point set strategy [31] to enhance the circular selection generation mechanism of the CPO. This increases the uniformity and diversity of the solution space distribution during the population initialization phase, thereby improving the quality of the solution set during the optimization process. 
Second, strategies including elite opposition-based learning [32], the Lévy flight strategy [33], the velocity vector mechanism of the particle swarm optimization algorithm [34], the Cauchy mutation operator [35], adaptive t-distribution-based perturbation [27], and adaptive neighborhood search [36] have been used to optimize the position update methods of individual crested porcupines during the perception and physiological defense phases. This expands the search space for optimal feasible solutions of the CPO algorithm and enhances its ability to balance local exploitation and global search.

Existing research has primarily focused on exploring the correlation characteristics between different influencing factors and driving behaviors in publicly available datasets collected outside China, constructing driving behavior classification and recognition models mainly based on machine learning and intelligent algorithms, and optimizing deep learning parameters. However, it has overlooked influencing factors such as the mixed traffic conditions and driving habits found in China. To address this gap, this paper first employs logistic-tent composite mapping to initialize the porcupine population, thereby enhancing its diversity and quality. To overcome the algorithm’s tendency to become trapped in local optima and converge prematurely, a hybrid mechanism combining refraction opposition-based learning and Cauchy mutation is introduced. Subsequently, the integration of adaptive variable spiral search and inertia weights refines the individual position update process, balancing global exploration with local exploitation and improving convergence accuracy. Finally, a classification and recognition model is developed using the ICPO-XGBoost algorithm, which is applied to clustered trajectory data of risky driving behaviors. By accounting for mixed traffic operating conditions and driver behavioral habits, the ICPO-XGBoost model enhances both the classification accuracy and the practical applicability of risky driving behavior recognition.

The remainder of the paper is organized as follows:

In Section 2, the strategies and roles of the improved porcupine optimization algorithm are detailed. The numerical simulation results of the hybrid strategy improved porcupine optimization algorithm for optimizing 12 benchmark test functions are comparatively analyzed, and the ICPO-XGBoost-based risky driving behavior classification and recognition model is constructed.

In Section 3, the K-means algorithm, nearest neighbor imputation method, and Savitzky–Golay filter are used to cluster and smooth/denoise the risky driving behavior data. The results of the ICPO-XGBoost-based risky driving behavior classification and recognition model are discussed, and the classification accuracy and applicability of the proposed model are comparatively analyzed against existing models.

2. Risk Driving Behavior Classification and Recognition Model Based on ICPO-XGBoost

In this study, the terms “recognition,” “classification,” and “identification” are used as follows. “Recognition” refers to the overall process of detecting risky driving behaviors from trajectory data. “Classification” specifically denotes the assignment of detected behaviors into predefined categories (e.g., sudden acceleration/deceleration, lane changing). “Identification” is used interchangeably with “recognition” to describe the broader task of behavior detection. This usage is maintained throughout the manuscript to ensure clarity and consistency.

This paper presents an improved crested porcupine optimizer (ICPO) addressing limitations in population initialization and position updating mechanisms of the original CPO. Key enhancements include logistic-tent chaotic mapping for superior population diversity and quality, refraction opposition-based learning with Cauchy mutation for expanded search space and reduced local optima risk, and adaptive variable spiral search with inertia weight strategy for optimized perception-defense phase updates. This synergistic hybrid approach significantly advances the CPO algorithm’s optimization accuracy and convergence performance.

2.1. Design of Hybrid Strategy Improved Crested Porcupine Optimization Algorithm

2.1.1. Population Initialization Optimization Design Based on Logistic-Tent Composite Mapping

The crested porcupine optimizer (CPO) [25] is a nature-inspired metaheuristic that models the defensive behaviors of crested porcupines. The algorithm employs a cyclic selection mechanism for population initialization, which enhances population diversity and quality, thereby reducing susceptibility to local optima and accelerating convergence. The corresponding mathematical representations are given in Equation (1) (population initialization) and Equation (2) (cyclic selection).

\overset{⇀}{X_{i}} = \overset{⇀}{L} + \overset{⇀}{r} \times (\overset{⇀}{U} - \overset{⇀}{L}) | i = 1, 2, \dots, N

(1)

where $\overset{⇀}{X_{i}}$ is the $i$-th population individual; $\overset{⇀}{L}$ and $\overset{⇀}{U}$ represent the lower and upper bounds of the search space, respectively; $\overset{⇀}{r}$ is a random vector within the range [0, 1]; and $N$ denotes the population size.

N = N_{\min} + (N^{'} - N_{\min}) \times \left(1 - \frac{t \, \% \, \frac{T_{\max}}{T}}{\frac{T_{\max}}{T}}\right)

(2)

where $T$ is the number of cyclic selection iterations; $T_{\max}$ is the maximum number of cyclic selection iterations; $\%$ is the modulo operator; and $N_{\min}$ is the minimum size of the newly generated population through cyclic selection.
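As a minimal sketch of Equations (1) and (2), the following Python snippet draws a uniform random population within the bounds and computes the cyclically shrinking population size. The function names, the rounding of the result to an integer, and the reading of $N^{'}$ as the initial population size and $t$ as the current iteration counter are illustrative assumptions rather than details stated in the text.

```python
import random


def init_population(lower, upper, n, rng=None):
    """Equation (1): X_i = L + r * (U - L), with r drawn uniformly from [0, 1]."""
    rng = rng or random.Random()
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n)
    ]


def population_size(t, n_cycles, t_max, n_init, n_min):
    """Equation (2): population size at iteration t under cyclic selection.

    Assumptions: n_init plays the role of N', n_cycles the role of T,
    and the result is rounded to the nearest integer.
    """
    cycle_len = t_max / n_cycles            # T_max / T
    phase = (t % cycle_len) / cycle_len     # relative position inside the current cycle
    return round(n_min + (n_init - n_min) * (1.0 - phase))
```

Under these assumptions, the population starts each cycle at the initial size and shrinks linearly toward the minimum size before the next cycle begins.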

The perception defense strategy mimics the crested porcupine’s predator intimidation through quill raising and noise generation. Equations (3) and (4) mathematically formulate these two behaviors, respectively.

\overset{⇀}{X_{i}^{t + 1}} = \overset{⇀}{X_{i}^{t}} + τ_{1} \times |2 \times τ_{2} \times \overset{⇀}{x_{c p}^{t}} - \overset{⇀}{y_{i}^{t}}|

(3)

where $\overset{⇀}{x_{cp}^{t}}$ is the optimal individual at the current iteration; $\overset{⇀}{y_{i}^{t}}$ is the individual vector of the population at the current iteration; $τ_{1}$ is a random number following a normal distribution; and $τ_{2}$ is a random number in the range [0, 1].

\overset{⇀}{x_{i}^{t + 1}} = (1 - \overset{⇀}{U_{1}}) \times \overset{⇀}{x_{i}^{t}} + \overset{⇀}{U_{1}} \times (\overset{⇀}{y} + τ_{3} (\overset{⇀}{x_{r 1}^{t}} - \overset{⇀}{x_{r 2}^{t}}))

(4)

where $r_{1}$ and $r_{2}$ are random integers within the range [1, N]; and $τ_{3}$ is a random value generated in the range [0, 1].
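The two perception-phase updates in Equations (3) and (4) can be sketched element-wise as below. This is only an illustration: the text does not specify how $\overset{⇀}{y}$ and $\overset{⇀}{U_{1}}$ are constructed, so both are passed in by the caller, and the use of a standard normal distribution for $τ_{1}$ is an assumption. The function names are hypothetical.

```python
import random


def first_defense(x, x_best, y, tau1=None, tau2=None, rng=None):
    """Equation (3): x + tau1 * |2 * tau2 * x_best - y|, element-wise."""
    rng = rng or random.Random()
    tau1 = rng.gauss(0.0, 1.0) if tau1 is None else tau1   # assumed N(0, 1)
    tau2 = rng.random() if tau2 is None else tau2          # uniform in [0, 1)
    return [xi + tau1 * abs(2.0 * tau2 * bi - yi)
            for xi, bi, yi in zip(x, x_best, y)]


def second_defense(x, y, x_r1, x_r2, u1, tau3=None, rng=None):
    """Equation (4): (1 - U1) * x + U1 * (y + tau3 * (x_r1 - x_r2))."""
    rng = rng or random.Random()
    tau3 = rng.random() if tau3 is None else tau3
    return [(1 - u) * xi + u * (yi + tau3 * (a - b))
            for xi, yi, a, b, u in zip(x, y, x_r1, x_r2, u1)]
```

Passing `tau1`, `tau2`, and `tau3` explicitly makes the update deterministic, which is convenient for checking a single step by hand.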

The physiological defense strategy emulates the crested porcupine’s dual defense mechanisms against predators: olfactory secretion and physical confrontation. The corresponding mathematical models are presented in Equation (5) and Equation (6), respectively.

\overset{⇀}{x_{i}^{t + 1}} = (1 - \overset{⇀}{U_{1}}) \times \overset{⇀}{x_{i}^{t}} + \overset{⇀}{U_{1}} \times (\overset{⇀}{x_{r 1}^{t}} + S_{i}^{t} \times (\overset{⇀}{x_{r 2}^{t}} - \overset{⇀}{x_{r 3}^{t}}) - τ_{3} \times \overset{⇀}{δ} \times γ_{t} \times S_{i}^{t})

(5)

where $\overset{⇀}{x_{i}^{t + 1}}$ is the position of the $i$-th population individual at iteration $t + 1$; $\overset{⇀}{δ}$ is a parameter controlling the search direction; $γ_{t}$ is the defense factor; $S_{i}^{t}$ is the scent diffusion factor; and $r_{3}$ is a random integer within the range [1, N].

\overset{⇀}{x_{i}^{t + 1}} = \overset{⇀}{x_{c p}^{t}} + (α (1 - τ_{4}) + τ_{4}) \times (δ \times \overset{⇀}{x_{c p}^{t}} - \overset{⇀}{x_{i}^{t}}) - τ_{5} \times δ \times γ_{t} \times \overset{⇀}{F_{i}^{t}}

(6)

where $α$ is the convergence speed factor; $τ_{4}$ is a random value generated in the range [0, 1]; and $\overset{⇀}{F_{i}^{t}}$ is the average force affecting the $i$-th individual in the population.

The cyclic selection initialization in the CPO may reduce population diversity and quality. This can lead to uneven individual distribution and a limited global search space. To address this, we replace the original strategy with a logistic-tent combined chaotic map. This map is chosen for its strong spatial traversal, fast iteration, and high applicability [37]. In optimizing risky driving behavior data, it enhances the uniformity and ergodicity of population distribution, which improves the algorithm’s convergence speed and solution accuracy. The equation for the logistic-tent combined map is as follows:

z(t + 1) = \begin{cases} r z(t) (1 - z(t)) + (4 - r) z(t), & 0 < z(t) \leq 0.3 \\ r z(t) (1 - z(t)) + (4 - r) (1 - z(t)), & 0.3 < z(t) < 1 \end{cases}

(7)

where $z(t)$ is the value after the $t$-th iteration; and $r$ is a random parameter in the range (0, 4).
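A short sketch of how Equation (7) could drive population initialization is given below. Two points are assumptions and not part of the text: the iterate is wrapped back into [0, 1) with a modulo-1 step, since the expression as written can exceed 1, and one chaotic sequence is spread over all individuals and dimensions. The function names are hypothetical.

```python
import random


def logistic_tent_sequence(z0, r, n):
    """Iterate Equation (7) n times starting from z0 in (0, 1)."""
    seq, z = [], z0
    for _ in range(n):
        if z <= 0.3:
            z = r * z * (1 - z) + (4 - r) * z
        else:
            z = r * z * (1 - z) + (4 - r) * (1 - z)
        z %= 1.0   # assumption: wrap into [0, 1); this step is not stated in the text
        seq.append(z)
    return seq


def chaotic_population(lower, upper, n, r, rng=None):
    """Scale chaotic values into the search bounds, one value per coordinate."""
    rng = rng or random.Random()
    dim = len(lower)
    seq = logistic_tent_sequence(rng.uniform(0.01, 0.99), r, n * dim)
    return [
        [lower[j] + seq[i * dim + j] * (upper[j] - lower[j]) for j in range(dim)]
        for i in range(n)
    ]
```

Some combinations of `z0` and `r` can land on a fixed point of the wrapped map, so in practice the generated sequence is worth inspecting before use.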

2.1.2. Hybrid Optimization Mechanism Based on Refraction Opposition-Based Learning and Cauchy Mutation to Broaden the Search Region

(1) Refraction Opposition-Based Learning

By expanding the search space of feasible optimal solutions, the refraction opposition-based learning strategy significantly boosts the CPO algorithm’s optimization capability during its early iterations. As the iterative process continues, this mechanism continues to broaden the search scope, thereby effectively mitigating the risk of the algorithm converging on a local optimum. The underlying principle of this strategy is depicted in Figure 1.

As shown in Figure 1, the known solution lies on the search interval $[a, b]$ on the $x$-axis, with the origin $o$ as the midpoint of $[a, b]$. The normal is the $y$-axis, and the angles of incidence and refraction are $α$ and $β$, respectively. The lengths of the incident and refracted rays are $l$ and $l^{*}$. Therefore, the equation for the refractive index is:

n = \frac{\sin α}{\sin β} = \frac{\frac{a + b}{2} - x}{x^{*} - \frac{a + b}{2}} \cdot \frac{l^{*}}{l}

(8)

Let $δ = l / l^{*}$ and $n = 1$, and substitute them into Equation (8). Extending this to a multidimensional space yields the refraction opposition solution as:

x_{i, j}^{*} = \frac{a_{j} + b_{j}}{2} + \frac{a_{j} + b_{j}}{2 δ} - \frac{x_{i, j}}{δ}

(9)

where $x_{i, j}$ represents the position of the $i$-th particle in the $j$-th dimension in the original population; $x_{i, j}^{*}$ denotes the refraction opposition solution corresponding to $x_{i, j}$; and $a_{j}$ and $b_{j}$ represent the lower and upper bounds of the search space in the $j$-th dimension, respectively.
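Equation (9) translates directly into a per-dimension computation. The sketch below is illustrative only; the function name is hypothetical, and no bound handling is included because the text does not describe any.

```python
def refraction_opposite(x, lower, upper, delta):
    """Equation (9): refraction opposition solution, applied per dimension j."""
    return [
        (a + b) / 2.0 + (a + b) / (2.0 * delta) - xj / delta
        for xj, a, b in zip(x, lower, upper)
    ]
```

For example, with $δ = 1$ the expression reduces to $a_{j} + b_{j} - x_{i, j}$, so `refraction_opposite([2.0], [0.0], [10.0], 1.0)` gives `[8.0]`, the mirror image of the point about the midpoint of the interval.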

(2) Cauchy Mutation

The Cauchy distribution is a continuous probability distribution characterized by an undefined mathematical expectation and a probability density function that decays slowly from its peak, approaching the horizontal axis asymptotically. This study uses the Cauchy mutation mechanism, derived from this distribution, to perturb feasible solutions throughout CPO’s iterative process, thereby enhancing search efficiency. Integrating this mechanism into the position update strategy strengthens the algorithm’s global exploration capability and improves its ability to escape local optima. The probability density function of the standard Cauchy distribution is defined as follows:

f (x) = \frac{1}{π (1 + x^{2})}

(10)

After applying the Cauchy mutation mechanism to perturb the optimal feasible solution during the iterative execution of the CPO algorithm, the optimal crested porcupine individual position can be expressed as:

x_{\mathrm{newbest}} = x_{\mathrm{best}} + x_{\mathrm{best}} \times \mathrm{Cauchy}(0, 1)

(11)
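The perturbation in Equation (11) can be illustrated with inverse-transform sampling of the standard Cauchy distribution from Equation (10). Drawing one independent Cauchy value per dimension is an assumption, as the text does not say whether the draw is shared across dimensions; the function names are hypothetical.

```python
import math
import random


def cauchy_pdf(x):
    """Equation (10): density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + x * x))


def standard_cauchy(rng):
    """Draw from Cauchy(0, 1) via the inverse CDF: tan(pi * (u - 0.5))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def cauchy_mutation(x_best, rng=None):
    """Equation (11): x_best + x_best * Cauchy(0, 1), one draw per dimension (assumed)."""
    rng = rng or random.Random()
    return [xb + xb * standard_cauchy(rng) for xb in x_best]
```

Because the perturbation is multiplicative, a coordinate of the best individual that equals zero is left unchanged by this update.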

(3) Hybrid Mechanism

The proposed hybrid mechanism dynamically updates individual positions in the CPO population by executing either refraction opposition-based learning or Cauchy mutation, each selected with equal probability. Within this framework, refraction opposition-based learning expands the search space, enabling exploration of a broader range of potential optimal solutions. Concurrently, Cauchy mutation introduces stochastic perturbations that reduce the risk of premature convergence to local optima and accelerate convergence toward the global optimum. The mathematical representation of this hybrid mechanism is provided below.

x_{\mathrm{newbest}, 1} = \begin{cases} \frac{a + b}{2} + \frac{a + b}{2 δ} - \frac{x_{i}}{δ}, & λ < 0.5 \\ x_{\mathrm{best}} + x_{\mathrm{best}} \times \mathrm{Cauchy}(0, 1), & λ \geq 0.5 \end{cases}

(12)

where $λ$ is a random number in the range [0, 1].
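The branch selection in Equation (12) might be sketched as follows. The greedy acceptance helper, which keeps a candidate only if it lowers a fitness value to be minimized, is an assumption added for illustration, as is the per-dimension Cauchy draw. All names are hypothetical.

```python
import math
import random


def hybrid_update(x_i, x_best, lower, upper, delta, lam=None, rng=None):
    """Equation (12): choose one of the two strategies with equal probability."""
    rng = rng or random.Random()
    lam = rng.random() if lam is None else lam
    if lam < 0.5:
        # Refraction opposition-based learning branch.
        return [(a + b) / 2.0 + (a + b) / (2.0 * delta) - xj / delta
                for xj, a, b in zip(x_i, lower, upper)]
    # Cauchy mutation branch; one Cauchy(0, 1) draw per dimension (assumed).
    return [xb + xb * math.tan(math.pi * (rng.random() - 0.5)) for xb in x_best]


def greedy_accept(candidate, incumbent, fitness):
    """Assumed selection rule: keep the candidate only if its fitness is lower."""
    return candidate if fitness(candidate) < fitness(incumbent) else incumbent
```

Supplying `lam` explicitly forces a particular branch, which is useful when checking each strategy in isolation.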

2.1.3. Adaptive Variable Spiral Search and Inertia Weight

An adaptive variable spiral search strategy [38] is integrated into the position update mechanism of the CPO algorithm to strengthen its global exploration capability. By emulating variable spiral motion trajectories, this approach broadens the algorithm’s search scope and enhances its capacity to escape local optima. The mathematical formulation is provided in Equations (13) and (14).

\vec{X} (t + 1) = \vec{D^{'}} \cdot e^{b l} \cdot \cos (2 π l) + \vec{X^{*}} (t)

(13)

\vec{D^{'}} = |2 r \vec{X^{*}} (t) - \vec{X} (t)|

(14)

where $\vec{X}(t + 1)$ represents the position of the individual at iteration $t + 1$; $\vec{X^{*}}$ is the position of the global optimal individual; $b$ is a constant defining the shape of the spiral; $l$ is a random number in the range [−1, 1]; and $r$ is a random variable in the range [0, 1].
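For illustration, Equations (13) and (14) can be written as a single element-wise update. The default value `b=1.0` is a hypothetical choice, since the text only states that $b$ is a constant, and the function name is likewise an assumption.

```python
import math
import random


def spiral_update(x, x_star, b=1.0, l=None, r=None, rng=None):
    """Equations (13)-(14): D' * exp(b * l) * cos(2 * pi * l) + X*."""
    rng = rng or random.Random()
    l = rng.uniform(-1.0, 1.0) if l is None else l   # random number in [-1, 1]
    r = rng.random() if r is None else r             # random variable in [0, 1)
    factor = math.exp(b * l) * math.cos(2.0 * math.pi * l)
    # D' = |2 * r * X* - X| is evaluated per dimension (Equation (14)).
    return [abs(2.0 * r * xs - xi) * factor + xs for xi, xs in zip(x, x_star)]
```

With `l = 0` the spiral factor equals 1, so the new position is the best position shifted by the distance term alone.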

To refine the position updates of individual crested porcupines, this paper incorporates an adaptive variable spiral search strategy alongside an inertia weight. This combination strengthens the guiding influence of the optimal individual throughout the iterative process, thereby accelerating convergence toward the global optimum. The mathematical expression for the inertia weight is provided in Equation (15).

ω = 0.2 \cos (0.5 π (1 - \frac{t}{T}))

(15)

The conventional fixed spiral search strategy often entraps the algorithm in local optima. To overcome this drawback, this paper modifies the spiral constant into a parameter that dynamically evolves with the iteration count. This transformation allows the spiral shape to adjust adaptively, thereby broadening the search scope and strengthening the algorithm’s global optimization performance. The expression for this variable spiral parameter is provided in Equation (16).

η^{*} = e^{5 \cos (π (1 - \frac{t}{T}))}

(16)
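The two schedules in Equations (15) and (16) depend only on the iteration ratio $t / T$, so a quick way to inspect how both parameters evolve is to evaluate them directly. The function names below are hypothetical.

```python
import math


def inertia_weight(t, t_max):
    """Equation (15): omega = 0.2 * cos(0.5 * pi * (1 - t / T))."""
    return 0.2 * math.cos(0.5 * math.pi * (1.0 - t / t_max))


def spiral_parameter(t, t_max):
    """Equation (16): eta* = exp(5 * cos(pi * (1 - t / T)))."""
    return math.exp(5.0 * math.cos(math.pi * (1.0 - t / t_max)))


if __name__ == "__main__":
    T = 500
    for t in (0, 125, 250, 375, 500):
        print(t, inertia_weight(t, T), spiral_parameter(t, T))
```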

The negative correlation between spiral shape and iteration number, shown in Equation (16), is fundamental to the ICPO algorithm’s design. In the initial iterations, this facilitates a comprehensive global search. In later stages, as the search narrows around promising solutions, the adaptive variable spiral search strategy and inertia weight—used to refine individual position updates—ensure an effective balance between global and local search. This leads to superior optimization accuracy and a faster convergence rate. The updated position update methods for different stages of the ICPO algorithm are presented as follows:

\overset{⇀}{X_{i}^{t + 1}} = ω \overset{⇀}{X_{i}^{t}} + η^{*} \cdot D \cdot e^{b l} \cos (2 π p) + τ_{1} \times |2 \times τ_{2} \times \overset{⇀}{x_{c p}^{t}} - \overset{⇀}{y_{i}^{t}}|

(17)

\overset{⇀}{x_{i}^{t + 1}} = (1 - \overset{⇀}{U_{1}}) \times ω \overset{⇀}{x_{i}^{t}} + (1 - \overset{⇀}{U_{1}}) \times η^{*} \cdot D \cdot e^{b l} \cos (2 π p) + \overset{⇀}{U_{1}} \times (\overset{⇀}{y} + τ_{3} (\overset{⇀}{x_{r 1}^{t}} - \overset{⇀}{x_{r 2}^{t}}))

(18)

\overset{⇀}{x_{i}^{t + 1}} = (1 - \overset{⇀}{U_{1}}) \times ω \overset{⇀}{x_{i}^{t}} + (1 - \overset{⇀}{U_{1}}) \times η^{*} \cdot D \cdot e^{b l} \cos (2 π p) + \overset{⇀}{U_{1}} \times (\overset{⇀}{x_{r 1}^{t}} + S_{i}^{t} \times (\overset{⇀}{x_{r 2}^{t}} - \overset{⇀}{x_{r 3}^{t}}) - τ_{3} \times \overset{⇀}{δ} \times γ_{t} \times S_{i}^{t})

(19)

\overset{⇀}{x_{i}^{t + 1}} = ω \overset{⇀}{x_{c p}^{t}} + η^{*} \cdot D \cdot e^{b l} \cos (2 π p) + (α (1 - τ_{4}) + τ_{4}) \times (δ \times \overset{⇀}{x_{c p}^{t}} - \overset{⇀}{x_{i}^{t}}) - τ_{5} \times δ \times γ_{t} \times \overset{⇀}{F_{i}^{t}}

(20)

2.2. Validation of the Effectiveness of the Hybrid Strategy Improved Crested Porcupine Optimizer (ICPO)

2.2.1. Comparative Analysis of Numerical Simulation Experiment Results

To evaluate the optimization performance of the CPO algorithm enhanced by the hybrid strategy, this paper conducts numerical simulation experiments using 12 benchmark test functions, which include unimodal, multimodal, and composite modalities. The details of these 12 benchmark test functions are presented in Table 1.

To maintain consistency and rigor in the comparative evaluation, all algorithms—including CPO, salp swarm algorithm (SSA), whale optimization algorithm (WOA), and particle swarm optimization (PSO)—were configured with identical parameters: a population size of 50 and a maximum iteration count of 500. Each algorithm was executed 30 times independently, and the resulting mean values and standard deviations were recorded. The statistical outcomes for each benchmark function are summarized in Table 2.
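The evaluation protocol described above (identical population size and iteration budget, 30 independent runs, mean and standard deviation) can be sketched with a small harness. The `random_search` routine is only a placeholder optimizer used to make the example runnable; it is not ICPO or any of the compared algorithms, and the sphere function stands in for a benchmark function as an assumption.

```python
import random
import statistics


def sphere(x):
    """A simple unimodal test function with minimum 0 at the origin."""
    return sum(v * v for v in x)


def random_search(f, lower, upper, pop_size, max_iter, rng):
    """Placeholder optimizer (NOT ICPO): best of pop_size * max_iter uniform samples."""
    best = float("inf")
    for _ in range(pop_size * max_iter):
        cand = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        best = min(best, f(cand))
    return best


def run_trials(optimizer, f, lower, upper, runs=30, pop_size=50, max_iter=500, seed=0):
    """Run independent trials with distinct seeds; return mean, std, and raw results."""
    results = [
        optimizer(f, lower, upper, pop_size, max_iter, random.Random(seed + k))
        for k in range(runs)
    ]
    return statistics.mean(results), statistics.stdev(results), results
```

Any optimizer with the same call signature can be substituted for `random_search`, which keeps the budget and seeding identical across the algorithms being compared.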

ICPO’s optimization performance on multimodal functions f₅–f₈ outperforms that of CPO, SSA, WOA, and PSO, primarily due to the hybrid mechanism of refraction opposition-based learning and Cauchy mutation. This mechanism enhances the algorithm’s ability to navigate complex search spaces and avoid local optima, thereby increasing the likelihood of convergence to the global optimum. For composite multimodal functions f₉–f₁₂, the adaptive variable spiral search and inertia weight—applied to improve position updates during the perception and physiological defense stages—ensure a dynamic balance between global and local search. This balance significantly improves both the precision of the solution and the rate of convergence.

To explore the statistical characteristics of the numerical simulation results, box plots of the 30 optimization results obtained by each algorithm on the benchmark functions are shown in Figure 2. In each box plot, the central mark indicates the median, the box edges mark the 25th and 75th percentiles, and “+” symbols identify outliers. ICPO’s results show a tighter interquartile range and fewer outliers than those of the other algorithms, which indicates enhanced stability and robustness.

To rigorously assess the robustness and fairness of ICPO’s performance, the Wilcoxon rank-sum test was employed at a 5% significance level. Pairwise comparisons were performed between ICPO and each benchmark algorithm (CPO, SSA, WOA, and PSO), yielding p-values denoted as CP1, CP2, CP3, and CP4, respectively. The complete test results are presented in Table 3. As shown in the table, all p-values for the 12 benchmark functions are below 2.50 × 10⁻⁵, well within the 5% significance threshold. These findings provide strong statistical evidence that the superior optimization accuracy and convergence speed achieved by ICPO are not due to random chance.
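A pairwise comparison of this kind can be illustrated with a self-contained rank-sum test based on the normal approximation. This is a simplified sketch: it averages tied ranks but applies no tie or continuity correction, and it is not necessarily the exact procedure used to produce Table 3. Library routines such as `scipy.stats.ranksums` provide a comparable test.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b], key=lambda p: p[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average rank shared by a run of ties
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(rk for rk, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return z, p
```

A p-value below 0.05 from `rank_sum_test(results_a, results_b)` would indicate a significant difference between the two sets of run results at the 5% level.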

2.2.2. Comparative Analysis of Algorithm Convergence

The convergence curves in Figure 3 evaluate algorithms’ convergence accuracy, speed, and local optima avoidance. Key observations are as follows:

For f₁ and f₂, ICPO achieves superior accuracy after ~150 iterations, attributed to logistic-tent mapping enhancing population quality and global search. For f₃, f₅, f₇, and f₈, multiple inflection points in ICPO’s curves demonstrate that adaptive variable spiral search and inertia weight strategies effectively balance exploration and exploitation, preventing premature convergence. For f₆, f₉, and f₁₀, ICPO converges to theoretical optima within 50 iterations, benefiting from logistic-tent mapping’s improved population diversity for rapid early convergence. For f₄, f₁₁, and f₁₂, ICPO matches comparison algorithms by iteration 200, then leverages the refraction opposition-based learning and Cauchy mutation hybrid mechanism—combined with inertia weight—to continuously escape local optima and maintain global search.

2.2.3. Comparative Analysis of Algorithm Runtime Performance

To provide an intuitive comparison of algorithmic runtime, this paper examines the performance of ICPO and CPO across 12 benchmark functions classified by dimension. Both algorithms were subjected to 30 independent runs on the same platform, with the average computational time from these experiments summarized in Figure 4. The average runtimes required by the ICPO algorithm for solving unimodal and composite benchmark test functions are 0.891 s and 1.142 s, respectively, representing improvements of 3.86% and 2.31% compared to the CPO algorithm. The overall mean runtime of the ICPO algorithm across all 12 benchmark test functions is 1.049 s, which is 1.62% faster than that of the CPO algorithm.

The synergistic integration of logistic-tent composite mapping, a hybrid mechanism combining refraction opposition-based learning with Cauchy mutation, adaptive variable spiral search, and inertia weighting, significantly accelerates the convergence rate of the original CPO algorithm.

The optimization accuracy of ICPO is quantified by ranking the algorithms according to their mean absolute error (MAE) on the 12 benchmark functions (Table 4). Among all algorithms evaluated, ICPO achieves the smallest MAE and therefore ranks first, which indicates that it approximates the theoretical optima of the benchmark functions most closely.
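As an illustration of an MAE-based ranking, the sketch below computes the mean absolute deviation of each algorithm’s run results from a known optimum and sorts the algorithms by that value. How the MAE is aggregated across the 12 functions is not described in the text, so the single-function form used here is an assumption, and the function names are hypothetical.

```python
def mean_absolute_error(results, optimum):
    """Mean absolute deviation of run results from the theoretical optimum."""
    return sum(abs(r - optimum) for r in results) / len(results)


def rank_by_mae(results_by_algorithm, optimum):
    """Return (name, MAE) pairs sorted from smallest to largest MAE."""
    scores = {
        name: mean_absolute_error(values, optimum)
        for name, values in results_by_algorithm.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])
```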

Ablation experiments were conducted to isolate the impact of each strategy. CPO1–CPO4 represent CPO enhanced with logistic-tent mapping, refraction-Cauchy hybrid, adaptive spiral search, and inertia weight, respectively. Wilcoxon rank-sum tests between ICPO and each variant (Table 5) show p-values below 1.47 × 10⁻¹³ for all 12 benchmark functions, confirming statistically significant performance differences (α = 0.05). These results establish that the multi-strategy integration in ICPO yields superior optimization performance compared to any single-strategy enhancement.

2.3. Construction of the ICPO-XGBoost Classification and Recognition Model

2.3.1. Extreme Gradient Boosting Algorithm

XGBoost is an advanced ensemble learning algorithm that iteratively combines multiple low-precision decision trees through weighted accumulation of their leaf node weights, thereby constructing high-accuracy predictive models [39]. The cumulative leaf node weights across all decision trees in the XGBoost model are given by:

{\bar{y}}_{i} = \sum_{k = 1}^{K} f_{k} (x_{i})

(21)

where ${\bar{y}}_{i}$ represents the sum of leaf weights in the ensemble model across all $K$ decision trees, that is, after $K$ boosting iterations; and $f_{k} (x_{i})$ denotes the leaf node weight of data sample $x_{i}$ on the $k$-th decision tree.

The XGBoost objective function comprises a loss term that measures prediction error and a regularization term that controls model complexity, which together promote both accuracy and generalizability. Its mathematical formulation is:

O_{b j} = \sum_{i = 1}^{m} L (y_{i}, {\bar{y}}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})

(22)

where $L$ is the loss function, $y_{i}$ is the true label of sample $x_{i}$, $Ω$ denotes the regularization function, $m$ is the number of data samples, and $K$ is the number of decision trees.
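Equation (22) can be evaluated directly once a loss and a regularizer are chosen. The sketch below uses a squared-error loss as a stand-in and the regularizer from the standard XGBoost formulation, Ω(f) = γT + ½λ‖w‖², where T is the number of leaves and w the vector of leaf weights. Neither choice is stated in this section, so both are assumptions, and all numbers are illustrative.

```python
def omega(leaf_weights, gamma=0.0, lam=1.0):
    """Standard XGBoost regularizer: gamma * (number of leaves) + 0.5 * lam * sum(w^2)."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_pred, trees_leaf_weights, loss, gamma=0.0, lam=1.0):
    """Loss term over the m samples plus regularization term over the K trees."""
    loss_term = sum(loss(y, p) for y, p in zip(y_true, y_pred))
    reg_term = sum(omega(w, gamma, lam) for w in trees_leaf_weights)
    return loss_term + reg_term


def squared_error(y, p):
    # Stand-in loss; a classification task would typically use a log loss instead.
    return (y - p) ** 2


# Illustrative values only: two samples and two trees with two leaves each.
value = objective(
    y_true=[1.0, 0.0],
    y_pred=[0.5, 0.5],
    trees_leaf_weights=[[0.4, -0.2], [0.1, 0.3]],
    loss=squared_error,
    gamma=1.0,
    lam=1.0,
)
print(value)  # loss term 0.5 plus regularization term 4.15
```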

2.3.2. Steps and Process of the ICPO-XGBoost Recognition Model

In this study, the K-means algorithm is employed to cluster and screen the data used for classifying and recognizing risky driving behaviors, and missing values are filled by nearest neighbor imputation. After the data are smoothed and denoised with a Savitzky–Golay filter, the ICPO algorithm is used to optimize the XGBoost parameters, and the optimized model is then trained to produce the classification and recognition results. The steps of the ICPO-XGBoost-based classification and recognition model for risky driving behaviors are as follows, with the workflow illustrated in Figure 5:

1. Set the ICPO algorithm parameters based on the preprocessed risky driving behavior trajectory features: population size = 30, number of iterations = 40.
2. Initialize the algorithm population using the logistic-tent composite mapping.
3. Calculate the fitness value of each individual in the current population and determine the position of the optimal individual.
4. Use Equations (15) and (16) to update the values of parameters $ω$ and $η^{*}$.
5. Apply the hybrid mechanism combining refraction opposition-based learning and Cauchy mutation for evaluation and selection, then update the position of the optimal individual in the crested porcupine population using Equation (12).
6. Update the positions of the population individuals via adaptive variable spiral search and inertia weight, as expressed in Equations (17)–(20).
7. Check whether the maximum number of iterations has been reached. If so, terminate the iterations and output the optimal individual position and fitness value. Otherwise, return to step 3.
8. Extract the global optimal feasible solution found by ICPO and use it to set the XGBoost model parameters.
9. Use the ICPO-XGBoost model for the classification and recognition of risky driving behaviors and validate its classification and recognition performance.
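The control flow of these steps can be sketched as a generic population-based search loop. The sketch below is not the ICPO algorithm: initialization is uniform random instead of logistic-tent mapping, and the position update is a simple move toward the best individual plus noise instead of Equations (12) and (15)–(20). The fitness function is also a placeholder; in the actual workflow it would train and score an XGBoost model for each candidate parameter vector. The bounds, population size, and iteration count are the values reported in Section 3.2.

```python
import random

LB = [0.001, 10, 0.0001]   # lower bounds reported in Section 3.2
UB = [0.01, 50, 0.1]       # upper bounds reported in Section 3.2


def clip(position):
    """Keep every coordinate inside its [LB, UB] interval."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(position, LB, UB)]


def placeholder_fitness(position):
    """Placeholder objective (lower is better), NOT a model error rate."""
    return sum(((v - (lo + hi) / 2) / (hi - lo)) ** 2
               for v, lo, hi in zip(position, LB, UB))


def search(fitness, pop_size=30, iterations=40, seed=0):
    rng = random.Random(seed)
    # Step 2 stand-in: uniform random initialization.
    population = [[rng.uniform(lo, hi) for lo, hi in zip(LB, UB)]
                  for _ in range(pop_size)]
    # Step 3: evaluate fitness and record the best individual.
    best = min(population, key=fitness)
    best_fit = fitness(best)
    history = [best_fit]
    for _ in range(iterations):  # Step 7: stop after the maximum iteration count
        updated = []
        for individual in population:
            # Steps 4-6 stand-in: move part-way toward the best and add small noise.
            candidate = clip([v + rng.random() * (b - v) + rng.gauss(0, 0.01) * (hi - lo)
                              for v, b, lo, hi in zip(individual, best, LB, UB)])
            # Greedy selection: keep the candidate only if it improves on the individual.
            updated.append(candidate if fitness(candidate) < fitness(individual) else individual)
        population = updated
        challenger = min(population, key=fitness)
        if fitness(challenger) < best_fit:
            best, best_fit = challenger, fitness(challenger)
        history.append(best_fit)
    return best, best_fit, history  # Step 8 would pass `best` to the classifier


best_position, best_value, fitness_history = search(placeholder_fitness)
print(best_position, best_value)
```

The loop returns to the fitness evaluation rather than to the initialization, so the population is created once and refined over the iterations.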

3. Case Study

3.1. Preprocessing of Risky Driving Behavior Data

The experimental data in this paper are sourced from the risky driving behavior vehicle trajectory dataset of Shanghai’s North-South Elevated Road [40]. This dataset encompasses six categories of driving behaviors: normal driving, frequent lane changing, forced lane changing, short-distance tailgating, slow driving, and sudden acceleration/deceleration. To reduce computational costs during model validation, three key feature indicators for driving behavior classification were selected from the dataset: longitudinal velocity, lateral velocity, and lateral offset position. The K-means clustering algorithm, configured with 6 clusters and a maximum of 300 iterations, was employed to partition the driving behaviors, yielding a total of 3000 data samples across the behavior types. To improve data quality and to mitigate the impact of random fluctuations on the classification results, missing values in the data samples were filled by nearest neighbor imputation, and the data were smoothed and denoised with a Savitzky–Golay filter. The preprocessing results for lateral velocity, longitudinal velocity, and lateral offset position are illustrated in Figures 6, 7, and 8, respectively. Statistical representations of the vehicle trajectory data for each driving behavior category are shown in Figure 9.
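The two cleaning steps can be illustrated on a single trajectory feature. The sketch below assumes that "nearest neighbor imputation" means filling a missing point with the closest observed point in the same series, and it uses a 5-point quadratic Savitzky–Golay window with the classical coefficients (−3, 12, 17, 12, −3)/35. The window length and polynomial order are not given in this section, so both are assumptions, and the input series is synthetic.

```python
# Classical 5-point quadratic Savitzky-Golay smoothing coefficients.
SG_5_QUADRATIC = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def fill_nearest(series):
    """Replace each None with the value of the nearest observed index (earlier one on ties)."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no observed values")
    return [series[min(known, key=lambda k: abs(k - i))] if v is None else v
            for i, v in enumerate(series)]


def savgol_5(series):
    """Smooth interior points with the 5-point window; the two points at each end are kept."""
    smoothed = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_5_QUADRATIC, window))
    return smoothed


# Synthetic series with two gaps (illustrative values, not dataset samples).
raw = [1.0, None, None, 4.0]
filled = fill_nearest(raw)           # -> [1.0, 1.0, 4.0, 4.0]

# A quadratic signal passes through a quadratic Savitzky-Golay filter unchanged.
quadratic = [float(x * x) for x in range(7)]
smoothed_quadratic = savgol_5(quadratic)
print(filled, smoothed_quadratic)
```

Because the filter fits a local polynomial rather than averaging, it suppresses noise while preserving the peaks that characterize short events such as sudden acceleration or deceleration.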

3.2. Analysis of Risky Driving Behavior Classification and Recognition Results

The preprocessed risky driving behavior dataset was divided into training (60%), validation (20%), and test (20%) sets. The training set was used to fit the model, the validation set to adjust its parameters, and the test set to confirm classification performance. Parameter optimization was performed using the ICPO algorithm with a population size of 30, a maximum of 40 iterations, variable dimension dim = 3, and bounds LB = [0.001, 10, 0.0001] and UB = [0.01, 50, 0.1]. The optimization used the risky driving behavior data from Section 3.1 to determine the optimal parameters of the classification model. The resulting fitness variation across iterations is presented in Figure 10.
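Applied to the 3000 samples described in Section 3.1, a 60/20/20 split gives 1800 training, 600 validation, and 600 test samples. The sketch below shows one way to produce such a split; the random shuffling and the seed are assumptions, since this section does not state how samples were assigned to the subsets.

```python
import random


def split_indices(n_samples, fractions=(0.6, 0.2, 0.2), seed=42):
    """Shuffle sample indices and cut them into training, validation, and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(3000)
print(len(train_idx), len(val_idx), len(test_idx))
```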

Figure 11 presents the following information: within each green and red square, the upper value denotes the number of samples for a specific type of risky driving behavior, while the lower value represents its corresponding percentage in the total test sample set. For the white squares in the last row, the upper value indicates the percentage of correctly identified samples relative to the total test samples of that particular behavior type. For the white squares in the last column, the upper value indicates the percentage of correctly identified samples relative to the total samples classified as that behavior type. The gray square in the bottom-right corner reports the overall correct and incorrect classification rates for all risky driving behavior test samples.

Taking the fourth column in Figure 11 as an example, out of the 66 test samples for sudden acceleration/deceleration, 56 were correctly identified as sudden acceleration/deceleration, 4 were incorrectly identified as normal driving, 1 was incorrectly identified as slow driving, and 5 were incorrectly identified as short-distance tailgating. The value of 9.3% in the fourth row and fourth column represents the proportion of correctly identified sudden acceleration/deceleration test samples (56) relative to the total number of test samples (600).
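The arithmetic in this example can be checked directly from the counts quoted above. The short sketch below recomputes the share of the total test set (9.3%) and the share of correct samples within the group of 66, which comes to 84.8% and coincides with the precision reported for this category in Table 6. The variable names are illustrative.

```python
# Counts quoted in the text for the fourth column of Figure 11.
column_counts = {
    "sudden_acceleration_deceleration": 56,  # correctly identified
    "normal_driving": 4,
    "slow_driving": 1,
    "short_distance_tailgating": 5,
}
total_test_samples = 600

correct = column_counts["sudden_acceleration_deceleration"]
column_total = sum(column_counts.values())

share_of_test_set = 100 * correct / total_test_samples  # percentage shown inside the cell
share_of_column = 100 * correct / column_total          # percentage of the 66 samples

print(column_total, round(share_of_test_set, 1), round(share_of_column, 1))
```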

3.3. Comparative Analysis of Classification and Recognition Errors for Risky Driving Behaviors

Using the risky driving behavior data from Section 3.1 as input for each model, the ICPO-XGBoost model achieved a classification accuracy of 96.2% for risky driving behavior recognition. This represents an improvement of 12.7% to 24.8% over the PSO-LSSVM [22], XGBoost [19], and Random Forest [14] models. The average runtime of the ICPO-XGBoost model was 365.3 s, which is 53.3 s shorter than that of PSO-LSSVM. For the classification of risky driving behaviors, the ICPO-XGBoost model attained precision, recall, and F1 scores of 95.4%, 95.8%, and 95.6%, respectively. These metrics show increases of 14.8% to 31.8% in precision, 14.9% to 31.0% in recall, and 15.0% to 32.4% in the F1 score when compared to PSO-LSSVM, XGBoost, and Random Forest. A comparison between the actual and predicted classification results for risky driving behaviors by the ICPO-XGBoost, PSO-LSSVM, XGBoost, and Random Forest models is presented in Figure 12.
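As a consistency check, the reported overall F1 score can be recovered from the reported precision and recall, assuming F1 is their harmonic mean (the averaging scheme across the six categories is not stated here, so this is an assumption).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


overall_f1 = f1_score(95.4, 95.8)
print(round(overall_f1, 1))
```

The result rounds to 95.6, which agrees with the F1 score reported for the ICPO-XGBoost model.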

The overall performance metrics of the recognition results for both ICPO-XGBoost and XGBoost models are shown in Figure 13.

A comparison of the recognition performance metrics for the different risky driving behavior categories is presented in Table 6. For the ICPO-XGBoost model, both frequent lane changing and forced lane changing achieved precision, recall, and F1 scores of at least 98.1%. In contrast, sudden acceleration/deceleration yielded the lowest scores, with precision, recall, and F1 values of 84.8%, 87.5%, and 86.2%, respectively. The lower recognition accuracy for this behavior can be attributed primarily to its typically short duration and to the fact that normal driving and sudden acceleration/deceleration share highly similar vehicle trajectory characteristics, which makes the two prone to mutual misclassification. Furthermore, the radar-derived trajectory data may not adequately capture variations in individual driver attributes and habits, which could also limit the model’s performance in this category [41]. Future research should focus on extracting more detailed data features through video detection and analysis to improve the recognition accuracy for these behaviors.

Short-distance tailgating was the only behavior that all evaluated models (ICPO-XGBoost, PSO-LSSVM, XGBoost, and Random Forest) recognized consistently well: for this category, every model achieved precision, recall, and F1 scores of at least 89.3%. Across the six driving behavior types (normal driving, slow driving, short-distance tailgating, sudden acceleration/deceleration, frequent lane changing, and forced lane changing), the ICPO-XGBoost model achieved precision of 84.8–99.2%, recall of 87.5–100.0%, and F1 scores of 86.2–99.2%. Compared to the benchmark models, these results represent substantial improvements: precision increased by 1.5% to 75.8%, recall by 5.8% to 68.1%, and the F1 score by 3.3% to 72.6%. Notably, for the challenging sudden acceleration/deceleration behavior, the ICPO-XGBoost model attained precision, recall, and F1 scores of 84.8%, 87.5%, and 86.2%, respectively. While these are its lowest scores, they still surpass those of all benchmark models (PSO-LSSVM, XGBoost, and Random Forest) on this specific behavior.

3.4. Sensitivity Analysis of Parameters in the Risky Driving Behavior Classification and Recognition Model

In the ICPO algorithm, the iteration count is the number of position update steps that population individuals take as they move toward the best solution identified so far. This process uses the intrinsic features of the risky driving behavior dataset to converge progressively on the global optimum. Building on the parameter settings of the risky driving behavior classification and recognition model, the iteration count was set to 10, 20, 30, …, 100 in turn. The corresponding average classification accuracy is shown in Figure 14a. With the other parameters held constant, classification accuracy no longer rises in step with the iteration count beyond 40 iterations, whereas the runtime increases significantly. Increasing the iteration count further therefore does not substantially improve the optimization accuracy or convergence rate of the classification model's parameters, and it leads to redundant iterative computations that lengthen model runtime.

Population size dictates the ICPO algorithm’s search space; a larger search space typically increases the probability of finding the global optimum. Testing population sizes from 10 to 100 in increments of 10 showed that, beyond a size of 30, classification accuracy no longer rises in step with population size (Figure 14b), whereas the runtime increases significantly. Increasing the population size further does not markedly enhance the optimization accuracy or convergence rate of the classification model's parameters. Moreover, unnecessarily enlarging the search space can lead to feature information redundancy and slower model processing [42].

3.5. Sensitivity Analysis of Factors Influencing Risky Driving Behavior Classification and Identification

To quantify how strongly each influencing factor affects the classification results of the ICPO-XGBoost model, this study uses SHAP values to assess the sensitivity effect and contribution of each factor. The SHAP values of the influencing factors are presented in Figure 15. Figure 15a shows the scatter plot of SHAP values, which depicts the overall relationship between the classification and identification results for risky driving behaviors and the sensitivity effects of the influencing factors. The value of each scatter point reflects the magnitude of the factor's sensitivity effect, and the wider the dispersion of a factor's points, the greater its sensitivity effect. As shown in Figure 15a, the lateral offset position exhibits the greatest sensitivity effect on the classification and identification results. Figure 15b ranks the influencing factors by their mean absolute SHAP values: lateral offset position is the most significant factor, followed by lateral velocity and longitudinal velocity, in that order.
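The ranking in Figure 15b rests on a simple aggregation: for each factor, average the absolute SHAP values over all samples, then sort. The sketch below shows that aggregation on a small made-up matrix; the feature names and numbers are placeholders and do not reproduce the SHAP values of this study.

```python
def rank_by_mean_abs(shap_rows, feature_names):
    """Return (feature, mean absolute SHAP value) pairs, largest first."""
    n_rows = len(shap_rows)
    means = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Placeholder SHAP matrix: one row per sample, one column per feature.
toy_shap = [
    [0.2, -0.5, 0.1],
    [-0.4, 0.3, 0.0],
    [0.3, -0.7, 0.2],
]
toy_ranking = rank_by_mean_abs(toy_shap, ["feature_a", "feature_b", "feature_c"])
for name, value in toy_ranking:
    print(name, round(value, 3))
```

Taking absolute values before averaging matters: a factor that pushes some predictions up and others down would otherwise appear unimportant because its contributions cancel.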

4. Conclusions

Based on vehicle trajectory data capturing risky driving behaviors on Shanghai’s North-South Elevated Road, this study constructed a risky driving behavior classification and recognition model using ICPO-XGBoost. Twelve benchmark test functions and risky driving behavior data were used for numerical simulation experiments and for validating classification performance. The main findings are summarized as follows:

This paper enhances the CPO algorithm through four strategies: logistic-tent composite mapping for population initialization, a hybrid mechanism combining refraction opposition-based learning with Cauchy mutation, adaptive variable spiral search, and inertia weight for position updates. The resulting ICPO algorithm achieves superior population diversity and quality, enabling an effective balance between global exploration and local exploitation while reducing susceptibility to local optima. Consequently, both optimization accuracy and convergence speed are improved. Statistical validation using the Wilcoxon rank-sum test across 12 benchmark functions confirms ICPO’s superiority over CPO, SSA, WOA, and PSO in mean values, standard deviations, and convergence behavior. Specifically, results on unimodal and multimodal functions validate the effectiveness of the logistic-tent mapping and the refraction-Cauchy hybrid mechanism, while performance on composite multimodal functions demonstrates the capacity of the adaptive spiral search and inertia weight strategies to balance exploration and exploitation.
The ICPO-XGBoost model achieved precision, recall, and F1-score ranges of 84.8–99.2%, 87.5–100.0%, and 86.2–99.2%, respectively, across the six driving behavior categories: normal driving, slow driving, short-distance tailgating, sudden acceleration/deceleration, frequent lane changing, and forced lane changing. Compared to the XGBoost model, these metrics improved by 1.5–75.8%, 1.8–52.2%, and 1.7–71.9%, respectively. The ICPO-XGBoost model effectively identified sudden acceleration/deceleration behaviors, enhancing the applicability of risky driving behavior classification and recognition results. When the iteration count and population size of the ICPO-XGBoost model were set to 40 and 30, respectively, the model achieved optimal overall recognition performance. Excessively large settings for these two parameters did not significantly improve classification accuracy but substantially increased the runtime of the model.

This study is based on vehicle trajectory data of risky driving behaviors and does not consider how factors such as road type, geometric design, traffic operating conditions, driver characteristics, and traffic accidents influence risky driving behaviors. Future research directions include the following:

Quantitatively analyze the effects of factors such as road type, geometric design, traffic operating conditions, driver characteristics, and traffic accidents on risky driving behaviors. Develop a classification indicator system for risky driving behaviors that incorporates factors such as vehicle acceleration and lane occupancy time, and propose a more refined method for classifying risky driving behaviors.
Based on the quantitative analysis results of influencing factors for different types of risky driving behaviors, improve the optimization mechanism and structure of the crested porcupine optimizer. Design a deep learning approach that can adjust hyperparameters and the weights of influencing factors in real time according to different traffic operating environments and risky driving behavior characteristics, thereby enhancing the accuracy and robustness of the classification and identification model for risky driving behaviors.

Author Contributions

Conceptualization, T.S. and X.L.; Data curation, S.L., Y.L. and X.L.; Funding acquisition, Q.H.; Investigation, T.S., X.Y. and Q.H.; Methodology, Q.H. and X.L.; Software, J.S., F.T. and X.L.; Supervision, Q.H. and X.L.; Validation, J.S., F.T. and S.L.; Visualization, F.T., X.Y., Q.H. and Y.L.; Writing—original draft, Q.H. and X.L.; Writing—review and editing, J.S., T.S., X.Y. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Lanzhou 2025 Annual Philosophy and Social Sciences Planning Project (grant number 25-A21), the Young Scholars Science Foundation of Lanzhou Jiaotong University (grant number 2025021), and the National Natural Science Foundation of China (grant number 72571121).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request from the author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CPO	Crested Porcupine Optimizer
ICPO	Improved Crested Porcupine Optimizer
SSA	Salp Swarm Algorithm
WOA	Whale Optimization Algorithm
PSO	Particle Swarm Optimization

References

1. Pérez-Sala, L.; Curado, M.; Tortosa, L.; Vicent, J.F. Deep learning model of convolutional neural networks powered by a genetic algorithm for prevention of traffic accidents severity. Chaos Solit. Fract. 2023, 169, 113245.
2. Gao, X.; Ci, Y.; Yuen, K.F.; Wu, L.; Li, R. Hybrid traffic flow prediction model for emergency scenarios with scarce historical data. Eng. Appl. Artif. Intell. 2025, 145, 110219.
3. Ma, X.; Huo, Z.; Lu, J.; Wong, Y.D. Deep forest with SHapley additive explanations on detailed risky driving behavior data for freeway crash risk prediction. Eng. Appl. Artif. Intell. 2025, 141, 109787.
4. Zhang, J.; Pei, Y.; Sun, J.; Lai, Y.; Wang, S.; Easa, S.M. Achieving bus electrification: Strategy for bus fleet replacement in cold region. Transp. Res. D Transp. Environ. 2025, 145, 104825.
5. Qi, W.; Ma, S.; Fu, C. An improved car-following model considering the influence of multiple preceding vehicles in the same and two adjacent lanes. Phys. A 2023, 632, 129356.
6. Chen, Y.; Wang, K.; Lu, J.J. Feature selection for driving style and skill clustering using naturalistic driving data and driving behavior questionnaire. Accid. Anal. Prev. 2023, 185, 107022.
7. Fei, M.; Zhou, W.; Zhao, H.; Pan, C.; Shi, D.; An, X. Enhancing Driving Safety Evaluation Through Correlation Analysis of Driver Behavior. Sustainability 2025, 17, 4067.
8. Chen, S.; Xue, Q.; Zhao, X.; Xing, Y.; Lu, J.J. Risky driving behavior recognition based on vehicle trajectory. Int. J. Environ. Res. Public Health 2021, 18, 12373.
9. Zhu, S.; Li, C.; Fang, K.; Peng, Y.; Jiang, Y.; Zou, Y. An Optimized Algorithm for Dangerous Driving Behavior Identification Based on Unbalanced Data. Electronics 2022, 11, 1557.
10. Wen, J.; Xu, Y.; Dai, M.; Lyu, N. Mathematical Modeling and Parameter Estimation of Lane-Changing Vehicle Behavior Decisions. Mathematics 2025, 13, 1014.
11. Ma, L.; Qu, S.; Song, L.; Zhang, J.; Ren, J. Human-like car-following modeling based on online driving style recognition. Electron. Res. Arch. 2023, 31, 3264–3290.
12. Li, J.; Wang, J.; Long, X.; Zhang, H.; Xie, M. Prediction of following vehicles’ longitudinal game behaviour in lane-changing game scenarios considering driving style. Transp. B Transp. Dyn. 2025, 13, 2553218.
13. Bagheri, E.; Barshooi, A.H. Nighttime driver behavior prediction using taillight signal recognition via CNN-SVM classifier. Vis. Comput. 2025, 41, 6219–6235.
14. Jahangiri, A.; Berardi, V.J.; Machiani, S.G. Application of real field connected vehicle data for aggressive driving identification on horizontal curves. IEEE Trans. Intell. Transp. Syst. 2017, 19, 2316–2324.
15. Xing, Z.; Huang, M.; Peng, D. Overview of machine learning-based traffic flow prediction. Digit. Transp. Saf. 2023, 2, 164–175.
16. Xiao, D.; Zhang, B.; Chen, Z.; Xu, X.; Du, B. Connecting tradition with modernity: Safety literature review. Digit. Transp. Saf. 2023, 2, 1–11.
17. Shangguan, Q.; Fu, T.; Wang, J.; Fang, S.; Fu, L. A proactive lane-changing risk prediction framework considering driving intention recognition and different lane-changing patterns. Accid. Anal. Prev. 2022, 164, 106500.
18. Chen, K.; Diao, Y.; Wang, Y.; Zhang, X.; Zhou, Y.; Gu, M.; Zhang, B.; Hu, B.; Li, M.; Li, W.; et al. MCT-CNN-LSTM: A Driver Behavior Wireless Perception Method Based on an Improved Multi-Scale Domain-Adversarial Neural Network. Sensors 2025, 25, 2268.
19. Wu, J.; Yue, T.; Zhang, M.; Lv, B.; Tian, Y.; Wang, J. Driving Behavior Characterization in Expressway Merging Areas Based on Roadside LiDAR. IEEE Sens. J. 2025, 25, 18481–18491.
20. Liu, X.; Huang, H.; Bian, J.; Zhou, R.; Wei, Z.; Zhou, H. Generating intersection pre-crash trajectories for autonomous driving safety testing using Transformer Time-Series Generative Adversarial Networks. Eng. Appl. Artif. Intell. 2025, 160, 111995.
21. Ma, C.; Liu, T. Survey of short-term traffic flow prediction based on LSTM. Int. J. Mod. Phys. C 2025, 36, 2450177.
22. Zhao, D.; Zhao, S. Sparse least squares support vector machine based methods for vehicle driving behavior recognition. Proc. Inst. Mech. Eng. D J. Automob. Eng. 2023, 238, 1392–1404.
23. Aljohani, A.A. Real-time driver distraction recognition: A hybrid genetic deep network based approach. Alex. Eng. J. 2023, 66, 377–389.
24. Yang, X.; Xiang, K.; Yuan, S.; Huang, J. Vehicle Driving Behavior Recognition and Optimization Strategies Based on Cloud Computing and SSA-BP Algorithm. Stud. Inform. Control. 2024, 33, 17–28.
25. Abdel-Basset, M.; Mohamed, R.; Abouhawwash, M. Crested Porcupine Optimizer: A new nature-inspired metaheuristic. Knowl.-Based Syst. 2024, 284, 111257.
26. Li, J.; Bai, J.; Wang, J. High-Precision Trajectory-Tracking Control of Quadrotor UAVs Based on an Improved Crested Porcupine Optimiser Algorithm and Preset Performance Self-Disturbance Control. Drones 2025, 9, 420.
27. Chen, F.; Liu, X.Q.; Yang, J.J.; Liu, X.K.; Ma, J.H.; Chen, J.; Xiao, H.Y. Traffic accident severity prediction based on an enhanced MSCPO-XGBoost hybrid model. Sci. Rep. 2025, 15, 25729.
28. Duan, N.; Zeng, Y.; Dao, F.; Xu, S.; Luo, X. Fault Diagnosis of Hydro-Turbine Based on CEEMDAN-MPE Preprocessing Combined with CPO-BILSTM Modelling. Energies 2025, 18, 1342.
29. Zhang, H.; Guo, C.; Zhai, D.; Wang, Y.; Liu, H.; Chen, F.; Xu, D. Application of Improved Crown Porcupine Optimizer in UAV Path Planning Based on Dynamic Weighted JAYA-CPO Attack Strategy. Prot. Control Mod. Power Syst. 2025, 10, 101–127.
30. Wang, H.; Zhang, L.; Liu, B. Research and design of a hybrid DV-hop algorithm based on the chaotic crested porcupine optimizer for wireless sensor localization in smart farms. Agriculture 2024, 14, 1226.
31. Liu, S.; Jin, Z.; Lin, H.; Lu, H. An improve crested porcupine algorithm for UAV delivery path planning in challenging environments. Sci. Rep. 2024, 14, 20445.
32. Adalja, D.; Patel, P.; Mashru, N.; Jangir, P.; Jangid, R.; Gulothungan, G.; Khishe, M. A new multi objective crested porcupines optimization algorithm for solving optimization problems. Sci. Rep. 2025, 15, 14380.
33. Gao, Q.; Jin, A.; Wang, C.; Zhang, L.; Zhang, Q.; Xiong, X.; Yang, F. AEGWCPO: A cooperative path planning method for UAV swarm penetration in complex mountainous environments. Meas. Sci. Technol. 2025, 36, 116301.
34. Chen, S.; Chen, C.; Li, Y.; Meng, L.; Wei, L.; Guan, B. Optimization of Operating Parameters Scheme for Water Injection System Based on a Hybrid Particle Swarm–Crested Porcupine Algorithm. Sustainability 2025, 17, 8057.
35. Liu, H.; Zhou, R.; Zhong, X.; Yao, Y.; Shan, W.; Yuan, J.; Xiao, J.; Ma, Y.; Zhang, K.; Wang, Z. Multi-strategy enhanced crested porcupine optimizer: CAPCPO. Mathematics 2024, 12, 3080.
36. Sriprateep, K.; Pitakaso, R.; Khonjun, S.; Srichok, T.; Luesak, P.; Gonwirat, S.; Kaewta, C.; Kosacka-Olejnik, M.; Enkvetchakul, P. Multi-objective optimization of resilient, sustainable, and safe urban bus routes for tourism promotion using a hybrid reinforcement learning algorithm. Mathematics 2024, 12, 2283.
37. Gao, Z.-M.; Zhao, J.; Zhang, Y.-J. Review of chaotic mapping enabled nature-inspired algorithms. Math. Biosci. Eng. 2022, 19, 8215–8258.
38. Zhang, J.; He, Q.; Lu, X.; Xiao, S.; Wang, N. A FIG-IWOA-BiGRU Model for Bus Passenger Flow Fluctuation Trend and Spatial Prediction. Mathematics 2025, 13, 3204.
39. Wang, T.; Han, Y.; Li, W.; Ye, X.; Yuan, Q. A takeover risk assessment approach based on an improved ANP-XGBoost algorithm for human-machine driven vehicles. IEEE Access 2024, 12, 48379–48387.
40. Wang, J.; Fu, T.; Shangguan, Q. Wide-area vehicle trajectory data based on advanced tracking and trajectory splicing technologies: Potentials in transportation research. Accid. Anal. Prev. 2023, 186, 107044.
41. Yuan, Q.; Yan, R.; Tan, A.Y.X.; Xu, Q.; Wang, J. Technology roadmap of risk identification and collision avoidance decision-making in autonomous vehicles for domestic animals. Int. J. Crashworthiness 2025, 30, 294–305.
42. Tatarczak, A.; Gola, A. An integrated fuzzy multi-criteria approach for partner selection in horizontal cooperation. Arch. Transp. 2025, 74, 23–42.

Figure 1. Principle of refraction opposition-based learning.

Figure 2. Statistical results of benchmark test function simulations.

Figure 3. Convergence curves of benchmark functions.

Figure 4. Comparison of runtime between ICPO and CPO algorithms.

Figure 5. Framework of the ICPO-XGBoost risky driving behavior classification model.

Figure 6. Lateral speed smoothing.

Figure 7. Longitudinal speed smoothing.

Figure 8. Lateral offset position smoothing.

Figure 9. Statistics of risky driving behavior recognition data.

Figure 10. Fitness values.

Figure 11. Confusion matrix.

Figure 12. Classification and recognition results of risky driving behavior: (a) ICPO-XGBoost; (b) PSO-LSSVM; (c) XGBoost; (d) Random Forest.

Figure 13. The overall performance index of the model.

Figure 14. Impact of model parameters on recognition results.

Figure 15. SHAP values of model influencing factors.

Table 1. Information on the 12 benchmark test functions.

Type	Function	Dimension	Search Range	Theoretical Optimum
Unimodal benchmark test functions	$f_{1} (x) = \sum_{i = 1}^{n} x_{i}^{2}$	30	$[- 100, 100]$	0
	$f_{2} (x) = \sum_{i = 1}^{n} \|x_{i}\| + \prod_{i = 1}^{n} \|x_{i}\|$	30	$[- 10, 10]$	0
	$f_{3} (x) = \sum_{i = 1}^{n - 1} [100 {(x_{i + 1} - x_{i}^{2})}^{2} + {(x_{i} - 1)}^{2}]$	30	$[- 30, 30]$	0
	$f_{4} (x) = \sum_{i = 1}^{n} {([x_{i} + 0.5])}^{2}$	30	$[- 100, 100]$	0
Multimodal benchmark test functions	$f_{5} (x) = \sum_{i = 1}^{n} - x_{i} \sin (\sqrt{\|x_{i}\|})$	30	$[- 500, 500]$	$- 418.98 \times Dim$
	$f_{6} (x) = \sum_{i = 1}^{n} [x_{i}^{2} - 10 \cos (2 π x_{i}) + 10]$	30	$[- 5.12, 5.12]$	0
	$f_{7} (x) = \frac{1}{4000} \sum_{i = 1}^{n} x_{i}^{2} - \prod_{i = 1}^{n} \cos (\frac{x_{i}}{\sqrt{i}}) + 1$	30	$[- 600, 600]$	0
	$\begin{array}{l} f_{8} (x) = 0.1 \{\sin^{2} (3 π x_{1}) + \sum_{i = 1}^{n} {(x_{i} - 1)}^{2} [1 + \sin^{2} (3 π x_{i} + 1)] \\ + {(x_{n} - 1)}^{2} [1 + \sin^{2} (2 π x_{n})]\} + \sum_{i = 1}^{n} u (x_{i}, 5, 100, 4) \end{array}$	30	$[- 50, 50]$	0
Composite modalities benchmark test functions	$f_{9} (x) = \sum_{i = 1}^{11} {[a_{i} - \frac{x_{1} (b_{i}^{2} + b_{i} x_{2})}{b_{i}^{2} + b_{i} x_{3} + x_{4}}]}^{2}$	4	$[- 5, 5]$	0.0003
	$\begin{array}{l} f_{10} (x) = [1 + {(x_{1} + x_{2} + 1)}^{2} (19 - 14 x_{1} + 3 x_{1}^{2} - 14 x_{2} + 6 x_{1} x_{2} + 3 x_{2}^{2})] \\ \times [30 + {(2 x_{1} - 3 x_{2})}^{2} \times (18 - 32 x_{1} + 12 x_{1}^{2} + 48 x_{2} - 36 x_{1} x_{2} + 27 x_{2}^{2})] \end{array}$	2	$[- 2, 2]$	3
	$f_{11} (x) = - \sum_{i = 1}^{7} {[(X - a_{i}) {(X - a_{i})}^{T} + c_{i}]}^{- 1}$	4	$[0, 10]$	−10.4028
	$f_{12} (x) = - \sum_{i = 1}^{10} {[(X - a_{i}) {(X - a_{i})}^{T} + c_{i}]}^{- 1}$	4	$[0, 10]$	−10.5363

Table 2. Simulation results of benchmark test functions.

Test Function	Metric	ICPO	CPO	SSA	WOA	PSO
f₁	Average	5.63 × 10⁻²³⁹	5.27 × 10⁻⁹⁷	4.11 × 10⁻³⁹	3.58 × 10⁻⁷⁵	3.59 × 10²
f₁	Standard	0	2.80 × 10⁻⁹⁶	2.24 × 10⁻³⁸	1.05 × 10⁻⁷⁴	2.18 × 10²
f₂	Average	4.85 × 10⁻¹²⁰	3.30 × 10⁻³⁸	2.10 × 10⁻²¹	1.84 × 10⁻⁵⁰	1.76 × 10¹
f₂	Standard	2.64 × 10⁻¹¹⁹	1.81 × 10⁻³⁷	1.09 × 10⁻²⁰	9.44 × 10⁻⁵⁰	1.01 × 10¹
f₃	Average	1.56 × 10⁻⁴	2.70 × 10¹	2.58 × 10¹	2.80 × 10¹	1.62 × 10⁴
f₃	Standard	3.91 × 10⁻⁴	5.78 × 10⁻¹	3.54 × 10⁻¹	4.53 × 10⁻¹	1.31 × 10⁴
f₄	Average	5.55 × 10⁻⁷	2.09 × 10⁻¹	2.50 × 10⁻⁵	3.82 × 10⁻¹	3.40 × 10²
f₄	Standard	8.48 × 10⁻⁷	7.18 × 10⁻²	1.90 × 10⁻⁵	1.64 × 10⁻¹	1.97 × 10²
f₅	Average	−1.16 × 10⁴	−9.98 × 10³	−9.71 × 10³	−5.10 × 10³	−5.10 × 10³
f₅	Standard	1.79 × 10³	2.65 × 10³	4.33 × 10²	4.77 × 10²	4.77 × 10²
f₆	Average	0	0	0	7.58 × 10⁻¹⁵	3.76
f₆	Standard	0	0	0	2.47 × 10⁻¹⁴	6.71
f₇	Average	6.23 × 10⁻⁸	9.75 × 10⁻⁷	8.74 × 10⁻³	1.86 × 10⁻²	5.35
f₇	Standard	1.06 × 10⁻⁷	1.07 × 10⁻⁶	2.38 × 10⁻³	9.01 × 10⁻³	2.47
f₈	Average	5.70 × 10⁻⁷	7.50 × 10⁻⁴	1.44 × 10⁻¹	4.70 × 10⁻¹	2.32 × 10¹
f₈	Standard	7.71 × 10⁻⁷	2.79 × 10⁻³	3.57 × 10⁻²	2.47 × 10⁻¹	2.92 × 10¹
f₉	Average	3.14 × 10⁻⁴	3.07 × 10⁻⁴	3.08 × 10⁻⁴	1.72 × 10⁻³	7.99 × 10⁻³
f₉	Standard	1.48 × 10⁻⁵	8.29 × 10⁻⁹	2.81 × 10⁻⁸	5.07 × 10⁻³	9.10 × 10⁻³
f₁₀	Average	3.00	3.00	3.00	3.00	3.90
f₁₀	Standard	6.59 × 10⁻⁵	1.69 × 10⁻¹⁵	4.11 × 10⁻⁴	3.60 × 10⁻⁹	4.93
f₁₁	Average	−1.04 × 10¹	−1.02 × 10¹	−7.47	−1.01 × 10¹	−9.37
f₁₁	Standard	2.47 × 10⁻⁶	9.70 × 10⁻¹	3.26	1.39	2.38
f₁₂	Average	−1.05 × 10¹	−1.05 × 10¹	−6.66	−1.01 × 10¹	−1.02 × 10¹
f₁₂	Standard	3.43 × 10⁻⁶	3.12 × 10⁻⁵	3.59	1.75	1.05

Table 3. Wilcoxon Rank-Sum test p-values.

Test Function	CP1	CP2	CP3	CP4
f₁	2.78 × 10⁻²²	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	2.46 × 10⁻⁸²
f₂	1.05 × 10⁻⁵²	2.51 × 10⁻⁸²	1.26 × 10⁻⁸³	4.88 × 10⁻⁸¹
f₃	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³
f₄	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³
f₅	1.29 × 10⁻⁸³	1.27 × 10⁻⁸³	1.27 × 10⁻⁸³	1.28 × 10⁻⁸³
f₆	1.16 × 10⁻¹⁵	8.44 × 10⁻⁴¹	2.67 × 10⁻⁸⁵	1.26 × 10⁻⁸³
f₇	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³
f₈	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³
f₉	2.76 × 10⁻¹⁹	3.16 × 10⁻⁴⁹	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³
f₁₀	3.34 × 10⁻²¹	1.26 × 10⁻⁸³	2.50 × 10⁻⁵	1.26 × 10⁻⁸³
f₁₁	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³
f₁₂	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³	1.26 × 10⁻⁸³
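
As an illustration of how p-values of this kind can be obtained, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test based on the normal approximation. It is not the authors' code: the source does not state which software or settings were used, the function names `average_ranks` and `rank_sum_test` are hypothetical, and the two sample lists are made-up values standing in for the best fitness found by two algorithms over repeated runs. The sketch omits the tie correction to the variance.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test (normal approximation, no tie correction).

    Returns (z, p), where z is the standardized rank sum of sample x.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided tail probability
    return z, p


# Hypothetical results of two optimizers over five runs each (assumed data).
runs_a = [1.0, 2.0, 3.0, 4.0, 5.0]
runs_b = [6.0, 7.0, 8.0, 9.0, 10.0]
z, p = rank_sum_test(runs_a, runs_b)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A p-value below the chosen significance level (commonly 0.05) suggests that the two sets of run results differ systematically. For small samples or heavily tied results, an exact test or a tie-corrected variance would be the more careful choice.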

Table 4. Algorithm MAE ranking.

Algorithm	MAE	Rank
ICPO	8.31 × 10¹	1
CPO	2.18 × 10²	2
SSA	2.41 × 10²	3
WOA	6.25 × 10²	4
PSO	2.04 × 10³	5

Table 5. p-values of the Wilcoxon rank-sum test.

Test Function	PC1	PC2	PC3	PC4
f₁	1.78 × 10⁻²²	3.26 × 10⁻⁵³	5.16 × 10⁻⁷¹	7.46 × 10⁻⁸²
f₂	1.05 × 10⁻³²	2.51 × 10⁻⁵⁵	3.26 × 10⁻⁷³	4.88 × 10⁻⁸¹
f₃	2.26 × 10⁻²⁴	5.26 × 10⁻³⁷	4.26 × 10⁻⁶³	1.26 × 10⁻⁸³
f₄	1.64 × 10⁻⁵³	2.47 × 10⁻⁶³	3.64 × 10⁻⁷³	7.67 × 10⁻⁸⁹
f₅	8.77 × 10⁻⁶³	3.73 × 10⁻⁴³	7.23 × 10⁻⁵³	1.28 × 10⁻⁸⁷
f₆	1.34 × 10⁻¹⁵	8.42 × 10⁻⁴¹	2.67 × 10⁻⁷⁵	1.66 × 10⁻⁸⁵
f₇	1.47 × 10⁻¹³	6.26 × 10⁻³⁶	7.63 × 10⁻⁵³	6.46 × 10⁻⁹²
f₈	7.77 × 10⁻⁴³	2.45 × 10⁻⁵⁷	3.68 × 10⁻⁷³	1.26 × 10⁻⁸⁸
f₉	2.75 × 10⁻¹⁹	3.98 × 10⁻⁴¹	4.66 × 10⁻⁶²	7.27 × 10⁻⁸³

Table 6. Comparison of feature recognition performance metrics across models.

Category	Precision (%)				Recall (%)				F1-Score (%)
Category	ICPO-XGBoost	PSO-LSSVM	XGBoost	Random Forest	ICPO-XGBoost	PSO-LSSVM	XGBoost	Random Forest	ICPO-XGBoost	PSO-LSSVM	XGBoost	Random Forest
Normal Driving	93.0	89.1	49.1	49.5	92.3	89.1	68.4	68.0	92.6	89.1	57.1	57.3
Slow Driving	99.2	95.5	97.1	86.4	99.2	86.3	55.4	56.7	99.2	90.7	70.6	68.5
Close Car-following	99.2	90.0	97.7	91.9	96.0	89.3	94.2	90.4	97.6	89.7	95.9	91.1
Sudden Acceleration/Deceleration	84.8	49.4	9.0	10.5	87.5	55.9	35.3	19.4	86.2	52.4	14.3	13.6
Frequent Lane Changing	98.1	72.7	82.9	59.2	100.0	76.9	69.0	67.4	99.0	74.8	75.3	63.0
Aggressive Lane Changing	98.4	87.1	82.5	84.7	100.0	87.8	88.8	87.0	99.2	87.4	85.5	85.8
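
The F1-score columns in Table 6 can be cross-checked against the precision and recall columns, since F1 is the harmonic mean of the two. The sketch below does this for the ICPO-XGBoost columns only; the values are copied from Table 6, while the helper name `f1_score` and the dictionary layout are illustrative assumptions rather than anything taken from the authors' implementation. Because the table reports rounded percentages, a recomputed F1 may differ from the reported one by about 0.1.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (result is in the input units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) for ICPO-XGBoost, copied from Table 6.
icpo_xgboost = {
    "Normal Driving": (93.0, 92.3, 92.6),
    "Slow Driving": (99.2, 99.2, 99.2),
    "Close Car-following": (99.2, 96.0, 97.6),
    "Sudden Acceleration/Deceleration": (84.8, 87.5, 86.2),
    "Frequent Lane Changing": (98.1, 100.0, 99.0),
    "Aggressive Lane Changing": (98.4, 100.0, 99.2),
}

for category, (precision, recall, reported) in icpo_xgboost.items():
    computed = f1_score(precision, recall)
    print(f"{category}: computed {computed:.2f}, reported {reported}")
```

The same check applies to the PSO-LSSVM, XGBoost, and Random Forest columns by substituting their precision and recall values.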

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Su, J.; Shen, T.; Tang, F.; You, X.; He, Q.; Lu, X.; Li, Y.; Luo, S. Recognizing Risk Driving Behaviors with an Improved Crested Porcupine Optimizer and XGBoost. Sustainability 2026, 18, 2804. https://doi.org/10.3390/su18062804

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
