Traditional adaptive RRT methods often rely on static goal biasing or distance-based step-size adjustment. The improved Bi-RRT in this study introduces a closed-loop feedback mechanism that integrates failure-driven state transitions, density-based dynamic repulsion, angular constraints, and a continuous expansion strategy. This mechanism enables the algorithm to adapt its search behavior to local geometric features rather than relying on purely random exploration.
3.1. Improved Bi-RRT Algorithm
- (1) Multi-level sampling strategy driven by expansion failure count
Existing RRT methods often rely on fixed empirical probability distributions for sampling. To improve the expansion efficiency of the random tree in complex environments, this method instead uses a dynamic multi-level adaptive sampling strategy whose state transitions are driven by the expansion failure count. The reference direction is defined from the tree node closest to the goal toward the goal.
The algorithm is divided into the following four adaptive stages:
(1) Goal-directed Stage ():
In collision-free regions along the reference direction, a goal-directed greedy strategy accelerates tree expansion. The goal node can be selected directly as the sampling point, and sampling is also performed within a small neighborhood around the goal, promoting rapid convergence toward it.
(2) Locally directed expansion Stage ():
When the expansion failure count exceeds the threshold of the previous stage, the presence of small obstacles ahead of the tree is inferred. Sampling is then restricted to a ±80° sector centered on the reference direction. This angular sector was selected on geometric grounds to keep the expansion forward-biased while preventing inefficient backward or purely perpendicular tree growth. The sector is divided into equal sub-sectors, within which candidate sampling points are generated with small Gaussian perturbations. After collision checking, the candidate with the minimum Euclidean distance to the goal is selected as the expansion target.
(3) Wide-area detour expansion Stage ():
When the expansion failure count exceeds the threshold of the previous stage, a wider obstacle ahead is inferred. The sampling range is expanded to a ±120° sector. The stage thresholds and the allocation probabilities used in this and the following stage were determined through preliminary simulations to balance greedy exploitation against random exploration. With a probability of 30%, samples are drawn within the ±80° region to maintain a baseline goal-directed progression, while with a probability of 70%, sampling is performed in the lateral regions spanning ±80° to ±120°, facilitating detours around obstacles.
(4) Global Escape expansion Stage ():
When all preceding expansion attempts fail, the current node is considered to be trapped in a deep local minimum, such as a U-shaped configuration. Sampling is extended to the full 360° space. The sampling distribution is biased toward the rear region to promote backtracking and escape from spatial traps; a probability of 10% is assigned to the forward ±80° region, 20% to the regions between ±80° and ±120°, and the remaining 70% to the regions between ±120° and ±180°.
Preliminary experiments identified and as suitable optimization ranges. Values below 3 tend to trigger excessive strategy switching, whereas values above 15 are frequently associated with prolonged search stagnation in highly constrained regions, such as U-shaped obstacles. The upper-level PSO performs online optimization within these ranges, reducing the reliance on manually selected threshold values.
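The stage logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold values `T1`, `T2`, `T3` and the function names are hypothetical, and Stage 2 is simplified to a uniform draw inside the ±80° sector (the sub-sector division, Gaussian perturbation, and nearest-candidate selection are omitted). Only the sector bounds and allocation probabilities are taken from the text.

```python
import random

# Hypothetical placeholder thresholds; in the paper they are tuned online by PSO.
T1, T2, T3 = 3, 8, 15


def select_stage(fail_count, t1=T1, t2=T2, t3=T3):
    """Map the expansion failure count to one of the four sampling stages (1..4)."""
    if fail_count <= t1:
        return 1  # goal-directed
    if fail_count <= t2:
        return 2  # locally directed, +/-80 deg
    if fail_count <= t3:
        return 3  # wide-area detour, +/-120 deg
    return 4      # global escape, full 360 deg


def _offset_in_band(lo_deg, hi_deg, rng):
    """Draw a signed angular offset whose magnitude lies in [lo_deg, hi_deg]."""
    magnitude = rng.uniform(lo_deg, hi_deg)
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_offset_deg(stage, rng=random):
    """Angular offset (degrees) from the reference direction for stages 2 to 4."""
    u = rng.random()
    if stage == 2:
        return _offset_in_band(0.0, 80.0, rng)
    if stage == 3:
        # 30% forward sector, 70% lateral bands
        if u < 0.3:
            return _offset_in_band(0.0, 80.0, rng)
        return _offset_in_band(80.0, 120.0, rng)
    if stage == 4:
        # 10% forward, 20% lateral, 70% rear region
        if u < 0.1:
            return _offset_in_band(0.0, 80.0, rng)
        if u < 0.3:
            return _offset_in_band(80.0, 120.0, rng)
        return _offset_in_band(120.0, 180.0, rng)
    raise ValueError("stage 1 samples the goal (or its neighborhood) directly")
```

A planner loop would call `select_stage` with the current failure count and, for stages 2 to 4, add the returned offset to the reference direction before steering.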
Figure 2 illustrates the multi-level adaptive sampling strategy guided by the number of expansion failures:
- (2) Dynamic Invalid-Radius Suppression Mechanism Based on Local Density
A newly generated candidate node is accepted only if its distance to every existing tree node is no smaller than the adaptive invalid radius. The adaptive invalid radius is defined in terms of the base invalid radius, the expansion step size, the invalid-radius coefficient, the total number of nodes in the tree, the number of nodes within a local neighborhood of a given radius centered at the candidate node, and the density penalty factor.
In sparse regions, the adaptive invalid radius remains close to the base invalid radius, allowing the algorithm to perform fine-grained exploration. In contrast, in high-density regions, the invalid radius increases, and candidate nodes falling within it are excluded from connection.
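The following sketch illustrates one way such a density-dependent rejection test could behave. The functional form is an assumption made for illustration only: the base radius is taken as the coefficient times the step size, and it is scaled by one plus the penalty factor times the local node fraction. All function and parameter names are hypothetical.

```python
import math


def adaptive_invalid_radius(candidate, nodes, step, k_r, lam, neighborhood_radius):
    """Invalid radius that grows with the local node density (ASSUMED form).

    base radius      = k_r * step
    local fraction   = (# nodes within neighborhood_radius of candidate) / (# nodes)
    adaptive radius  = base * (1 + lam * local fraction)
    """
    base = k_r * step
    if not nodes:
        return base
    n_local = sum(
        1 for n in nodes if math.dist(candidate, n) <= neighborhood_radius
    )
    return base * (1.0 + lam * n_local / len(nodes))


def accept_candidate(candidate, nodes, step, k_r, lam, neighborhood_radius):
    """Accept only if the candidate lies outside the invalid radius of every node."""
    r = adaptive_invalid_radius(candidate, nodes, step, k_r, lam, neighborhood_radius)
    return all(math.dist(candidate, n) >= r for n in nodes)
```

With this form, a candidate far from all existing nodes sees only the base radius, whereas a candidate in a crowded neighborhood must clear a larger radius, which matches the qualitative behavior described above.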
- (3) Reference-Angle Dispersion Optimization
Existing Bi-RRT algorithms typically rely solely on Euclidean distance to maintain node diversity and rarely consider the angular distribution of branches, which tends to produce redundant parallel paths. To address this issue, this study introduces an angle-repulsion constraint to enhance the diversity of search directions. The algorithm maintains a set of directions corresponding to existing branches within the neighborhood of the nearest node. For each newly generated candidate sampling angle, the minimum angular distance to the directions in this set is computed. If this minimum angular distance is smaller than the angle-repulsion threshold, the direction is considered overly congested, and the candidate sampling angle is forcibly reset.
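A minimal sketch of the angle-repulsion check is given below. The wrap-around angular distance is standard; the reset rule (uniform resampling inside a given sector for a bounded number of tries) is an assumption, since the text does not specify how the angle is reset. All names are hypothetical.

```python
import math
import random


def angular_distance(a, b):
    """Smallest absolute difference between two angles in radians, in [0, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def min_angular_distance(candidate, branch_dirs):
    """Minimum angular distance from the candidate to any existing branch direction."""
    return min(
        (angular_distance(candidate, d) for d in branch_dirs),
        default=math.pi,  # no branches yet: nothing to repel from
    )


def disperse_angle(candidate, branch_dirs, threshold, sector, rng=random, max_tries=20):
    """Resample inside sector=(lo, hi) while the angle is congested (ASSUMED reset rule).

    Returns the last drawn angle if max_tries is exhausted.
    """
    angle = candidate
    for _ in range(max_tries):
        if min_angular_distance(angle, branch_dirs) >= threshold:
            return angle
        angle = rng.uniform(*sector)
    return angle
```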
- (4) Continuous Expansion Based on Hysteresis Decision and Directional Inertia
Conventional Bi-RRT relies on single-step expansion, which is prone to path oscillations in narrow passages. This strategy introduces a continuous expansion mechanism to accelerate tree growth and improve obstacle avoidance stability.
(1) To utilize directional inertia for accelerating the growth of the random tree, when the algorithm is in the obstacle-avoidance stage () and a new node is successfully generated, a greedy probing expansion is performed five times along the current sampling direction. If any candidate node encounters a collision or exceeds the boundary, the inertia-based expansion is immediately terminated.
(2) Single-step sampling may be insufficient for escaping obstacle regions. A hysteresis decision mechanism is therefore introduced to maintain the current sampling strategy until a stable exit is achieved. After the inertia-based expansion, a multi-step straight-line collision check is performed from the new node toward the global goal. If the path is collision-free, the branch of the random tree is considered to have exited the obstacle, the expansion failure count is reset to zero, and the algorithm returns to the direct-steering mode. Otherwise, the failure count remains unchanged, and the lateral detour mode is maintained in the next iteration.
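The two steps above can be illustrated with the following sketch. It assumes a 2D workspace and takes the collision and boundary checks as caller-supplied callables (`is_free`, `in_bounds`, `segment_is_free`), all of which are hypothetical names; only the five-probe limit and the reset-on-clear-line rule come from the text.

```python
import math


def inertia_expand(node, direction, step, is_free, in_bounds, max_probes=5):
    """Greedily extend from `node` along `direction` (radians), up to max_probes steps.

    Stops at the first probe that leaves the map or collides.
    Returns the list of newly added nodes (possibly empty).
    """
    added = []
    x, y = node
    for _ in range(max_probes):
        nx = x + step * math.cos(direction)
        ny = y + step * math.sin(direction)
        if not in_bounds((nx, ny)) or not is_free((x, y), (nx, ny)):
            break
        added.append((nx, ny))
        x, y = nx, ny
    return added


def update_failure_count(new_node, goal, fail_count, segment_is_free):
    """Hysteresis rule: reset the failure count only when the line to the goal is clear."""
    return 0 if segment_is_free(new_node, goal) else fail_count
```

Keeping the failure count unchanged when the line to the goal is still blocked is what holds the planner in its detour mode across iterations instead of switching back to goal-directed sampling after a single successful step.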
3.2. Multi-Strategy Improved PSO Algorithm
Due to the high coupling between parameters such as the expansion step size and the sampling-angle constraint threshold, the presence of multiple local optima in the solution space, and the strong stochastic noise inherent in the evaluation process, conventional PSO is prone to premature convergence in the context of Bi-RRT algorithms. To address these limitations, this study proposes an improved PSO algorithm that integrates high-dimensional particle encoding, logistic mapping, a comprehensive learning strategy (CLS), and a stagnation-based reset strategy.
Figure 3 shows a detailed flowchart of the proposed multi-strategy improved PSO algorithm. Colored highlights indicate the components that contribute to the improvement of the search process. Furthermore, a linearly decreasing scheme is employed to dynamically adjust the inertia weight throughout the optimization process:
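$$w(t) = w_{\max} - \left(w_{\max} - w_{\min}\right)\frac{t}{T_{\max}}$$
where $T_{\max}$ denotes the maximum number of iterations, $w_{\max}$ and $w_{\min}$ are the initial and final inertia weights, and $t$ is the current iteration number.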
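As a small illustration, a linearly decreasing inertia weight can be computed as below. The default bounds 0.9 and 0.4 are common choices in the PSO literature and are assumptions here, not values confirmed by this study; the function name is hypothetical.

```python
def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decrease the inertia weight from w_max (t = 0) to w_min (t = t_max).

    The default bounds are illustrative assumptions.
    """
    return w_max - (w_max - w_min) * t / t_max
```

Early iterations therefore favor exploration (large weight), and late iterations favor local refinement (small weight).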
- (1) High-Dimensional Particle Encoding Structure
The position vector of each particle is defined as a five-dimensional hyperparameter set encompassing three categories of parameters: motion characteristics, decision thresholds, and spatial constraints. The motion parameter is the base expansion step size. The two state transition thresholds correspond to the conditions governing the transitions within the multi-level sampling strategy. The two dynamic suppression parameters, namely the base invalid-radius coefficient and the density-penalty strength, regulate the effective range of the density-based adaptive invalid radius. The complete list of optimized parameters is summarized in Table 2.
- (2) Particle Population Diversity Enhancement
The logistic chaotic map is employed to enhance the diversity of the particle population within the five-dimensional hyperparameter search space. The velocity update uses a comprehensive learning strategy (CLS). As in comprehensive learning PSO (CLPSO), each particle, when updating its velocity across different dimensions, learns from the individual historical bests of distinct particles in the population, without relying on global best guidance. This dimensional decoupling mechanism helps prevent the swarm from converging prematurely to local optima.
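The sketch below shows how a logistic map could seed a five-dimensional population and how a comprehensive-learning velocity update differs from the standard one. It is illustrative only: the map parameter `mu = 4.0`, the seed `x0`, the acceleration coefficient `c`, and any bounds passed in are assumptions, and the exemplar selection step of CLPSO (choosing which particle each dimension learns from) is left to the caller.

```python
import random


def logistic_sequence(x0, n, mu=4.0):
    """Return n iterates of the logistic map x <- mu * x * (1 - x)."""
    xs, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs


def chaotic_population(n_particles, bounds, x0=0.37):
    """Spread particles over `bounds` (list of (lo, hi) per dimension) using chaos values."""
    dim = len(bounds)
    chaos = logistic_sequence(x0, n_particles * dim)
    return [
        [lo + chaos[i * dim + d] * (hi - lo) for d, (lo, hi) in enumerate(bounds)]
        for i in range(n_particles)
    ]


def clpso_velocity(v, x, exemplar_pbests, w, c=1.5, rng=random):
    """Comprehensive-learning velocity update.

    exemplar_pbests[d] is the personal-best vector of the particle chosen as the
    exemplar for dimension d; no global-best term is used.
    """
    return [
        w * v[d] + c * rng.random() * (exemplar_pbests[d][d] - x[d])
        for d in range(len(x))
    ]
```

Because each dimension may follow a different particle's personal best, a single misleading global best cannot pull the whole swarm into the same basin.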
- (3) Stagnation-Detection-Based Population Diversity Maintenance
Each particle is equipped with a stagnation counter. During the update step of each iteration, if the fitness value of the particle changes, its counter is reset to zero; otherwise, it is incremented by one. Once the counter exceeds a predefined threshold, the particle is considered to have become trapped in a local optimum or to have lost its search capability. The following remedial actions are then applied to the particle to restore population entropy and guide the algorithm out of local stagnation:
1. Position Reset: The particle’s position is randomly regenerated within the solution space.
2. Kinetic Energy Injection: The particle is assigned a relatively large random initial velocity.
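The stagnation rule and the two remedial actions can be sketched as follows. The particle is represented as a plain dictionary, and the velocity scale `v_scale` and the choice to clear the counter after a reset are assumptions made for illustration; all names are hypothetical.

```python
import random


def handle_stagnation(particle, fitness_changed, limit, bounds, v_scale=0.5, rng=random):
    """Update the stagnation counter and reset the particle if it has stalled.

    particle: dict with keys "x" (position), "v" (velocity), "stall" (counter).
    Returns True if the particle was reset in this call.
    """
    if fitness_changed:
        particle["stall"] = 0
        return False
    particle["stall"] += 1
    if particle["stall"] <= limit:
        return False
    # Position reset: regenerate uniformly inside the search bounds.
    particle["x"] = [rng.uniform(lo, hi) for lo, hi in bounds]
    # Kinetic energy injection: a comparatively large random velocity (ASSUMED scale).
    particle["v"] = [rng.uniform(-1.0, 1.0) * v_scale * (hi - lo) for lo, hi in bounds]
    particle["stall"] = 0  # assumed: the counter restarts after a reset
    return True
```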
3.4. Path Post-Processing and Trajectory Smoothness Optimization
This study proposes a PSO-based trajectory collaborative optimization strategy with range-jitter constraints. Redundant nodes in the initial path are first removed using a line-of-sight method, and the resulting path nodes are mapped to the initial control points of a B-spline curve. Subsequently, the PSO algorithm performs fine-tuning of the control points within their respective searchable regions to identify the optimal set of B-spline control points.
1. Jitter Space Construction and Particle Encoding
Let the set of path nodes after redundancy elimination be denoted as $P = \{P_0, P_1, \dots, P_n\}$. Except for the start and end nodes, which are fixed, the searchable region of each intermediate node $P_i$ is defined as a circular domain $\Omega_i$ centered at its original position:
$$\Omega_i = \{\, \mathbf{p} : \lVert \mathbf{p} - P_i \rVert \le r \,\}, \quad i = 1, \dots, n-1$$
where $r$ denotes the radius of the searchable domain for each node. Here, $r$ is set to $0.3\,s$, where $s$ is the Bi-RRT expansion step size. This bounded-jitter constraint limits excessive deviation from the original feasible path while preserving sufficient optimization flexibility.
If a control point enters an obstacle region during particle updates, the corresponding particle is either projected onto the boundary of the feasible domain or reset to its last valid position. Furthermore, after each control-point update, the resulting B-spline trajectory is uniformly discretized and evaluated for collisions. Trajectories that intersect obstacles are assigned a large penalty during fitness evaluation, preventing them from being selected as feasible solutions. Consequently, the optimization process is restricted to collision-free trajectories, ensuring obstacle avoidance throughout trajectory generation. This optimization process is illustrated in Figure 5.
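The bounded-jitter constraint and the repair of invalid control points can be sketched as follows. The example assumes 2D control points and uses the "reset to last valid position" option for points that fall inside an obstacle; `in_obstacle` is a hypothetical caller-supplied predicate, and the function names are illustrative.

```python
import math


def clamp_to_disc(point, center, radius):
    """Project `point` onto the disc of given radius around `center` if it lies outside."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    d = math.hypot(dx, dy)
    if d <= radius:
        return point
    scale = radius / d
    return (center[0] + dx * scale, center[1] + dy * scale)


def repair_control_point(proposed, center, last_valid, step, in_obstacle, ratio=0.3):
    """Keep a control point inside its jitter disc (radius = ratio * step).

    If the clamped point lies in an obstacle, fall back to the last valid position.
    """
    p = clamp_to_disc(proposed, center, ratio * step)
    return last_valid if in_obstacle(p) else p
```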
2. Collaborative Fitness Evaluation
The fitness function directly evaluates the B-spline trajectory induced by the current control points, thereby jointly optimizing control point placement and trajectory quality. The evaluation integrates path length, smoothness, and safety. The length term is the arc length of the generated curve. The smoothness term is computed from the unit tangent vectors at the sampling points of the discretized curve. The safety term is a collision penalty defined via a penalty function; it is assigned a sufficiently large value when any sampling point collides with an obstacle. The curve length and smoothness terms are each scaled by a weight.
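A minimal sketch of such a fitness evaluation on an already discretized trajectory is shown below. The specific smoothness measure (sum of one minus the dot product of consecutive unit tangents), the equal weights, and the penalty magnitude are assumptions chosen for illustration; `collides` is a hypothetical point-collision predicate.

```python
import math


def trajectory_fitness(points, collides, w_len=0.5, w_smooth=0.5, penalty=1e6):
    """Weighted length + smoothness + collision penalty for a sampled 2D curve.

    points: list of (x, y) samples along the B-spline, in order.
    Smoothness (ASSUMED form): sum over consecutive unit tangents of (1 - t_k . t_{k+1}).
    """
    segs = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])]
    lens = [math.hypot(dx, dy) for dx, dy in segs]
    length = sum(lens)
    # Unit tangents of non-degenerate segments
    tangents = [(dx / l, dy / l) for (dx, dy), l in zip(segs, lens) if l > 0.0]
    smooth = sum(
        1.0 - (t1[0] * t2[0] + t1[1] * t2[1])
        for t1, t2 in zip(tangents, tangents[1:])
    )
    collision = penalty if any(collides(p) for p in points) else 0.0
    return w_len * length + w_smooth * smooth + collision
```

A straight trajectory contributes no smoothness cost, whereas sharp direction changes between consecutive samples increase it, so lower fitness values correspond to shorter and smoother collision-free curves.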
3.5. Multi-Target Visit Sequence Planning Based on Hybrid Graph Weights
A complete graph model with hybrid edge weights is constructed, and a greedy algorithm is employed to determine an AGV operation sequence that minimizes both energy consumption and time cost.
- (1) Problem Modeling and Hybrid Weight Strategy
The AGV task scheduling problem is formulated as a complete undirected weighted graph. The vertex set consists of the starting node and the target locations to be visited.
A two-level weighting scheme is adopted for the edge weights. If the straight-line path between two nodes is collision-free, the weight of the edge between them is defined as their Euclidean distance. Otherwise, an obstacle-avoiding path is generated using the improved PSO and Bi-RRT algorithm, and the corresponding path length is assigned as the edge weight. If path planning fails, the edge weight is set to infinity.
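The two-level weighting rule can be sketched as below. The line-of-sight check and the planner are passed in as hypothetical callables (`line_is_free`, `plan_path`), with the planner assumed to return a list of waypoints or `None` on failure.

```python
import math


def edge_weight(a, b, line_is_free, plan_path):
    """Euclidean distance if the straight line is free, else planned path length, else inf."""
    if line_is_free(a, b):
        return math.dist(a, b)
    path = plan_path(a, b)  # assumed: list of waypoints, or None if planning fails
    if path is None:
        return math.inf
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def build_weight_matrix(nodes, line_is_free, plan_path):
    """Symmetric weight matrix of the complete undirected graph over `nodes`."""
    n = len(nodes)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = edge_weight(nodes[i], nodes[j], line_is_free, plan_path)
    return w
```

Calling the planner only for node pairs without line of sight keeps the number of expensive planning runs well below the number of edges in open environments.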
- (2) Construction of the Comprehensive Obstacle-Avoidance Cost Matrix
An environment complexity penalty factor is introduced to adjust the physical path length, thereby constructing a composite cost matrix that reflects traversal cost. For non-line-of-sight paths, each element of the composite cost matrix is computed from the following quantities: the length of the collision-free obstacle-avoiding path between the two nodes; the baseline detour cost incurred by obstacle avoidance; the Euclidean distance between the two nodes; the obstacle density factor, which is based on the total number of grid cells along the straight-line path between the two nodes and the number of obstacle-occupied grid cells among them; the path complexity coefficient; and the number of turning points along the path whose turning angle satisfies a threshold condition.
- (3) Greedy-Based Visit Sequence Optimization
Based on the cost matrix, the multi-target planning problem is formulated as an optimization over visiting orders of the targets, with the objective of minimizing the global operational cost, that is, the total composite cost accumulated over consecutive nodes in the visiting order.
The solution is subject to the constraint that the visiting order begins at the starting node and that the remaining entries form a permutation of the target set. Considering that the preceding Bi-RRT stage has already consumed part of the computational resources and that real-time performance is critical in warehouse scheduling, a nearest-neighbor greedy strategy is adopted to obtain a near-optimal sequence. The time complexity of this strategy is $O(n^2)$ for $n$ targets, which enables millisecond-level responsiveness for multi-target tasks.
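A nearest-neighbor ordering over a precomputed cost matrix can be sketched as follows. The example assumes an open route that starts at index 0 and does not return to the start, and it breaks ties by the lower node index; both choices are assumptions, and the function name is hypothetical.

```python
def nearest_neighbor_order(cost, start=0):
    """Greedy visiting order: always move to the cheapest unvisited node.

    cost: square matrix (list of lists); cost[i][j] is the composite cost from i to j.
    Returns (order, total_cost) for an open route beginning at `start`.
    """
    unvisited = set(range(len(cost))) - {start}
    order, current, total = [start], start, 0.0
    while unvisited:
        # Tie-break on the node index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (cost[current][j], j))
        total += cost[current][nxt]
        order.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return order, total
```

Each of the n selection steps scans at most n candidates, which gives the quadratic running time noted above; the result is not guaranteed to be optimal.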