4.1. Overall Design of the Algorithm
Although GA, Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and Variable Neighborhood Search (VNS) are all applicable to combinatorial optimization, we adopted GA because TAS has a permutation-structured decision space with precedence and multi-vehicle coupling, for which GA is well suited. The A* algorithm, in turn, exploits the RMFS graph topology for deterministic shortest-path routing. The overall design is also parallelizable, data-agnostic, and amenable to future enhancements such as learning-based guidance or local search.
Recent advances in adaptive and hyper-heuristic optimization, such as the quantum-inspired framework for dynamic multi-objective disaster logistics, have shown that learning-guided coordination can improve search efficiency in complex combinatorial problems [31]. At the same time, several studies have achieved notable progress in improving avoidance efficiency and coordination performance in multi-robot systems through hybrid metaheuristic approaches [32]. Building on this insight, the proposed BLP framework adopts a coordinated heuristic with a bi-level architecture that integrates TAS with CPP. The hybrid heuristic, denoted as GA-A*-CP, combines a GA for TAS with an enhanced A*-based collision-avoidance mechanism for CPP. The overall structure of the framework is illustrated in Figure 2.
Through an iterative coordination process between TAS decisions (see Figure 2, green dashed lines) and CPP results (see Figure 2, red dash-dot line), the GA-A*-CP algorithm continuously exchanges feedback between the two levels, enabling successive refinements and convergence towards a jointly optimized solution. At the upper level, the GA serves as a population-based metaheuristic that explores large combinatorial solution spaces by simulating the process of natural selection [33]. It generates TAS solutions that are passed to the lower level. At the lower level, the A* algorithm is a heuristic-guided search algorithm that finds cost-minimizing paths with high computational efficiency [34]. After computing the path with the shortest distance and minimal turning cost, the algorithm generates the CPP for all AGVs by considering collision avoidance and queuing conditions within the picking workspace. Based on the CPP results, the total operational cost of all AGVs and the task completion times of the pickers are calculated. These performance metrics are then returned to the upper level, where the GA evaluates the quality of the current TAS solution and updates it accordingly. This feedback-driven process continues iteratively until the upper-level objective converges to a satisfactory value or the maximum number of iterations is reached. Through this coordination mechanism, the algorithm integrates TAS and CPP, thereby reducing overall system cost and enhancing operational efficiency.
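The feedback loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: `plan_paths` is a hypothetical stand-in for the lower-level CPP that returns a scalar cost for a TAS solution, `vary` is a hypothetical stand-in for the GA variation operators, and the `patience` counter is one assumed way of deciding that the objective has converged.

```python
def bilevel_search(population, plan_paths, vary, max_iters=100, patience=10):
    """Alternate between upper-level assignment search and lower-level costing.

    population : list of TAS solutions (any representation)
    plan_paths : callable, TAS solution -> cost (assumed stand-in for CPP)
    vary       : callable, TAS solution -> new TAS solution (assumed operators)
    """
    best, best_cost = None, float("inf")
    stale = 0  # consecutive generations without improvement
    for _ in range(max_iters):
        # Lower level: cost every assignment, then rank by that cost only.
        scored = sorted(((plan_paths(ind), ind) for ind in population),
                        key=lambda pair: pair[0])
        if scored[0][0] < best_cost:
            best_cost, best = scored[0]
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
        # Upper level: the incumbent survives unchanged (elitism), the rest
        # of the next generation is produced by the variation operator.
        population = [best] + [vary(ind) for _, ind in scored[:len(population) - 1]]
    return best, best_cost
```

Because the incumbent is carried over unchanged, the best cost returned by this sketch never worsens from one generation to the next.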
4.2. Detailed Description of the Algorithm
The detailed steps of the proposed hybrid GA-A*-CP algorithm for solving the BLP model are outlined as follows:
Step 1: Generate initial population using real-number encoding
The GA starts by generating an initial population based on real-number encoding. To ensure balanced task allocation and avoid AGV idleness, tasks are grouped according to the number of AGVs. Within each group, tasks are randomly allocated to AGVs, with each AGV scheduled only once per group. As shown in Figure 3, each chromosome represents a TAS solution. The chromosome length equals the number of tasks; each gene position corresponds to a task, and the gene value indicates the allocated AGV.
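As an illustration of this encoding, the following sketch builds chromosomes group by group. It assumes that groups are formed from consecutive task indices and that AGVs are numbered from 0; the function names are hypothetical.

```python
import random

def random_chromosome(num_tasks, num_agvs, rng):
    """Return a list in which position t holds the AGV assigned to task t.

    Tasks are cut into consecutive groups of size num_agvs (assumed grouping);
    within a group every AGV appears at most once. The last group may be
    shorter when num_tasks is not a multiple of num_agvs.
    """
    chromosome = []
    for start in range(0, num_tasks, num_agvs):
        group_size = min(num_agvs, num_tasks - start)
        # Sampling without replacement keeps each AGV unique inside the group.
        chromosome.extend(rng.sample(range(num_agvs), group_size))
    return chromosome

def initial_population(pop_size, num_tasks, num_agvs, seed=0):
    """Generate pop_size random chromosomes from a seeded generator."""
    rng = random.Random(seed)
    return [random_chromosome(num_tasks, num_agvs, rng) for _ in range(pop_size)]
```

For example, with 10 tasks and 4 AGVs each chromosome has 10 genes arranged as groups of 4, 4, and 2 tasks.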
Step 2: Embed TAS solution into CPP framework
The TAS solution determined by the upper-level GA is passed to the lower level as input for CPP. The lower level first applies an improved A* algorithm to generate a preliminary trajectory that minimizes both travel distance and the number of turns. This trajectory is then refined using a collision avoidance strategy and a prediction mechanism to ensure feasibility in a complex environment. Each path for completing a task is divided into four segments based on the starting and destination locations. Three of these segments are computed using the improved A* algorithm, while the remaining segment corresponds to a fixed path within the picking workspace. The detailed procedure for generating conflict-free paths is as follows:
First, the node expansion strategy of the improved A* algorithm is refined. The algorithm determines the initial expansion node and its direction based on the spatial relationship between the start and goal locations. As shown in Figure 4, the search space is divided into four areas. The location of the goal determines the first node and counterclockwise expansion order. For example, if the goal lies in Area 1, the search starts at node ①. This directional expansion prioritizes nodes toward the goal location, improving both accuracy and convergence speed.
Second, the A* algorithm is improved by increasing the weight of the turning penalty. The penalty is amplified to reduce unnecessary turns and to prevent AGVs from bypassing the vicinity of the goal point, which discourages inefficient detours and promotes more direct paths. Since each turn involves deceleration, steering, and acceleration, turning typically incurs higher time costs than straight travel. The path search process of the improved A* algorithm is illustrated in Figure 5, and the evaluation function is defined in Equation (32).
In the evaluation function of the improved A* algorithm, the total cost is composed of three parts. The first is the actual cost from the start node to the current parent node. The second is the estimated minimum cost from a child node to the goal node, which, under RMFS scenarios, is calculated using the Manhattan distance. The third is a turning cost, which is added if the movement from the child node to the goal node requires a direction change: the penalty is applied when the child node is not aligned horizontally or vertically with the goal node; otherwise, it is zero.
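The three-part evaluation function described above can be written compactly as follows. The symbols $f$, $g$, $h$, $\omega$, and $\delta$, as well as the coordinates $(x_n, y_n)$, are conventional A* notation assumed here for illustration and may differ from the notation of Equation (32).

```latex
f(n) = g(n) + h(n) + \omega\,\delta(n), \qquad
h(n) = |x_n - x_{\mathrm{goal}}| + |y_n - y_{\mathrm{goal}}|, \qquad
\delta(n) =
\begin{cases}
0, & x_n = x_{\mathrm{goal}} \ \text{or} \ y_n = y_{\mathrm{goal}},\\
1, & \text{otherwise.}
\end{cases}
```

Here $g(n)$ is the accumulated travel cost, $h(n)$ is the Manhattan-distance estimate to the goal, and $\omega$ is the amplified turning weight, which is charged only when node $n$ shares neither a row nor a column with the goal.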
Figure 5 illustrates the node expansion process. The AGV searches from node L1 (AGV location) to L2 (task location). L1 is the initial parent node and is added to the closed list. Based on expansion rules, four feasible directions from L1 are explored, and their heuristic values are computed. Child nodes are placed in the open list; the one with the smallest total evaluation cost is selected and moved to the closed list. The process repeats until the goal node L2 is added to the closed list. The final path is constructed by tracing back through the parent nodes. In this stage, the AGV operates in a no-load state and is allowed to traverse beneath pods. For stages L2-L3 and L3-L2, where the AGV is in a loaded state moving between the task location and the picking station (and back), only the free aisles between pods are available.
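The open-list and closed-list procedure above can be sketched on a grid as follows. This is a simplified illustration rather than the improved A* of this work: the grid encoding, the unit step cost, the value of `turn_weight`, and the fixed counterclockwise neighbor order are all assumptions, and the area-dependent choice of the first expansion node is not modeled.

```python
import heapq

def astar_turn_penalty(grid, start, goal, turn_weight=2.0):
    """Grid search with f = g + Manhattan distance + alignment-based turn term.

    grid[r][c] == 0 marks a free cell; start and goal are (row, col) tuples.
    Returns the path as a list of cells, or None if the goal is unreachable.
    """
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    def turn_cost(cell):
        # Zero when the cell shares a row or a column with the goal.
        aligned = cell[0] == goal[0] or cell[1] == goal[1]
        return 0.0 if aligned else turn_weight

    open_heap = [(h(start) + turn_cost(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue
        closed.add(cell)
        if cell == goal:
            # Trace back through the parent links to build the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        # Assumed fixed counterclockwise order: right, up, left, down.
        for dr, dc in ((0, 1), (-1, 0), (0, -1), (1, 0)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] != 0 or nxt in closed:
                continue
            new_g = g + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(
                    open_heap, (new_g + h(nxt) + turn_cost(nxt), new_g, nxt))
    return None
```

Because the turn term inflates the estimate for cells that are not aligned with the goal, the search is drawn toward the goal's row or column first. The sketch therefore favors paths with few turns and makes no claim of shortest-path optimality.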
Third, a dynamic priority-based collision avoidance strategy is employed. After the shortest paths for each AGV are generated using the improved A* algorithm, potential conflicts are resolved without altering the planned trajectories, by assigning dynamic priorities and introducing waiting times where necessary. As shown in Figure 6, AGV priority is determined by two rules: (1) AGVs in the loaded state are prioritized over those in the no-load state due to their higher unit-time operating cost and task urgency; and (2) among AGVs in the same load state, those with higher obstacle avoidance cost are given higher priority.
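The two priority rules can be illustrated with a deliberately simplified scheme in which lower-priority AGVs keep their planned path and only wait. The sketch below is an assumption-laden stand-in, not the strategy of Figure 6: priorities are computed once rather than dynamically, waiting is inserted only at the start of a path, time is discretized into unit steps, and an AGV is assumed to leave the grid when its path ends. The dictionary keys and function names are hypothetical.

```python
def _is_free(timed, reserved):
    """Check a timed path against cells already reserved by other AGVs."""
    for t, cell in enumerate(timed):
        if (t, cell) in reserved:          # two AGVs in one cell at step t
            return False
        if t > 0:
            prev = timed[t - 1]
            other = reserved.get((t - 1, cell))
            # Head-on swap: another AGV moves from `cell` to `prev` at step t.
            if prev != cell and other is not None and reserved.get((t, prev)) == other:
                return False
    return True

def resolve_by_priority(agvs, max_delay=50):
    """agvs: list of dicts with keys 'name', 'loaded', 'avoid_cost', 'path'.

    Returns {name: timed_path}, where timed_path[t] is the cell at step t.
    Paths are never rerouted; lower-priority AGVs only wait at their start.
    """
    # Rule 1: loaded before no-load. Rule 2: higher avoidance cost first.
    order = sorted(agvs, key=lambda a: (a["loaded"], a["avoid_cost"]), reverse=True)
    reserved = {}   # (step, cell) -> name of the AGV holding the cell
    result = {}
    for agv in order:
        for delay in range(max_delay + 1):
            timed = [agv["path"][0]] * delay + list(agv["path"])
            if _is_free(timed, reserved):
                break
        else:
            raise RuntimeError("no conflict-free delay found for " + agv["name"])
        for t, cell in enumerate(timed):
            reserved[(t, cell)] = agv["name"]
        result[agv["name"]] = timed
    return result
```

In this sketch a loaded AGV crossing an intersection keeps its schedule, while a no-load AGV that would reach the same cell at the same step waits at its start.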
Fourth, the queue prediction mechanism is performed in the picking workspace. By calculating the entry time into the workspace and the departure time from the picking station for each AGV, the queuing condition and corresponding waiting time can be estimated. When an AGV is predicted to arrive at the picking workspace during a period of congestion, it chooses to wait near the bottom of the pods if it is in a no-load state. The AGV remains in this waiting location until a queue-free window is available, at which point it proceeds to the picking station.
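A minimal sketch of such a prediction is given below, assuming a single picking station served first-come-first-served with a constant service time. These assumptions, and the function name, are illustrative and are not taken from the model.

```python
def predict_waits(entry_times, service_time):
    """Estimate the waiting time of each AGV at one picking station.

    entry_times  : {agv_name: time at which the AGV would enter the workspace}
    service_time : assumed constant time an AGV occupies the station
    Returns {agv_name: waiting time before a queue-free window opens}.
    """
    waits = {}
    station_free_at = 0.0
    # Serve AGVs in order of predicted entry time (assumed FIFO discipline).
    for agv, entry in sorted(entry_times.items(), key=lambda kv: kv[1]):
        start = max(entry, station_free_at)
        waits[agv] = start - entry
        station_free_at = start + service_time   # predicted departure time
    return waits
```

An AGV with a positive predicted wait would hold at its waiting location for that duration before proceeding to the picking station.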
Step 3: Feed CPP results back to optimize TAS decision
The feedback from CPP is used at the upper level to improve the TAS decision-making.
First, the CPP results are used to compute the pickers’ task completion time and the total cost of all AGVs for each individual in the population; based on this, the best individual is selected to advance to the next generation.
Second, an adaptive crossover operator is applied with a crossover probability of 0.6. As shown in Figure 7, two chromosomes are randomly selected from the population and a task group is randomly chosen from each as the crossover unit. The selected task groups are then exchanged between the parent chromosomes to produce two offspring.
Third, a single-point mutation operation is performed with a mutation probability of 0.06. During mutation, it is ensured that each AGV is scheduled only once within the same task group to maintain feasibility. The algorithm iterates over a defined number of generations, during which the TAS solution is continuously optimized.
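The crossover and mutation operators can be sketched as follows. Several details are assumptions made for illustration: both parents exchange the group at the same position, probabilities are applied once per chromosome, and the mutation repairs a duplicate by swapping two genes inside the group. The function names are hypothetical.

```python
import random

def group_bounds(index, num_agvs, length):
    """Return the [lo, hi) slice of the task group containing gene `index`."""
    lo = (index // num_agvs) * num_agvs
    return lo, min(lo + num_agvs, length)

def group_crossover(parent_a, parent_b, num_agvs, rng, p_cross=0.6):
    """Exchange one whole task group between two parents.

    Each group is feasible on its own, so swapping complete groups at the
    same position keeps every AGV unique within each group.
    """
    child_a, child_b = list(parent_a), list(parent_b)
    if rng.random() < p_cross:
        lo, hi = group_bounds(rng.randrange(len(parent_a)), num_agvs, len(parent_a))
        child_a[lo:hi], child_b[lo:hi] = parent_b[lo:hi], parent_a[lo:hi]
    return child_a, child_b

def mutate(chromosome, num_agvs, rng, p_mut=0.06):
    """Reassign one task to a different AGV while keeping its group feasible."""
    child = list(chromosome)
    if num_agvs < 2 or rng.random() >= p_mut:
        return child
    i = rng.randrange(len(child))
    lo, hi = group_bounds(i, num_agvs, len(child))
    new_agv = rng.choice([a for a in range(num_agvs) if a != child[i]])
    for j in range(lo, hi):
        if child[j] == new_agv:
            # Repair: the task that held new_agv takes over the old AGV.
            child[j] = child[i]
            break
    child[i] = new_agv
    return child
```

Both operators return new lists and leave the parents untouched, so the best individual of a generation can be carried over unchanged.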
Step 4: Check iteration limit and output final results
The algorithm iterates up to a predefined number of times in search of a solution with the minimal AGV operational cost. If the total cost of all AGVs continues to decrease or remains stable, the process proceeds until the maximum iteration count is reached. If the cost increases, the algorithm terminates immediately and the previously best-performing solution is selected. If the stopping condition is not met, the algorithm returns to Step 2 for further iteration; otherwise, the final TAS and CPP results are output.
For consistent bi-level coordination, the lower-level A* serves as a deterministic optimal oracle for a given TAS solution, with a fixed tie-breaking rule removing randomness. The upper level employs elitism with a non-worsening acceptance rule; thus, accepted updates do not deteriorate the objective. Since the feasible TAS set is finite and the number of alternations is bounded, with a stop after consecutive non-improving rounds, the TAS–CPP alternation terminates in finitely many steps at a fixed point or at an ε-stable solution in static settings. To prevent oscillations in symmetric or bottlenecked layouts, stable tie-breaking for cost-equivalent solutions and per-round limits on assignment changes are imposed. In rolling or stochastic settings, guarantees are stated in terms of bounded re-optimization with non-worsening objectives rather than asymptotic convergence.