1. Introduction
As robotics technology continues to evolve, an increasing number of robots are being integrated into daily human activities, leading to more frequent collaboration between humans and robots [
1]. The wheeled humanoid robot (WHT), a hybrid design that integrates a bipedal upper body with a wheeled mobile base, has garnered significant attention from both academia and industry (
Figure 1). The configuration of WHT combines the advantages of humanoid morphology, which enables natural collaboration with human-centric environments (e.g., operating tables and door handles) and facilitates dexterous manipulation tasks, with the efficiency and stability of wheeled locomotion. The wheeled base alleviates the inherent limitations of traditional bipedal robots, such as gait instability, low locomotion speed, and high energy consumption, thereby enabling fast, efficient, and stable motion on flat terrain. These capabilities make WHT highly promising for a wide range of applications, including warehouse robots, service robotics, medical rehabilitation, and public assistance.
However, this vision of human–robot collaboration also poses a fundamental challenge: path safety. As robots operate in dynamic human-shared spaces, their path planning is no longer merely finding the shortest path, but rather a dynamic decision-making process that must prioritize human safety above all else. Robot motion must not only be efficient, but also highly reliable, predictable, and collision-free. Even minor incidents can result in physical harm and severely undermine public trust in robotic systems. Hence, ensuring path safety amid unpredictable human movements is a critical prerequisite for unlocking the potential of such robots.
Motion planning in dynamic, uncertain human-shared environments represents a fundamental challenge in the field of mobile robotics. Unlike static or deterministically moving obstacles, human motion is inherently stochastic and impossible to predict with perfect accuracy. This uncertainty stems from the complex intentions, social conventions, and reactive decision-making of humans. For instance, a person may suddenly change his/her direction, stop without obvious cues, or interact with other individuals. Traditional planning approaches, such as velocity obstacles [
2,
3] and the dynamic window approach [
4], often rely on the assumption that dynamic obstacles move in a deterministic manner (e.g., at constant velocity). However, this assumption frequently fails in human-shared environments where human motion is inherently stochastic and unpredictable. Traditional methods perform real-time obstacle avoidance [
5,
6]; however, those methods lack foresight and are easily trapped in local optima. Safe motion planning must explicitly account for the multi-modal probability distributions of human movements and generate trajectories that not only proactively avoid potential conflicts but also maintain global optimality. Our previous research [
7,
8] has shown that resolving conflicts at the planning level is an efficient way to avoid potential human–robot collisions.
To achieve safe motion planning, it is essential to predict how humans will move and how human risk evolves while both humans and robots execute their tasks. Our purpose is to resolve conflicts at the planning level by taking human risk distributions into account, while achieving a high task success rate and improving WHT–human collaboration efficiency. To this end, this paper presents a safety-oriented motion planning (SOMP) algorithm that accounts for unknown human motions at the planning level and generates an optimal, safe path. The primary contributions of this paper are outlined as follows:
We introduce an SOMP framework that accounts for unknown human motions at the planning level and generates an optimal safe path.
We develop an η-diversity path generator (η-DPG), built on a dynamic-quadrant stochastic sampling RRT method that computes multiple diverse paths across the environment according to a diversity metric η, thereby enhancing the algorithm’s robustness to environmental changes.
A constraint-fusion mechanism combining hard and soft constraints is proposed for conflict resolution, where hard constraints provide a safety guarantee by forbidding entry into high-risk areas, while soft constraints support optimization.
We conduct a series of comparative simulations to validate the proposed methods, and the results demonstrate their efficiency in planning safe paths and their robustness to environmental changes.
The rest of this paper is organized as follows.
Section 2 presents a review of related works.
Section 3 introduces the problem description and formulation. The proposed method is presented in
Section 4, followed by the simulation results and discussions in
Section 5. Finally, we provide a brief conclusion in
Section 6.
2. Related Works
To avoid human–robot collisions, various approaches have already been developed. Korkmaz [
9] focused on the human-aware navigation of a robot, where the robot follows a global path and rearranges its path by dynamically detecting obstacles around it. The majority of studies are based on rapidly exploring random trees (RRT) [
10], which explore the environment by sampling. Hyotish et al. [
11] developed an asymptotically optimal version of RRT based on incremental sampling that covers the whole space. They used a dynamic replanning scheme in which the robot rectifies or modifies its path when an unknown moving or static obstacle obstructs it. To build more effective path planning algorithms, many improved RRT algorithms have been proposed in [
12,
13,
14,
15]. Herrera-Medina et al. [
16] combined YOLO with an RRT algorithm for optimizing path planning and human detection. The RRT algorithm generates collision-free paths, and YOLO is applied for human detection and localization. Aiswarya et al. [
17] proposed a human-aware motion planning method with RRT in an Industry 4.0 environment. Rajendran et al. [
18] provided a human-aware motion planner, building upon RRT-connect. However, their approach was designed specifically for manipulators and is not well-suited for solving the path-planning problem of mobile robots in environments with stochastically moving humans.
The studies mentioned above demonstrate the effectiveness of RRT and RRT* algorithms in motion planning. These methods employ an online scheme that continuously monitors a robot’s surroundings and replans a new path whenever a human obstructs the existing one. Such a scheme is commonly considered the safest solution for avoiding moving obstacles or humans. However, it focuses too heavily on immediate conflicts and lacks foresight, which prevents proactive strategies; the robot can only respond passively by replanning detours. Consequently, the robot becomes easily trapped in local optima and may fail to complete its task within the allocated time budget. Furthermore, the computational cost of continuous replanning is high.
Mi et al. [
19] developed a safe A* algorithm that plans a safe local path by considering stochastic human risks, which is a search-based method. However, their algorithm depends heavily on the reward design. There are also many other well-known search-based algorithms for path planning, such as D*. Choi et al. [
20] developed an enhanced D* algorithm for optimizing the path of a robot moving on sidewalks. They analyzed human trajectory data from lidar sensors and identified the average distance and angle at which people start to avoid autonomous robots. Search-based methods are efficient at determining optimal paths in static environments. For a dynamic environment, they must be combined with risk constraints or an online replanning scheme to avoid moving obstacles.
Machine learning (ML) has been widely applied in robotics and has shown its effectiveness in decision-making. We categorize these as learning-based methods, including reinforcement learning (RL), deep reinforcement learning (DRL), supervised learning, and Monte Carlo tree search (MCTS). Mi et al. [
21] constructed a double-layer RL framework for determining an optimal path, enabling a robot to work in an environment shared with forklifts. EL-Shamouty et al. [
22] developed a framework using DRL to strengthen the intelligence and safety of a robot that collaborates with humans. Chen et al. [
23] proposed a safe RL controller incorporating control barrier functions (CBFs), using Gaussian processes to model environmental dynamics and uncertainties. However, Gaussian processes are limited in capturing multi-peaked uncertainty distributions. Zhang et al. [
24] developed an online collision detection and identification method for human-collaborative robots based on supervised learning and Bayesian theory. To improve adaptability to environmental changes, Mi et al. [
25] introduced an encoded RL approach that integrates formal methods with reinforcement learning. Li et al. [
26] proposed a self-learning Monte Carlo tree search algorithm (SL-MCTS) that combines the MCTS algorithm with a two-branch neural network (PV-Network) and can continuously improve its problem-solving ability in single-player scenarios.
Learning-based approaches are effective at adapting to dynamic environments: RL methods learn an optimal policy via trial-and-error, while MCTS operates without domain knowledge. These methods typically employ reward mechanisms aimed at maximizing a predefined objective function or minimizing its cost. We summarize the above-mentioned methods in
Table 1, classified into three types.
There are also many other methods. Tonola et al. [
27] effectively addressed the motion planning issue by leveraging a set of offline pre-computed paths to enable efficient online re-planning and trajectory generation. Alvankarian et al. [
28] proposed an optimal, local path-planning algorithm with IsoCost-based dynamic programming, which overcomes obstacles of any shape. In industrial human–robot collaboration scenarios, safety and efficiency are two crucial factors. It is challenging to ensure human safety while not sacrificing task efficiency.
To balance safety and productivity, Zhang et al. [
24] developed an online collision-detection and identification scheme that can be triggered before further injuries are caused. On the other hand, in order to facilitate collaboration, it is vital to collect the state of humans as well as predict their motions. The most common approach is human-aware robot motion planning [
20,
29,
30,
31]. For example, Abdelshafy et al. [
32] developed a safe human–robot interaction algorithm based on real-time human awareness, detection, and avoidance, enabling the system to respond in real time to human presence and movement. Apart from human safety, human comfort also needs to be considered. Aéraïz-Bekkis et al. [
33] discussed how movement uncertainty of a robot can provide a unified explanation for the modulation of human comfort with robots.
Another approach is a statistical learning method that can make a highly accurate prediction of a person’s long-term motion. For instance, Kamezaki et al. [
34] proposed a new concept of a dynamic collaborative workspace based on humans’ predicted trajectory. Similarly, Chen et al. [
35] presented a human–robot safety framework based on human hand trajectory prediction. In addition to human prediction, Xiao et al. [
36] employed a 3D convolutional neural network for human action recognition.
The literature focuses almost entirely on the motion planning of robotic arms, whose primary objective is to avoid collisions with humans rather than achieve human–robot collaborative operations in the true sense.
Bruckschen et al. [
37] presented a novel human-aware navigation approach that relies on the long-term prediction of human movements and is applied to a service robot acting in indoor environments. Literature on human–robot collaboration of this kind is relatively scarce. The main reasons for this are rooted in uncertainty. In dynamic human environments such as warehouses, supermarkets, and workshops, the behavior of human operators is unpredictable and, therefore, difficult to integrate into the robot system’s path-planning analysis. As such, risk constraints are applied in path planning [
38,
39]; however, risk constraints are hard to estimate.
In this paper, we construct hard and soft human risk constraints by simulating human trajectories and calculating risk distributions, which are efficient and simple to compute. Unlike traditional risk-aware methods, we generate multiple diverse paths distributed over the whole environment and solve potential conflicts with both hard and soft constraints to find an optimal safe path, ensuring the safety of the robot and humans, instead of online replanning solutions.
3. Problem Description and Formulation
A wheeled humanoid robot (hereafter referred to as “robot”) and K humans (k = 1, …, K) work in a shared environment. Both the robot and humans have their own tasks. The robot and each human are assigned a goal position that they must reach, where the following are true:
is the finite state set of the robot, defines the state of the robot at discrete time-step t, and and are the initial state and goal state, respectively,
is the finite state set of humans, defines the state of K humans at time-step t, and and are the initial state and goal state of human k, respectively,
represents the action set of the robot and humans.
The robot’s movements are controlled by a system controller, whereas human movements are not. Each human moves toward his or her goal independently, without considering the robot. The trajectory of each human is unknown in advance and can only be observed after the human reaches the goal. For the current time t, we can observe the state of human k; however, the future states are unknown.
The robot is required to reach its goal position within the given time budget T while avoiding conflicts with humans. Let x denote the path of the robot. A conflict occurs when the robot and a human k either occupy the same position or traverse the same edge. Specifically,
A task is successfully completed if and only if the robot reaches the goal without collisions and within the time budget T; otherwise, it is regarded as a failure.
The problem is formulated as resolving potential conflicts between the robot and humans and planning a safe path for the robot in the absence of prior knowledge about human motions. Unlike traditional approaches, we aim to solve potential conflicts at the planning level instead of maneuvering around human beings during encounters.
4. Methodology
The architecture of the proposed SOMP is illustrated in
Figure 2. It consists of four main modules: the
η-diverse path generator, the human risk creator, the conflict solver, and the safety-first optimizer. Specifically, the η-diverse path generator generates multiple initial paths to form the initial solution set. The human risk creator predicts the trajectories of randomly moving humans and computes both hard and soft constraints. In the conflict solver, potential high-risk conflicts are resolved based on the hard constraints. Finally, an optimal path is obtained through safety-first optimization with the soft constraints. The functionality of each module is described in detail as follows.
4.1. η-Diverse Path Generator: Robust Initial Path Set
In this section, we introduce the η-diverse path generator (η-DPG) to produce multiple diverse paths distributed across the entire environment. The core part is a dynamic quadrant stochastic sampling RRT method with a diversity metric η.
Here, the map is modeled as a 2D grid world.
Figure 3 illustrates an example in which the environment is divided into four quadrants, denoted as Q1, Q2, Q3, and Q4. This approach is commonly applied in settings such as warehouses and unmanned supermarkets, where shelving structures exist.
Let V_i denote the set of vertices in quadrant Q_i; the full vertex set is the union of the four quadrant sets, and the quadrants are disjoint. For any path x, we record those vertices of Q_i already visited by x as the visited set of that quadrant. For each quadrant Q_i, we define an exploration rate e_i to quantify its exploration progress, calculated as the ratio of the number of visited vertices in Q_i to the total number of vertices in Q_i, where |·| denotes the size of a set.
At the beginning, the initial path set is empty. For any quadrant Q_i, the visited set is empty; thus, the exploration rate e_i = 0. Once a path is found, the path set is non-empty; at least one quadrant Q_i has been explored, and its exploration rate e_i > 0. If quadrant Q_i is fully explored, that is, all vertices of quadrant i have been added to the visited set, the exploration rate e_i = 1.
Let p_i define the sampling rate of quadrant Q_i. We propose a dynamic quadrant stochastic sampling policy (DQSSP). For each quadrant Q_i, the sampling rate p_i changes dynamically with the exploration rate e_i: less explored quadrants are sampled more often. With the initialization e_i = 0, the sampling rate of each quadrant follows the uniform distribution. The robot chooses the dynamic quadrant stochastic sampling policy with a fixed probability and samples the goal position with the remaining probability.
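As an illustration of the dynamic quadrant sampling idea, the sketch below assigns each quadrant a probability proportional to its unexplored fraction and mixes in the goal-bias step. The proportional weighting and all names are assumptions, since the paper's exact formula is not recoverable from the text:

```python
import random

def sampling_rates(exploration, eps=1e-9):
    """One plausible DQSSP rule: weight each quadrant by its
    unexplored fraction (1 - e_i), then normalize to probabilities."""
    weights = [1.0 - e + eps for e in exploration]
    total = sum(weights)
    return [w / total for w in weights]

def sample_target(exploration, goal, quadrant_centers, p_dqssp=0.9, rng=random):
    """With probability p_dqssp, pick a quadrant via the rates above and
    sample its representative point; otherwise sample the goal (goal bias)."""
    if rng.random() < p_dqssp:
        idx = rng.choices(range(len(quadrant_centers)),
                          weights=sampling_rates(exploration))[0]
        return quadrant_centers[idx]
    return goal
```

With all exploration rates at zero the four quadrants are sampled uniformly; as a quadrant fills up, its weight shrinks toward zero, pushing samples into less explored regions.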
The diversity metric η for scenarios with shelving structures can be calculated from the shelf geometry, where D is the length of a shelf. For a map with random obstacles or without obstacles, η is provided by the human. Based on the diversity metric η, the robot searches for multiple diverse paths.
Figure 4 shows the framework of η-DPG. The basic searching structure is a DQSSP-based RRT. The search tree consists of a node set and an edge set; its root node is the robot’s initial state, and its goal node corresponds to the goal position. Based on the DQSSP, a sampling point is obtained. Then, for each tree node n, we calculate the Manhattan distance from n to the sampling point. A nearest-neighbor node with the minimum distance is chosen.
Hence, the tree grows in the direction from the nearest-neighbor node toward the sampling point, producing a new node. If there are no obstacles between the nearest-neighbor node and the new node, we add the new node as its child; otherwise, we discard it. We repeat sampling and growing the search tree until a path x is found.
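As a small illustration of the nearest-neighbor step under the Manhattan metric, assuming tree nodes are (x, y) grid tuples:

```python
def manhattan(p, q):
    """Manhattan distance between two grid vertices."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def nearest_neighbor(tree_nodes, sample):
    """Return the tree node with minimum Manhattan distance to the sample."""
    return min(tree_nodes, key=lambda n: manhattan(n, sample))
```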
We measure the diversity of a new path x against an existing path x′ (x′ in the current path set) using the Jaccard similarity, i.e., the size of the intersection of the two paths’ vertex sets divided by the size of their union. If the similarity does not exceed η, we add x to the path set; otherwise, we discard x and regenerate a new path until the computational budget runs out or the path set reaches the terminal condition, with a desired path number L. The procedures are shown in Algorithm 1.
Algorithm 1: η-DPG.
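The Jaccard acceptance test in η-DPG can be sketched as follows, treating each path as a set of grid vertices; keeping a candidate only when its similarity to every stored path is at most η follows our reading of the acceptance rule:

```python
def jaccard(path_a, path_b):
    """Jaccard similarity of two paths, each a collection of grid vertices."""
    a, b = set(path_a), set(path_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def accept_path(candidate, path_set, eta):
    """Add `candidate` to `path_set` only if it is sufficiently different
    (similarity <= eta) from every path already stored."""
    if all(jaccard(candidate, x) <= eta for x in path_set):
        path_set.append(candidate)
        return True
    return False
```

A smaller η therefore demands more dissimilar paths, trading generation time for a more diverse initial solution set.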
4.2. Human Risk Creator: Modeling Uncertain Human Motions
In this paper, we assume that human motions are not affected by the robot. We perform stochastic simulations to model the risks caused by humans. The resulting risks are divided into hard constraints and soft constraints. Hereby, we show how we model uncertain human motions and construct the hard and soft constraints.
Figure 5 shows the stochastic motion model of a human k. We know that human k will travel to his/her goal location; however, the trajectory is unknown. We model human motions as follows: the probability of each action a is modeled as a conditional distribution over the action set A. Note that this distribution is unknown to the robot.
All humans follow the following policies:
a human tends to favor actions that bring him/her closer to the goal;
a human never collides with another human, i.e., two humans never occupy the same vertex or traverse the same edge at the same time.
At each state, a human k takes an action from his/her set of available actions, which is a subset of A. We define the optimal action as the one that minimizes the remaining distance to the goal. Let C be the set of prioritized actions comprising all possible optimal actions. To encourage goal-directed movements, we assign higher conditional probabilities to the actions in C; actions are sampled from the resulting conditional probability distribution, normalized by the sizes of the prioritized and available action sets.
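A hedged sketch of the goal-biased action sampling described above: actions that strictly reduce the Manhattan distance to the goal form the prioritized set C and receive extra probability mass. The grid action set, the boost factor, and the function names are illustrative assumptions; the paper's exact conditional distribution is not recoverable from the text:

```python
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # up, down, right, left, wait

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def prioritized_actions(state, goal, available=ACTIONS):
    """Set C: available actions that strictly reduce the distance to the goal."""
    d0 = manhattan(state, goal)
    return [a for a in available
            if manhattan((state[0] + a[0], state[1] + a[1]), goal) < d0]

def sample_action(state, goal, available=ACTIONS, boost=4.0, rng=random):
    """Sample one action, giving goal-directed actions `boost` times the
    weight of the others (the boost value is an assumption)."""
    c = set(prioritized_actions(state, goal, available))
    weights = [boost if a in c else 1.0 for a in available]
    return rng.choices(available, weights=weights)[0]
```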
With the observation that vertex v is occupied by human k, we simulate stochastic movements starting from v. Then, we calculate the conditional probability of v being occupied by human k at time-step t as the ratio of the number of times v is visited at t to the total number of simulations. The risk value of v being occupied by any of the K humans at t is obtained by combining these per-human probabilities.
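The frequency estimate just described can be sketched as a Monte Carlo routine: each human is simulated many times, and a vertex's risk at time t is its visit frequency. Combining the K humans as the probability that at least one occupies the vertex is our assumption, as the paper's aggregation formula is not recoverable here:

```python
import random
from collections import defaultdict

def estimate_risk(simulate_human, starts, horizon, n_sims=2000, rng=None):
    """Return risk[(v, t)]: estimated probability that vertex v is occupied
    by at least one human at time step t. `simulate_human(start, horizon, rng)`
    is a stand-in for the stochastic motion model; it must return one
    trajectory, i.e. a list of `horizon` vertices."""
    rng = rng or random.Random(0)
    visits = [defaultdict(int) for _ in starts]          # per-human visit counts
    for k, start in enumerate(starts):
        for _ in range(n_sims):
            for t, v in enumerate(simulate_human(start, horizon, rng)):
                visits[k][(v, t)] += 1
    risk = {}
    for key in {key for counter in visits for key in counter}:
        miss = 1.0
        for counter in visits:                           # P(no human at v, t)
            miss *= 1.0 - counter.get(key, 0) / n_sims
        risk[key] = 1.0 - miss
    return risk
```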
The human risk set is thus obtained. Based on it, we classify the risks into hard constraints and soft constraints: if the risk value of a vertex meets or exceeds λ times the mean risk value, where λ is a scale parameter, the vertex is added to the hard constraint set, which contains the vertices with higher risk than this threshold; the soft constraint set contains the vertices with lower risk than the threshold; the two sets are disjoint.
Algorithm 2 shows the procedures of creating hard and soft constraints.
Algorithm 2: Hard and soft constraint creator.
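A minimal sketch of the constraint split, assuming the rule described in Section 5.1: entries whose risk meets or exceeds λ times the mean risk become hard constraints, and the remaining non-zero entries become soft constraints (the function and variable names are ours):

```python
def split_constraints(risk, scale):
    """Partition risk entries into hard (>= scale * mean) and soft sets.
    `risk` maps (vertex, t) -> estimated risk value in [0, 1]."""
    positive = {k: r for k, r in risk.items() if r > 0.0}
    if not positive:
        return {}, {}
    mean = sum(positive.values()) / len(positive)
    hard = {k: r for k, r in positive.items() if r >= scale * mean}
    soft = {k: r for k, r in positive.items() if r < scale * mean}
    return hard, soft
```

Lowering `scale` moves more entries into the hard set, mirroring the restrictiveness trend analyzed in Section 5.1.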
4.3. Conflict Solver: Hard Constraint-Based Resolving
We solve conflicts by prohibiting the robot from entering high-risk areas, as defined by the hard constraints. A key advantage of this approach is that it provides a safety guarantee. The framework is shown in Figure 6. We apply the hard constraints to each path x in the path set and check whether potential conflicts exist.
For any path x, if the robot occupies a vertex v that lies in the hard constraint set at the corresponding time step, it encounters a high risk of colliding with a human. We consider this condition a potential conflict. In detail, a potential conflict exists if
the robot and a human are likely to occupy the same vertex at the same time step (potential vertex conflict);
the robot and a human are likely to traverse the same edge between consecutive time steps (potential edge conflict).
We solve these potential conflicts by applying hard constraints to the robot. Hereafter, we show the details.
For each time step t,
check whether a potential conflict exists at the current state;
if a conflict exists, directly forbid the robot from occupying v at time step t by constraining the corresponding action;
repeat until no potential conflicts exist, and update x.
Finally, we check for goal satisfaction (i.e., whether the updated path still reaches the goal) and prune any updated path x that fails this condition from the path set.
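The conflict test above can be sketched for a timed path checked against the hard-constraint set; reading the edge conflict as a swap, where the robot and a likely human occupation exchange endpoints between t and t+1, is our interpretation of "traversing the same edge":

```python
def find_conflicts(path, hard):
    """Detect potential conflicts of a timed path against hard constraints.
    `path` is a list of vertices indexed by time step t; `hard` is a set of
    forbidden (vertex, t) pairs. Returns (vertex_conflicts, edge_conflicts)."""
    vertex_conflicts = [(v, t) for t, v in enumerate(path) if (v, t) in hard]
    edge_conflicts = []
    for t in range(len(path) - 1):
        # Swap pattern: the position ahead is risky now and the current
        # position is risky next step, i.e. the same edge in both directions.
        if (path[t + 1], t) in hard and (path[t], t + 1) in hard:
            edge_conflicts.append(((path[t], path[t + 1]), t))
    return vertex_conflicts, edge_conflicts
```

A path with no conflicts of either kind passes the hard-constraint check; otherwise the planner constrains the offending actions and re-checks, as described above.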
4.4. Safety-First Optimization: Soft Constraint-Based Risk Minimization
Hard constraints prevent the robot from entering areas with a high risk of conflict. However, due to unknown human motions, not all constraints can be enforced as hard constraints, as this could result in no path being found. Therefore, we model the low-conflict-risk areas as soft constraints, as shown in Figure 7.
At the current stage, each path x in the multiple diverse path set avoids the hard constraints. Based on the soft constraints, we define a risk function as the accumulated soft-constraint risk along a path. Based on the risk level associated with each x, we can obtain the paths with minimum risk. Hence, we obtain a set of paths selected under the safety-first policy. Finally, following the safety-first and shortest-second policies, we consider the length of each path and generate the optimal path as the shortest one among the minimum-risk paths, where the length of a path is computed from the start state to the goal.
We summarize how an optimal safe path is generated based on the hard and soft constraints in Algorithm 3.
Algorithm 3: Conflict resolution and optimization based on hard and soft constraints.
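The safety-first, shortest-second selection can be sketched as a two-stage filter; summing the soft-constraint risks along a path as its risk function is our reading of the text:

```python
def path_risk(path, soft):
    """Accumulated soft-constraint risk of a timed path; `soft` maps
    (vertex, t) -> risk value."""
    return sum(soft.get((v, t), 0.0) for t, v in enumerate(path))

def select_optimal(paths, soft):
    """Safety first: keep the minimum-risk paths; shortest second: among
    those, return the shortest one."""
    if not paths:
        return None
    min_risk = min(path_risk(p, soft) for p in paths)
    safest = [p for p in paths if path_risk(p, soft) == min_risk]
    return min(safest, key=len)
```

This matches the behavior discussed in Section 5.2, where the planner prefers a slightly longer but lower-risk trajectory over the shortest one.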
5. Results and Discussions
In this paper, the main scope is to verify the effectiveness of the proposed approach in reducing conflicts while maintaining a high task success rate. For a fair comparison, all methods are evaluated under the same motion model. At the current stage, we use a naive motion model, which is simple but sufficient for algorithm verification. In the future, the motion model can be exchanged for a more realistic one, as the proposed algorithm is compatible with different motion models.
A series of simulations are conducted to verify the proposed method, and a comparative analysis is performed with established conventional methods: A* represents search-based methods, RRT represents sampling-based methods, and MDP and DDQN represent learning-based (RL and DRL) methods. We verify our algorithm under the following scenarios:
The parameters are set as in
Table 2. To evaluate the quality of each path, we use a reward scheme with the reward design as follows:
Goal reward: a positive reward if the robot reaches the goal.
Step cost: a small cost for each movement step of the robot.
Conflict penalty: a negative reward if the robot collides with a human.
We run 100 simulations for each scenario and evaluate the performance by the average conflict number, task success rate, and path reward. The simulations are performed with a device with an Intel Core i9-10900 CPU running at 2.80 GHz and 32 GB of memory. The simulations are conducted in a custom Python (version 3.12.7)-based software environment.
5.1. Results on Scenario 1: Investigation on λ—How Hard and Soft Constraints Affect the Algorithm’s Performance
Figure 9 illustrates the restrictiveness of hard and soft constraints under different values of λ.
Figure 10 presents the experimental results of analyzing parameter λ in a grid-map environment with moving humans. We resolve conflicts by employing both hard and soft constraints, which are defined based on parameter λ and the mean risk value. Vertices whose risk meets the λ-scaled mean threshold are assigned to the hard constraint set, representing hazardous regions that the robot must avoid. As λ decreases, more vertices are incorporated into the hard constraint set, making it increasingly restrictive (as illustrated in Figure 9). In the meantime, the conflict number also decreases, as shown in Figure 10. In addition, a smaller λ narrows the feasible region. Consequently, an excessively small λ may lead to the absence of a valid solution. As shown in Figure 10, no feasible path is found when λ is below a certain value.
Table 2.
Parameter setting.

| Parameter | Value |
|---|---|
| Simulation times of humans | 2000 |
| Probability of DQSSP | 0.9 |
| Probability of goal bias | 0.1 |
| Scale parameter λ | see Section 5.1 |
When λ increases, the role of soft constraints becomes more significant. Unlike hard constraints, soft constraints do not prohibit the robot from visiting high-risk vertices; instead, they assess the overall risk level of a path and select the path with the minimum risk. A major limitation of soft constraints is that they cannot provide a safety guarantee, and the generated path may still visit vertices with high risk values. This explains why the conflict number increases with λ, as illustrated in Figure 10. In the meantime, the task success rate decreases. As illustrated in Figure 10, the reward increases as λ becomes larger. The reason is that a higher λ relaxes the constraints, enabling the robot to take shorter paths, whereas a smaller λ forces the robot to detour around hard-constraint vertices, resulting in longer paths. As shown in Figure 10, the hard constraints have almost no effect when λ is large: the reward, task success rate, and conflict number are then the same as without using hard constraints, shown as “wo” in Figure 10.
In this paper, we employ both hard and soft constraints: hard constraints provide safety guarantees by prohibiting access to high-risk areas, while soft constraints optimize the overall risk level of the path. By comprehensively considering the task success rate and conflict number shown in Figure 10, an intermediate value of λ is a good choice, offering a higher task success rate and fewer conflicts; this value is used in the following simulations.
5.2. Results on Scenario 2: Different Maps, 10 × 20, 20 × 40, and 40 × 80
In scenario 2, three different sizes of warehouse grid maps (
Figure 8) are used for the verification of our proposed method. The performance evaluation compares the proposed method with four baselines (A*, RRT, MDP, and DDQN) across three key metrics: conflict number, task success rate, and reward.
Table 3 and
Figure 11 and
Figure 12 summarize the comparison results. The performance of the proposed method is comprehensively evaluated against the four baseline algorithms.
The simulation results demonstrate that the proposed method outperforms the other four methods across the , , and warehouse grid maps. As observed, the proposed approach consistently achieves the lowest conflict number and the highest task success rate under all experimental settings. Specifically, in the map, the proposed method achieves an average conflict number of , which is significantly lower than those of other methods: A* (0.75), RRT (0.71), MDP (0.55), and DDQN (0.58). Meanwhile, the task success rate reaches , indicating nearly perfect completion performance. Similar trends are observed in larger maps ( and ), where the proposed method maintains a zero-conflict rate and a success rate of , demonstrating superior robustness and scalability.
In contrast, the traditional search-based method A* and the sampling-based method RRT exhibit frequent conflicts, reflecting their limited adaptability in dynamic environments. The learning-based DDQN shows unstable performance with fluctuating rewards, especially as the map size increases, due to the sparse reward. The MDP approach performs substantially better; however, its performance still falls far short of the proposed method. As the map size increases, the environment becomes sparser, and the collision risk between the robot and humans becomes increasingly diluted. This explains why the average conflict number decreases with the map size.
Note that DDQN achieves an average conflict number of 0.01 in the largest map; however, it performs the worst among all methods, as the robot fails to reach the goal position in all 100 simulations, while MDP achieves a much higher success rate. Several factors contribute to this. One is the sparse reward. Another is that the growing state space makes it impossible for the DDQN agent to sufficiently explore or approximate the value function over all possible configurations. In comparison, the MDP solver exploits the known transition structure and computes globally optimal values via explicit value iteration, without the need for exhaustive sampling. Moreover, DDQN relies on neural function approximation, which introduces estimation bias and instability in high-dimensional sparse-reward settings. When the agent encounters unvisited or rarely sampled states (common in large maps), its Q-value predictions become unreliable, leading to suboptimal or inconsistent policies. In contrast, MDP methods guarantee convergence to the optimal policy given an accurate transition model, ensuring consistent performance. Furthermore, in dynamic environments where human obstacles move stochastically, the system exhibits non-stationary transitions. Model-free algorithms like DDQN tend to overfit past experiences and fail to adapt to the shifting transition probabilities, whereas MDP frameworks can explicitly encode such stochasticity within the transition model and maintain robustness under varying human behaviors.
Figure 11d shows the conflict numbers at each time step over 100 simulations. Most of the conflicts in the four baseline methods occur within a middle range of time steps. In the proposed approach, no conflicts occur during that range, which demonstrates the efficiency of the proposed SOMP planner in resolving conflicts with humans based on the hard and soft constraints.
Overall, these results validate that the proposed method effectively balances safety and efficiency, significantly outperforming benchmark algorithms across varying map complexities. The findings confirm that integrating risk-aware constraints into the planning process leads to more reliable and globally consistent navigation behavior. This also further confirms our concept that we can solve conflicts at the planning level instead of continuously replanning.
Figure 13 illustrates the process of the proposed SOMP method. Initially, eight diverse paths were generated (top figure). These were then processed by the conflict solver, in which the dangerous areas in the hard constraint set were avoided and the path set was reduced to four paths (middle figure). Subsequently, the safety-first optimization criterion selected a single optimal path, as shown in the bottom figure. As shown in
Figure 13, the proposed method did not prioritize the shortest path but instead generated a safer trajectory that directed the robot upward, away from regions occupied by humans. The dashed lines depict representative trajectories of the human agents.
5.3. Results on Scenario 3: Increasing Human Numbers K = 4, 6, 8, 10, 15
In this scenario, we evaluate the performance of the proposed method in a more complex environment by increasing the number of humans K. The simulation environment is a grid world involving 4 to 15 humans, as shown in Figure 8c.
Table 4 and Figure 14 summarize the simulation results. In this scenario, DDQN performs poorly and completes no task at all.
When K = 4, the proposed approach completely eliminates conflicts and achieves a perfect task success rate, outperforming all baseline methods (A*, RRT, MDP), which exhibit non-zero conflict counts and lower success rates. Although the proposed method yields a slightly lower reward in this case, its ability to ensure full task completion without any conflict highlights its effectiveness in sparse scenarios.
As the number of humans increases to 6, 8, and 10, the proposed method continues to maintain near-zero conflict counts and high task success rates (ranging from 0.93 to 0.94), while all baselines show a clear decline in performance, with higher conflict numbers and reduced success rates. Notably, DDQN fails completely in these settings, achieving no successful tasks.
In the most crowded setting, with K = 15, the proposed method still achieves a lower conflict number (0.40 ± 0.65) and a higher task success rate (0.67 ± 0.47) than the best-performing baseline. Moreover, it attains a superior reward (−0.05 ± 0.11), indicating better overall performance in dense environments where the other methods struggle. Compared with A*, RRT, and MDP, the proposed method reduces conflicts by 77.8%, 76.6%, and 71.4%, and improves the task success rate by 168.0%, 109.4%, and 91.4%, respectively.
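The relative figures above are plain ratio computations: a conflict reduction relative to a baseline is (c_baseline - c_proposed)/c_baseline, and a success-rate improvement is the analogous ratio. A quick sketch, where the baseline values 1.80 and 0.25 are back-solved for illustration only and are not taken from Table 4:

```python
def reduction(baseline, proposed):
    """Relative reduction, e.g. in the mean conflict number."""
    return (baseline - proposed) / baseline

def improvement(baseline, proposed):
    """Relative improvement, e.g. in the mean task success rate."""
    return (proposed - baseline) / baseline

# The proposed method's results in the most crowded setting (0.40 conflicts,
# 0.67 success rate) against illustrative baseline values chosen to
# reproduce the reported ratios:
print(f"{reduction(1.80, 0.40):.1%}")    # → 77.8%
print(f"{improvement(0.25, 0.67):.1%}")  # → 168.0%
```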
Figure 14 shows the trends of the conflict number, task success rate, and reward as the number of humans increases. Figure 15 details the conflicts that occur at each time step.
In summary, as the number of humans increases, the environment becomes increasingly crowded and complex, which increases the number of conflicts (Figure 14a) and decreases the task success rate (Figure 14b). The simulation results show that the proposed method exhibits strong scalability and robustness across varying human densities, significantly reducing conflicts and maintaining high task completion rates, especially in sparse to moderately crowded scenarios. Its performance remains competitive even under crowded conditions, where the conventional methods degrade notably.
The simulation results show that the proposed algorithm is more robust to environmental changes. This robustness mainly comes from the diverse path generator, which explores varied paths across the entire environment. This diversity provides robustness to uncertainty, although it also increases the computational cost. The hard-constraint-based conflict solver guarantees that the robot never visits dangerous areas, and the soft-constraint-based safety optimization selects an optimal path among the remaining candidates.
In this paper, we prioritize finding a safe path over minimizing computation time. Table 5 reports the computation times: the cost of the proposed method is higher than that of A* and RRT, and it is affected by both the map size and the number of humans in the environment. As the map size increases, the computation time grows, mainly because the algorithm needs more time to generate multiple diverse paths. Similarly, as the number of humans increases, the computation time also rises due to the additional effort required for conflict resolution: more humans increase the likelihood of human–robot conflicts and cause more vertices to be treated as hard constraints, so the algorithm needs more time to resolve the additional potential conflicts.
5.4. Limitations and Future Studies
At the current stage, our focus is primarily on resolving conflicts at the planning level, and several aspects remain open. First, although the proposed method substantially reduces the number of conflicts, it does not provide a theoretical safety guarantee, which is the key barrier to deployment in real-world environments. Second, a more realistic human motion model should be investigated: the current model is sufficient for algorithm verification but not for real-world applications. Although human–robot collaboration has been widely studied, many companies and logistics centers still restrict close human–robot co-working in warehouses, largely because no existing algorithm can simultaneously provide rigorous safety guarantees and maintain high operational efficiency. Moreover, in human–robot co-working scenarios in logistics and warehousing, the potential effects of robots on humans’ psychological perceptions and behavioral patterns remain insufficiently understood and warrant further investigation.
In the future, we will refine the proposed method to further enhance path safety and accelerate the path search. Since the proposed algorithm is compatible with different human motion models, we will extend our validation to more realistic human motion models and more complex environments.