Article

Collision-Free Path Planning in Dynamic Environment Using High-Speed Skeleton Tracking and Geometry-Informed Potential Field Method

by Yuki Kawawaki 1,*, Kenichi Murakami 2 and Yuji Yamakawa 3
1 Graduate School of Engineering, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan
2 Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan
3 Interfaculty Initiative in Information Studies, The University of Tokyo, Tokyo 153-8505, Japan
* Author to whom correspondence should be addressed.
Robotics 2025, 14(5), 65; https://doi.org/10.3390/robotics14050065
Submission received: 28 March 2025 / Revised: 30 April 2025 / Accepted: 14 May 2025 / Published: 17 May 2025
(This article belongs to the Special Issue Visual Servoing-Based Robotic Manipulation)

Abstract:
In recent years, the realization of a society in which humans and robots coexist has become highly anticipated. As a result, robots are expected to exhibit versatility regardless of their operating environments, along with high responsiveness, to ensure safety and enable dynamic task execution. To meet these demands, we design a comprehensive system composed of two primary components: high-speed skeleton tracking and path planning. For tracking, we implement a high-speed skeleton tracking method that combines deep learning-based detection with optical flow-based motion extraction. In addition, we introduce a dynamic search area adjustment technique that focuses on the target joint to extract the desired motion more accurately. For path planning, we propose a high-speed, geometry-informed potential field model that addresses four key challenges: (P1) avoiding local minima, (P2) suppressing oscillations, (P3) ensuring adaptability to dynamic environments, and (P4) handling obstacles with arbitrary 3D shapes. We validated the effectiveness of our high-frequency feedback control and the proposed system through a series of simulations and real-world collision-free path planning experiments. Our high-speed skeleton tracking operates at 250 Hz, which is eight times faster than conventional deep learning-based methods, and our path planning method runs at over 10,000 Hz. The proposed system offers both versatility across different working environments and low latencies. Therefore, we hope that it will contribute to a foundational motion generation framework for human–robot collaboration (HRC), applicable to a wide range of downstream tasks while ensuring safety in dynamic environments.

1. Introduction

In recent years, the decline in the total population and labor shortages due to aging have become critical issues. To maintain and improve economic standards with a limited workforce, it is essential to enhance work efficiency. Furthermore, Industry 5.0 increasingly requires robots to collaborate more closely with humans [1]. As a result, the realization of a society where humans and robots coexist is highly anticipated. This will broaden the range of robotic applications, and robots must handle a wide variety of tasks, including not only tasks with stationary objects but also dynamic ones [2]. Additionally, scenarios where humans and robots work together in close proximity are anticipated. In such cases, it is crucial for robots to ensure safety by performing evasive behaviors in response to sudden human motions. Thus, robots are required to possess versatility in terms of their operating environment, as well as the ability to operate with low latency to ensure safety.
Toward developing such robot systems, human–robot collaboration (HRC) has been studied with various approaches to enhance versatility and safety. Ajoudani et al. highlighted the importance of HRC in industrial settings, emphasizing the need for adaptive and safe robotic systems [3]. HRC spans a wide range of applications, including cooperative assembly and disassembly tasks [4,5,6], handovers [7], dynamic peg-in-hole operations [8], deformable object manipulation [9,10,11], and human–robot teamwork [4,12].
These days, advances in artificial intelligence (AI) have significantly contributed to enhancing versatility in HRC systems. The integration of AI in robotics has enabled more adaptive and intelligent interactions between humans and robots [7]. Rosenberger et al. [13] realized human-to-robot handovers using a deep learning-based pipeline from object and human pose detection to motion generation. This system supports various objects and workspaces through a generalizable approach; however, the high computational cost of deep learning often results in processing delays. To mitigate such delays, Wang et al. [6] proposed a proactive human action recognition method that enables efficient assembly. This approach facilitates data-efficient adaptation across domains. Although it retains versatility in domain adaptation, it remains sensitive to training data and real-world uncertainties. Due to processing delays and sensitivity to situational variations, deep learning-based systems are still limited to static tasks and specific environments.
To expand robotic manipulability, Zhou et al. [9] demonstrated the real-time manipulation of complex deformable linear objects by leveraging low-dimensional topological features encoded in a latent space. However, this method was validated only for 2D motion and is limited to specific object types. Alternatively, high-speed cameras and processing techniques have recently gained attention for achieving dynamic manipulation [14]. Yamakawa et al. [8] accomplished a peg-in-hole task involving human–robot collaboration by tracking board poses at 1 kHz using reflective markers. Nevertheless, the system is sensitive to lighting conditions, which limits its applicability in more versatile environments.
To ensure safety, collision-free path planning is essential. In a deep learning-based system [13], the human body and hands are segmented, and a path is planned relative to the target pose. However, as is common with deep learning-based approaches, this system suffers from latency. On the other hand, control barrier function-based methods [15] and potential field-based methods [16] have achieved good real-time performance but still face challenges in general settings, as discussed in Section 2.
Thus, the trade-off between versatility and real-time performance remains a significant challenge in the field of HRC [17]. Deep learning-based methods enhance versatility but usually suffer from latency and are limited to static environments. High-speed camera-based methods are promising for enhancing the robot's flexibility through precise detection and control, but they depend on the environment.
Therefore, this work aims to resolve the trade-off between versatility and high responsiveness in HRC by combining both deep learning and high-speed processing methods, as illustrated in Figure 1a. We address four challenges related to versatility that have not been comprehensively resolved in previous work: (P1) avoiding local minima, (P2) suppressing oscillations, (P3) adapting to dynamic environments, and (P4) handling obstacles with arbitrary 3D shapes. To tackle these challenges, we propose a holistic high-speed framework that integrates high-speed human skeleton tracking with collision-free path planning, as depicted in Figure 1b,c. For skeleton tracking, we combine off-the-shelf deep learning-based detection with high-speed tracking using dense optical flow and dynamically adjust the search area to extract the target joint motion. For collision-free path planning, we propose three key techniques based on the potential field method:
  • Regional division of the potential field into free, precautionary, and repulsive zones to suppress oscillations (P2).
  • Tangential motion to suppress oscillations and enhance adaptability to dynamic environments (P2 and P3).
  • Spherical virtual obstacles to avoid local minima and handle arbitrary 3D obstacles (P1 and P4).
In addition, we introduce lightweight methods for minimum distance calculations between two line segments and present techniques for ensuring asymptotic stability and avoiding singularities to achieve efficient robot motion generation.
In this paper, we handle simple tasks, such as reaching static targets and following dynamic targets in close proximity to humans, demonstrating a fundamental system that can be applied to various downstream tasks while ensuring safety in dynamic environments. We validated the effectiveness of the proposed system through both simulation and real-world collision-free path planning experiments.
Our contributions are as follows:
  • Fundamental and comprehensive system design from tracking to robot control, with versatility in terms of working environments and high responsiveness.
  • High-speed skeleton tracking combining deep learning-based detection and optical flow-based motion extraction with dynamic search area adjustment.
  • High-speed geometry-informed potential field path planning method applicable to dynamic environments, complex obstacles, and human skeletons.
This paper is structured as follows: Section 2 provides an overview of recent HRC research, tracking methods, and path planning techniques. Section 3 describes high-speed skeleton tracking and dynamic search area adjustment. Section 4 explains the path planning approach for avoiding collisions. Section 5 analyzes the effectiveness of high-frequency feedback control and each proposed method through the simulation of collision-free path planning. Section 6 evaluates our system through real-world collision-free path planning experiments. Finally, the paper concludes with Section 7.

2. Related Work

In this section, we first review previous research in HRC, highlighting its achievements and remaining challenges, and clarify the role of our research. Then, we introduce the key techniques for constructing HRC systems, including human pose estimation and path planning methods.

2.1. Human–Robot Collaboration (HRC)

The field of HRC is rapidly evolving, with researchers exploring diverse approaches to create more intelligent, safe, and efficient collaborative systems. The integration of advanced sensing, prediction, and learning techniques is paving the way for more natural and effective human–robot collaboration across various applications.
Recent HRC research encompasses a wide range of tasks, from static to dynamic scenarios. Some studies focus on static tasks, such as contactless object handovers [13,18,19], cup-catching [20], meat cutting [21], and furniture assembly [22]. These methods employ deep learning to detect objects [13], predict human motion trajectories [20], anticipate actions [6,19,22], estimate human intentions [23,24], and perform path planning using tools such as GG-CNN (Generative Grasping CNN) [25] and Contact-GraspNet [26]. In addition to deep learning-based approaches, optimization-based methods have been proposed to improve task performance in assembly, disassembly, and delivery [4,5,11]. Furthermore, multimodal systems—rather than relying solely on visual information—hold promise for enabling more natural human–robot cooperation. Several studies combine vision with other sensing modalities, such as force/torque sensors for physical interaction [27], or IMUs for contact detection and motion recognition [21,28]. Language-based interfaces have also been explored to facilitate more intuitive communication [23]. Although recent research has led to more versatile HRC systems in terms of working environments, manipulated objects, and tasks, deep learning-based methods still struggle to handle dynamic tasks in real-time due to their high computational demands.
Regarding research for dynamic scenarios, tasks such as ball position control [27], peg-in-hole operations [8,29], deformable object manipulation [9,10,11], and dressing assistance [30] have been investigated. However, some approaches prioritized measuring the state of shared objects over tracking human motion in order to reduce computational costs [8,9,27]. This design choice limits their adaptability in environments where robots must simultaneously execute primary tasks and avoid collisions with humans. In contrast, Zhang et al. proposed a more versatile real-time system for dressing assistance by employing a hierarchical multi-task control framework with Gaussian processes to optimize robot trajectories [30]. While the system achieved real-time performance, its 30 Hz processing speed remained inadequate for highly dynamic environments, as discussed in Section 5.1.
Thus, achieving both high-speed human pose estimation and responsive path planning is essential—an issue that this work aims to address.

2.2. Human Pose Estimation

For human pose estimation, segmentation and skeleton tracking methods are commonly used. Segmentation methods, such as Mask R-CNN [31], RefineNet [32], and the Pyramid Scene Parsing Network [33], provide detailed estimations of human shapes and features but are computationally demanding, making them unsuitable for real-time applications. In contrast, deep learning-based detection models like OpenPose [34] and YOLO-pose [35] offer more practical, lower-latency solutions. However, these models may still struggle in dynamic environments, such as proximal human–robot collaboration, where rapid response is critical. Yamakawa et al. achieved 1 kHz fingertip tracking by combining CNN-based detection with image moment-based processing, but its sensitivity to lighting conditions limits its applicability in general settings [36]. In this paper, to resolve this trade-off between versatility and low latency, we propose a skeleton tracking method, which is described in Section 3.

2.3. Path Planning

In the context of robot path planning and control, Villani et al. [7] provided a comprehensive review of safety-critical issues in HRC scenarios. They highlighted the importance of reliable sensing and prediction algorithms for anticipating human movements and intentions, enabling robots to respond appropriately in dynamic environments. Ensuring safety remains a key challenge in HRC scenarios, with some studies focusing on safety monitoring systems [37] and others incorporating human ergonomic considerations into task planning [28].
To ensure human safety, collision-free path planning has been extensively studied. Deep learning-based approaches [38] and search-based methods [39] have been explored, but they often face challenges related to high computational costs.
A promising alternative involves control barrier function-based methods, which ensure that the system's trajectories remain within a safe set [15,40], and potential field methods, which guide the robot toward its target while repelling it from obstacles. In this paper, we adopted a potential field-based method due to its transparency and intuitiveness. However, conventional potential field-based methods are prone to falling into local minima and exhibiting oscillatory behaviors. Various solutions have been proposed to address these issues by modifying the potential field [41], introducing virtual obstacles [42], and designing relay points to avoid oscillatory motion around the repulsive field [43]. Furthermore, to enhance adaptability in dynamic environments, Xia et al. incorporated obstacle positions, velocities, the target location, and the robot's position into the potential field [16]. Although these works improved collision-free path planning, they only consider simple spherical obstacles, limiting their applicability to real-world scenarios.
In human–robot collaborative scenarios, the robot must navigate around obstacles with complex shapes. The method proposed in [44] considers entire obstacle shapes for avoidance in 2D, but its high computational cost makes it unsuitable for dynamic environments. To the best of our knowledge, no existing path planning method effectively addresses all four of the following challenges: (P1) avoiding local minima, (P2) suppressing oscillations, (P3) adapting to dynamic environments, and (P4) handling obstacles with arbitrary 3D shapes. Therefore, this work aims to develop a path planning method that overcomes these challenges while maintaining high-speed processing performance.

3. High-Speed Skeleton Tracking

Here, we describe hybrid tracking, dynamic search area adjustment, and solutions for short-term occlusions.

3.1. Hybrid Tracking

As shown in Figure 2a, we adopt a hybrid approach that combines off-the-shelf deep learning-based detection with high-speed tracking using dense optical flow, implemented via the Farnebäck method [45].
As depicted in Figure 2a,b, to continuously and accurately track joints, we track each joint using optical flow at a high frequency and update the search area shape when deep learning-based detection results become available.
When exchanging tracker information between the deep learning-based and optical flow-based results, the optical flow-tracked position is overwritten only if it deviates by more than 20 pixels from the latest deep learning-based detection; otherwise, the optical flow position is retained. This approach is particularly effective when the human moves quickly, as deep learning-based detection requires longer processing times, leading to positional delays and reduced tracking accuracy.
Furthermore, as depicted in the blue box in Figure 2a, the tracking thread processes optical flow calculations for each joint in parallel to achieve the high-speed tracking of multiple joints. Our system focuses on six key joints—shoulders, elbows, and wrists on both sides—since arm movements are typically the most dynamic and rapid. The positions of the human’s head and feet are calculated as the midpoints between the shoulders and between the feet, respectively.
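To make the per-joint tracking step and the fusion rule concrete, the following is a minimal Python sketch, assuming grayscale frames and per-joint search areas. The helper names are illustrative, and using the mean dense flow over the search area as the joint displacement is our assumption, not necessarily the exact implementation.

```python
import cv2
import numpy as np

REPLACE_THRESHOLD_PX = 20  # gate between flow-tracked and detected positions

def advance_joint(prev_gray, curr_gray, joint_pos, area):
    """Advance one joint by the mean Farneback flow inside its search area."""
    left, right, top, bottom = area
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray[top:bottom, left:right], curr_gray[top:bottom, left:right],
        None, pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return joint_pos + flow.reshape(-1, 2).mean(axis=0)  # mean (dx, dy) over the area

def fuse(flow_pos, detection_pos):
    """Keep the high-rate flow estimate unless it drifts from the latest detection."""
    if detection_pos is not None and \
            np.linalg.norm(flow_pos - detection_pos) > REPLACE_THRESHOLD_PX:
        return detection_pos  # re-anchor on the slower deep learning result
    return flow_pos
```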

3.2. Dynamic Search Area Adjustment

In our tracking method, defining appropriate regions of interest (ROIs) for each joint across varying scenarios is inherently challenging. To address this, we employ motion-based optical flow rather than template-based tracking. In optical flow tracking, correctly configuring the search area is critical for accurately capturing the target and extracting its motion. Therefore, the detection thread determines appropriate search areas by considering the spatial relationships between joints.
Our method consists of the following steps:
  • Step 1: Calculate the minimum distance d_min for each joint.
  • Step 2: Determine the half-diagonal length r̃, as defined in Equations (1) and (2).
  • Step 3: Define the shape of the search area (Equation (4)).
  • Step 4: Compute the four corner coordinates of the search area (Equation (5)).
We apply this procedure to the arm skeleton model, with the resulting search areas illustrated in Figure 3.
First, we explain how to determine the search area size in Step 2. Based on the value of d_min obtained in Step 1, the half-diagonal length r̃ is calculated using Equation (1):
$$\tilde{r} = \eta \cdot d_{\min} \tag{1}$$
Here, η denotes a dynamic scaling coefficient.
To compute η, we first define three parameters: α, β, and γ. The parameter α represents the minimum allowable value for r̃, while β is the upper bound for d_min, ensuring d_min ≤ β. The parameter γ determines the maximum allowable value for r̃, which is attained at d_min = β. The value of η is then computed using Equation (2):
$$\eta = \begin{cases} \alpha / d_{\min} & \text{if } d_{\min} \le \alpha, \\ 1 - (1 - \gamma)\dfrac{d_{\min} - \alpha}{\beta - \alpha} & \text{if } \alpha < d_{\min} \le \beta. \end{cases} \tag{2}$$
To ensure that r̃ reaches its maximum value when d_min = β, the parameters must satisfy the following constraint:
$$\gamma \ge \frac{\beta}{2\beta - \alpha} \tag{3}$$
In this study, we set α = 20√2 pixels, β = 60√2 pixels, and γ = 0.65.
Next, we explain how to determine the shape of the search area in Step 3. When estimating joint motion, the movement of the link between a joint and its parent joint is essential. Therefore, the angle of the search area's diagonal is calculated based on the vector to the parent joint, denoted as v⃗. For example, in the case of the right elbow joint, v⃗ is the vector from the right elbow to the right shoulder.
Using v⃗ and r̃, the diagonal's angle is determined according to Equation (4):
$$\begin{aligned} &\text{If } a \le |\nabla(\vec{v})| \le 1/a: && [x_1, y_1]^T = \tilde{r} \cdot \frac{\vec{v}}{\|\vec{v}\|} + \mathbf{x}_j, \quad [x_2, y_2]^T = -\tilde{r} \cdot \frac{\vec{v}}{\|\vec{v}\|} + \mathbf{x}_j \\ &\text{Else}: && [x_1, y_1]^T = \tilde{r} \cdot [\cos(\pi/4), \sin(\pi/4)]^T + \mathbf{x}_j, \quad [x_2, y_2]^T = -\tilde{r} \cdot [\cos(\pi/4), \sin(\pi/4)]^T + \mathbf{x}_j \end{aligned} \tag{4}$$
Here, a is a threshold that bounds the gradient (slope) of the diagonal orientation, set to 1/2; ∇(v⃗) denotes the gradient of v⃗; and x_j represents the 2D position of the joint.
Finally, we calculate the four corners of the search area (left, right, top, and bottom) using Equation (5). An example of the search area surrounding a joint is illustrated in Figure 3.
$$\mathrm{left} = \min(x_1, x_2), \quad \mathrm{right} = \max(x_1, x_2), \quad \mathrm{top} = \min(y_1, y_2), \quad \mathrm{bottom} = \max(y_1, y_2) \tag{5}$$
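The following Python sketch summarizes Steps 1-4 under the parameter reading above (α and β in pixels, γ dimensionless); the function and variable names are illustrative, not taken from the original implementation.

```python
import numpy as np

ALPHA = 20 * np.sqrt(2)  # minimum half-diagonal length (pixels)
BETA = 60 * np.sqrt(2)   # upper bound on d_min (pixels)
GAMMA = 0.65             # scaling at d_min = beta; satisfies gamma >= beta/(2*beta - alpha)
A = 0.5                  # slope threshold for the diagonal orientation

def search_area(joint, parent, d_min):
    d = min(d_min, BETA)                       # enforce d_min <= beta
    # Equations (1)-(2): half-diagonal length via the dynamic scaling coefficient
    eta = ALPHA / d if d <= ALPHA else 1 - (1 - GAMMA) * (d - ALPHA) / (BETA - ALPHA)
    r = eta * d
    # Equation (4): align the diagonal with the parent link when its slope is moderate
    v = parent - joint
    slope = abs(v[1] / v[0]) if v[0] != 0 else np.inf
    if A <= slope <= 1 / A:
        diag = v / np.linalg.norm(v)
    else:
        diag = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
    p1, p2 = joint + r * diag, joint - r * diag
    # Equation (5): axis-aligned corners of the search area
    return (min(p1[0], p2[0]), max(p1[0], p2[0]),   # left, right
            min(p1[1], p2[1]), max(p1[1], p2[1]))   # top, bottom
```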

3.3. Handling Short-Term Occlusion

To ensure smooth tracking, we employ a Kalman filter with acceleration modeling and eliminate discontinuous data. To filter out such discontinuities, new data are accepted only if the positional difference remains within a dynamically adjusted threshold Δ_MAX, as defined in Equation (6). If the new data exceed this threshold, the predicted value from the Kalman filter is adopted instead.
$$\Delta_{\text{MAX}} = V_{\max} \cdot \Delta t \cdot \left(N_{\text{NotUpdate}}^{\text{KF}}\right)^2 \tag{6}$$
Here, V_max represents the admissible maximum velocity, set to 200 pixels/s; Δt denotes the time interval between detections; and N^KF_NotUpdate indicates the number of consecutive frames in which the Kalman filter is not updated. This formulation is based on the variance scaling rule V(aX) = a²·V(X), where N^KF_NotUpdate corresponds to the scaling factor a.
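A minimal sketch of this gating logic is shown below, assuming a Kalman filter object whose predict() returns the prior position estimate and whose update() ingests an accepted measurement; the counter convention (starting at 1 so the threshold is nonzero) is our assumption.

```python
import numpy as np

V_MAX = 200.0  # admissible maximum joint speed (pixels/s)

def gate_measurement(kf, z, dt, n_not_updated):
    """Accept new data only within the dynamic threshold of Equation (6);
    otherwise fall back to the Kalman prediction."""
    delta_max = V_MAX * dt * n_not_updated ** 2  # threshold grows with missed updates
    pred = kf.predict()                          # prior estimate of the joint position
    if np.linalg.norm(z - pred) <= delta_max:
        kf.update(z)                             # plausible data: correct the filter
        return z, 1                              # reset the missed-update counter
    return pred, n_not_updated + 1               # discontinuity: keep the prediction
```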

4. Collision-Free Path Planning

The block diagram of the proposed method is shown in Figure 4a. To achieve both versatility and high responsiveness, the proposed method performs path planning using a minimum distance calculation between two line segments and a potential field method informed by the geometry around the robot.
As shown in Table 1, this section presents methods that address four challenges: (P1) avoiding local minima, (P2) suppressing oscillations, (P3) adapting to dynamic situations, and (P4) handling arbitrary 3D-shaped obstacles. Section 4.1 describes the method for calculating the minimum distance between two line segments. Section 4.2 details the design of the potential field, which consists of three regions for suppressing oscillations and effectively avoiding obstacles:
  • Free zone: R_tan < d_min;
  • Precautionary zone: R_rep < d_min ≤ R_tan;
  • Repulsive zone: d_min ≤ R_rep.
Here, d_min denotes the minimum distance between the robot and the obstacle, R_tan represents the radius of the precautionary zone, and R_rep denotes the radius of the repulsive zone, with the condition R_tan > R_rep. Section 4.2 also introduces a method for setting the tangent vector v_tan to suppress oscillations and adapt to dynamic environments. By designing the direction of the tangent vector based on geometric relationships, the method reduces the number of hyperparameters while improving system transparency and adaptability compared to conventional approaches [16,43]. Section 4.3 presents a global path generation method for handling arbitrary 3D-shaped obstacles by introducing a spherical virtual obstacle to escape local minima. This method enables global path planning with a reduced computational load compared to conventional techniques [44]. Finally, Section 4.4 discusses asymptotic stability and introduces a method for avoiding singularities.

4.1. Minimum Distance Between Human and Robot

In our setup, with a focus on lightweight computation, we calculate the minimum distance between the human skeleton model and the robot link. As shown in Figure 5a, the minimum distance |ST| between two 3D line segments AB and CD is calculated for each of the following cases:
  • Case 1, m ∥ n: the segments are parallel and the problem reduces to 2D; the distance is given by the "else" branch of Equation (7).
  • Case 2, (m ∦ n) ∧ C_s ∧ C_t: the minimum distance is given by the first branch of Equation (7).
  • Case 3, (m ∦ n) ∧ (C̄_s ∨ C̄_t): find the feet of the perpendiculars from points A and B to segment CD and from points C and D to segment AB, and then take the minimum value that satisfies C_s ∧ C_t.
Here, m and n are the unit vectors of AB→ and CD→, respectively. The two conditions C_s and C_t are defined as C_s: 0 ≤ s ≤ |AB| and C_t: 0 ≤ t ≤ |CD|, meaning that S and T lie within line segments AB and CD, respectively. The derivations for cases 2 and 3 are described in the following paragraphs.
In case 2, Equation (7) is derived by applying the orthogonality conditions ST→ ⊥ m and ST→ ⊥ n:
$$\text{if } \mathbf{m} \nparallel \mathbf{n}: \quad s = \frac{\overrightarrow{AC} \cdot \mathbf{m} - (\overrightarrow{AC} \cdot \mathbf{n})(\mathbf{m} \cdot \mathbf{n})}{1 - (\mathbf{m} \cdot \mathbf{n})^2}, \quad t = \frac{(\overrightarrow{AC} \cdot \mathbf{m})(\mathbf{m} \cdot \mathbf{n}) - \overrightarrow{AC} \cdot \mathbf{n}}{1 - (\mathbf{m} \cdot \mathbf{n})^2}; \qquad \text{else}: \quad \overrightarrow{ST} = \overrightarrow{AC} - (\overrightarrow{AC} \cdot \mathbf{m})\mathbf{m} \tag{7}$$
In case 3, when S or T lies outside line segment AB or CD, at least one endpoint of the closest pair coincides with A, B, C, or D. The function f(s, t) = |ST|² = |AC→ + t·n − s·m|², which represents the squared distance between the two lines, has a unique stationary point when m ∦ n, and its Hessian ∇²f(s, t) is positive definite. Therefore, f(s, t) is a convex function, allowing a contour plot to be drawn, as shown in Figure 5b. As a result, finding the minimum distance between the two line segments reduces to evaluating the minimum of f(s, t) along the boundary associated with each endpoint: A, B, C, and D.
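For illustration, the following Python sketch implements the three cases with the common clamp-and-reproject formulation. Unlike the text, it parameterizes the segments with non-unit direction vectors and s, t ∈ [0, 1], which is algebraically equivalent, and it assumes non-degenerate segments.

```python
import numpy as np

def segment_distance(A, B, C, D, eps=1e-9):
    """Minimum distance between 3D segments AB and CD (cases 1-3 of Section 4.1)."""
    m, n, AC = B - A, D - C, C - A
    a, b, c = m @ m, m @ n, n @ n       # squared lengths and cross term
    d, e = m @ AC, n @ AC
    denom = a * c - b * b               # ~0 when AB and CD are parallel (case 1)
    # Case 2: unconstrained stationary point of f(s, t) = |AC + t*n - s*m|^2
    s = (c * d - b * e) / denom if denom > eps else 0.0
    # Case 3: clamp to the segments and re-project (feet of the perpendiculars)
    s = np.clip(s, 0.0, 1.0)
    t = np.clip((b * s - e) / c, 0.0, 1.0)
    s = np.clip((b * t + d) / a, 0.0, 1.0)
    S, T = A + s * m, C + t * n         # closest points on each segment
    return float(np.linalg.norm(T - S)), S, T
```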

4.2. Potential Field

As shown in Figure 4b, we design the motion of the robot's end-effector based on three regions (the free, precautionary, and repulsive zones) without complicating the potential functions compared to previous work [16,43]. Typically, attractive motion can hinder obstacle avoidance performance. Therefore, the proposed method avoids using attractive velocity in the precautionary and repulsive zones. Instead, we design the tangent motion v_tan to simultaneously achieve both target reaching and obstacle avoidance. The velocity of the end-effector is thus calculated using Equations (8)-(10).
$$\text{Free zone } (R_{\tan} < d_{\min}): \quad \mathbf{v} = \min(s_{\text{att}}, s_{\max}) \cdot \hat{\mathbf{v}}_{\text{att}} \tag{8}$$
$$\text{Precautionary zone } (R_{\text{rep}} < d_{\min} \le R_{\tan}): \quad \mathbf{v} = \begin{cases} \min(s_{\text{att}}, s_{\max}) \cdot \hat{\mathbf{v}}_{\tan} & \text{if } \hat{\mathbf{v}}_{\tan} = \hat{\mathbf{v}}_{\text{att}} \\ s_{\max} \cdot \hat{\mathbf{v}}_{\tan} & \text{else} \end{cases} \tag{9}$$
$$\text{Repulsive zone } (d_{\min} \le R_{\text{rep}}): \quad \mathbf{v} = \begin{cases} \min(s_{\text{att}}, s_{\max}) \cdot \hat{\mathbf{v}}_{\tan} & \text{if } \hat{\mathbf{v}}_{\tan} = \hat{\mathbf{v}}_{\text{att}} \\ s_{\max} \cdot \dfrac{\hat{\mathbf{v}}_{\tan} + \hat{\mathbf{v}}_{\text{rep}}}{\|\hat{\mathbf{v}}_{\tan} + \hat{\mathbf{v}}_{\text{rep}}\|} & \text{else} \end{cases} \tag{10}$$
Here, v̂_att, v̂_tan, and v̂_rep are unit vectors representing the attractive, tangent, and repulsive motions, respectively. The methods for calculating these vectors, along with the speeds s_att and s_max, are provided in the following subsections.
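The zone selection of Equations (8)-(10) can be written compactly as below. The sketch assumes the unit vectors and speeds are computed as in Sections 4.2.1-4.2.3; the default radii (R_tan = 0.35 m from the experiments in Section 6 and R_rep = 0.2 m as discussed in Section 5.5) are filled in for illustration.

```python
import numpy as np

def end_effector_velocity(d_min, v_att_hat, v_tan_hat, v_rep_hat,
                          s_att, s_max, R_tan=0.35, R_rep=0.2):
    """Select the commanded end-effector velocity by zone, Equations (8)-(10)."""
    if d_min > R_tan:                          # free zone: pure attraction
        return min(s_att, s_max) * v_att_hat
    aligned = np.allclose(v_tan_hat, v_att_hat)
    if d_min > R_rep:                          # precautionary zone: tangent motion
        return (min(s_att, s_max) if aligned else s_max) * v_tan_hat
    if aligned:                                # repulsive zone, target-directed case
        return min(s_att, s_max) * v_tan_hat
    v = v_tan_hat + v_rep_hat                  # blend tangent and repulsive directions
    return s_max * v / np.linalg.norm(v)
```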

4.2.1. Attractive Motion

The unit attractive vector v̂_att is defined as the vector pointing from the robot's end-effector to the target. The attractive speed s_att is defined by Equation (11) to ensure that the robot decelerates near the target point and avoids excessive speeds when far from the target:
$$s_{\text{att}} = \begin{cases} S_{\text{att}}^{\max} \cdot \left(1 - \cos\left(\dfrac{\pi}{2} \cdot \dfrac{d_{\text{rob2tar}}}{R_{\text{rob2tar}}}\right)\right) & \text{if } d_{\text{rob2tar}} \le R_{\text{rob2tar}} \\ S_{\text{att}}^{\max} & \text{else} \end{cases} \tag{11}$$
Here, d_rob2tar is the distance between the robot's end-effector and the target, S_att^max represents the maximum linear speed, which is set to 1 m/s, and R_rob2tar denotes the radius of the deceleration zone, which is set to 0.1 m.
For orientation control, we use rotation vectors and apply the same profile, where the maximum angular speed Ω_att^max is 6.28 rad/s and the radius of the deceleration zone is 1/6 rad. The orientation control of the robot's end-effector is computed as follows (see the sketch after this list):
  • Convert the rotation vector n_ee, representing the end-effector's orientation, into the rotation matrix R_ee.
  • Compute the relative rotation R_ee2goal from the rotation matrices R_ee and R_goal of the end-effector and target orientations, respectively.
  • Convert R_ee2goal into the rotation vector n_ee2goal.
  • Compute the control input for orientation using the same method as that for position control.
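A minimal sketch of these four steps, using SciPy's rotation utilities, is given below. Treating R_goal as a SciPy Rotation object and the composition order of the relative rotation are our assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

OMEGA_MAX = 6.28      # maximum angular speed (rad/s)
R_DECEL = 1.0 / 6.0   # radius of the angular deceleration zone (rad)

def angular_velocity_command(n_ee, R_goal):
    """Rotation-vector orientation control following the four steps above."""
    R_ee = R.from_rotvec(n_ee)          # step 1: rotation vector -> rotation matrix
    R_ee2goal = R_goal * R_ee.inv()     # step 2: relative rotation to the goal
    n_ee2goal = R_ee2goal.as_rotvec()   # step 3: back to a rotation vector
    angle = np.linalg.norm(n_ee2goal)
    if angle < 1e-9:
        return np.zeros(3)              # already at the goal orientation
    # step 4: the same cosine deceleration profile as Equation (11)
    speed = OMEGA_MAX * (1 - np.cos(np.pi / 2 * angle / R_DECEL)) \
        if angle <= R_DECEL else OMEGA_MAX
    return speed * n_ee2goal / angle
```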

4.2.2. Repulsive Motion

The unit repulsive vector v̂_rep is defined as the vector pointing from the closest point on the human skeleton to the closest point on the robot. The repulsive speed s_max is defined by Equation (12) to reduce the robot's speed when a human is nearby:
$$s_{\max} = \begin{cases} S_{\text{fast}} & \text{if } R_{\max} < d_{\min} \\ \dfrac{S_{\text{fast}} - S_{\text{middle}}}{R_{\max} - R_{\tan}} \cdot (d_{\min} - R_{\max}) + S_{\text{fast}} & \text{if } R_{\tan} \le d_{\min} \le R_{\max} \\ (S_{\text{middle}} - S_{\text{slow}}) \cdot \left(1 - \cos\left(\dfrac{\pi}{2} \cdot \dfrac{d_{\min} - R_{\text{rep}}}{R_{\tan} - R_{\text{rep}}}\right)\right) + S_{\text{slow}} & \text{if } R_{\text{rep}} \le d_{\min} < R_{\tan} \\ S_{\text{slow}} & \text{if } R_{\text{danger}} < d_{\min} < R_{\text{rep}} \\ 0 & \text{if } d_{\min} \le R_{\text{danger}} \end{cases} \tag{12}$$
Here, S_fast is 1 m/s, S_middle is 0.5 m/s, and S_slow is 0.25 m/s. R_max is set to 1 m, and R_danger represents the minimum admissible distance between the human and the robot. If d_min is smaller than R_danger, the robot is commanded to stop.
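Equation (12) is a continuous piecewise schedule (both blends meet at S_middle at d_min = R_tan); a direct transcription in Python follows, with R_tan and R_danger taken from the experiments in Section 6 and R_rep = 0.2 m as used in the simulations.

```python
import numpy as np

S_FAST, S_MIDDLE, S_SLOW = 1.0, 0.5, 0.25             # speed levels (m/s)
R_MAX, R_TAN, R_REP, R_DANGER = 1.0, 0.35, 0.2, 0.1   # zone radii (m)

def max_speed(d_min):
    """Piecewise speed cap of Equation (12): slow down as the human gets closer."""
    if d_min > R_MAX:
        return S_FAST
    if d_min >= R_TAN:   # linear blend from S_FAST down to S_MIDDLE
        return (S_FAST - S_MIDDLE) / (R_MAX - R_TAN) * (d_min - R_MAX) + S_FAST
    if d_min >= R_REP:   # smooth cosine blend from S_MIDDLE down to S_SLOW
        return (S_MIDDLE - S_SLOW) * (1 - np.cos(
            np.pi / 2 * (d_min - R_REP) / (R_TAN - R_REP))) + S_SLOW
    if d_min > R_DANGER:
        return S_SLOW
    return 0.0           # closer than R_danger: command the robot to stop
```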

4.2.3. Tangent Motion

To suppress oscillations and enable applications in dynamic environments, we describe the method for determining the tangent vector v_tan. As illustrated in Figure 6a,b, the tangent vector v_tan is determined based on a safety assessment that considers two factors: the positional relationship among the robot, the target, and the obstacle (Figure 6a), and the velocity of the obstacle, v_obs (Figure 6b). In Figure 6a, the radius of the circle surrounding the obstacle corresponds to the radius of the repulsive zone, R_rep. In Figure 6b, the positions of the robot and the obstacle are computed using the two points obtained from the minimum distance calculation described in Section 4.1.
The safety assessment is performed as follows. For the positional relationship, the situation is classified into six categories, as shown in Figure 6a, using inner product calculations. Regarding the obstacle's velocity v_obs, two conditions are defined based on the geometric configuration around the robot:
  • C_dynamic: dynamic condition, |v_obs| ≥ V_ref;
  • C_disturbance: the obstacle crosses the line segment between the robot and the target (the red region shown in Figure 6b).
Based on these two conditions, safety is evaluated as follows:
  • Danger: C_dynamic ∧ C_disturbance;
  • Safe: C̄_dynamic ∨ (C_dynamic ∧ C̄_disturbance).
Finally, according to Table 2, the tangent vector is selected from the following three types (a sketch of the safety assessment follows this list):
  • I (target-based): Points in the direction of the target.
  • II ( v obs -based): Points in the opposite direction of v obs .
  • III (target-directed): Points directly toward the target.
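As referenced above, here is a minimal sketch of the danger/safe classification; note that Safe = ¬(C_dynamic ∧ C_disturbance), so a single conjunction suffices. The value of V_ref and the Boolean input for the crossing test are assumptions, and the mapping from this result to tangent types I-III (Table 2) is not reproduced here.

```python
import numpy as np

V_REF = 0.3  # assumed reference speed for the dynamic condition (m/s)

def assess_safety(v_obs, crosses_robot_target_segment):
    """Evaluate the two conditions of Section 4.2.3 and classify the situation.

    crosses_robot_target_segment: True when the obstacle's motion crosses the
    line segment between the robot and the target (red region in Figure 6b).
    """
    c_dynamic = np.linalg.norm(v_obs) >= V_REF    # C_dynamic: |v_obs| >= V_ref
    c_disturbance = crosses_robot_target_segment  # C_disturbance
    # Danger: C_dynamic AND C_disturbance; Safe: everything else
    return "danger" if (c_dynamic and c_disturbance) else "safe"
```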

4.3. Virtual Obstacle Setting

To avoid the local minima caused by obstacles with complex 3D shapes, global path planning that considers the overall geometry is required. In this study, a virtual spherical obstacle is defined based on the relative positions of the human skeleton's center of mass, the robot, and the target, as expressed in Equation (13):
$$\overrightarrow{OB_v} = \begin{cases} \overrightarrow{OB_c} + r_{\text{human}} \dfrac{\overrightarrow{B_cR}}{\|\overrightarrow{B_cR}\|} & \text{if } \overrightarrow{B_cR} \cdot \overrightarrow{B_cG} < 0 \\ \overrightarrow{OB_c} & \text{if } \overrightarrow{B_cR} \cdot \overrightarrow{B_cG} \ge 0 \end{cases} \tag{13}$$
Here, OB_c→ denotes the position of the skeleton's center of mass B_c, B_cR→ is the direction vector from the center of mass to the robot, B_cG→ is the direction vector from the center of mass to the goal (target), and r_human represents the minimum distance between the center of the human joints and each joint.
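A direct transcription of Equation (13) in Python follows; the argument names are illustrative.

```python
import numpy as np

def virtual_obstacle(B_c, robot, goal, joints):
    """Spherical virtual obstacle center of Equation (13).

    B_c: center of mass of the skeleton; joints: iterable of joint positions.
    """
    r_human = min(np.linalg.norm(j - B_c) for j in joints)  # min center-to-joint distance
    BcR, BcG = robot - B_c, goal - B_c
    if BcR @ BcG < 0:   # robot and goal on opposite sides: shift toward the robot
        return B_c + r_human * BcR / np.linalg.norm(BcR)
    return B_c          # otherwise keep the virtual obstacle at the center of mass
```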

4.4. System Asymptotic Stability and Singularity Avoidance

Here, we describe asymptotic stability with feedforward terms and a technique to avoid singular points.
To enhance the performance of tracking dynamic targets, we incorporate the feedforward term ẋ_tar, as shown in Figure 4a. The Lyapunov function candidate V is defined as V = (1/2)Δx^T K Δx > 0, where Δx = x_tar − x_ee and K is a positive definite matrix. In the proposed block diagram, the error dynamics are given by Δẋ = −s_att Δx, which yields V̇ = −s_att Δx^T K Δx < 0 under conditions where there are no disturbances from obstacles and |J⁺| ≠ 0. Therefore, based on Lyapunov's stability theory, the robot's end-effector pose is asymptotically stable with respect to the target pose x_tar.
To avoid singularities that may arise during robot arm control, the end-effector velocity v is converted into the joint velocity q̇ using the damped least squares method [47], as given in Equation (14):
$$\dot{\mathbf{q}} = (J + \lambda I)^{-1} \mathbf{v} \tag{14}$$
The damping factor λ is adaptively set according to Equation (15), increasing near singular configurations where the Jacobian determinant becomes small:
$$\lambda = \begin{cases} 0.0 & (c_1 < c) \\ \lambda_{\max} \cos\left(\dfrac{\pi}{2} \cdot \dfrac{c - c_2}{c_1 - c_2}\right) & (c_2 \le c \le c_1) \\ \lambda_{\max} & (c < c_2) \end{cases} \tag{15}$$
Here, c = det(J), and the predefined parameters are c_1 = 0.02, c_2 = 0.018, and λ_max = 0.1.
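Equations (14) and (15) combine into a few lines; the sketch below assumes a square Jacobian, as in the 6-DoF setup used in the simulations.

```python
import numpy as np

C1, C2, LAMBDA_MAX = 0.02, 0.018, 0.1

def joint_velocity(J, v):
    """Damped least squares conversion of Equations (14)-(15)."""
    c = np.linalg.det(J)
    if c > C1:                    # far from a singularity: no damping
        lam = 0.0
    elif c >= C2:                 # transition band: cosine ramp-up of the damping
        lam = LAMBDA_MAX * np.cos(np.pi / 2 * (c - C2) / (C1 - C2))
    else:                         # near-singular: full damping
        lam = LAMBDA_MAX
    return np.linalg.solve(J + lam * np.eye(J.shape[0]), v)  # Equation (14)
```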

5. Simulation

The experimental results are available from [48].
The processing was performed on a PC with the following specifications:
  • CPU: Intel(R) Core(TM) i9-10980XE;
  • GPU: NVIDIA RTX A5000/NVIDIA RTX 5000 Ada.
In the simulation, we modeled a 6-DoF UR5e robot from Universal Robots and approximated it as a single link connecting the end-effector to the elbow joint. The minimum distance was calculated between the robot's line segment and the human skeleton, which includes the following segments: left wrist to left elbow, left elbow to left shoulder, left shoulder to right shoulder, right shoulder to right elbow, right elbow to right wrist, and head to foot. We set the admissible minimum distance R_danger to 0.08 m.
In this section, we present five verifications:
  • The necessity of high-frequency feedback control in HRC scenarios.
  • The effect of the precautionary zone and tangent motion.
  • The effect of the radius of the precautionary zone R_tan.
  • The effect of the virtual obstacle.
  • Effectiveness under cluttered situations.

5.1. High-Frequency Feedback Control and Human Safety

To verify that high-frequency feedback enhances safety when humans and robots are in close proximity, we conducted a simulation in which a human repetitively opened and closed their arms while the robot attempted to reach a static target. The feedback frequencies were set to 10, 20, 30, 50, 100, 200, 333, and 500 Hz, and the human’s arm speed was set to 1.0 m/s.
The results are shown in Figure 7. From Figure 7a, the coordinate responses along the x-, y-, and z-axes consistently followed the same ordering, with higher feedback frequencies (e.g., 500 Hz) reacting faster than lower ones (e.g., 10 Hz). This indicates that the robot responded more quickly to human motion at higher feedback frequencies. Additionally, as shown in Figure 7b,c, higher feedback frequencies resulted in larger minimum distances between the human and the robot during the simulation. Notably, the minimum distance exceeded the safety threshold of R_danger = 0.08 m when the feedback frequency was above 50 Hz. In more detail, Figure 7d shows that at 10 Hz, the estimated minimum distance (plotted as dashed blue lines) exhibited significant latency compared to that at 500 Hz. This delay led to inaccurate estimations, as seen in Figure 7e.
Thus, we conclude that human safety is enhanced when the robot responds with low latency to human motion by accurately estimating their states.

5.2. Effect of the Precautionary Zone and Tangent Motion

To verify the effectiveness of the proposed potential field settings, particularly the precautionary zone and tangential vector described in Section 4.2 and Section 4.2.3, we conducted a simulation in which a human repetitively opened and closed their arms while the robot attempted to reach a static target, as depicted in Figure 8a. This setup is similar to the previous simulation. The human arm speed was set to 0.5 m/s. We compared two methods: the conventional method (original potential field method without the precautionary zone) and the proposed method.
First, in terms of processing speed, our path planning method operated at over 10,000 Hz, which is extremely fast. The results are shown in Figure 8, where green plots represent the conventional method and blue plots represent the proposed method. From Figure 8a, it can be observed that the proposed method (blue plots) not only reached the target but also prevented the oscillations observed in the conventional method (green plots) in the y- and z-coordinates, particularly during the time interval from t = 0 to 1500 ms, highlighted in orange and light blue. These oscillations in the conventional method occurred near the boundary between the free zone and the repulsive zone.
As shown in the left panel of Figure 8c, in the conventional method, both the attractive speed (s_att) and the repulsive speed (s_rep) were active simultaneously. This indicates that the robot attempted to approach the target while being repelled by the obstacle, causing fluctuations. In contrast, with the proposed method, tangential motion (s_tan) played a dominant role for most of the time. This encouraged the robot to maintain a larger distance from obstacles, thereby reducing the risk of collision. Such behavior can be observed in Figure 8, where the proposed method (blue plots) consistently maintained a larger minimum distance and provided more space for movement compared to the conventional method.
In conclusion, the proposed precautionary zone and tangential vector effectively prevent robot oscillations (P2), handle dynamic obstacles (P3), and achieve a higher convergence rate to the target.

5.3. Effect of the Radius of the Precautionary Zone R_tan

We verify the sensitivity of our method to the radius of the precautionary zone R_tan. As illustrated in Figure 9a, we conducted a simulation in which the robot attempted to reach a static target while avoiding a human stretching their arms and approaching at four speeds: 0.25 m/s, 0.5 m/s, 0.75 m/s, and 1.0 m/s. The radius of the precautionary zone R_tan ranged from 0.2 m to 0.5 m in increments of 0.05 m.
We adopted two types of robot speed settings, as defined in Equation (12): speed distributions that vary according to R_tan (0.2∼0.5 m), as shown in Figure 9b, and a fixed speed distribution across all conditions based on R_tan = 0.5 m, depicted in Figure 9c. These two settings were used to analyze the effects of the speed distribution and of R_tan alone.
We hypothesize two effects in terms of the robot’s distance to the human and convergence time to the target:
  • For a slow-moving human, a larger radius of the precautionary zone R_tan leads to a greater minimum distance to the human.
  • A larger radius of the precautionary zone R_tan results in quicker convergence to the target.
With a larger R_tan, the robot initiates evasive motion earlier, enabling a smoother and shorter detour. Regarding distance to the human, this reduces abrupt evasive actions, resulting in a larger minimum distance. Regarding convergence time, early evasive motion allows for a more efficient path toward the target. However, the effect on distance is conditioned on the human moving slowly. As described in Section 4.2.3, tangent motion allows the robot to approach the target while satisfying safety criteria. Therefore, after the initial evasive maneuver, the robot is likely to choose a target-directed path even in close proximity to the human. This tendency becomes more pronounced with faster human motion, as the human passes by the robot in a short period.
The results are shown in Figure 9b,c. These plots depict the relationship between the radius of the precautionary zone R_tan and the average distance, minimum distance, and convergence time. The average distance is calculated by averaging the minimum distance at each time step until the robot enters the convergence range. Convergence time is defined as the time at which the positional error falls below 0.02 m.
Under both speed distribution conditions, the green plots (human speed = 0.5 m/s) exhibit a different trend from the others. This deviation was caused by a singularity-avoidance behavior induced by Equation (14). Therefore, we focus on the remaining three scenarios.
According to the left panels in Figure 9b,c, the average distance remains constant or decreases as R_tan increases. From the middle panels, for slow human motion (0.25 m/s), the minimum distance increases with a larger R_tan, while the opposite trend is observed for faster human motion. These results support the first hypothesis.
The right panels in Figure 9b,c show that convergence time either decreases or remains steady as R_tan increases, reaching a minimum around R_tan = 0.3 m or 0.4 m. These results support the second hypothesis.
Thus, we conclude that a larger precautionary zone radius R_tan enables earlier evasive motion and a more efficient approach to the target. Moreover, in this simulation, we confirm that the robot successfully reaches the target across various R_tan values and robot speed distributions.

5.4. Effect of the Virtual Obstacle

To verify the effectiveness of the virtual obstacle setting described in Section 4.3, we conducted a simulation in which a human made a circle with their arms in front of the robot while the robot attempted to reach a static target. The robot’s end-effector, the center of the human arms, and the target were aligned along the x-axis. We compared two methods: the conventional method (the original potential field method without the virtual obstacle) and the proposed method.
The results are shown in Figure 10, where green plots represent the conventional method and blue plots represent the proposed method. From Figure 10a, the proposed method (blue plots) successfully reached the target, whereas the conventional method (green plots) did not. As shown in Figure 10b, the conventional method failed to find an appropriate path and remained in front of the center of the arms, a position corresponding to a local minimum. Figure 10c shows the transition of the virtual obstacle's position, which was calculated from the human's joint positions to prevent the robot from becoming stuck in a local minimum. The position changed from t = 0 to 4000 ms and then remained constant, indicating that the virtual obstacle was no longer active because the robot had moved to the side of the target. Figure 10d shows that the proposed method maintained a distance above the safety threshold R_danger.
However, one limitation of our method was observed. The blue plots, particularly within the orange box in Figure 10a, exhibit oscillations from t = 8000 to 10,000 ms because the closest point on the robot's link was not near the end-effector but around the elbow joint. Here, the distance to the human was as small as 0.09 m, as shown in Figure 10d. This required additional time to evade the obstacle, which is left as future work.
In conclusion, the proposed virtual obstacle setting enables the robot to plan paths globally and avoid local minima (P1) when navigating complex obstacles (P4) with minimal processing time. Moreover, we need to implement a whole-joint control algorithm to achieve finer-grained control.

5.5. Effectiveness Under Cluttered Situation

In this simulation, we verified the evasive performance in a cluttered situation. As illustrated in Figure 11a, we conducted an experiment in which the robot followed a dynamic target moving close to a dynamic, full human skeletal model. We added six more joints: the left and right hip, knee, and ankle joints. The dynamic target moved diagonally from the right knee to the left shoulder at 0.2 m/s. The human repeatedly brought the opposite wrist and knee together to induce evasive motions of the robot against various human links.
The results are shown in Figure 11, where green plots represent the conventional method and blue plots represent the proposed method. From Figure 11b, it is evident that the proposed method (blue plots) followed the target (dashed black plots) better than the conventional method (green plots). Specifically, the average 3D positional and rotational errors were 0.084 m and 0.011 rad, respectively, while the conventional method exhibited errors of 0.164 m and 0.011 rad. The proposed method thus reduced the positional error by half.
We examined the reason for this difference using Figure 11c,d. First, we note that the region shaded in orange and green in Figure 11c corresponds to the robot choosing the target-directed tangent motion (III) or other motions (I,II) as defined in Table 2. Therefore, this does not always result in oscillatory motion, as seen in the conventional method.
As shown in the left panel of Figure 11c, under the conventional method, both attractive and repulsive speeds were activated throughout the entire period, indicating that the robot remained near the boundary of the repulsive zone. Indeed, the green plots in Figure 11d show that the minimum distance under the conventional method fluctuated around 0.2 m, which equals the radius of the repulsive zone. These oscillations around the boundary between the attractive and repulsive zones led to inefficient target following. In contrast, in the proposed method, the robot used tangent motion to balance reaching the target and avoiding collisions. The blue plots in Figure 11d confirm that the robot effectively suppressed oscillations even in close proximity to the human. However, the robot still exhibited oscillations when it was too close to the human, as indicated by the orange circles in Figure 11, because maintaining a safe distance took priority. This problem could be alleviated by implementing whole-joint control to achieve finer-grained control.
In conclusion, the proposed method achieves efficient collision-free motion generation under the dense and full human skeletal model, successfully balancing the approach to the dynamic target and the maintenance of a safe distance.

5.6. Summary of the Simulation

We verified six aspects through simulation experiments as follows:
  • High-frequency feedback control enables the robot to respond with low latency to human motion, maintaining a larger distance from the human even during unpredictable abrupt movements.
  • Path planning: Our path planning method runs at over 10,000 Hz, which is extremely fast.
  • Path planning: The precautionary zone and tangential vector (Section 4.2 and Section 4.2.3) effectively suppress robot oscillations (P2), handle dynamic obstacles (P3), and improve convergence to the target.
  • Path planning: A larger precautionary zone radius R_tan enables earlier evasive motion and a more efficient approach to the target.
  • Path planning: The virtual obstacle setting (Section 4.3) allows the robot to plan paths globally and avoid local minima (P1) when navigating complex obstacles (P4) with minimal processing time.
  • Path planning: Our method achieves efficient collision-free motion generation under the dense and full human skeletal model, successfully balancing the approach to the dynamic target and the maintenance of a safe distance.
With the proposed path planning method, we have addressed four challenges: (P1) avoiding local minima, (P2) suppressing oscillations, (P3) adapting to dynamic environments, and (P4) handling arbitrarily shaped 3D obstacles.
On the other hand, our system has some limitations. In collision-free path planning, our method considers only the line segment between the end-effector and the elbow joint, which can result in ineffective evasive motion when the elbow joint needs to avoid an obstacle. Moreover, the system does not account for the robot’s kinematics in path planning, potentially leading to kinematically impossible configurations. Therefore, future work will focus on whole-joint control that incorporates the robot’s kinematics.

6. Real-World Experiment

The experimental results are available from [48].
We conducted skeleton tracking and four real-world collision-free path planning experiments involving static and dynamic targets and static and dynamic obstacles. In these experiments, the obstacle is the human skeleton, comprising eight joints: the left and right shoulders, elbows, and wrists, plus the head and feet. We set the radius of the precautionary zone R_tan to 0.35 m and the admissible minimum distance R_danger to 0.1 m. The purpose of each collision-free path planning experiment is shown in Table 3.
As shown in Figure 1a, we used two XIMEA cameras to capture images at a frame rate of 250 fps. The PC specifications were the same as in the simulation.
Before the experiments, we performed camera and eye-to-hand calibration. We calculated the intrinsic and extrinsic parameters of the two cameras using a checkerboard and Zhang's method. Subsequently, we estimated the transformation matrix from the left camera frame to the robot base frame using eye-to-hand calibration. The average 3D positional error was approximately 15 mm.

6.1. High-Speed Skeleton Tracking

We verified the effectiveness of the proposed high-speed tracking method, dynamic search area adjustment (Section 3.2), and the solution for short-term occlusion defined by Equation (6).

6.1.1. Effect of High-Speed Skeleton Tracking

To evaluate our high-speed tracking method in various situations, we conducted tracking experiments on human arm joints performing dynamic motions in both bright and dark environments and evaluated the tracking accuracy using the root mean squared error (RMSE). We compared three methods as follows:
  • Conventional method: YOLOv8m-pose detection [49].
  • Control method: hybrid tracking with a static search area whose size was tuned to 60 pixels in advance.
  • Proposed method.
As reference data, we used the output of the YOLOv8x-pose model. Furthermore, we applied a Kalman filter to the raw data to smooth the tracking results.
According to Table 4, our system achieves high-speed tracking at 250 Hz for the simultaneous tracking of six joints, which is eight times faster than the conventional method. Owing to this higher update rate, the proposed tracking system demonstrates a significant improvement in tracking accuracy. Furthermore, since the proposed method achieved a lower RMSE than the control method, we conclude that the dynamic search area setting positively contributes to improved tracking accuracy.
We analyze the results in detail. In a bright environment, the proposed method reduces the RMSE across all joints, with a reduction of up to approximately 37% for the left wrist. In a dark environment, shoulder movements were relatively limited, resulting in similar RMSE values across all methods. However, for the elbows and wrists, which exhibited more dynamic motion, the proposed method achieved the lowest RMSE, with up to approximately 25% improvement for the right wrist. These results demonstrate that our system is effective in both bright and dark environments.
During rapid human motion, particularly from 1250 to 1750 ms in Figure 12a and from 1200 to 1800 ms in Figure 12b, the blue plots for the right wrist show lower values than the red plots. This indicates that our system effectively tracks fast movements with high precision. However, the blue plots occasionally exhibit higher RMSE values than the red plots, likely due to occlusions or inaccuracies in speed estimation using the optical flow-based method. Addressing occlusions and developing evaluation metrics to assess the quality of optical flow will be the focus of future work.
Regarding the search area transitions shown in Figure 13, the shapes of the blue boxes were dynamically adjusted to include the target joint based on the geometrical relationship. This confirms that the proposed method effectively adjusted the search area according to the human pose, demonstrating its applicability to a wide range of scenarios. A detailed analysis is presented in the following section.

6.1.2. Effect of Dynamic Search Area Adjustment

We examine the detailed effects of dynamic search area adjustment using the same video in the bright environment, as described in Section 6.1.1. The following methods are evaluated:
  • Static search area with a fixed size of 60 pixels (pre-tuned);
  • Dynamic size adjustment;
  • Dynamic adjustment of both size and shape.
As shown in Table 5, adjusting only the search area size slightly reduces the RMSE. In contrast, adjusting both the size and shape of the search area results in a significant RMSE reduction of up to 20%. Incorporating the link orientation to guide the extraction of parent-link motion enables more precise motion estimation. Therefore, the proposed dynamic search area configuration effectively enhances motion extraction.

6.1.3. Effect of the Solution for Short-Term Occlusion

We evaluate the effectiveness of the proposed solution for short-term occlusion, defined by Equation (6), in tracking joints occluded by the robot's arm. The tracking results are shown in Figure 14. The orange plots, representing the filtered data, successfully handle short-term jumps around t = 8100 ms in the left wrist and t = 11,500 ms in the right elbow. However, as observed during the intervals t = 7700–8200 ms and t = 9900–10,200 ms, filtering fails to address long-term occlusions. Therefore, while our system is effective for short-term occlusions, further refinement is necessary to handle long-term occlusions.

6.2. Static Target and Static Obstacle

In this experiment, a human stood in front of the robot, making a circle with their arm to simulate a scenario in which the robot must avoid a human holding a large board—a situation that typically involves complex human shapes. As shown in Figure 15a,b, both the blue plots (trajectories without obstacles) and the red plots (with obstacles) converge to the target pose, indicating that the robot successfully reached the target, including the desired end-effector orientation. According to the blue plots in Figure 15b, the robot reached the target in approximately 2.0 s (by t = 7500 ms) in the absence of obstacles. In contrast, the red plots show that it took about 3.0 s to reach the target in the presence of a human.
We analyze in more detail how the robot avoided collisions, from skeleton tracking to motion generation. The 2D tracking results of human joints, shown in Table 6, reveal that the proposed method achieved a lower RMSE than the slower conventional method, except for the wrists and the right elbow. This discrepancy was caused by the detection of noisy motions within the search area, which led to oscillatory tracking results. Additionally, the RMSE of the right shoulder was higher than that of the other joints because it was largely occluded by the person’s head. In the 3D tracking of the human joints—shoulders, elbows, and wrists—depicted in Figure 15c, the estimated positions exhibit fluctuations, particularly in the x-coordinates, which correspond to the depth in the camera frame. This issue arises not only from slight human motion but also from the triangulation process, which uses the center point of the joint’s region of interest (ROI)—a point that is inherently difficult to define accurately. As future work, we plan to employ block matching [50] within the ROI to estimate the 3D position, thereby improving robustness.
For motion generation, in Figure 15d, the distributions of the attractive, repulsive, tangential, and total speed components are plotted. The gray plots, representing the total end-effector speed, vary according to the minimum distance between the robot and the human, demonstrating that the robot adaptively adjusts its speed to ensure safety. Up to t = 6000 ms, the robot operated within the repulsive zone. Thereafter, the green plots, representing tangential motion, dominate, indicating that the robot employed tangential motion to avoid the human. Furthermore, Figure 15e shows that the black plots, representing the minimum distance, remain above the dashed red line, which indicates the admissible minimum distance R_danger.
In conclusion, the proposed system enables the robot to successfully reach the static target while maintaining a safe distance from the static, complex-shaped human.

6.3. Static Target and Dynamic Obstacle

In this experiment, a human moved toward the robot with their arms raised to simulate a scenario in which the robot must avoid a human carrying a board. The target pose was the same as in the static target and static obstacle scenario.
As shown in Figure 16a,b, the red plots (trajectory with obstacles) converge to the target pose, indicating that the robot successfully reached the target while avoiding the moving human. From the red plots in Figure 16b, the robot initially followed the same trajectory as it did without the human. However, around t = 5500 ms, the robot entered the precautionary zone and executed an evasive maneuver, causing the red plots to deviate from the blue plots. It took approximately 2.5 s for the robot to reach the target.
We analyze the results from skeleton tracking to motion generation. The 2D tracking results of human joints, shown in Table 7, indicate that the proposed method achieved a relatively lower RMSE than the slower conventional method, except for the shoulder and wrist. Since the human moved toward the robot at a moderate speed, there was no significant difference in tracking accuracy under this condition. From the 3D tracking results shown in Figure 16c, the x-coordinates of all joints increased, indicating that our system appropriately estimated the position of the human moving toward the camera.
Regarding motion generation, in Figure 16d, the robot operated in the repulsive zone between t = 6000 and 7500 ms; however, the red plots in Figure 16b do not show any oscillations. Therefore, the robot effectively avoided the human without oscillation, even within the repulsive zone, by considering the human’s movement. Figure 16e shows that the black line, representing the minimum distance, remains above the dashed red line, indicating that the robot maintained a safe distance from the human.
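The minimum-distance signal in Figure 16e is, per Figure 5, the constrained minimum between a robot line segment and a human limb segment. A standard clamped closest-point routine of that kind is sketched below (a textbook implementation, not necessarily the authors' exact code):

```python
import numpy as np

# Minimum distance between 3D segments AB (robot link) and CD (human limb),
# clamping the line parameters s and t to [0, 1] as illustrated in Figure 5.
def segment_distance(A, B, C, D, eps=1e-9):
    d1, d2, r = B - A, D - C, A - C
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a < eps and e < eps:                      # both segments degenerate to points
        return float(np.linalg.norm(r))
    if a < eps:                                  # AB is a point
        s, t = 0.0, float(np.clip(f / e, 0.0, 1.0))
    else:
        c = d1 @ r
        if e < eps:                              # CD is a point
            t, s = 0.0, float(np.clip(-c / a, 0.0, 1.0))
        else:
            b = d1 @ d2
            denom = a * e - b * b                # zero only for parallel segments
            s = float(np.clip((b * f - c * e) / denom, 0.0, 1.0)) if denom > eps else 0.0
            t = (b * s + f) / e                  # closest point on line CD, then clamp
            if t < 0.0:
                t, s = 0.0, float(np.clip(-c / a, 0.0, 1.0))
            elif t > 1.0:
                t, s = 1.0, float(np.clip((b - c) / a, 0.0, 1.0))
    return float(np.linalg.norm((A + s * d1) - (C + t * d2)))
```

Evaluating this for every link and limb pair is cheap enough to fit comfortably within the reported planning rate of over 10,000 Hz.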
Thus, we conclude that the proposed system enables the robot to reach the static target while maintaining a safe distance, even in the presence of a dynamic human.

6.4. Dynamic Target and Static Obstacle

In this experiment, a human stood in front of the robot while it followed a dynamic target to simulate a scenario in which the robot must perform a task in close proximity to a human. The target moved repetitively along all axes. As shown in Figure 17a,b, the blue plots follow the black plots (target) with slight latency. In contrast, the red plots (trajectory with obstacles) often deviate from the target trajectory to avoid collisions.
We evaluate this scenario from skeleton tracking to motion generation. The 2D tracking results of human joints, shown in Table 8, indicate that the proposed method exhibited a higher RMSE—on average, by one pixel—than the slower conventional method. This was due to the detection of small, noisy motions within the search area, which led to oscillatory tracking results. As a result, as shown in Figure 17c, the 3D positions of each joint exhibited fluctuations similar to those observed in the static target and static obstacle scenario.
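The noisy-motion issue stems from aggregating optical flow over the whole search area. A minimal sketch of flow-based motion estimation with a magnitude gate, one plausible remedy, is shown below; the Farneback parameters [45] and the threshold min_mag are illustrative assumptions:

```python
import cv2
import numpy as np

# Sketch: estimate a joint's (dx, dy) inside its search area with dense
# Farneback optical flow, discarding sub-threshold vectors as background noise.
def joint_motion(prev_gray, gray, search_area, min_mag=0.5):
    x, y, w, h = search_area                    # search area around the joint (pixels)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray[y:y + h, x:x + w], gray[y:y + h, x:x + w],
        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    mask = mag > min_mag                        # keep only significant motion vectors
    if not mask.any():
        return np.zeros(2)                      # pure noise is treated as "no motion"
    return flow[mask].mean(axis=0)              # mean (dx, dy) of the valid vectors
```

Raising min_mag suppresses the oscillations reported here at the cost of ignoring genuinely slow joint motion, so the threshold would have to be tuned per task.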
As for motion generation, from Figure 17d, the robot remained in either the repulsive or precautionary zone throughout the task to avoid collisions. Nevertheless, as shown in Figure 17b, it successfully reached the target while accounting for the human’s state. Thus, the robot effectively reached the target even when operating in close proximity to a human. Figure 17e shows that the black line, representing the minimum distance, remained above the threshold, indicating that the robot maintained a safe distance from the human.
We conclude that the robot is capable of performing dynamic tasks while maintaining a safe distance from a static human.

6.5. Dynamic Target and Dynamic Obstacle

In this experiment, a human moved their arms around the robot while it followed a dynamic target, simulating a scenario in which a human and a robot must perform different tasks simultaneously. The target pose was the same as in the dynamic target and static obstacle scenario. As shown in Figure 18a,b, the red plots (trajectory with obstacles) often deviate from the target to avoid collisions and exhibit oscillatory motions as the robot attempts to avoid the human. This oscillation had three causes: the human's dynamic motion, the robot's slower movement in close proximity to the human, and the fact that the system considers only the relatively long line segment connecting the robot's end-effector and elbow joint. In future work, all joints should be controlled by considering the robot's full kinematic chain.
We analyze this scenario from skeleton tracking to motion generation. According to Table 9, the proposed method achieved up to 10% lower RMSE for faster-moving joints such as the wrists while showing a higher RMSE for relatively static joints like the shoulders. The 3D tracking of the human joints, as shown in Figure 18c, indicates that the positions of some joints—particularly the wrists—occasionally experienced jumps due to occlusion by the robot. Although short-term occlusion was addressed by the measures described in Section 3, these were insufficient for handling long-term occlusion. This issue will be addressed in future work by adding another camera to the end-effector or implementing additional tracking countermeasures.
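The short-term occlusion measures referenced above amount to rejecting implausible jumps in the measured joint positions. A hedged sketch of that idea follows; the paper's actual filter is Equation (6) in Section 3, and the gate value max_step here is an assumed constant:

```python
import numpy as np

# Displacement gate: accept a new joint measurement only if it moved a
# plausible amount since the last valid estimate; otherwise hold the estimate.
class JumpFilter:
    def __init__(self, max_step=30.0):          # max plausible per-frame motion (pixels)
        self.max_step = max_step
        self.last = None

    def update(self, measured):
        measured = np.asarray(measured, dtype=float)
        if self.last is None or np.linalg.norm(measured - self.last) <= self.max_step:
            self.last = measured                # plausible: accept the measurement
        return self.last                        # implausible jump: keep previous value
```

Such a gate masks brief occlusion-induced jumps but, as noted above, cannot recover a joint that stays hidden, which motivates the extra end-effector camera.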
With respect to motion generation, Figure 18d shows that the robot remained in the repulsive or precautionary zone for most of the time. The red plots in Figure 18b indicate that the robot successfully reached the target while accounting for the human’s state. Figure 18e shows that the black line, representing the minimum distance, remained above the threshold, confirming that the robot maintained a safe distance from the human.
Thus, we conclude that the robot is capable of performing dynamic tasks while the human simultaneously performs another dynamic task in close proximity to the robot.

6.6. Summary of Collision-Free Path Planning

We summarize the capabilities of collision avoidance in Table 10 and outline the limitations of our system. First, we confirm the following five capabilities:
  • Tracking: Our high-speed skeleton tracking (Section 3) operates at approximately 250 Hz, eight times faster than the conventional method, demonstrating its effectiveness in tracking fast-moving joints in both bright and dark environments (a schematic sketch of this two-rate structure follows this list).
  • Static target and static obstacles: Our system exhibits evasive capabilities and can navigate around complex-shaped obstacles.
  • Static target and dynamic obstacles: By utilizing the precautionary zone and the tangential vector v tan, our system prevents oscillatory motion in dynamic environments, effectively extending the robot’s operational space to dynamic scenarios.
  • Dynamic target and static obstacles: The robot follows a dynamic target while maintaining a safe distance from obstacles, enabling dynamic tasks in proximity to humans.
  • Dynamic target and dynamic obstacles: The robot successfully follows a dynamic target in a dynamic environment, showcasing its capability to perform simultaneous and distinct dynamic tasks alongside humans.
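As a schematic of the two-rate structure behind the first capability above, a slow detector periodically corrects the joints that fast optical flow propagates in between. Here detect_joints() and joint_motion() are hypothetical helpers, and the real system runs the two stages in separate threads (Figure 2), which this single sequential loop only approximates:

```python
import numpy as np

# Two-rate tracking sketch: a deep detector corrects drift at ~30 Hz while
# optical flow advances the joints at the full camera rate (~250 Hz).
def track(frames, detect_joints, joint_motion, half=32, detect_every=8):
    joints, prev = None, None
    for i, frame in enumerate(frames):
        if joints is None or i % detect_every == 0:
            joints = detect_joints(frame)       # slow, accurate correction
        else:
            joints = {name: pos + joint_motion( # fast flow-based propagation
                          prev, frame,
                          (int(pos[0]) - half, int(pos[1]) - half, 2 * half, 2 * half))
                      for name, pos in joints.items()}
        prev = frame
        yield joints
```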
Through real-world experiments, our system successfully addresses three out of four challenges: (P2) suppressing oscillations, (P3) adapting to dynamic situations, and (P4) handling arbitrarily shaped 3D obstacles. We conclude that our system not only enhances the robot’s capability for dynamic manipulation but also expands its operational space to accommodate dynamic environments.
Second, our system has limitations in tracking and robot control. We therefore propose potential solutions for future work, as outlined below:
  • Tracking: Implementing an evaluation metric that determines how meaningful the motions extracted by high-speed tracking are, so that inappropriate motions picked up by the optical flow used in this paper can be eliminated.
  • Tracking: Developing a tracking system that is robust to occlusion by adding a camera to the robot’s end-effector or implementing alternative solutions to mitigate occlusion issues.
  • Tracking: Revising the triangulation method to use multiple points instead of a single representative point. Block matching [50] within a limited region could be a potential solution if it achieves sufficient processing speed (a sketch follows this list).
  • Robot control: Extending the path planning method to account for the robot’s kinematics, thereby preventing kinematically impossible configurations.
  • Robot control: Implementing a whole-joint control method to enhance the robot’s evasive capabilities.
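As a sketch of the third tracking direction above, semi-global block matching [50] restricted to a joint’s ROI could replace single-point triangulation; every parameter value and helper name below is an illustrative assumption:

```python
import cv2

# Sketch: estimate a joint's depth from the ROI's disparity statistics rather
# than from one representative pixel. SGBM parameters are assumed values.
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)

def roi_depth(left_gray, right_gray, roi, f_px, baseline_m):
    x, y, w, h = roi
    disp = stereo.compute(left_gray, right_gray)[y:y + h, x:x + w] / 16.0
    valid = disp[disp > 0]                      # SGBM encodes invalid pixels as <= 0
    if valid.size == 0:
        return None                             # ROI fully occluded or textureless
    # depth = f * B / d, averaged so one bad center pixel cannot corrupt the result
    return float((f_px * baseline_m / valid).mean())
```

Restricting the disparity computation to the ROI itself, rather than the full image as in this sketch, would be necessary to stay within the 250 Hz tracking budget.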

7. Conclusions

To make human–robot collaboration (HRC) systems more general, we proposed a comprehensive system that addresses the trade-off between versatility and high responsiveness. The system consists of two main components: tracking and path planning. For tracking, we implemented a high-speed skeleton tracking approach that combines deep learning-based detection with optical flow-based motion extraction. Additionally, dynamic search area adjustment was proposed to keep the target joint within the search area and extract the desired motion. For path planning, we developed a high-speed path planning model based on a geometry-informed potential field, designed to overcome four key challenges: (P1) avoiding local minima, (P2) suppressing oscillations, (P3) ensuring applicability to dynamic environments, and (P4) handling obstacles with arbitrary 3D shapes.
We verified the effectiveness of the proposed system through a series of simulations and real-world collision-free path planning experiments. First, through simulation experiments, we evaluated the effectiveness of the high-frequency feedback control and the proposed robot path planning methods. The high-frequency feedback control enables the robot to respond quickly to environmental changes, ensuring safety in proximal HRC scenarios. The path planning method itself runs at over 10,000 Hz, i.e., each iteration completes in well under 0.1 ms. The precautionary zone and tangent motion prove effective in preventing oscillations (P2) and handling dynamic obstacles (P3). In particular, a larger precautionary zone radius R tan enables earlier evasive motion and a more efficient approach to the target. The robot also successfully reaches the target across various values of R tan and different speed distributions. Virtual obstacles enable global path planning that avoids local minima when navigating complex-shaped obstacles (P1 and P4). Moreover, our method efficiently tracks a dynamic target in close proximity to a dense, complete human skeletal model, successfully balancing the approach to the dynamic target with the maintenance of a safe distance. Thus, our simulation experiments confirm that the proposed path planning method successfully addresses all four challenges.
Second, we conducted skeleton tracking and four real-world collision-free path planning experiments with static and dynamic targets under both static and dynamic human conditions. We confirmed that the high-speed skeleton tracking operates at 250 Hz, approximately eight times faster than the conventional YOLOv8m-pose method [49], which significantly improves tracking accuracy. Furthermore, dynamically adjusting both the size and shape of the search area enables precise motion estimation by considering the other joints’ positions and the parent-link motion. Through four real-world collision-avoidance experiments, we confirmed that our system effectively addresses three of the four challenges: (P2) suppressing oscillations, (P3) adapting to dynamic environments, and (P4) handling arbitrarily shaped 3D obstacles.
In contrast, our proposed method still has limitations. In high-speed tracking, the current dynamic search area adjustment is designed for a single person; therefore, we need to extend it to multiple-people-aware methods that are robust to cluttered situations. Moreover, since we adopt optical flow, we should make it more robust to background noise by evaluating the meaningfulness of the motions extracted via high-speed tracking and eliminating inappropriate motions. In triangulation, our system suffers from fluctuations in 3D positions and occlusion. To address this, we plan to implement block matching within a limited region and extend our system to use multi-view cameras.
In collision-free path planning, our method currently considers only the line segment between the end-effector and the elbow joint, which may lead to ineffective evasive motion when the elbow joint needs to avoid an obstacle. Therefore, we plan to extend our control algorithm to include whole-joint methods. Furthermore, we will account for the robot’s kinematics to ensure kinematically feasible configurations.
Nevertheless, as demonstrated in extensive experiments, our system provides a foundational motion generation framework for HRC to achieve both versatility and high responsiveness in dynamic environments. It not only enhances the robot’s ability for dynamic manipulation but also expands its operational range to better accommodate dynamic environments.

Author Contributions

Conceptualization, Y.K. and Y.Y.; methodology, Y.K., K.M., and Y.Y.; software, Y.K., K.M., and Y.Y.; validation, Y.K.; formal analysis, Y.K.; investigation, Y.K.; resources, Y.K.; data curation, Y.K.; writing—original draft preparation, Y.K.; writing—review and editing, Y.K., K.M., and Y.Y.; visualization, Y.K.; supervision, Y.Y.; project administration, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by JSPS KAKENHI Grant Number 24K22303.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Demir, K.A.; Döven, G.; Sezen, B. Industry 5.0 and Human-Robot Co-working. Procedia Comput. Sci. 2019, 158, 688–695.
  2. Li, H.; Ding, X. Adaptive and intelligent robot task planning for home service: A review. Eng. Appl. Artif. Intell. 2023, 117, 105618.
  3. Ajoudani, A.; Zanchettin, A.M.; Ivaldi, S.; Albu-Schäffer, A.; Kosuge, K.; Khatib, O. Progress and prospects of the human–robot collaboration. Auton. Robot. 2018, 42, 957–975.
  4. Yin, T.; Zhang, Z.; Liang, W.; Zeng, Y.; Zhang, Y. Multi-man–robot disassembly line balancing optimization by mixed-integer programming and problem-oriented group evolutionary algorithm. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 1363–1375.
  5. Hémono, P.; Nait Chabane, A.; Sahnoun, M. Multi-objective optimization of human–robot collaboration: A case study in aerospace assembly line. Comput. Oper. Res. 2025, 174, 106874.
  6. Wang, T.; Liu, Z.; Wang, L.; Li, M.; Wang, X.V. Data-efficient multimodal human action recognition for proactive human–robot collaborative assembly: A cross-domain few-shot learning approach. Robot. Comput.-Integr. Manuf. 2024, 89, 102785.
  7. Villani, V.; Pini, F.; Leali, F.; Secchi, C. Survey on human–robot collaboration in industrial settings: Safety, intuitive interfaces and applications. Mechatronics 2018, 55, 248–266.
  8. Yamakawa, Y.; Matsui, Y.; Ishikawa, M. Development of a Real-Time Human-Robot Collaborative System Based on 1 kHz Visual Feedback Control and Its Application to a Peg-in-Hole Task. Sensors 2021, 21, 663.
  9. Zhou, P.; Zheng, P.; Qi, J.; Li, C.; Lee, H.-Y.; Duan, A.; Lu, L.; Li, Z.; Hu, L.; Navarro-Alarcon, D. Reactive human–robot collaborative manipulation of deformable linear objects using a new topological latent control model. Robot. Comput.-Integr. Manuf. 2024, 88, 102727.
  10. Wang, Z.; Qureshi, A.H. DeRi-IGP: Learning to manipulate rigid objects using deformable objects via iterative grasp-pull. arXiv 2025, arXiv:2309.04843.
  11. Guo, X.; Fan, C.; Zhou, M.; Liu, S.; Wang, J.; Qin, S.; Tang, Y. Human–robot collaborative disassembly line balancing problem with stochastic operation time and a solution via multi-objective shuffled frog leaping algorithm. IEEE Trans. Autom. Sci. Eng. 2024, 21, 4448–4459.
  12. Sarker, S.; Green, H.N.; Yasar, M.S.; Iqbal, T. CoHRT: A collaboration system for human-robot teamwork. arXiv 2024, arXiv:2410.08504.
  13. Rosenberger, P.; Cosgun, A.; Newbury, R.; Kwan, J.; Ortenzi, V.; Corke, P.; Grafinger, M. Object-Independent Human-to-Robot Handovers using Real Time Robotic Vision. arXiv 2020, arXiv:2006.01797.
  14. Ishikawa, M. High-speed vision and its applications toward high-speed intelligent systems. J. Robot. Mechatronics 2022, 34, 912–935.
  15. Zhang, D.; Van, M.; Sopasakis, P.; McLoone, S. An NMPC-ECBF Framework for Dynamic Motion Planning and Execution in Vision-Based Human-Robot Collaboration. arXiv 2024, arXiv:2304.06923.
  16. Xia, X.; Li, T.; Sang, S.; Cheng, Y.; Ma, H.; Zhang, Q.; Yang, K. Path Planning for Obstacle Avoidance of Robot Arm Based on Improved Potential Field Method. Sensors 2023, 23, 3754.
  17. Lasota, P.A.; Fong, T.; Shah, J.A. A survey of methods for safe human–robot interaction. Found. Trends Robot. 2017, 5, 261–349.
  18. Wang, Z.; Liu, Z.; Ouporov, N.; Song, S. ContactHandover: Contact-Guided Robot-to-Human Object Handover. arXiv 2024, arXiv:2404.01402.
  19. Mascaro, E.V.; Sliwowski, D.; Lee, D. HOI4ABOT: Human-Object Interaction Anticipation for Human Intention Reading Collaborative roBOTs. arXiv 2024, arXiv:2309.16524.
  20. Kothari, A.; Tohme, T.; Zhang, X.; Youcef-Toumi, K. Enhanced Human-Robot Collaboration using Constrained Probabilistic Human-Motion Prediction. arXiv 2023, arXiv:2310.03314.
  21. Wright, R.; Parekh, S.; White, R.; Losey, D.P. Safely and Autonomously Cutting Meat with a Collaborative Robot Arm. arXiv 2024, arXiv:2401.07875.
  22. Zhuang, Z.; Ben-Shabat, Y.; Zhang, J.; Gould, S.; Mahony, R. GoferBot: A Visual Guided Human-Robot Collaborative Assembly System. arXiv 2023, arXiv:2304.08840.
  23. Mathur, P. Proactive Human-Robot Interaction using Visuo-Lingual Transformers. arXiv 2023, arXiv:2310.02506.
  24. Javdani, S.; Admoni, H.; Pellegrinelli, S.; Srinivasa, S.S.; Bagnell, J.A. Shared autonomy via hindsight optimization for teleoperation and teaming. arXiv 2017, arXiv:1706.00155.
  25. Morrison, D.; Corke, P.; Leitner, J. Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv 2018, arXiv:1804.05172.
  26. Sundermeyer, M.; Mousavian, A.; Triebel, R.; Fox, D. Contact-GraspNet: Efficient 6-DoF grasp generation in cluttered scenes. arXiv 2021, arXiv:2103.14127.
  27. Ghadirzadeh, A.; Bütepage, J.; Maki, A.; Kragic, D.; Björkman, M. A Sensorimotor Reinforcement Learning Framework for Physical Human-Robot Interaction. arXiv 2016, arXiv:1607.07939.
  28. Olivas-Padilla, B.E.; Papanagiotou, D.; Senteri, G.; Manitsaris, S.; Glushkova, A. Computational Ergonomics for Task Delegation in Human-Robot Collaboration: Spatiotemporal Adaptation of the Robot to the Human Through Contactless Gesture Recognition. arXiv 2022, arXiv:2203.11007.
  29. Jha, D.K.; Jain, S.; Romeres, D.; Yerazunis, W.; Nikovski, D. Generalizable Human-Robot Collaborative Assembly Using Imitation Learning and Force Control. arXiv 2022, arXiv:2212.01434.
  30. Zhang, F.; Cully, A.; Demiris, Y. Personalized Robot-Assisted Dressing Using User Modeling in Latent Spaces. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), Vancouver, BC, Canada, 24–28 September 2017.
  31. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2018, arXiv:1703.06870.
  32. Lin, G.; Milan, A.; Shen, C.; Reid, I. RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation. arXiv 2016, arXiv:1611.06612.
  33. Zhou, B.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Scene Parsing Through ADE20K Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5122–5130.
  34. Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.-E.; Sheikh, Y. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. arXiv 2019, arXiv:1812.08008.
  35. Maji, D.; Nagori, S.; Mathew, M.; Poddar, D. YOLO-Pose: Enhancing YOLO for Multi-Person Pose Estimation Using Object Keypoint Similarity Loss. arXiv 2022, arXiv:2204.06806.
  36. Yamakawa, Y.; Yoshida, K. Teleoperation of High-Speed Robot Hand with High-Speed Finger Position Recognition and High-Accuracy Grasp Type Estimation. Sensors 2022, 22, 3777.
  37. Mun, Y.-J.; Huang, Z.; Chen, H.; Niu, Y.; You, H.; McPherson, D.L.; Driggs-Campbell, K. User-Friendly Safety Monitoring System for Manufacturing Cobots. arXiv 2023, arXiv:2307.01886.
  38. Ichnowski, J.; Avigal, Y.; Satish, V.; Goldberg, K. Deep Learning Can Accelerate Grasp-Optimized Motion Planning. Sci. Robot. 2020, 5, eabd7710.
  39. Karaman, S.; Frazzoli, E. Sampling-based Algorithms for Optimal Motion Planning. arXiv 2011, arXiv:1105.1186.
  40. Nguyen, Q.; Sreenath, K. Exponential Control Barrier Functions for Enforcing High Relative-Degree Safety-Critical Constraints. In Proceedings of the 2016 American Control Conference (ACC), Boston, MA, USA, 6–8 July 2016; pp. 322–328.
  41. Zhang, H.; Zhu, Y.; Liu, X.; Xu, X. Analysis of Obstacle Avoidance Strategy for Dual-Arm Robot Based on Speed Field with Improved Artificial Potential Field Algorithm. Electronics 2021, 10, 1850.
  42. Chen, Y.; Chen, L.; Ding, J.; Liu, Y. Research on Real-Time Obstacle Avoidance Motion Planning of Industrial Robotic Arm Based on Artificial Potential Field Method in Joint Space. Appl. Sci. 2023, 13, 6973.
  43. Wang, X.; Wu, Q.; Wang, T.; Cui, Y. A Path-Planning Method to Significantly Reduce Local Oscillation of Manipulators Based on Velocity Potential Field. Sensors 2023, 23, 9617.
  44. Jia, Q.; Wang, X. An Improved Potential Field Method for Path Planning. In Proceedings of the 2010 Chinese Control and Decision Conference, Xuzhou, China, 26–28 May 2010; pp. 2265–2270.
  45. Farnebäck, G. Two-frame motion estimation based on polynomial expansion. In Image Analysis; Bigun, J., Gustavsson, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 363–370.
  46. Kawawaki, Y.; Yamakawa, Y. Real-time Collision Avoidance in Dynamic Environments Using High-speed Skeleton Tracking and the Velocity Potential Method. In Proceedings of the 42nd Annual Conference of the Robotics Society of Japan (RSJ2024), Osaka, Japan, 4–6 September 2024.
  47. Deo, A.S.; Walker, I.D. Overview of damped least-squares methods for inverse kinematics of robot manipulators. J. Intell. Robot. Syst. 1995, 14, 43–68.
  48. Demonstration Video. Available online: http://www.hfr.iis.u-tokyo.ac.jp/research/CollisionAvoidance/index-e.html (accessed on 2 March 2025).
  49. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO; Version 8.0.0; AGPL-3.0. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 January 2025).
  50. Setyawan, R.A.; Sunoko, R.; Choiron, M.A.; Rahardjo, P.M. Implementation of Stereo Vision Semi-Global Block Matching Methods for Distance Measurement. Indones. J. Electr. Eng. Comput. Sci. 2018, 12, 585.
Figure 1. (a) Motivation of this work. (b) Experimental setup. Ring lights were installed to enhance image brightness. A stereo camera is used in the setup. (c) Flowchart of the proposed system.
Figure 2. Skeleton tracking flowchart. (a) High-speed skeleton tracking. The orange box represents the detection thread, and the blue one represents the tracking thread. (b) Hybrid tracking combining deep learning-based joint detection and optical flow-based skeleton tracking [46].
Figure 3. Arm skeleton model and an example of the search area around a joint. x LS and x RS represent the positions of the left and right shoulders, x LE and x RE the positions of the left and right elbows, and x LW and x RW the positions of the left and right wrists. r̃ represents the half-diagonal length of the search area.
Figure 4. Overview of the proposed path planning method. (a) Block diagram of the proposed collision-free path planning method. x tar : Target pose. x hum : Human joints’ position. x rob : Robot joints’ position. J + : Pseudo-inverse Jacobian matrix. (b) Potential field division. Blue represents the free region, green indicates the precautionary region, and red denotes the repulsive region.
Figure 5. Calculation of the minimum distance [46]. (a) Minimum distance between two line segments, AB and CD. (b) Contour map of the distance, whose minimum value is at (s, t) under the constraint 0 ≤ s ≤ s max, 0 ≤ t ≤ t max.
Figure 6. Calculation of the tangent vector v tan. The circle represents the robot, the cross denotes the obstacle, and the star indicates the target. (a) State classification from ➀ to ➅ is based on the geometric relationship between the robot and the obstacle within the precautionary and repulsive zones. (b) Safety assessment considering the robot, the obstacle, the obstacle’s velocity v obs, and the target: (i) danger—an obstacle is passing between the robot and the target; (ii) safe—the robot can approach the target. (c) Candidates for v tan: I. Target-based: the robot moves in the direction of the target; II. v obs-based: the robot moves in the opposite direction of v obs; III. target-directed: the robot directly moves toward the target.
Figure 7. Results of the robot motion when the feedback frequency was changed to 10, 20, 33, 50, 100, 200, 333, and 500 Hz: (a) robot’s end-effector transition and target pose for each feedback frequency. From left to right: 3D positions (X, Y, Z) and components of the rotation vector (r x, r y, and r z). (b) Transition of the minimum distance between the human and the robot for each feedback frequency. The red dashed line represents the admissible minimum distance, R danger. (c) Relationship between the feedback frequency and the minimum distance during the simulation. The red dashed line represents R danger. (d) Transition of the minimum distance. The dashed black line represents the actual minimum distance, the dashed blue line denotes the estimated minimum distance, and the red dashed line indicates R danger. The upper graph corresponds to 10 Hz, while the bottom graph corresponds to 500 Hz. (e) Situation at t = 4482 ms for 10 and 500 Hz. The black line represents the actual human pose, the dashed blue line represents the estimated human pose, the light blue arrow indicates the current robot’s end-effector pose, and the magenta arrow shows the target pose.
Figure 8. Results of robot motion using the conventional method (original potential field method without the precautionary zone) and the proposed method. Green plots represent the conventional method, while blue plots represent the proposed method. (a) Trajectory of the robot’s end-effector for the conventional and proposed methods. (b) Transition of the robot’s end-effector and target pose. The orange and light blue areas are enlarged. (c) Distribution of attractive, repulsive, and tangent speeds. (d) Minimum distance between the human skeleton and the robot. The dashed red line represents R danger.
Figure 9. Results of the robot motion with different R tan values from 0.2 m to 0.5 m: (a) Temporal transition of the simulation from t = 2 ms to t = 602 ms. (b,c) Relationship between the radius of the precautionary zone R tan and the average distance, minimum distance, and convergence time for each human’s approaching speed under different speed settings (Equation (12)). (b) With speed distributions that vary according to R tan. (c) With a fixed speed distribution across all conditions, based on R tan = 0.5 m.
Figure 10. Results of the robot motion using the conventional method (original potential field method without the virtual obstacle) and the proposed method. Green plots represent the conventional method, while blue plots represent the proposed method: (a) Transition of the robot’s end-effector and the target pose. (b) Trajectory of the robot’s end-effector for both the conventional and proposed methods. (c) Transition of the virtual obstacle’s position. (d) Minimum distance between the human skeleton and the robot. The dashed red line indicates R danger.
Figure 11. Results of robot motion using the conventional method and the proposed method. (a) Trajectory of the robot’s end-effector for the conventional and proposed methods. (b) Transition of the robot’s end-effector and target pose. The orange and light blue areas are enlarged. (c) Distribution of attractive, repulsive, and tangent speeds. (d) Minimum distance between the human skeleton and the robot. The dashed red line represents R danger.
Figure 12. RMSE for each joint is evaluated across three methods: proposed (blue), control (green), and conventional (red). The joints are listed from top to bottom as follows: left shoulder (LS), right shoulder (RS), left elbow (LE), right elbow (RE), left wrist (LW), and right wrist (RW). (a) is the result in the bright environment, and (b) is the one in the dark.
Figure 13. Search area around each joint. (a) is in the bright environment, and (b) is in the dark. The blue boxes indicate the search areas defined by the proposed method, while the red boxes represent the fixed-size search areas of the conventional method, centered around the detected positions (indicated in black).
Figure 14. Tracking results when some joints—specifically, the left wrist (left figure) and the right elbow (right figure)—are occluded by the robot’s links. Blue plots represent the original measurements, while orange plots show the data filtered by Equation (6). Green circles indicate situations involving invalid jumps or occlusions.
Figure 15. Result in static target and static obstacle: (a) Trajectory comparison. The black lines represent the human skeleton, the light blue arrow denotes the robot’s initial pose, the magenta arrow indicates the target’s pose, blue plots show the trajectory without obstacles, and red plots show the trajectory with obstacles. (b) Transition of the robot’s end-effector. Black plots represent the target pose, blue plots denote the trajectory without obstacles, and red plots indicate the trajectory with obstacles. (c) Three-dimensional positions of both shoulders, elbows, and wrists. Red plots represent the shoulders, green plots denote the elbows, and blue plots indicate the wrists. (d) Distribution of attractive, repulsive, tangent, and total speed. Blue represents attractive speed, orange denotes repulsive speed, green indicates tangent speed, and gray represents total speed. (e) Minimum distance between the human skeleton and the line segment connecting the robot’s end-effector and the elbow joint. The red dashed line represents the minimum admissible distance, R danger. Here, “L” stands for left, “R” stands for right, “S” stands for shoulder, “E” stands for elbow, and “W” stands for wrist. These settings apply to the following four figures in the collision-free path planning experiments.
Figure 16. Results of the static target and dynamic obstacle. (a) Trajectory comparison. (b) Transition of robot’s end-effector. (c) Three-dimensional position of both shoulders, elbows, and wrists. (d) Distribution of attractive, repulsive, tangent, and total speeds. (e) Minimum distance between the human skeleton and the line segment connecting the robot’s end-effector and the elbow joint.
Figure 17. Results of the dynamic target and static obstacle. (a) Trajectory comparison. (b) Transition of robot’s end-effector. (c) Three-dimensional position of both shoulders, elbows, and wrists. (d) Distribution of attractive, repulsive, tangent, and total speed. (e) Minimum distance between the human skeleton and the line segment connecting the robot’s end-effector and the elbow joint.
Figure 18. Results of the dynamic target and dynamic obstacle. (a) Trajectory comparison. (b) Transition of the robot’s end-effector. (c) Three-dimensional position of both shoulders, elbows, and wrists. (d) Distribution of attractive, repulsive, tangent, and total speed. (e) Minimum distance between the human skeleton and the line segment connecting the robot’s end-effector and the elbow joint.
Table 1. Proposed methods and their purposes.
Method | (P1) Local Minimum | (P2) Oscillation | (P3) Dynamic Situation | (P4) Complex Obstacle | High-Speed
Minimum distance (Section 4.1) | | | | |
Region division (Section 4.2) | | | | |
Tangent motion (Section 4.2.3) | | | | |
Virtual obstacle (Section 4.3) | | | | |
Table 2. How to select the tangent vector v tan . States ➀ to ➅ correspond to those shown in Figure 6a.
State | ➀,➃ | ➁,➄ | ➂,➅
v obs | (i) | (ii) | (i) | (ii) | (i) | (ii)
v tan | IIIIIIIIIIIII
Table 3. The conditions and purposes of each experiment. The challenge indices correspond to those presented in this paper: (P1) avoiding the local minima, (P2) suppressing oscillations, (P3) adapting to dynamic environments, and (P4) handling arbitrarily shaped 3D obstacles.
Target | Obstacle | (P1) | (P2) | (P3) | (P4) | Purpose
Static | Static | | | | | Can the robot reach the static target while avoiding a human holding a complex-shaped object?
Static | Dynamic | | | | | Can the robot reach the static target while avoiding a human moving toward it?
Dynamic | Static | | | | | Can the robot follow a dynamic task in close proximity to a human?
Dynamic | Dynamic | | | | | Can the robot and human perform different dynamic tasks simultaneously?
Table 4. Results of skeleton tracking for arm joints, including both shoulders, elbows, and wrists, are evaluated using RMSE. The bold number indicates the best value for each metric. “LS” and “RS” represent left and right shoulders, “LE” and “RE” represent left and right elbows, and “LW” and “RW” represent left and right wrists.
Situation | Method | Frame Rate (fps) | RMSE (pixel)
 | | | LS | RS | LE | RE | LW | RW
Bright | Conventional method | 30 | 7.92 | 6.9 | 13.56 | 12.21 | 18.7 | 14.04
Bright | Control method | 250 | 6.7 | 6.02 | 9.69 | 8.69 | 14.77 | 10.27
Bright | Proposed method | 250 | 7.22 | 5.98 | 7.97 | 7.02 | 11.82 | 8.23
Dark | Conventional method | 30 | 3.46 | 2.89 | 5.13 | 4.89 | 9.52 | 9.0
Dark | Control method | 250 | 3.44 | 2.82 | 4.57 | 4.0 | 8.01 | 6.96
Dark | Proposed method | 250 | 3.54 | 3.32 | 4.16 | 3.79 | 7.47 | 6.78
Table 5. Ablation study of the proposed skeleton tracking techniques. The bold number indicates the best value for each metric.
Method | Frame Rate (fps) | RMSE (pixel)
 | | LS | RS | LE | RE | LW | RW
Static search area | 250 | 6.7 | 6.02 | 9.69 | 8.69 | 14.77 | 10.27
Dynamic size | 250 | 7.73 | 6.3 | 9.66 | 8.32 | 14.16 | 10.23
Dynamic size and shape | 250 | 7.22 | 5.98 | 7.97 | 7.02 | 11.82 | 8.23
Table 6. Results of skeleton tracking under static targets and static obstacles. Arm joints, including both shoulders, elbows, and wrists, are evaluated using RMSE. The bold number indicates the best value for each metric.
Camera | Method | Frame Rate (fps) | RMSE (pixel)
 | | | LS | RS | LE | RE | LW | RW
Left | Conventional method | 30 | 4.28 | 5.43 | 5.02 | 3.6 | 4.43 | 3.15
Left | Proposed method | 250 | 2.66 | 4.79 | 4.82 | 4.46 | 5.21 | 3.64
Right | Conventional method | 30 | 6.15 | 11.93 | 4.78 | 6.71 | 3.63 | 5.19
Right | Proposed method | 250 | 4.53 | 9.05 | 4.66 | 5.24 | 3.25 | 4.73
Table 7. Results of skeleton tracking under static targets and dynamic obstacles. Arm joints, including both shoulders, elbows, and wrists, are evaluated using RMSE. The bold number indicates the best value for each metric.
Camera | Method | Frame Rate (fps) | RMSE (pixel)
 | | | LS | RS | LE | RE | LW | RW
Left | Conventional method | 30 | 3.28 | 6.06 | 4.53 | 3.68 | 3.4 | 2.71
Left | Proposed method | 250 | 4.74 | 5.83 | 3.16 | 3.19 | 3.31 | 3.43
Right | Conventional method | 30 | 3.48 | 5.62 | 4.0 | 7.21 | 4.74 | 4.2
Right | Proposed method | 250 | 4.78 | 5.25 | 3.59 | 7.15 | 4.87 | 3.84
Table 8. Results of skeleton tracking under a dynamic target and a static obstacle. Arm joints, including both shoulders, elbows, and wrists, are evaluated using RMSE. The bold number indicates the best value for each metric.
Camera | Method | Frame Rate (fps) | RMSE (pixel)
 | | | LS | RS | LE | RE | LW | RW
Left | Conventional method | 30 | 4.04 | 3.52 | 2.08 | 3.13 | 2.23 | 3.01
Left | Proposed method | 250 | 4.8 | 4.99 | 4.04 | 3.48 | 2.72 | 4.38
Right | Conventional method | 30 | 2.9 | 5.84 | 2.81 | 3.86 | 2.68 | 4.08
Right | Proposed method | 250 | 3.77 | 4.8 | 4.07 | 4.18 | 3.02 | 4.84
Table 9. Results of skeleton tracking under a dynamic target and a dynamic obstacle. Arm joints, including both shoulders, elbows, and wrists, are evaluated using RMSE. The bold number indicates the best value for each metric.
Camera | Method | Frame Rate (fps) | RMSE (pixel)
 | | | LS | RS | LE | RE | LW | RW
Left | Conventional method | 30 | 3.53 | 3.06 | 5.28 | 4.79 | 5.8 | 5.78
Left | Proposed method | 250 | 3.57 | 3.36 | 4.8 | 4.89 | 5.24 | 5.25
Right | Conventional method | 30 | 3.32 | 3.06 | 5.05 | 3.94 | 5.97 | 5.21
Right | Proposed method | 250 | 3.81 | 3.44 | 4.76 | 4.24 | 5.48 | 5.11
Table 10. Summary of real-world experiments. The challenge indices correspond to those presented in this paper.
Target | Obstacle | (P1) | (P2) | (P3) | (P4) | Result
Static | Static | | | | | The robot was able to reach the static target while keeping a safe distance from the static and complex-shaped human.
Static | Dynamic | | | | | The robot was able to reach the static target while keeping a safe distance even in the presence of a dynamic human.
Dynamic | Static | | | | | The robot was able to follow the dynamic target while maintaining a safe distance from a static human.
Dynamic | Dynamic | | | | | The robot was able to follow the dynamic target while the human performed another dynamic task in close proximity to the robot.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
