Article

Analysis of Deep-Learning Methods in an ISO/TS 15066–Compliant Human–Robot Safety Framework

Institute of Robotics, Johannes Kepler University, 4040 Linz, Austria
*
Author to whom correspondence should be addressed.
Sensors 2025, 25(23), 7136; https://doi.org/10.3390/s25237136
Submission received: 5 October 2025 / Revised: 19 November 2025 / Accepted: 20 November 2025 / Published: 22 November 2025

Abstract

In recent years, collaborative robots have gained considerable traction in manufacturing applications where humans and robots work together in close proximity. However, current ISO/TS 15066-compliant implementations often limit the efficiency of collaborative tasks due to conservative speed restrictions. For this reason, this paper introduces a deep-learning-based human–robot–safety framework (HRSF) that dynamically adapts robot velocities depending on the separation distance between human and robot while respecting maximum biomechanical force and pressure limits. The applicability of the framework was investigated for four different deep learning approaches that can be used for human body extraction: human body recognition, human body segmentation, human pose estimation, and human body part segmentation. Unlike conventional industrial safety systems, the proposed HRSF differentiates individual human body parts from other objects, enabling optimized robot process execution. Experiments demonstrated a quantitative reduction in cycle time of up to 15% compared to conventional safety technology.

1. Introduction

Steadily reducing process execution times and improving machine flexibility are primary goals in industrial automation. In this context, collaborative robots (cobots) have emerged as a promising technology that is particularly suited to the challenges of the manufacturing industries [1,2]. Since the design of cobots allows operation in close proximity to humans, it is mandatory to guarantee human safety under all circumstances, i.e., potential collisions between human and robot shall not cause physical harm. For this reason, the main requirements for the integration of collaborative systems in industrial environments have been laid down in the technical specification ISO/TS 15066 of the International Organization for Standardization [3].
In practice, safety assessments typically determine the expected force and pressure levels for given robot poses and speeds before plant installation. To ensure compliance, robot velocities are often limited to the lowest admissible value across the workspace, which can significantly constrain process efficiency, particularly in high-throughput tasks. Consequently, even in situations where a collision is unlikely, the process flow may be unnecessarily delayed.
Existing industrial solutions, such as laser scanners or proximity sensors, provide conservative separation monitoring but lack the capability to differentiate humans from other moving objects or to identify individual body parts—both critical for dynamic safety regulation under ISO/TS 15066. Vision-based methods using RGB-D data have been explored [4,5,6], with depth information enabling moving-object detection [7]. However, traditional approaches generally treat all detected objects uniformly, failing to exploit the potential of selectively optimizing robot motion based on human-specific spatial information. To address these limitations, we propose a conceptual human–robot–safety framework (HRSF) that integrates deep learning with RGB-D input to enable accurate spatial localization of both the human body and individual body parts. The framework dynamically adjusts robot velocity according to the required separation distance for each body part while maximizing process efficiency. The HRSF is a performance and feasibility study for vision-based speed and separation monitoring (SSM) under ISO/TS 15066 constraints. While it demonstrates how body-part-aware perception can influence task execution times and motion efficiency, it is not a certified or deployable safety system, and the experiments are intended to evaluate performance and conceptual adherence rather than formal compliance. This approach explicitly leverages the predictive capabilities of deep learning to distinguish humans from other objects and to account for variable poses, providing an advantage over conventional sensor-based methods. The primary research question guiding this study was as follows: Can the proposed deep-learning-based HRSF improve process execution times under the constraints of ISO/TS 15066 compared to state-of-the-art industrial safety technology? To investigate this, we systematically evaluated the accuracy and robustness of multiple deep learning architectures in a collaborative manufacturing scenario.
To clearly articulate the novelty of our approach, we summarize the main contributions of this work:
  • A body-part-aware RGB-D safety framework aligned with ISO/TS 15066. We introduce a human–robot–safety framework (HRSF) that uses deep learning to estimate the 3D locations of individual human body parts and maps them to the body-part-specific safety limits specified in ISO/TS 15066. In contrast to existing whole-body detection approaches, the framework enables differentiated separation distances that reflect the varying biomechanical tolerances of different human regions.
  • A systematic comparison of deep learning architectures for safety-critical spatial perception. We benchmark multiple state-of-the-art RGB-D models for body and body part localization with respect to accuracy, robustness, latency, and failure modes—metrics that have rarely been evaluated together in prior vision-based human–robot collaboration (HRC) safety work. This analysis supports a more realistic assessment of whether body part granularity can meaningfully improve safety-aware robot motion.
  • A dynamic velocity-adaptation scheme based on body part proximity. We implement and evaluate a separation-monitoring controller that adjusts robot velocity according to the nearest detected body part and its corresponding ISO/TS 15066 threshold. This differentiates our work from existing RGB-D safety systems that rely on uniform safety margins and cannot exploit less conservative limits when nonsensitive body regions are closest.
  • Experimental validation in a real collaborative manufacturing scenario. Using a KUKA LBR iiwa 7-DOF robot (KUKA AG, Augsburg, Germany), we demonstrate the feasibility of the proposed framework in a representative screwing task and measure the operational impact of body-part-aware velocity regulation. Although limited in subject number, task diversity, and repetitions, these experiments provide initial evidence for how fine-grained human perception can influence cycle time under real processing latencies.
Following this, we provide a comparative overview of the related approaches, including conventional industrial sensing, RGB-D and skeleton-based human detection, and other learned segmentation methods. Table 1 contrasts each method with our HRSF in terms of sensing modality, alignment with ISO/TS 15066, body part awareness, dynamic velocity adaptation capability, and experimental validation. This comparison highlights where the proposed framework advances the state of the art and clarifies the specific gaps it addresses in vision-based HRC safety.

2. Safety Aspects According to ISO/TS 15066

In the manufacturing industry, the use of cobots has allowed humans and machines to work in shared workspaces. Especially for those tasks where human and robot work together in close proximity, the robot speed must be reduced or the robot must be stopped in order to avoid safety-critical collision scenarios. Typically, the robot speed is adjusted to a constant value that is in accordance with the most restrictive collision situation possible at a particular workplace, which ultimately increases the robot cycle time for process execution. Conversely, when all body parts of humans within the collaborative workspace are located at least a predefined safety distance from the machine, the robot speed might be increased in order to reduce the robot cycle time. To this end, the proposed HRSF applies different deep learning approaches for the spatial localization of humans in the shared workspace, and accordingly, the robot speed is adapted to the maximal value allowed. In order to apply the HRSF in manufacturing environments, it must comply with the core requirements of ISO/TS 15066.
In general, the ISO/TS 15066 regulation distinguishes four different types of collaborative operation:
  • Safety-rated monitored stop.
  • Hand guiding.
  • Speed and separation monitoring (SSM).
  • Power and force limiting (PFL).
Most research activities have focused on one of these four collaboration modes. For speed and separation monitoring (SSM), a common line of research aims to avoid collisions with real-time path planning methods [19,20], 3D dynamic safety zones [21], approaches for the anticipation of human motion with gradient optimization techniques [22], stochastic trajectory optimization [23], optimization-based planning in dynamic environments [24], or learning-based methods [25], whereas for power and force limiting (PFL), the effects of collisions are of particular interest [26,27,28,29,30].
In contrast to the research cited above, the proposed HRSF combines the normative guidelines of the SSM and PFL collaborative operation modes. At first, the regulations for SSM are applied when operators are identified in the shared workspace. Accordingly, whenever a critical distance is reached, the robot speed is reduced to a value that corresponds to the requirements for operation in PFL mode. Consequently, the framework does not exclude collisions per se but reduces the robot velocity, according to the necessary separation distance between human and robot, to a maximum level that does not violate the prescribed biomechanical force and pressure thresholds. This allows for the operation of the robot in the most efficient way, i.e., with the maximum velocity allowed that will not endanger humans. The proposed framework can be used for any type of industrial robot system. However, since industrial heavy-payload robots are typically not equipped with power- and force-limiting sensors, close proximity of human and machine in shared workspaces can only be exploited if a framework combining SSM and PFL is used. To this end, this study focused solely on the framework's usage with collaborative robots.

3. Localization of Humans and Human Body Parts in the Workspace

3.1. Relevant Deep Learning Approaches

To safely adapt robot velocities, humans entering the shared workspace must be detected reliably under varying illumination, viewpoints, and body postures. Modern deep learning methods clearly outperform classical computer-vision techniques for this task [31]. Depending on the particular recognition task, different deep learning approaches can be applied. In the following, two classes of algorithms are distinguished, which are either human body related or human body part related.

3.1.1. Human Body Recognition

Current state-of-the-art image-based human detectors rely on convolutional neural networks (CNNs). These methods typically output a bounding box—defined by its center (x_C, y_C), width w, and height h—along with a classification score. Two main architectural families dominate current research:
  • Region-proposal networks such as R-CNN and Faster R-CNN [32,33,34,35].
  • Regression-based detectors such as MultiBox, SSD, or YOLOv4 [36,37,38], which jointly perform localization and classification.
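To make the interface of such detectors concrete, the following minimal sketch (not the authors' implementation) shows how a pretrained region-proposal detector can be queried for person bounding boxes; the model choice, the confidence threshold, and the conversion to center/width/height format are illustrative assumptions.

```python
# Hedged sketch: person detection with a pretrained Faster R-CNN from torchvision.
# Model choice and score threshold are assumptions, not the paper's configuration.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_persons(rgb_image: torch.Tensor, score_thresh: float = 0.7):
    """rgb_image: float tensor (3, H, W) in [0, 1]; returns (x_C, y_C, w, h, score) tuples."""
    with torch.no_grad():
        output = model([rgb_image])[0]
    persons = []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        if label.item() == 1 and score.item() >= score_thresh:  # COCO class 1 = person
            x1, y1, x2, y2 = box.tolist()
            persons.append(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1, score.item()))
    return persons
```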

3.1.2. Human Body Segmentation

In applications where pixel-level classification is needed, segmentation approaches provide richer spatial information than do bounding boxes. CNN-based models represent the current benchmark for segmentation tasks [39,40,41]. Mask R-CNN [42] remains one of the most accurate frameworks, using an ROI-Align module to reduce feature misalignment and an additional mask-prediction branch. Although faster alternatives such as YOLACT++ [43] exist, Mask R-CNN generally achieves higher segmentation accuracy.

3.1.3. Human Pose Estimation

Human pose estimation aims to determine the positions of anatomical key points. Recent CNN-based methods outperform earlier model-based and decision-tree approaches [44]. Early regression-based methods [45,46,47,48] predicted joint coordinates directly from images. Later work demonstrated that predicting confidence (belief) maps for each joint significantly improves accuracy [49,50], as these maps capture spatial dependencies more effectively.

3.1.4. Human Body Part Segmentation

Human body part segmentation extends semantic segmentation by distinguishing between individual body regions. Due to limited labeled datasets, modern approaches such as [51] augment training data through synthetic generation and pose-guided refinement. These methods leverage key-point alignments, morphing, and dedicated refinement networks to improve per-part accuracy.

3.2. Selected Deep Learning Approaches

Depending on the safety function, it may be necessary to identify either the closest human body point or the closest specific body part. Therefore, the framework evaluates both body-level and part-level deep learning models. Human detection, segmentation, and pose estimation methods are trained on MS COCO [52], while body part segmentation models use the PASCAL-Part dataset [53]. All selected architectures used within the framework are summarized in Table 2.

3.3. Extraction of Depth Information

Within the HRSF, the extraction of spatial human body information is carried out with RGB-D data captured from an Intel RealSense D435i camera (Intel Corporation, Santa Clara, CA, USA). While the surveyed deep learning methods are applied to RGB input images, the gathered depth information is aligned with the color image data in order to bring the human body information into a spatial context. Depending on the type of algorithm applied, two different cases are distinguished for the extraction of depth information:
(A) Determination of the minimal separation distance of the closest body point to a hazardous area.
(B) Determination of the separation distance for individual body parts.
Since the extracted spatial body information is generally related to the camera coordinate frame F_Cam, extrinsic parameters are applied in order to determine minimal separation distances with regard to the robot world coordinate frame F_w.
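As an illustration of this step, the sketch below lifts an aligned depth pixel into the robot world frame F_w using a pinhole camera model; the intrinsic parameters fx, fy, cx, cy and the extrinsic transform T_world_cam are placeholders for calibration results and are not values reported in this work.

```python
# Sketch under assumed calibration values: back-project an aligned depth pixel
# from the camera frame F_Cam into the robot world frame F_w.
import numpy as np

fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0   # hypothetical RGB-D intrinsics
T_world_cam = np.eye(4)                       # hypothetical extrinsics (F_w <- F_Cam)

def pixel_to_world(u: int, v: int, depth_m: float) -> np.ndarray:
    """Return the 3D point of pixel (u, v) with depth in metres, expressed in F_w."""
    x_cam = (u - cx) * depth_m / fx
    y_cam = (v - cy) * depth_m / fy
    p_cam = np.array([x_cam, y_cam, depth_m, 1.0])
    return (T_world_cam @ p_cam)[:3]
```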

3.3.1. Minimal Separation Distance for a Single Body Point

For human body recognition and human body segmentation, the desired minimal separation distance is obtained by determining the body point within the bounding box or within the contour surrounding the human that is closest to the robot world coordinate origin (Figure 1).
Due to stereo mismatching and aliasing artifacts, the depth measurement may yield small, near-zero depth values that do not correspond to reality. Thus, a lower threshold level d_thres is introduced, and all depth data with d_min < d_thres are neglected.
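A minimal sketch of this single-body-point case is given below: all pixels inside the detected human region are back-projected (using the hypothetical pixel_to_world helper above), readings below d_thres are rejected, and the closest point to the world origin is kept. The threshold value is an assumed tuning parameter.

```python
# Sketch: minimal separation distance of the closest valid body point to the
# robot world coordinate origin, with near-zero artifacts filtered out.
import numpy as np

def min_separation_distance(depth_m: np.ndarray, human_mask: np.ndarray,
                            d_thres: float = 0.5) -> float:
    """depth_m: HxW depth image [m]; human_mask: HxW boolean mask of the human region."""
    d_min = np.inf
    vs, us = np.nonzero(human_mask)
    for u, v in zip(us, vs):
        d = float(depth_m[v, u])
        if d < d_thres:            # reject stereo-mismatch / aliasing artifacts
            continue
        p_w = pixel_to_world(u, v, d)
        d_min = min(d_min, float(np.linalg.norm(p_w)))
    return d_min
```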

3.3.2. Separation Distance for Individual Body Parts

The separation distances determined by the recognition of individual human body parts are mainly influenced by the accuracy and robustness of the spatial body part predictions.
For human pose estimation, the body joint predictions correspond to specific coordinates in the image plane. Thus, it would be possible to assign a single depth value to each body joint coordinate. However, due to stereo mismatching and occlusion, a determination of depth information on the pixel level might lead to erroneous depth values that do not conform with reality. Therefore, again all depth data points with d_min < d_thres are rejected, and instead, the mean depth value d_mean within a prescribed region of interest A_ROI (e.g., a small window of 10 × 10 pixels) is determined (for a comparison, see Figure 2a).
In contrast to human pose estimation, the depth information in human body part segmentation can be gathered for all image points that are assigned to a specific body part; i.e., for each image point associated with a specific body part, the corresponding depth value is extracted. From all of these body-part-related depth values, the minimal spatial separation distance is then determined for each body part individually. Thereby, a more robust spatial estimation of the corresponding body parts can be achieved compared to human pose estimation. An illustration of the depth extraction for human body part segmentation is given in Figure 2b.
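The two body-part-related variants can be sketched as follows: averaging the valid depth inside a small ROI around a pose key point, and taking the per-part minimum over a body part mask. Window size and threshold follow the values mentioned above but remain tuning assumptions.

```python
# Sketch of the two depth extraction variants for body-part-related methods.
import numpy as np

def keypoint_mean_depth(depth_m, u, v, half_win=5, d_thres=0.5):
    """Mean depth d_mean inside a ~10x10 pixel ROI around key point (u, v)."""
    roi = depth_m[max(v - half_win, 0):v + half_win, max(u - half_win, 0):u + half_win]
    valid = roi[roi >= d_thres]
    return float(valid.mean()) if valid.size else None   # None if no valid depth

def body_part_min_depth(depth_m, part_mask, d_thres=0.5):
    """Minimum valid depth over all pixels assigned to one body part."""
    vals = depth_m[part_mask]
    vals = vals[vals >= d_thres]
    return float(vals.min()) if vals.size else None
```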

4. Determination of ISO-Relevant Safety Parameters for Specific Robotic Systems

The safety-relevant parameters for SSM and PFL are specific to a particular implementation of the proposed HRSF. For SSM, the corresponding parameter is the minimum separation distance S_p at which the robot velocity can still be reduced to comply with the corresponding safety parameters for PFL, namely the body-part-specific maximum robot velocities ż_max that respect the biomechanical force and pressure thresholds of ISO/TS 15066. In addition to ISO/TS 15066, the determination and validation of these parameters must also consider the functional safety requirements defined in ISO 13849-1/-2 [55] and IEC 61508 [56], which prescribe performance levels (PLs) or safety integrity levels (SILs) for safety-related control functions. These standards provide guidance on evaluating hardware reliability, diagnostic coverage, and common-cause failures, all of which directly influence the acceptable reaction times and safety margins for SSM and PFL implementations. Since the parameters highly depend on the hardware (e.g., robot) and software (e.g., algorithm) components used, the following section describes generic methods for their determination. For the sake of comparison, the proposed approaches are applied to the different algorithms discussed in Section 3.

4.1. Minimum Separation Distance S_p

The guidelines for SSM in ISO/TS 15066 define the minimum separation distance S_p required between human and robot before adapting the robot velocity as

\[ S_p = S_h + S_r + S_s + C + Z_d + Z_r \]

with S_h and S_r being the distance contributions attributable to the reaction time for sensing the current human and robot location, C describing the intrusion distance into the perceptive field, and Z_d and Z_r being the distance contributions corresponding to the uncertainties in human and robot position sensing, respectively. The distance contribution S_s corresponding to the robot system's stopping is neglected within the framework, since the main aim of the framework is to avoid robot-stopping behavior.
From a functional safety perspective, both ISO 13849-1 and IEC 61508 emphasize that the determination of reaction times and corresponding distance contributions must incorporate validated safety-related software execution times, sensor update rates, and fault-tolerant processing intervals. In practice, this means that the measured values for S_h and S_r and the uncertainty terms Z_d and Z_r must include worst-case execution times and consider hardware fault behavior to achieve the required PL or SIL. These standards also mandate verification and validation procedures ensuring that risk-reduction measures—such as velocity reduction and minimum distance enforcement—are implemented with adequate reliability.
In the following, the individual contributions to S_p are analyzed in more detail by means of experimental tests.

4.1.1. Distance Due to Human Motion S_h

The contribution to the separation distance that corresponds to human motion is given in ISO/TS 15066 as

\[ S_h = \int_{t_0}^{t_0 + t_r} v_h \,\mathrm{d}t . \]
For all cases where the human speed cannot be monitored with specific sensor systems, a constant velocity v_h of 1.6 m/s is assumed for separation distances above 0.5 m and of 2.0 m/s for distances below 0.5 m. Within the framework, no additional sensors are attached to the human body, which means that S_h can be characterized by the reaction time t_r of the robot, i.e., the latency. In the context of the framework, the latency is defined as the time required from the moment the sensing system perceives the human body up to the moment when the robot has decreased its velocity to a safety-conforming level.
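Because v_h is constant over the integration interval, the integral collapses to a product. As an illustrative calculation only (the latency value is an assumption, not a measured result from Table 3), a worst-case latency of t_r = 0.5 s at the standard walking speed would yield

\[ S_h = \int_{t_0}^{t_0 + t_r} v_h \,\mathrm{d}t = v_h \, t_r = 1.6\,\mathrm{m/s} \cdot 0.5\,\mathrm{s} = 0.8\,\mathrm{m}. \]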
In addition to the definition provided in ISO/TS 15066, both ISO 13849-1 and IEC 61508 have direct implications for determining S_h. ISO 13849 requires the reaction time of the safety-related control system to be evaluated with respect to its achieved performance level (PL), meaning that the latency used in the separation-distance calculation must incorporate the worst-case response time of all components contributing to the safety function. Similarly, IEC 61508 stipulates that the execution times of safety-related software, diagnostic functions, and signal-processing chains must be validated under worst-case conditions when determining the safety integrity level (SIL). Therefore, the experimentally measured latency t_LatMax must include conservative margins covering hardware fault behavior, communication jitter between computation modules, and the maximum expected cycle time of the sensing and control tasks. These requirements ensure that the resulting S_h complies not only with biomechanical safety constraints but also with the probabilistic and systematic reliability constraints of functional safety standards. Since the HRSF and the robot path planning run on separate computation modules, a simple visual method is used to synchronize them without introducing further latencies.
The latency measurements are triggered by the flashing of a light bulb that is initiated by the robot controller. Within the human sensing node, the illumination of the light bulb is first checked before human body information is extracted from the RGB-D input data. Each time a light bulb flash is registered, the robot velocity is reduced to the minimum level allowed. The maximum latency levels are observed when the light bulb flash is initiated directly after an RGB image has been captured; i.e., in this case, most processing steps are carried out twice. After the robot has reached the desired velocity limit, the latency measurement is stopped. For each of the algorithms analyzed, the latency measurements are carried out for 300 s in order to determine a representative estimate of the maximum latency levels t_LatMax occurring. Ultimately, the latency is determined by the following factors: the image-capturing time t_Cap, the inference time t_Alg of the algorithm, the spatial information extraction time t_3D, and the robot velocity adjustment time t_Adj. An overview of the obtained algorithm-specific latencies is given in Figure 3.
The algorithm-specific maximum separation distances S_h that can be derived from t_LatMax are given in Table 3. A more rigorous description of the applied method, as well as of the individual latency contributions, is given in [57]. The latencies refer to the use of an NVIDIA Titan RTX GPU (NVIDIA Corporation, Santa Clara, CA, USA) for human body information extraction.
Apart from the human sensing latency, the current robot configuration also influences the minimal separation distance. The proposed framework determines S_h with regard to the robot world coordinate frame F_w. Indeed, any point on the robot surface might collide with the human body. Consequently, the minimum separation distance needs to be determined for all parts of the robot. In order to minimize the computational expense and determine the separation distance between any point of the human body and the robot surface, a cuboid protective hull P_q^R is used to describe the robot configuration q at the current time step. In order to determine the robot protective hull, the robot-specific Denavit–Hartenberg parameters can be used to extract the individual robot link positions from the robot forward kinematics. The minimum and maximum robot deflection in each spatial direction can be used to describe the protective hull as P_q^R = (x_min^R, x_max^R, y_min^R, y_max^R, z_min^R, z_max^R). Accordingly, the minimum separation distance is adapted as the closest distance between a body part and the protective hull. For a comparison, please see Figure 4.
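The distance between a body point expressed in F_w and the axis-aligned protective hull can be computed in closed form, as sketched below (a generic point-to-box distance, not the exact implementation used in the framework):

```python
# Sketch: Euclidean distance from a body point to the cuboid protective hull
# P_q^R = (x_min, x_max, y_min, y_max, z_min, z_max); zero if the point lies inside.
import numpy as np

def distance_to_hull(p_w: np.ndarray, hull: tuple) -> float:
    """p_w: 3D body point in F_w; hull: (x_min, x_max, y_min, y_max, z_min, z_max)."""
    lower = np.array(hull[0::2])   # (x_min, y_min, z_min)
    upper = np.array(hull[1::2])   # (x_max, y_max, z_max)
    d = np.maximum(np.maximum(lower - p_w, 0.0), p_w - upper)  # per-axis excess
    return float(np.linalg.norm(d))
```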

4.1.2. Position Prediction Uncertainty Z_d

In addition to the algorithm-specific latency, the spatial position prediction uncertainty Z_d constitutes a major factor influencing the required minimal separation distance. To quantify the individual error characteristics of the evaluated deep learning approaches, the spatial predictions produced by each algorithm were compared with ground-truth measurements obtained from a marker-based motion capture system. The deviations of the algorithm predictions were evaluated against 25 different markers attached to a human subject. Prior to data collection, all markers were placed according to the initial estimates provided by the respective models. Aside from a shared head marker, nine distinct marker positions were defined for the human pose estimation and human body part segmentation methods. For the human-body-centric approaches, these markers were also considered, supplemented by four additional torso markers and two markers placed near the elbows to improve the extraction of torso and upper-limb poses. The complete set of marker positions and their assignment to the respective methods are illustrated in Figure 5. Measurements were conducted for multiple static postures and a range of dynamic movements. An overview of all tested motion scenarios is provided in Figure 6. Each measurement was carried out for 3 min, ultimately corresponding to 1000 data samples acquired per measurement and body part.
The performance of the deep learning models was evaluated in terms of accuracy and robustness. Accuracy was assessed independently for each spatial coordinate and expressed as the mean offset and standard deviation relative to the marker-based ground truth. For body-part-centric approaches, each predicted body part position was compared to its corresponding marker, whereas for human-body-centric methods, predictions were compared to the marker closest to the camera. Robustness was measured by the proportion of failed detections, defined as instances in which no prediction was produced (e.g., due to occlusion) or in which the estimated depth value fell outside predefined limits (i.e., depth values smaller than 0.5 m or larger than 8 m). These robustness metrics served as an empirical basis for estimating perception-system reliability parameters relevant to safety-function validation under ISO 13849-1 (e.g., DC and MTTFd considerations) and IEC 61508 (e.g., diagnostic coverage and dangerous failure rates).
A detailed summary of all experimental results is provided in Appendix A (compare Table A1, Table A2, Table A3, Table A4 and Table A5). Upper error bounds for each spatial direction were computed by summing the mean offsets and standard deviations over all measurement conditions, yielding a general upper error estimate for each method. For body-part-centric approaches, per-body-part error bounds could also be extracted if required. The results show that the error range for human pose estimation is in general higher compared to human body part segmentation. Indeed, human pose estimation also struggles to detect individual body parts when they are partially occluded. Within several of the analyzed scenarios, comparatively large mispredictions were obtained for arm positions. This behavior was observed for both the standing and sitting positions as well as for dynamic movements. Especially for sitting poses at large distances to the camera, the detection of stretched arms becomes challenging for both body-part-related approaches. An extensive overview of all spatial body part deviations is given in Table A3 for standing poses, in Table A4 for sitting poses, and in Table A5 for dynamic movements.
The resulting estimates are presented in Table 4 and clearly demonstrate that the body-part-centric methods outperform the human-body-centric approaches in terms of spatial prediction accuracy. From the results obtained for human-body-related algorithms, one can conclude that at distances below 4 m, the depth resolution Δz achieves detection accuracies of < 10 cm. While similar accuracy levels were observed for Δx, much stronger fluctuations were obtained for Δy. Independent of the method and scenario investigated, maximal error deviations of > 30 cm were observed for Δy. This can mainly be explained by the fact that the predicted body points were compared to a limited amount of ground-truth marker data. Consequently, there were different body points that might have shared similar separation distances as the closest marker to the camera but did not necessarily coincide with the particular marker position.
In contrast to the analyzed human-body-related approaches, the results obtained for human-body-part-related techniques indicate more precise prediction behavior. Even though the obtained depth tolerance level of 15 cm at separation distances below 4 m was slightly worse than the deviations observed for human-body-related methods, the lateral error range was significantly better, reaching 7 cm for most body parts analyzed. At larger distances, discrepancies mainly appear for body extremities, e.g., the right upper arm. In particular, at distances of 6 m, the depth estimation error exceeded 0.5 m, which in turn suggests that an accurate depth resolution is no longer possible for particular body parts. In contrast, the lateral deviations increased only slightly with the distance to the camera, which indicates that the most limiting factor for an accurate spatial body part prediction is the depth sensor resolution. Among the different static, sitting, and moving measurement scenarios, higher discrepancy levels were observed with increasing distances to the camera. In particular, at distances of > 5 m, error levels above 0.5 m would have to be considered. Overall, the results obtained for human body segmentation show less fluctuating behavior compared to human body recognition.
On the basis of the discrepancy results obtained, the position prediction error Z_d should be estimated individually for each method analyzed. Generally, the depth resolution is a limiting factor at separation distances > 6 m and would thus greatly increase the error bound estimation. Due to latency-related separation distance contributions of < 1.4 m and limited robot ranges of approximately 1 m, it appears plausible to restrict the error estimation to those measurement results that lie within a distance of 6 m from the camera origin. When applying this assumption, it is, however, necessary to consider that the camera should be located within a tolerable distance to the world coordinate origin in order to guarantee that the separation distance is determined in accordance with the estimated error bounds; i.e., scenarios with minimal human–robot separation distances below 1.5 m must be avoided when the human–camera distance exceeds 6 m.
For human-body-part-related approaches, it would generally be possible to use error estimations for each body part individually, but for the proposed framework, a general mean error is determined from the individual upper body part error estimations. An overview of all position prediction error estimations is given in Table 4 and emphasizes that both human-body-part-related approaches outperform the human-body-related methods in terms of prediction accuracy. Furthermore, it can be concluded that both segmentation techniques outperform the competing human-body- and human-body-part-related methods analyzed. An extensive overview of the individual spatial deviations at varying distances to the camera is given in Figure A1, Figure A2 and Figure A3. Overall, the results indicate that human body part segmentation provides the highest accuracy and robustness for spatial body part localization using RGB-D data, achieving depth tolerances below 15 cm and lateral errors below 5 cm for all body parts. Moreover, the findings confirm that the Intel RealSense depth camera is capable of reliably resolving human body part depths at distances of up to 4 m. Such validated error bounds and detection-failure characteristics constitute essential input parameters for the design and verification of perception-dependent safety functions in accordance with ISO 13849 and IEC 61508. At certain poses, however, the investigated human body recognition approach was not able to identify human bodies at all.

4.1.3. Intrusion Distance C

The intrusion distance C is defined as the distance a body part can penetrate the sensing field before being detected. In most cases, the applied industrial safety technologies consist of laser light beam systems. The intrusion distance contribution for these safety systems is described in the normative guideline ISO 13855 [58] as

\[ C = 8\,(d - 14) \]

with d denoting the sensor detection capacity in mm. Since the intrusion distance is defined in ISO 13855 solely for two-dimensional protective field sensors, the contribution of a 3D sensing system can be derived from the depth-dependent sensor resolution. Thus, at distances of 4 m to the applied RGB-D camera, the per-pixel resolution corresponds to 8.5 mm and 6.5 mm in the x- and y-directions, respectively. The ISO regulation proposes to neglect contributions of < 14 mm.
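Read this way, inserting the lateral per-pixel resolutions at 4 m into the ISO 13855 expression (an illustrative check, not a normative calculation) gives

\[ C_x = 8\,(8.5\,\mathrm{mm} - 14\,\mathrm{mm}) < 0, \qquad C_y = 8\,(6.5\,\mathrm{mm} - 14\,\mathrm{mm}) < 0, \]

so both lateral contributions fall below the 14 mm threshold and are neglected.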
In the depth direction, the normative guidelines of ISO 13855 no longer hold, since the RGB-D sensor does not monitor the intrusion into the workspace at a fixed sensing height but rather over a continuous range. Consequently, the intrusion distance contribution is mainly characterized by the depth uncertainty that is already included in the position prediction uncertainty Z_d. Thus, due to the applied RGB-D technology, this contribution is not considered further within the determination of the minimum separation distance.

4.1.4. Robot Latency Contribution S_r

The robot latency contribution S_r to the separation distance arises from the robot joint angle querying process. Due to the processing time required from querying the current robot configuration up to the point in time when the minimum separation distance is determined, the robot can still move along its path at maximum speed. A mean latency of 3 ms was determined experimentally within the HRSF, which corresponds to an upper error bound of approximately 5 mm at a maximum robot velocity of 1.6 m/s.

4.1.5. Robot Positioning Uncertainty Z_r

The robot positioning uncertainty Z_r can be derived from potential geometric deviations of the robot link lengths that are used for the determination of the robot protective hull. For the estimation of this contribution, the robot forward kinematics prediction at the tool center point was compared with the manufacturer's robot model position. In total, mean errors of 8 mm, 7 mm, and 11 mm were determined in the x-, y-, and z-directions, respectively. Compared to the obtained positioning error, the repeatability of 0.1 mm of the KUKA iiwa is negligible and requires no further consideration.
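Combining the contributions discussed above, and inserting illustrative values (the assumed worst-case latency term from the example in Section 4.1.1, the measured S_r and Z_r bounds, and an assumed body-part-segmentation uncertainty Z_d of 0.15 m), the minimum separation distance would evaluate to roughly

\[ S_p = S_h + S_r + S_s + C + Z_d + Z_r \approx 0.8\,\mathrm{m} + 0.005\,\mathrm{m} + 0 + 0 + 0.15\,\mathrm{m} + 0.011\,\mathrm{m} \approx 0.97\,\mathrm{m}, \]

with S_s and C neglected as described above; the actual algorithm-specific values follow from the latencies in Table 3 and the error bounds in Table 4.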

4.2. Maximum Robot Velocities ż_max

The maximum speed of the robot (more precisely, of the tool or the part that actually collides) is linked to the body-part-dependent biomechanical force and pressure thresholds given in ISO/TS 15066. A generalized description of the relationship between robot velocities and the resulting forces and pressures during collisions is not trivial, since it depends, for example, on the tools mounted on the robot and on the actual robot pose in relation to the environment; for a comparison, see, e.g., [59]. Moreover, a distinction must be made between transient and quasi-static contact situations. At present, the HRSF deals with quasi-static situations. Therefore, in the experiments, the maximum velocity was derived from the more conservative quasi-static biomechanical force and pressure limits.
In this context, two extreme cases are analyzed. At first, the maximum force and pressure levels are determined for collisions where the robot motion is directed towards the operator. In the other case, vertical robot movements are considered that can potentially lead to squeezing of human body parts.
The forces and pressures occurring during collisions were determined using a GTE CBSF-75 Basic force measuring device (GTE Industrieelektronik GmbH, Viersen, Germany). Within the measurement setup, the safety controller was adjusted such that the robot motion stops when the external torque level exceeds 20 Nm in one of the seven robot joints. Force and pressure measurements were carried out for five different velocities and repeated ten times. An overview of the measured force and pressure levels as a function of the Cartesian robot velocity is given in Figure 7. The obtained results show that for collision situations where the robot approaches the operator, the admissible velocity depends strongly on the resulting force levels, while for vertical movements, the exerted pressure also plays a significant role. From the results, one can derive the body-part-dependent maximum Cartesian robot speed allowed for this specific task, as listed in Table 5.
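Conceptually, the resulting limits can be stored as a simple lookup from body part to maximum Cartesian velocity, as sketched below. The numerical values are placeholders only; the actual limits are the task-specific measurements of Table 5.

```python
# Sketch: body-part-dependent velocity limits (placeholder values, not Table 5).
V_MAX_PER_BODY_PART = {   # m/s
    "hand": 0.25,
    "lower_arm": 0.30,
    "upper_arm": 0.35,
    "torso": 0.20,
}

def allowed_velocity(nearest_body_part: str, default: float = 0.20) -> float:
    """Fall back to the most restrictive assumed limit for unknown body parts."""
    return V_MAX_PER_BODY_PART.get(nearest_body_part, default)
```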

5. The Human–Robot–Safety Framework

The architecture of the HRSF consists of several building blocks that can be freely exchanged depending on the particular hardware used. An overview of the framework architecture is given in Figure 8.
The robot path planning is executed separately from the human recognition algorithms in an individual building block, denoted in Figure 8 as (1). The maximum robot speed and the maximum allowed joint torques (which are used to limit the interaction forces) are also set in this module. For the employed KUKA iiwa robot, adjustment of the pre-planned path was carried out on the robot controller using specific Java libraries.
The current robot position is obtained by the robot location building block (2) via a real-time interface that queries the current joint angles q from the robot controller. The KUKA iiwa system offers a real-time package—the Fast Robot Interface (FRI)—which allows robot positions to be queried within time intervals of less than 5 ms. A kinematic robot model is used to compute a cuboid-shaped robot protective hull P_q^R (see Figure 4).
The human body (part) extraction is executed as a separate building block (3), running on a computation node with sufficient processing power, i.e., multiple GPU cores. Within this building block, the algorithm-specific body point(s) closest to the robot world coordinate frame F_w are determined from the gathered RGB-D information.
After the current robot pose and the most critical human body points have been identified, both pieces of spatial information are sent via the TCP/IP protocol to building block (4), which evaluates the current separation distance between human and robot. Here, the separation distance is determined as the minimal distance of each of the determined body points to all faces of the protective hull. The obtained separation distances are compared with the minimum level allowed for the applied algorithm; i.e., the algorithm-specific safety contributions are used in this building block in order to determine whether the robot velocity needs to be adjusted. If a particular body part violates the allowed minimum separation distance, the robot velocity is reduced. Since the velocity of the particular KUKA iiwa robot can only be adapted dynamically with “safe” input signals, the information on the maximum robot velocity is first sent to a safety PLC via the OPC UA protocol. Accordingly, the obtained signals are transferred to “safe” output signals that are sent to the robot controller. Depending on the obtained information, the robot velocity is adapted.
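The evaluation step of building block (4) can be summarized by the following sketch, which reuses the hypothetical distance_to_hull and allowed_velocity helpers from the previous sections; the nominal velocity and the decision logic are simplified assumptions rather than the exact controller implementation.

```python
# Sketch of the separation-distance evaluation in building block (4).
def evaluate_safety(body_points: dict, hull: tuple, s_p: float,
                    v_nominal: float = 1.6) -> float:
    """body_points: {body_part_name: 3D point in F_w}; returns the commanded velocity [m/s]."""
    v_cmd = v_nominal
    for part, p_w in body_points.items():
        if distance_to_hull(p_w, hull) < s_p:
            # Violation of the minimum separation distance: fall back to the
            # PFL-compliant limit of the violating body part.
            v_cmd = min(v_cmd, allowed_velocity(part))
    return v_cmd  # forwarded to the safety PLC as the new velocity limit
```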

6. Experimental Validation

6.1. Test Scenario

The HRSF was tested in a real-world manufacturing scenario that included screwing procedures. Within the experiments carried out, the robot tightened three different screws on an engine block while the worker carried out other working tasks at the same time. Ultimately, the analyzed use case could be divided into three different phases of interaction between human and robot:
  • Coexistence: Human and robot are located far away from each other but are both heading towards the engine block.
  • Collaboration: Human and robot work at the same workpiece on different working tasks. At this stage, collisions are rather likely to occur, and thus the robot must be operated with decreased velocities.
  • Cooperation: After finishing all working tasks at the engine block, the operator carries out other tasks in the common workspace. At this stage, collisions between human and robot are rather unlikely.

6.2. Speed Adjustment

Depending on the current separation distance between robot and human, the framework adapts the robot velocity whenever the algorithm-specific separation distance falls below the minimum level allowed. The robot's cycle time t_cycle corresponds to the time required from leaving its initial position, through conducting all screwing tasks, until it returns to the starting position. As an example, Figure 9a–c shows the robot movement and the adaptation of robot velocities as a function of the separation distance when human body part segmentation is applied within the HRSF. From the plots it can be concluded that the robot moves with maximum speed during coexistence. When changing to the collaborative phase, reduced velocities are adopted. When the robot returns to its starting position (cooperative phase), the robot velocity can be increased again, since the separation distance exceeds the minimum separation distance allowed.
The cycle time was measured on the robot controller and analyzed for each of the investigated deep learning methods. The obtained results were compared with the cycle times obtained when (1) state-of-the-art manufacturing safety technology (a SICK S30A laser scanner, SICK AG, Waldkirch, Germany) was used or (2) no additional safety system was applied. In case (1), where the laser scanner was used, the maximum robot velocity of 1.6 m/s was reduced to the most restrictive speed value in Table 5. In case (2), where no additional safety system for human recognition was used, this most restrictive velocity, i.e., the lowest speed, was applied constantly.

6.3. Cycle Time Analysis

For the determination of the algorithm-specific robot process execution times, cycle time measurements of the proposed screwing tasks were repeated ten times for each method investigated. The results obtained are given in Table 6. They clearly indicate that with any of the investigated human body (part) detection methods, the cycle times are always reduced compared to current state-of-the-art implementations of robot collaboration in industrial environments. For example, the cycle time was decreased by up to 35% compared to systems where no additional safety equipment was used. Even when laser scanning systems were used for robot velocity adaptation, the best performing method from the framework achieved cycle time reductions of more than 15%. This is because laser scanner systems give a very conservative estimate of the intrusion distance C and thus lead to a relatively large cycle time.
Among the different algorithms analyzed, both human-body-part-related approaches achieved the lowest levels of robot process execution time. For human pose estimation and human body recognition, higher cycle time fluctuations were observed compared to both segmentation approaches. This behavior can be explained by both methods reacting less robustly when parts of the human body were occluded, in which case the robot velocity was reduced.
The obtained results clearly show that the HRSF is a valuable approach to reducing process execution times in human–robot collaboration applications. Despite its high latency (compare Table 3), the human body part segmentation method achieved the lowest mean cycle times for process execution. Thus, it can be assumed that in the future, the cycle times can be further reduced owing to continuing progress in computational power, along with ongoing optimizations in the field of object segmentation.

7. Conclusions

This paper describes a performance and feasibility study for vision-based SSM under ISO/TS 15066 constraints. To this end, four different deep learning algorithms were investigated in order to extract spatial human body and human body part information on the basis of RGB-D input data and further to determine separation distances between humans and robots. The regulatory guidelines of ISO/TS 15066 were taken into account, and the algorithm-specific minimum separation distance was used to adapt the robot velocity accordingly.
The framework was applied to screwing tasks and was shown to achieve significant reductions in cycle time compared to the state-of-the-art industrial technologies applied for human–robot collaboration today. The best performing method for human body part extraction achieved a reduction in robot process execution time of more than 15%.
In conclusion, the HRSF provides a promising approach for improving both efficiency and safety in human–robot collaboration. However, before deployment in productive industrial environments, further investigations are required on robustness, reliability, multi-camera integration for plausibility verification, fail-safe scenarios, and functional safety certification. It should be noted that the experimental design has important limitations, including the use of a single subject, a single task, a limited number of repetitions, no formal statistical testing, and simplified baseline comparisons, which constrain the generalization of these initial findings. Addressing these aspects will ensure that the framework is both practically applicable and compliant with industrial safety standards.

Author Contributions

Conceptualization, D.B.; Methodology, D.B.; Software, D.B.; Validation, D.B.; Investigation, D.B.; Writing—original draft, D.B.; Writing—review & editing, A.M.; Supervision, A.M.; Project administration, D.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Results for standing poses (top) and sitting poses (bottom) obtained for different distances to the camera origin using human-body-related recognition methods.
Standing poses
Distance to Camera [m] | Human Body Recognition: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Body Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
2.5−7 ± 50−23 ± 137−134 ± 1300−7 ± 50−21 ± 136−132 ± 1260
3.07 ± 45−76 ± 11−155 ± 18808 ± 43−68 ± 111−132 ± 1490
4.0−14 ± 87−205 ± 63−25 ± 1550−138 ± 86−202 ± 55−238 ± 1240
5.0−236 ± 104−154 ± 224−47 ± 4290−192 ± 78−99 ± 172−273 ± 2260
6.018 ± 78−28 ± 107−19 ± 244048 ± 59−242 ± 73−164 ± 1830
Sitting poses
Distance to Camera [m] | Human Body Recognition: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Body Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
4.035 ± 43172 ± 168−17 ± 163052 ± 28184 ± 149−108 ± 870
5.5 (Arms stretched)−248 ± 67221 ± 124−22 ± 750−215 ± 44226 ± 102−196 ± 710
6.0−17 ± 93249 ± 102−294 ± 90−3 ± 88259 ± 91−279 ± 840
Table A2. Results for dynamic movements (movements from left to right at varying distances, gesticulation with hands/arms, and box-lifting scenarios) using human-body-related recognition methods.
Pose | Distance to Camera [m] | Human Body Recognition: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Body Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Moving from left to right3.588 ± 49−18 ± 357272 ± 4973154 ± 289−19 ± 367283 ± 4940
Moving from left to right4.5195 ± 219363 ± 387−5 ± 250212 ± 97374 ± 39651 ± 1790
Moving from left to right5.5−274 ± 1173−4 ± 543−135 ± 333769 ± 8316 ± 564−4 ± 1830
Gesticulation of hands and feet4.0472 ± 422−291 ± 488−13 ± 2831567 ± 303−325 ± 456−51 ± 2780
Box Lifting3.5−112 ± 948−537 ± 601−317 ± 3127286 ± 391−53 ± 545−7 ± 1390
Table A3. Results for standing poses obtained for different separation distances and body parts using human-body-part-related recognition methods.
Distance to Camera: 2.5 m | Body Part | Human Body Part Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Pose Estimation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Head61 ± 766 ± 10−20 ± 290−7 ± 2110 ± 9−55 ± 120
Body78 ± 18−7 ± 832 ± 5026 ± 4437 ± 162 ± 200
Left Upper Arm38 ± 117 ± 2−25 ± 4031 ± 2628 ± 219 ± 201
Right Upper Arm59 ± 99 ± 3−38 ± 9067 ± 179 ± 55± 1831
Left Lower Arm40 ± 19−8 ± 3−52 ± 12063 ± 285 ± 1414 ± 170
Right Lower Arm43 ± 15−3 ± 4−26 ± 15083± 2341 ± 748 ± 170
Left Upper Leg64 ± 226 ± 320 ± 703± 3680 ± 20−7 ± 140
Right Upper Leg−49 ± 31−2 ± 1019 ± 70−20 ± 340 ± 9−9 ± 30
Left Lower Leg21 ± 12−4 ± 2−63 ± 4060 ± 2510 ± 94 ± 250
Right Lower Leg16 ± 1213 ± 4−63 ± 5055 ± 4532± 1725 ± 80
Distance to Camera: 3.0 m | Body Part | Human Body Part Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Pose Estimation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Head79 ± 13−6 ± 8−21 ± 5083 ± 1436 ± 19−39 ± 50
Body94 ± 14−9 ± 1127 ± 9034 ± 3144 ± 129 ± 210
Left Upper Arm33 ± 119 ± 3−26 ± 8033 ± 2929 ± 1418 ± 50
Right Upper Arm29 ± 2011 ± 10−45 ± 3061 ± 254 ± 2018 ± 140
Left Lower Arm52 ± 15−2 ± 2−65 ± 8073 ± 174 ± 1527 ± 250
Right Lower Arm35 ± 153 ± 6−43 ± 5065± 2436 ± 1761 ± 171
Left Upper Leg72 ± 167 ± 715 ± 4032 ± 2867 ± 17−7 ± 20
Right Upper Leg−26 ± 193 ± 914 ± 60−5 ± 27−15 ± 23−5 ± 30
Left Lower Leg11 ± 210 ± 6−41 ± 5042 ± 2617 ± 1810 ± 60
Right Lower Leg12 ± 2015 ± 5−44 ± 8051 ± 3551 ± 1724± 230
Distance to Camera: 4.0 m | Body Part | Human Body Part Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Pose Estimation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Head−70 ± 155 ± 20−52 ± 90−79 ± 2620 ± 31−83 ± 340
Body−65 ± 222 ± 257 ± 40−121 ± 5765 ± 34−29 ± 80
Left Upper Arm−113 ± 2523 ± 21−50 ± 30−136 ± 4274 ± 35−31 ± 60
Right Upper Arm−92 ± 3511 ± 20−54± 40−46 ± 5435 ± 22−20 ± 80
Left Lower Arm−92 ± 2524 ± 19−68 ± 50−79 ± 2632 ± 3119 ± 110
Right Lower Arm−80 ± 22−3 ± 15−41 ± 90−34 ± 4139 ± 2027 ± 821
Left Upper Leg−54 ± 3218 ± 1714 ± 30−150 ± 65100 ± 19−1 ± 30
Right Upper Leg−133 ± 4117± 1610 ± 30−146 ± 304 ± 362 ± 30
Left Lower Leg−12 ± 662 ± 11−21 ± 13020 ± 7612 ± 2721 ± 100
Right Lower Leg−7 ± 5723 ± 11−25 ± 13014 ± 7959 ± 1522 ± 100
Distance to Camera: 5.0 m | Body Part | Human Body Part Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Pose Estimation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Head−75 ± 2311 ± 31−62 ± 80−96 ± 2648 ± 31−94 ± 40
Body−69 ± 3014 ± 412 ± 60−90 ± 4072 ± 2610 ± 310
Left Upper Arm−116 ± 4225 ± 29−60 ± 110−94 ± 6448 ± 25−4 ± 410
Right Upper Arm−96 ± 3632 ± 33−67 ± 70522 ± 767190 ± 19489 ± 8811
Left Lower Arm−135 ± 4428 ± 36−68 ± 60−119 ± 4154 ± 4236 ± 190
Right Lower Arm−112 ± 382 ± 29−43 ± 110−69 ± 7962 ± 6040 ± 69
Left Upper Leg−78 ± 5320 ± 376 ± 90−151± 90110 ± 3113± 20
Right Upper Leg−161 ± 5524 ± 362 ± 90−157 ± 5422± 3113 ± 20
Left Lower Leg−97 ± 901 ± 28−15 ± 150−50 ± 10776 ± 2730 ± 110
Right Lower Leg−89 ± 7422 ± 30−21 ± 14032 ± 9686 ± 4123 ± 91
Distance to Camera: 6.0 m | Body Part | Human Body Part Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Pose Estimation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Head446 ± 7922 ± 3−5 ± 100673 ± 12982 ± 638 ± 320
Body162 ± 3222 ± 34 ± 40475 ± 11099 ± 678 ± 90
Left Upper Arm17 ± 7027 ± 3−49 ± 50163 ± 9796± 5546 ± 90
Right Upper Arm152 ± 5249 ± 3−64 ± 60855 ± 14953 ± 9114± 130
Left Lower Arm93 ± 5525 ± 4−84 ± 12095 ± 4192 ± 1043 ± 20
Right Lower Arm151 ± 3415 ± 728 ± 120229 ± 7076± 1552 ± 20
Left Upper Leg101 ± 4346 ± 2−4 ± 306 ± 73119 ± 324 ± 10
Right Upper Leg51 ± 3739 ± 2−8 ± 3097 ± 3881± 327 ± 10
Left Lower Leg206 ± 5319 ± 2−30 ± 70215± 4699± 241 ± 30
Right Lower Leg302 ± 3948 ± 1−45 ± 60306 ± 105101± 436 ± 60
Table A4. Results for sitting poses obtained for different distances and body parts using human-body-part-related recognition methods.
Distance to Camera: 4.0 m | Body Part | Human Body Part Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Pose Estimation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Head−71 ± 193 ± 2−36 ± 30−49 ± 3929 ± 38−66 ± 40
Body−40 ± 20−3 ± 334 ± 60−90 ± 4378 ± 2−12 ± 30
Left Upper Arm−97 ± 3124 ± 2−38 ± 30−117 ± 4092 ± 2−12 ± 20
Right Upper Arm−12 ± 3030 ± 2−69 ± 4057 ± 5860 ± 4−8 ± 331
Left Lower Arm−66 ± 1933 ± 3−38 ± 30−85 ± 2045 ± 521 ± 30
Right Lower Arm−53 ± 25−17 ± 37 ± 4040 ± 3512 ± 668 ± 70
Left Upper Leg−51 ± 1231 ± 3−55 ± 10−123 ± 37131 ± 04 ± 30
Right Upper Leg180 ± 1821 ± 2−52 ± 20316 ± 207 ± 11 ± 20
Left Lower Leg115 ± 179 ± 2−21 ± 40129 ± 2329 ± 35−58 ± 30
Right Lower Leg65 ± 166 ± 1−22 ± 40267 ± 8054 ± 6−56 ± 250
Distance to Camera: 6.0 m | Body Part | Human Body Part Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Pose Estimation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Head254 ± 5344 ± 10−19 ± 50314 ± 68140 ± 13−5 ± 50
Body168 ± 2635 ± 511 ± 60210 ± 66148 ± 1245 ± 40
Left Upper Arm69 ± 5624 ± 7−59 ± 20107 ± 11665 ± 1744 ± 50
Right Upper Arm239 ± 4599 ± 10−57 ± 50456 ± 132112 ± 2749 ± 40
Left Lower Arm111 ± 3361 ± 5−62 ± 5073 ± 25156 ± 1531 ± 30
Right Lower Arm144 ± 3921 ± 2534 ± 70376 ± 68112 ± 14118 ± 40
Left Upper Leg111± 2752 ± 4−108± 3057 ± 26111 ± 4−51 ± 20
Right Upper Leg372 ± 3492± 7−101 ± 30490 ± 3070 ± 6−50 ± 20
Left Lower Leg277 ± 4250 ± 7−81 ± 60320 ± 56114 ± 9−96 ± 280
Right Lower Leg94 ± 6736 ± 15−65 ± 901756 ± 125554 ± 31−98± 410
Distance to Camera: 5.5 m | Body Part | Human Body Part Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Pose Estimation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Head277 ± 9128 ± 9−21 ± 60223 ± 7455 ± 10−43 ± 50
Body147 ± 6911 ± 833 ± 20142 ± 7562 ± 810 ± 20
Left Upper Arm598 ± 13159 ± 10−42 ± 30133 ± 9199 ± 8−29 ± 20
Right Upper Arm259 ± 13383 ± 20−30 ± 20385 ± 179163 ± 27−26 ± 40
Left Lower Arm943 ± 592712 ± 522−50 ± 35411743 ± 366116 ± 7252 ± 245
Right Lower Arm669 ± 224216 ± 108−27 ± 50859 ± 333219 ± 7417 ± 2710
Left Upper Leg−34 ± 6140 ± 6−42 ± 50−56 ± 7850 ± 7−18 ± 50
Right Upper Leg148 ± 5110 ± 7−35 ± 40377 ± 78111 ± 12−23 ± 50
Left Lower Leg−13 ± 3118 ± 3−15 ± 40177 ± 5751 ± 4−45 ± 50
Right Lower Leg−165 ± 21−10 ± 3−1 ± 40154 ± 121120 ± 21−44 ± 90
Table A5. Results for poses at dynamic movement using human-body-part-related recognition methods.
Distance to Camera: 3.5 m (Moving) | Body Part | Human Body Part Segmentation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%] | Human Pose Estimation: Δx [mm] / Δy [mm] / Δz [mm] / Failed Identifications [%]
Head28 ± 707 ± 47−29 ± 2804 ± 2922 ± 44−41 ± 860
Body26 ± 23−10 ± 51−6 ± 80−31 ± 3933 ± 52−15 ± 170
Left Upper Arm−25 ± 2828 ± 4510 ± 140−35 ± 3230 ± 51−24± 171
Right Upper Arm12 ± 32−27 ± 46−1 ± 11021 ± 1436 ± 72−15 ± 1092
Left Lower Arm−4 ± 298 ± 42−24 ± 19018 ± 355 ± 4628 ± 172
Right Lower Arm2 ± 27−0 ± 435 ± 16029 ± 4040 ± 4444 ± 1310
Left Upper Leg22 ± 236 ± 42−16 ± 90−23 ± 3354 ± 505 ± 180
Right Upper Leg−69 ± 25−6 ± 45−29 ± 130−16 ± 53−4 ± 586 ± 660
Left Lower Leg−11 ± 2811 ± 44−42 ± 1409 ± 3317 ± 4527 ± 210
Right Lower Leg0 ± 295 ± 43−53 ± 17017 ± 4346 ± 4536 ± 200
Distance to Camera: 4.5 m (Moving)
Body Part | Human Body Part Segmentation (Δx, Δy, Δz [mm]; Failed Identifications [%]) | Human Pose Estimation (Δx, Δy, Δz [mm]; Failed Identifications [%])
Head | 34 ± 34, 14 ± 40, −26 ± 14; 0 | 24 ± 35, 32 ± 42, −53 ± 15; 0
Body | 29 ± 39, −5 ± 42, −13 ± 9; 0 | −46 ± 55, 47 ± 46, −7 ± 27; 0
Left Upper Arm | −30 ± 47, 25 ± 38, −4 ± 10; 0 | −39 ± 54, 53 ± 45, −24 ± 28; 0
Right Upper Arm | 6 ± 48, −12 ± 38, −17 ± 9; 0 | 55 ± 157, 34 ± 65, −6 ± 33; 7
Left Lower Arm | −6 ± 50, 10 ± 37, −61 ± 13; 3 | 21 ± 67, 24 ± 44, 18 ± 20; 1
Right Lower Arm | 11 ± 57, 8 ± 37, −22 ± 19; 2 | 35 ± 71, 58 ± 42, 34 ± 28; 23
Left Upper Leg | 33 ± 43, 14 ± 32, −30 ± 10; 0 | −25 ± 61, 68 ± 44, 4 ± 7; 0
Right Upper Leg | −46 ± 43, −3 ± 32, −44 ± 15; 0 | 0 ± 71, 11 ± 47, 1 ± 8; 0
Left Lower Leg | −10 ± 41, 16 ± 33, −25 ± 20; 0 | 34 ± 57, 34 ± 37, 29 ± 25; 0
Right Lower Leg | 15 ± 40, 14 ± 33, −35 ± 20; 0 | 68 ± 87, 72 ± 40, 37 ± 20; 4
Distance to Camera: 5.5 m (Moving)
Body Part | Human Body Part Segmentation (Δx, Δy, Δz [mm]; Failed Identifications [%]) | Human Pose Estimation (Δx, Δy, Δz [mm]; Failed Identifications [%])
Head | −7 ± 45, 18 ± 54, −44 ± 15; 0 | −16 ± 61, 54 ± 56, −87 ± 28; 0
Body | −29 ± 42, 5 ± 52, −27 ± 10; 0 | −56 ± 68, 69 ± 56, 7 ± 25; 0
Left Upper Arm | −69 ± 126, 36 ± 45, −14 ± 16; 0 | −67 ± 78, 74 ± 52, −11 ± 32; 0
Right Upper Arm | −1 ± 87, 0 ± 43, −36 ± 12; 1 | 230 ± 438, 67 ± 108, 23 ± 53; 14
Left Lower Arm | −42 ± 94, 25 ± 44, −64 ± 14; 10 | −28 ± 95, 41 ± 50, 28 ± 14; 0
Right Lower Arm | −11 ± 119, 14 ± 44, −36 ± 25; 10 | 103 ± 375, 80 ± 50, 27 ± 14; 15
Left Upper Leg | −20 ± 60, 24 ± 42, −41 ± 16; 0 | −97 ± 84, 94 ± 54, 16 ± 11; 0
Right Upper Leg | −95 ± 73, 7 ± 43, −56 ± 18; 0 | −55 ± 93, 18 ± 54, 12 ± 12; 0
Left Lower Leg | −56 ± 56, 23 ± 44, −22 ± 28; 0 | −8 ± 93, 53 ± 50, 38 ± 16; 0
Right Lower Leg | −43 ± 57, 17 ± 44, −31 ± 29; 0 | 100 ± 282, 85 ± 52, 33 ± 29; 4
Distance to Camera: 3.5 m (Gesticulation)
Body Part | Human Body Part Segmentation (Δx, Δy, Δz [mm]; Failed Identifications [%]) | Human Pose Estimation (Δx, Δy, Δz [mm]; Failed Identifications [%])
Head | 21 ± 55, 2 ± 28, −33 ± 17; 0 | 11 ± 55, 21 ± 30, −49 ± 21; 0
Body | 9 ± 50, −13 ± 42, −17 ± 11; 0 | −36 ± 72, 34 ± 39, −11 ± 21; 0
Left Upper Arm | −30 ± 64, 17 ± 35, −17 ± 27; 0 | −39 ± 72, 30 ± 38, −29 ± 24; 0
Right Upper Arm | 23 ± 155, −13 ± 38, −27 ± 27; 0 | 59 ± 261, 24 ± 50, −24 ± 79; 3
Left Lower Arm | −19 ± 76, 7 ± 44, −44 ± 43; 0 | 19 ± 88, 19 ± 49, 13 ± 40; 5
Right Lower Arm | −20 ± 165, 2 ± 55, −13 ± 78; 0 | −293 ± 940, 25 ± 154, 6 ± 209; 19
Left Upper Leg | 32 ± 51, 6 ± 35, −19 ± 19; 0 | −25 ± 61, 55 ± 37, 4 ± 12; 0
Right Upper Leg | −63 ± 79, −18 ± 34, −27 ± 20; 0 | −3 ± 91, −7 ± 35, 3 ± 12; 0
Left Lower Leg | 9 ± 59, 12 ± 29, −35 ± 17; 0 | 27 ± 60, 18 ± 36, 29 ± 24; 0
Right Lower Leg | −3 ± 75, −6 ± 31, −44 ± 21; 0 | 37 ± 94, 47 ± 37, 35 ± 26; 4
Distance to Camera: 3.0 m (Lifting)
Body Part | Human Body Part Segmentation (Δx, Δy, Δz [mm]; Failed Identifications [%]) | Human Pose Estimation (Δx, Δy, Δz [mm]; Failed Identifications [%])
Head | 38 ± 51, 12 ± 81, −6 ± 31; 0 | 34 ± 56, 39 ± 55, −57 ± 33; 5
Body | −67 ± 60, 3 ± 95, 2 ± 34; 0 | −32 ± 68, 51 ± 68, 23 ± 31; 0
Left Upper Arm | 1 ± 99, 39 ± 92, −23 ± 31; 0 | −31 ± 79, 34 ± 68, −19 ± 28; 1
Right Upper Arm | 151 ± 155, −39 ± 113, −62 ± 51; 3 | 283 ± 196, 46 ± 65, −11 ± 31; 6
Left Lower Arm | −41 ± 128, 49 ± 76, −49 ± 33; 0 | 11 ± 92, 68 ± 71, −10 ± 33; 3
Right Lower Arm | −23 ± 155, −37 ± 97, −47 ± 38; 0 | 24 ± 142, 24 ± 80, −11 ± 38; 11
Left Upper Leg | −12 ± 65, 25 ± 63, −41 ± 21; 0 | −126 ± 138, 69 ± 73, 4 ± 22; 0
Right Upper Leg | −54 ± 96, −7 ± 73, −55 ± 23; 0 | −42 ± 175, 7 ± 69, 0 ± 27; 3
Left Lower Leg | −26 ± 76, 24 ± 47, −21 ± 24; 0 | 36 ± 87, 57 ± 65, 42 ± 24; 1
Right Lower Leg | −18 ± 84, −8 ± 54, −29 ± 23; 0 | 58 ± 144, 46 ± 65, 37 ± 27; 3
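The statistics reported in Tables A3–A5 (per-axis mean deviation, standard deviation, and percentage of failed identifications) can be derived from per-frame body-part predictions and the corresponding marker ground truth. The following is a minimal sketch of one possible implementation; the function and variable names are our own choice and not the authors' code.

    import numpy as np

    def error_statistics(predicted, ground_truth):
        """Per-axis error statistics for one body part over N camera frames.

        predicted    : (N, 3) estimated marker positions [mm]; rows containing NaN
                       mark frames in which the body part was not identified.
        ground_truth : (N, 3) reference marker positions [mm].

        Returns the mean and standard deviation of the deviation per axis [mm]
        and the percentage of failed identifications.
        """
        predicted = np.asarray(predicted, dtype=float)
        ground_truth = np.asarray(ground_truth, dtype=float)

        failed = np.isnan(predicted).any(axis=1)                 # frames without a valid detection
        deviations = predicted[~failed] - ground_truth[~failed]  # per-axis deviation in mm

        return deviations.mean(axis=0), deviations.std(axis=0), 100.0 * failed.mean()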
Figure A1. Error determination of human body part predictions for standing poses in x-direction. The dashed lines indicate the estimated maximal error for human pose estimation (red) and human body part segmentation (blue).
Figure A2. Error determination of human body part predictions for standing poses in y-direction. The dashed lines indicate the estimated prediction error for human pose estimation (red) and human body part segmentation (blue).
Figure A3. Error determination of human body part predictions for standing poses in z-direction. The dashed lines indicate the estimated maximal error for human pose estimation (red) and human body part segmentation (blue).

References

  1. Ogenyi, U.E.; Liu, J.; Yang, C.; Ju, Z.; Liu, H. Physical Human–Robot Collaboration: Robotic Systems, Learning Methods, Collaborative Strategies, Sensors, and Actuators. IEEE Trans. Cybern. 2021, 51, 1888–1901. [Google Scholar] [CrossRef] [PubMed]
  2. Michalos, G.; Makris, S.; Tsarouchi, P.; Guasch, T.; Kontovrakis, D.; Chryssolouris, G. Design Considerations for Safe Human-robot Collaborative Workplaces. Procedia CIRP 2015, 37, 248–253. [Google Scholar] [CrossRef]
  3. ISO/TS 15066:2016; Robots and Robotic Devices—Collaborative Robots. Technical Specification. International Organization for Standardization: Geneva, Switzerland, 2016.
  4. Halme, R.J.; Lanz, M.; Kämäräinen, J.; Pieters, R.; Latokartano, J.; Hietanen, A. Review of vision-based safety systems for human-robot collaboration. Procedia CIRP 2018, 72, 111–116. [Google Scholar] [CrossRef]
  5. Munaro, M.; Basso, F.; Menegatti, E. Tracking people within groups with RGB-D data. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 2101–2107. [Google Scholar]
  6. Munaro, M.; Menegatti, E. Fast RGB-D People Tracking for Service Robots. Auton. Robot. 2014, 37, 227–242. [Google Scholar] [CrossRef]
  7. Nikolakis, N.; Maratos, V.; Makris, S. A cyber physical system (CPS) approach for safe human-robot collaboration in a shared workplace. Robot. Comput.-Integr. Manuf. 2019, 56, 233–243. [Google Scholar] [CrossRef]
  8. Nguyen, V.P.; Laursen, T.; Schultz, U.P. Human detection and tracking for collaborative robot applications. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 1–7. [Google Scholar]
  9. Shotton, J.; Fitzgibbon, A.; Cook, M.; Sharp, T.; Finocchio, M.; Moore, R.; Kipman, A.; Blake, A. Real-time human pose recognition in parts from a single depth image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 1297–1304. [Google Scholar]
  10. Mehta, D.; Sridhar, S.; Sotnychenko, O.; Rhodin, H.; Shafiei, M.; Seidel, H.P.; Theobalt, C. VNect: Real-time 3D human pose estimation with a single RGB camera. ACM Trans. Graph. 2017, 36, 44. [Google Scholar] [CrossRef]
  11. Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime multi-person 2D pose estimation using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299. [Google Scholar]
  12. Papandreou, G.; Zhu, T.; Chen, L.C.; Gidaris, S.; Tompson, J.; Murphy, K. PersonLab: Person pose estimation and instance segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 269–286. [Google Scholar]
  13. Krüger, V.; Bawa, A. Semantic segmentation for safe human-robot collaboration. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 1–8. [Google Scholar]
  14. Bergamini, L.; De Magistris, G.; Roveda, L.; Sanguineti, M.; Masia, L. Deep learning for safe human–robot collaboration. Robot. Comput.-Integr. Manuf. 2021, 67, 102037. [Google Scholar]
  15. Petković, T.; Miklić, D. Vision-based safety monitoring for collaborative robotics: A deep learning approach. Sensors 2021, 21, 1–18. [Google Scholar]
  16. Flacco, F.; Kroeger, T.; De Luca, A.; Khatib, O. Depth-based human motion tracking for real-time safe robot guidance. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 2710–2716. [Google Scholar]
  17. Roncone, A.; Hoffmann, M.; Pattacini, U.; Metta, G. Safe and compliant physical human–robot interaction using vision-based tracking. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI), Christchurch, New Zealand, 7–10 March 2016; pp. 507–514. [Google Scholar]
  18. Haddadin, S.; De Luca, A.; Albu-Schäffer, A. Collision detection, isolation, and identification for robots in human environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), St Paul, MN, USA, 14–18 May 2012; pp. 3356–3363. [Google Scholar]
  19. Khatib, O. Real-time Obstacle Avoidance for Manipulators and Mobile Robots. Int. J. Rob. Res. 1986, 5, 90–98. [Google Scholar] [CrossRef]
  20. Warren, C. Global path planning using artificial potential fields. In Proceedings of the 1989 IEEE International Conference on Robotics and Automation, Scottsdale, AZ, USA, 14–19 May 1989; pp. 316–321. [Google Scholar] [CrossRef]
  21. Makris, S. Dynamic Safety Zones in Human Robot Collaboration. In Cooperating Robots for Flexible Manufacturing; Springer International Publishing: Cham, Switzerland, 2021; pp. 271–287. [Google Scholar] [CrossRef]
  22. Ratliff, N.; Zucker, M.; Bagnell, A.; Srinivasa, S. CHOMP: Gradient Optimization Techniques for Efficient Motion Planning. In Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009. [Google Scholar]
  23. Kalakrishnan, M.; Chitta, S.; Theodorou, E.; Pastor, P.; Schaal, S. STOMP: Stochastic trajectory optimization for motion planning. In Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 4569–4574. [Google Scholar] [CrossRef]
  24. Park, C.; Pan, J.; Manocha, D. Real-time optimization-based planning in dynamic environments using GPUs. In Proceedings of the IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 4090–4097. [Google Scholar]
  25. Park, J.S.; Park, C.; Manocha, D. I-Planner: Intention-aware motion planning using learning-based human motion prediction. Int. J. Robot. Res. 2019, 38, 23–39. [Google Scholar] [CrossRef]
  26. Yamada, Y.; Suita, K.; Imai, K.; Ikeda, H.; Sugimoto, N. A failure-to-safety robot system for human-robot coexistence. Rob. Autonom. Syst. 1996, 18, 283–291. [Google Scholar] [CrossRef]
  27. Yamada, Y.; Hirasawa, Y.; Huang, S.; Umetani, Y.; Suita, K. Human-robot contact in the safeguarding space. IEEE Trans. Mechatronics 1997, 2, 230–236. [Google Scholar] [CrossRef]
  28. Takakura, S.; Murakami, T.; Ohnishi, K. An approach to collision detection and recovery motion in industrial robot. In Proceedings of the 15th Annual Conference of IEEE Industrial Electronics Society, Philadelphia, PA, USA, 6–10 November 1989; pp. 421–426. [Google Scholar] [CrossRef]
  29. Aivaliotis, P.; Aivaliotis, S.; Gkournelos, C.; Kokkalis, K.; Michalos, G.; Makris, S. Power and force limiting on industrial robots for human-robot collaboration. Robot. Comput.-Integr. Manuf. 2019, 59, 346–360. [Google Scholar] [CrossRef]
  30. Peng, Y.; Sakai, Y.; Funabora, Y.; Yokoe, K.; Aoyama, T.; Doki, S. Funabot-Sleeve: A Wearable Device Employing McKibben Artificial Muscles for Haptic Sensation in the Forearm. IEEE Robot. Autom. Lett. 2025, 10, 1944–1951. [Google Scholar] [CrossRef]
  31. Wu, X.; Sahoo, D.; Hoi, S.C. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64. [Google Scholar] [CrossRef]
  32. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  33. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In Proceedings of the European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 6–12 September 2014; pp. 346–361. [Google Scholar]
  34. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
  35. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99. [Google Scholar]
  36. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  37. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
  38. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. Scaled-YOLOv4: Scaling Cross Stage Partial Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13029–13038. [Google Scholar]
  39. Ciresan, D.; Giusti, A.; Gambardella, L.; Schmidhuber, J. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 2852–2860. [Google Scholar]
  40. Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y. Learning Hierarchical Features for Scene Labeling. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1915–1929. [Google Scholar] [CrossRef] [PubMed]
  41. Wang, Y.; Wang, C.; Long, P.; Gu, Y.; Li, W. Recent advances in 3D object detection based on RGB-D: A survey. Displays 2021, 70, 102077. [Google Scholar] [CrossRef]
  42. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  43. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT++ Better Real-Time Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1108–1121. [Google Scholar] [CrossRef] [PubMed]
  44. Kohli, P.; Shotton, J. Key Developments in Human Pose Estimation for Kinect. In Consumer Depth Cameras for Computer Vision; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  45. Toshev, A.; Szegedy, C. DeepPose: Human Pose Estimation via Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 1653–1660. [Google Scholar]
  46. Yang, W.; Ouyang, W.; Li, H.; Wang, X. End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3073–3082. [Google Scholar] [CrossRef]
  47. Chen, Y.; Shen, C.; Wei, X.S.; Liu, L.; Yang, J. Adversarial PoseNet: A Structure-Aware Convolutional Network for Human Pose Estimation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017; pp. 1221–1230. [Google Scholar]
  48. Chu, X.; Ouyang, W.; Li, H.; Wang, X. Structured Feature Learning for Pose Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4715–4723. [Google Scholar] [CrossRef]
  49. Tompson, J.; Jain, A.; LeCun, Y.; Bregler, C. Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation. In Proceedings of the Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1799–1807. [Google Scholar]
  50. Wei, S.E.; Ramakrishna, V.; Kanade, T.; Sheikh, Y. Convolutional Pose Machines. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4724–4732. [Google Scholar]
  51. Fang, H.S.; Lu, G.; Fang, X.; Xie, J.; Tai, Y.W.; Lu, C. Weakly and Semi Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 70–78. [Google Scholar]
  52. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  53. Chen, X.; Mottaghi, R.; Liu, X.; Fidler, S.; Urtasun, R.; Yuille, A. Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1979–1986. [Google Scholar]
  54. Huang, J.; Rathod, V.; Sun, C.; Zhu, M.; Korattikara, A.; Fathi, A.; Fischer, I.; Wojna, Z.; Song, Y.; Guadarrama, S.; et al. Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3296–3297. [Google Scholar]
  55. ISO 13849-1:2015; Safety of Machinery—Safety-Related Parts of Control Systems—Part 1: General Principles for Design. Technical Report; ISO: Geneva, Switzerland, 2015.
  56. IEC 61508:2010; Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, Parts 1–7. Technical Report; IEC: Geneva, Switzerland, 2010.
  57. Bricher, D.; Müller, A. Analysis of Different Human Body Recognition Methods and Latency Determination for a Vision-based Human-robot Safety Framework According to ISO/TS 15066. In Proceedings of the 17th International Conference on Informatics in Control, Automation and Robotics, ICINCO 2020, Paris, France, 7–9 July 2020. [Google Scholar] [CrossRef]
  58. ISO 13855:2010; Safety of Machinery—Positioning of Safeguards with Respect to the Approach Speeds of Parts of the Human Body. International Organization for Standardization: Geneva, Switzerland, 2010.
  59. Brandstötter, M.; Komenda, T.; Ranz, F.; Wedenig, P.; Gattringer, H.; Kaiser, L.; Breitenhuber, G.; Schlotzhauer, A.; Müller, A.; Hofbaur, M. Versatile Collaborative Robot Applications Through Safety-Rated Modification Limits. In Proceedings of the Advances in Service and Industrial Robotics; Berns, K., Görges, D., Eds.; Springer: Cham, Switzerland, 2020; pp. 438–446. [Google Scholar]
Figure 1. Visualized results of the applied deep learning techniques for (a) human body recognition [54] and (b) human body segmentation [42] in color image (left) and depth image (right).
Figure 2. Visualized results of the applied deep learning techniques for (a) human pose estimation [45] and (b) human body part segmentation [51] in color image (left) and depth image (right).
Figure 3. Individual latency contributions determined for each of the analyzed algorithms within the HRSF.
Figure 4. Minimal separation distance determination according to the current robot pose via the application of the cuboid-shaped robot protective hull approach for (a) human-body-related and (b) human-body-part-related approaches.
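The paper does not spell out the geometric query behind the cuboid-shaped protective hull in Figure 4, so the following is only a minimal sketch of one common formulation: the Euclidean distance from a detected body or body-part position to an axis-aligned cuboid expressed in the robot base frame. All names and numerical values are illustrative assumptions of our own.

    import numpy as np

    def distance_point_to_cuboid(point, cuboid_min, cuboid_max):
        """Euclidean distance from a 3D body(-part) position to an axis-aligned
        cuboid hull; returns 0 if the point lies inside the hull."""
        p = np.asarray(point, dtype=float)
        closest = np.clip(p, cuboid_min, cuboid_max)  # closest point on or inside the hull
        return float(np.linalg.norm(p - closest))

    # Example: separation of a detected hand position from a link hull [mm]
    d = distance_point_to_cuboid([900.0, 150.0, 1100.0],
                                 cuboid_min=[-100.0, -100.0, 0.0],
                                 cuboid_max=[100.0, 100.0, 1300.0])

For a hull that is not axis-aligned, the detected point would first be transformed into the cuboid's local frame before applying the same clamping step.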
Figure 5. (a) Experimental setup for data acquisition. (b) Position of markers and indications of whether a marker is used for human pose estimation, human body part segmentation, or only for human-body-related accuracy validation. All of the markers chosen for human pose estimation and human body part segmentation were also considered for human-body-related measurements.
Figure 6. Overview of the different measurement scenarios.
Figure 7. Determination of measured forces and pressures for two robot movement scenarios: movements in operator direction (a) and vertical movements (b). The colored dashed lines correspond to the biomechanical threshold levels taken from ISO/TS 15066 with the following nomenclature: skull (green), hands/fingers/lower arms (red), chest (blue), upper arms (violet), upper legs (black), and lower legs (orange).
Figure 8. Block architecture of the HRSF using the KUKA iiwa robot.
Figure 9. Plotted time evolution of (a) robot joint angles, (b) robot joint velocities, and (c) separation distance S_p between the human and robot for the investigated screwing application while the human body part segmentation within the HRSF is applied.
Table 1. Comparison of representative vision-based and sensor-based HRC safety approaches.
Work/Approach | Modality | Standard Addressed | Body Part Awareness | Dynamic Velocity Adaptation | Experimental Validation
Industrial safety scanners/light curtains | 2D laser/IR | ISO 13855 | No | No (fixed stop/speed) | Industrial use; no body part tests
RGB-D human detection [8,9,10] | RGB-D | None or partial | Whole-body only | Limited; coarse scaling | Laboratory demonstrations; limited safety analysis
Skeleton/keypoint tracking [11,12] | RGB/RGB-D | Not aligned with ISO/TS 15066 | Joint-level but not mapped to limits | Rarely implemented | Laboratory tests only
DL-based human segmentation for HRC [13,14,15] | RGB-D | Partial references | Region level | Limited/no modulation | Laboratory studies; no latency/failure modes
Depth-based SSM [16,17,18] | Depth, 3D sensors | Partially aligned w/ ISO/TS 15066 | Whole-body/coarse regions | Yes, but uniform margins | Strong validation; no body part integration
Proposed HRSF (this work) | RGB-D + DL | Explicit ISO/TS 15066 mapping | Yes, per-body-part 3D localization | Yes, part-specific velocity scaling | Real-robot evaluation; accuracy, latency, robustness
Table 2. Analyzed deep learning approaches for human body and human body part recognition.
Detection Algorithm | Method
Human Body Recognition | SSD [54]
Human Body Segmentation | Mask R-CNN [42]
Human Pose Estimation | DeepPose [45]
Human Body Part Segmentation | Human Body Part Parsing [51]
Table 3. Contributions of the analyzed algorithms to the separation distance due to the maximal latency within the HRSF.
Method | t_Lat,Max [ms] | S_h [mm]
Human Body Recognition | 370 | 592
Human Pose Estimation | 305 | 488
Human Body Segmentation | 559 | 894
Human Body Part Segmentation | 812 | 1299
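The values of S_h in Table 3 are consistent with multiplying the worst-case latency t_Lat,Max by the human approach speed of 1600 mm/s used in ISO 13855 [58] (e.g., 0.370 s × 1600 mm/s = 592 mm). A minimal sketch of this relation, with variable and function names of our own choosing, is:

    V_HUMAN_MM_S = 1600.0  # human approach speed from ISO 13855 [mm/s]

    def latency_distance_contribution(t_lat_max_ms: float) -> float:
        """Separation-distance contribution S_h caused by the worst-case
        perception latency t_Lat,Max (compare Table 3)."""
        return V_HUMAN_MM_S * t_lat_max_ms / 1000.0

    # 370 ms -> 592 mm, 305 ms -> 488 mm, 559 ms -> 894.4 mm, 812 ms -> 1299.2 mm
    # (the table rounds the last two values to 894 mm and 1299 mm)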
Table 4. Determined position prediction error for all algorithms applied in the HRSF.
Method | Δx [mm] | Δy [mm] | Δz [mm]
Human Body Recognition | 346 | 767 | 399
Human Body Segmentation | 346 | 416 | 334
Human Pose Estimation | 131 | 57 | 206
Human Body Part Segmentation | 87 | 71 | 151
Table 5. Body-part-specific maximum Cartesian robot velocities that were applied in the HRSF for the analyzed screwing application.
Body Part | ż_Max [mm/s]
Skull/Forehead | 50
Hand/Fingers/Lower Arms | 100
Chest | 100
Upper Arms | 100
Thighs | 200
Lower Legs | 50
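A minimal sketch of how such body-part-specific limits could be applied at runtime is given below: the most restrictive limit is selected over all body parts that are currently closer to the robot than their required protective separation distance. The dictionary mirrors Table 5, whereas the selection logic, the nominal speed, and all names are our own simplification and not the authors' implementation.

    # Maximum Cartesian velocities per body part, taken from Table 5 [mm/s]
    Z_DOT_MAX = {
        "skull/forehead": 50.0,
        "hand/fingers/lower_arms": 100.0,
        "chest": 100.0,
        "upper_arms": 100.0,
        "thighs": 200.0,
        "lower_legs": 50.0,
    }

    NOMINAL_SPEED = 250.0  # assumed nominal Cartesian speed when no limit applies [mm/s]

    def admissible_speed(separations: dict, required: dict) -> float:
        """Select the admissible Cartesian speed for the current camera frame.

        separations : body-part name -> measured separation distance [mm]
        required    : body-part name -> required protective distance [mm]
        """
        violated = [Z_DOT_MAX[part] for part, dist in separations.items()
                    if dist < required[part]]
        # Most restrictive body-part limit, or the nominal speed if all
        # protective distances are respected.
        return min(violated) if violated else NOMINAL_SPEED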
Table 6. Overview of robot cycle times obtained for the execution of screwing tasks with different algorithms applied in the HRSF, state-of-the-art laser scanners, and no additional safety monitoring equipment being used.
Method | t_Cycle [s]
Human Body Recognition | 24.84 ± 2.31
Human Body Segmentation | 25.60 ± 0.41
Human Pose Estimation | 22.81 ± 1.17
Human Body Part Segmentation | 22.78 ± 0.37
Laser Scanner | 26.98 ± 0.59
No Additional Safety System | 35.08 ± 0.04
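Relative to the laser-scanner baseline in Table 6, the two best-performing HRSF configurations shorten the mean cycle time by (26.98 − 22.78)/26.98 ≈ 15.6% for human body part segmentation and (26.98 − 22.81)/26.98 ≈ 15.5% for human pose estimation, i.e., a cycle-time reduction of roughly 15%.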