Improving Accuracy in Industrial Safety Monitoring: Combine UWB Localization and AI-Based Image Analysis

Di Rienzo, Francesco; Miglionico, Giustino Claudio; Ducange, Pietro; Marcelloni, Francesco; Salti, Nicolò; Vallati, Carlo

doi:10.3390/jsan14060118

Open AccessArticle

Improving Accuracy in Industrial Safety Monitoring: Combine UWB Localization and AI-Based Image Analysis

by

Francesco Di Rienzo

^*

,

Giustino Claudio Miglionico

,

Pietro Ducange

,

Francesco Marcelloni

,

Nicolò Salti

and

Carlo Vallati

Department Information Engineering, University of Pisa, 56126 Pisa, Italy

^*

Author to whom correspondence should be addressed.

J. Sens. Actuator Netw. 2025, 14(6), 118; https://doi.org/10.3390/jsan14060118

Submission received: 30 September 2025 / Revised: 3 December 2025 / Accepted: 5 December 2025 / Published: 11 December 2025

(This article belongs to the Special Issue Security and Smart Applications in IoT and Wireless Sensor and Actuator Networks)

Download

Browse Figures

Versions Notes

Abstract

Industry 4.0 advanced technologies are increasingly used to monitor workers and reduce accident risks to ensure workplace safety. In this paper, we present an on-premise, rule-based safety management system that exploits the fusion of data from an Ultra-Wideband (UWB) Real-Time Locating System (RTLS) and AI-based video analytics to enforce context-aware safety policies. Data fusion from heterogeneous sources is exploited to broaden the set of safety rules that can be enforced and to improve resiliency. Unlike prior work that addresses PPE detection or indoor localization in isolation, the proposed system integrates an UWB-based RTLS with AI-based PPE detection through a rule-based aggregation engine, enabling context-aware safety policies that neither technology can enforce alone. In order to demonstrate the feasibility of the proposed approach and showcase its potential, a proof-of-concept implementation is developed. The implementation is exploited to validate the system, showing sufficient capabilities to process video streams on edge devices and track workers’ positions with sufficient accuracy using a commercial solution. The efficacy of the system is assessed through a set of seven safety rules implemented in a controlled laboratory scenario, showing that the proposed approach enhances situational awareness and robustness, compared with a single-source approach. An extended validation is further employed to confirm practical reliability under more challenging operational conditions, including varying camera perspectives, diverse worker clothing, and real-world outdoor conditions.

Keywords:

computer vision; edge AI; edge computing; industrial safety; multi-sensor data fusion; RTLS; UWB

1. Introduction

Ensuring safety in the workplace is not only a legal obligation but also an ethical duty to those who contribute to production activities daily. In contemporary industrial settings, advanced technologies, including sensors, monitoring systems, and artificial intelligence, play a key role in protecting the health and safety of workers [1]. Very often, wearable devices are used to monitor workers, e.g., to measure physiological parameters or to track real-time positions. These physiological parameters can provide valuable data to assess the health of workers to prevent long-term injuries due to bad postures or repetitive movements. Real-time positions can be analyzed to detect entry into hazardous areas or prevent accidents by raising an alarm in case of potential collisions, for instance, with vehicles.

Another technology that has recently been introduced to improve workplace safety is monitoring via computer vision. In this case, machine learning techniques are used to detect whether workers wear Personal Protective Equipment (PPE) correctly, such as hard hats or high-visibility vests, by analyzing images [2]. Whenever a worker does not wear all the required PPE properly, an alarm can be raised, thus improving compliance with regulations, which can help mitigate the effects of accidents.

Both technologies have proven to be effective individually, but each has its limitations. Wearable devices, for example, must be worn at all times and are susceptible to battery depletion. Meanwhile, PPE detection via image analysis operates independently of worker compliance but can be hindered by visual obstructions or detection failures.

In this paper, we propose a hybrid approach to accident prevention that integrates multiple data sources. In particular, we combine worker indoor localization data with PPE compliance information obtained through image analysis. This integration enables detailed monitoring of critical areas and supports the enforcement of safety protocols by detecting both the presence of workers in hazardous zones and the correct use of protective equipment. This approach, based on heterogeneous data fusion, is exploited to enrich the set of safety rules that can be enforced and to improve the resiliency of the system, which can still ensure minimal safety in case one system is unavailable. This integration enables the enforcement of context-aware safety policies that cannot be implemented by either technology alone, such as verifying PPE compliance specifically within hazardous zones or detecting unauthorized personnel through cross-validation of visual and localization data.

To the best of the authors’ knowledge, no prior work has demonstrated the integration of UWB-based indoor localization with AI-based PPE detection in a unified rule-based engine for industrial safety monitoring. While previous studies have explored either PPE detection or indoor localization individually, the proposed system uniquely combines per-person PPE status (from AI), precise zone membership (from RTLS), and authorization-level verification (from RTLS) to enable safety rules that explicitly require heterogeneous data fusion.

The feasibility of the proposed approach is validated via a proof-of-concept implementation that comprises the following two components: a commercial Ultra-Wideband (UWB) localization system and an AI-based PPE detection system. The latter analyzes video streams captured by a fixed surveillance camera to automatically verify PPE compliance. The implementation is exploited to validate the system, showing that the system is capable of processing video streams on edge devices with 89.5% accuracy at 10 frames per second, while the UWB positioning achieves 0.36 m accuracy at 10 Hz, demonstrating that the proposed approach can be implemented using technologies available on the market.

The efficacy of the system is validated considering a set of seven safety rules implemented in a controlled laboratory scenario. The validation shows that the proposed approach enhances the set of rules each system can enforce singularly and improves the resiliency to the failure of a single component. An extended quantitative validation is also executed to assess the performance in an environment similar to an industrial scenario and highlight its limits.

This paper is structured as follows. Section 2 provides an overview of existing safety technologies, with a focus on solutions based on heterogeneous data fusion. Section 3 describes the proposed system architecture and its main components, along with the proof-of-concept implementation. Section 4 reports the results of performance tests carried out in a realistic industrial setting, including controlled laboratory scenarios and extended qualitative validation in realistic environments with varying camera perspectives, worker clothing, and real-world forklift detection. Finally, Section 5 summarizes the key findings, discusses the current system’s limitations, and outlines future research directions toward a holistic approach to workplace safety.

2. State of the Art

This section provides an overview of recent technological advancements aimed at improving workplace safety, with a particular focus on solutions that integrate heterogeneous sources of information.

Wearable devices have emerged as promising non-intrusive tools to reduce the risk of accidents in high-risk environments such as construction sites and industrial facilities. These devices can monitor physiological parameters to help prevent chronic injuries and health incidents [3]. Beyond physiological monitoring, wearables are also widely used for indoor localization. Various studies have investigated the use of wearable tags and wireless technologies to track the position of workers or vehicles within industrial areas [4].

Among the available wireless technologies, Ultra-Wideband (UWB) has gained significant attention for industrial safety applications due to its high positioning accuracy and robustness in challenging environments. The study by [5] demonstrates that UWB-based Indoor Positioning Systems (IPS) can achieve centimeter-level accuracy for Industrial Internet of Things (IIoT) applications, with localization errors typically ranging from 10 cm to 30 cm. The study highlights UWB’s superior performance compared to Bluetooth and Wi-Fi in metal-rich industrial environments, where multipath interference is common. More recently, ref. [6] proposed an integrated approach combining UWB with barometric sensors for 3D indoor area recognition, enabling more accurate personnel security monitoring in multi-floor industrial facilities. This fusion approach addresses the challenge of vertical positioning, which is often problematic in traditional 2D localization systems.

For real-time worker safety applications, UWB-based Real-Time Locating Systems (RTLS) have shown promising results. In [7], an untethered RTLS was developed specifically for road-worker safety, where UWB sensors embedded in safety cones track workers wearing smart helmets equipped with mobile tags. The system can detect when workers enter predefined danger zones and trigger automatic alerts, demonstrating the practical feasibility of UWB for dynamic safety monitoring. Similarly, ref. [8] presents an Industry 4.0 implementation of UWB-based RTLS for optimized tracking in manufacturing environments, reporting improvements in both worker safety compliance and operational efficiency through automated proximity alerts and restricted area monitoring. In addition to single-purpose localization, some studies have explored sensor fusion to enhance tracking accuracy in industrial settings.

For example, in [9], the authors propose a system for tracking forklifts in large-scale warehouses by fusing data from multiple sensors: an inertial measurement unit, an optical flow sensor, and a commercial UWB system using an Unscented Kalman Filter.

In parallel, computer vision has gained traction as a means to improve PPE compliance monitoring. For instance, ref. [2] presents a deep learning-based system for detecting helmets and safety vests in real-time using edge devices equipped with an Intel Neural Compute Stick 2. The use of local processing minimizes bandwidth consumption and alleviates privacy concerns associated with cloud-based solutions. This work is further extended in [10], where different machine learning models are evaluated for performance improvements and tested with various deployment methods across different hardware platforms.

Recent advances in deep learning architectures have significantly improved PPE detection performance. A systematic review by [11] analyzed computer vision-based PPE compliance systems in industrial practice, identifying object detection approaches as the most effective solutions, achieving detection accuracies exceeding 95% in controlled environments. The study also highlights key challenges, including occlusion handling, varying lighting conditions, and the need for large annotated datasets. Building on these findings, ref. [12] conducted a comparative study of YOLOv8 architectures for PPE detection on benchmark datasets, demonstrating that YOLOv8-Nano achieves a favorable balance between accuracy (mAP@50 > 90%) and inference speed, making it particularly suitable for resource-constrained edge devices.

Edge computing has emerged as a critical enabler for real-time PPE monitoring in industrial settings. For instance, ref. [13] proposed a low-cost system for real-time PPE verification using edge computing devices, reporting an Average Precision improvement of 6% compared to cloud-based approaches while reducing latency to less than 100 ms. The system processes video streams locally, generating immediate alarms when PPE violations are detected. For deployment in construction sites, ref. [14] developed a framework based on the real-time pixel-level detection model YOLACT with MobileNetV3 as backbone, achieving 91.3% accuracy in determining workers’ PPE-wearing status while maintaining compatibility with standard surveillance infrastructure.

While the above works demonstrate mature single-modality safety monitoring solutions, recent research has begun exploring broader hetereogeneous data fusion strategies for safety applications. In [15], textual data from historical accident reports are combined with real-time sensor readings (e.g., temperature, pressure) using Natural Language Processing (NLP) techniques. This integration enables a richer understanding of incident causes and supports improved risk prediction. Similarly, ref. [16] addresses accident prediction in motorway tunnels by fusing structured data (e.g., weather and historical incidents) with unstructured image data processed through Gabor filters. The combined use of GRU neural networks and the AdaBoost algorithm demonstrates improved prediction accuracy. The application of multimodal sensor fusion in smart manufacturing has been extensively reviewed by [17], who identified key trends including the integration of vision, acoustic, vibration, and IoT sensor data for comprehensive process monitoring. The study emphasizes that effective fusion strategies must address challenges such as temporal synchronization, heterogeneous data representation, and real-time processing constraints. Building on these principles, ref. [18] proposed MFGAN, a multimodal fusion approach for industrial anomaly detection that combines attention-based autoencoders with generative adversarial networks. The method fuses visual, thermal, and acoustic data to detect subtle anomalies in manufacturing equipment, achieving superior performance compared to single-modality baselines in early fault detection.

Although several works explore the fusion of heterogeneous data to enhance safety, none of the reviewed studies investigates the integration of indoor localization data with real-time PPE compliance. This gap motivates our proposal to combine these two complementary technologies to enable more accurate and context-aware safety monitoring.

More specifically, prior works have addressed PPE detection at the edge and indoor localization separately, and demonstrated safety-critical fusion in adjacent domains (e.g., accident prediction in tunnels, predictive maintenance). However, we found no study that integrates indoor localization with real-time PPE compliance in a unified, rule-based engine specifically designed for industrial safety monitoring. This gap motivates the proposed approach, which explicitly couples per-person PPE status, zone membership, worker authorization, and vehicle proximity into event-level rules. By fusing these heterogeneous data streams at the aggregation layer, our system enables enforcement of context-aware safety policies that cannot be realized by either technology alone.

3. Multi-Source Industrial Safety Monitoring

This section provides an overview of the proposed multi-source industrial safety monitoring system, structured into three key modules, as depicted in Figure 1.

The system is composed of the following modules:

Localization Module: Responsible for real-time tracking of workers’ positions within the monitored area.
Artificial Intelligence (AI) Module: Detects the presence of workers and verifies PPE compliance using video analysis.
Aggregation Module: Integrates data from the localization and AI modules, applies predefined safety rules, and triggers alerts when necessary.

Figure 1. The overall architecture of the proposed multi-source safety monitoring system. The Localization Module interfaces with a commercial UWB RTLS to track workers and vehicles in real time, providing position coordinates and tag metadata. The AI Module processes camera feeds on edge devices (Raspberry Pi 5 with Coral Edge TPU) to detect workers, verify PPE compliance, and identify forklifts. The Aggregation Module fuses both data streams through frame-synchronous matching, evaluates seven safety rules, and triggers visual/auditory alerts when violations are detected. All processing occurs on-premises to ensure privacy and minimize latency.

Each module is interfaced with specific physical systems that collect and process data:

The Localization Module communicates with a commercial Ultra-UWB Real-Time Location System (RTLS). This system consists of fixed anchors, typically installed at the boundaries of the monitored area, and wearable tags (e.g., badges or smartwatches) carried by workers. The tags periodically transmit signals that are received by the anchors, which forward the data to a processing unit that calculates the worker’s position using triangulation techniques.
The AI Module processes real-time video streams captured by surveillance cameras installed in the workspace. The video frames are analyzed using a deep learning-based object detection model running on an edge device, similar to the approach proposed in [2]. The model detects workers and verifies whether they are wearing mandatory PPE, such as helmets and high-visibility vests.
The Aggregation Module acts as a central processing unit that fuses the data streams from the localization and AI modules. It cross-references workers’ positions with PPE compliance status and applies predefined safety rules. If a worker enters a hazardous area without the required protective equipment, the system can trigger alerts, such as visual or auditory warnings, to ensure compliance with safety protocols.

The integration of these three modules enables a more robust and context-aware approach to workplace safety, overcoming the limitations of individual subsystems operating in isolation. In the following subsections, we provide further details on the functionalities of each module and describe the proof-of-concept implementation developed to validate the feasibility of the proposed approach.

3.1. Localization Module

The Localization Module is responsible for interacting with commercial RTLS to enable real-time worker and vehicle tracking. The system distinguishes three categories of tracked entities: workers, visitors, and forklifts (representing indoor vehicles). Each entity is equipped with a tag that continuously transmits localization data. This data, including the tag ID, object type, and coordinates, is forwarded to the Aggregation Module for further processing.

The module is designed to be flexible and expandable, ensuring compatibility with different RTLS. Integration is achieved through a standardized interface based on RESTful APIs. New RTLS solutions can be added by developing a dedicated plug-in that implements the required communication protocol. This modular approach abstracts implementation details while handling initialization, data processing, and tag management uniformly.

Our proof-of-concept implementation integrates the commercial UbiTrack UWB Positioning System (https://www.ubitrack.com/products/starter-kit (accessed on 30 September 2025)), priced at approximately $2400. The manufacturer specifies a localization accuracy of under 30 cm, supporting both Time Difference of Arrival (TDoA) and Two-Way Ranging (TWR) positioning methods. The system operates at up to 10 Hz, making it suitable for real-time tracking. Additionally, its support for REST APIs facilitates seamless integration into our safety monitoring framework.

The UbiTrack Starter Kit includes four anchors, covering up to 900

m^{2}

, along with six tags designed for different use cases:

Wristband Tag: Suitable for tracking personnel.
Location Badge: Worn as a necklace, ideal for visitor identification.
Location Tag: A rugged tag attachable to objects, used for tracking equipment and vehicles (e.g., forklifts).

For testing, the system was deployed in the Crosslab—Cloud Computing, Big Data, and Cybersecurity (https://crosslab.dii.unipi.it/(accessed on 30 September 2025)) laboratory at the University of Pisa. This environment replicates industrial conditions, featuring metal structures and electromagnetic interference sources. The four UWB anchors were installed on tripods at fixed heights.

To evaluate the RTLS performance, we conducted both static and dynamic tests using all available tags. Tests were performed with different anchor heights and monitored area sizes to assess their impact on accuracy [19].

In static tests, the tags were placed at fixed positions, and the average localization error was measured. Table 1 presents the results.

As observed, reducing the monitored area decreases the localization error by approximately 20% for the same anchor height. Similarly, lowering the anchor height improves accuracy by nearly 60% for a given area size. This indicates that anchor placement height has a greater impact on accuracy than the covered area.

For dynamic tests, we set the parameters that yielded the lowest error in static tests (i.e., a 28

m^{2}

area with 2.0 m anchor height). A person carried all six tags while moving them along predefined linear trajectories. Two test runs were conducted following perpendicular paths, as illustrated in Figure 2.

To enhance localization accuracy, we implemented two filtering techniques:

Kalman Filter (KF) to reduce noise and refine position estimates.
Kalman Filter with Exponential Moving Average (KEF) to further smooth high-frequency oscillations.

Table 2 presents the dynamic test results and reports average localization errors for the six tags across different filtering configurations: No Filter (NF), Kalman Filter (KF), and Kalman Filter with Exponential Moving Average (KEF).

Applying the Kalman Filter reduced localization error by approximately 20% along the X-axis and 13% along the Y-axis. The combined Kalman Filter and Exponential Moving Average (KEF) further improved accuracy, achieving a 24% reduction on the X-axis and 23% on the Y-axis. These results highlight the effectiveness of post-processing techniques in refining RTLS-based localization.

3.2. AI Module

The AI module is responsible for analyzing images captured by surveillance cameras to detect the presence of workers, verify correct PPE usage, and identify forklifts operating within the monitored area. The extracted information is sent to the Aggregation Module, where it is combined with localization data to enforce safety rules and trigger alerts when necessary.

The PPE detection component is based on the architecture proposed in [2] and recently improved in [10], which adopts an edge computing approach. Images are processed locally on an embedded system deployed near the camera, eliminating the need to offload data to the cloud. This ensures privacy preservation and system resilience to network failures.

YOLOv8 was selected as the core object detection model due to its high maturity, active community support, and ease of adaptation to different hardware platforms. Various YOLOv8 configurations were tested, differing in model complexity and number of parameters. However, our evaluation showed that models more complex than the Nano version, despite having a significantly higher parameter count, did not provide appreciable improvements in detection accuracy. Therefore, we selected YOLOv8-Nano for our final implementation, as it offers the best trade-off between accuracy, speed, and resource usage.

The YOLOv8-Nano model, consisting of 3.2 million parameters, was trained on a high-performance server equipped with an NVIDIA A100 GPU (80 GB VRAM). The training set extends the dataset introduced in [2] and recently used in [10], with additional labeled images and object classes, including person and forklift, to better fit our scenario. The training was performed using five-fold cross-validation. The YOLOv8-Nano model is first converted to TensorFlow format, then quantized (using the TFLite format), and finally compiled for execution on the Edge TPU (https://coral.ai/(accessed on 30 September 2025)). After quantization, the YOLOv8-Nano size is significantly reduced, with a final footprint of approximately 3.3 MB. Despite its reduced size and lower numerical precision (INT8), the quantized model maintains high detection accuracy, as shown in Table 3. For model evaluation, we adopted the

A P_{@ 50}

metric (Intersection-over-Union threshold at 50%) for each class.

In our proof-of-concept, the AI module is deployed on a Raspberry Pi 5 board (https://www.raspberrypi.com/products/raspberry-pi-5/(accessed on 30 September 2025)) equipped with a Google Coral USB accelerator (https://coral.ai/products/accelerator/ (accessed on 30 September 2025)) to execute the model on the Edge TPU. Images are captured using an Intel RealSense D435 depth camera, connected via USB.

This deployment is both cost-effective (under $200 excluding the camera) and energy-efficient (consumption below 30 W under full load). The software stack is written in Python 3.10 and exposes results via a REST API for communication with the Aggregation Module.

To assess real-time performance, we conducted tests using three-minute video clips. After a 20-s warm-up phase, we measured the model’s inference time (i.e., the time required to process a single frame), the number of frames per second (FPS), and the

m A P_{@ 50}

. Results are summarized in Table 4.

The results confirm that YOLOv8-Nano enables efficient and accurate real-time video analysis, making it suitable for deployment in industrial safety scenarios.

3.3. Aggregation Module

The Aggregation Module serves as the central component of the safety monitoring system. It collects and integrates data from the Localization and AI modules to evaluate safety conditions in real-time. Its main responsibility is to enforce a predefined set of safety rules and, in case of violations or hazardous situations, trigger appropriate alerts (e.g., acoustic or visual signals) to notify workers in the area.

The module is implemented following a Model-View-Controller (MVC) pattern, which ensures a modular and scalable structure. This design promotes ease of maintenance and facilitates the future addition of new rules or input sources. All internal and external communications occur through RESTful APIs, allowing loose coupling with the AI and Localization modules. Like the other components, the Aggregation Module runs on-premises entirely, preserving worker privacy and minimizing latency.

To enable real-time monitoring and system supervision, the Aggregation Module provides a comprehensive graphical user interface (GUI) that integrates multiple information streams into a unified dashboard (see Figure 3). The interface serves three primary functions: (1) real-time visualization of both AI detections and RTLS tracking data, (2) immediate notification of safety rule violations through visual alerts and chronological event logs, and (3) interactive configuration of system parameters, including critical zone definitions, capacity limits, tag-to-person mappings, and authorization policies. This design allows safety operators to monitor the entire facility at a glance while maintaining full control over system behavior without requiring manual code modifications. Specifically, the interface includes:

a live video stream from the cameras with object detection annotations (bounding boxes and class labels);
a 2D map of the monitored area showing the positions of UWB anchors, cameras, and tracked entities;
a dashboard displaying logs of rule violations, system messages, and visual alerts;
a configuration panel for setting critical zones, defining authorized personnel, updating tag associations, and managing other system settings.

The Aggregation Module receives:

From the Localization Module: a list of active tags, each with a unique ID, type (worker, visitor, or forklift), and real-time coordinates.
From the AI Module: a list of detected objects per video frame, including people, forklifts, and PPE items such as helmets and vests.

The data fusion process between the AI and Localization modules is implemented as a frame-synchronous event pipeline. Let T denote the set of active RTLS tags and D the set of objects detected in the current video frame f. Each detected person

d \in D

is characterized by an image-space centroid, computed as the center of the bounding box provided by the object detector. Using the camera calibration parameters (calculated at installation time), this centroid is projected onto the ground plane to obtain a world-space position

p_{d} = (x_{d}, y_{d})

. Each RTLS tag

t \in T

provides a real-time position

p_{t} = (x_{t}, y_{t})

in the same world coordinate frame, together with metadata such as unique identifier, role, and authorization level. Associations between visual detections and RTLS tags are established through a nearest-neighbor matching procedure with spatial gating. A candidate pair

(d, t)

is considered admissible if the Euclidean distance between their projected positions does not exceed a confidence radius r, defined as

r = k σ_{RTLS}

with

k \approx 2

, where

σ_{RTLS}

represents the typical localization uncertainty. Among all admissible pairs, this association minimizes the following spatial discrepancy function:

c (d, t) = {∥ p_{d} - p_{t} ∥}_{2} .

(1)

This discrepancy quantifies the geometric discrepancy between the world-space position inferred from the camera image and that provided by the RTLS system. The matching process thus selects the configuration that minimizes the overall spatial inconsistency between the two sensing modalities.

The centroid projection used to calculate the world-space position

p_{d} = (x_{d}, y_{d})

is carried out as follows. For each bounding box detected by the AI module with class ID 0 (person) or 5 (forklift), the system computes the real-world position using camera calibration parameters configured during installation. Given a detected object centered at pixel coordinates

(x, y)

in an image with resolution

(r e s_{x}, r e s_{y})

, the vertical angular offset

Φ

from the camera’s optical axis is computed using the camera’s vertical field of view

f o v_{v}

:

Θ = \frac{y \times f o v_{v}}{r e s_{y}}, Φ = Θ - \frac{f o v_{v}}{2}

(2)

Using the distance z from the camera to the detected object (obtained from depth information provided by the camera) and the angle

Φ

, the ground-plane distance b is calculated as:

b = z \times cos (Φ)

(3)

The horizontal angular offset

γ

is then computed using the horizontal field of view

f o v_{h}

:

α = \frac{x \times f o v_{h}}{r e s_{x}}, γ = α - \frac{f o v_{h}}{2}

(4)

Finally, the coordinates

(x_{d}, y_{d})

in the ground plane are obtained as:

x_{d} = b \times sin (γ), y_{d} = b \times cos (γ)

(5)

These coordinates are expressed in the same reference frame used by the RTLS, enabling direct spatial comparison. Camera parameters, including position, orientation (in degrees), height, resolution, and field of view, are configured through the graphical interface during the initial system setup phase.

The computational complexity of this nearest-neighbor association is

O (| D | | T |)

per frame, since each visual detection is compared with every active RTLS tag to evaluate the pairwise distance. Although more efficient data structures (e.g., k–d trees) could reduce this cost, the exhaustive formulation guarantees deterministic behavior and predictable real-time performance given the moderate number of entities typically involved. Detections or tags exceeding the gating threshold are treated as unmatched and are conservatively ignored to prevent false associations. Once the association step is completed, the Aggregation Engine evaluates the predefined rule set

{R_{1}, \dots, R_{7}}

on the resulting tuples

〈 person, PPE, zone, authorization, forklift proximity 〉

, enabling the system to reason over heterogeneous contextual information in real time.

The system architecture is implemented using a microservices design in which the three modules communicate via RESTful APIs using JSON-formatted messages. This design choice provides enhanced scalability, maintainability, and fault tolerance, allowing the Aggregation Module to continue operating even if one of the other modules becomes temporarily unavailable. To handle concurrent data streams efficiently, the module employs a multi-threaded processing pipeline with a pool of 20 worker threads instantiated at startup. When a new localization message arrives via the /aggregator/localization endpoint, a dedicated thread processes the tag position update. Similarly, when a detection message arrives via the /aggregator/detection endpoint, another thread handles the incoming frame data. This parallel processing strategy minimizes latency and ensures that the system can handle multiple simultaneous events without blocking.

The module maintains three primary data structures that are updated in real time as new information arrives. The objectList is a dynamically updated list representing the current state of all tracked entities, where each element contains information about a specific tag, including its current position, associated PPE status (if matched with a visual detection), zone membership, and authorization level. The viewerTable is a producer-consumer queue used to update the graphical user interface: when the data receiver server processes incoming information, it adds GUI update commands to this table, which are then consumed by a dedicated thread that renders the corresponding visual elements on the map display. The alarmList stores pending alarm notifications; when the rule evaluation logic detects a violation, it adds an alarm entry to this list, which is subsequently consumed and displayed in the log area of the graphical interface, optionally triggering a siren if configured.

Figure 3. Graphical interface of the Aggregation Module showing the three main components of the system’s user interface. (a) Real-time camera feed panel with object detection annotations, displaying bounding boxes (red circles are the center) and class labels for detected workers, PPE items (helmet, vest), and forklifts. (b) 2D map panel showing UWB anchor positions (blue triangles), critical zones (red rectangles), the camera (green camera), an indication alert position (red arrow), and tracked entities. Visual markers indicate compliance status: blue circles represent compliant workers/vehicles, while red crosses highlight violations. (c) Dashboard panel displaying chronological rule violation logs with timestamps and rule IDs, system status messages, and a configuration panel for defining critical zones, setting capacity limits, managing tag-to-person associations, and configuring authorization policies.

The system’s operational performance depends on several configurable parameters and environmental factors validated through laboratory and field tests. Table 5 summarizes the key parameters and their validated operational ranges. The association radius r is set as

k σ_{RTLS}

with

k \approx 2

, resulting in a threshold of approximately 0.7 m. Camera height was validated across a range of 2.0 to 5.0 m with tilt angles between

- 10 °

and

- 30 °

. The qualitative validation identified that detection accuracy degrades significantly beyond approximately 8 m from the camera, where individuals may not be reliably identified by the AI module. Additionally, objects positioned at image boundaries or heavily occluded by other entities may be missed by the detection algorithm. These limitations are inherent to vision-based systems and highlight scenarios where the RTLS provides critical redundancy. The proximity threshold for collision detection (Rule 6) is configurable and was set to 1.0 m during validation.

To maintain partial functionality in case of component failure, the Aggregation Module implements a fault tolerance mechanism through two dedicated monitoring threads that continuously verify the operational status of the AI and Localization modules. Each monitoring thread checks every 10 s whether the corresponding module has sent data within a configurable timeout period by tracking the elapsed time since the last received message. If the timeout threshold is exceeded, the module is declared unavailable, and the system transitions to a degraded operational mode with relaxed safety constraints. When the Localization Module is unavailable, the system can still enforce Rules 2 and 3, which rely primarily on visual detection data. Conversely, when the AI Module is unavailable, the system continues to enforce Rules 4, 5, 6, and 7, which depend primarily on localization data. Rule 1, which requires both PPE detection and zone membership verification, becomes unavailable when either module fails. During degraded operation, the monitoring thread responsible for the failed module periodically attempts to re-establish the connection every 10 s, and once the connection is successfully restored, the system automatically returns to full operational mode, re-enabling all seven safety rules.

By correlating this data, the module can identify mismatches between detected persons and registered tags (e.g., an unidentified person not wearing a tag), and verify PPE compliance based on location and context (e.g., presence in a critical zone).

The Aggregation Module evaluates a set of safety rules designed to prevent unauthorized access, ensure PPE usage, regulate forklift operations, and monitor proximity conditions. Some rules rely on the fusion of heterogeneous data sources (e.g., combining visual and localization data), while others use data redundancy to ensure robustness in case of partial system failure.

The following rules have been defined in order to showcase the potential of the proposed data fusion approach:

Rule 1 (R1): An alarm is triggered if a person is detected in a critical area without wearing the required PPE (e.g., helmet or vest).
Rule 2 (R2): An alarm is triggered if an unidentified person (i.e., not associated with any tag) is detected.
Rule 3 (R3): An alarm is triggered if a forklift is detected by the AI module but not tracked by the RTLS.
Rule 4 (R4): An alarm is triggered if a forklift is in operation without an authorized driver nearby.
Rule 5 (R5): An alarm is triggered if the number of individuals in a critical area exceeds the predefined limit.
Rule 6 (R6): An alarm is triggered if the distance between a worker and a forklift drops below a minimum safety threshold and continues to decrease.
Rule 7 (R7): An alarm is triggered if a person enters a restricted area where they are not authorized.

In order to analyze the enforcement of each rule with different configurations, Table 6 analyzes the five core features provided by three different deployment strategies: AI-only, i.e., a system relying only on computer vision, RTLS-only, i.e., a system relying only on UWB localization, and the proposed approach, i.e., a system that relies on the fusion of the data from the AI and the RTLS systems. The table shows clearly that the proposed approach is the only one providing all the features.

If we analyze the features provided by the AI PPE detector and the RTLS, we can notice that the two technologies offer complementary characteristics, as the AI module excels at visual analysis, while the RTLS provides accurate identification and localization. However, neither system in isolation provides all the features. More specifically, the AI-only system provides accurate PPE detection and can visually recognize a person’s presence, but it cannot determine their identity, such as whether the worker is authorized or not. This limitation is addressed by the RTLS, which assigns each worker a unique, pre-configured tag, enabling reliable identification of the person. Similarly, when it comes to tracking, the AI module can only track people or vehicles within its field of view, making its tracking capability inherently partial and dependent on camera positioning. In contrast, RTLS offers comprehensive site-wide tracking, providing continuous updates even in non-visible areas. The same reasoning applies to vehicle tracking, where AI-only solutions may not work due to occlusions or limited viewpoints, whereas RTLS ensures consistent tracking across the entire environment. Finally, when it comes to people counting, the AI system outperforms RTLS because it can detect all visible individuals, including visitors or untagged personnel. RTLS, on the other hand, can only count people wearing an active tag, making its counting capability only partial.

Based on these features in Table 7, we analyze the feature requirements of each safety rule, showing that many rules depend on multiple complementary features, which cannot be provided by the AI PPE detector or the RTLS only. In the following, we analyze the requirements for each rule.

Rule 1 requires a combination of PPE detection and person tracking. A visual system can fairly determine whether a worker is equipped with the required PPE, however, it cannot accurately determine the worker’s position relative to the entire surface served by the system. The RTLS system, on the other hand, offers precise localization but cannot perform visual assessments. Their integration allows for verification of PPE compliance within critical areas.

Rule 2 requires the availability of person identification and tracking. The RTLS system assigns each tag to a specific individual, thus enabling reliable identification, however, it cannot detect untagged individuals. The AI module, on the other hand, detects all visible individuals, including visitors or unauthorized personnel, but does not provide identity information. By correlating visual detections with RTLS tag data, the integrated system identifies discrepancies between detected individuals and registered tags, thus enabling the detection of untagged or unexpected individuals within the monitored environment.

Rule 3 requires vehicle tracking from both sources. AI-based detection identifies any visible vehicle, including unregistered or unexpected ones, however, it remains prone to inaccuracies due to occlusions and camera coverage limitations. RTLS ensures continuous tracking of tagged vehicles, however, it cannot detect untagged forklifts. Data fusion enables cross-validation of vehicle detections, thus ensuring the identification of missing tags and reducing false alarms.

Rule 4 relies on people and vehicle tracking, along with the ability to associate drivers with specific vehicles. AI can classify people and vehicles; however, it cannot determine their identity. RTLS provides identity information and tracks both workers and vehicles; however, it cannot assign semantic labels. Merging the two streams of information allows for the verification that a forklift is operated by an authorized worker associated with the corresponding RTLS tag.

Rule 5 requires the counting and tracking of people. The AI-based configuration is capable of counting all visible individuals, regardless of whether they are wearing a tag. On the other hand, RTLS provides accurate zone membership, however, it can only count tagged personnel. Their combination enables accurate and comprehensive monitoring in specific areas, thus ensuring compliance with capacity limits under realistic conditions.

Rule 6 relies on tracking people and vehicles, with a focus on precision. The AI-only system provides PARTIAL tracking because image-based distance estimation is affected by perspective distortions, occlusions, and a lack of depth information. RTLS provides accurate three-dimensional distance measurement; however, it does not distinguish between workers and vehicles. The combined system can ensure the accuracy of RTLS with the semantic classification of the AI system, thus enabling robust proximity assessment and preventing both false positives (e.g., proximity events involving non-human objects) and false negatives (e.g., occluded workers remaining tracked via RTLS).

Rule 7 requires the identification and tracking of individuals. RTLS provides identity and authorization; however, it cannot detect untagged individuals. AI detects all the individuals, but it cannot determine their identity. The integrated system links visual detections to RTLS tag data, ensuring that only authorized and properly identified personnel access restricted areas, while also enabling the detection of unauthorized or untagged access.

In the following section, we present experimental validation that demonstrates how the proposed system successfully enforces all seven rules in realistic scenarios.

4. Validation Tests

The implementation presented in Section 3 was used to evaluate the proposed safety monitoring system and its decision-rule implementation. To this aim, we consider two different scenarios: a controlled laboratory scenario, designed to verify the rule enforcement and data fusion logic, and a set of qualitative tests that are conducted in an environment more similar to an industrial one to assess the AI module’s robustness under operational conditions similar to the in-field deployment.

The in-lab validation comprised a series of real-world-inspired scenarios recreated in a laboratory environment configured to replicate a typical industrial workspace. Each scenario is designed to trigger each safety rule based on the fusion of data from the AI and localization modules. The controlled scenarios (Section 4.1) describe six test cases illustrating the system’s behavior under both compliant and non-compliant conditions. Subsequently, the qualitative validation (Section 4.2) evaluates the AI module’s performance under varying camera perspectives, worker clothing, scene complexity, and real industrial environments.

Before presenting the validation scenarios, it is useful to recall how the monitored environment is represented within the system’s graphical interface. The interface displays a map of the workspace in which critical areas, zones requiring restricted access, or the use of specific PPE, are highlighted as red rectangles. These areas are fully configurable and can be adapted to different safety requirements.

Entities tracked by the system, such as workers and forklifts, are visualized using intuitive markers. When an individual operates in compliance with all safety rules, for example, wearing the required PPE and remaining within authorized zones, they are represented as blue circles. In contrast, when a violation occurs, such as entering a restricted area without authorization, missing PPE, or proximity between a worker and a forklift, the corresponding entity is displayed as a red cross. These visual indicators allow operators to monitor the environment at a glance and respond promptly to any hazardous situation.

4.1. Controlled Laboratory Scenarios

4.1.1. Scenario 1: Worker in a Critical Area with/Without PPE

This scenario was designed to test the system’s ability to verify proper PPE usage in critical zones. A person entered a critical area while being correctly identified by both the AI and Localization modules. In the compliant case, shown in Figure 4, the person was wearing both a helmet and a high-visibility vest, and no alert was triggered. In a second test (Figure 5), the same person entered again the critical area without the vest. The AI module detected the missing PPE, and the Aggregation Module triggered a visual alert on the map and logged the violation.

4.1.2. Scenario 2: Unauthorized Person in the Working Area

In this scenario, we assessed the system’s ability to detect an unauthorized person, i.e., an individual present in the monitored area without carrying a localization tag. Initially (Figure 6-Top), no person was present, and the system correctly remained idle. In the second phase (Figure 6-Bottom), a person entered the area without a UWB tag. The AI module detected the individual; however, since the Localization Module did not provide a corresponding position, the Aggregation Module correctly triggered an alert, highlighting the presence of an unidentified person.

4.1.3. Scenario 3: Unauthorized Forklift Driver

This test evaluated the system’s ability to determine whether a person interacting with a forklift was granted the required driving authorization. In this test, the forklift is simulated using a wheeled cart on which a tag for the localization of a forklift was deployed. In the compliant case (Figure 7-Top), the person was identified by a tag associated with forklift operation privileges. The system confirmed the match, and no alarm was generated. In the second case (Figure 7-Bottom), the same worker was associated with a tag, the associated user, however, lacked the required driving authorization. Although the cart was not visually recognized as a forklift by the AI module, the system relied on localization data to infer the vehicle’s presence. In this case, the system correctly triggered an alert as the operator was unauthorized.

Figure 6. Scenario 2—(Top): The monitored area appears empty, with no detections reported by either the AI or RTLS modules, and no alarm is triggered. (Bottom): The camera feed (left panel) shows a person detected by the AI module, while the map view (right panel) displays no corresponding RTLS tag. The map view displays the worker (red cross), critical zone (red rectangle), UWB anchor positions (blue triangles), the camera (green camera), and an indication alert position (red arrow). The Aggregation Module identifies the inconsistency between the AI detection and the absence of localization data, activating Rule 2 (unauthorized person). A visual alert (red cross) is issued to indicate the untagged individual.

Figure 7. Scenario 3—(Top): The (right panel) displays the map view, where an authorized operator (blue circle) is positioned near the forklift. The map view displays the worker (blue circle), UWB anchor positions (blue triangles), the camera (green camera), and an indication alert position (red arrow). The RTLS verifies that the operator’s tag possesses the required forklift operation privileges, and no alarm is triggered. (Bottom): An unauthorized individual (red cross) is detected in proximity to the forklift. The Aggregation Module cross-checks the worker’s tag authorization with the RTLS proximity data, identifying the absence of operation privileges. Consequently, Rule 4 is activated, logging the violation and issuing a visual alert, even if the forklift is not visually detected by the AI module.

4.1.4. Scenario 4: Overcrowded Critical Area

This scenario aimed to verify the system’s ability to detect overcrowding inside critical areas. The maximum capacity for the area was set to one person. In the compliant case (Figure 8-Top), only one person was standing in the critical zone, while a second individual remained outside. The system did not report any anomaly as expected. In the second case (Figure 8-Bottom), both individuals are standing in the critical area. The Aggregation Module detected the overcrowding and correctly triggered an alert.

4.1.5. Scenario 5: Potential Collision with Forklift

This test simulated a potentially hazardous situation in which a worker and a forklift moved closer together, increasing the risk of a collision. In the compliant case (Figure 9-Top), the distance between their respective tags remained above the configured 1-m threshold, and no action was taken.

In the second case (Figure 9-Bottom), the distance fell below the safety limit. The system responded promptly by issuing a proximity alert, effectively highlighting the hazardous condition.

Figure 8. Scenario 4—(Top): The (left panel) presents the camera feed showing one worker inside and another outside the critical area. The (right panel) displays the map view, where one blue circle is located inside the critical zone (red rectangle) and the other outside, thus respecting the maximum capacity limit of one person. The map view displays the worker (blue circle), the critical zone (red rectangle), UWB anchor positions (blue triangles), the camera (green camera), and an indication alert position (red arrow). No alarm is triggered. (Bottom): Both workers have entered the critical area. The map view shows two entities (red crosses) within the critical zone. The RTLS identifies the zone membership of both tags, and the Aggregation Module computes the occupancy count, activating Rule 5 (overcrowding). The system generates an alert and records the capacity violation together with a timestamp.

4.1.6. Scenario 6: Unauthorized Access to Critical Area

In the final scenario, the system was tested for its ability to enforce access control in restricted areas. Two users were configured with different access rights. In the compliant case (Figure 10-Top), an authorized person entered the critical area. The system confirmed access and did not raise any alert. In the second case (Figure 10-Bottom), a person without the required permission entered the area. The Aggregation Module detected unauthorized access and triggered an alert, even though the individual was wearing all the required PPE.

Figure 9. Scenario 5—The (left panel) presents the camera feed showing a worker and a forklift maintaining a safe distance. The (right panel) displays the map view, where the corresponding tags (blue circles) are separated by more than the configured 1-m safety threshold. The RTLS continuously monitors the inter-tag distances, and no alarm is triggered. The map view displays the worker (blue circle) positioned within the critical zone (red rectangle), UWB anchor positions (blue triangles), the camera (green camera), and an indication alert position (red arrow).(Bottom): The worker and the forklift move closer to each other. In the map view, both entities are represented as red crosses, indicating a proximity violation. The map view displays the worker (red cross) positioned within the critical zone (red rectangle), UWB anchor positions (blue triangles), the camera (green camera), and an indication alert position (red arrow). The Aggregation Module calculates the real-time Euclidean distance between the tags, detecting that the separation has fallen below 1 m. Consequently, Rule 6 is activated, generating a collision-risk alert to mitigate potential accidents.

Figure 10. Scenario 6—(Top): The (left panel) presents the camera feed showing a worker wearing the required PPE while entering the critical area. The (right panel) displays the map view, where the worker (blue circle) is located within the critical zone (red rectangle). The RTLS verifies that the worker’s tag possesses the necessary authorization for this restricted area, and no alarm is triggered despite zone entry. (Bottom): A different worker, also wearing complete PPE, enters the same critical area. The map view shows this worker as a red cross inside the restricted zone. The Aggregation Module cross-checks the tag ID against the access control list for the area, detecting missing authorization. Consequently, Rule 7 is activated, demonstrating that PPE compliance alone is insufficient and that explicit zone permissions are required.

4.1.7. Wrap-Up Considerations

Overall, the experimental results confirmed the system’s ability to accurately exploit the data provided by the AI and Localization modules to enforce safety rules in real-time. All violations were correctly detected, with the graphical user interface displaying appropriate alerts and logging the events for subsequent analysis.

The validation scenarios empirically confirm the theoretical analysis presented in Table 7: in every considered case, the proposed system successfully detected violations that would have been impossible, at least with the proper level of accuracy, to identify using either the AI-only or the RTLS-only approaches. For instance, in Scenario 1 validated Rule 1 by requiring both PPE detection (AI) and zone membership (RTLS); Scenario 6 validated Rule 7 by requiring both visual person detection (AI) and authorization verification (RTLS).

Beyond technical validation, these controlled scenarios reveal some qualitative insights about the practival utility of the proposed data fusion approach for industrial safety.

First, the integration enables contextual safety enforcement that adapts to worker location and authorization level. Unlike generic PPE compliance checking, the system is capable of verifying protective equipment only when workers enter designated hazardous zones.

Similarly, access control becomes location-aware, allowing the same worker to move freely in authorized zones while triggering alerts upon unauthorized zone entry.

Second, the cross-validation capability from heterogeneous sensing enhances detection reliability and reduces false positives. Scenario 2 demonstrates how visual detection of a person without a corresponding RTLS tag reliably identifies unauthorized individuals, whereas an AI-only system could not distinguish between authorized and unauthorized personnel. Moreover, Scenario 3 illustrates an additional benefit: even when the AI module fails to visually recognize a vehicle, the RTLS data enables enforcement of driver authorization rules, preventing false negatives that could compromise safety.

Finally, the redundant data sources ensure a fault tolerance mechanism that ensures a graceful degradation in case of ourages: when one subsystem becomes temporarily unavailable, the system continues to enforce a subset of safety rules using the remaining operational module.

4.2. Validation of the AI Module in Realistic Industrial Scenarios

While the previous scenarios validated the system’s rule enforcement capabilities under controlled laboratory conditions, this section provides a qualitative evaluation of the AI module’s detection performance in more challenging operational contexts. Specifically, it examines how variations in camera perspective, worker clothing, scene complexity, and real industrial environments influence detection accuracy and reliability, thereby identifying both the system’s strengths and its operational limitations.

4.2.1. Camera Perspective Validation

This analysis aims to qualitatively evaluate the AI model’s detection accuracy under variations in camera height and tilt angle. The focus is on how these modifications influence detection quality, particularly whether certain perspectives affect recognition accuracy or reduce the model’s ability to correctly distinguish individuals within the scene.

The analyzed scenes, shown in Figure 11 and Figure 12, include more than nine people positioned at different distances from the camera. The critical working area is assumed to be located at the center of the image, corresponding to the central desk zone.

In Figure 11 and Figure 13, the camera is mounted at a height of 2.8 m above ground level with a

- 30 °

tilt angle, whereas in Figure 12, it is installed at 2.0 m with a

- 10 °

tilt.

The analysis of Figure 11 and Figure 12 highlights two main operational limitations of the detection system. First, objects located beyond approximately 8 m from the camera are consistently undetected by the AI model, as illustrated by the unidentified individuals in the back-right portion of the images. Second, objects positioned too close to the camera, particularly those more than 50% outside the field of view or partially occluded by other objects, are also missed by the detection algorithm. Examples include the person on the left edge of the frame and the individual seated on the right, both outside the effective detection range due to their position relative to the camera’s field of view.

The analysis of Figure 13 provides additional insights into the system’s behavior under challenging visual conditions. In this scene, four individuals are standing within the critical area, producing substantial overlap in the camera’s field of view. Each person partially occludes the one positioned behind, resulting in a cascade occlusion effect. This represents an inherent physical limitation of vision-based systems, which cannot compensate for occlusions caused by the spatial arrangement of individuals in crowded environments. Consistent with the observations from Figure 11, the person located in the lower-left corner remains undetected due to proximity to the image boundary. Notably, however, a head positioned beyond the 8 m distance threshold is successfully detected in this case. Unlike the missed detections in Figure 11 and Figure 12, this correct detection is likely attributable to improved environmental conditions, particularly a higher background contrast that enhances visual discrimination.

Figure 11. Frame from a scene used to validate the AI model under different camera perspectives. Configuration: camera mounted at a height of 2.8 m with a

- 30 °

tilt angle.

Figure 11. Frame from a scene used to validate the AI model under different camera perspectives. Configuration: camera mounted at a height of 2.8 m with a

- 30 °

tilt angle.

Figure 12. Frame from a scene used to validate the AI model under different camera perspectives. Configuration: camera mounted at a height of 2.0 m with a

- 10 °

tilt angle.

Figure 12. Frame from a scene used to validate the AI model under different camera perspectives. Configuration: camera mounted at a height of 2.0 m with a

- 10 °

tilt angle.

Figure 13. Frame from a scene used to validate the AI model under different camera perspectives with overlapping objects. Configuration: camera mounted at a height of 2.8 m with a

- 30 °

tilt angle.

Figure 13. Frame from a scene used to validate the AI model under different camera perspectives with overlapping objects. Configuration: camera mounted at a height of 2.8 m with a

- 30 °

tilt angle.

4.2.2. Validation with Different Worker Clothing Types

The scenes illustrated in Figure 14, Figure 15 and Figure 16 include four to five individuals positioned at various distances from the camera (average distance of approximately 4 m). This test configuration is designed to assess the model’s ability to distinguish PPE from worker clothing with similar colors. The test involves four workers wearing different garments: an orange sweatshirt (without vest), a white sweatshirt with a yellow safety vest, a green sweatshirt, and a teal-patterned jacket. The camera is mounted at a height of 2.8 m with a

- 30 °

tilt angle.

The analysis of Figure 14 demonstrates the model’s robustness against color-based false positives. Although one individual wears an orange sweatshirt, a color visually similar to high-visibility, the PPE system correctly identifies the person as not wearing a safety vest, thereby avoiding a false-positive detection. Conversely, the yellow safety vest worn by another worker over a white sweatshirt is accurately recognized, confirming the model’s ability to detect genuine PPE. Furthermore, heads, torsos, and complete person bounding boxes for all other individuals in the scene are correctly identified, demonstrating consistent detection performance across multiple subjects.

Figure 14. Frame 1 from a scene used to validate detection performance with different worker clothing types.

The analysis of Figure 15 and Figure 16 further evaluates the system’s performance under partial occlusions and different pose configurations. In both scenes, the orange safety vest worn over the orange sweatshirt is correctly recognized, even when partially occluded in Figure 16, demonstrating the model’s resilience to moderate visual obstructions. The scenes also include several individuals partially covered by other people or objects, providing a realistic test of the system’s capability to operate in crowded environments. In Figure 15, even a partially occluded head is successfully detected, indicating robust performance when sufficient visual information remains visible. However, Figure 16 presents a failure case: the person wearing the teal-patterned jacket is completely missed by the detection system, with neither the person, head, nor chest identified. This failure likely results from severe occlusion combined with the worker’s position and pose. Despite this isolated case, all visible PPE items across these scenes are correctly identified, confirming the model’s reliable PPE classification capability when workers are within the effective detection range.

Figure 15. Frame 2 from a scene used to validate detection performance with different worker clothing types.

Figure 16. Frame from a scene used to validate detection performance with different worker clothing types. The scene includes partial occlusion of worker (iv), reducing the visible area.

4.2.3. Validation with Multiple Work Groups in the Same Scene

This analysis aims to qualitatively evaluate the performance of the AI models in scenarios featuring two distinct work groups within the same scene, with particular attention to situations in which individuals interact at varying distances.

The scenes illustrated in Figure 17 and Figure 18 include two work groups, each composed of two workers equipped with different PPEs and positioned at various distances from the camera. The camera is mounted at a height of 2.8 m with a

- 30 °

tilt angle.

The system performs correctly in both cases. In each image, a yellow vest worn over a white sweatshirt and an orange vest worn over an orange sweatshirt are present and accurately detected, confirming the model’s ability to consistently recognize PPEs across multiple interacting groups.

4.2.4. Forklift Detection in Real Industrial Environment

This analysis aims to evaluate the performance of the AI models in detecting forklifts in real outdoor industrial scenarios.

Figure 19 presents four frames captured by a camera installed on the roof of a warehouse belonging to a company operating in an outdoor environment during daytime conditions. The scene depicts a forklift moving along a 50 m-long road.

The images correspond to different temporal instances of the same sequence, showing variations in the vehicle’s distance from the camera. These frames were extracted from a 15-s video recording in which the forklift traverses a designated work area. The footage, acquired using a camera mounted at a height of 5 m above ground level, provides a comprehensive overview of the operational area.

Figure 17. Frame from a scene used to validate work areas involving multiple overlapping work groups within the same view. Configuration: two work groups, each composed of two workers equipped with different PPE. The groups interact at approximately 2 m and 6 m from the camera, respectively.

Figure 18. Frame from a scene used to validate work areas involving multiple work groups within the same view. Configuration: two work groups, each composed of two workers equipped with different PPE. The groups interact at approximately 2 m and 6 m from the camera, respectively.

As illustrated in Figure 19, the forklift is correctly detected at multiple distances from the camera: (a) 25 m, (b) 20 m, (c) 15 m, and (d) 10 m. The vehicle is correctly identified, confirming that the system is capable of detecting the forklift also in real-world industrial deployment scenarios.

Figure 19. Frames extracted from a video used to validate forklift detection in a real industrial environment. Configuration: camera mounted at a height of 5 m. The frames illustrate different forklift distances from the camera: (a) 25 m, (b) 20 m, (c) 15 m, and (d) 10 m.

5. Conclusions and Future Work

This paper presents an integrated system for workplace safety monitoring, combining an indoor UWB-based localization subsystem with a computer vision module for PPE detection. By fusing heterogeneous data sources, the proposed approach enables the implementation of complex safety rules and supports real-time alerts to prevent hazardous situations.

A proof-of-concept prototype was developed and deployed in a laboratory environment to assess the system’s feasibility and effectiveness.

The evaluation of the individual modules demonstrates adequate performance for both PPE detection and indoor localization tasks. Specifically, the former achieved a mean Average Precision (mAP@50) of 89.5% for PPE compliance verification, operating at approximately 10 FPS on low-cost edge hardware such as Raspberry Pi 5 and Coral Edge TPU, with a total cost under $200. The real-time localization subsystem, on the other hand, reached an average positioning accuracy of 0.36 m after Kalman filtering and Exponential Moving Average smoothing, corresponding to a 23% improvement over the unfiltered baseline. The end-to-end latency was measured at approximately 100 ms on average, enabling near real-time responses to safety violations.

In addition to single-module evaluation, we carried out six realistic test scenarios, which confirmed the system’s ability to analyze data correctly, enforce safety rules, and react promptly to violations. All validation scenarios demonstrated accurate detection of safety breaches, including PPE non-compliance, unauthorized access, untracked vehicles, zone overcrowding, collision risks, and restricted-area incursions, with appropriate visual markers and event logs. Qualitative validation in a scenario resembling an industrial environment further confirmed the robustness of the AI module under challenging operational conditions, including variations in camera perspective (2.0–2.8 m height, −10 to −30 degrees tilt), diverse worker clothing that could interfere with PPE detection, complex multi-person scenes with occlusions, and real-world forklift detection in outdoor environments at distances ranging from 10 to 25 m. These tests revealed operational limits (e.g., reduced detection accuracy beyond 8 m and in case of heavily occluded subjects) while confirming reliable performance under typical deployment conditions. These results show that integrating localization and visual analysis enhances situational awareness and contributes to improved workplace safety and operational resilience.

The validation results confirm the theoretical analysis presented in Table 6 and Table 7, demonstrating that five out of seven safety rules require data from both subsystems and cannot be enforced using a single-modality approach. The fault tolerance mechanism further distinguishes this work by enabling graceful degradation: when one subsystem becomes temporarily unavailable, the system continues to enforce a subset of safety rules using the remaining operational module.

The validation phase also confirmed that the on-premise system architecture effectively mitigates privacy concerns, as no raw video data leaves the local network, and that the redundancy provided by the multi-sensor design improves robustness, allowing graceful degradation in case of temporary component failures.

While the current prototype establishes a solid foundation, several challenges must be addressed before large-scale deployment becomes feasible. This study, in particular, focuses on functional validation through scenario-based tests in a controlled laboratory environment rather than exhaustive end-to-end quantitative evaluation across diverse industrial sites. While our validation demonstrates correct system behavior for all seven safety rules, we acknowledge several limitations. First, the evaluation does not include long-term stress tests under adverse conditions such as extreme lighting variations, heavy occlusion, high worker density, or severe UWB multipath interference. Second, cross-site generalization has not been validated; the AI model was trained on a specific dataset and may require retraining or domain adaptation when deployed in facilities with different visual characteristics. Third, the current implementation does not address edge cases such as simultaneous failures of multiple system components or malicious tampering with tags.

Future research will include extended quantitative evaluations at the rule level, computing precision, recall, and F1 metrics over long-term annotated datasets. Field trials in operational industrial environments will assess robustness, scalability, and user acceptance. Advanced methods such as perceptual loss functions based on Structural Similarity Index Measure features will be explored to enhance model generalization under varying lighting and camera conditions, together with federated learning techniques for privacy-preserving updates across multiple deployment sites.

Author Contributions

Conceptualization, F.D.R., G.C.M., P.D., F.M. and C.V.; methodology, F.D.R., G.C.M., P.D., F.M. and C.V.; software, F.D.R., G.C.M. and N.S.; experimental validation in lab, F.D.R., G.C.M. and N.S.; writing—original draft preparation, F.D.R., G.C.M., P.D., F.M. and C.V.; writing—review and editing, F.D.R., G.C.M., P.D., C.V. and F.M.; supervision, P.D., C.V. and F.M.; funding acquisition, P.D., C.V. and F.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the project “Intelligent Visual Analysis in Work Safety—IVAWS” (CUP: I59J23000580008, COR: 16011657, ID 8830), funded under the INAIL—ARTES 4.0 Innovation Call No. 1-2022. The modules described in this paper will be integrated into a Work Safety platform designed and developed by Visual Engines SRL. This work has also been partially funded by the PNRR—M4C2—Investimento 1.3, Partenariato Esteso PE00000013—“FAIR—Future Artificial Intelligence Research”—Spoke 1 “Human-centered AI” under the NextGeneration EU program and by the Italian Ministry of University and Research (MUR) in the framework of the PRIN 2022 project “CAVIA: enabling the Cloud-to-Autonomous-Vehicles continuum for future Industrial Applications” grant 2022JAFATE. All the activities have also been carried out in the framework of the FoReLab and Cross-Lab projects (Departments of Excellence).

Institutional Review Board Statement

The validation tests were performed in a laboratory environment. The only individuals involved were the authors of the paper and few members of the lab, who voluntarily participated in the simulations. No external participants were recruited, and no health-related or sensitive personal data were collected. Therefore, according to the University of Pisa Bioethics Committee regulations, the study does not require ethical approval.

Informed Consent Statement

All authors and the colleagues involved in the experimental validation were fully aware of the procedures and gave their explicit consent to participate in the simulations.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Bhambri, P.; Rani, S. Challenges, opportunities, and the future of industrial engineering with IoT and AI. In Integration of AI-Based Manufacturing and Industrial Engineering Systems with the Internet of Things; CRC Press: Boca Raton, FL, USA, 2024; pp. 1–18. [Google Scholar] [CrossRef]
Gallo, G.; Di Rienzo, F.; Garzelli, F.; Ducange, P.; Vallati, C. A smart system for personal protective equipment detection in industrial environments based on deep learning at the edge. IEEE Access 2022, 10, 110862–110878. [Google Scholar] [CrossRef]
Svertoka, E.; Saafi, S.; Rusu-Casandra, A.; Burget, R.; Marghescu, I.; Hosek, J.; Ometov, A. Wearables for industrial work safety: A survey. Sensors 2021, 21, 3844. [Google Scholar] [CrossRef] [PubMed]
Syazreen Ahmad, N. Recent Advances in WSN-Based Indoor Localization: A Systematic Review of Emerging Technologies, Methods, Challenges, and Trends. IEEE Access 2024, 12, 180674–180714. [Google Scholar] [CrossRef]
Che, F.; Ahmed, Q.Z.; Lazaridis, P.I.; Sureephong, P.; Alade, T. Indoor Positioning System (IPS) Using Ultra-Wide Bandwidth (UWB)—For Industrial Internet of Things (IIoT). Sensors 2023, 23, 5710. [Google Scholar] [CrossRef] [PubMed]
Yang, F.; Liu, D.; Gong, X.; Chen, R.; Hyyppä, J. 3D indoor area recognition for personnel security using integrated UWB and barometer approach. Sci. Rep. 2024, 14, 16353. [Google Scholar] [CrossRef] [PubMed]
Ochoa-de-Eribe-Landaberea, A.; Zamora-Cadenas, L.; Velez, I. Untethered Ultra-Wideband-Based Real-Time Locating System for Road-Worker Safety. Sensors 2024, 24, 2391. [Google Scholar] [CrossRef] [PubMed]
Sidiropoulos, A.; Bechtsis, D.; Vlachos, D. Implementing an Industry 4.0 UWB-Based Real-Time Locating System for Optimized Tracking. Appl. Sci. 2025, 15, 2689. [Google Scholar] [CrossRef]
Motroni, A.; Buffi, A.; Nepa, P. Forklift Tracking: Industry 4.0 Implementation in Large-Scale Warehouses through UWB Sensor Fusion. Appl. Sci. 2021, 11, 10607. [Google Scholar] [CrossRef]
Miglionico, G.C.; Ducange, P.; Marcelloni, F.; Vallati, C.; Di Rienzo, F. Performance Evaluation of YOLOv5 on Edge Devices for Personal Protective Equipment Detection. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–8. [Google Scholar] [CrossRef]
Vukicevic, A.M.; Petrovic, M.; Milosevic, P.; Peulic, A.; Jovanovic, K.; Novakovic, A. A systematic review of computer vision-based personal protective equipment compliance in industry practice: Advancements, challenges and future directions. Artif. Intell. Rev. 2024, 57, 358. [Google Scholar] [CrossRef]
Barlybayev, A.; Amangeldy, N.; Kurmetbek, B.; Krak, I.; Razakhova, B.; Tursynova, N.; Turebayeva, R. Personal protective equipment detection using YOLOv8 architecture on object detection benchmark datasets: A comparative study. Cogent Eng. 2024, 11, 2333209. [Google Scholar] [CrossRef]
Lema, D.G.; Usamentiaga, R.; Garcia, D.F. Low-cost system for real-time verification of personal protective equipment in industrial facilities using edge computing devices. J. Real-Time Image Process. 2023, 21, 14. [Google Scholar] [CrossRef]
Lee, Y.-R.; Jung, S.-H.; Kang, K.-S.; Ryu, H.-C.; Ryu, H.-G. Deep learning-based framework for monitoring wearing personal protective equipment on construction sites. J. Comput. Des. Eng. 2023, 10, 905–917. [Google Scholar] [CrossRef]
Kamil, M.Z.; Khan, F.; Amyotte, P.; Ahmed, S. Multi-source heterogeneous data integration for incident likelihood analysis. Comput. Chem. Eng. 2024, 185, 108677. [Google Scholar] [CrossRef]
Yu, S.; Wang, F.; Gu, H.; Leng, H.; Fan, Y. A Method for Multisource Heterogeneous Data Fusion and Modeling in New Power Systems. In Proceedings of the 2023 2nd Asian Conference on Frontiers of Power and Energy (ACFPE), Chengdu, China, 20–22 October 2023; pp. 124–128. [Google Scholar] [CrossRef]
Tsanousa, A.; Bektsis, E.; Kyriakopoulos, C.; Gonzalez, A.G.; Letyriondo, U.; Gialampoukidis, I.; Karakostas, A.; Vrochidis, S.; Kompatsiaris, I. A Review of Multisensor Data Fusion Solutions in Smart Manufacturing: Systems and Trends. Sensors 2022, 22, 1734. [Google Scholar] [CrossRef] [PubMed]
Qu, X.; Liu, Z.; Wu, C.Q.; Hou, A.; Yin, X.; Chen, Z. MFGAN: Multimodal Fusion for Industrial Anomaly Detection Using Attention-Based Autoencoder and Generative Adversarial Network. Sensors 2024, 24, 637. [Google Scholar] [CrossRef] [PubMed]
Liu, Z.; Hakala, T.; Hyyppä, J.; Kukko, A.; Chen, R. Performance Comparison of UWB IEEE 802.15.4z and IEEE 802.15.4 in Ranging, Energy Efficiency, and Positioning. IEEE Sens. J. 2024, 24, 12481–12489. [Google Scholar] [CrossRef]

Figure 2. Trajectories followed in dynamic experiments. The blue triangle represents the position of anchors.

Figure 4. Scenario 1—Compliant case. (Left panel): The camera feed shows a worker wearing a helmet and a high-visibility vest. (Right panel): The map view displays the worker (blue circle) positioned within the critical zone (red rectangle), UWB anchor positions (blue triangles), the camera (green camera), and an indication alert position (red arrow). The AI module confirms PPE compliance, and the RTLS verifies zone membership. No alarm is triggered, as all safety requirements are satisfied.

Figure 5. Scenario 1—Violation case. (Left panel): The camera feed shows a worker without a safety vest (only the helmet is detected). (Right panel): The map view displays the worker (blue circle) positioned within the critical zone (red rectangle), UWB anchor positions (blue triangles), the camera (green camera), and an indication alert position (red arrow). The map view displays the worker (red cross) inside the critical zone. The AI module identifies missing PPE, thereby activating Rule 1. The Aggregation Module issues a visual alert and records the violation event with the corresponding timestamp and rule ID.

Table 1. Average localization error in static tests under different anchor height and area size configurations.

Monitored Area	Anchor Height
Monitored Area	2.5 m	2.0 m
63 $m^{2}$	0.74 m	0.43 m
28 $m^{2}$	0.61 m	0.36 m

Table 2. Average localization error in dynamic tests with different filtering techniques: No Filter (NF), Kalman Filter (KF), and Kalman Filter with Exponential Moving Average (KEF).

Axis	NF	KF	KEF
X	0.50 m	0.40 m	0.38 m
Y	0.47 m	0.41 m	0.36 m

Table 3.

A P_{@ 50}

accuracy per class for YOLOv8-Nano (cross-validation).

Table 3.

A P_{@ 50}

accuracy per class for YOLOv8-Nano (cross-validation).

Class	${AP}_{@ 50}$
Person	92.1 ± 2.8
Head	89.7 ± 2.5
Helmet	94.0 ± 1.4
Chest	91.6 ± 2.6
Vest	82.4 ± 8.5
Forklift	86.1 ± 4.1

Table 4. YOLOv8-Nano runtime performance on the Edge TPU.

Inference Time (ms)	FPS	${mAP}_{@ 50}$
95.74 ± 3.36	10.20 ± 0.35	89.48

Table 5. System operational parameters and validated ranges.

Parameter	Value/Range	Notes
Association radius (r)	∼0.7 m	Set as $k σ_{RTLS}$ with $k \approx 2$
Camera height	2.0–5.0 m	Validated range
Camera tilt angle	$- 10 °$ to $- 30 °$	Validated range
Effective detection distance	Up to 8 m	Beyond this, accuracy degrades
Proximity threshold (collision)	1.0 m	Configurable via settings
Data processing threads	20	Thread pool size
Fault detection timeout	Configurable	Default: 10 s monitoring interval

Table 6. System capabilities comparison: feature availability across AI-only, RTLS-only, and the proposed approach. Checkmarks (✓) indicate full capability, crosses (×) indicate missing capability, and PARTIAL denotes unreliable or approximate capability.

Feature	AI-Only	RTLS-Only	Proposed (Both)
PPE Detection	✓	×	✓
Authorized Person Identification	×	✓	✓
Person Tracking	PARTIAL	✓	✓
Vehicle Tracking	PARTIAL	✓	✓
Person Count	✓	PARTIAL	✓

Table 7. Data requirements for safety rule enforcement. Each row shows which features are needed to implement the corresponding safety rule, demonstrating why data fusion is essential.

Rule	Required Features	Rationale
R1	PPE detection + Person tracking	PPE detection requires AI, while precise zone membership requires RTLS; both are necessary to enforce compliance in critical areas.
R2	Person identification + Person tracking	Detecting untagged individuals requires correlating visual detections with RTLS tag data to identify workers without registered tags.
R3	Vehicle tracking (both sources)	Cross-validation of detected vehicles with registered RTLS tags ensures detection of untracked forklifts and prevents false positives.
R4	Person tracking + Vehicle tracking	Verifying that forklifts are operated by authorized drivers requires linking visual entities to authenticated RTLS tags and computing proximity.
R5	Person count + Person tracking	Combining visual headcount with RTLS zone information allows enforcement of maximum capacity limits in critical areas.
R6	Person tracking + Vehicle tracking	Reliable proximity monitoring requires accurate 3D distances from RTLS and AI-based entity classification to identify workers and vehicles.
R7	Person identification + Person tracking	Access control rely on matching detected individuals with RTLS tag IDs and associated authorization levels for restricted zones.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Di Rienzo, F.; Miglionico, G.C.; Ducange, P.; Marcelloni, F.; Salti, N.; Vallati, C. Improving Accuracy in Industrial Safety Monitoring: Combine UWB Localization and AI-Based Image Analysis. J. Sens. Actuator Netw. 2025, 14, 118. https://doi.org/10.3390/jsan14060118

AMA Style

Di Rienzo F, Miglionico GC, Ducange P, Marcelloni F, Salti N, Vallati C. Improving Accuracy in Industrial Safety Monitoring: Combine UWB Localization and AI-Based Image Analysis. Journal of Sensor and Actuator Networks. 2025; 14(6):118. https://doi.org/10.3390/jsan14060118

Chicago/Turabian Style

Di Rienzo, Francesco, Giustino Claudio Miglionico, Pietro Ducange, Francesco Marcelloni, Nicolò Salti, and Carlo Vallati. 2025. "Improving Accuracy in Industrial Safety Monitoring: Combine UWB Localization and AI-Based Image Analysis" Journal of Sensor and Actuator Networks 14, no. 6: 118. https://doi.org/10.3390/jsan14060118

APA Style

Di Rienzo, F., Miglionico, G. C., Ducange, P., Marcelloni, F., Salti, N., & Vallati, C. (2025). Improving Accuracy in Industrial Safety Monitoring: Combine UWB Localization and AI-Based Image Analysis. Journal of Sensor and Actuator Networks, 14(6), 118. https://doi.org/10.3390/jsan14060118

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Improving Accuracy in Industrial Safety Monitoring: Combine UWB Localization and AI-Based Image Analysis

Abstract

1. Introduction

2. State of the Art

3. Multi-Source Industrial Safety Monitoring

3.1. Localization Module

3.2. AI Module

3.3. Aggregation Module

4. Validation Tests

4.1. Controlled Laboratory Scenarios

4.1.1. Scenario 1: Worker in a Critical Area with/Without PPE

4.1.2. Scenario 2: Unauthorized Person in the Working Area

4.1.3. Scenario 3: Unauthorized Forklift Driver

4.1.4. Scenario 4: Overcrowded Critical Area

4.1.5. Scenario 5: Potential Collision with Forklift

4.1.6. Scenario 6: Unauthorized Access to Critical Area

4.1.7. Wrap-Up Considerations

4.2. Validation of the AI Module in Realistic Industrial Scenarios

4.2.1. Camera Perspective Validation

4.2.2. Validation with Different Worker Clothing Types

4.2.3. Validation with Multiple Work Groups in the Same Scene

4.2.4. Forklift Detection in Real Industrial Environment

5. Conclusions and Future Work

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI