Article

PPE-EYE: A Deep Learning Approach to Personal Protective Equipment Compliance Detection

by
Atta Rahman
1,*,
Mohammed Salih Ahmed
2,
Khaled Naif AlBugami
2,
Abdullah Yousef Alabbad
2,
Abdullah Abdulaziz AlFantoukh
2,
Yousef Hassan Alshaikhahmed
2,
Ziyad Saleh Alzahrani
2,
Mohammad Aftab Alam Khan
2,
Mustafa Youldash
2 and
Saeed Matar Alshahrani
3
1
Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
2
Department of Computer Engineering, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
3
College of Computing and Informatics, Saudi Electronic University, P.O. Box 93499, Riyadh 11673, Saudi Arabia
*
Author to whom correspondence should be addressed.
Computers 2026, 15(1), 45; https://doi.org/10.3390/computers15010045
Submission received: 8 December 2025 / Revised: 4 January 2026 / Accepted: 9 January 2026 / Published: 11 January 2026
(This article belongs to the Section AI-Driven Innovations)

Abstract

Safety on construction sites is an essential yet challenging issue due to the inherently hazardous nature of these sites. Workers are expected to wear Personal Protective Equipment (PPE), such as helmets, vests, and safety glasses, to prevent or minimize their exposure to injuries. However, ensuring compliance remains difficult, particularly on large or complex sites, which require a time-consuming and usually error-prone manual inspection process. This research proposes an automated PPE detection system utilizing the deep learning model YOLO11, trained on the CHVG dataset, to identify in real time whether workers are adequately equipped with the necessary gear. The proposed PPE-EYE method, using YOLO11x, achieved a mAP50 of 96.9% and an inference time of 7.3 ms, which is sufficient for real-time PPE detection systems, in contrast to previous approaches on the same dataset, which required 170 ms. The model achieved these results by employing data augmentation and fine-tuning. The proposed solution provides continuous monitoring with reduced human oversight and issues timely alerts when non-compliance is detected, allowing the site manager to act promptly. It further enhances the effectiveness and reliability of safety inspections, improves overall site safety, reduces accidents, and ensures consistent follow-through of safety procedures, creating a safer and more productive working environment for everyone involved in construction activities.

1. Introduction

Construction sites are considered hazardous environments with several risks that may arise, including physical injuries, falls, and exposure to hazardous materials. However, compliance with Personal Protective Equipment (PPE) regulations is often inconsistent, because some workers neglect to wear PPE, either unintentionally or intentionally, or because site supervisors fail to enforce the rules. Traditional methods of monitoring compliance, such as time-consuming manual inspections, can lead to mistakes and are difficult to apply effectively to larger or more complex construction sites [1].
Two major concerns in the construction industry are ensuring the safety of construction site workers and maintaining compliance with all safety protocols at the workplace. Due to the inherently dangerous nature of construction sites, workers are at risk of injuries from falls, being struck by heavy objects, and exposure to hazardous materials. Workers must wear PPE such as helmets, safety vests, and protective glasses to reduce these risks. PPE is a fundamental line of defense that protects workers from severe injuries [2].
However, even when PPE is provided, ensuring compliance remains a significant challenge because workers may not consistently wear or use the equipment correctly. Some workers may forget to wear safety gear, while others might wear it incorrectly. This lack of compliance increases the probability of accidents and reduces the effectiveness of safety measures. Traditionally, site supervisors and safety officers manually check PPE compliance, but this approach is time-consuming, prone to human error, and difficult to scale for large or complex construction sites [3].
With recent developments in deep learning and computer vision, there is an opportunity to automate real-time detection of PPE compliance. Using computer vision, the proposed PPE detector for construction sites leverages the deep learning model YOLO11 to accurately detect safety gear, including helmets, safety vests, and protective glasses. The system monitors workers in real time to ensure compliance with safety regulations and protocols. By providing continuous monitoring and instant feedback, the system enables construction managers and safety officers to monitor compliance more effectively, thereby reducing workplace accidents and enhancing overall site safety. There are currently two main types of PPE detection algorithms in wide use: transformer-based and Convolutional Neural Network (CNN)-based.
The prime focus of this research is the heavy construction industry in the Kingdom of Saudi Arabia, which spans residential buildings, large skyscrapers, and the ambitious NEOM urban development initiative, including THE LINE, a future smart city [4]. Since most construction sites are in remote areas, PPE compliance is often ignored, and only a limited number of studies have addressed automated inspection of construction sites, ranging from medium to very large scales, under the kingdom's requirements and environmental factors. The current study therefore investigates an augmented dataset with YOLO11 to detect PPE compliance in the Saudi Arabian construction industry.
The major contributions of this work are as follows: first, developing an automated system that uses the latest computer vision and deep learning approaches to monitor PPE compliance in real time, considering a state-of-the-art dataset. Second, the system will detect whether critical PPE is being used correctly, ensuring that all workers follow safety protocols. Third, by implementing this solution, the research goal is to reduce the risks of injuries and fatalities on construction sites and promote a safer working environment.

2. Review of Literature

This review explores various studies on AI-powered worker safety monitoring for Personal Protective Equipment (PPE). Each subsection discusses the methods employed, the datasets investigated, the evaluation metrics, and the research findings. The review is organized into four parts: the major PPE datasets used for detection, transformer-based approaches, CNN-based approaches, and the PPE compliance detection systems reported in the literature.

2.1. PPE Detection Datasets

A study was conducted by Cheng et al. [1] about a monitoring system for classifying PPE using computer vision. The research focused on using multiple cameras to cover wider areas. This study employed two technologies to monitor worker safety: worker re-identification (ReID), which tracks individual workers across camera footage, and PPE classification, which uses object recognition to detect and categorize whether workers are wearing the necessary safety gear. A dataset of four different viewpoints of house construction in Hong Kong was used. Forty workers were identified from 6245 images; twenty-four were used for training re-identification, and sixteen for testing. PPE classification places workers into four categories: workers with a helmet and vest (WHV), workers with a vest only (WV), workers with a helmet only (WH), and workers without a helmet and vest (W). The research concluded that the combined approach boosted accuracy, with worker identification improving by 4% and PPE detection improving by 13%.
Han and Zeng [2] described the detection of safety helmets on construction sites using deep learning techniques. The research employed YOLOv5 as the primary algorithm, incorporating a four-scale detection method. The dataset used contains construction site images collected from the Internet. The main evaluation metrics used are the precision-recall curve, mean average precision (mAP), and mean detection time. The study also focused on detecting whether the workers were wearing helmets. The findings of this study concluded with a 92.2% mAP, which is better than using YOLOv5 alone, which achieved 85.9%. Moreover, it takes 3 ms to detect a frame in a video with 640 × 640 resolution.
Research by Hayat and Morgado-Dias [3] proposes detecting safety helmets on construction sites using deep learning. Moreover, the research used YOLOv3, YOLOv4, and YOLOv5 algorithms for real-time detection. The dataset used is the Hat Worker Image Dataset published by MakeML, which contains 5000 images. The research results achieved a 92.44% mAP using YOLOv5, which is excellent for detecting objects in both high- and low-light conditions.

2.2. Transformer-Based PPE Detection Algorithms

A vision-based framework for real-time detection of PPE on construction sites was proposed by Lee et al. [5]. The framework uses the YOLACT instance-segmentation algorithm, a lightweight model suitable for real-time applications that classifies individual pixels. Additionally, the framework utilizes DeepSORT for object tracking to monitor workers on construction sites. The dataset is a combination of multiple public datasets, including AIM, MOCS, and ACID, some of which contain CCTV and smartphone images, while others are sourced from open resources. The results were good, as the framework achieved 91.3% accuracy and 66.4% mAP.
A real-time PPE detection algorithm using deep learning by Lo et al. focused on detecting workers' vests and helmets from videos and images [6]. The study used YOLOv3, YOLOv4, and YOLOv7. The dataset is a custom dataset created by the study's developers, containing images sourced from the web and from cameras. The augmented dataset comprises 11,000 images and 88,725 labels of PPE from various construction areas. The study categorized workers into four classes: no hat, hat, no high-visibility vest, and high-visibility vest. The study achieved a mAP of 97%.
Ferdous et al. [7] proposed a YOLO-based PPE detector model for construction sites. The study uses the CHVG dataset, which comprises eight classes: four colors of hard hats, vests, safety glasses, a person's body, and a person's head. In this paper, the YOLOX architecture, an anchor-free variant of the YOLO family, is employed. The study demonstrated that the YOLOX-m model achieved the highest mAP among the YOLOX versions, at 89.84%. Additionally, the paper addresses some potentially challenging conditions that the system might encounter, such as rain, haze, and low-light images, by artificially adding these effects to some images to test the model's robustness.

2.3. CNN-Based PPE Detection

Isailovic et al. [8] used deep learning-based object detection to ensure compliance with industrial PPE. The proposed pipeline integrates head region-of-interest estimation with a PPE detector, using a dataset containing 12 different PPE types, integrated with public datasets. Three deep learning architectures, Faster R-CNN, MobileNetV2-SSD, and YOLOv5, were used. The results show that YOLOv5 achieves superior performance, with a slight advantage over the alternatives, achieving a precision of 0.920 ± 0.147 and a recall of 0.611 ± 0.287. Nath et al. [9] proposed a deep learning-based real-time site safety system for PPE detection. The study demonstrates three deep learning approaches from the YOLO family. The first involves detecting separate components of workers' PPE, such as hats and vests, and combining them in the model to recognize and verify the correct PPE used on construction sites. The second uses a single Convolutional Neural Network (CNN) to detect and verify each worker's PPE compliance. The third involves detecting workers and then classifying and verifying them based on their PPE attire using CNN-based classifiers. The second approach achieves the highest performance, with an mAP of 72.3%.
Gallo et al. [10] presented a detailed study of innovative systems for PPE detection in industrial environments. The study presents a system for detecting PPE based on deep learning at the edge; its purpose is to enhance workers' safety in industrial environments by monitoring PPE compliance through analysis of surveillance-camera video and triggering an alarm if the system detects a non-compliant worker. A system prototype was developed using a Raspberry Pi and tested with five pre-trained Convolutional Neural Networks (CNNs) for PPE detection. The evaluation compares the classification and inference latency of the CNNs with YOLO, showing promising results.
In a study on workers’ safety regulation compliance using a spatiotemporal graph convolutional network, Lee et al. [11] focus on detecting compliance with safety regulations by analyzing workers’ sequential movements from video footage. The training dataset used in this paper is not described in detail. The study used OpenPose to extract human skeletal keypoints from video frames, and a spatial–temporal graph convolutional network (ST-GCN) was employed to predict whether a worker was wearing PPE. An average F1-score of 0.827 was achieved.
A study on generic industrial PPE compliance using deep learning was proposed by Vukicevic et al. in [12]. The dataset used in this paper is collected from public PPE datasets (400 from the Pictor PPE dataset and 5200 from the Roboflow dataset) and web-mined images, totaling 15,728 cropped images. Deep learning models used in this paper are Inception_v3, DenseNet, SqueezeNet, VGG19, ResNet, and MobileNetV2. The overall accuracies of these models are 0.92, 0.95, 0.87, 0.93, 0.95, and 0.95, respectively. In conclusion, DenseNet, MobileNetV2, and ResNet were the best performers for predicting PPE compliance; however, the study favored MobileNetV2 due to its lower computational requirements.
Another study by Delhi et al. [13] examined computer vision-based deep learning techniques for detecting PPE compliance on construction sites. CNN and YOLOv3 models were trained on a dataset of 2509 web-based construction site videos. Results show that the model achieved 96% mAP, 0.96 recall, and 0.96 F1-score.
The study by Azizi et al. [14] compared two machine learning-based approaches for predicting PPE on construction sites, focusing on robustness and timely detection. The test set used four videos from realistic construction sites. The algorithms tested are Faster R-CNN employing ResNet-50 and Few-Shot Object Detection (FsDet). Results showed that Faster R-CNN achieved a mean accuracy of 82%, while FsDet achieved 58%. In conclusion, Faster R-CNN was significantly better than FsDet at predicting PPE compliance in various environments.
The study by Li et al. [15] focuses on inspecting worker PPE using deep learning. An OpenPose algorithm was used to identify 1200 online video clips containing safe and unsafe behaviour. From this process, 1604 images were collected, and an additional 1604 images were obtained from the horizontal mirror of the videos, totalling 3208 images in the training set. Additionally, the YOLOv5 model was trained to detect objects in images. A one-dimensional CNN was trained on 600 videos and tested on 600. The accuracy achieved with this model is 0.9467 in experimental scenarios.
Li et al. [16] reviewed the behaviors of construction employees under monitoring. The surveyed datasets encompass various lighting conditions, whose challenges CNNs effectively address; Faster R-CNN and YOLO methodologies have yielded satisfactory outcomes in scenarios with diverse lighting conditions. The distribution of publication years indicates rapid development within the field, with continual updates to technical methodologies. Ahmed et al. [17] employed deep learning for PPE detection. The study compared two deep learning-based algorithms primarily used for computer vision tasks, namely YOLOv5 and Faster R-CNN with a ResNet50 backbone. The study examined CHVG, a publicly available dataset. Data pre-processing techniques, including image flipping, filtering, and augmentation, were applied. Recall, precision, and mAP were used to evaluate the models. The study concluded that YOLOv5 achieved an mAP50 of 0.68, while Faster R-CNN achieved an mAP50 of 0.96, making it the best-performing model. Li et al. [18] investigated the application of deep learning for detecting safety helmets. The automated monitoring approach offers a means to oversee construction workers and ensure compliance with safety helmet regulations on construction sites. A public dataset comprising 3261 images of safety helmets was investigated, and the CNN-based SSD-MobileNet algorithm was used for model training.
Huang et al. [19] proposed an enhanced YOLOv3 model for helmet detection. The experimental dataset comprises surveillance videos captured at various times and angles within real construction sites. By selecting every fourth image from a sequence of screenshots, a diverse set of 13,000 images depicting significant variations is curated for helmet detection data. Data samples are evenly distributed to enable the model to effectively learn features across different images, accounting for factors such as weather conditions and viewing distance. Experimental results reveal that Faster R-CNN achieves the highest mAP at 94.3%, followed closely by the enhanced detection algorithm at 93.1%, which offers better speed.

2.4. PPE Detection Systems

Rahman et al. [20] presented a YOLO11-based approach for generic PPE compliance detection in an industrial setting. They investigated the approach on an augmented dataset obtained from various open sources as well as collected manually. The study reported promising results, with a mAP50 of 95.5% for real-time applications involving a wide range of PPE. The study presented by Marquez et al. [21] proposes a novel enhancement of workplace safety for field workers by integrating edge computing with smart protective gear: a wearable system consisting of belts, bracelets, and helmets with sensors powered by artificial intelligence. The proposed system protects workers' safety and integrity by detecting anomalies in their environment and notifying them early.
Cabrejos and Roman-Gonzalez examined how well vision-based artificial intelligence systems monitor and assure PPE compliance in various fields [22]. The study's primary goal is to investigate the use of artificial intelligence (AI) algorithms that can adapt to various environmental factors, thereby enhancing the precision and reliability of PPE detection.
A study by Balakreshnan et al. [23] proposes an AI-powered automated system for PPE compliance and monitoring. This system can be used in a wide range of work scenarios. Using a combination of cloud-based and on-premises analytics, the system provides real-time monitoring and compliance checks. The system is notable for its ability to generate automated reports and warnings, helping organizations reduce risk and comply with strict safety regulations, and it can initiate various control actions in response to safety violations. The study by Pooja and Preeti [24] suggests that AI may help ensure mask-wearing during the COVID-19 pandemic, thereby protecting public health and well-being. The authors suggest automating mask identification using a computer vision system, a crucial step for public safety, which can identify unmasked individuals and issue warnings. A study by Muanme et al. [25] describes an AI system that uses YOLO techniques to improve PPE compliance in industrial settings. This method ensures that employees have the appropriate equipment before gaining access to hazardous areas, thereby ensuring the safety of both workers and visitors.
Table 1 provides a summary of the reviewed literature.
Following a detailed literature review, several observations have been made and a research gap identified.
Firstly, no noteworthy studies have been conducted that focus on PPE detection in the Saudi Arabian environment. At the same time, Saudi Arabia is among the countries with a robust construction industry encompassing a wide range of construction types, from residential buildings to skyscrapers and smart cities, such as THE LINE, a project by NEOM [4]. Moreover, most construction sites are in remote areas, and PPE compliance is often not enforced due to the lack of smart detection systems, leading to unsafe conditions. The situation demands smart systems for automated detection of PPE compliance.
Secondly, most studies have used the YOLO architecture, but none have applied the most recent versions, such as YOLOv10 or YOLO11, from a Saudi Arabian construction industry perspective. This gap, together with the YOLO family's promising performance in this area, motivates the use of YOLO11.
Thirdly, recent studies are focusing on generic PPE compliance detection, which may not be very effective in a construction site’s specific environment [20]. This situation suggests a need to develop more algorithms that can achieve higher accuracy than the existing ones.
Finally, the CHVG dataset used in studies [7,17] is imbalanced and requires further improvement to achieve higher accuracy. By addressing these gaps, this research aims to apply previously unexplored computer vision techniques and balance the dataset to improve accuracy.

3. Proposed Methodology

3.1. Proposed PPE-EYE Model

In this section, the methodology of this study is presented step by step. In brief, the proposed PPE-EYE model is based on the YOLO11 architecture. The model is trained by first loading the images and labels and then providing the hyperparameters to initiate training. After each training step, the mean average precision (mAP) is calculated to monitor the model's performance. After training, the model's overall performance is also evaluated using mAP, the most widely used metric for PPE detection models per the comprehensive literature review in the previous section; other evaluation metrics include accuracy, recall, and F1-score. Guided by the monitored performance, hyperparameter tuning and dataset balancing are performed, as the CHVG data is highly imbalanced, as described subsequently.
Figure 1 demonstrates the methodology of the proposed system, which includes data preprocessing, model training and testing, hyperparameter tuning, and evaluation as one part. In the second part, the proposed model is incorporated into a real-time monitoring environment.

3.1.1. YOLO11

In this study, we opted for YOLO11 because of its more refined feature aggregation. Moreover, it does not rely on fixed anchors, which makes it particularly effective for small objects; this is especially helpful for detecting the individual components of PPE compliance. YOLO11 also achieves significantly better outcomes than prior versions, such as YOLOv8, under poor lighting or dusty conditions, which are typical in Saudi Arabia, where dust and sandstorms are frequently encountered and can limit model performance [26,27].
Figure 2 demonstrates the schematic of the proposed YOLO11 model, which is investigated in the proposed study. The first, second, and third boxes/rectangles in the diagram present, respectively, the backbone, neck, and head of the YOLO11 architecture under consideration.
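For concreteness, the following is a minimal training sketch using the Ultralytics Python API with the hyperparameters of the best run in Table 4. The dataset configuration file name ("chvg.yaml") and the use of the public "yolo11x.pt" pretrained checkpoint are assumptions for illustration, not details taken from the paper.

```python
# Minimal YOLO11 training sketch with the Ultralytics API. "chvg.yaml"
# (dataset paths plus the eight CHVG class names) and the pretrained
# checkpoint name are assumed; hyperparameters mirror the best run in Table 4.
from ultralytics import YOLO

model = YOLO("yolo11x.pt")  # start from pretrained YOLO11x weights

model.train(
    data="chvg.yaml",
    epochs=21,
    batch=16,
    imgsz=640,          # dataset images are 640 x 640
    optimizer="NAdam",
    lr0=1e-5,
    momentum=0.937,
    cos_lr=True,        # cosine learning-rate schedule
)

# mAP is tracked during training; a final held-out evaluation looks like:
metrics = model.val()
print(f"mAP50 = {metrics.box.map50:.3f}, mAP50-95 = {metrics.box.map:.3f}")
```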

3.1.2. Dataset

The dataset used to train the proposed models is a publicly available dataset called CHVG (four-colored hard hats, vest, safety glass) [7]. The images in the dataset are from various construction sites, captured at different angles. The dataset comprises around 1700 instances/images and 11,603 objects within these images.
Table 2 contains the basic dataset description.
It can be observed that the data is highly imbalanced; for instance, the person class contains around ten times more instances than the glass class. To balance the dataset, the Synthetic Minority Over-Sampling Technique (SMOTE) has been utilized [28]. Figure 3a,b shows the instances before and after balancing the augmented dataset. The resultant augmented and balanced dataset contains more than 28,000 instances, with each class averaging approximately 3650 instances. The following augmentations have been employed (a configuration sketch follows the list).
  • Flip: Horizontal and vertical;
  • 90° Rotate: Clockwise, counter-clockwise;
  • Crop: 0% minimum zoom, 20% maximum zoom;
  • Shear: ±10° horizontal, ±10° vertical;
  • Grayscale: Applied to ~15% of images;
  • Brightness: Adjusted within ±15%;
  • Exposure: Adjusted within ±10%;
  • Noise: Up to 0.14% of pixels.
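The augmentation tooling is not specified beyond the parameters above; the following is a minimal sketch of how the listed operations could be expressed with the Albumentations library, which supports bounding-box-aware transforms. The library choice and all probabilities other than the stated ~15% grayscale rate are assumptions.

```python
# Sketch of the listed augmentations using Albumentations (library choice is
# an assumption; the paper does not name its augmentation tooling).
import albumentations as A

augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),                          # Flip: horizontal
        A.VerticalFlip(p=0.5),                            # Flip: vertical
        A.RandomRotate90(p=0.5),                          # 90° rotate: CW/CCW
        A.Affine(scale=(1.0, 1.2),                        # Crop/zoom: 0-20%
                 shear={"x": (-10, 10), "y": (-10, 10)},  # Shear: ±10° h/v
                 p=0.5),
        A.ToGray(p=0.15),                                 # Grayscale: ~15% of images
        A.RandomBrightnessContrast(brightness_limit=0.15, # Brightness: ±15%
                                   contrast_limit=0.0, p=0.5),
        A.RandomGamma(gamma_limit=(90, 110), p=0.5),      # Exposure: ±10% (approximated via gamma)
        A.PixelDropout(dropout_prob=0.0014, p=0.5),       # Noise: up to 0.14% of pixels
    ],
    # Keep YOLO-format boxes consistent with the transformed image.
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage: augmented = augment(image=img, bboxes=boxes, class_labels=labels)
```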
Finally, to prevent overfitting and improve generalizability, we integrated datasets from Roboflow [26]. This introduced a wider range of PPE usage scenarios and environmental conditions, ensuring the final model could reliably detect PPE compliance.
Subsequently, the dataset is divided into training and testing sets with an 80:20 ratio (a commonly used split in deep learning models, per the literature review). All images in the dataset have a resolution of 640 × 640. For validation, a locally collected dataset was manually compiled from CCTV footage and investigated, amounting to approximately 5% of the original dataset size.
Furthermore, the labels for each image are stored in a separate folder called 'labels'. There are eight classes in the dataset: person, vest, glass, head, red, yellow, blue, and white, where the colors refer to the color of the detected helmet. The annotations (labels) are stored in XML format for each image, while the images are in JPG format.
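Since the annotations ship as per-image XML files while YOLO training consumes plain-text label files (one normalized `class x_center y_center width height` row per object), a conversion step is implied. The sketch below assumes Pascal VOC-style XML tags and folder names; the actual tag names in the CHVG release may differ.

```python
# Sketch: convert Pascal VOC-style XML annotations to YOLO txt labels.
# Tag names and folder layout are assumptions about the CHVG release.
import xml.etree.ElementTree as ET
from pathlib import Path

CLASSES = ["person", "vest", "glass", "head", "red", "yellow", "blue", "white"]

def voc_to_yolo(xml_path: Path, out_dir: Path) -> None:
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.find("name").text.lower())
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # Normalize center coordinates and box size to [0, 1].
        xc, yc = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    (out_dir / f"{xml_path.stem}.txt").write_text("\n".join(lines))

out = Path("labels")
out.mkdir(exist_ok=True)
for xml_file in Path("labels_xml").glob("*.xml"):
    voc_to_yolo(xml_file, out)
```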

3.1.3. Evaluation Metrics

The following evaluation metrics have been used to assess the proposed PPE compliance detection model. The metrics are shortlisted based on their suitability and relevance observed in the literature review.
Mean Average Precision (mAP): To evaluate the model's performance, mean average precision (mAP) is used as the main metric [30]. It is the mean of the average precision (AP) across the N classes:

$$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
Specifically, the model's performance is reported primarily as the mean average precision at an IoU threshold of 0.5 (mAP50), because this is the metric most research papers focus on.
Recall: Also known as sensitivity, it is the ratio of true positives to the sum of true positives and false negatives. It is effective in evaluating many vision-based systems [30].

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Accuracy: The ratio of correct predictions (both positive and negative) to all predictions made by the model. It indicates how accurately the model performs overall [30].

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
F1-Score: The F1-score combines the model’s precision and recall and is defined as their harmonic mean. It is between 0 and 1, with 1 being a better score [30].
$$F_1\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
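As a worked illustration of the scalar metrics above, the snippet below computes recall, accuracy, and F1 from raw confusion counts. The counts are illustrative only, not values from this study; mAP itself is computed internally by the detection validator.

```python
# Worked example of the metric formulas above; TP/TN/FP/FN values are
# illustrative, not results from this study.
def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(precision: float, rec: float) -> float:
    return 2 * precision * rec / (precision + rec)  # harmonic mean

tp, tn, fp, fn = 90, 80, 10, 10
prec = tp / (tp + fp)                  # precision = 0.90
print(recall(tp, fn))                  # 0.90
print(accuracy(tp, tn, fp, fn))        # ~0.895
print(f1_score(prec, recall(tp, fn)))  # 0.90
```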

3.1.4. Operational Framework

The workflow of the model begins with the training phase, which takes input images at 640 × 640 resolution together with their corresponding annotations (labels) in YOLO format. The images are already divided into training and testing sets: the training images are used to fit the model, and the testing images are held out for the later evaluation phase.
For the hardware and software implementation of the model, we used a Dell XPS 9320 equipped with a 12th Generation Intel® Core™ i7-1260P processor, 32 GB of LPDDR5 RAM, and integrated Intel Iris Xe Graphics (G7) for enhanced graphics processing tasks, together with the Ultralytics library in Python 3.12.12.
In addition to model training and evaluation, the system is implemented as a software prototype and tested in a semi-real-time environment using live camera feeds; the system screens are depicted subsequently. The inference time of the implemented model averages 7.3 milliseconds on edge devices.
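A minimal sketch of the semi-real-time loop follows, timing per-frame inference on a live camera feed. The checkpoint name "ppe_eye.pt" and the camera index are assumptions for illustration.

```python
# Sketch: measure per-frame inference latency on a live feed with the trained
# model ("ppe_eye.pt" is an assumed checkpoint name; camera index 0 assumed).
import time
import cv2
from ultralytics import YOLO

model = YOLO("ppe_eye.pt")
cap = cv2.VideoCapture(0)  # first attached camera

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.perf_counter()
    results = model.predict(frame, imgsz=640, verbose=False)
    latency_ms = (time.perf_counter() - t0) * 1000
    print(f"inference: {latency_ms:.1f} ms, detections: {len(results[0].boxes)}")
cap.release()
```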

4. Experimental Results and Analysis

This section outlines the performance achieved by the algorithms tested in this research and compares them with similar research that used the same dataset. Table 3 compares different models employing the same dataset, CHVG. Regarding mAP50, the proposed scheme outperformed the scheme in [7] employing YOLOX. As for precision, recall, and F1-score, the proposed scheme offers a marginal improvement over [7], but with a significantly lower inference time of 7.3 ms compared to 150 ms. Moreover, the proposed scheme performed slightly better than the scheme in [17], with a 0.9-percentage-point difference in mAP50; in terms of precision, recall, and F1-score, it outperformed [17] by 22.89%, 12.75%, and 17.35%, respectively. Finally, in terms of processing time, the proposed scheme was significantly faster than [17], at 7.3 ms compared to 170 ms. None of the compared schemes reported accuracy.
Additionally, the proposed scheme is compared with a recent study that employed YOLO11 on a PPE dataset [20]. The proposed scheme achieved a 1.7-percentage-point improvement in mAP50 (0.969 vs. 0.952) with an identical inference time. However, in terms of F1-score, precision, and recall, the scheme in [20] performed marginally better than the proposed scheme.
Figure 4 presents the training graph for the YOLO11 model for mAP50 and mAP50-95. It is apparent that mAP steadily improves over the training period, reflecting the model’s growing ability to distinguish between PPE-compliant and non-compliant classes.
Additionally, Table 4 presents the results of each model, including the optimizer and other hyperparameters used to obtain the results of this study. The hyperparameters of each model were optimized before the results were recorded. As shown, YOLO11x obtained the best mAP50 and mAP50-95, at 96.9% and 70.9%, respectively, with a batch size of 16, 21 epochs, the NAdam optimizer, and a learning rate of 0.00001; it was followed by the same model with a batch size of 20 and 87 epochs, at 91% and 60.3%, respectively. It can be deduced that batch size plays a critical role in relation to epochs while the other parameters remain the same.
YOLO11l, with 100 epochs and a batch size of 28, obtained 90.7% and 58.7%, respectively. YOLO10s followed the aforementioned models at 75 epochs, with a batch size of 16, the auto optimizer, and a learning rate of 0.01, achieving 87.3% and 54.4%, respectively. Finally, YOLO10n, with 25 epochs and a batch size of 32, performed relatively poorly at 77% and 47.3%, respectively, with all other parameters kept the same. Furthermore, the table shows that YOLO11x with an increased batch size and lower momentum does not improve the model's performance.
Moreover, data augmentation, including adjusting brightness and flipping images, helped increase the model’s performance. These results are also demonstrated in Figure 5, where all five models are compared for mAP50 and mAP50-95.
From the graph in Figure 6 for the best model (YOLO11x), the highest mAP50 is achieved by the class ‘person’, which contains the highest number of instances. The lowest mAP50 is 0.884, achieved by the class ‘glass’, which contains the lowest number of instances compared to the other classes prior to balancing.
Figure 7 shows the F1-score confidence curve. It reveals the same outcome as Figure 6: the person class obtained the highest confidence, while the glass class received the lowest. The best model achieved an F1-score of 0.90 at a confidence threshold of 0.424. This score represents a good trade-off between precision and recall, and the figure illustrates the model's performance at various confidence levels.
The normalized confusion matrix is depicted in Figure 8.
Figure 9 demonstrates examples of the model's predicted classes, along with their bounding boxes and confidence scores. The closer the score is to 1, the more confident the model is in the predicted class.
Figure 10, Figure 11 and Figure 12 show the proposed PPE-EYE model deployed in a UI to illustrate how the model can be integrated into real systems. The system detects all PPE classes the model was trained on, but the UI draws a single bounding box whose color encodes compliance: a green box means there is no violation, while a red box means a violation has occurred, in which case an alert with further details is sent to the incident section (Figure 11). Moreover, a metrics section summarizes the violations and other relevant information graphically (Figure 12).
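The per-frame compliance decision behind the green/red box can be sketched as below; the required-PPE rule and the BGR color encoding are illustrative assumptions consistent with the description above.

```python
# Sketch of the UI's single-box compliance logic: green when all required PPE
# is present for the frame, red otherwise. The required set is an assumption.
REQUIRED = {"vest", "head_protected"}  # assumed compliance rule

def frame_status(detected_classes: set[str]) -> tuple[str, tuple[int, int, int]]:
    helmet_colors = {"red", "yellow", "blue", "white"}
    present = set(detected_classes)
    if present & helmet_colors:
        present.add("head_protected")  # any helmet color counts as head protection
    if REQUIRED <= present:
        return "compliant", (0, 255, 0)   # green box: no violation
    return "violation", (0, 0, 255)       # red box: report to incident section

status, box_color_bgr = frame_status({"person", "vest", "yellow"})
print(status)  # "compliant"
```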
In the feedback mechanism, misclassified cases are treated as new examples for the existing system and used to retrain the model every twenty-four hours for continuous improvement. A minimal sketch of this loop is shown below.
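The folder layout, the schedule wrapper, and the fine-tuning epoch count in the sketch are assumptions; the paper only states that misclassified cases feed a 24-hour retraining cycle.

```python
# Sketch of the 24-hour feedback loop: frames flagged as misclassified are
# assumed to have been reviewed, labelled, and merged into the training set
# referenced by "chvg.yaml" before each cycle. Names and epochs are assumed.
import time
from ultralytics import YOLO

def retrain_daily(weights: str = "ppe_eye.pt", data: str = "chvg.yaml") -> None:
    while True:
        time.sleep(24 * 60 * 60)   # wait twenty-four hours between cycles
        model = YOLO(weights)      # resume from the current deployed weights
        model.train(data=data, epochs=5, batch=16, imgsz=640)
        # Ultralytics writes refreshed weights to runs/detect/train*/weights/
        # best.pt; deploy that checkpoint as the new model.
```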

5. Discussion

5.1. Advantages

The current study examines a wide range of YOLO10 and YOLO11 variants on an augmented and balanced dataset. The best PPE-EYE model was obtained after hyperparameter tuning, and the models were finally evaluated against the state of the art. The comparison studies were shortlisted based on their reliance on the CHVG dataset, which has primarily been investigated with deep learning models. Compared to the previous study [7], which achieved 0.8984 mAP50 with YOLOX, this study achieved around 7% higher mAP50, indicating that the newer YOLO11 performs better. Furthermore, the study [17] achieved 0.96 mAP50, approximately the same as PPE-EYE in this study, but with an inference time of 0.17 s.
In contrast, the proposed PPE-EYE achieved an inference time of 0.0073 s, which is significantly better for real-time object detection; the proposed model can process more frames per second than the Faster R-CNN model [17]. Likewise, compared with a recent, more detailed study [20], the proposed scheme achieves a 1.7% higher mAP with identical inference time, although on other metrics, such as recall, F1-score, and precision, it marginally underperforms the scheme in [20]. It is worth noting that in the current study, the idea has been tested using a software prototype, and all training and testing were conducted in an academic laboratory environment. For large-scale implementation, trade-offs among accuracy, latency, and hardware cost must be addressed in future studies.

5.2. Challenges and Limitations

As discussed in Section 4, the model has limitations, including its mAP50, which can be further improved in future studies. Another limitation is that running the model with low inference time requires high-end hardware; otherwise, inference time will increase on weaker hardware, as the model has many parameters. In the current study, the idea was demonstrated using a software prototype. A high-end edge device equipped with a powerful graphics processing unit (GPU) can perform the task more effectively, especially when the live feed comes from multiple cameras at enhanced frame rates. The current experiments were conducted at a relatively small scale; future studies can address the model's scalability.
Furthermore, as the dataset has eight classes, most of which are related to the color of the hat and a few related to other aspects, the number of classes is small, and it detects only the main PPE. In the future, more images should be added to the other classes, and the models should be reevaluated. Moreover, different light conditions, including various daylight and night shifts, should also be considered. Additionally, other transfer learning approaches, including vision transformers and vision-language models, can be investigated [31,32].

6. Conclusions

In conclusion, monitoring PPE is a crucial subject that requires improvement to ensure a safe and healthy environment for employees in this field. This article discusses the implementation of an AI-based solution, PPE-EYE, to address this problem, utilizing the latest member of the YOLO family (YOLO11), the CHVG dataset, and data augmentation and balancing techniques. The YOLO11x model achieved a 0.969 mAP50 score, the best result in this research, compared to the earlier YOLOX approach, which achieved only 0.8984 mAP50. Furthermore, based on the software prototype developed in the academic environment, the current model has an inference time of 0.0073 s, compared to Faster R-CNN, which required 0.17 s. Future improvements include balancing the dataset using up-sampling and other data augmentation techniques such as rotation, scaling, and contrast. Additionally, adding more classes will enhance the model's ability to detect a broader range of real-world PPE, including gloves and other equipment. Moreover, adding images from multiple environments, such as indoor and outdoor settings during both day and night, will enhance the model's robustness across different environments.

Author Contributions

Conceptualization, A.R. and M.S.A.; Data Curation, A.Y.A. and Y.H.A.; Formal Analysis, Y.H.A. and M.Y.; Funding Acquisition, M.Y. and S.M.A.; Investigation, A.A.A. and M.A.A.K.; Methodology, A.R., K.N.A., A.A.A., M.A.A.K., and M.Y.; Project Administration, M.A.A.K.; Resources, Z.S.A., S.M.A.; Software, K.N.A., A.Y.A., Y.H.A., and Z.S.A.; Supervision, A.R. and M.S.A.; Validation, A.Y.A.; Visualization, Z.S.A. and S.M.A.; Writing—Original Draft, K.N.A. and A.A.A.; Writing—Review and Editing, A.R. and S.M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are openly available in a public repository. PPE Object Detection Dataset (V9, Pictor-V3-Revised) by PPE, Roboflow. Available online: https://universe.roboflow.com/ppe-orxtt/ppe-u7jtr/dataset/9 (accessed on 29 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest to report regarding the present study.

References

  1. Cheng, J.P.; Wong, P.K.-Y.; Luo, H.; Wang, M.; Leung, P.H. Vision-based monitoring of site safety compliance based on worker re-identification and personal protective equipment classification. Autom. Constr. 2022, 139, 104312. [Google Scholar] [CrossRef]
  2. Han, K.; Zeng, X. Deep Learning-Based Workers Safety Helmet Wearing Detection on Construction Sites Using Multi-Scale Features. IEEE Access 2022, 10, 718–729. [Google Scholar] [CrossRef]
  3. Hayat, A.; Morgado-Dias, F. Deep Learning-Based Automatic Safety Helmet Detection System for Construction Safety. Appl. Sci. 2022, 12, 8268. [Google Scholar] [CrossRef]
  4. THE LINE: A Revolution in Urban Living—NEOM. Available online: https://www.neom.com/en-us/regions/theline (accessed on 6 October 2025).
  5. Lee, Y.-R.; Jung, S.-H.; Kang, K.-S.; Ryu, H.-C.; Ryu, H.-G. Deep learning-based framework for monitoring wearing personal protective equipment on construction sites. J. Comput. Des. Eng. 2023, 10, 905–917. [Google Scholar] [CrossRef]
  6. Lo, J.-H.; Lin, L.-K.; Hung, C.-C. Real-Time Personal Protective Equipment Compliance Detection Based on Deep Learning Algorithm. Sustainability 2023, 15, 391. [Google Scholar] [CrossRef]
  7. Ferdous, M.; Ahsan, S.M.M. PPE detector: A YOLO-based architecture to detect personal protective equipment (PPE) for construction sites. PeerJ Comput. Sci. 2022, 8, e999. [Google Scholar] [CrossRef] [PubMed]
  8. Isailovic, V.; Peulic, A.; Djapan, M.; Savkovic, M.; Vukicevic, A.M. The compliance of head-mounted industrial PPE by using deep learning object detectors. Sci. Rep. 2022, 12, 16347. [Google Scholar] [CrossRef] [PubMed]
  9. Nath, N.D.; Behzadan, A.H.; Paal, S.G. Deep learning for site safety: Real-time detection of personal protective equipment. Autom. Constr. 2020, 112, 103085. [Google Scholar] [CrossRef]
  10. Gallo, G.; Rienzo, F.D.; Garzelli, F.; Ducange, P.; Vallati, C. A smart system for personal protective equipment detection in Industrial Environments Based on Deep Learning at the Edge. IEEE Access 2022, 10, 110862–110878. [Google Scholar] [CrossRef]
  11. Lee, B.; Hong, S.; Kim, H. Determination of workers’ compliance to safety regulations using a spatio-temporal graph convolution network. Adv. Eng. Inform. 2023, 56, 101942. [Google Scholar] [CrossRef]
  12. Vukicevic, A.M.; Djapan, M.; Isailovic, V.; Milasinovic, D.; Savkovic, M.; Milosevic, P. Generic compliance of industrial PPE by using deep learning techniques. Saf. Sci. 2022, 148, 105646. [Google Scholar] [CrossRef]
  13. Delhi, V.S.K.; Sankarlal, R.; Thomas, A. Detection of Personal Protective Equipment (PPE) Compliance on Construction Site Using Computer Vision Based Deep Learning Techniques. Front. Built Environ. 2020, 6, 136. [Google Scholar] [CrossRef]
  14. Azizi, R.; Koskinopoulou, M.; Petillot, Y. Comparison of Machine Learning Approaches for Robust and Timely Detection of PPE in Construction Sites. Robotics 2024, 13, 31. [Google Scholar] [CrossRef]
  15. Li, J.; Zhao, X.; Zhou, G.; Zhang, M. Standardized use inspection of workers’ personal protective equipment based on deep learning. Saf. Sci. 2022, 150, 105689. [Google Scholar] [CrossRef]
  16. Li, J.; Miao, Q.; Zou, Z.; Gao, H.; Zhang, L.; Li, Z. A Review of Computer Vision-Based Monitoring Approaches for Construction Workers’ Work-Related Behaviors. IEEE Access 2024, 12, 7134–7155. [Google Scholar] [CrossRef]
  17. Ahmed, M.I.B.; Saraireh, L.; Rahman, A.; Al-Qarawi, S.; Mhran, A.; Al-Jalaoud, J.; Gollapalli, M. Personal Protective Equipment Detection: A Deep-Learning-Based Sustainable Approach. Sustainability 2023, 15, 13990. [Google Scholar] [CrossRef]
  18. Li, Y.; Wei, H.; Han, Z.; Huang, J.; Wang, W. Deep Learning-Based Safety Helmet Detection in Engineering Management Based on Convolutional Neural Networks. Adv. Civ. Eng. 2020, 2020, 9703560. [Google Scholar] [CrossRef]
  19. Huang, L.; Fu, Q.; He, M.; Jiang, D.; Hao, Z. Detection algorithm of safety helmet wearing based on deep learning. Concurr. Comput. Pract. Exp. 2021, 33, e6234. [Google Scholar] [CrossRef]
  20. Rahman, A.; Alatallah, F.A.; Almubarak, A.J.; Alkhazal, H.A.; Alzayer, H.A.; Shaaban, Y.Z.; Aloup, K. Deep Learning-Based Automated Inspection of Generic Personal Protective Equipment. Comput. Mater. Contin. 2025, 85, 3507–3525. [Google Scholar] [CrossRef]
  21. Márquez-Sánchez, S.; Campero-Jurado, I.; Herrera-Santos, J.; Rodríguez, S.; Corchado, J.M. Intelligent Platform Based on Smart PPE for Safety in Workplaces. Sensors 2021, 21, 4652. [Google Scholar] [CrossRef] [PubMed]
  22. Cabrejos, J.A.L.; Roman-Gonzalez, A. Artificial Intelligence System for Detecting the Use of Personal Protective Equipment. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 580–585. [Google Scholar] [CrossRef]
  23. Balakreshnan, B.; Richards, G.; Nanda, G.; Mao, H.; Athinarayanan, R.; Zaccaria, J. PPE Compliance Detection using Artificial Intelligence in Learning Factories. Procedia Manuf. 2020, 45, 277–282. [Google Scholar] [CrossRef]
  24. Pooja, S.; Preeti, S. Face Mask Detection Using AI BT—Predictive and Preventive Measures for COVID-19 Pandemic; Khosla, P.K., Mittal, M., Sharma, D., Goyal, L.M., Eds.; Springer: Singapore, 2021; pp. 293–305. [Google Scholar]
  25. Muanme, W.; Pararach, S.; Kaeprapha, P. An Artificial Intelligence Camera System to Check Worker Personal Protective Equipment Before Entering Risk Areas. In Proceedings of the International Conference on Innovative Computing Volume 2—Emerging Topics in Future Internet; Hung, J.C., Chang, J.-W., Pei, Y., Eds.; Springer Nature Singapore: Singapore, 2023; pp. 747–755. [Google Scholar]
  26. PPE Datasets and Pre-Trained Models—Roboflow. Available online: https://universe.roboflow.com/browse/manufacturing/ppe (accessed on 31 December 2024).
  27. Explore Ultralytics YOLOv11. Available online: https://yolov11.com/ (accessed on 23 February 2025).
  28. Blagus, R.; Lusa, L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinform. 2013, 14, 106. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  29. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
  30. Rahman, A. Solar Panel Surface Defect and Dust Detection: Deep Learning Approach. J. Imaging 2025, 11, 287. [Google Scholar] [CrossRef] [PubMed]
  31. Riaz, M.; He, J.; Xie, K.; Alsagri, H.S.; Moqurrab, S.A.; Alhakbani, H.A.A.; Obidallah, W.J. Enhancing Workplace Safety: PPE_Swin—A Robust Swin Transformer Approach for Automated Personal Protective Equipment Detection. Electronics 2023, 12, 4675. [Google Scholar] [CrossRef]
  32. Chen, Z.; Chen, H.; Imani, M.; Chen, R.; Imani, F. Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces. Expert Syst. Appl. 2025, 265, 125769. [Google Scholar] [CrossRef]
Figure 1. Methodology of the proposed PPE-EYE system.
Figure 2. YOLO11 architecture [26].
Figure 3. (a) Original CHVG dataset; (b) dataset instances after balancing.
Figure 4. YOLO11 training graph.
Figure 5. Models comparison.
Figure 6. Precision–recall curve of the best model.
Figure 7. F1-confidence curve.
Figure 8. Normalized confusion matrix.
Figure 9. Examples of model’s predictions.
Figure 10. Deployed model user interface.
Figure 11. PPE non-compliance incident reporting.
Figure 12. Summary of overall PPE compliance.
Table 1. Summary of literature review.

| Ref | Year | Technique(s) Used | Results | Dataset |
|---|---|---|---|---|
| [20] | 2025 | YOLO11 for generic PPE detection | Developed a system with a promising result of 95.5% mAP. | Public CHVG and manually collected |
| [14] | 2024 | Faster R-CNN + ResNet-50, Few-Shot Object Detection | Faster R-CNN achieved 73.8% mAP; FsDet effective with limited data. | Not specified |
| [25] | 2023 | YOLO, AI camera technology | Described a system that increases PPE compliance with real-time detection. | Not specified |
| [5] | 2023 | YOLACT and DeepSORT | 66.4% mAP | MOCS and ACID |
| [11] | 2023 | OpenPose and ST-GCN for activity classification | F1-score above 80% | Not public |
| [17] | 2023 | Faster R-CNN with the ResNet50 backbone | mAP50 of 0.96 | CHVG |
| [22] | 2023 | YOLO neural network | mAP 98.13% and recall 86.78% | Not specified |
| [8] | 2022 | Faster R-CNN, MobileNetV2-SSD, YOLOv5 | YOLOv5 achieved a precision of 0.920 and recall of 0.611 | Dataset link 1 and dataset link 2 |
| [1] | 2022 | ReID and PPE classification | Identification improved by 4%, and PPE detection improved by 13% | Not public |
| [15] | 2022 | YOLOv5, OpenPose, 1D-CNN | Achieved a high accuracy of 94.67% in identifying improper PPE use | Not public |
| [3] | 2022 | YOLOv5 | 92.44% mAP | Dataset link |
| [6] | 2022 | YOLOv7 | 97% mAP | Dataset link |
| [7] | 2022 | YOLOX architecture | 89.84% mAP | CHVG |
| [10] | 2022 | DNN, edge computing, YOLO | Comparison of CNNs based on performance and latency; YOLO was effective. | Dataset link |
| [12] | 2022 | HigherHRNet for pose; MobileNetV2, DenseNet, and ResNet | MobileNetV2 achieved 98% accuracy, DenseNet 97%, and ResNet 96%. | Dataset link 1 and dataset link 2 |
Table 2. Dataset description.

| Class Number | Class | Instances | Objects |
|---|---|---|---|
| 1 | Person | 476 | 4674 |
| 2 | Vest | 217 | 2137 |
| 3 | Glass | 51 | 533 |
| 4 | Head | 71 | 730 |
| 5 | White | 52 | 1290 |
| 6 | Yellow | 149 | 1200 |
| 7 | Blue | 54 | 543 |
| 8 | Red | 119 | 496 |
| Total | | 1699 | 11,603 |
Table 3. Benchmark comparison.

| Model | mAP50 | Epochs | F1-Score | Precision | Recall | Accuracy | Inference Time (ms) |
|---|---|---|---|---|---|---|---|
| YOLOX-m [7] | 0.8984 | 191 | 0.9034 | 0.904 | 0.903 | - | 150 |
| Faster RCNN [17] | 0.96 | 20 | 0.7265 | 0.68 | 0.78 | - | 170 |
| YOLO11 [20] | 0.952 | 90 | 0.925 | 0.923 | 0.921 | - | 7.3 |
| Proposed YOLO11x (PPE-EYE) | 0.969 | 21 | 0.90 | 0.9089 | 0.9075 | 0.8276 | 7.3 |
Table 4. Model results.

| Model | Epochs | Batch | Optimizer | Learning Rate | Momentum | cos_lr | mAP50 | mAP50-95 |
|---|---|---|---|---|---|---|---|---|
| YOLO10n | 25 | 32 | auto | 0.01 | 0.937 | False | 0.77 | 0.473 |
| YOLO10s | 75 | 16 | auto | 0.01 | 0.937 | False | 0.873 | 0.545 |
| YOLO11l | 100 | 28 | NAdam | 0.00001 | 0.937 | True | 0.907 | 0.587 |
| YOLO11x | 87 | 20 | NAdam | 0.00001 | 0.5 | True | 0.91 | 0.603 |
| YOLO11x | 21 | 16 | NAdam | 0.00001 | 0.937 | True | 0.969 | 0.709 |
