Applying the Haar-cascade Algorithm for Detecting Safety Equipment in Safety Management Systems for Multiple Working Environments

There are many ways to maintain the safety of workers on a working site, such as human supervision, computer supervision, and smoke-flame detection systems. To create a safety warning system for the working site, a machine-learning algorithm, the Haar-cascade classifier, was used to build four classifiers for safety equipment recognition. A proposed algorithm was then applied to calculate a score that indicates how dangerous the current working environment is, based on the detected safety equipment and the type of working environment. With this score, the system decides whether it is necessary to give a warning signal. To check the efficiency of this project, the system was deployed in three different situations. Given the promising outcome, this application can be used to maintain, supervise, and control the safety of workers.


Introduction
Since the introduction of Industry 4.0 (I4) in 2011 at the Hannover Fair in Germany, automation and machine learning (ML) have attracted researchers seeking to apply them to industry, agriculture, and other services. This field forms an important part of modern business and research. ML can improve computing performance in processes pertaining to a single factory or system, a chain of factories, or the multiple systems used in an organization. I4 will benefit human society when it combines artificial intelligence (AI) with automation in production. In this century, four billion people are connected through the Internet, and there exist 50 trillion gigabytes of data and 25 million tablet and PC applications, all of which have been developed on the basis of the I4 revolution. In addition, the effect of I4 extends to every field of human life, such as agriculture, industry, medicine, and education. In Germany, 75% of factories are smart factories that use AI to control manufacturing systems. To keep pace with this trend, the Korean Government has requested many corporations and factories to develop smart systems. Given this urgency, collaboration among scientific fields such as computer science, chemistry, and physics is necessary. The applications of ML are varied, e.g., object recognition, face detection, spoken language understanding, customer segmentation, and weather prediction. As hazardous chemicals must be handled in chemical engineering, an intelligent system that can control and maintain the safety level of a working environment is urgently needed. Therefore, in this research, a safety system is introduced that reminds workers to wear personal protective equipment when they are working in a dangerous environment. This system combines several preprocessing algorithms, the ML Haar-cascade algorithm, and system control.
The Haar cascade is an ML object detection algorithm used to identify objects in an image or video and is based on the concept of features proposed by Paul Viola and Michael Jones in their paper [1]. Learning is always based on typical observations or data, i.e., programming by examples. The system comprises two main steps: training the classifiers and applying them. The training step consists of obtaining data (images) from video, preprocessing the images, categorizing them into several groups, and training on these images using the cascade algorithm. The application step consists of collecting images from video, detecting safety objects, calculating a safety score, and providing feedback based on the safety score. With this research, industrial companies will be able to detect and control the safety of a working environment automatically with the assistance of computers. The Safety Management System (SMS) is one of the cornerstones of the safety regulatory framework and helps to ensure a high level of safety in a company. The system described in this paper could serve as an intelligent SMS component of a firm's AI in the I4 era.
Many other types of equipment are used in a normal environment, such as masks, safety clothing for the lab, and safety clothing for workers in different working environments. In this study, our system was set up for only four classes (human, safety helmet, hooks, and gloves) because the others require a very large dataset. For example, there are many types of masks (normal masks, gas masks, and chemical masks) and of clothing (normal clothing, safety clothing, and lab clothing). Moreover, the selected equipment (helmet, hook, and gloves) shares the same pattern structure and is mostly used for safety protection. This paper has four parts: Introduction, Materials and Methods, Experiment Results, and Conclusion.

Related Work of Machine Learning Algorithm
Before the Haar cascade was invented and applied, many template- and object-matching algorithms with extremely high accuracy already existed, such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and oriented FAST and rotated BRIEF (ORB) [2]. These algorithms achieve high accuracy but cannot be applied to real-time detection owing to their long processing times. In contrast, the Haar-cascade algorithm is an ML-based approach in which a cascade function is trained from numerous positive and negative images and is subsequently used to detect objects in other images. The algorithm comprises four stages: Haar feature selection, creating integral images, AdaBoost training, and cascading classifiers, as shown in Figure 1 and in [3].
The cascade classifier consists of a collection of stages, in which each stage is an ensemble of weak learners. The weak learners are simple classifiers called decision stumps. Each stage is trained using a technique called boosting, which makes it possible to train a highly accurate classifier by taking a weighted average of the decisions made by the weak learners. Each stage of the classifier labels the region defined by the current location of the sliding window as either positive or negative. A positive label indicates that an object was found, and a negative label indicates that no object was found. If the label is negative, the classification of this region is complete, and the detector slides the window to the next location. If the label is positive, the classifier passes the region to the next stage. The detector reports an object at the current window location only when the final stage classifies the region as positive. Cascade classifier training requires a set of positive samples and a set of negative images. Haar-like features are attributes extracted from images for use in pattern recognition; their name derives from their similarity to Haar wavelets.
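The stage-by-stage rejection described above can be illustrated with a minimal sketch. It is not the OpenCV implementation: the names `make_stump`, `stage_accepts`, and `cascade_detects`, the feature vectors, and all thresholds and weights are hypothetical values chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A weak learner maps a feature vector to a vote: 1 (object) or 0 (non-object).
Stump = Callable[[Sequence[float]], int]
# A stage is a list of (weak learner, weight) pairs plus a stage threshold.
Stage = Tuple[List[Tuple[Stump, float]], float]


def make_stump(feature_index: int, threshold: float) -> Stump:
    """Decision stump: votes 1 when a single feature reaches its threshold."""
    return lambda features: 1 if features[feature_index] >= threshold else 0


def stage_accepts(stage: Stage, features: Sequence[float]) -> bool:
    """Weighted vote of the weak learners in one stage."""
    learners, stage_threshold = stage
    score = sum(weight * stump(features) for stump, weight in learners)
    return score >= stage_threshold


def cascade_detects(stages: List[Stage], features: Sequence[float]) -> bool:
    """Report an object only if every stage labels the window positive."""
    for stage in stages:
        if not stage_accepts(stage, features):
            return False  # negative label: stop here and move to the next window
    return True


# Hypothetical two-stage cascade over three feature values per window.
stages: List[Stage] = [
    ([(make_stump(0, 0.5), 1.0)], 1.0),
    ([(make_stump(1, 0.2), 0.6), (make_stump(2, 0.7), 0.4)], 0.6),
]
windows = {"window_a": [0.9, 0.3, 0.1], "window_b": [0.1, 0.9, 0.9]}
results = {name: cascade_detects(stages, feats) for name, feats in windows.items()}
```

In this toy run, `window_b` is rejected by the first stage, so the second stage is never evaluated for it; this early exit is what keeps the cascade fast on the many background windows in a frame.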
To compute a Haar-like feature, the pixel values inside the black area are first added together; subsequently, the values in the white area are added. Next, the total value of the white area is subtracted from the total value of the black area. This result is used to categorize image sub-regions. The applications of this algorithm range from face detection to other object recognition tasks. During the Haar-cascade process, the AdaBoost learning algorithm is also applied to boost the performance of the training process. AdaBoost requires a large number of examples, which strongly affects how well the training error generalizes. It combines weak classifiers into strong ones using its specific equations [4]. By collecting positive and negative images of a single object, this algorithm can build a complete classifier that detects the object within a short time (almost real time) and with high efficiency (~99.2-99.8%) compared to other algorithms.
Before the introduction of the Haar-cascade algorithm in 2001, many object recognition applications had been created. Devi et al. used an additional principal component analysis (PCA) to reduce the complexity of face images, decrease data size, and remove noise [4]. Subsequently, Navaz et al. combined PCA with neural networks for face recognition and sex determination [5]. These earlier algorithms showed some disadvantages, such as a low classification percentage (31.48-94.5%) and a high mean square error (0.02-0.12). Meanwhile, owing to its quick detection and high efficiency, the Haar cascade was applied in many studies [6][7][8][9][10][11][12][13][14]. Wanjale et al. detected the faces of registered people in an input video [6]. This concept was applied to real-time video with a high accuracy rate and fast speed; however, the implementation depended on the video quality (light, angle, no obstacles). Additionally, Cuimei et al. improved the Haar cascade by combining it with three different classifiers (HSV color, histogram matching, and eyes/mouth detection) [7]. In 2017, Ulfa et al. applied the Haar cascade to detect motorcycles [8]. Last year, Arreola et al. applied this algorithm to a quad-rotor unmanned aerial vehicle (UAV) to detect faces [9]. In addition to this algorithm, a few others can be applied to real-time tracking, such as local binary patterns (LBP) and the histogram of oriented gradients (HOG). Cruz et al. and Guennouni et al. compared these three algorithms in their projects on detecting objects using UAVs. The results indicated that the Haar-like cascade performed better than LBP in accuracy rate and better than HOG in speed [10][11][12][13][14]. Moreover, there are many studies on applying deep learning to detect clothes and non-hardhat use in fashion and surveillance videos [15][16][17].
However, these previous deep-learning algorithms were applied mostly to fashion with HOG (which is slower than the Haar cascade), not to safety management control. Therefore, in this work, we used the Haar cascade to train classifiers with fast speed and high accuracy. With these advantages of the Haar-cascade algorithm, our system, which trains classifiers, detects safety objects in real time, and calculates a safety score, will be a valuable contribution to human working safety.
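The black-minus-white computation described above, together with the integral image that makes it cheap, can be sketched in a few lines of plain Python. The function names and the 4 × 4 toy image are illustrative assumptions; the feature shown is a single two-rectangle layout with the black area on the left and the white area on the right.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = img[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def rect_sum(ii: Image, top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels in a rectangle, using only four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii: Image, top: int, left: int, height: int, width: int) -> int:
    """Sum over the black (left) half minus sum over the white (right) half."""
    half = width // 2
    black = rect_sum(ii, top, left, height, half)
    white = rect_sum(ii, top, left + half, height, half)
    return black - white


# Toy 4x4 grayscale patch: large values on the left, small values on the right.
img = [[9, 9, 1, 1] for _ in range(4)]
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 4, 4)  # 72 - 8 = 64
```

Because each rectangle sum costs four lookups regardless of its size, a feature can be evaluated at any position and scale in constant time once the integral image has been built.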

Obtaining Images from Raw Video and Preprocessing and Categorizing Them
This program was written in the Python language and ran on an Intel Core i7-6700 CPU at 3.40 GHz, with 16 GB RAM and an NVIDIA GeForce GTX 1050 graphics card. The program used the Open Source Computer Vision Library (OpenCV) and training images from two sources: the Open Images Dataset and our own recorded videos. The learning step runs before the detecting step. As this algorithm requires a large number of input images, approximately 10 videos (30 fps) and some image databases from the internet were used. The 10 videos were recorded by phone against different backgrounds (school zone, construction zone, and chemical site zone). From the Open Images Dataset, color images of humans with sizes ranging from 100 × 100 pixels to 200 × 200 pixels were collected. In addition, to collect objects from the videos, motion detection and a tracking algorithm were applied, as shown in Figure 2. The videos have a size of 900 × 500 pixels. In each video, the first background image (containing no human or safety equipment, as in Figure 3a) is stored, and each subsequent frame is then processed as the current frame (Figure 3b). The current frame is converted from RGB to grayscale (Figure 3c) and smoothed with the Gaussian blur of Formula 1 (Figure 3d), using the OpenCV functions cv2.cvtColor and cv2.GaussianBlur. The idea of Gaussian blur is to use this 2-D distribution as a "point-spread" function, which is achieved by convolution. Since the image is stored as a collection of discrete pixels, a discrete approximation to the Gaussian function must be produced before the convolution can be performed. The Gaussian outputs a "weighted average" of each pixel's neighborhood, with the average weighted more towards the value of the central pixels. This is in contrast to the mean filter's uniformly weighted average.
Because of this, the Gaussian provides gentler smoothing and preserves edges better than a similarly sized mean filter (cv2.blur). Subsequently, the frame difference between the background frame and the current frame is calculated by the function cv2.absdiff, as shown in Figure 3e. This function computes the absolute difference between corresponding pixels of two image arrays, which makes it possible to extract only the pixels of moving objects. In this pipeline, the images are converted to grayscale (a range of shades of gray from black to white) before cv2.absdiff is applied. Based on the frame difference, a binary threshold image was computed, dilated, and stored in an array (Figure 3f). Binary thresholding is used here as the simplest method to reduce noise [18][19][20]. After that, the cv2.findContours function outputs the separate shapes appearing in Figure 3f. There are four contour retrieval modes: CV_RETR_EXTERNAL, CV_RETR_LIST, CV_RETR_CCOMP, and CV_RETR_TREE. In this case, we use CV_RETR_EXTERNAL to retrieve only the extreme outer contours, and horizontal, vertical, and diagonal segments are compressed to only their end points. Only the shapes with an acceptable size were cropped and saved to the computer. This process is shown in Figure 3.
$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}} \quad (1)$$
where x is the distance from the origin in the horizontal axis, y is the distance from the origin in the vertical axis, and σ is the standard deviation of the Gaussian distribution.
After the images were obtained, their sizes were found to vary from 10 × 10 pixels to 200 × 200 pixels. All of the images (nearly three million) were sorted by hand into different folders: helmet, hook, gloves, and human. These four objects were the primary objects to be detected; therefore, four classifiers were required to build the safety system. To use the Haar-cascade algorithm, the images were further categorized into positive and negative images. A positive image is one that contains an object that must be detected; a negative image is one that does not contain such an object, as shown in Figure 4. In our case, for example, the positive images of the helmet classifier are helmet images, and the negative ones are images of hooks, humans, pipelines, and background. The same applies to the hook, gloves, and human classifiers. To finish the learning step, "good dipping" images were grouped in a folder named "positive", and "negative" images were grouped in a folder named "negative."
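The preprocessing and frame-differencing steps of this section can be sketched with NumPy alone. This is a simplified stand-in under stated assumptions, not the authors' code: `gaussian_kernel` builds a discrete, normalized approximation of the standard 2-D Gaussian of Formula 1, and `moving_object_box` mimics the roles of cv2.absdiff and binary thresholding, returning one bounding box for all changed pixels instead of one box per contour. The function names, the threshold of 25, the minimum area, and the synthetic frames are all hypothetical.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Discrete approximation of the 2-D Gaussian, normalized to sum to 1."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return kernel / kernel.sum()


def moving_object_box(background: np.ndarray, frame: np.ndarray,
                      threshold: int = 25, min_area: int = 4):
    """Bounding box (x, y, w, h) of pixels that differ from the background."""
    # Absolute difference in a signed type, so uint8 values cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    mask = diff > threshold            # binary threshold on the difference
    if mask.sum() < min_area:          # discard shapes that are too small
        return None
    ys, xs = np.nonzero(mask)
    return (int(xs.min()), int(ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))


kernel = gaussian_kernel(5, 1.0)

# Synthetic grayscale frames: an empty background and a bright moving patch.
background = np.zeros((20, 30), dtype=np.uint8)
frame = background.copy()
frame[5:10, 8:14] = 200
box = moving_object_box(background, frame)  # (8, 5, 6, 5)
```

The kernel weights are largest at the center and fall off with distance, which is the "weighted average" behavior described in the text; a mean filter would instead use equal weights everywhere.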

Creating the Haar-Cascade Classifier to Detect Objects
After the procedures in Section 2.2.1 were performed, the paired sets of negative and positive images were used to create a classifier for each object mentioned in the previous section. In this step, Haar-cascade training is improved by AdaBoost, which allows the algorithm to use a large number of examples; this number significantly affects how well the training error of the strong classifier generalizes. As a result, only a small number of the images containing the object to be found were misclassified. The AdaBoost algorithm runs together with the learning process, whose purpose is to construct a classifier that recognizes the objects of interest. The learning process comprises a number of stages (listed as states in Table 1) that must be chosen by the user. At each stage, the computer creates a first classifier from the positive images, tests it on the negative images for evaluation, and subsequently builds a second classifier with a higher detection rate. The second classifier is then used in the next stage. This process ends when the last stage is completed. The cascade stages are produced by training the classifier tool with the AdaBoost algorithm, combined with a thresholding step to minimize the error rate. The technical input information is listed in Table 1. The number of images for each class is listed in Table 2.
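How boosting reweights the training examples between weak classifiers can be illustrated with one round of the standard discrete AdaBoost update. The paper's own equations are not reproduced in this section, so the update rule below is the textbook form and is an assumption about the method; the feature values, labels, and function names are hypothetical.

```python
import math
from typing import List, Tuple


def predict(value: float, threshold: float, polarity: int) -> int:
    """Decision stump on one feature value: returns +1 (object) or -1."""
    return 1 if polarity * value >= polarity * threshold else -1


def best_stump(values: List[float], labels: List[int],
               weights: List[float]) -> Tuple[float, int, float]:
    """Threshold and polarity with the lowest weighted classification error."""
    best = (0.0, 1, float("inf"))
    for threshold in sorted(set(values)):
        for polarity in (1, -1):
            error = sum(w for v, y, w in zip(values, labels, weights)
                        if predict(v, threshold, polarity) != y)
            if error < best[2]:
                best = (threshold, polarity, error)
    return best


def adaboost_round(values: List[float], labels: List[int], weights: List[float]):
    """One boosting round: fit a stump, compute its vote, reweight the examples."""
    threshold, polarity, error = best_stump(values, labels, weights)
    error = min(max(error, 1e-10), 1 - 1e-10)      # avoid division by zero
    alpha = 0.5 * math.log((1 - error) / error)     # weight of this weak learner
    new_weights = [w * math.exp(-alpha * y * predict(v, threshold, polarity))
                   for v, y, w in zip(values, labels, weights)]
    total = sum(new_weights)
    return (threshold, polarity, alpha), [w / total for w in new_weights]


# Hypothetical Haar feature responses: +1 = positive image, -1 = negative image.
values = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]
labels = [-1, -1, -1, 1, 1, 1]
weights = [1 / len(values)] * len(values)
stump, new_weights = adaboost_round(values, labels, weights)
```

After this round, the single misclassified example (the negative image with response 0.6) carries half of the total weight, so the next weak classifier is pushed to handle it correctly.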


[Table 1. Technical input information for cascade training. Recoverable entry: number of states, 20.]
Once the four classifiers were completed, namely the Helmet classifier (H-Class), Glove classifier (G-Class), Hook classifier (Ho-Class), and Human classifier (Hu-Class), the proposed system used them to help the AI maintain the safety level for each object.

Creating a Safety System for a Chemical Plant Environment
After the cascade training was performed, a classifier .xml file was generated containing the results of the training process for each class. The four output classifiers were used to detect the different objects (human, helmet, gloves, and hook). To create a safety system, a safety score was calculated from the scores of the four classes as in Equation (2). The scores of H-Class, G-Class, and Ho-Class were each based on Equation (3), and the Hu-Score was obtained from Equation (4).
where SC(x) is the safety score used to decide whether the system raises an alarm; S_Hu is the score of Human; S_H is the score of Helmet; S_G is the score of Gloves; and S_Ho is the score of Hook.
where S(y) is the score of the class; T is the difference between the appearing time of an object and that of a human; T is the appearing time of a human; W is the weight value of the object, decided by the user based on the specific situation; i is the number of objects; and h is the number of humans.
where S_Hu is the score of the Human class and W is the weight value of each class (H-Class, G-Class, and Ho-Class). Every five frames, the system detects the number of humans in the frame; if a human is found, S_Ho, S_H, and S_G are calculated only when the number of humans is higher than the number of detected objects of the corresponding class. The weight value of each class is decided based on the working environment and can be set later. For example, if an employee works on scaffolding, the weight value W of the hook can be set to 1, whereas if that person works at ground level, the hook does not need to be counted, so its W can be 0. In this research, we decided to give strong consideration to all of the safety equipment; therefore, W was set to 1 for classes H, G, and Ho. In parallel, the appearing times of these objects were recorded with the Python function "time.time()". The appearing time of an object is recorded when the object is found for the first time. Depending on the situation, S_Ho, S_H, and S_G can either be counted or not. This makes the system flexible across working environments, such as workers at high places who are required to wear helmets and hooks, or workers at dangerous places who are required to wear gloves. This behavior is controlled by the weight value of each class, which is decided by the manager. Once SC(x) is found, if it is smaller than a specific value (the baseline SC(x)), the system outputs an alarm signal. This baseline can be set by the user depending on the situation. For example, a school zone does not require a worker to wear a safety helmet, hook, or gloves, whereas a chemical site strongly requires this protective equipment. The general concept of our system is shown in Figure 5. The system is programmed in the Python language and tested on a PC with an I7-3770 3.4 GHz processor and 16 GB RAM.

Performance of Four Class Classifiers
To evaluate the performance of the classifiers, five positive videos, two negative videos, and one background video were used as test examples. The videos differ in length but have the same fps. The three types of video cases were as follows: positive videos from chemical plants and construction sites containing many workers with safety equipment; negative videos showing people with little or no safety equipment; and a background video from everyday life with a few humans without any safety equipment. These videos last from 15 to 75 min. For each video case, a single classifier was used to detect its object (H-Class for Helmet, Hu-Class for Human, G-Class for Gloves, and Ho-Class for Hooks). The objects found by these classifiers were saved to a PC. A true positive is an object that the system detected as belonging to the class and that really was such an object. A false positive is an object that the system detected as belonging to the class but that was not such an object (Type-I error). The result of each class classifier is listed in Table 3.
Table 3. Detection results for each class in eight cases.

Based on Table 3, the classification accuracy rate of each classifier is presented in Figure 6, and the error rate of each classifier is presented in Figure 7. These data are calculated from Equation (5). The CAR (Classification Accuracy Rate) represents the number of correct predictions among all predictions made. It is a good metric for a binary classification problem (object or non-object). Moreover, to increase the quality of the result, each CAR was measured in triplicate. Each measurement yielded a different value, and the reported errors are the average of those differences.

[Figures 6 and 7: classification accuracy rate and error rate of the Hu-Class, H-Class, G-Class, and Ho-Class classifiers.]
$$\mathrm{CAR} = \frac{\text{correct predictions}}{\text{total predictions}} \quad (5)$$
where CAR is the fraction of correct predictions over total predictions, correct predictions is the total number of correct predictions made by the system, and total predictions is the total number of predictions made by the system. In addition, to check the efficiency of this system when it is used for real-time safety management, a time difference variable was defined to record how closely the running time of the system matches the video length, as in Equation (6).
$$\text{Time difference} = \text{processing time} - \text{video time} \quad (6)$$
where Time difference is the processing time minus the video time. The processing time is the time elapsed from when the application starts until it ends.
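The two metrics defined above are simple enough to express directly. The sketch below follows the verbal definitions of Equations (5) and (6); the helper `mean_and_error` is one possible reading of the triplicate measurement (mean value plus mean absolute deviation) and is an assumption, as are all the numbers used in the example.

```python
from typing import List, Tuple


def classification_accuracy_rate(correct_predictions: int, total_predictions: int) -> float:
    """Equation (5): fraction of correct predictions over all predictions."""
    if total_predictions <= 0:
        raise ValueError("total_predictions must be positive")
    return correct_predictions / total_predictions


def time_difference(processing_time_s: float, video_time_s: float) -> float:
    """Equation (6): processing time minus video length, in seconds."""
    return processing_time_s - video_time_s


def mean_and_error(measurements: List[float]) -> Tuple[float, float]:
    """Mean of repeated measurements and their mean absolute deviation.

    Assumption: this is one plausible way to average the differences
    between the triplicate CAR values.
    """
    mean = sum(measurements) / len(measurements)
    error = sum(abs(m - mean) for m in measurements) / len(measurements)
    return mean, error


# Hypothetical counts and timings, not values from Table 3.
car = classification_accuracy_rate(950, 1000)        # 0.95
lag = time_difference(930.0, 900.0)                  # 30.0 s slower than the video
car_mean, car_error = mean_and_error([0.95, 0.96, 0.97])
```

A time difference close to zero means the system keeps up with the video, which is the condition for using it as a real-time monitor.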

Performance of the Safety System
To measure the efficiency of the proposed safety system, eight videos were used. These videos were recorded against different backgrounds, such as school zones, chemical plants, and construction sites, with people either wearing or not wearing safety equipment. For each video, the safety system algorithm (Figure 5) was executed several times. Because the videos contain different numbers of frames, storing every safety score was impractical. Hence, the safety scores of the frames in each 10 min interval were stored and averaged. The values in Table 4 are the average 10 min safety scores of each video. A black cell indicates that the video ended before that milestone. The accuracy rate of the safety system was calculated as the number of correct warnings divided by the total number of warnings issued by the system. For each video, the safety system algorithm was applied three times and the resulting accuracy rates were averaged. For each type of video, the baseline SC(x) was set by the user. If the average safety score of a 10 min interval was below the baseline, the system triggered an alarm. The baseline SC(x) differed by case (school zone: 0.5; construction site: 0.70; and chemical site: 0.75).
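The averaging and alarm rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the per-frame safety scores are assumed to be already computed by the algorithm in Figure 5, and a trailing interval shorter than 10 min is dropped, mirroring the empty cells in Table 4. Only the baseline values come from the text.

```python
from statistics import mean

# Baseline SC(x) values quoted in the text for each type of site.
BASELINES = {"school zone": 0.5, "construction site": 0.70, "chemical site": 0.75}


def window_averages(frame_scores, fps, window_min=10):
    """Average per-frame safety scores over consecutive full windows.

    Assumption: a trailing window shorter than window_min minutes is dropped.
    """
    frames_per_window = int(fps * 60 * window_min)
    if frames_per_window <= 0:
        raise ValueError("fps and window_min must give at least one frame per window")
    full_windows = len(frame_scores) // frames_per_window
    return [
        mean(frame_scores[i * frames_per_window:(i + 1) * frames_per_window])
        for i in range(full_windows)
    ]


def alarms(averages, site):
    """True for every window whose average score is below the site baseline."""
    baseline = BASELINES[site]
    return [avg < baseline for avg in averages]


def warning_accuracy(correct_warnings, total_warnings):
    """Correct warnings divided by all warnings issued by the system."""
    if total_warnings <= 0:
        raise ValueError("total_warnings must be positive")
    return correct_warnings / total_warnings


# Toy data: 1 frame per second and 1 min windows to keep the example small.
scores = [0.8] * 60 + [0.6] * 60 + [0.9] * 30
averages = window_averages(scores, fps=1, window_min=1)
print(averages)
print(alarms(averages, "construction site"))
```

In this toy run the second window falls below the construction-site baseline of 0.70 and would raise an alarm, while the incomplete third window is ignored.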

Discussion
In Table 3, the number in the frame column represents the length of each video. The number of false positives is smaller than the number of true positives in all cases, implying that the classifiers performed efficiently. Accordingly, the CARs of the Hu-Class ranged from 98.9% to 99.2%; those of the H-Class ranged from 95.6% to 97.1%; those of the G-Class ranged from 85.9% to 87.7%; and, finally, those of the Ho-Class ranged from 66.8% to 68.2%, all with very low error rates (Figures 6 and 7). The CAR of the Ho-Class is the lowest among the four classes because the hooks in these videos are made of metal, which typically glints in sunlight and is therefore extremely difficult to capture clearly with any type of camera. Moreover, safety hooks must be attached at points that are frequently hidden by pipelines or walls at chemical plant sites. In the G-Class, the number of gloves detected by the glove classifier was high because workers must wear gloves when they are working. Any failure (false detection) can be attributed to two types of problems: insufficient training examples or too many training stages. The overall form of a cascade classifier resembles a degenerate decision tree: a positive result from Stage 1 triggers evaluation by Stage 2, which is in turn adjusted to achieve a high final detection rate. Finally, in the last case (background video), the numbers of true positive and false positive examples are insufficient to draw any conclusion. Therefore, we excluded the data of the last case from the graph.
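The stage-by-stage structure of a cascade classifier mentioned above can be illustrated with a short Python sketch. The stages below are hypothetical toy threshold tests on a list of numbers; they are not the trained Haar-feature stages used in this work and serve only to show how a candidate window is rejected at the first failing stage.

```python
from typing import Callable, List, Sequence, Tuple

# A stage takes the features of a candidate window and accepts or rejects it.
Stage = Callable[[Sequence[float]], bool]


def run_cascade(features: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Pass a candidate through the stages in order.

    Returns (accepted, number_of_stages_evaluated). Evaluation stops at
    the first stage that rejects the candidate.
    """
    for index, stage in enumerate(stages, start=1):
        if not stage(features):
            return False, index
    return True, len(stages)


# Toy stages (assumptions for illustration only).
toy_stages: List[Stage] = [
    lambda f: sum(f) > 1.0,    # Stage 1: cheap test that rejects most windows
    lambda f: max(f) > 0.8,    # Stage 2: evaluated only if Stage 1 accepts
    lambda f: f[0] > f[-1],    # Stage 3: evaluated only if Stage 2 accepts
]

print(run_cascade([0.1, 0.2, 0.3], toy_stages))  # rejected at Stage 1
print(run_cascade([0.5, 0.4, 0.3], toy_stages))  # rejected at Stage 2
print(run_cascade([0.9, 0.5, 0.2], toy_stages))  # accepted by all stages
```

Because a candidate must pass every stage, each additional stage is another opportunity to reject a true object, which is one way to read the remark that choosing too many training stages can cause detection failures.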
The error rate of each class shows the spread of the results of the corresponding classifier. It was obtained from the differences between the repeated tests of each classifier on the videos. This value expresses the uncertainty of the classifier, that is, how far our result might differ from the real situation. For example, the CAR of the Hu-Class was 98.9% with a 3% difference from the true population value.
Figure 8 shows the results of our safety system tested on the eight cases. The cases indicate the average accuracy from 62.5% to 79.3%. The accuracy may vary with the quality of the recorded video, the light and shadow in the video, and the complexity of the background. For example, in test Videos 1 and 8 (school zone), the background was clear, the light and shadow were good, and the complexity was low. Therefore, the accuracy rates of our safety system in these cases were high (over 96%). However, in the other cases, the backgrounds were construction sites and chemical plant sites comprising many pipelines, tubes, and scaffolding that occasionally blocked the objects. To overcome these disadvantages, the camera of the system should be installed at a high location rather than a low one, where fewer obstacles block the objects to be detected. This will be addressed in our future study. Moreover, as shown in Figure 9, the difference between the processing time of the system and the duration of the video was very small, around 0.1 min (6 s), and 1 s less than that of the system run with HOG. This suggests that the system is well suited to real-time detection.

Conclusions
It is important to maintain the safety of the working environment for workers. Controlling or monitoring the safety score can reduce the accident rate of a company, factory, or any other organization. Workers are required to wear safety equipment or devices for protection; however, this requirement is occasionally neglected. Therefore, the safety score system was introduced to detect humans wearing these protective accessories. Using strong Haar-cascade classifiers trained on a large number of training sets, the system was programmed to detect humans, helmets, gloves, and hooks as classes in recorded videos, with high accuracy for humans, helmets, and gloves and lower accuracy for hooks (human: 98.9%; helmet: 95.9%; gloves: 85.9%; and hooks: 66.5%). Furthermore, by recognizing these classes, the safety score of each video was calculated at 10 min intervals, and the system could issue a warning depending on the situation. If this system is applied in practice, a company's manager can decide on a response for each warning situation. With the advantages of the Haar-cascade algorithm, this system can be used as a real-time safety tracking system. Other safety equipment, such as safety masks and safety uniforms, will be addressed in our future research. Moreover, in future work, combining the Haar-cascade algorithm with deep learning is expected to make the system run faster than current deep learning approaches that use HOG.