Deep Learning Based Protective Equipment Detection on Offshore Drilling Platform

There is a symmetrical relationship between safety management and production efficiency on an offshore drilling platform. Advances in artificial intelligence have drawn growing attention to intelligent safety management. Using artificial intelligence to monitor whether workers wear protective equipment, such as safety helmets and workwear uniforms, is an important way to reinforce workplace safety management. The working environment of offshore drilling platforms is particularly complex because of small-scale subjects, flexible human postures, occlusion by oil and gas pipelines, etc. To automatically monitor and report violations of safety measures, this paper proposes a personal protective equipment detection method based on deep learning. Building on an improved YOLOv3, the proposed method detects on-site workers and obtains their bounding boxes. The detected candidates are used as the input of pose estimation to detect human body key points. The areas of interest (head area and workwear uniform area) are then located from the spatial relations among the detected key points. Safety helmets are recognized using deep transfer learning based on an improved ResNet50. Exploiting the symmetry between helmets and workwear uniforms, the same method is used to recognize workwear uniforms, which completes the identification of protective equipment. Experiments show that the proposed method achieves a higher accuracy in protective equipment detection on offshore drilling platforms than other deep learning models. The detection accuracies of the proposed method for helmets and workwear uniforms are 94.8% and 95.4%, respectively.


Introduction
Injury caused by falling objects is a common concern across industrial sectors. A safety helmet is an effective measure for protecting workers and visitors from falling objects, so wearing one is required in some construction and manufacturing industries, oil fields, refineries, and chemical plants. Many enterprises also implement unified clothing management; that is, people should wear work clothes that match the attributes of the enterprise. Unified clothing not only makes the enterprise look more formal but also makes it possible to distinguish outside personnel quickly and effectively, which plays an important role in preventing intrusion by outsiders. However, for personal reasons, some workers do not comply with the regulations. It was reported that 47.3% of the victims did not wear personal protective equipment or did not wear it correctly at the time of danger [1]. Therefore, it is necessary to detect the wearing of personal protective equipment and study the activity of workers [2].
With the rapid development of offshore petroleum production, the management of offshore drilling platforms has come into focus. This paper focuses on applying deep learning to solve the problem of protective equipment detection for offshore drilling platform workers. The innovations and contributions are as follows.

1. Given the complex background of offshore drilling platforms, we modify the YOLOv3 algorithm and use random erasing [22] for data augmentation to compensate for the lack of samples of occluded workers. This improves the recognition accuracy for small-scale and occluded personnel.

2. We use a pose estimation algorithm to obtain the key points of the human body and locate the areas of interest (head area and workwear uniform area) based on the spatial relations among the key points.

3. A deep transfer learning method based on a modified ResNet50 is introduced to train the protective equipment recognition model, which mitigates the effect on network training of an insufficient number of protective equipment images.

Related Works
Safety helmet detection in surveillance videos is an important application on high-risk work sites (such as coal mines, electrical substations, and construction sites) to ensure safety protocols. Related works on safety helmet detection can be divided into two categories: feature plus classifiers/detectors and deep-learning-based methods.
In the first category, researchers utilize traditional hand-crafted features. In 2004, Wen et al. [12] first proposed an improved Hough transform, which uses geometric features to detect whether there is a helmet in the arc set so as to recognize the helmet in an ATM monitoring system. Chiverton et al. [13] exploited a method based on the Histogram of Oriented Gradients (HOG) and a Support Vector Machine (SVM), which applied image recognition to helmet recognition and achieved an accuracy of 85%. Waranusat et al. [14] used a k-nearest neighbors (KNN) classifier on shape and color information extracted from images and achieved an accuracy of 74%. Dahiya et al. [15] first detected pedestrians, then located the head area, and compared three feature descriptors, HOG, scale-invariant feature transform (SIFT), and local binary pattern (LBP), using an SVM classifier to detect the helmets of bicycle riders. These methods generalize poorly and are computationally expensive, so they are difficult to apply to protective equipment detection in complex scenarios.
With the development of deep learning, many researchers have applied deep learning methods to detect safety helmets. Bo et al. [16] used the YOLOv3 model to identify helmets on a construction site with an accuracy rate of 95%. Fang et al. [17] proposed an on-site safety helmet detection method based on Faster R-CNN for construction automation and solved the problem of identifying workers without helmets in remote monitoring scenarios. Nath et al. [18] utilized three deep learning models for detection. The above methods use deep learning to extract features for helmet recognition, which improves accuracy and efficiency, but the scenarios they consider are relatively simple, so their applicability is limited.
There are relatively few works on workwear uniform recognition compared with helmet detection. Most of the methods segment and recognize pedestrian clothing from street photos. The current method of identifying individual clothing is usually divided into two stages. The first stage is clothing segmentation, and the second stage is the feature recognition of the clothing area.
General-purpose image segmentation methods are usually used for clothing segmentation. Hu et al. [23] segmented the human body with the graph cuts algorithm, used face detection and skin color detection algorithms to remove the skin color area, established a Gaussian mixture model using background and foreground information, used the constrained Delaunay triangulation (CDT) algorithm to filter the noise in the model, and finally segmented the clothing area. Gallagher et al. [24] pre-learned multiple images of a person, built a clothing model from them, and then used graph cuts to segment the clothing area in each image based on the clothing model.
In the second stage, feature recognition, three global features (color, shape, and texture) are extracted. Color is the most significant feature and is also a crucial cue of the human vision system. Common color feature descriptors include color moments, color histograms, and color coherence vectors. Stricker et al. [25] proposed a block color moment feature extraction method to improve feature extraction. Li et al. [26] proposed a retrieval method based on a block color histogram. Yang et al. [27] used a multi-image iterative optimization segmentation algorithm to segment pedestrian clothing, constructed a multi-image model using the statistical information, and optimized the labeling results. Yamaguchi et al. [28] proposed a method specifically for clothing recognition in fashion photos, which recognized clothing by estimating the human pose. The above methods can achieve good recognition accuracy, but their efficiency is low, and the input images must be clear with a simple background. There are also studies that improve the accuracy of clothing recognition by introducing human models [29].

The Proposed PPED Method
Currently, there is a lack of publicly available video data sets of human activity on offshore drilling platforms. In this work, we collected our own data from a video surveillance system on an offshore drilling platform. For data augmentation, we use random erasing [22] to increase the number of samples with occlusion. We use the improved YOLOv3 to train the worker detection model. The detected candidates, in the form of bounding boxes, are sent into regional multi-person pose estimation (RMPE), where the human body key points are extracted and the head region and workwear uniform region are located accordingly. Transfer learning based on an improved ResNet50 is then applied to the head area and the workwear uniform area to detect the protective equipment. The overall pipeline of the proposed method is illustrated in Figure 1. Based on the PPED method, the process of detecting protective equipment on offshore drilling platforms is as follows: (1) The improved YOLOv3 is used to detect the bounding boxes of workers in the images.

Current object detection techniques can be divided into two categories: two-stage detection methods based on candidate regions and one-stage detection methods based on regression. Although two-stage object detection algorithms generally have high detection accuracy, their network structure is complex and their detection speed is slow, so it is difficult for them to meet real-time requirements in industry. One-stage object detection algorithms [30] complete target classification and localization in a single pass and are faster, which better meets the real-time requirements of industrial production. However, the working environment of an offshore drilling platform is complex, the scale of workers varies greatly, and there are a large number of small-scale personnel. Existing one-stage object detection algorithms cannot accurately detect the target personnel under these conditions.
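The random erasing augmentation [22] used above to synthesize occluded samples can be sketched as follows. This is a minimal illustration on a grey-scale image stored as nested lists; the function name `random_erase`, the area and aspect-ratio ranges, and the use of uniform noise as the occluder are assumptions for illustration, not the settings used in this work.

```python
import random


def random_erase(image, area_frac=(0.02, 0.2), aspect=(0.3, 3.3), rng=None, max_tries=10):
    """Return a copy of `image` with one random rectangle overwritten by noise.

    `image` is a list of rows of grey values in [0, 255]. All default
    parameter values are illustrative assumptions, not the paper's settings.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the input untouched
    for _ in range(max_tries):
        target_area = rng.uniform(*area_frac) * h * w
        ratio = rng.uniform(*aspect)  # height / width of the erased box
        eh = int(round((target_area * ratio) ** 0.5))
        ew = int(round((target_area / ratio) ** 0.5))
        if 0 < eh < h and 0 < ew < w:
            top = rng.randint(0, h - eh)
            left = rng.randint(0, w - ew)
            for r in range(top, top + eh):
                for c in range(left, left + ew):
                    out[r][c] = rng.randint(0, 255)  # simulated occluder
            return out
    return out  # no valid box found: return the unmodified copy


frame = [[0] * 32 for _ in range(32)]
occluded = random_erase(frame, rng=random.Random(0))
```

Applying the function to worker crops before training yields additional samples in which part of the worker is hidden, which is the effect the augmentation is meant to have for pipeline occlusions.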
In order to detect the workers of the offshore drilling platform accurately, we improved YOLOv3 to adapt it to the environment of the offshore drilling platform. In the feature fusion stage between consecutive layers of YOLOv3, we introduce a fusion factor [31] calculated from the ratio of the numbers of candidates in adjacent layers. The fusion factor is a weighting coefficient applied to the deeper layer when fusing the features of two adjacent layers of YOLOv3. Meanwhile, the inconsistency across different feature scales is a primary limitation of YOLOv3. Adaptive spatial feature fusion (ASFF) [32] is introduced to improve the scale invariance of features. ASFF adds little inference cost and makes full use of features of different scales. Figure 2 illustrates the feature fusion mechanism of YOLOv3. YOLOv3 incorporates up-sampling and feature fusion on three scales (13 × 13, 26 × 26, and 52 × 52) and detects independently on the fused feature map of each scale. Feature $C_5$ is processed by five convolutional layers followed by one 3 × 3 convolutional layer to compute feature $P_5$. Feature $C_4$ is processed by a 1 × 1 convolutional layer, up-sampled by a factor of two, and then fused with feature $C_3$ by concatenation. Afterward, feature $P_3$ is computed after a series of convolutions, and feature $P_4$ is calculated in the same way.

Figure 2. Feature fusion mechanism of the YOLOv3. $C_3$, $C_4$, and $C_5$ are the three feature maps of different scales output by the backbone network of YOLOv3. $P_3$, $P_4$, and $P_5$ are the feature maps used for detection after feature fusion.

Figure 3 shows the diagram of the improved YOLOv3 model. We introduce a fusion factor module before fusing the features from two data streams. After the last 3 × 3 convolution operator, feature maps of three scales (levels 1-3) are obtained. In order to make full use of all the features, ASFF is introduced for further feature fusion. ASFF is divided into two stages. The first stage is a feature resizing module, in which the features from the three streams are brought to the same resolution and number of channels. The second stage is an adaptive fusion module, in which the three resized feature maps are fused with adaptively adjusted weights to produce one fused feature map for each level.

Fusion Factors Calculation
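We use the S-α method [31] to calculate the fusion factor $\alpha$ as follows:

$$\alpha^{i}_{i+1} = \frac{N_{p_{i+1}}}{N_{p_i}},$$

where $N_{p_{i+1}}$ and $N_{p_i}$ represent the numbers of detected candidates of the $p_{i+1}$ and $p_i$ data streams. Suppose $P_5$ is obtained from $C_5$ after five convolution operations; then $P_4$ can be obtained by fusing $P_5$ and $C_4$ in the following manner:

$$P_4 = f_{\text{fuse}}\left(C_4,\ \alpha^{4}_{5} \cdot f_{\text{conv+upsample}}(P_5)\right),$$

where $f_{\text{conv+upsample}}$ denotes the 1 × 1 convolution operation followed by the 2× upsampling operation, and $f_{\text{fuse}}$ denotes the concatenation-based fusion of YOLOv3 followed by its convolutional layers. $P_3$ is defined likewise.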
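To make the role of the fusion factor concrete, the following minimal sketch computes $\alpha$ from two candidate counts and applies it to the deeper feature before concatenation. It assumes single-channel feature maps stored as nested lists and nearest-neighbour up-sampling, and it omits the 1 × 1 convolution; the function names and the candidate counts are hypothetical.

```python
def fusion_factor(n_deep, n_shallow):
    """alpha = N_{p_{i+1}} / N_{p_i}: candidates in the deeper vs. shallower stream."""
    if n_shallow <= 0:
        raise ValueError("the shallower stream needs at least one candidate")
    return n_deep / n_shallow


def upsample2x(feat):
    """Nearest-neighbour 2x up-sampling of a single-channel H x W map."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(shallow, deep, alpha):
    """Concatenate the shallow map with the alpha-weighted, up-sampled deep map."""
    weighted = [[alpha * v for v in row] for row in upsample2x(deep)]
    return [shallow, weighted]  # two channels, stacked along the channel axis


alpha = fusion_factor(n_deep=30, n_shallow=120)  # hypothetical candidate counts
p4_input = fuse([[1.0, 2.0], [3.0, 4.0]], [[4.0]], alpha)
```

With more candidates in the shallower stream than in the deeper one, $\alpha$ is below 1, so the deep feature contributes less to the fused map used for small targets.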

Feature Reshaping
We denote the features calculated after the 3 × 3 convolution operation as $X^l$ for each data stream $l$ ($l \in \{1, 2, 3\}$), as shown in Figure 3. For the $l$-th data stream, we resize the features $X^k$ ($k \in \{1, 2, 3\}$ and $k \neq l$) of the other two data streams so that $X^k$ and $X^l$ are of the same size. Since the features of the three levels have different resolutions and numbers of channels, we modify the up-sampling and down-sampling strategies of each data stream so that the features are consistent. For up-sampling, we use a 1 × 1 convolutional layer to compress the number of feature channels to that of the $l$-th data stream and then interpolate to obtain the higher resolution. For down-sampling with a sampling ratio of 1/2, we use a 3 × 3 convolutional layer with a stride of 2 to modify the number of channels and the resolution simultaneously.

Adaptive Fusion
YOLOv3 predicts on three scales separately. In contrast, adaptive fusion multiplies the features of different scales by learned weights and sums them to obtain three new fused features, and then predicts on the fused features, making full use of the features of different scales. Let $X^{n \to l}_{ij}$ denote the feature vector at position $(i, j)$ on the feature map reshaped from the $n$-th data stream to the dimensions of the $l$-th data stream. The feature fusion of the $l$-th data stream is carried out as follows:

$$y^{l}_{ij} = \beta^{l}_{ij} \cdot X^{1 \to l}_{ij} + \gamma^{l}_{ij} \cdot X^{2 \to l}_{ij} + \eta^{l}_{ij} \cdot X^{3 \to l}_{ij},$$

where $y^{l}_{ij}$ is the $(i, j)$-th vector of the output feature map $y^{l}$ across channels, and $\beta^{l}_{ij}$, $\gamma^{l}_{ij}$, and $\eta^{l}_{ij}$ are the weights of the features reshaped from the three data streams to the dimensions of the $l$-th data stream, which are learned adaptively. Inspired by [33], we define the adaptive weight $\beta^{l}_{ij}$ as follows:

$$\beta^{l}_{ij} = \frac{e^{\lambda^{l}_{\beta_{ij}}}}{e^{\lambda^{l}_{\beta_{ij}}} + e^{\lambda^{l}_{\gamma_{ij}}} + e^{\lambda^{l}_{\eta_{ij}}}},$$

where $\lambda^{l}_{\beta_{ij}}$, $\lambda^{l}_{\gamma_{ij}}$, and $\lambda^{l}_{\eta_{ij}}$ are the parameters of the softmax function after a convolutional layer with a kernel size of 1 × 1. Weights $\gamma^{l}_{ij}$ and $\eta^{l}_{ij}$ are defined likewise, so the three weights are non-negative and sum to 1 at every position. We calculate $\lambda^{l}_{\beta_{ij}}$, $\lambda^{l}_{\gamma_{ij}}$, and $\lambda^{l}_{\eta_{ij}}$ from $X^{1 \to l}$, $X^{2 \to l}$, and $X^{3 \to l}$ through the 1 × 1 convolution layer.
The features of each data stream are fused through adaptive feature fusion, and the output {y 1 , y 2 , y 3 } is further processed for candidate detection.
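The adaptive fusion step can be illustrated with a small sketch: at every position the three scalar parameters are passed through a softmax to obtain the weights, and the three resized feature vectors are summed with those weights. The sketch assumes that the feature maps have already been resized to a common H × W × C shape and that the softmax parameters are given as H × W maps (in the network they come from 1 × 1 convolutions); all names are hypothetical.

```python
import math


def softmax3(a, b, c):
    """Numerically stable softmax over three scalars."""
    m = max(a, b, c)
    ea, eb, ec = math.exp(a - m), math.exp(b - m), math.exp(c - m)
    s = ea + eb + ec
    return ea / s, eb / s, ec / s


def adaptive_fuse(x1, x2, x3, lam_beta, lam_gamma, lam_eta):
    """Fuse three H x W x C maps with per-position softmax weights.

    x1, x2, x3 are the maps already resized to the target level;
    lam_* are H x W maps of softmax parameters.
    """
    fused = []
    for i in range(len(x1)):
        row = []
        for j in range(len(x1[0])):
            beta, gamma, eta = softmax3(lam_beta[i][j], lam_gamma[i][j], lam_eta[i][j])
            row.append([beta * a + gamma * b + eta * c
                        for a, b, c in zip(x1[i][j], x2[i][j], x3[i][j])])
        fused.append(row)
    return fused


# One position, two channels, equal parameters -> each weight is 1/3.
y = adaptive_fuse([[[3.0, 0.0]]], [[[0.0, 3.0]]], [[[3.0, 3.0]]],
                  [[0.0]], [[0.0]], [[0.0]])
```

Because the weights are computed per position, the network can favour the shallow stream where small workers appear and the deep stream elsewhere.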

Areas of Interest Detection
For protective equipment detection on an offshore drilling platform, we incorporate pose estimation to extract the human body key points so as to further locate the head areas and workwear uniform areas accordingly.

Human Body Key Points Extraction
To deal with the complex postures of offshore drilling platform workers and the occlusion caused by oil and gas pipelines, we use RMPE to extract the human body key points. RMPE is a top-down pose estimation algorithm and includes three components (as shown in Figure 4): the symmetric spatial transformer network (SSTN), parametric pose non-maximum suppression (NMS), and the pose-guided proposals generator (PGPG). Within the SSTN, the spatial transformer network (STN) extracts high-quality single-person regions from inaccurate candidate bounding boxes. A single person pose estimator (SPPE) is used to estimate the person's pose skeleton from the extracted region. The spatial de-transformer network (SDTN) remaps the estimated pose to the image coordinate system. Parametric pose NMS is used to eliminate repeated predictions. PGPG is used to generate images containing various poses to augment the existing training samples. We use the pre-trained object detection method to detect the workers on the offshore drilling platform. The output, in the form of human bounding boxes, is fed into RMPE, and 17 human body key points are extracted. Figure 5 illustrates the 17 key points of the human body: nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle.

Head Area Detection
In this work, a seven-point localization method (SPLM) is proposed to identify the head areas of workers based on seven human body key points: the nose, the two eyes, the two ears, and the two shoulders. First, we connect the left shoulder with the right ear and the left ear with the right shoulder. Then we compute the lengths and the intersection of the two lines and use them to determine the head position displacement. We place the origin horizontally at the intersection point and vertically at the point that has the smallest vertical coordinate among the left eye, the right eye, the nose, the left ear, and the right ear. We draw a circle of radius $r$ centered at the origin, where $r = \frac{2}{3} \times mlen$ and $mlen$ is the larger of the lengths of the two connecting lines. The circle denotes the head area location.
Let the coordinates of the left ear, the right ear, the left shoulder, and the right shoulder be $L_{ear}(x_1, y_1)$, $R_{ear}(x_2, y_2)$, $L_{shoulder}(x_3, y_3)$, and $R_{shoulder}(x_4, y_4)$. The distance between the left ear and the right shoulder is computed as follows:

$$|L_{ear}R_{shoulder}| = \sqrt{(x_1 - x_4)^2 + (y_1 - y_4)^2}.$$

The line connecting the left ear and the right shoulder is denoted as:

$$L_1:\ (y - y_1)(x_4 - x_1) = (x - x_1)(y_4 - y_1).$$

The distance between the right ear and the left shoulder is calculated as follows:

$$|R_{ear}L_{shoulder}| = \sqrt{(x_2 - x_3)^2 + (y_2 - y_3)^2}.$$

The line connecting the right ear and the left shoulder is denoted as:

$$L_2:\ (y - y_2)(x_3 - x_2) = (x - x_2)(y_3 - y_2).$$

Compute the point of intersection $M(x_0, y_0)$ of the lines $L_1$ and $L_2$, where $x_0$ is the head offset and is equal to the horizontal coordinate of the center point of the head. Denote the coordinates of the left eye, the right eye, and the nose by $L_{eye}(x_5, y_5)$, $R_{eye}(x_6, y_6)$, and $N_{nose}(x_7, y_7)$, respectively. The helmet should sit on the head, so we denote the highest key point by $T(x_8, y_8)$, where $y_8 = \min\{y_1, y_2, y_3, y_4, y_5, y_6, y_7\}$. We take the vertical coordinate $y_8$ of the highest key point $T(x_8, y_8)$ and assign it to the vertical coordinate of the head center. Thus, the head center point is denoted by $C(x, y)$, where $x = x_0$ and $y = y_8$. We draw a circle centered at $C(x, y)$ with radius $\frac{2}{3}\max\{|L_{ear}R_{shoulder}|, |R_{ear}L_{shoulder}|\}$ and obtain its circumscribed square $S$, which is the head area, as illustrated in Figure 6.

Workwear Uniform Area Detection
Considering the working environment of the offshore drilling platforms, this work proposes a four-point localization method (FPLM) to detect the workwear uniform areas. The workwear uniform areas are usually below the neck and above the ankle, so the workwear uniform areas are localized using four human body key points: the left shoulder, the right shoulder, the left ankle, and the right ankle. We connect the left shoulder with the left ankle, the right shoulder with the right ankle, the left ankle with the right ankle, and the left shoulder with the right shoulder to form an irregular quadrilateral bounding box. The irregular quadrilateral bounding box is transformed into a rectangular box according to the maximum edge principle. Since we only consider four key points (the left shoulder, the right shoulder, the left ankle, and the right ankle) and ignore the key points on arms, we enlarge the initially detected area by a scale factor of 1.2 to get the final workwear uniform area detection.
Denote the coordinates of the left shoulder, the right shoulder, the left ankle, and the right ankle by $L_{shoulder}(x_3, y_3)$, $R_{shoulder}(x_4, y_4)$, $L_{ankle}(x_9, y_9)$, and $R_{ankle}(x_{10}, y_{10})$. Then the distance between the left shoulder and the left ankle is calculated as follows:

$$|L_{shoulder}L_{ankle}| = \sqrt{(x_3 - x_9)^2 + (y_3 - y_9)^2},$$

the distance between the right shoulder and the right ankle is calculated as follows:

$$|R_{shoulder}R_{ankle}| = \sqrt{(x_4 - x_{10})^2 + (y_4 - y_{10})^2},$$

the distance between the left shoulder and the right shoulder is computed as follows:

$$|L_{shoulder}R_{shoulder}| = \sqrt{(x_3 - x_4)^2 + (y_3 - y_4)^2},$$

and the distance between the left ankle and the right ankle is calculated as follows:

$$|L_{ankle}R_{ankle}| = \sqrt{(x_9 - x_{10})^2 + (y_9 - y_{10})^2}.$$
Let $x = \min\{x_3, x_4, x_9, x_{10}\}$ and $y = \min\{y_3, y_4, y_9, y_{10}\}$, and use $T(x, y)$ as the initial point. From $T$, we expand horizontally to the length $\max\{|L_{shoulder}R_{shoulder}|, |L_{ankle}R_{ankle}|\}$ and vertically to the length $\max\{|R_{shoulder}R_{ankle}|, |L_{shoulder}L_{ankle}|\}$, forming a rectangle according to the symmetry principle. This rectangle is the initial workwear uniform area. We enlarge the initially detected area by a scale factor of 1.2 to get the final workwear uniform area detection, as illustrated in Figure 6.
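A minimal sketch of the FPLM follows. It reads the maximum edge principle as taking the box width from the longer of the shoulder and ankle spans and the box height from the longer of the two shoulder-to-ankle lengths, and it applies the 1.2 scale factor to both side lengths about the box center; both readings, and the function name, are assumptions for illustration.

```python
import math


def workwear_area(l_shoulder, r_shoulder, l_ankle, r_ankle, scale=1.2):
    """Return the workwear box (x_min, y_min, x_max, y_max) from four key points."""
    pts = (l_shoulder, r_shoulder, l_ankle, r_ankle)
    x = min(p[0] for p in pts)  # initial point T(x, y)
    y = min(p[1] for p in pts)
    width = max(math.dist(l_shoulder, r_shoulder), math.dist(l_ankle, r_ankle))
    height = max(math.dist(l_shoulder, l_ankle), math.dist(r_shoulder, r_ankle))
    # Assumption: enlarge both sides by `scale` about the box center.
    cx, cy = x + width / 2, y + height / 2
    half_w, half_h = scale * width / 2, scale * height / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


box = workwear_area((80, 50), (20, 50), (70, 250), (30, 250))
```

In this example the shoulders are 60 pixels apart and the ankles 40, so the initial width is 60 and the enlarged width is 72; the height follows from the shoulder-to-ankle length in the same way.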

Personal Protective Equipment Recognition
Currently, there is no publicly available data set of personal protective equipment of offshore drilling platforms. In this work, we collected our own dataset using surveillance videos of the offshore drilling platform. Each video is then used to extract a number of frames containing the staff to form the data set of protection equipment recognition. It is a tremendous amount of manual work to collect and label millions of helmet/workwear uniform images, so we use a transfer learning method based on improved ResNet50 to recognize the protective equipment. We feed the dataset to the improved ResNet50 model, employ the convolution layer and the pooling layer for feature extraction, and train the parameters of the fully connected layer, and finally implement the classification and recognition of protective equipment (as illustrated in Figure 7).
In the original network, the fully connected layer has too many parameters, which slows training and easily leads to overfitting. Therefore, we modify the original ResNet50 as follows: the feature extraction layers of the original network are retained to extract image features; the original output layer is removed; a global pooling layer and a fully connected layer are added; the output is processed by a softmax function; and stochastic gradient descent (SGD) is used to optimize the network parameters.
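The replaced classification head can be illustrated without a deep learning framework. The sketch below applies global average pooling, one fully connected layer, and a softmax to a feature map that is assumed to come from the retained ResNet50 feature extractor; the two-class setting (for example helmet versus non-helmet), the tiny feature map, and the weights are hypothetical values chosen only to keep the example readable.

```python
import math


def global_average_pool(fmap):
    """Average each channel of a C x H x W feature map to a single value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(w_row, x)) + b
            for w_row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(fmap, weights, bias):
    """Class probabilities from backbone features via pool -> FC -> softmax."""
    return softmax(linear(global_average_pool(fmap), weights, bias))


features = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 x 2 x 2 toy map
probs = classify(features, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
```

In transfer learning only `weights` and `bias` of this head would be trained on the protective equipment images, while the pretrained feature extractor supplies `features`.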

Experimental Results
The experiments are conducted on a desktop computer. The configuration of the computer is specified as follows: The experiments are comprised of three parts: candidate detection, area of interest detection, and protective equipment detection. The surveillance cameras on each offshore platform are static. Real-time surveillance videos are transmitted to and stored on the streaming media server of the offshore oil platform, where training and testing data are collected.
We extract key frames [34] from the collected surveillance videos to form the ODPD containing 25,000 images. Then the images are manually annotated for candidate detection, and 22,000 target images with image labels and object locations are produced. The protective equipment detection dataset is comprised of four categories (helmet, non-helmet, workwear uniforms, and non-workwear uniforms). Each category has around 4000 images. The human body key point annotations contain the specifications of the 17 human body key points and positions of the detected human body key points for each image.

Candidate Person Detection
The workers in the offshore oil platform are small and may be occluded by pipelines, thus leading to possible missing candidate person detections. To solve this problem, we propose an improved YOLOv3 method for candidate person detection, tailored to the condition of the problem.
To verify the effectiveness of the proposed method, we conduct comparative experiments between the original YOLOv3 and the improved YOLOv3. The results are given in Table 1. The improved YOLOv3 outperforms the original YOLOv3 in the verification experiment on the same data scale. When the dataset is enlarged, the accuracy of our method is much higher than that of the original YOLOv3, which shows that our method improves the accuracy of object detection.

Figure 8 shows the detection results when the target is small. Subfigure (a) displays the detection result of the original YOLOv3, in which the target is not detected, while subfigure (b) displays the detection result of the improved YOLOv3, in which the target is accurately detected. Figure 9 shows the detection results under occlusion. Subfigure (a) displays the detection result of the original YOLOv3, in which the occluded worker is not detected, while subfigure (b) displays the detection result of the improved YOLOv3, in which the occluded target is successfully localized. The experimental results show that the improved YOLOv3 can effectively deal with small targets and occluded personnel on an offshore drilling platform.

Area of Interest Detection
The human objects detected by the above-mentioned method are fed into RMPE to extract human body key points. The coordinates of the 17 human body key points (the nose, the left eye, the right eye, the left ear, the right ear, the left shoulder, the right shoulder, the left elbow, the right elbow, the left wrist, the right wrist, the left hip, the right hip, the left knee, the right knee, the left ankle, and the right ankle) are computed. Then the head area is localized using the SPLM, and the workwear uniform area is localized using the FPLM. We utilize the Intersection-over-Union (IoU) to evaluate the effectiveness of the detection. The IoU measures the overlap between the detected helmet or workwear uniform area and the ground truth as the ratio of the area of their intersection to the area of their union. If the value of the IoU is below a certain threshold, the helmet or workwear uniform detection is identified as "missed"; otherwise, it is marked as "successfully detected". Usually, the threshold is set to 0.5 [35,36].
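The IoU criterion can be written down directly. The sketch below assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_detected(pred, truth, threshold=0.5):
    """A detection counts as successful when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


half_shifted = iou((0, 0, 10, 10), (5, 0, 15, 10))  # overlap 50, union 150
```

A predicted box shifted by half its width against the ground truth therefore has an IoU of 1/3 and is counted as missed at the 0.5 threshold.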
We evaluate the proposed SPLM by comparing it with the safety helmet localization methods of Shen et al. [3] and Jie, L. et al. [11]. We selected 1500 images from the ODPD, containing workers with complex poses and workers whose faces are occluded. The detection results are illustrated in Figure 10. For the bounding boxes with IoU ≥ 0.5 in Figure 10, we calculated the number of safety helmet bounding boxes, the standard deviation, and the IoU interval containing the largest number of bounding boxes. Using the method by Shen et al. [3], the number of images successfully detected is 982, the probability of successfully detecting the safety helmet is 65.5%, the standard deviation is 101.4, and the IoU interval with the largest number of bounding boxes is 0.5-0.6. The low success rate arises mainly because many workers on the offshore platform work with their backs to the camera. Using the method by Jie, L. et al. [11], the number of images successfully detected is 780, the probability of successfully detecting the safety helmet is 52%, the standard deviation is 159.2, and the IoU interval with the largest number of bounding boxes is 0.5-0.6. This is because the method [11] focuses on detecting the safety helmets of workers in an electricity substation. The method only extracts one-fifth of the upper body and can only deal with a person standing upright, so it cannot handle the complex human postures on offshore platforms. Using the proposed method, the number of images successfully detected is 1209, the probability of successfully detecting the safety helmet is 80.6%, the standard deviation is 153.5, and the IoU interval with the largest number of bounding boxes is 0.6-0.7. The statistical analysis shows that our method locates the safety helmet areas better and that the stability of the method is satisfactory.

Figure 11 displays results of the safety helmet detection. Subfigures (a), (d), and (g) show the safety helmet detection results using the method by Jie, L. et al. [11], subfigures (b), (e), and (h) show the results using the method by Shen et al. [3], and subfigures (c), (f), and (i) show the results using the method proposed in this work. The figure shows that, in the case of an upright human, the methods from [3,11] and our method all locate the safety helmet successfully, and the method by Shen et al. [3] is the most accurate. In the case of complex human posture, the method by Jie, L. et al. [11] and our method are capable of localizing the safety helmet, but the method by Jie, L. et al. [11] can only locate a portion of the safety helmet. When the workers' faces are occluded, the method by Shen et al. [3] cannot recognize the face and therefore cannot locate the safety helmet, whereas the proposed method localizes the safety helmet accurately.

We selected 1000 images from the ODPD containing workers with workwear uniforms and complex poses and evaluated the proposed workwear uniform area detection method on them. Figure 12 gives the statistics of the IoU values of the workwear uniform areas. For the bounding boxes with IoU ≥ 0.5 in Figure 12, we calculated the number of workwear uniform bounding boxes, the standard deviation, and the IoU interval containing the largest number of bounding boxes. The number of images successfully detected is 720, the probability of the workwear uniform areas being successfully detected by the proposed method is 84.2%, the standard deviation is 78.6, and the IoU interval with the largest number of bounding boxes is 0.6-0.7. The statistical analysis shows that our method locates the workwear uniform areas well and that the stability of the method is satisfactory. The proposed workwear uniform area detection method is tailored to the conditions of the offshore oil platform, and there are no related works, so no comparative experiments are carried out.

Protective Equipment Detection
We used 8000 images for helmet detection and workwear uniform detection: 6000 images for training and 2000 images for testing. We further split the training set by randomly selecting 20% as the validation set and keeping the rest as the training set. We use stochastic gradient descent (SGD) to optimize the network, with the momentum set to 0.9, the initial learning rate set to 0.01, and the batch size set to 4. We use flipping and zooming for data augmentation and normalize the inputs using the ImageNet mean and standard deviation. Figure 13a visualizes the loss curve for training the helmet detection model, and Figure 13b visualizes the loss curve for training the workwear uniform detection model. The loss on the training set decreases rapidly at the beginning and converges gradually as the number of epochs increases. On the test data, the helmet detection model achieves an accuracy of 94.3%, and the workwear uniform detection model achieves an accuracy of 95.6%.
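The data split and the optimizer settings described above can be sketched as follows. The split function reproduces the 80/20 division of the 6000 training images, and the update rule is one common formulation of SGD with momentum (velocity accumulates the gradient, then the parameters move against it); the function names, the random seed, and the scalar parameter in the example are assumptions for illustration.

```python
import random


def split_train_val(n_samples, val_frac=0.2, seed=0):
    """Shuffle sample indices and hold out `val_frac` of them for validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_val = int(round(n_samples * val_frac))
    return idx[n_val:], idx[:n_val]


def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD step with momentum: v <- momentum * v + g, p <- p - lr * v."""
    new_v = [momentum * v + g for v, g in zip(velocity, grads)]
    new_p = [p - lr * v for p, v in zip(params, new_v)]
    return new_p, new_v


train_idx, val_idx = split_train_val(6000)

# Two steps on a single hypothetical parameter with a constant gradient of 2.
p, v = sgd_momentum_step([1.0], [2.0], [0.0])
p, v = sgd_momentum_step(p, [2.0], v)
```

With these settings, 4800 images remain for training and 1200 are used for validation.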

Comparison with Related Methods
Most of the existing methods detect only safety helmets or only workwear uniforms, so we carry out separate comparative experiments on safety helmet detection and workwear uniform detection. We compare our method with the methods proposed by Jie, L. et al. [11], Shen et al. [3], and Park et al. [7]. Among them, the methods by Jie, L. et al. [11] and Shen et al. [3] only detect safety helmets, and the method by Park et al. [7] only detects workwear uniforms. A total of 2000 images were selected from the ODPD, including 1500 images conforming to safety regulations and 500 images violating them. The second category has fewer images because violations of the protective equipment regulations are relatively rare on the offshore drilling platform. In the selected images, some of the workers' bodies or faces are occluded by pipes, and some workers have complicated postures. The comparison results on helmet detection are given in Table 2, and the comparison results on workwear uniform detection are given in Table 3. Table 2 shows that the proposed method achieves the highest accuracy. The method by Jie, L. et al. [11] utilizes a "Visual Background extractor (ViBe) + HOG + SVM" pipeline to detect the targets and decides whether a target is wearing a helmet according to the color feature of the head area, but factors such as occlusion and lighting degrade its performance. The method by Shen et al. [3] locates the head area based on the face, while many workers on the offshore drilling platform have their backs to the camera or their faces occluded, so its accuracy is much lower than that of the proposed method. Table 3 shows that the proposed method is more accurate than the method by Park et al. [7], whose accuracy is low because of the complex environment of the offshore drilling platforms.
Our method is capable of localizing the safety helmet and workwear uniform even when the workers' bodies or faces are occluded. Furthermore, the protective equipment recognition based on the improved ResNet50 model is not affected by lighting, so the method achieves higher accuracy in protective equipment detection on the offshore drilling platform.
Exemplar protective equipment detections on offshore drilling platforms are shown in Figures 14-18. The proposed method detects both safety helmets and workwear uniforms. Figure 15 shows the detection results for workers with complex postures; the method accurately detects whether such workers are wearing protective equipment. Most workers on offshore drilling platforms appear at a small scale in the image, and, as shown in Figure 16, the method identifies these small-scale workers and detects their protective equipment. On the offshore drilling platform, many workers are occluded by dense pipelines, and the color of the pipelines is close to that of the workwear uniforms. Figure 18 shows the detection results when workers are occluded. These results show that the proposed method successfully detects protective equipment in the complex scenes of offshore drilling platforms.

Conclusions
We propose a novel deep learning based method for intelligent detection of protective equipment on offshore drilling platforms. Given the symmetrical relationship between production benefit and safety management, intelligent safety management can improve the production benefit of the offshore drilling platform. We improve the feature fusion process of YOLOv3: fusion factors control the information passed from the deep layers to the shallow layers, which makes the network more suitable for detecting small objects, and ASFF makes full use of the features at different scales. We also use random erasing to increase the number of occluded worker samples, so that the final worker detection model achieves high accuracy on small-scale and occluded workers. We combine object detection with the RMPE method, feeding the candidate person detections into human body key point detection, and use the detected key points to locate the regions of interest even under complex postures. We employ a convolutional neural network and transfer learning to recognize the protective equipment. The accuracies of safety helmet detection and workwear uniform detection on the offshore drilling platform reach 94.8% and 95.4%, respectively. As far as we know, this is the first attempt to use deep learning models for protective equipment detection on offshore drilling platforms. Currently, the proposed method is dedicated to safety helmet detection and workwear uniforms detection. In the future, we will improve the localization of the areas of interest and the identification of protective equipment so that, according to the symmetry relationship between types of protective equipment, the method can be applied to more kinds of protective equipment, such as workers' safety belts and gloves.