A Foreign Object Detection Method for Belt Conveyors Based on an Improved YOLOX Model

Abstract: As one of the main pieces of equipment in coal transportation, the belt conveyor and its detection system are an important area of research for the development of intelligent mines. Non-coal foreign objects frequently come into contact with belts in complex production environments and as a result of improper human operation. To avoid major safety accidents caused by scratches, deviation, and breakage of belts, a foreign object detection method for belt conveyors is proposed in this work. Firstly, a foreign object image dataset is collected and established, and an IAT image enhancement module and a CBAM attention mechanism are introduced to enhance the image data samples. Moreover, to predict the angle information of foreign objects with large aspect ratios, a rotating decoupling head is designed and a MO-YOLOX network structure is constructed. Experiments are carried out with the belt conveyor in the mine’s intelligent mining equipment laboratory, and different foreign objects are analyzed. The experimental results show that the accuracy, recall, and mAP50 of the proposed rotating frame foreign object detection method reach 93.87%, 93.69%, and 93.68%, respectively, and the average inference time for foreign object detection is 25 ms.


Introduction
With the development of intelligent mines and the improvement of machine vision technology, real-time detection systems for belt conveyors have become an important research topic in recent years [1]. The transportation belt plays a pivotal role and is prone to severe accidents such as deviation, slipping, belt breakage, and longitudinal belt tearing during production [2]. An in-depth analysis revealed that non-coal foreign objects entering the belt conveyor system are accountable for 61% of belt tearing and breakage incidents, amounting to a total of 21 cases [3]. The accurate and rapid identification of foreign objects in the belt transportation system, followed by removal, can substantially mitigate damage to the belt system and ensure the safe and stable operation of the belt transportation system [4].
Intelligent detection for the production of safe underground coal mines has become a hot research topic [5,6]. By utilizing video surveillance images, combined with image processing and machine-vision-related technologies, the theory of mine image monitoring has been applied to multiple aspects of automatic safety detection in coal mines, such as the automatic identification of spontaneous combustion fires [7], coal production monitoring [8], face detection and recognition methods for underground miners [9], and the automatic recognition of coal rock interfaces in coal faces [10]. However, traditional belt conveyor foreign object systems still rely on cameras to transmit collected video data to the central control room, where the staff can monitor the coal transportation area and the surrounding environment in real time. This practice is associated with significant drawbacks, including the duplication of work and fatigue-induced misjudgments. At the same time, staff cannot address foreign objects in a timely manner, which can easily cause foreign objects to block the transportation belt or sharp parts to scratch the belt, resulting in belt tearing and causing major safety accidents.
Nowadays, many systems have been proposed for the detection of foreign objects on the belt, and some results have been achieved. With an understanding of the characteristics of remote sensing targets with different dimensions, including their dense distribution and complex background, Xu et al. [11] applied YOLO-V3 to the field of remote sensing to detect remote sensing targets at different scales. The lightweight, fast Illumination Adaptive Transformer (IAT) was proposed by Cui et al. [12] to restore a normally lit sRGB image from either low-light or under-exposed conditions. Wang et al. [13] introduced the attention mechanism into YOLO-V5 to detect Solanum rostratum Dunal seeds, and it was found that the CBAM attention mechanism can effectively improve accuracy during model recognition. However, for the underground environment of a coal mine, there is no special model for the detection of foreign bodies on a coal flow belt, and the performance of other models still needs to be further improved.
In this work, a foreign object detection method for belt conveyors is proposed, and the remainder of this paper is organized as follows. In Section 2, the status and deficiencies of current foreign object detection methods are introduced. In Section 3, the improved algorithm and model architecture are proposed. In Section 4, comparative experiments are carried out on the proposed model, and the effectiveness of the improved algorithm is verified using a self-made foreign object dataset. The conclusions and future works are summarized in Section 5.

Foreign Object Detection Methods Based on Image Processing
Image-based foreign object detection methods for belt conveyors work with images of coal and non-coal foreign objects, obtain the shallow or deep abstract features of the object, and use image processing to detect foreign objects. This approach has the advantages of simple installation and maintenance processes and low application costs, and has become one of the research focuses of foreign object detection for belt conveyors. Due to the diversity of the types of foreign objects (such as anchor rods and wood) causing belt tearing, many scholars have begun to extract the color, texture, shape, spatial relationship, and other features of foreign objects from images, achieving the automatic detection of foreign objects through image processing.

Jiang et al. [14] used extreme median filtering to perform image noise processing and improved the traditional Canny edge detection algorithm. The improved algorithm is used to perform image edge detection, and the image gray histogram is used for enhanced foreign object image processing. Zhang et al. [15] proposed a new image segmentation algorithm for belt conveying. A multi-scale linear filter composed of a Hessian matrix and Gaussian function forms the core of the algorithm, which can effectively obtain the edge intensity image, form a good seed area for watershed segmentation, and segment the background between the coal pile and foreign objects. Saran et al. [16] developed an image-processing-based foreign object detection solution to detect foreign objects such as concrete boulders and iron bars that often occur in the conveyor belts used for G furnace raw coal. The solution uses a multimode imaging (polarization camera)-based system to distinguish foreign objects. Tu et al. [17] proposed a new moving target detection method to solve the difficulties caused by the intermittent motions, temperatures, and dynamic background sequences of moving targets. By further comparing the similarities of edge images, ghosts and real static objects can be classified. Lins et al. [18] developed a system based on the concept of machine vision, which aims to realize the automation of the crack measurement process. Using the above method, a series of images can be processed and the crack size can be estimated as long as a camera is installed on a truck or robot.

Foreign Object Detection Method Based on Deep Learning
With the rapid development of deep learning, using this data-driven method to learn image features and perceive the surrounding environment has good research value in foreign object detection, as it can yield a foreign object detection model that is more adaptable to a complex and changeable environment [19,20]. Deep learning is achieved by establishing and simulating the information processing neural structure of the human brain to extract low-level to high-level features from external input data, enabling machines to understand the learning data and obtain useful information [21]. Pu et al. [22] used a CNN to identify coal and gangue images and to help separate coal and gangue, and introduced transfer learning to solve the problems of massive trainable parameters and limited computing power faced by the model. In order to apply CNNs to the field of target detection, Ren et al. [23] put forward the RCNN method, which uses a selective search to obtain pre-selected regions and completes image recognition through a CNN combined with an SVM. Because the multi-stage implementation of the algorithm led to a huge time cost, Girshick et al. [24] further put forward the concept of an ROI (region of interest) pooling layer, replaced the SVM with a fully connected neural network, and proposed the Fast-RCNN algorithm.

In order to solve the problem of foreign objects on the belt conveyor in the coal mine damaging the belt conveyor, Wang et al. [25] proposed a video detection method for foreign objects on the surface of the belt conveyor based on SSD. Firstly, depthwise separable convolution was adopted to reduce the number of parameters of the SSD algorithm and improve the calculation speed. Then, the GIoU loss function was used to replace the position loss function in the original SSD, which improved detection accuracy. Finally, the extraction position of the feature map and the proportion of the default frame were optimized, which further improved detection accuracy. Considering the fast running speed of the belt and the influence of background and light source on foreign object targets, Ma et al. [26] proposed an improved Center-Net algorithm, which improved detection efficiency. The normalization method was optimized to reduce computer memory consumption, and a weighted feature fusion method was added to fully utilize the features of each layer, improving detection accuracy. In the experimental environment, the average detection rate was about 20 fps, which met the demand for the real-time detection of foreign objects. Xiao et al. [27] used a median filtering method to preprocess images with foreign objects, removed the influence of dust, improved the clarity of ore edges, and established a dataset to train the YOLOv3 belt foreign object detection algorithm. Finally, after sparse training based on the BN layer, the YOLOv3 model was made lightweight, and its parameters were fine-tuned. Compared with the original YOLOv3 model, the resulting model required less computation, ran faster, and was smaller in size.

Discussion
Although many approaches for detecting foreign objects have been developed in the above literature, they share some common disadvantages, which are summarized as follows. Firstly, due to the specific coal mine environment, with a lot of dust, noise, and a complex background for foreign objects, it is difficult to achieve accurate detection of foreign objects on the belt conveyor. Therefore, the general target detection algorithms cannot be easily migrated to the coal mine environment. Secondly, the robustness of the traditional foreign object detection algorithms is poor, and the extraction of foreign object features requires a wealth of experience. Finally, the current public foreign object detection datasets lack coal-mine-belt foreign object detection data, so models trained on them cannot flexibly adapt to different scenarios in different mining areas.
In this paper, an image dataset of belt foreign objects in the coal mine environment is collected and established, and a target detection algorithm based on an improved YOLOX model is used to detect non-coal foreign objects on the coal belt.

Target Detection of YOLO Model
The YOLO series target detection algorithm is a supervised learning target detection algorithm [28]. Its basic principle is to divide the input image into several grids, extract the features of each part of the image through the convolutional neural network, and finally output the predicted bounding box, which comprises the center coordinates of the predicted object, the width and height of the detected object, and the confidence of the object category.
As shown in Figure 1, the input foreign object image is divided into S × S grid cells, and features are extracted from each cell through the convolutional neural network. The features are then fused and analyzed to output the confidence of the foreign object target, the bounding box coordinates, and the foreign object category. In order to improve the accuracy of foreign object detection, a fixed number of anchor boxes are used for each grid cell to assist in learning position information. Clustering analysis is performed on the known labels of the target detection objects in the image to obtain the initial sizes of the anchor boxes.

The framework of the YOLO series detection models has always been composed of three parts: the backbone feature extraction network, the feature fusion layer, and the detection decoupling head, as shown in Figure 2. The feature extraction network mainly extracts features from the input image data, then the feature fusion layer fuses the low-dimensional and high-dimensional features of the image to provide richer image information. Finally, the detection decoupling head predicts and outputs the position and category information of objects of interest. The YOLO series of algorithms all use a three-branch detection head to predict objects of different scales: large, medium, and small.

In the actual target detection process, directly predicting the center coordinates, width, and height of the bounding box results in too large a solution space for the predicted target, which seriously wastes computing resources. Therefore, an anchor frame mechanism is designed to accelerate the convergence of the model and improve the target detection accuracy. The prediction principle of the bounding box is shown in Figure 3 and can be expressed as:

$$b_x = \sigma(t_x) + c_x, \quad b_y = \sigma(t_y) + c_y, \quad b_w = p_w e^{t_w}, \quad b_h = p_h e^{t_h}$$

where $p_w$ and $p_h$ are the width and height of the anchor frame; $b_x$ and $b_y$ are the center coordinates of the prediction box; $b_w$ and $b_h$ are the width and height of the prediction box; $t_x$ and $t_y$ are the offsets from the anchor frame to the center of the prediction box; $t_w$ and $t_h$ are the predicted scaling offsets for the width and height; $c_x$ and $c_y$ are the coordinates of the upper left corner of the grid cell that contains the center of the bounding box; and $\sigma(\cdot)$ is the sigmoid normalization function.
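
The anchor-based prediction described above can be illustrated with a short sketch. It assumes the standard YOLOv3-style decoding, in which the center offsets pass through a sigmoid and the anchor size is scaled exponentially; the function name `decode_box` and the tuple layout are hypothetical and are not taken from the original implementation.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic function that maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t, cell, anchor):
    """Decode raw network outputs into a box, in grid-cell units.

    t      -- (tx, ty, tw, th), raw outputs of the detection head
    cell   -- (cx, cy), upper left corner of the responsible grid cell
    anchor -- (pw, ph), width and height of the anchor frame
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = sigmoid(tx) + cx   # the sigmoid keeps the center inside the cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # the anchor size is rescaled, never negative
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Zero offsets place the box at the cell center with the anchor's size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), anchor=(2.0, 1.5)))
# (3.5, 4.5, 2.0, 1.5)
```

Because the sigmoid bounds the center offset to the responsible cell, the network only has to learn a small correction relative to the anchor, which is the reduction in solution space that the anchor mechanism is meant to provide.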

Established Foreign Object Image Dataset
At present, the publicly available large-scale datasets do not include non-coal foreign objects. Therefore, it is necessary to establish an engineering dataset of actual foreign objects for belt conveyors to solve the problem of foreign object detection in practical engineering. This self-made dataset is named the belt conveyor foreign object detection dataset, and its sample categories mainly include the following three types of foreign objects: iron, wood, and large gangue, as shown in Figure 4.

This work selected laboratory and belt-conveyor work scenarios for foreign object image collection. Foreign object images were captured under different natural light conditions, and directional foreign objects such as iron and wood were offset at different angles to increase the angle information in the images. In the laboratory environment, for the same foreign object, a foreign object dataset can be established that includes images of areas without coal flow, areas with coal flow, and areas obstructed by coal flow. Photos of foreign objects in different directions were collected in the laboratory environment to increase the diversity of foreign object dataset samples, as shown in Figure 5. In order to ensure the diversity of perspective in the collected dataset and better simulate the different shooting angles of cameras installed in actual working conditions, top-view images were collected using a DJI drone with a pan-tilt camera. The heights above the ground during collection were 1 m, 2 m, and 4 m, as shown in Figure 6.

In order to improve the robustness and generalization of the model, the Mosaic multi-sample data augmentation method proposed in YOLOv4 was adopted. During the training process, four images in the training set were randomly selected, and the images were randomly scaled, cropped, and arranged to form a combined image, which expanded the number of image samples during training, as shown in Figure 7. As shown in Figure 8, the images were also expanded by means of horizontal flipping, random occlusion, random scaling, motion blur, random scaling and filling, and salt-and-pepper noise.

A total of 1105 foreign object images were collected for the belt conveyor foreign object image dataset, including 303 large gangue images, 401 iron tool images, 301 wood images, and 100 mixed-target images. The dataset was labeled with horizontal and rotating boxes, and the horizontal and rotating box foreign object detection datasets were constructed. Finally, the dataset was expanded to 8100 images through geometric expansion, and a complete dataset of foreign object images for belt conveyors was constructed.
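
The Mosaic step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the augmentation pipeline used in the experiments: images are grayscale nested lists, each image is rescaled with nearest-neighbour sampling to fill its tile, random cropping is omitted, and the bounding box labels, which would have to be transformed in the same way, are not handled. The names `resize_nearest` and `mosaic` are hypothetical.

```python
import random


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list image to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def mosaic(images, size=8, rng=random):
    """Combine four images into one size x size canvas.

    A random split point divides the canvas into four tiles, and each
    source image is rescaled to fill one tile.
    """
    assert len(images) == 4
    # Keep the split point away from the border so no tile is empty.
    yc = rng.randint(size // 4, 3 * size // 4)
    xc = rng.randint(size // 4, 3 * size // 4)
    # (top, left, height, width) of the four tiles.
    tiles = [(0, 0, yc, xc), (0, xc, yc, size - xc),
             (yc, 0, size - yc, xc), (yc, xc, size - yc, size - xc)]
    canvas = [[0] * size for _ in range(size)]
    for img, (top, left, h, w) in zip(images, tiles):
        patch = resize_nearest(img, h, w)
        for r in range(h):
            canvas[top + r][left:left + w] = patch[r]
    return canvas


# Four constant 4 x 4 "images" make the tile layout easy to inspect.
samples = [[[k] * 4 for _ in range(4)] for k in (1, 2, 3, 4)]
for row in mosaic(samples, size=8, rng=random.Random(0)):
    print(row)
```

Since the split point is drawn at random for every combined image, the same four source images yield different object scales and positions on each draw, which is how the method enlarges the effective sample size.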

Improved Depthwise Separable Convolution Block
Depthwise separable convolution breaks down the operations of standard convolution into depthwise convolution and pointwise convolution [29]. Depthwise convolution performs a separate spatial convolution on each input channel, while pointwise convolution combines the convolution results of each channel, which can greatly reduce the size and complexity of the model while maintaining high accuracy. The specific operations are shown in Figure 9.

Assuming that the input image size is 640 × 640 × 3, the expected output size is 640 × 640 × 4, and the ordinary convolution uses a convolution kernel of 3 × 3 × 3 × 4, the number of parameters of the ordinary convolution is:

$$N_{std} = 3 \times 3 \times 3 \times 4 = 108$$

Depthwise separable convolution first performs depthwise convolution and then applies pointwise convolution to model the channel relationship. First, convolution is performed by depth, and the number of parameters is as follows:

$$N_{dw} = 3 \times 3 \times 3 = 27$$

Then, through the pointwise convolution operation, the total number of parameters for the depthwise separable convolution is:

$$N_{dsc} = N_{dw} + 1 \times 1 \times 3 \times 4 = 27 + 12 = 39$$

The ratio of the number of parameters for the two convolution operations is:

$$\frac{N_{dsc}}{N_{std}} = \frac{39}{108} \approx 0.36$$

Using depthwise separable convolution can therefore effectively reduce the number of parameters in the model while preserving the feature extraction ability of the convolution and keeping the model lightweight. In addition, the Hard-Swish activation function was selected as the activation function of the belt conveyor foreign object detection model, as shown in Figure 10.

The basic module of the improved foreign object detection model is shown in Figure 11. Replacing the ordinary convolution at the end of the merge channel in the CSP1_X and CSP2_X modules with depthwise separable convolution reduces the number of parameters in the convolution process and accelerates the inference speed of the model.
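
The parameter comparison for the 3 × 3 kernel with 3 input channels and 4 output channels can be checked with a small calculation. The sketch below counts weights only and ignores bias terms, which is an assumption; the helper names are hypothetical.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


standard = conv_params(3, 3, 4)
separable = depthwise_separable_params(3, 3, 4)
print(standard, separable, round(separable / standard, 3))
# 108 39 0.361
```

In general the ratio equals 1/c_out + 1/k², so the saving grows with the number of output channels; for the wider layers of a detection backbone it approaches 1/9 for a 3 × 3 kernel.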

IAT Image Enhancement Module
In order to ensure the end-to-end output characteristics of deep learning, the IAT image enhancement module [30] is introduced to achieve image enhancement, and the network structure is shown in Figure 12. The color matrix in the IAT architecture represents the pixel weights computed by a self-attention mechanism, in which the different colors are used to distinguish different patches from the original image. The IAT module can enhance the brightness of the image, restore the relevant details, improve the image quality, reduce the noise, and enhance the image contrast. The peak signal-to-noise ratio (PSNR) is used as the objective evaluation index for image enhancement, and the formula is as follows:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{MAX^2}{\frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X(i,j) - Y(i,j) \right)^2} \right)$$

where X(i, j) are the pixel values of the original image, Y(i, j) are the pixel values of the enhanced image, H and W are the height and width of the image, respectively, and MAX is the maximum possible pixel value (255 for 8-bit images).
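
As an illustration of the PSNR index, the following sketch evaluates it for two small grayscale images stored as nested lists. It assumes 8-bit images, so the peak value defaults to 255; the function name `psnr` and the sample values are hypothetical.

```python
import math


def psnr(original, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio, in dB, between two equally sized images."""
    h, w = len(original), len(original[0])
    # Mean squared error over all H x W pixels.
    mse = sum((original[i][j] - enhanced[i][j]) ** 2
              for i in range(h) for j in range(w)) / (h * w)
    if mse == 0:
        return math.inf  # identical images: the ratio is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]
enhanced = [[54, 55, 60], [59, 80, 61], [75, 62, 60]]
print(round(psnr(reference, enhanced), 2))
```

A higher value means that the enhanced image deviates less from the reference, so the index is only meaningful when a normally lit reference image is available for comparison.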

Improved CBAM Attention Block
CBAM [31] is a convolutional neural network module based on an attention mechanism, which is used to improve the overall performance of the model. Its essence is to suppress the expression of redundant features by increasing the weight of non-redundant features. It is composed of a channel attention module (CAM) and a spatial attention module (SAM), and the specific network structure is shown in Figure 13. Therefore, in order to suppress redundant features and obtain attention feature maps that focus on the informative channels and spatial positions, the CBAM attention mechanism was introduced into the network structure, and the specific location of the addition is shown in Figure 14.
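
The two stages of CBAM can be sketched as follows. This is a minimal pure-Python illustration of the standard module (channel attention followed by spatial attention) with random weights, and it does not reproduce the specific modification or placement shown in Figures 13 and 14. The function names, the reduction ratio of 2, and the 7 × 7 kernel are assumptions chosen for the example.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(x, w1, w2):
    """x is C x H x W (nested lists). Returns one weight in (0, 1) per channel."""
    def mlp(vec):  # shared two-layer MLP: C -> C/r (ReLU) -> C
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(x, kernel):
    """kernel is 2 x k x k, applied to the channel-wise mean and max maps."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0])
    pad = k // 2
    maps = [
        [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)],
        [[max(x[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)],
    ]
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:  # zero padding
                            s += kernel[m][di][dj] * maps[m][ii][jj]
            out[i][j] = sigmoid(s)
    return out


def cbam(x, w1, w2, kernel):
    """Reweight the channels first, then reweight the spatial positions."""
    ca = channel_attention(x, w1, w2)
    x = [[[ca[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
    sa = spatial_attention(x, kernel)
    return [[[sa[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in x]


random.seed(0)
C, H, W, r = 4, 5, 5, 2
feat = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
kernel = [[[random.uniform(-0.1, 0.1) for _ in range(7)] for _ in range(7)]
          for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))  # 4 5 5
```

The output has the same shape as the input, which is why the module can be inserted between existing layers without changing the rest of the network.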

Designed Rotating Decoupling Head and MO-YOLOX Network
The detection boxes of the YOLO series object detection algorithms are all horizontal boxes, which is not conducive to the detection of foreign objects with diverse orientations, such as ironware. Therefore, angle regression prediction was added to the head network of YOLOX, and a branch decoupling head based on angle regression was constructed to accurately locate directional foreign objects. The structure of the rotating decoupling head is shown in Figure 15, where CBS*2 denotes two CBS modules. The overall network structure of MO-YOLOX is shown in Figure 16.
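
To show what the additional angle output contributes, the sketch below converts a rotated box, given by its center, size, and angle, into its four corner points. The angle convention used here (degrees, measured from the x-axis toward the y-axis) and the function name `rotated_box_corners` are assumptions for illustration; the convention used by the rotating decoupling head is not specified in the text.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Corner points of a box of size w x h centered at (cx, cy), rotated by angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Offsets of the four corners before rotation, in order around the box.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return corners


# With a zero angle the result is an ordinary horizontal box.
print(rotated_box_corners(10, 20, 4, 2, 0))
# [(8.0, 19.0), (12.0, 19.0), (12.0, 21.0), (8.0, 21.0)]
```

For a slender object such as an iron bar lying diagonally on the belt, the rotated box encloses far less background than the horizontal box that would be needed to cover the same object.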

Experimental Platform
The proposed MO-YOLOX network model was trained in the GPU environment, and the environment configuration is shown in Table 1 below.

Experimental Comparisons
In order to verify the comprehensive performance of the proposed MO-YOLOX, comparison experiments on horizontal target detection and rotating target detection were carried out on the PASCAL VOC dataset and the DOTA dataset, respectively. The PASCAL VOC dataset is annotated with horizontal boxes, and the categories used include Bird, Dog, Cat, Person, Sofa, Car, Bottle, and Horse. There are 312 images, containing a total of 1623 objects. The mainstream horizontal object detection models YOLOX-small and SSD300 [32] were selected for the comparative experiments. The experimental results are shown in Table 2, where the best results are shown in bold. In addition, in order to verify the effectiveness of MO-YOLOX in rotating target detection, the DOTA dataset annotated with rotating frames was selected, including plane (PL), ship (SH), large vehicle (LV), harbor (HA), small vehicle (SV), and baseball diamond (BD). Mainstream rotating object detection models, such as S2A-Net [33] and CFA [34], were selected for comparative experiments. The experimental results are shown in Table 3.
Compared with YOLOX-small, the MO-YOLOX target detection model, with its attention mechanism and depthwise separable convolution, achieves better detection accuracy and inference speed. The average detection accuracy of the proposed model on the VOC test dataset is higher than that of the original YOLOX-small and SSD300 models, and its detection accuracy on the DOTA dataset is on par with that of S2A-Net and CFA, while its inference time is shorter than that of these comparison algorithms. Therefore, the proposed foreign object detection network can meet the requirements of both detection accuracy and inference speed in the target detection task.

From the experimental results, it can be seen that the performance of the proposed foreign object detection model for the belt conveyor is superior to similar mainstream algorithms on both the horizontal and rotating foreign object datasets. Specifically, when the target foreign object, such as large gangue, differs obviously from the coal mine stone in shape and texture, the proposed horizontal frame foreign object detection model performs very well. However, for slender ironware, the detection effect of this model is slightly poorer. Notably, the proposed rotating frame foreign object detection model has good detection sensitivity for targets with large aspect ratios, such as slender iron bars, at the cost of a 5.7 ms increase in inference time for angle prediction. Iron and wood objects show an obvious difference between length and width, so the proposed model can effectively predict the angle of such foreign objects. However, the shape of irregular gangue varies considerably, and the angle information in its data labels is irregular, which makes the angle regression prediction of the network very difficult. In addition, under slight occlusion by the coal background, both horizontal frame and rotating frame foreign object detection can accurately detect foreign objects and determine their types. Therefore, the experimental results show that the proposed model meets the design requirements.

Conclusions and Future Works
In this paper, a foreign object image dataset for the belt conveyor is collected and established, and the IAT image enhancement module and CBAM attention mechanism are introduced. In addition, a novel rotating decoupling head is designed to predict the angle information of foreign objects, and a MO-YOLOX network structure is constructed. The experimental results show that the proposed algorithm achieves a detection accuracy of 71.9% and 73.2% on the VOC and DOTA test datasets, respectively, with an average inference time of around 26 ms, which can meet the requirements of real-time inference. Ten-fold cross-validation is conducted on the self-built foreign object dataset of the belt conveyor, and the accuracy, recall, and mAP50 of horizontal frame foreign object detection are 94.05%, 94.25%, and 94.01%, respectively. Moreover, the accuracy, recall, and mAP50 of the rotating frame foreign object detection reach 93.87%, 93.69%, and 93.68%, respectively, and the average inference time of foreign object detection is 25 ms.
However, the proposed foreign object detection method for belt conveyors has not yet considered embedded deployment as part of the industrial experiment. In the future, further research is needed on the pruning optimization of the model and its embedded deployment.

Figure 3. The prediction principle of YOLO's bounding box.

Figure 4. Different kinds of foreign object samples.

Figure 5. Single foreign material data samples.

Figure 10. Hard-Swish activation function and its derivative.

The Hard-Swish activation function is a continuous piecewise function that is unbounded above and bounded below. It makes the model non-linear, and because it avoids the exponential computation required by the sigmoid in Swish, it can effectively reduce the calculation cost in the embedded environment. The expression is as follows:

$$\text{Hard-Swish}(x) = \begin{cases} 0, & x \le -3 \\ x, & x \ge 3 \\ x(x+3)/6, & \text{otherwise} \end{cases}$$
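
The piecewise definition of Hard-Swish is equivalent to the compact form x · ReLU6(x + 3) / 6, which is how it is usually implemented. A minimal sketch follows; the function name `hard_swish` is chosen for illustration.

```python
def hard_swish(x: float) -> float:
    """Hard-Swish written as x * ReLU6(x + 3) / 6."""
    relu6 = min(max(x + 3.0, 0.0), 6.0)  # clamp x + 3 to the range [0, 6]
    return x * relu6 / 6.0


for value in (-4.0, -1.5, 0.0, 1.0, 4.0):
    print(value, hard_swish(value))
```

The function reaches its minimum of -0.375 at x = -1.5 and equals the identity for x ≥ 3, so it needs only a comparison, an addition, and a multiplication per element.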

Figure 14. Improved addition location of CBAM in the network.

Figure 18. Confusion matrix results of the self-made dataset.

Figure 19. Cross-validation results of foreign object detection models.

It can be seen from Figures 20 and 21 that the proposed foreign object detection model can effectively detect foreign objects in the case of background coal flow. The rectangle in the figures is the target result predicted by the foreign object detection model, and different colors represent different categories. In Figure 21, the predicted angle information is represented by the long side of the rotating rectangular box, with angle values of 36.8°, −30.3°, and 65.1°, which can verify the effectiveness of the rotation decoupling head in angle regression prediction. Figures 22 and 23 show the results of foreign object detection under coal-flow occlusion and the multi-angle detection results of the same foreign object, respectively. The proposed model can locate the foreign object in the image more accurately, and the performance indicators of the foreign object detection model are shown in Tables 5 and 6.

Figure 20. Test results of the foreign object detection of horizontal frames.

Figure 21. Test results of the foreign object detection of rotating frames.

Figure 22.

Figure 23.Multi-angle foreign object detection with the same foreign object sample.

Table 1. Model training environment configuration.

Table 2. VOC dataset detection accuracy test results.

Table 3. DOTA dataset detection accuracy test results.

Table 5. MO-YOLOX horizontal foreign object detection performance index parameters.

Table 6. MO-YOLOX rotating frame foreign object detection performance index parameters.