FPGA-Based Linear Detection Algorithm of an Underground Inspection Robot

Conveyor belts are key pieces of equipment for bulk material transport, and ensuring their safe operation is of great significance. As belt conveyors develop toward long distances, large volumes, high speeds, and high reliability, the use of inspection robots to perform full inspections of belt conveyors has not only improved the efficiency and scope of inspections but has also eliminated the dependence of traditional methods on the density of sensor arrangement. In this paper, relying on the wireless-power-supply orbital inspection robot independently developed by the laboratory and aimed at the problem of belt conveyor deviation, methods for diagnosing conveyor belt deviation and FPGA (field-programmable gate array) parallel computing technology are studied. Based on the traditional LSD (line segment detection) algorithm, a straight-line extraction IP core suitable for an FPGA computing platform was constructed. This new hardware line detection algorithm improves the real-time performance and flexibility of the belt conveyor diagnosis mechanism.


Introduction
The belt conveyor is a piece of continuous transportation equipment with a conveyor belt; it is also seen as a carrying and traction mechanism. This type of conveyor is widely used for material transportation in mining, wharf, metallurgy, building materials, machinery, and storage industries. It has some advantages, such as strong carrying capacity, low transportation resistance, low power consumption, stable operation, and less damage to materials during transportation. Nowadays, the belt conveyor has become recognized as the most efficient and widely used continuous transportation equipment for bulk materials [1]. In the actual production process, the main fault types of belt conveyors include longitudinal tearing of the conveyor belt, transverse tearing of the conveyor belt, and deviation of the conveyor belt [2].
While the conveyor belt is driven by the motor, it also receives a lateral force perpendicular to the direction of movement. Under the action of this lateral force, the conveyor belt will deviate. As a result, the friction between the idler rollers on one side and the belt will increase, which causes wear between them and increases the working load at the same time. Moreover, if the conveyor belt deviates from the ideal centerline, it will also cause material spillage and hanging. In more serious cases, the belt conveyor will partly heat up, which causes a risk of fire or belt breakage [3]. Therefore, detecting the deviation of the belt conveyor is the key purpose of the inspection robot at work.
Among existing line segment detection algorithms, those based on the Hough transform have been the standard method for many years because of the mature theory behind the transform and its stable performance on low-texture images. They are also widely used in belt deviation detection [4][5][6][7]. However, all methods based on this theory require a large number of nonlinear calculations to map points forward or backward to the Hough space, which consumes a lot of time and resources. Furthermore, for complex images with more texture, the higher false-detection rate causes serious problems for subsequent video processing.
In 2008, Von Gioi et al. proposed a line segment detector (LSD) [8]. This linear-time line segment detector can obtain good detection results without parameter tuning. The algorithm consists of six parts: image filtering, gradient calculation, gradient pseudosorting, region growing, rectangle approximation, and NFA (number of false alarms) calculation.
Based on previous research and relying on a wireless-power-supply inspection robot mobile platform, this paper combines this platform with FPGA, which has the two obvious advantages of powerful parallel computing and flexible configuration, to meet the needs of conveyor belt deviation fault diagnosis. This new method improves the traditional LSD algorithm and can develop a low-latency and high-precision line detection algorithm IP core. This IP core is deployed on the programmable logic (PL) end of FPGA to achieve low-latency extraction of the straight-line area of the image to be diagnosed. In this way, the purpose of improving the real-time flexibility of the belt conveyor diagnosis mechanism is achieved. Ultimately, this is to ensure the safe operation of the belt conveyor.

Previous Work
In 2008, Pinto J.K.C. et al. designed a real-time detection system for hot spots in substations [9]. The system consists of a patrol robot equipped with Wi-Fi and infrared cameras and an upper computer in the control room. The robot moves through the steel cable in the substation, and the infrared camera is positioned to different equipment detection points through the pan-tilt system. The robot communicates with the upper computer located in the control room. The upper computer controls the movement of the robot and analyzes the images collected by the robot. After that, this computer detects hot spots in the substation.
In 2008, Von Gioi et al. [8] proposed a line segment detector (LSD). The LSD algorithm merges the pixels with the same edge direction through the method of region growth to realize the detection of the line segment area. Without adjusting the parameters, good detection results could be obtained.
In 2013, Chien-Chao Tseng, Chia-Liang Lin, and others studied a surveillance and patrol robot based on SIP protocol for home security [10]. The robot is equipped with multiple sensors and a high-definition camera. The robot is activated at a preset time, regularly inspects the home space, and senses and tracks moving objects during the inspection process. Through the SIP protocol, it initiates a communication request to the mobile device that belongs to the head of the household, then establishes an audio and video stream between the robot and the mobile device.
In February 2020, the National Development and Reform Commission (China), the National Energy Administration, and another eight ministries and commissions jointly issued the "Guiding Opinions on Accelerating the Development of Intelligent Coal Mines", marking that the coal industry has entered a new stage of achieving intelligent mining. With the development of intelligence and digitization, the coal industry urgently needs to change its traditional high-intensity work methods and realize humanized or even unmanned mining as soon as possible [11]. In recent years, due to the various shortcomings of manual inspection in belt conveyors, the research on automatic inspection devices of belt conveyors has attracted more and more scholarly attention.
In 2019, Zhang Junnan from Xi'an University of Science and Technology built an orbital wireless power supply inspection robot [12]. In view of the failure of the belt conveyor roller, the audio signal of the belt conveyor is dynamically collected by the robot, and then the pulse excitation frequency in the audio signal is extracted. After that, the fault samples are classified and trained through a BP neural network model. Finally, the high-precision identification of roller failures is realized.
The rapid development of computers, sensors, and logic digital devices has promoted the popularization of visual inspection technology in various fields of automation. Because of its non-contact, intuitive, and high-precision characteristics, visual inspection technology has become one of the main research directions for fault diagnosis and status detection in the industrial field [13]. Hence, using machine vision to detect and diagnose the operating status of the belt conveyor has become an effective means.
In 2019, Li Yuhan of China University of Mining and Technology [14] designed an orbital inspection robot that carries multiple types of sensors and uploads the collected image data, temperature, humidity, and smoke density to the upper computer via Wi-Fi. And then, aiming at diagnosing conveyor belt deviation faults using the upper computer, she designed a method of deviation diagnosis based on deep learning.
In 2019, Wang Lei [15] used Intel's FPGA as a computing chip and used a combination of coordinate statistics and grouping approximation methods to identify the lateral and torsional deviations of a conveyor belt, which achieved high detection accuracy and real-time performance. However, the accuracy of this method depends on the installation accuracy of the camera; other objects in the camera's field of view will also interfere with the detection result.
In 2019, Lin Jun [16] combined a software line detection algorithm with an ROI-based conveyor belt edge extraction method; however, the method relies on the OpenCV image processing library on a PC or on the MATLAB toolbox for algorithm design and research. As a result, the real-time performance of the algorithm is restricted, the method does not apply to mobile platforms, and system integration is difficult to achieve.
Scholars from all over the world have done a lot of research on automatic inspection systems and the real-time status detection of belt conveyors. In recent years, most of the research on belt conveyor deviation has focused on image software processing algorithms. It is difficult to meet the real-time performance of the computing platform for video stream processing, resulting in a high false detection rate and serious problems for subsequent video processing.

Hardware Environment
This work uses the Xilinx XC7Z020-2CLG484I SoC, whose resource overview is shown in Figure 1. The SoC is composed of the processor system (PS) in the upper part and the programmable logic (PL) in the lower part [17]. The PS side contains a dual-core ARM Cortex-A9 processor, a memory interface, and common I/O peripherals: USB, GigE, UART, I2C, and SPI. The PL side is a standard FPGA fabric. The communication interfaces between PS and PL can be divided into general-purpose (GP) ports and high-performance (HP) ports, according to the transmission rate. The GP port has two pairs of master and slave interfaces. The HP port only uses the PL side as the master and the PS side as the slave, and it generally accesses the external memory or on-chip memory directly.

IP Core Construction of the Line Segment Detection Algorithm
Based on previous studies, this paper builds on the overall structure of the IP core of the line segment detection algorithm, in which region growth is the key and difficult point of the IP. As shown in Figure 2, the IP core of the line segment detection algorithm first uses the image preprocessing algorithm for feature enhancement and smoothing and then detects the edge information of a single pixel in the image through the improved Canny operator. Finally, it achieves line segment detection by merging the edges with similar edge directions through region growth.

Image Preprocessing Module Design
The method of image preprocessing is divided into histogram equalization and Gaussian filtering.
Histogram equalization can be divided into two parts: statistics and mapping [18]. Since the statistics can only be completed after the whole frame has been received, equalizing a frame with its own histogram would lose the advantage of pipeline processing and vastly reduce the processing speed. Because the inspection robot does not experience sudden changes in the lighting environment during its operation, the grayscale histogram counted on the current frame can be used to map the next frame. Using the histogram equalization IP core and the software algorithm designed based on this method, the processing result of the working image of the underground belt conveyor is shown in Figure 3. Compared with the previous pure-software algorithm, the method proposed in this paper only needs two clock cycles to get a result and greatly saves logic resources, but it will cause the mapped image to lose 2 gray levels, that is, the pixel gray value mapping at level 254. From the histograms, the loss of 2 gray levels can be ignored.
The discrete mathematical expression of Gaussian filtering is:
$$G(i,j) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{(i-r-1)^{2} + (j-r-1)^{2}}{2\sigma^{2}}\right), \quad 1 \le i, j \le 2r+1 \tag{1}$$
In Equation (1), σ is the smoothing coefficient; i is the abscissa of the Gaussian template; j is the ordinate of the Gaussian template; and r is the radius of the Gaussian template.
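Returning to the frame-delayed histogram equalization described above, the idea can be illustrated with a short software sketch. It is not the IP core itself: the function names, the identity mapping applied to the very first frame, and the textbook cumulative-histogram mapping formula are all assumptions made for illustration. What it shows is that each pixel is mapped with the table built from the previous frame while the statistics for the next frame are accumulated in the same pass.

```python
def build_mapping(hist, levels=256):
    """Build a gray-level lookup table from a histogram (textbook CDF mapping)."""
    total = sum(hist)
    cdf = 0
    mapping = []
    for count in hist:
        cdf += count
        mapping.append((cdf * (levels - 1)) // total)
    return mapping


def equalize_stream(frames, levels=256):
    """Yield equalized frames; frame n is mapped with the histogram of frame n-1."""
    mapping = list(range(levels))  # assumption: identity mapping for the first frame
    for frame in frames:
        hist = [0] * levels
        out = []
        for row in frame:
            out_row = []
            for pixel in row:
                hist[pixel] += 1                 # statistics for the NEXT frame
                out_row.append(mapping[pixel])   # mapping from the PREVIOUS frame
            out.append(out_row)
        yield out
        mapping = build_mapping(hist, levels)
```

Because the mapping table is already available when a pixel arrives, the pixel stream never has to wait for a full-frame histogram, which is the property the pipelined design relies on.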
When using an FPGA to implement Gaussian filtering, fixed-point calculations are used instead of floating-point calculations and shifts are used instead of divisions in order to save logic resources and improve processing speed. In this paper, σ = 1, the template size is 5 × 5, and each coefficient in the template is the theoretical value multiplied by 256. After the calculation is completed, the effect of this scaling is eliminated by shifting the result to the right by 8 bits. Due to the central symmetry of the Gaussian template, the convolution process is divided into row convolution and column convolution in order to improve the utilization of logic resources. The constructed logic circuit is shown in Figure 4. The processing results of the Gaussian filtering hardware module constructed in this paper and of the Gaussian filtering implemented in software on the working image of the underground belt conveyor are shown in Figure 5. Because the IP core does not replicate the border pixels of the image frame during processing, the result shows black borders (the effective image is reduced). In addition, the convolution result in Gaussian filtering is truncated to an integer; the resulting error can be ignored.
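The fixed-point, separable scheme described above can be sketched in software as follows. This is a minimal illustration under stated assumptions rather than the circuit of Figure 4: the function names are hypothetical, the one-dimensional coefficients are obtained by rounding the normalized Gaussian weights multiplied by 256, and the 8-bit right shift is applied after each one-dimensional pass. Border pixels are left at zero, which reproduces the black border mentioned in the text.

```python
import math


def gaussian_kernel_1d(sigma=1.0, radius=2, scale=256):
    """Normalized 1-D Gaussian weights, scaled by 256 and rounded to integers."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [round(scale * w / total) for w in weights]


def convolve_rows(image, kernel, shift=8):
    """Row convolution with integer coefficients; border columns stay 0."""
    r = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(r, width - r):
            acc = sum(kernel[k] * image[y][x + k - r] for k in range(len(kernel)))
            out[y][x] = acc >> shift   # shift replaces the division by 256
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_filter(image, kernel):
    """Separable filtering: a row pass followed by a column pass."""
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))
```

Splitting the 5 × 5 template into two 5-tap passes reduces the multiplications per pixel from 25 to 10, which is the resource saving the row/column decomposition aims at.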

Edge Detection Algorithm Module Design
The edge detection IP core consists of four parts, namely, the gradient information calculation module, the non-maximum value judgment module, the threshold segmentation, and the edge link module [19].
Since the goal of the algorithm is to detect line segments, which lie in the edge parts of the image where the gray level changes sharply, enhancing the edges of the image is an indispensable step for subsequent line segment detection. In order to detect correct single-pixel edges, the gradient information calculation module must not only obtain an accurate gradient value but also determine the maximum value of the gradient, the gradient quadrant, and the edge direction angle (level line angle, LLA). The calculation architecture of the overall algorithm module is shown in Figure 6, and the calculation method is shown in Figure 7. Since the offset of the conveyor belt will not be very large, only the interval π/4~3π/4 is considered when grouping the edge direction angles. As shown in Figure 8, a pyramid configuration is used to group LLAs. First, 3-bit data are used to discretely divide the interval (π/4~3π/4) into 8 groups according to the unit of π/8. Second, a 2-bit grouping value is used to merge the 8 groups of 3-bit LLAs. This grouping method sets the tolerance angle of line segment growth to 22.5 degrees. In addition, a 1-bit flag is used to mark whether the angle lies in the effective interval (π/4~3π/4). A flag bit of 0 indicates a valid group, and a flag bit of 1 indicates an invalid group that exceeds the valid interval. The flag bit participates in the subsequent edge suppression. At the same time, in order to avoid missed detection of line segments whose angle lies at the boundary of a group, the original grouping is offset by π/16 to form a second grouping, which ensures that an angle on the boundary of one grouping is at the center of a group in the other. The output of this grouping method is a 3-bit grouping value, and each potential edge point has two grouping results, which are processed at the same time in the subsequent region growth module.
Although this method limits the possibility of growth to a certain extent, multiplication and comparison replace the arc-tangent calculation in the implementation, which saves logic resources. Because the 2-bit grouping result simplifies the judgment operations of subsequent modules, the method also reduces the difficulty of placement and routing.
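As an illustration of how grouping can be done with multiplication and comparison only, the following sketch classifies a level-line direction vector into one of four π/8-wide groups inside [π/4, 3π/4], together with a validity flag (0 = valid, 1 = invalid), for both the original grouping and the one offset by π/16. The function names, the direction-vector input (dx, dy), and the fixed-point scale of 256 for the boundary constants are assumptions for illustration; the circuit in Figure 8 may be organized differently. A direction lies past a boundary angle b when cos(b)·dy − sin(b)·dx ≥ 0, so no arc-tangent is needed.

```python
import math

FIXED_ONE = 256  # assumed fixed-point scale for the boundary constants


def make_boundaries(offset=0.0):
    """Five boundary angles pi/4 + offset + k*pi/8 as fixed-point (cos, sin) pairs."""
    bounds = []
    for k in range(5):
        angle = math.pi / 4 + offset + k * math.pi / 8
        bounds.append((round(FIXED_ONE * math.cos(angle)),
                       round(FIXED_ONE * math.sin(angle))))
    return bounds


BASE = make_boundaries()                 # original grouping
SHIFTED = make_boundaries(math.pi / 16)  # grouping offset by pi/16


def group_lla(dx, dy, bounds):
    """Return (flag, group): flag 0 = valid with a 2-bit group, flag 1 = invalid."""
    # A line direction is defined modulo pi, so fold it into the upper half-plane.
    if dy < 0 or (dy == 0 and dx < 0):
        dx, dy = -dx, -dy
    # Count the boundaries the direction has passed, using only multiply/compare.
    passed = sum(1 for c, s in bounds if c * dy - s * dx >= 0)
    if 1 <= passed <= 4:
        return 0, passed - 1
    return 1, 0
```

For example, `group_lla(1, 4, BASE)` and `group_lla(1, 4, SHIFTED)` give the two grouping results of the same edge point, which the region growth stage would then process in parallel.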
A larger gradient value in a gradient image does not mean that the pixel is an edge point but only shows that the pixel has a higher potential. The non-maximum value judgment determines the single edge property of edge detection, which is particularly important for further line segment detection. This module performs 3 × 3 windowing on the result of gradient calculation, uses interpolation to calculate the potential maximum value, and then determines whether the center pixel is the local gradient maximum value and realizes the removal of false edges.
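The interpolated non-maximum judgment on a 3 × 3 window can be sketched as below. This is the common Canny-style formulation, given here as an assumption about the kind of interpolation involved rather than as the exact circuit of the module; the function name and the floating-point arithmetic are illustrative only. The magnitudes on both sides of the center pixel along the gradient direction are estimated by linear interpolation between two neighboring pixels, and the center is kept only if it is not smaller than either estimate.

```python
def is_local_maximum(mag, gx, gy):
    """mag: 3x3 window of gradient magnitudes; (gx, gy): gradient at the center.

    Rows grow downward (y) and columns grow to the right (x).
    """
    center = mag[1][1]
    if gx == 0 and gy == 0:
        return False
    ax, ay = abs(gx), abs(gy)
    sx = 1 if gx >= 0 else -1   # column step along the gradient
    sy = 1 if gy >= 0 else -1   # row step along the gradient
    if ax >= ay:
        # Gradient is closer to horizontal: blend the horizontal and diagonal neighbors.
        w = ay / ax
        ahead = (1 - w) * mag[1][1 + sx] + w * mag[1 + sy][1 + sx]
        behind = (1 - w) * mag[1][1 - sx] + w * mag[1 - sy][1 - sx]
    else:
        # Gradient is closer to vertical: blend the vertical and diagonal neighbors.
        w = ax / ay
        ahead = (1 - w) * mag[1 + sy][1] + w * mag[1 + sy][1 + sx]
        behind = (1 - w) * mag[1 - sy][1] + w * mag[1 - sy][1 - sx]
    # The strict comparison on one side avoids keeping both pixels of a plateau.
    return center > ahead and center >= behind
```

In a fixed-point implementation the division would typically be avoided by cross-multiplying both sides of the comparison, in the same spirit as the angle grouping.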
In addition to eliminating false edges, the edge detection module must also ensure the continuity of the edges. Besides the result of the non-maximum judgment, whether a pixel can be defined as an edge is also affected by the gradients in its neighborhood. In this module, dual thresholds are applied to the results of non-maximum suppression to link potential edges and suppress false edges. When the thresholds are specified manually, there will always be edge loss caused by thresholds that are too large or false edges caused by thresholds that are too small. Therefore, the maximum between-class variance method, i.e., Otsu's method, is used to determine the high and low thresholds adaptively, and threshold segmentation and edge linking are performed on this basis. This method ensures that the two resulting classes have the largest between-class variance, which means that they have the smallest probability of misclassification. The calculation circuit diagram of this algorithm is shown in Figure 9. The between-class variance calculation costs 6 clock cycles, and the threshold calculation needs to traverse all gradient levels. Therefore, if the Otsu threshold of the current frame were used as the basis for segmenting its own gradient image, an additional delay of 1082 clock cycles would occur. During inspection, the edges and textures of the working image of the belt conveyor do not change suddenly over a short time, so the threshold obtained from the gradient histogram of the previous frame is used as the basis for the segmentation of the current frame. In summary, two-level first-in-first-out (FIFO) buffers, combined with registers, are used in the FPGA to construct a 3 × 3 convolution window over the result of non-maximum suppression; the threshold determined by Otsu's method is used as the high threshold, and one-half of it is used as the low threshold. Threshold segmentation and edge linking are based on three judgments:
1. When the central element is greater than the high threshold, it is judged as an edge;
2. When the central element is less than the low threshold, it is judged as non-edge;
3. When the central element is between the two thresholds, it is judged as an edge if there are elements higher than the high threshold in the neighborhood.
Moreover, the pixels judged to be edges also need to satisfy the LLA in the interval [π/4, 3π/4].
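The threshold selection and the three judgments above can be sketched in software as follows. The sketch is illustrative only: the function names are hypothetical, Otsu's method is written in its usual histogram form, and the LLA interval check is left out for brevity. In the hardware described above the histogram would come from the previous frame, so the threshold is already available when the current frame streams through the 3 × 3 window.

```python
def otsu_threshold(hist):
    """Return the level that maximizes the between-class variance of a histogram."""
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0 = 0            # number of samples in the lower class
    sum0 = 0          # sum of levels in the lower class
    best_level, best_var = 0, -1.0
    for level, count in enumerate(hist):
        w0 += count
        sum0 += level * count
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_level, best_var = level, between
    return best_level


def is_edge(window, high):
    """Apply the three judgments to the center of a 3x3 window of gradient values."""
    low = high // 2                       # low threshold is one-half of the high one
    center = window[1][1]
    if center > high:
        return True                       # judgment 1: strong edge
    if center < low:
        return False                      # judgment 2: non-edge
    neighbors = [window[r][c] for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    return any(value > high for value in neighbors)   # judgment 3: linked weak edge
```

A typical use is `high = otsu_threshold(previous_frame_histogram)` followed by `is_edge(window, high)` for every window of the current frame.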

Region Growth Algorithm Module Design
The region growth module is the key and most difficult part of constructing the IP core of the line segment detection algorithm. The edge information of the running image of the belt conveyor and the corresponding LLA are obtained through the preprocessing and edge detection of the previous stages [20], and the line segments that meet the requirements are then detected from the edge image. In the region growth module, the scanning direction of the convolution window must be consistent with the direction of line segment growth. As shown in Figure 10, the convolution window constructed in the FPGA always scans the image from left to right and from top to bottom. When the direction of a line segment is consistent with the scanning direction, it can be detected, but the line segment on the right in Figure 10 runs from right to left in the top-down direction. In other words, a line segment cannot be detected if it runs opposite to the scanning direction. Therefore, a mirroring module is placed in front of the region growth module, and the region growth module obtains the line segment detection result of the whole image by processing both the edge image and the mirrored edge image. The implementation logic of the mirroring module is first-in-last-out: the top-down scanning direction does not change, while the order of the elements within a row is reversed. The mirror module, constructed from simple dual-port RAM, inevitably introduces a clock delay of one row of elements. Figure 11 shows the processing path from the edge image to the line segment information. After the processing of the previous module, the edge image (Gra) and the grouping value of the edge direction angle (GAP) are obtained. On the one hand, Gra and GAP directly enter the region growth module to calculate the line segment information. On the other hand, Gra and GAP first pass through the mirror module and then enter the region growth module.
Since there are two LLA grouping methods, the area growth operation must be performed at the same time for different grouping values; that is, the area growth module will be called four times in the line segment detection IP core.
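The first-in-last-out behavior of the mirror module can be illustrated with a few lines of software. The names are hypothetical and the list used as a stack merely stands in for the dual-port RAM: a row is written in arrival order and read back in reverse, so the output of a row can only begin after the whole row has been written, which corresponds to the one-row delay noted above.

```python
def mirror_row(pixels):
    """Reverse one row with first-in-last-out access."""
    stack = []
    for pixel in pixels:        # write phase: pixels arrive left to right
        stack.append(pixel)
    mirrored = []
    while stack:                # read phase: last written pixel leaves first
        mirrored.append(stack.pop())
    return mirrored


def mirror_image(image):
    """Mirror every row; the top-down order of the rows is unchanged."""
    return [mirror_row(row) for row in image]
```

Applying `mirror_image` to both Gra and GAP gives the second input path of the region growth stage.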
Figure 11. Processing flow of regional growth module.
The region growth algorithm has three basic elements: a seed selection method, a similarity criterion, and a growth termination criterion. In this paper, combining the characteristics of FPGA image data processing, a region-growing method that uses linearly increasing column numbers as virtual seeds is proposed [21]. This method does not use pixels with a certain characteristic in the image as seeds; instead, as the upstream module sequentially outputs the pixels of each row, the increasing column number is used as the virtual seed. Based on the line segment area information stored for the virtual seed, the GAP values in the heterogeneous convolution window are combined to realize the growth of the area and the update of the line segment information. In the gradient information calculation module, 2-bit values are used to group LLAs. In the region growth module, equality of GAP values is taken as the similarity rule. The tolerance angle of line segment growth is π/8; hence, when the difference between two LLA values is within π/8, the two points are considered aligned. In a digital image, a line segment is a collection of aligned pixels (pixels with similar LLAs). The criterion for stopping the growth is described later in conjunction with the heterogeneous convolution window.
First, a heterogeneous convolution window is established. Since the growth of the region requires combining the seed point with the edge direction angle, this window is used to traverse the gradient image, as shown in Figure 12. After the heterogeneous convolution window has been established, the method of updating the region information is studied. The update of the region information depends on three growth modes: initialization, horizontal growth, and vertical growth. The growth mode is determined by the alignment of the central element and the seed. As shown in Figure 13, growth is triggered when the gradient of the central element is Gra = 1; if Gra = 0, growth waits to be triggered until Gra = 1. In this design, the growth possibility in three directions is analyzed to judge whether growth continues, and a NOR operation on the three judgments then gives the condition for the termination of growth. The right side of Figure 13 shows the method for judging the termination of growth during the region update. The design of the RAM control logic is also important for the correctness of the region growth algorithm. As shown in Figure 14, port A is the write port, which is responsible for the initialization of the RAM and the update of the line segment information, and port B is the read port. Since the updated region information of a seed point participates in the next growth step, the data that are read out participate in the construction of the heterogeneous convolution window. In order to ensure the correctness of subsequent calculations, the RAM is initialized before region growth; that is, the initialization data are written through port A. The timing of the key signals is shown in Figure 15. The data update of region growth is determined by the growth mode and the judgment of the termination of growth.
The timing diagram of key signals is shown in Figure 17. When the growth mode is vertical growth or initialization, the write enable of port A is pulled high for one clock cycle, and the write data are determined by the growth termination signal. When the growth mode is horizontal growth and the termination of growth is true, address J−1 is initialized. When the termination of growth is false, it is necessary to initialize address J−1 while updating address J.
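To make the idea of column-indexed virtual seeds more concrete, the following is a heavily simplified software analogue of the region growth stage; it is a sketch under stated assumptions, not the IP core. The assumptions are: the window consists of the upper, upper-left, and left neighbors of the current pixel; one record per column plays the role of the seed RAM; equality of GAP values is the similarity rule; a segment is reported when no pixel of the following row extends it; and merging of two different records is not handled. All names are hypothetical.

```python
def grow_segments(edge, gap, min_pixels):
    """Return (start, end, pixel_count) for segments with at least min_pixels pixels."""
    height, width = len(edge), len(edge[0])
    seeds = [None] * width        # per-column records: the "virtual seeds"
    active = {}                   # records touched in the previous row
    finished = []
    for r in range(height):
        touched = {}
        up_left = None            # previous-row record of column c-1
        left = None               # current-row record of column c-1
        for c in range(width):
            up = seeds[c]         # previous-row record of column c
            record = None
            if edge[r][c]:
                for candidate in (up, up_left, left):
                    if candidate is not None and candidate["gap"] == gap[r][c]:
                        record = candidate          # vertical or horizontal growth
                        break
                if record is None:                  # initialization
                    record = {"gap": gap[r][c], "start": (r, c), "count": 0}
                record["count"] += 1
                record["end"] = (r, c)
                touched[id(record)] = record
            up_left, left = up, record
            seeds[c] = record
        # Growth terminates for records that no pixel of this row extended.
        finished += [rec for key, rec in active.items() if key not in touched]
        active = touched
    finished += list(active.values())
    return [(rec["start"], rec["end"], rec["count"])
            for rec in finished if rec["count"] >= min_pixels]
```

With this window, a line running from the upper left to the lower right grows into a single record, whereas a line running from the upper right to the lower left is broken into one-pixel records and is only recovered after mirroring, which matches the motivation for the mirror module given earlier.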

Experimental Verification
In order to prove the effectiveness of the constructed algorithm, a noise-free hand-drawn image, as shown in Figure 18a, is used. The circle in the image is divided by line segments with an interval of π/8. Figure 18b is the image edge detected by the software Canny operator, and Figure 18c is the processing result of the hardware algorithm constructed in the previous section [22]. The algorithm detects the edges while suppressing the edge pixels whose LLA lies outside the valid interval. Figure 18d,e are the line segments detected by the original LSD algorithm. Figure 18f,g are the processing results of the algorithm IP core constructed in this paper. The pixel threshold of the algorithm in Figure 18d,f is set to 30; that is, the detected line segment contains at least 30 pixels. The pixel threshold of the algorithm in Figure 18e,g is set to 50; that is, the detected line segment contains at least 50 pixels. By comparison, it can be found that the original LSD algorithm has a higher detection rate for the line segment area in the circle, and the two have similar detection results for the line segment area inside the circle. In order to verify the stability of the constructed algorithm, a top view of the belt conveyor in operation, collected by the IMX222 camera, was used for testing. Figure 19a is the collected test image, Figure 19b is the converted gray image, and Figure 19c,d are the edge detection results before and after angle suppression, as produced by the hardware edge detection algorithm. Figure 19e,f show, respectively, the line segment information detected by the original LSD algorithm and by the hardware algorithm IP core. Comparing the two, the hardware algorithm constructed in this article performs well on real images with noise. It can accurately detect line segment information with an edge direction angle from 45° to 135°.
However, due to the fixed-point number calculation method and pipelined processing architecture, relative to the original LSD algorithm, the detected line segment information has certain errors. The line detection algorithm constructed in this paper is based on the XC7Z020-2CLG484I FPGA platform [23], and it is analyzed, synthesized, placed, and routed through the Vivado synthesis tool. Table 1 shows the resource consumption of the constructed algorithm on the FPGA platform. A 100 MHz clock was used during the test, and the algorithm clock delay was analyzed after the synthesis was completed, as shown in Figure 20 [24]. According to the analysis of this table, the calculation method of time latency (TL) and the frame rate N of the line segment detection IP core can be obtained, as shown in Equations (2) and (3).
In these two formulas, W and H represent the width and height of the image, respectively, and C is the clock frequency at which the module runs. Taking an image with a resolution of 1280 × 720 as an example, the TL of the hardware algorithm is 90 μs. According to Equation (3), the frame rate of the algorithm can be calculated to be 108 FPS (frames per second); the average processing time of one frame is 9.2168 ms. Table 2 shows the time cost comparison between the original LSD algorithm and the IP core of the line segment detection algorithm. The CPU running the original LSD algorithm is an AMD Ryzen 9 3950X. Since the hardware algorithm applied on the FPGA has the characteristics of pipelined parallel processing, its real-time performance is greatly improved compared with the original LSD algorithm.
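The figures quoted above can be cross-checked with simple arithmetic. The short script below only restates numbers given in the text (a 100 MHz clock, a 1280 × 720 image, a latency of 90 μs, and a frame time of 9.2168 ms); it does not reproduce Equations (2) and (3), and the variable names are illustrative.

```python
CLOCK_HZ = 100_000_000      # 100 MHz test clock
WIDTH, HEIGHT = 1280, 720   # example resolution
LATENCY_US = 90             # time latency TL quoted in the text
FRAME_TIME_MS = 9.2168      # average processing time of one frame


def to_cycles(seconds, clock_hz=CLOCK_HZ):
    """Convert a duration to a whole number of clock cycles."""
    return round(seconds * clock_hz)


latency_cycles = to_cycles(LATENCY_US * 1e-6)
frame_cycles = to_cycles(FRAME_TIME_MS * 1e-3)
pixels_per_frame = WIDTH * HEIGHT
frames_per_second = 1.0 / (FRAME_TIME_MS * 1e-3)

print(latency_cycles, frame_cycles, pixels_per_frame, int(frames_per_second))
```

The frame time corresponds to 921,680 clock cycles for 921,600 pixels, which is consistent with a pipeline that accepts roughly one pixel per clock cycle, and the reciprocal of the frame time gives the quoted 108 FPS once the fractional part is dropped.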

Conclusions
In this paper, based on the LSD algorithm, a line segment detection algorithm IP core suitable for an FPGA platform is constructed to meet the needs of conveyor belt deviation fault diagnosis. The algorithm module integrates four sub-modules: image preprocessing, gradient information calculation, edge detection, and region growth. Experimental verification shows that the hardware algorithm constructed in this paper accurately detects line segment areas with angles from 45° to 135° in the image and has good stability. Moreover, the logic resources occupied by the IP core are moderate. Taking a clock frequency of 100 MHz as an example, the processing frame rate of the algorithm can reach 108 FPS, which meets the requirements of real-time processing.