A Study of an Online Tracking System for Spark Images of Abrasive Belt-Polishing Workpieces

During the manual grinding of blades, workers estimate the material removal rate from experience by observing the characteristics of the grinding sparks, which leads to low grinding accuracy and low efficiency and affects the processing quality of the blades. As an alternative to the recognition of spark images by the human eye, we used the deep learning algorithm YOLO5 to perform target detection on spark images and obtain spark image regions. First, the spark images generated during one turbine blade-grinding process were collected; some of the images were selected as training samples, and the remaining images were used as test samples, all labelled with LabelImg. Afterwards, the selected images were trained with YOLO5 to obtain an optimised model. Finally, the trained model was used to predict the images of the test set. The proposed method was able to detect spark image regions quickly and accurately, with an average accuracy of 0.995. YOLO4 was also used to train and predict spark images, and the two methods were compared. Our findings show that YOLO5 is faster and more accurate than the YOLO4 target detection algorithm and can replace manual observation, laying a foundation for the automatic segmentation of spark images and for a later study of the relationship between the material removal rate and spark images.


Introduction
Turbine compressors have been widely used in petrochemical, metallurgical, and aviation fields. The blade is a crucial part of turbine power systems with complex structures, and its machining quality directly affects the operational performance and operating efficiency of the turbine power system. Many turbine blades are currently machined by manual grinding. When grinding, workers observe the characteristics of the grinding sparks and rely on experience to judge the material removal rate and control the processing of the blades. Experienced workers can machine a workpiece very quickly. However, this method is challenging to pass on, and master workers with extensive blade machining experience tend to be older. Additionally, due to poor working conditions, most young people are reluctant to take up this work.
Many scholars have studied these problems in order to achieve automatic control of the blade machining process. Qi, J.D. et al. [1] proposed a method for monitoring the material removal rate of abrasive belt grinding based on an improved convolutional neural network (CNN) and established a multisensor fusion grinding system for sound and vibration. Pandiyan, V. et al. [2] introduced a tool condition monitoring and prediction system and developed an abrasive belt wear prediction model based on a genetic algorithm (GA) and a support vector machine (SVM). Pandiyan, V. et al. [3] reviewed research trends in the online monitoring and modelling of abrasive processes and reported that AI algorithms are not yet fully applied in abrasive machining and prediction. Gao, K. et al. [4] proposed an acoustic sensing approach. In previous studies, the spark images were processed only after machining was complete, so a real-time spark image detection system could not be established. In this study, the goal was to build a real-time spark image tracking and detection system to accurately detect the spark image area in a complex environment and quickly identify spark images. The ultimate goal was to establish a spark image and material removal rate prediction model that can realize automatic processing control of the workpiece, which is an area requiring further research.
While Ren LJ and Wang Nina have studied spark images during abrasive belt grinding, online tracking and real-time processing of the abrasive belt grinding process have not been achieved. Accordingly, the aim of this study was the online tracking and detection of spark images using the current state-of-the-art YOLO5 algorithm to quickly and accurately identify and detect spark images. The remainder of this paper is organised as follows. Section 2 describes the experimental setup process and briefly introduces the proposed spark image acquisition method for the abrasive belt grinding process. Section 3 introduces the preprocessing and labelling of the images, and Section 4 describes the main methods used. Section 5 presents the experiments, the training and testing results of the model, and a comparison of the results obtained by different algorithms. In Section 6, we present our conclusions and possible directions for future work.

Belt Grinding Mechanism
An experimental platform was built to enable the study of spark images, as shown in Figure 1, consisting mainly of a three-axis machine tool, two high-speed CCD cameras, and two computers. As shown in Figure 1, the workpiece was GCr15 with a hardness of HRC58 and dimensions of 170 mm × 41 mm × 50 mm. The average roughness of GCr15 is 0.2 μm. The belt is tightly mounted on the Z-axis through the drive pulley, tensioning pulley, and contact pulley. With a motor speed of 0-5000 rpm, the belt is driven at a speed of 0-34 m/s. The contact wheel is rubber with a Shore A hardness of 85. The belt is made of corundum and has a width of 20 mm. Rotating at high speed, the belt cuts the workpiece and generates a spark field. A Beckhoff CX5130 embedded controller was selected for the experimental platform; the specific parameters are shown in Table 1. The controller was preinstalled with the Windows operating system and used the EtherCAT bus communication protocol. The drive motor of the abrasive belt machine was a Y803-2 three-phase asynchronous motor with a rated speed of 2800 r/min. A Sunye brand CM800 vector frequency converter was selected to control the motor speed.
The spark image acquisition device used a MT-E200GC-T CMOS industrial camera; its specific parameters are shown in Table 2. During the acquisition process, the CMOS camera was connected to the computer through the USB bus, and Mindvision software, which is capable of collecting images in real time, was used for image acquisition. In this experiment, the image acquisition frequency was set to 0.01 s, so 100 spark images were collected every 1 s. A hood was used during image acquisition in order to reduce the interference of other light in the experiment.
A 60# brown corundum abrasive belt with a width of 20 mm was selected as the grinding belt in this study. The experimental specimen was a square workpiece of GCr15 with a hardness of HRC58 and dimensions of 150 mm × 41 mm × 50 mm; its chemical composition is listed in Table 3. In order to establish a spark-tracking model based on grinding, the method of controlling a single variable was used to change the material removal rate in the experiment. The belt speed ranged from 20 m/s to 45 m/s in increments of 0.25 m/s; a total of 100 sets of data were collected.
The experimental workpiece was fixed on the Y axis of the machine tool by two mounting holes. After cutting, the workpiece was fed in the Y direction according to the grinding parameters set in the experiment. Spark image collection started after the abrasive belt contacted the workpiece to generate sparks, ending after a grinding stroke ended. Each test piece was ground for 5 strokes, and the length of each stroke was 41 mm.

The Mechanism of Spark Generation
During the grinding process, due to the high-speed rotation of the abrasive belt, the abrasive grains on the belt cut the workpiece under pressure, and the cutting fragments on the workpiece are thrown out along the tangential direction of the contact wheel. As the debris carries a lot of heat, it generates sparks when it meets the air.
In order to better process and study the relationship between spark images and the material removal rate, we set up two industrial CCD cameras directly above and to the side of the spark field, which were able to capture complete spark images at a frame rate of 100 Hz. PC1 saved and recorded the frontal spark images, and PC2 recorded and saved the side spark images. Figure 2a shows a side spark image acquired during the grinding process. Figure 2b shows a frontal spark image taken during the process. A total of 300 frontal spark images and 92 side spark images were collected during the complete grinding of a workpiece.

Image Preprocessing
The images captured by the CCD camera have dimensions of 1600 × 1200 pixels, which is far larger than necessary and contains a great deal of useless background information. The images were preprocessed by cropping and scaling to retain the useful spark information. The software processed all the images in the folder and converted them to 150 × 230 images, as shown in Figure 3. This preprocessing greatly increased the speed and efficiency of YOLO5 training.
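The cropping-and-scaling step can be sketched in a few lines of NumPy. The crop coordinates below are hypothetical, since the paper does not state the exact region of interest, and the 150 × 230 output is interpreted here as height × width:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize via index sampling (no interpolation)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def preprocess(frame, crop=(200, 300, 1120, 1300), out_hw=(150, 230)):
    """Crop the spark region out of a 1600x1200 frame, then scale it down.

    `crop` = (y0, x0, y1, x1) is an illustrative region of interest,
    not the coordinates used in the paper.
    """
    y0, x0, y1, x1 = crop
    roi = frame[y0:y1, x0:x1]
    return resize_nearest(roi, *out_hw)
```

In practice a library resize (e.g. OpenCV) with interpolation would be used, but the indexing version above makes the downsampling explicit.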

Annotation of Images
Prior to training the images with YOLO5, the images should first be labelled with LabelImg. Because we were only detecting one target, we chose a "fire" label, as shown in Figure 4. After labelling all the images, YOLO5 can be used for training. We divided the training and test sets by placing the annotated frontal and side spark images in separate folders. The labelled image files were generated, along with the corresponding annotation files.
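LabelImg can export annotations directly in YOLO format: one text file per image, where each line holds a class index followed by the box centre and size normalised by the image dimensions. A minimal sketch of that conversion (the helper name and pixel box are illustrative; class 0 corresponds to the single "fire" label):

```python
def yolo_label(box, img_w, img_h, cls_id=0):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into one
    YOLO annotation line: 'class x_center y_center width height',
    with all four box values normalised to [0, 1]."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2 / img_w
    cy = (y0 + y1) / 2 / img_h
    w = (x1 - x0) / img_w
    h = (y1 - y0) / img_h
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

For example, a box covering an entire 230 × 150 image becomes the line `0 0.500000 0.500000 1.000000 1.000000`.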

Overall Block Diagram
An overall flow chart of spark image processing is shown in Figure 5. First, an experimental platform was set up, as shown in Figure 1, and data acquisition was carried out for the frontal and side spark images. Then, the spark images were preprocessed. The target detection area was obtained by annotation with LabelImg. The spark images were divided into a training set and a test set, with 90% of the images assigned to the training set and 10% to the test set; the resulting sets were then used to train and evaluate YOLO5 and YOLO4.
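The 90/10 split described above can be implemented as a short, reproducible shuffle. The function below is a sketch; the file names and seed are illustrative:

```python
import random

def split_dataset(image_names, train_frac=0.9, seed=0):
    """Shuffle annotated image file names and split them into a
    training set and a test set (default 90% / 10%)."""
    names = sorted(image_names)          # deterministic starting order
    rng = random.Random(seed)            # fixed seed -> reproducible split
    rng.shuffle(names)
    k = int(len(names) * train_frac)
    return names[:k], names[k:]
```

Fixing the seed makes the split reproducible across runs, which matters when comparing two detectors on the same data.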

For the trials, a YOLOv5s-based deep learning network was used for the detection of spark images, and 300 rounds of training were carried out on the frontal and side spark images to obtain the corresponding optimal models.
After completing the training with YOLOv5s, we retrained with YOLO4 to obtain the optimal models for each of the two image sets.
Lastly, the obtained models were validated separately using images from the test set to compare the network performance and verify the validity and real-time performance of the models.

YOLO5 Model
Since 2016, the You Only Look Once (YOLO) algorithm has passed through five generations; its most notable feature is its speed, which makes it particularly suitable for real-time target detection. YOLO5 is only 27 MB in size, while the YOLO4 model using the Darknet architecture is 244 MB, making YOLO5 nearly 90% smaller than YOLO4. Furthermore, YOLO5 is the fastest version of this model family, has a very lightweight model size, and is comparable to the YOLO4 benchmark in terms of accuracy.
The YOLO5 model consists of four versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, with successively higher model parameters and performance. YOLO5 maintains the network structure of input, backbone, neck, and head outputs, as shown in Figure 6. YOLOv5s is the smallest model in the YOLO5 family, with a depth multiple of 0.33 and a width multiple of 0.50, while YOLOv5m, YOLOv5l, and YOLOv5x scale up from the YOLOv5s model. The YOLO5 model maintains the mosaic data enhancement method of YOLO4, which randomly crops and stitches four images into one image as training data so that the input side obtains information from four images at the same time; this enriches the image background information on the one hand and reduces the model's reliance on batch size on the other. In addition, an adaptive anchor frame is proposed to calculate the optimal anchor frame value according to the training set.
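The mosaic idea can be illustrated without any framework code: place four training images on a 2 × 2 canvas and cut out a window of the original size. This is a simplified sketch; real mosaic implementations also remap the bounding-box labels, which is omitted here:

```python
import numpy as np

def mosaic4(imgs, rng=None):
    """Stitch four equally sized images onto a 2x2 canvas, then cut a
    random window of the original size out of it (simplified mosaic
    augmentation; label remapping omitted)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = imgs[0].shape[:2]
    canvas = np.zeros((2 * h, 2 * w) + imgs[0].shape[2:], dtype=imgs[0].dtype)
    for k, im in enumerate(imgs):
        r, c = divmod(k, 2)                      # 2x2 grid position
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = im
    y = int(rng.integers(0, h + 1))              # random crop offset
    x = int(rng.integers(0, w + 1))
    return canvas[y:y + h, x:x + w]
```

Each output window typically contains content from several of the four inputs, which is exactly how mosaic enriches the background context seen per batch.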
YOLOv5s consists of four parts: input, backbone, neck, and output. On the input side, the image is automatically scaled, mosaic data enhancement is performed, and the best anchor frame value is automatically calculated. The other three parts are shown in Figure 6. The main modules shown in Figure 6 are described below.

Focus Module
As shown in Figure 7, this module slices the image, expands the input channel by a factor of four, and convolves it once to obtain a downsampled feature map, reducing computational effort and increasing speed. Taking YOLOv5s as an example, the original 640 × 640 × 3 image is fed into the focus structure, sliced into a 320 × 320 × 12 feature map, and convolved once to a 320 × 320 × 32 feature map. The slicing operation is illustrated in Figure 8. As shown in Figure 8, the image is sliced before entering the backbone. The operation obtains a value for every other pixel in a picture, similar to adjacent downsampling, so that four pictures are obtained. The four images are complementary, and no information is lost. In this way, the W and H information is concentrated in the channel space, and the input channel is expanded by a factor of four. Compared with the original RGB 3-channel mode, the spliced picture has 12 channels. Finally, the newly obtained image is subjected to a convolution operation, and a double downsampling feature map without information loss is obtained.
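The slicing operation itself is plain strided indexing. A NumPy sketch of the Focus rearrangement (the subsequent 32-channel convolution is omitted):

```python
import numpy as np

def focus_slice(img):
    """Take every other pixel in both directions to form four
    complementary sub-images, then concatenate them along the channel
    axis: (H, W, C) -> (H/2, W/2, 4C). No information is lost."""
    return np.concatenate([img[0::2, 0::2], img[1::2, 0::2],
                           img[0::2, 1::2], img[1::2, 1::2]], axis=-1)
```

A 640 × 640 × 3 input thus becomes 320 × 320 × 12, exactly as described for YOLOv5s, and every original pixel value appears exactly once in the output.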

BottleneckCSP Module
The BottleneckCSP module adopts the CSPDenseNet structure, as shown in Figure 9. Following the idea of cross-layer connectivity, partial cross-layer connections are made, and features from different layers are fused to obtain a richer feature map; this both increases the depth of the network and saves computational effort.

SPP Module
The Spatial Pyramid Pooling Network (SPP-Net) was proposed by He et al. SPP-Net performs only one convolution operation on the image and pools the corresponding candidate box regions in the feature map using pooling kernels of different sizes, as shown in Figure 10. Three pooling sizes are used: the feature map is divided into 4 × 4, 2 × 2, and 1 × 1 grids, and maximum pooling is applied within each grid cell. With the introduction of the SPP module in YOLO5, the model can be trained on images of different sizes, enhancing the network's generalisation capability. SPP processing can effectively increase the receptive field and separate significant contextual features without losing the original detection speed.
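The fixed-grid pooling can be sketched directly: regardless of the input size, pooling over 4 × 4, 2 × 2, and 1 × 1 grids always yields a 21-value vector per channel. Note that the SPP block inside YOLO5 actually concatenates max-pooled feature maps of different kernel sizes rather than producing a fixed vector; the sketch below illustrates the original SPP-Net idea described in the text:

```python
import numpy as np

def spp_pool(fmap, grids=(4, 2, 1)):
    """Max-pool a (H, W) feature map over 4x4, 2x2 and 1x1 grids and
    concatenate the results into one fixed-length vector (16 + 4 + 1
    = 21 values), independent of the input size."""
    h, w = fmap.shape
    out = []
    for g in grids:
        ys = np.linspace(0, h, g + 1).astype(int)   # grid row edges
        xs = np.linspace(0, w, g + 1).astype(int)   # grid column edges
        for i in range(g):
            for j in range(g):
                out.append(fmap[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max())
    return np.array(out)
```

Because the grid edges scale with the input, an 8 × 8 map and a 10 × 6 map both produce a 21-value output, which is what lets the network accept images of different sizes.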

Output
Instead of YOLOv3's IOU_Loss, the output layer uses GIOU_Loss as a loss function, adding a measure of intersection scale and alleviating the inability of IOU_Loss to optimise for cases in which two boxes do not intersect [22,23].
In comparison with IOU, GIOU solves the problem of the non-differentiability of the loss function when the prediction frame does not intersect with the target frame (IOU = 0); on the other hand, when two prediction frames are the same size and have the same IOU, the IOU loss function cannot distinguish between the ways in which the two frames intersect, which GIOU alleviates. The GIOU computation is given in Algorithm 1 [24].
Algorithm 1: Generalised Intersection over Union [24]
Input: two arbitrary convex shapes A, B ⊆ S ∈ R^n
Output: GIoU
1. For A and B, find the smallest enclosing convex object C, where C ⊆ S ∈ R^n.
2. IoU = |A ∩ B| / |A ∪ B| (1)
3. GIoU = IoU - |C \ (A ∪ B)| / |C|
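For axis-aligned boxes, Algorithm 1 reduces to a few lines, since the smallest enclosing convex object C is simply the bounding box of the two inputs. A sketch:

```python
def giou(a, b):
    """Generalised IoU for two axis-aligned boxes (x0, y0, x1, y1):
    IoU minus the fraction of the smallest enclosing box C that is
    not covered by the union of the two boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    iou = inter / union
    # Smallest enclosing box C of a and b.
    c = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    return iou - (c - union) / c
```

Unlike IoU, this stays informative for disjoint boxes: two non-overlapping boxes yield a negative GIoU that grows more negative as they move apart, which is what gives the loss a useful gradient when the prediction misses the target entirely.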

Datasets
In this paper, we focus on the spark images generated during the grinding process, which can be divided into axonometric and frontal images. For target detection, we used rectangular boxes to mark the spark images, with only one object, i.e., "fire".
During a complete workpiece polishing process, 300 frontal and 92 lateral spark images were collected. The resolution of the spark images captured by the HD industrial camera was 1600 × 1200. We divided the frontal and side spark image datasets into a training set and a test set, with 90% of the images used for training and 10% used for testing [25,26].

(a) Training environment setup
The batch size was set to 32, with 300 epochs and an input image resolution of 640 × 640. Other parameter settings were consistent with the default settings of YOLOv5 [27]. The computer configuration used in the experiment is shown in Table 4.

(b) YOLO5 training parameter settings
The image resolution in the training set was 640 × 640. The number of "epochs" was set to 300, "batch size" was set to 16, "momentum" was set to 0.98 to reduce the oscillation of the gradient descent, and the learning rate was set to 0.01. The other parameters were set to the default values in YOLO5.

(c) YOLO5 Testing parameter settings
The image resolution in the testing set was 640 × 640. The IOU threshold for non-maximum suppression was set to 0.6. Other parameter settings were consistent with the default YOLO5 settings [28,29].
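Greedy non-maximum suppression with this 0.6 IOU threshold can be sketched in pure Python, independent of the YOLO5 implementation:

```python
def nms(boxes, scores, iou_thr=0.6):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    discard every remaining box whose IoU with it exceeds `iou_thr`,
    and repeat. Boxes are (x0, y0, x1, y1); returns kept indices."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thr]
    return keep
```

With a single "fire" class, this is the whole post-processing step: overlapping detections of the same spark region collapse to the single highest-confidence box.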

Training and Analysis of Results
Using an RTX 3060, YOLO5 required 38 min to train on 300 frontal spark images for 300 epochs and only 15 min to train on 92 side spark images, also for 300 epochs, which is very fast. The final trained images are shown in Figures 11 and 12. Figure 11. YOLO5 side spark image training results.
After 300 rounds of training, the accuracy curve, PR curve, and LOSS curve of the frontal spark images were obtained, as shown in Figures 13 and 14.
The loss is divided into three parts in the training process: cls_loss, box_loss, and obj_loss. cls_loss is used to supervise the category classification, box_loss is used to supervise the regression of the detection box, and obj_loss is used to supervise the presence or absence of objects in the grid [30,31].
The same method was used to train the side spark images, and their corresponding accuracy, PR, and LOSS curves were also obtained, as shown in Figures 15 and 16.
As shown by the above curves, after training, the accuracy of the frontal spark images reached up to 0.995 at mAP@0.5, and the accuracy of the side spark images likewise reached up to 0.995 at mAP@0.5. After 300 rounds of training, cls_loss, box_loss, and obj_loss all dropped below 0.01, according to the curves presented in Figure 16.

Forecasting and Analysis of Results
Once the training was completed, the optimal model was obtained, which was used to predict the images of the test set, the results of which are shown in Figure 17.
As shown in Figure 17, when the optimal model generated by YOLO5 training was used to predict spark images, the accuracy of its target detection reached over 0.96, and the detection of a single image took 2 s.

YOLO4 Training and Prediction
For comparison with the advanced and fast YOLO5 detection algorithm, we also trained and predicted spark images with YOLO4. The corresponding computer software and hardware configurations are shown in Table 5. The hardware configuration of the computers used for training and testing with YOLO4 is essentially the same as with YOLO5. The software configuration is slightly different: Cuda version 11.6 was used with the corresponding Cudnn version 8.4, and Darknet served as the deep learning framework [32,33].

YOLO4 Training and Analysis of Results
Training 300 frontal spark images with dimensions of 150 × 230 for a total of 4000 iterations with YOLO4 on an RTX3060 took close to 6 h [34]. Ninety-two side spark images were also successfully trained with YOLO4 for a total of 4000 iterations, likewise taking close to 6 h. The final training curves are shown in Figure 18, which shows that the average loss is 0.348. After training was completed, the optimised model was obtained and used to predict the spark images; the results are shown in Figure 19. A single spark image prediction took 5 s.

Discussion
After YOLO5 and YOLO4 training, we compared their performance metrics. The frontal and lateral sparks were treated separately for the training and test sets, and for the metric comparison we took the average value. The obtained results are shown in Table 6. Compared to YOLO4, YOLO5 trains faster, requires fewer rounds, and achieves higher accuracy. The optimised model generated after training is smaller and takes less time to predict, making it more suitable for real-time target detection than YOLO4. With a higher-performance computer hardware configuration, online tracking and detection of spark images can be achieved.
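The averaging step used for the comparison can be sketched as follows. The metric values here are made up purely for illustration; the real numbers are those reported in Table 6.

```python
def average_metrics(per_class_metrics):
    """Average each metric over the spark classes (frontal, lateral).

    per_class_metrics: dict mapping class name -> dict of metric -> value.
    Returns a dict mapping each metric to its mean across classes.
    """
    classes = list(per_class_metrics.values())
    keys = classes[0].keys()
    return {k: sum(c[k] for c in classes) / len(classes) for k in keys}

# Hypothetical per-class results for one model, for illustration only.
metrics = {
    "frontal": {"mAP@0.5": 0.995, "time_s": 2.0},
    "lateral": {"mAP@0.5": 0.995, "time_s": 2.0},
}
avg = average_metrics(metrics)
```

Averaging frontal and lateral results into one row per model keeps the YOLO5 vs. YOLO4 comparison free of any bias toward the viewpoint with more training images.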

Conclusions
In this paper, YOLO5 and YOLO4 were used to perform target detection on images of sparks generated during abrasive belt polishing of workpieces. The following conclusions can be drawn from the research process described above.
(1) YOLO5 was used for spark recognition in this study. The optimal model obtained after training can track the spark image area quickly and accurately, and with a higher-performance computer hardware configuration, even faster spark image recognition and detection can be achieved.
(2) Compared to YOLO4, the YOLO5 model has the advantages of high detection accuracy and high interference immunity. It achieves good recognition under challenging conditions, such as backlighting or a dim machine-tool processing environment, and can accurately identify and locate the spark image target.
(3) The small size of the YOLO5 model gives it better portability than YOLO4, and with a higher-performance computer hardware configuration, the speed of target detection can reach the millisecond level, which is sufficient for real-time tracking of spark images. This work lays the foundation for future research on the automatic segmentation of spark images and the relationship between the material removal rate and spark images.
Further research will mainly investigate the following aspects: (1) segmentation of a complete spark image from the spark image area detected by YOLO5; (2) the relationship between the material removal rate and spark images; and (3) establishment of a prediction model relating the spark image to the material removal rate in order to realize automatic control of the grinding process.

Conflicts of Interest:
The authors declare no conflict of interest.