An Object Detection Model for Paint Surface Detection Based on Improved YOLOv3

To solve the problems of poor detection performance and false detections when detecting paint surface defects on the five-star feet of office chairs, we propose a defect detection method based on an improved YOLOv3 algorithm. First, a new feature fusion structure is designed to reduce the missed detection rate of small targets. Then, the CIOU loss function is used to improve localization accuracy. In addition, a parallel version of the k-means++ initialization algorithm (K-means||) is used to optimize the parameters of the prior anchors, which improves the match between the prior anchors and the feature layers. We constructed a dataset of paint surface defects on the five-star feet of office chairs and trained the optimized model on it. We then validated the algorithm through comparative experiments with multiple algorithms and different datasets. The experimental results show that the improved YOLOv3 algorithm is effective: its mean average precision on the self-made dataset reaches 88.3%, which is 5.8% higher than that of the original algorithm. The method was also verified on the aluminum dataset from the Aliyun Tianchi competition, where its mean average precision reached 89.2%. The method achieves real-time detection of paint surface defects on the five-star feet of office chairs.


Introduction
In industrial production, the development of information technology has made big data an essential part of intelligent manufacturing. In the office chair industry, the appearance quality of a chair has a significant impact on product sales. Inspecting the paint on the five-star feet is an essential part of ensuring the overall appearance quality of the office chair, so the paint surface of the five-star feet must be checked. Traditionally, workers inspect these paint surfaces visually. Defects can only be judged against existing standards and common sense, so workers need sufficient experience. The work also demands long periods of concentration and causes fatigue, which leads to misjudgment. As a result, manual inspection is inefficient, and its cost rises in the long run.
Traditional target detection methods mainly rely on manually extracted features followed by sliding-window detection. They consist of three parts: region selection, feature extraction, and classification. However, they have obvious shortcomings:

- In natural scenes, features are often difficult to extract because of occlusion, distance, and similar factors.
- The sliding-window region selection strategy is not targeted, so the computational load is large, the time complexity is high, and many windows are redundant.
- The methods are not robust, so missed and false detections often occur.
Advances in computer technology, deep learning, and GPU computing power have made it possible to apply deep learning-based defect detection in industrial manufacturing. The task of object detection is to find all objects of interest in an image and determine their location, size, and category. Deep features have more powerful representation capabilities than traditional handcrafted features, and deep learning-based detection algorithms have gradually become mainstream. These algorithms fall into two categories.

The first category is two-stage detection algorithms, represented by R-CNN (Regions with CNN features) [1,2], Fast R-CNN [3], Faster R-CNN [4,5], etc. These algorithms have low false and missed recognition rates. However, their detection speed is slow, so they cannot be used for real-time detection.

The second category is single-stage object detection algorithms [6], also called end-to-end object detection algorithms. They do not need a candidate-region generation stage. Instead, they directly output the class probabilities and coordinates of objects, and the final result is obtained in a single pass. These algorithms are therefore faster. Representative examples are SSD (Single Shot MultiBox Detector) [7,8], YOLO (You Only Look Once) [9,10], YOLOv2 [11], YOLOv3 [12][13][14][15][16][17], and YOLOv4 [18].

Because its structure is more concise than those of the alternatives, YOLOv3 is more widely used in industry. Although its detection performance is not as good as that of YOLOv4 [19][20][21], its transmission path is simple and its versatility is strong.
Therefore, this paper selects the YOLOv3 algorithm as the basis for detecting paint surface defects on the five-star feet of office chairs.
Although target detection algorithms have developed considerably, some problems remain. Research on small target detection, for example, is not yet mature. Small targets have low resolution and occupy a small proportion of the image's pixels, so little effective information can be obtained from them during detection. Deep layers of a convolutional neural network have large receptive fields, so the features of small targets are hard to extract after multiple downsampling operations. As a result, detection algorithms still perform poorly on small targets, and detection errors often occur [22,23].
Several studies have addressed small target detection:

- Zhang Xu et al. [24] added an anchor at each scale of YOLOv3 to improve the detection accuracy of small targets.
- Li Weigang et al. [25] fused shallow and deep features to form a new large-scale detection layer. They also used a new clustering algorithm to optimize the prior anchor parameters, which improved detection accuracy.
- Xu Lifeng et al. [26] built a feature pyramid in dense blocks at different levels of DenseNet, combining the high resolution of low-level features with the rich semantics of high-level features. They also introduced soft-threshold non-maximum suppression to improve the detection rate and accuracy.

In current research, combining YOLOv3 with new structures or functions is a common way to optimize and improve it.
Building on previous research, this paper uses an improved YOLOv3 deep learning algorithm to detect small objects. The improvements concern the network structure, the clustering algorithm, and the bounding box loss function:

- First, the clustering algorithm is optimized to obtain better anchor boxes, improving average detection accuracy and speed.
- Second, the network structure is improved to enhance small target detection.
- Finally, the bounding box loss function is improved to increase localization accuracy and the overall detection effect.

In addition, we constructed a dataset of paint surface defects on the five-star feet of office chairs and used it for optimization training and validation.

Detection Principle
YOLOv3 integrates the feature pyramid network (FPN), the residual network (ResNet), and other methods. It extracts multiple feature detection layers at different scales, which improves its ability to detect targets of various sizes. YOLOv3 predicts the category and location of objects while generating candidate regions, so it needs no separate second stage and achieves end-to-end detection. YOLOv3 consists mainly of the Darknet-53 feature extraction network and a prediction network. The network structure of YOLOv3 is shown in Figure 1.
Darknet-53 uses five groups of residual blocks to alleviate gradient dispersion (vanishing gradients); its network structure is shown in Figure 2 [14]. The Darknet-53 network is the core of the YOLOv3 algorithm. It consists of multiple convolutional layers, and the YOLO output layers make predictions at three scales: small, medium, and large. Therefore, the target can still be detected even if some feature information is lost or disturbed by external factors.
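The name "53" and the three output scales can be checked with a little arithmetic. The sketch below assumes the stage layout published in the original YOLOv3 report, which this text does not list. In that layout the five stages hold 1, 2, 8, 8, and 4 residual blocks, and each stage starts with a stride-2 convolution. Under that assumption, Darknet-53 has 52 convolutional layers; the 53rd layer is the fully connected layer of the ImageNet classification version.

```python
# Residual blocks per stage of Darknet-53 (layout from the original YOLOv3 report;
# an assumption here, not stated in this text).
RESIDUAL_BLOCKS = [1, 2, 8, 8, 4]


def darknet53_summary(input_size=416):
    """Count convolutional layers and record the feature-map size after each stage."""
    convs = 1                   # initial 3x3 convolution (stride 1)
    size = input_size
    stage_sizes = []
    for n_blocks in RESIDUAL_BLOCKS:
        convs += 1              # 3x3 stride-2 convolution that halves the resolution
        size //= 2
        convs += 2 * n_blocks   # each residual block: a 1x1 conv followed by a 3x3 conv
        stage_sizes.append(size)
    return convs, stage_sizes


convs, stage_sizes = darknet53_summary(416)
print(convs, stage_sizes)  # 52 [208, 104, 52, 26, 13]
```

The last three stage outputs (52 × 52, 26 × 26, and 13 × 13, at strides 8, 16, and 32) feed the three YOLO detection scales. The 104 × 104 stride-4 map is the backbone feature that the network-structure improvement below fuses into a fourth scale.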

Loss Function
The loss function measures the discrepancy between the model's predicted values and the actual values. It plays a vital role in the network's learning speed and in the final prediction performance. The loss function of YOLOv3 is shown in Equation (1).
In Equation (1), the symbols are defined as follows:

- $S$ is the grid size, so $S^2$ is 13 × 13, 26 × 26, or 52 × 52.
- $B$ is the number of prediction boxes per grid cell.
- $I_{i,j}^{obj}$ is an indicator that equals 1 if the $j$-th box of grid cell $i$ contains a target and 0 otherwise. $I_{i,j}^{noobj}$ is its counterpart: it equals 1 if that box contains no target and 0 otherwise.
- The confidence label $\hat{C}_i^j$ depends on whether the bounding box in the grid cell is responsible for predicting an object: it is 1 if the box is responsible and 0 otherwise.
- $P_i^j$ and $\hat{P}_i^j$ are the predicted value and the actual value of the target class probability, respectively.
- $\lambda_{coord}$ and $\lambda_{noobj}$ are the weights of the bounding box loss and the confidence loss, respectively.
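Equation (1) itself does not survive in this text. For reference, a form commonly written for the YOLOv3 loss that is consistent with the symbols above is shown below. The box symbols $x, y, w, h$ (predicted) and $\hat{x}, \hat{y}, \hat{w}, \hat{h}$ (ground truth) are assumptions not defined in the text. The exact terms of the authors' Equation (1) may differ.

$$
\begin{aligned}
Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} I_{i,j}^{obj}\left[(x_i^j-\hat{x}_i^j)^2+(y_i^j-\hat{y}_i^j)^2\right] \\
&+\lambda_{coord}\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} I_{i,j}^{obj}\left[\left(\sqrt{w_i^j}-\sqrt{\hat{w}_i^j}\right)^2+\left(\sqrt{h_i^j}-\sqrt{\hat{h}_i^j}\right)^2\right] \\
&-\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} I_{i,j}^{obj}\left[\hat{C}_i^j\log C_i^j+\left(1-\hat{C}_i^j\right)\log\left(1-C_i^j\right)\right] \\
&-\lambda_{noobj}\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} I_{i,j}^{noobj}\left[\hat{C}_i^j\log C_i^j+\left(1-\hat{C}_i^j\right)\log\left(1-C_i^j\right)\right] \\
&-\sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} I_{i,j}^{obj}\sum_{c\in classes}\left[\hat{P}_i^j(c)\log P_i^j(c)+\left(1-\hat{P}_i^j(c)\right)\log\left(1-P_i^j(c)\right)\right]
\end{aligned}
$$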

Improvement of Network Structure
Many defects on the paint surface of the five-star feet are small. The original YOLOv3 downsamples heavily, so its feature maps have large receptive fields; as a result, small targets are poorly detected and often missed. YOLOv3 borrows the FPN idea and uses multi-scale feature maps to detect objects of different sizes, which improves its ability to predict small objects. It obtains three feature maps of different scales by downsampling 32, 16, and 8 times, and three prior anchors are used for prediction at each scale. This paper improves YOLOv3 by adding one more scale for small target detection.
The 52 × 52 feature map obtained by 8× downsampling is upsampled by a factor of 2 to obtain a 104 × 104 feature map. This map is concatenated and fused with the 104 × 104 feature map produced by 4× downsampling in the backbone network. Predictions are then made from the resulting feature map. After the improvement, the network has a total of 118 fully convolutional layers and four feature scales for independent prediction. The improved network reuses shallow information and can perform regression and classification on the 4× downsampled feature map, which strengthens its ability to detect small objects. The improved network structure is shown in Figure 3, where the dotted box marks the newly added feature scale.
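The cost of the extra scale can be estimated by counting predictions. The sketch below assumes a 416 × 416 input and three anchors per scale. It also assumes eight classes, matching the eight defect types listed in the dataset section. It only performs the shape arithmetic and is not the authors' network code.

```python
ORIGINAL_STRIDES = (32, 16, 8)      # YOLOv3: 13x13, 26x26, 52x52 grids for a 416 input
IMPROVED_STRIDES = (32, 16, 8, 4)   # extra stride-4 scale: 104x104 grid


def grid_sizes(input_size, strides):
    """Side length of the prediction grid at each stride."""
    return [input_size // s for s in strides]


def num_predicted_boxes(input_size, strides, anchors_per_scale=3):
    """Every grid cell predicts one box per anchor assigned to that scale."""
    return sum(anchors_per_scale * g * g for g in grid_sizes(input_size, strides))


def head_channels(num_classes, anchors_per_scale=3):
    """Channels of one YOLO head: per anchor, 4 box offsets + 1 objectness + class scores."""
    return anchors_per_scale * (5 + num_classes)


print(grid_sizes(416, IMPROVED_STRIDES))            # [13, 26, 52, 104]
print(num_predicted_boxes(416, ORIGINAL_STRIDES))   # 10647
print(num_predicted_boxes(416, IMPROVED_STRIDES))   # 43095
print(head_channels(8))                             # 39
```

Roughly four times as many boxes are scored, most of them on the 104 × 104 grid. This is in line with the paper's explanation, in the results section, of why the added scale lowers detection speed.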

Improvement of Bounding Box Loss Function
IOU is an important indicator in object detection that describes how much the predicted box and the ground-truth box overlap. It is calculated as the ratio of the intersection to the union of the two boxes and is often used to evaluate the quality of a predicted bounding box, as shown in Equation (2).
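Equation (2) is missing from this text; it is the usual ratio IOU = |A ∩ B| / |A ∪ B|. The sketch below computes it for axis-aligned boxes in (x1, y1, x2, y2) corner format, which is an assumed coordinate convention.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    # Intersection rectangle; width or height is clamped to 0 when the boxes do not overlap.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7, about 0.142857
print(box_iou((0, 0, 1, 1), (2, 0, 3, 1)))   # 0.0 (near, but not overlapping)
print(box_iou((0, 0, 1, 1), (9, 0, 10, 1)))  # 0.0 (far away)
```

The last two calls illustrate the limitation discussed in the following paragraph. Two non-overlapping boxes receive the same IOU of 0 whether they are near or far.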
In Equation (2), A and B represent the predicted box and the ground-truth box, respectively. IOU is scale-invariant, but it cannot measure the distance between two boxes that do not intersect. As shown in Figure 4, the IOU of box A and box B and the IOU of box A and box C are both 0, even though boxes B and C lie at different distances from box A. In addition, the original bounding box refinement uses an L2-norm loss. When the predicted box does not intersect the ground-truth box, both the IOU and the gradient of the loss are 0, so learning and training cannot proceed.
Machines 2022, 10, x FOR PEER REVIEW 6 of 18
GIOU [27] builds on IOU. Figure 5 shows the regression processes of GIOU and CIOU: box a is the target box, box b is the anchor box, and box c is the anchor box after successive iterations. Both GIOU and CIOU can guide the movement of the detection anchor. GIOU adjusts the prediction anchor through its position, aspect ratio, size, and so on. As Figure 5 shows, when the IOU is 0, GIOU first drives the anchor to overlap the target box and then gradually degenerates into an IOU regression strategy. The whole process is therefore slow and risks divergence. CIOU is introduced to solve this problem. DIOU [28] adds the distance between the bounding box centers to IOU, and CIOU builds on DIOU by also considering the aspect ratio. CIOU thus covers all three geometric factors of bounding box regression: overlap, center distance, and aspect ratio.
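The sketch below computes GIOU [27] and DIOU [28] using their standard definitions:

- GIOU subtracts the fraction of the smallest enclosing box that the union does not cover.
- DIOU subtracts the squared center distance divided by the squared diagonal of the enclosing box.

Boxes are in (x1, y1, x2, y2) format, and the example boxes are made up for illustration.

```python
def _overlap_and_union(a, b):
    """Intersection and union areas of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter, union


def _enclosing(a, b):
    """Smallest axis-aligned box containing both a and b."""
    return min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])


def giou(a, b):
    """GIoU = IoU - (area of enclosing box not covered by the union) / (enclosing area)."""
    inter, union = _overlap_and_union(a, b)
    ex1, ey1, ex2, ey2 = _enclosing(a, b)
    enclose = (ex2 - ex1) * (ey2 - ey1)
    return inter / union - (enclose - union) / enclose


def diou(a, b):
    """DIoU = IoU - (squared centre distance) / (squared enclosing-box diagonal)."""
    inter, union = _overlap_and_union(a, b)
    ex1, ey1, ex2, ey2 = _enclosing(a, b)
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    return inter / union - (dx * dx + dy * dy) / diag2


target, near, far = (0, 0, 1, 1), (2, 0, 3, 1), (5, 0, 6, 1)
# Plain IoU is 0 for both boxes, but GIoU and DIoU rank the nearer box higher.
print(f"GIoU: {giou(target, near):.3f} vs {giou(target, far):.3f}")  # -0.333 vs -0.667
print(f"DIoU: {diou(target, near):.3f} vs {diou(target, far):.3f}")  # -0.400 vs -0.676
```

Used as losses (1 − GIOU or 1 − DIOU), both still provide a gradient when the boxes do not overlap. DIOU penalizes the center distance directly, whereas GIOU acts only through the size of the enclosing box.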
CIOU first quickly pulls the prediction box toward the target without changing its shape. It therefore brings the prediction box back faster than GIOU and makes its IOU greater than 0. Once the IOU is greater than 0, CIOU quickly adjusts the box size. When the IOU exceeds 0.5, the aspect-ratio term of CIOU becomes the main source of the gradient, driving the prediction box toward the aspect ratio of the target box. CIOU is currently one of the more effective bounding box regression criteria, so this paper uses CIOU in place of IOU in the original algorithm. The CIOU loss function is shown in Equation (3).
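Equations (3) to (5) do not survive in this text. Assuming the paper follows the standard CIOU formulation of [28], they read as follows, with the symbols explained in the next paragraph:

$$
\begin{aligned}
L_{CIOU} &= 1 - IOU + \frac{\rho^2\left(b, b^{gt}\right)}{C^2} + \alpha v && (3)\\
\alpha &= \frac{v}{(1 - IOU) + v} && (4)\\
v &= \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 && (5)
\end{aligned}
$$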
In Equation (3), the symbols are defined as follows:

- $\rho^2(b, b^{gt})$ is the squared Euclidean distance between the center points $b$ and $b^{gt}$ of the prediction box and the ground-truth box.
- $C$ is the diagonal length of the smallest enclosing box that contains both the prediction box and the ground-truth box.
- $\alpha$ is a weight function, given in Equation (4).
- $v$ measures the consistency of the aspect ratios and is given in Equation (5).

In Equation (5), $w$ and $h$ are the width and height of the prediction box, and $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth box. The modified loss function is shown in Equation (6).
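The sketch below computes the CIOU loss for one box pair. It is written from the standard definitions in [28]:

- L = 1 − IOU + ρ²/C² + αv
- α = v / ((1 − IOU) + v)
- v = 4/π² · (arctan(w_gt/h_gt) − arctan(w/h))²

These definitions are assumed, not confirmed, to match the paper's Equations (3) to (5). Boxes are in (x1, y1, x2, y2) format. A training implementation would vectorize this computation over tensors, for example in PyTorch.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for one predicted box and one ground-truth box, both (x1, y1, x2, y2)."""
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]

    # Overlap term: plain IoU.
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    iou = inter / (w * h + w_gt * h_gt - inter + eps)

    # Distance term: squared centre distance rho^2 over squared enclosing diagonal C^2.
    rho2 = ((pred[0] + pred[2] - gt[0] - gt[2]) ** 2
            + (pred[1] + pred[3] - gt[1] - gt[3]) ** 2) / 4.0
    enc_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    enc_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency v and its trade-off weight alpha (Equations (5) and (4)).
    v = (4.0 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1.0 - iou) + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v


print(round(ciou_loss((0, 0, 1, 1), (0, 0, 1, 1)), 4))  # 0.0 (perfect match)
print(round(ciou_loss((2, 0, 3, 1), (0, 0, 1, 1)), 4))  # 1.4: IoU is 0, the distance term still acts
```

The small `eps` only guards against division by zero. With identical boxes, the loss is effectively 0.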

Improvement of Clustering Algorithm
The YOLOv3 algorithm uses the K-means clustering algorithm to select anchor boxes. On the COCO dataset, nine prior anchors are obtained by clustering, and their allocation is shown in Table 1. K-means chooses its initial cluster centers randomly. Different initial centers lead to different clustering results, which can slow convergence and cause clustering errors. The K-means++ algorithm [29,30] addresses this problem by choosing initial cluster centers that are sufficiently far apart, with the distance defined as follows.
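The distance itself is not reproduced in this text. YOLO-style anchor clustering commonly uses d(box, centre) = 1 − IOU(box, centre), computed on (width, height) pairs aligned at a common center. The sketch below assumes that distance and shows K-means++ seeding, in which each new center is drawn with probability proportional to its squared distance to the nearest center already chosen. The example box sizes are made up.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given only as (w, h), assuming they share the same centre."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def anchor_distance(box, centre):
    """Clustering distance assumed here: d = 1 - IoU."""
    return 1.0 - wh_iou(box, centre)


def kmeanspp_seeds(boxes, k, rng):
    """K-means++ seeding: each new centre depends on the centres already chosen."""
    centres = [rng.choice(boxes)]
    while len(centres) < k:
        d2 = [min(anchor_distance(b, c) for c in centres) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box coincides with a chosen centre; nothing left to pick
            break
        centres.append(rng.choices(boxes, weights=d2)[0])
    return centres


boxes = [(12, 10), (14, 12), (40, 38), (44, 40), (120, 90), (110, 100)]
print(kmeanspp_seeds(boxes, 3, random.Random(0)))
```

The `while` loop is inherently sequential, because every draw needs the centers chosen so far. K-means|| is designed to remove this limitation.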
However, in K-means++ each new center depends on the centers already selected, so the initialization is inherently sequential. The K-means|| algorithm is used to overcome this limitation. K-means|| is a variant of K-means++ that changes the sampling strategy of each pass: instead of drawing only one sample per pass as K-means++ does, each pass draws O(k) samples. This paper adopts the K-means|| algorithm, whose steps are shown in Algorithm 1.

Algorithm 1 K-means||
Input: data X; number of clusters K; oversampling factor l.
Output: set of prototypes C = {c_1, c_2, …, c_K}.
1. Select one sample uniformly at random from X as the initial candidate center set C.
2. ψ ← φ_X(C)
3. for O(log ψ) times do
4.     C′ ← sample each point x ∈ X independently with probability p_x = l · d²(x, C) / φ_X(C)
5.     C ← C ∪ C′
6. end for
7. Run the weighted K-means++ algorithm on the set of candidate centers to obtain exactly K cluster centers.
8. Run the standard K-means algorithm with the resulting K cluster centers.
Step 2 computes ψ, the initial clustering cost after this selection. In general, it is sufficient to set the oversampling factor l to 2K and to take O(log ψ) as 5. In step 4, the distance from each sample to its nearest cluster center is calculated using Equation (8), and a batch of points is drawn according to the resulting probabilities as candidate cluster centers. Repeating step 4 five times yields a set of candidate cluster centers larger than the preset K. The density of each candidate center is then calculated, and finally steps 7 and 8 are run.
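The sketch below implements Algorithm 1 under the following assumptions:

- Boxes are (w, h) pairs.
- The distance is the 1 − IOU measure commonly used for anchor clustering; Equation (8) is not reproduced in this text.
- l defaults to 2K, and the O(log ψ) loop runs five times, as described above.
- A candidate's weight is the number of boxes closest to it, which is our reading of "density".
- Centers are updated with the mean width and height.

It is an illustration, not the authors' code.

```python
import random


def wh_iou(a, b):
    """IoU of two (w, h) boxes that share a centre."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def nearest(x, centres):
    """Index of the centre closest to box x under d = 1 - IoU (first one on ties)."""
    return min(range(len(centres)), key=lambda i: 1.0 - wh_iou(x, centres[i]))


def d2(x, centres):
    """Squared distance from box x to its nearest centre."""
    return (1.0 - wh_iou(x, centres[nearest(x, centres)])) ** 2


def kmeans_parallel(boxes, k, oversample=None, rounds=5, iters=100, seed=0):
    rng = random.Random(seed)
    oversample = 2 * k if oversample is None else oversample  # l = 2K
    cand = [rng.choice(boxes)]                     # step 1: one uniform sample
    for _ in range(rounds):                        # steps 3-6: O(log psi) taken as 5
        phi = sum(d2(x, cand) for x in boxes)      # current cost phi_X(C)
        if phi == 0:
            break
        # Step 4: keep each point independently with probability l * d^2(x, C) / phi.
        cand += [x for x in boxes if rng.random() < oversample * d2(x, cand) / phi]
    # Weight ("density") of each candidate = number of boxes closest to it.
    weights = [0] * len(cand)
    for x in boxes:
        weights[nearest(x, cand)] += 1
    # Step 7: weighted K-means++ on the candidate set.
    centres = [rng.choices(cand, weights=weights)[0]]
    while len(centres) < k:
        w = [wt * d2(c, centres) for c, wt in zip(cand, weights)]
        if sum(w) == 0:
            break
        centres.append(rng.choices(cand, weights=w)[0])
    # Step 8: standard K-means (Lloyd iterations) with mean width/height updates.
    for _ in range(iters):
        groups = [[] for _ in centres]
        for x in boxes:
            groups[nearest(x, centres)].append(x)
        new = [(sum(b[0] for b in g) / len(g), sum(b[1] for b in g) / len(g)) if g else c
               for g, c in zip(groups, centres)]
        if new == centres:
            break
        centres = new
    return sorted(centres, key=lambda c: c[0] * c[1])


boxes = [(10, 12)] * 20 + [(48, 52)] * 20 + [(100, 96)] * 20
print(kmeans_parallel(boxes, 3))
```

Each sampling pass in step 4 touches every point independently, so a pass can be parallelized. This is the property that distinguishes K-means|| from the one-at-a-time K-means++ seeding.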
Twelve anchor boxes are selected according to the cluster centers. Logistic regression is then applied to the confidence of each anchor box at the different scales, the bounding boxes are predicted, and the most suitable category is selected according to the confidence. This paper uses the optimized K-means|| clustering algorithm with K = 12; after the clustering iterations, the corresponding anchor boxes are selected as (20 …).

Figure 6 compares the three clustering algorithms for different numbers of cluster centers, giving the accuracy of each algorithm for 3, 6, 9, 12, 15, and 18 centers. As Figure 6 shows, K-means|| is generally more accurate than the other two clustering algorithms, and when K is 12 its accuracy is significantly better than theirs.
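The anchor values obtained in the paper are cut off in this text, so the sketch below uses hypothetical (w, h) values purely for illustration. It shows one common YOLO convention, which is an assumption here. The 12 clustered anchors are sorted by area; the three smallest go to the finest (stride-4) grid, the next three to stride 8, and so on.

```python
def assign_anchors(anchors, strides=(4, 8, 16, 32), per_scale=3):
    """Map each detection stride to its anchors; the smallest anchors go to the finest grid."""
    if len(anchors) != per_scale * len(strides):
        raise ValueError(f"expected {per_scale * len(strides)} anchors, got {len(anchors)}")
    ordered = sorted(anchors, key=lambda a: a[0] * a[1])
    return {s: ordered[i * per_scale:(i + 1) * per_scale] for i, s in enumerate(strides)}


# Hypothetical anchors (w, h) in pixels for a 416 x 416 input; NOT the paper's values.
anchors = [(8, 9), (12, 14), (16, 20), (25, 22), (30, 40), (45, 38),
           (60, 70), (80, 60), (100, 120), (150, 110), (200, 230), (320, 300)]
for stride, group in assign_anchors(anchors).items():
    print(stride, group)
```

Giving the smallest anchors to the 104 × 104 grid is what lets the added scale specialize in small defects such as paint bubbles and dirty spots.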

Experimental Dataset
The five-star feet are a key component of the office chair, supporting and stabilizing the whole chair; they are shown in Figure 7. At present, five-star feet for office chairs are made mainly of metal or nylon, and the dataset in this paper mainly considers five-star feet made of metal. The images are of office chair five-star feet and were taken with industrial cameras. The self-made dataset was augmented and optimized with scripts to obtain 2300 images. The augmentation methods mainly include cropping, translation (parallel movement), adding noise, and dimming. The cropping steps are shown in Algorithm 2, the translation steps in Algorithm 3, and the augmentation results in Figure 8. Augmentation enriches the dataset and makes the images more suitable for training. Samples were labeled in VOC format with the LabelImg software, and the dataset was divided into training, validation, and test sets. Because the original pictures have different sizes, all pictures were resized to 416 × 416 pixels.

Algorithm 2 Cropping
1. …
7. C_xmin ← max(0, int(x_min − random.uniform(0, x_min)))
8. C_ymin ← max(0, int(y_min − random.uniform(0, y_min)))
9. C_xmax ← min(W, int(x_max + random.uniform(0, d_r)))
10. C_ymax ← min(H, int(y_max + random.uniform(0, d_b)))
11. crop_img ← img[C_ymin : C_ymax, C_xmin : C_xmax]
12. crop_bboxes ← [B_xmin − C_xmin, B_ymin − C_ymin, B_xmax − C_xmin, B_ymax − C_ymin]

The algorithm works as follows. First, a set of data is obtained from the input parameters, as shown in the first four steps. Then, the maximum right-shift and down-shift distances that still include all target boxes are calculated. Steps 7 to 10 randomly expand the minimal box that contains all targets while keeping it within the image bounds. Finally, the cropped image and the cropped bounding box information are obtained.
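The sketch below is a runnable version of the cropping augmentation in Algorithm 2, under these assumptions:

- Pixel coordinates are integers.
- Boxes are [xmin, ymin, xmax, ymax].
- The image is a list of pixel rows; a NumPy array would be sliced the same way with `img[y0:y1, x0:x1]`.

Steps 1 to 6 are not legible in this text. The set-up here (the minimal box around all targets and the free space to its right and below) is therefore reconstructed from the description. The right and bottom edges move outward, since steps 7 to 10 are described as expanding the box.

```python
import random


def random_crop_keep_boxes(img, bboxes, rng=random):
    """Randomly crop img so that every bounding box stays fully inside the crop."""
    h, w = len(img), len(img[0])
    # Smallest box enclosing all targets.
    x_min = min(b[0] for b in bboxes)
    y_min = min(b[1] for b in bboxes)
    x_max = max(b[2] for b in bboxes)
    y_max = max(b[3] for b in bboxes)
    d_r, d_b = w - x_max, h - y_max          # free space to the right and below
    # Steps 7-10: expand the enclosing box by a random margin, clamped to the image.
    cx_min = max(0, int(x_min - rng.uniform(0, x_min)))
    cy_min = max(0, int(y_min - rng.uniform(0, y_min)))
    cx_max = min(w, int(x_max + rng.uniform(0, d_r)))
    cy_max = min(h, int(y_max + rng.uniform(0, d_b)))
    # Steps 11-12: crop the pixels and shift the boxes into crop coordinates.
    crop_img = [row[cx_min:cx_max] for row in img[cy_min:cy_max]]
    crop_boxes = [[b[0] - cx_min, b[1] - cy_min, b[2] - cx_min, b[3] - cy_min] for b in bboxes]
    return crop_img, crop_boxes


img = [[0] * 100 for _ in range(80)]   # 80 rows x 100 columns
crop, boxes = random_crop_keep_boxes(img, [[20, 10, 40, 30], [50, 40, 70, 60]], random.Random(1))
print(len(crop), len(crop[0]), boxes)
```

Because the crop only grows outward from the enclosing box, no target is ever cut. Only the amount of background around the targets changes.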
The translation process in Algorithm 3 works as follows. First, a set of data is obtained from the input parameters. Then, the maximum right-shift and down-shift distances that still include all target boxes are calculated. In step 7, x is the number of pixels to move left or right (positive is right, negative is left), and y is the number of pixels to move up or down (positive is up, negative is down). In step 8, M is an affine transformation matrix that represents the translation or rotation. Finally, the shifted image and the corresponding shifted bounding boxes are obtained.
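Algorithm 3 itself is not reproduced in this text. The sketch below is one plausible reading of the description:

1. Draw shifts that keep every box inside the image.
2. Build the 2 × 3 translation matrix M = [[1, 0, dx], [0, 1, dy]] and apply it to the box corners.
3. Shift the pixels with a small pure-Python loop. With OpenCV, `cv2.warpAffine(img, M, (w, h))` would typically do this step.

The sketch uses image coordinates, in which a positive dy moves content down. The sign convention of the paper's step 7 may differ.

```python
import random


def shift_image(img, dx, dy, fill=0):
    """Translate a list-of-rows image by (dx, dy); uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy            # source pixel that lands on (x, y)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def random_translate(img, bboxes, rng=random):
    """Shift image and boxes by a random offset that keeps all boxes inside the image."""
    h, w = len(img), len(img[0])
    x_min = min(b[0] for b in bboxes)
    y_min = min(b[1] for b in bboxes)
    x_max = max(b[2] for b in bboxes)
    y_max = max(b[3] for b in bboxes)
    # int() truncates toward zero, so the shift never exceeds the free space.
    dx = int(rng.uniform(-x_min, w - x_max))
    dy = int(rng.uniform(-y_min, h - y_max))
    M = [[1, 0, dx], [0, 1, dy]]               # 2x3 affine (translation) matrix

    def apply(x, y):
        return (M[0][0] * x + M[0][1] * y + M[0][2],
                M[1][0] * x + M[1][1] * y + M[1][2])

    new_boxes = [[*apply(b[0], b[1]), *apply(b[2], b[3])] for b in bboxes]
    return shift_image(img, dx, dy), new_boxes
```

Writing the shift as an affine matrix means the same code path could also handle rotation by changing M. This matches the remark that M represents translation or rotation.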
Based on actual production, this paper divides the paint surface defects of the five-star feet into paint powder bulge, paint coating cracking, paint bubble, paint flow, base color leakage, dirty spot, scratch, and dent defects, among others. This paper mainly addresses the detection of a single defect on a five-star foot, meaning that only one type of paint surface defect appears in a picture. Figure 9 shows examples of each type:

- First column from the left: paint powder bulge, paint coating cracking, and paint bubble defects.
- Second column: paint flow, base color leakage, and dirty spot defects.
- Third column: scratch and dent defects.

As Figure 9 shows, paint bubbles and dirty spots are the main small target defect types in the self-made dataset. Paint bubble defects include defects of 5 mm or less, and dirty spot defects include defects of 6 mm² or less. The original YOLOv3 algorithm is insufficient for detecting such small defects, so the improved YOLOv3 algorithm is applied to the self-made dataset to verify its performance.

Evaluation Indicators
When evaluating model performance, both precision and recall usually need to be considered. Equation (9) gives the precision and Equation (10) the recall. Average precision (AP) is defined as the average of the precision over different recall levels and evaluates the detection accuracy for a single class. In target detection, the mean average precision (mAP) is usually used to evaluate overall model performance. The missed detection rate for small targets is evaluated by comparing the predictions of YOLOv3 before and after the improvement. Precision and recall are defined as follows.
In Equations (9) and (10), TP is the number of positive samples correctly predicted as positive, FP is the number of negative samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative.
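Equations (9) and (10) are not reproduced in this text. With the symbols just defined, they are the standard definitions:

$$P = \frac{TP}{TP + FP} \qquad (9)$$

$$R = \frac{TP}{TP + FN} \qquad (10)$$

Take a hypothetical example of 45 correct detections, 5 false alarms, and 10 missed defects. It gives P = 45/50 = 0.90 and R = 45/55 ≈ 0.82.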
Precision is the proportion of correctly predicted samples among all samples predicted as a given category. Recall is the proportion of correctly predicted samples among all actual samples of that category. In this paper, model performance is evaluated with two indicators, mAP and fps (frames per second), calculated as follows.
$$\mathrm{mAP} = \frac{1}{n}\sum_{i=1}^{n} AP(i) \qquad (11)$$

$$fps = \frac{NumFigure}{TotalTime} \qquad (12)$$

In Equation (11), AP(i) is the detection accuracy of category i, and n is the number of categories. In Equation (12), NumFigure is the total number of detected pictures, and TotalTime is the total detection time.
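The paper does not state which AP interpolation it uses. The sketch below assumes the all-point interpolation used by PASCAL VOC since 2010. It also assumes that detections have already been matched to ground truth at IOU ≥ 0.5 (`is_tp`), consistent with the IOU = 0.5 evaluation setting reported in the next section. The input numbers are made up.

```python
def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_tp: whether each detection matched a
    ground-truth box; num_gt: number of ground-truth boxes of this class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:                          # sweep the confidence threshold downwards
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)          # recall, Equation (10)
        precisions.append(tp / (tp + fp))    # precision, Equation (9)
    # Precision envelope: make precision non-increasing as recall grows.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the stepped precision-recall curve.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(ap_per_class):
    """Equation (11): mean of the per-class APs."""
    return sum(ap_per_class) / len(ap_per_class)


def frames_per_second(num_images, total_seconds):
    """Equation (12): number of detected pictures divided by total detection time."""
    return num_images / total_seconds


print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2))  # 0.8333...
print(frames_per_second(500, 10.0))                                       # 50.0
```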

Analysis of Experimental Results
The experimental platform is configured as follows:

- CPU: AMD Ryzen 7 5800H with Radeon Graphics
- Memory: 16 GB
- GPU: NVIDIA GeForce RTX 3060 Laptop GPU with 6 GB of memory
- Operating system: Windows
- Software environment: CUDA 11.4 and cuDNN v8.2.2

The network model was built with PyTorch. Following the idea of transfer learning, darknet53.conv.74 was loaded as the pre-trained weights, and training ran for a total of 10,000 iterations. The initial configuration parameters (initial learning rate, number of channels, momentum, mini-batch size, etc.) were kept the same as in the original YOLOv3 model. Figure 10 shows the convergence curve of the average loss during training: the loss drops rapidly at first and begins to stabilize at around 1000 iterations.

The experimental results are shown in Figure 11. The first column from the left shows paint powder bulge, paint coating cracking, and paint bubble defects; the second column shows paint flow, base color leakage, and dirty spot defects; and the third column shows scratch and dent defects. The improved algorithm achieves an mAP of 88.3% and a detection speed of 50 f·s⁻¹, which meets the accuracy and real-time requirements for inspecting the paint surface of the five-star feet.
Four sets of experiments verify the superiority of the proposed model:

1. A comparison of models using anchor boxes obtained by different clustering algorithms.
2. A comparison of models using different bounding box loss functions.
3. A comparison of Faster R-CNN, SSD, YOLOv3, and the proposed model on the office chair five-star feet paint defect dataset.
4. Detection of defects in the aluminum dataset released by the Aliyun Tianchi Competition [24], comparing the proposed model with the model proposed in [24].

All test results are reported at an IOU threshold of 0.5.

1. Improved Network Structure and Clustering Algorithm
Table 2 compares the performance of the detection algorithms when their anchor boxes are obtained with different clustering algorithms. Changing the prior anchor clustering method from K-means to K-means|| improves the mAP of the original YOLOv3 algorithm. With the same clustering algorithm, the improved YOLOv3 algorithm achieves a significantly higher mAP than the original because of the adjustments to its network structure. With the same detection algorithm, K-means|| yields a higher mAP than both K-means and K-means++. Combining the improved YOLOv3 algorithm with the new anchor boxes clustered by K-means|| raises the defect detection mAP to 88.3%.
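Table 2 attributes part of the gain to seeding anchor clustering with K-means||, the parallel variant of k-means++ initialization named in the abstract. The paper does not list its clustering code, so the following is only a sketch under stated assumptions: boxes are clustered on (width, height) with the 1 − IoU distance commonly used for YOLO anchors, that distance is squared in the seeding costs, the oversampling factor is 2k, five sampling rounds are used, and cluster centers are updated with the mean. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

WH = Tuple[float, float]  # (width, height) of a labelled box


def iou_wh(a: WH, b: WH) -> float:
    """IoU of two boxes that share a center, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def nearest(p: WH, centers: Sequence[WH]) -> Tuple[int, float]:
    """Index of the closest center and the squared (1 - IoU) distance to it."""
    return min(((i, (1.0 - iou_wh(p, c)) ** 2) for i, c in enumerate(centers)),
               key=lambda t: t[1])


def weighted_kmeanspp(cands: List[WH], weights: List[float], k: int,
                      rng: random.Random) -> List[WH]:
    """k-means++ seeding on a weighted candidate set (may stop early if candidates run out)."""
    centers = rng.choices(cands, weights=weights, k=1)
    while len(centers) < k:
        d2 = [w * nearest(c, centers)[1] for c, w in zip(cands, weights)]
        if sum(d2) == 0:  # every candidate already coincides with a center
            break
        centers.append(rng.choices(cands, weights=d2, k=1)[0])
    return centers


def kmeans_parallel_seed(points: List[WH], k: int, rng: random.Random,
                         rounds: int = 5) -> List[WH]:
    """K-means|| seeding: oversample candidates over a few rounds, then reduce to k."""
    ell = 2 * k  # oversampling factor (assumed value)
    cands = [rng.choice(points)]
    for _ in range(rounds):
        costs = [nearest(p, cands)[1] for p in points]
        phi = sum(costs)
        if phi == 0:
            break
        # Each point joins the candidate set independently with prob. ell * cost / phi.
        cands += [p for p, c in zip(points, costs) if rng.random() < min(1.0, ell * c / phi)]
    weights = [0.0] * len(cands)
    for p in points:  # weight = number of points closest to each candidate
        weights[nearest(p, cands)[0]] += 1.0
    return weighted_kmeanspp(cands, weights, k, rng)


def anchor_kmeans(points: List[WH], k: int, iters: int = 100, seed: int = 0) -> List[WH]:
    """Cluster box sizes into k anchors: K-means|| seeding, then Lloyd iterations."""
    rng = random.Random(seed)
    centers = kmeans_parallel_seed(points, k, rng)
    for _ in range(iters):
        groups: List[List[WH]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)[0]].append(p)
        new = [(sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
               for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return sorted(centers, key=lambda wh: wh[0] * wh[1])
```

For the improved four-scale model, a call such as `anchor_kmeans(box_sizes, k=12)` (with `box_sizes` a hypothetical list of labelled box widths and heights) would return 12 anchors sorted by area, three per detection scale. The oversampling rounds are what distinguish K-means|| from plain k-means++: candidates are drawn in a few batches rather than one at a time.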

2. Comparative Analysis of the Performance of Different Loss Functions
Figure 12 compares the performance of algorithms using different loss functions. As Figure 12 shows, the algorithms using CIOU perform better to some extent than those using the other loss functions. With the original YOLOv3 algorithm, the CIOU loss yields an mAP 2.2% higher than the IOU loss; with the improved YOLOv3 algorithm, the CIOU loss yields an mAP 2.7% higher than the IOU loss.
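CIOU extends the IoU loss with a normalized center-distance term and an aspect-ratio term: L = 1 − IoU + ρ²/c² + αv, where ρ is the distance between the predicted and ground-truth box centers, c is the diagonal of the smallest box enclosing both, v = (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))², and α = v/((1 − IoU) + v). The section reports only the resulting mAP gains, so the scalar sketch below illustrates this standard formulation rather than the authors' implementation; in training it would be written with tensor operations so that gradients flow. The function name and the (cx, cy, w, h) box layout are assumptions.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def ciou_loss(pred: Box, target: Box, eps: float = 1e-9) -> float:
    """Complete-IoU loss between one predicted box and one ground-truth box."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target
    # Convert both boxes to corner coordinates.
    p1x, p1y, p2x, p2y = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t1x, t1y, t2x, t2y = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2
    # Plain IoU term.
    iw = max(0.0, min(p2x, t2x) - max(p1x, t1x))
    ih = max(0.0, min(p2y, t2y) - max(p1y, t1y))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)
    # Squared center distance, normalized by the squared enclosing-box diagonal.
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    cw = max(p2x, t2x) - min(p1x, t1x)
    ch = max(p2y, t2y) - min(p1y, t1y)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

For two 2 × 2 boxes whose centers are 4 units apart, the IoU loss is 1 no matter how far apart they are, whereas this CIOU loss evaluates to 1 + 16/40 = 1.4 and keeps growing with distance, so non-overlapping predictions still receive a useful localization signal.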

3. Comparative Analysis of the Performance of Different Algorithms
Table 3 compares the performance of different algorithms on the self-made paint surface dataset of office chair five-star feet. The improved algorithm in this paper reaches an mAP of 88.3%, higher than that of all the other algorithms compared. Its detection speed of 50 f·s⁻¹ is lower than that of the original algorithm but still meets the real-time requirement. The speed reduction is mainly because the network adds a detection scale, which increases the amount of computation. Figure 13 compares the detection results of the original YOLOv3 algorithm and the improved YOLOv3 algorithm in this paper. The improved algorithm increases the number of detection scales from 3 to 4, so 12 prior anchors must be set. The results show that the improved algorithm achieves a higher AP than the original YOLOv3 algorithm for every defect type; for example, the AP for dirty spots and paint bubbles rose from 76% and 83% to 81% and 88%, respectively. The improved YOLOv3 algorithm therefore meets the industrial detection requirements for the paint defect types of office chair five-star feet.
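The section states that the improved model predicts at four scales with 12 prior anchors (three per scale), and the dataset has eight defect types. The sketch below works out the resulting head shapes and candidate-box counts, assuming a 416 × 416 input and assuming the added scale is a stride-4 (104 × 104) grid; neither the input size nor the strides are given in this section, and the helper names are hypothetical. Anchors are assigned by area, smallest to the finest grid, following the usual YOLOv3 convention.

```python
from typing import Dict, List, Sequence, Tuple

WH = Tuple[float, float]


def head_shapes(input_size: int = 416,
                strides: Sequence[int] = (32, 16, 8, 4),
                anchors_per_scale: int = 3,
                num_classes: int = 8) -> List[Tuple[int, int, int]]:
    """(grid_h, grid_w, channels) of each detection head.

    Each anchor predicts 4 box offsets, 1 objectness score and one score per class.
    """
    channels = anchors_per_scale * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


def total_predictions(shapes: Sequence[Tuple[int, int, int]],
                      anchors_per_scale: int = 3) -> int:
    """Number of candidate boxes the network outputs per image."""
    return sum(h * w * anchors_per_scale for h, w, _ in shapes)


def assign_anchors(anchors: Sequence[WH], strides: Sequence[int]) -> Dict[int, List[WH]]:
    """Give the smallest anchors to the finest grid (smallest stride)."""
    if len(anchors) % len(strides):
        raise ValueError("anchors must divide evenly across scales")
    per = len(anchors) // len(strides)
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    return {s: list(ordered[i * per:(i + 1) * per]) for i, s in enumerate(sorted(strides))}


four = head_shapes()
three = head_shapes(strides=(32, 16, 8))
print(four)  # [(13, 13, 39), (26, 26, 39), (52, 52, 39), (104, 104, 39)]
print(total_predictions(three), total_predictions(four))  # 10647 43095
```

Under these assumptions, the number of candidate boxes per image grows from 10,647 to 43,095, which is in line with the authors' explanation that the extra detection scale increases computation and lowers detection speed.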

Comparative Analysis of Algorithm Performance Based on Public Datasets
We conducted a comparative experiment on algorithm performance using the Aliyun Tianchi aluminum dataset; the results are shown in Table 4. The improved algorithm in this paper reaches an mAP of 89.2% on this dataset, which is also higher than that of the improved algorithm proposed in [24].

Discussion and Conclusions
We propose an improved YOLOv3 algorithm to solve the problem of paint defect detection on the five-star feet of office chairs. The clustering step is optimized by using the K-means|| algorithm instead of the traditional K-means algorithm to obtain better anchor boxes, and the network structure and loss function are improved to produce a fast and accurate end-to-end paint defect detection algorithm for office chair five-star feet.
The following conclusions are drawn from the experiments.

1. The new anchor boxes obtained by K-means|| clustering increase the mAP by 5.8%.

2. The network structure of the YOLOv3 algorithm is optimized by adding a new detection scale, which improves its ability to detect small target defect samples, and the CIOU loss function is adopted to improve localization accuracy. Compared with the original algorithm, the mAP increases by 5.8%.

3. On the self-made five-star feet paint surface dataset, the mAP for detecting eight types of paint surface defects reached 88.3%, while the detection speed was maintained at 50 fps. Further verification on the aluminum dataset released by the Aliyun Tianchi Competition showed that the improved YOLOv3 algorithm reached an mAP of 89.2%, with the detection speed maintained at 50 fps.

4. Comparative experiments with other algorithms show that the improved YOLOv3 algorithm achieves faster detection speed and better detection accuracy for small target defects.
The improved algorithm achieves fast and accurate end-to-end detection of paint defects on the five-star feet of office chairs.
Adding a new detection scale increases the overall computational cost of the algorithm, which reduces detection speed. Future work will therefore further optimize the network structure and improve the optimizer to raise detection performance. In addition, the preset anchor boxes both add parameters to the algorithm and slow detection, so we will next study how to eliminate the dependence on anchor boxes and improve overall detection speed.