Research on the Small Target Recognition Method of Automobile Tire Marking Points Based on Improved YOLOv5s

Abstract: At present, the identification of tire marking points relies primarily on manual inspection, which is not only time-consuming and labor-intensive but also prone to false detections, significantly impacting enterprise efficiency. To achieve accurate recognition of tire marking points, this study proposes a small target feature recognition method for automotive tire marking points. In image pre-processing, MSRCR (Multi-Scale Retinex with Color Restoration) is applied to enhance image features, allowing the method to adapt to different detection environments. The YOLOv5s network is improved in three ways: a parameter-free simAM (Similarity Attention Mechanism) attention module is added to improve detection efficiency; a small target prediction head is added to reduce the minimum recognizable target size; and the loss function is replaced to improve recognition performance. Mean average precision (mAP), precision, and recall are used as evaluation metrics. A comparison experiment shows that the mAP of the traditional YOLOv5s network and the improved network is 0.86 and 0.955, respectively, an increase of 9.5%; precision rises from 0.87 to 0.96, an improvement of 9%; and recall rises from 0.84 to 0.89, an improvement of 4%. The improved YOLOv5s model yields higher confidence for small target recognition and is better suited to practical detection tasks.


Introduction
The quality of car tires directly impacts the safety and comfort of the vehicle. Therefore, tire marking point detection is crucial before automotive tires leave the factory. Downstream companies that use these tires require rapid, high-precision inspection of large numbers of tire marking points for efficient storage. Enterprise research shows that current practice predominantly relies on manual inspection, which is time-consuming and labor-intensive. Companies aspire to replace manual inspection with machine vision technology to achieve automated detection, offering a more efficient and accurate solution [1,2]. As early as the 1990s, domestic and foreign scholars conducted research on the identification of tire marking points. In 1999, Bridgestone of Japan improved its rotating marking point identification system by integrating a marking device and an identification device, which greatly reduced the installation space and allowed marking points to be identified immediately after they were marked on the tire [3]. The Identity CONTROL TMI 8303.1 closed-loop control system developed by Micro-Epsilon, Germany, is used for online identification of tire marking points and can identify the color of the marking points, the tire, and the marking point locations. The IRIS_M, developed by SICK in Germany, uses three-dimensional images to locate the tire and two-dimensional images to identify the shape and color of the points [4]. In [5], a support vector machine approach was used to achieve shape and color recognition of tire identification points. In [6], a convolutional neural network trained with SGD was used for color and shape recognition. In [7], recognition of the marking area was achieved by a template matching method. The above methods represent most existing approaches to recognizing automotive tire marking point features. The advantages and shortcomings of the methods in the literature and the method used in this paper are shown in Table 1. Based on enterprise research and a literature review, it is evident that most current methods for automotive tire marking point recognition rely on traditional image processing or manual recognition. However, these approaches suffer from low recognition efficiency and accuracy. In addition, deep learning, a powerful image feature recognition method, has seen limited use in this area. To address this issue and improve the speed, accuracy, recognition diversity, and reliability of tire marking point recognition, this paper proposes a deep-learning-based method for small target recognition in automotive tire marking points. The proposed method demonstrates high accuracy in recognizing marking points both in tire industrial production environments and in outdoor natural environments.
The article is structured as follows: 1. Introduction. The research background of tire marking points and its development trend are presented through company research and a literature summary. 2. Materials and Methods. This section introduces the overall scheme of tire marking point recognition, the composition of the collected data, and the image pre-processing used to counter the influence of light, weather, and other factors. 3. Tire Marking Point Identification Network Structure. An improved YOLOv5s algorithm is proposed for tire marking point recognition: (1) adding an attention mechanism; (2) changing the loss function; (3) adding a small target prediction head. 4. Model Training and Testing. This section verifies whether the improved algorithm outperforms the original algorithm and confirms the rationality of each improvement through ablation experiments. 5. Conclusions. The whole text is summarized and conclusions are drawn.

Materials and Methods
The steps of the tire marking point recognition algorithm are shown in Figure 1a, and the flow chart is shown in Figure 1b. First, the captured raw image is pre-processed. Second, the processed images are expanded and labeled. Finally, an improved YOLOv5s is used to train on the images and obtain the model data.

Dataset Preparation
A 14 MP industrial camera was used to take the shots and construct the dataset, which consisted of two parts.
The first part of the dataset is derived from the actual inspection environment in the factory, as shown in Figure 2. In the actual inspection, the tires are transported by a transport unit to the inspection area and pushed into the warehouse when the inspection is completed.
The second part of the dataset is derived from natural environment photography, collecting images of tires in different environments and weather; part of the collection is shown in Figure 3.

Image Pre-Processing
In industrial and natural environment acquisition, many factors can degrade image quality, such as interference from industrial lighting, overexposure due to strong light, and low contrast on cloudy and rainy days. Image processing is therefore needed to improve image quality [8]. Multi-Scale Retinex with Color Restoration (MSRCR) was introduced to pre-process the images and enhance the differentiation between the marking points and the background; its implementation is shown in Equation (1) [9,10].
$$R_{MSRCR_i}(x,y) = C_i(x,y)\, R_{MSR_i}(x,y) \tag{1}$$

where $R_{MSRCR_i}(x,y)$ represents the recovered image of the $i$th channel after applying the algorithm. MSRCR adds a color restoration factor $C_i$, shown in Equation (2), which is used to adjust the color distortion caused by contrast enhancement in local areas of the image:

$$C_i(x,y) = \beta \log\!\left[\frac{\alpha\, I_i(x,y)}{\sum_{j=1}^{3} I_j(x,y)}\right] \tag{2}$$

where $\alpha$ is the adjustment factor and $\beta$ is the gain constant. In Equation (3), $R_{MSR_i}(x,y)$ represents the recovered image of the $i$th channel after multi-scale filtering, $F_m(x,y)$ is a single-scale Gaussian filter, $\lambda_m$ represents the weight, and $M$ is the number of scales [11-13]:

$$R_{MSR_i}(x,y) = \sum_{m=1}^{M} \lambda_m \left\{ \log I_i(x,y) - \log\left[F_m(x,y) * I_i(x,y)\right] \right\} \tag{3}$$

A comparison of the images before and after processing is shown in Figure 4. From the results in Figure 4, it can be seen that after pre-processing, environmental factors such as strong light and darkness, which previously made the marking points unclear, have been resolved.
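As an illustration, the MSRCR pre-processing of Equations (1)-(3) can be sketched in plain NumPy. This is a minimal sketch, not the implementation used in the study: the equal weights $\lambda_m = 1/M$, the scale values, and the choice of $\alpha$ and $\beta$ are common defaults and assumptions here, and the final min-max stretch back to 0-255 is only one of several possible normalizations.

```python
import numpy as np

def _gaussian_blur(channel, sigma):
    """Separable Gaussian blur via two 1-D convolutions (F_m * I in Eq. (3))."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(np.convolve, 1, channel, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

def msrcr(image, sigmas=(15, 80, 250), alpha=125.0, beta=46.0):
    """Multi-Scale Retinex with Color Restoration on an RGB image array."""
    img = image.astype(np.float64) + 1.0            # offset avoids log(0)
    # R_MSR (Eq. (3)): log differences averaged over M scales, lambda_m = 1/M
    msr = np.zeros_like(img)
    for sigma in sigmas:
        for c in range(img.shape[2]):
            msr[..., c] += np.log(img[..., c]) - np.log(_gaussian_blur(img[..., c], sigma))
    msr /= len(sigmas)
    # Color restoration factor C_i (Eq. (2))
    crf = beta * (np.log(alpha * img) - np.log(img.sum(axis=2, keepdims=True)))
    out = crf * msr                                  # Eq. (1)
    # Stretch the result back to a displayable 0-255 range
    out = (out - out.min()) / (out.max() - out.min() + 1e-12) * 255.0
    return out.astype(np.uint8)
```

In practice the smaller scales sharpen local contrast around the marking points while the larger scales preserve overall tonal balance.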

Data Enhancement
In the deep learning training process, the number of images has a significant impact on the model: the larger the dataset, the more accurate the training results. Therefore, the pre-processed images were flipped, rotated, and randomly scaled to expand the number of samples [14]. A total of 5600 sample images were obtained to construct the dataset, of which 80% were used as the training set, 10% as the validation set, and 10% as the test set. The dataset was labeled using LabelImg and converted to txt format.
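The expansion and split described above can be sketched as follows. This is an illustrative sketch only: the function names, the 90-degree rotation steps, and the 0.8-1.2 scaling range are assumptions, not the exact parameters used in the study.

```python
import numpy as np

def augment(image, rng):
    """Randomly flip, rotate (90-degree steps), and rescale an image array."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                      # horizontal flip
    image = np.rot90(image, k=int(rng.integers(0, 4)))  # random rotation
    scale = rng.uniform(0.8, 1.2)                     # random scaling factor
    h, w = image.shape[:2]
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    # Nearest-neighbour resize without external libraries
    row_idx = (np.arange(new_h) * h / new_h).astype(int)
    col_idx = (np.arange(new_w) * w / new_w).astype(int)
    return image[row_idx][:, col_idx]

def split_dataset(n_samples, rng):
    """80/10/10 train/validation/test split of shuffled sample indices."""
    idx = rng.permutation(n_samples)
    n_train, n_val = int(0.8 * n_samples), int(0.1 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With 5600 samples, the split yields 4480 training, 560 validation, and 560 test images.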

Tire Marking Point Identification Network Structure
The official YOLOv5 code currently offers five models of different depths and widths: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. In industrial applications, it is desirable to speed up recognition as much as possible, so the smaller model, YOLOv5s, was chosen as the training model.

YOLOv5s Model Structure
The YOLOv5s model consists of three main components: the backbone network (Backbone), the neck network (Neck), and the head network (Head).
The Backbone uses the CSPDarknet53 structure to extract feature information from the input image [15,16]. It serves as the foundation of the model.
The Neck is situated between the Backbone and the Head networks. Its primary function is to enhance feature diversity and network robustness. It achieves this by performing multi-scale feature fusion on the feature maps, further improving feature quality before passing the results to the prediction layer.
The Head network is responsible for generating the target detection results. In the original model, it comprises three detection layers, each corresponding to a set of initialized anchor values. The Head receives feature maps at three different scales from the Neck and performs network predictions on them. Using convolutional operations, it produces the final feature output [17]. The detection process identifies targets of three different sizes: large, medium, and small [18]. The model diagram is presented in Figure 5.

YOLOv5s Model Improvements
Because the tire itself is large while the marking points occupy only a small fraction of the tire image, the traditional YOLOv5s model performs poorly on such small targets, so the YOLOv5s structure is improved.

Adding an Attention Mechanism
In the improved YOLOv5s model [19], the simAM (Similarity Attention Mechanism) attention mechanism was introduced. This mechanism is a 3D weighted attention module that derives attention weights for the feature map without requiring additional parameters. The schematic diagram of the simAM module is depicted in Figure 6. The inspiration behind this module comes from the attention mechanism observed in the human brain. In the figure, the 3D weighting is implemented using an energy function.
The design of the simAM module draws inspiration from the field of neurology, where neurons carrying rich information are often distinguished from other neurons by exhibiting unique firing patterns. Additionally, neurons carrying rich information tend to suppress the activity of surrounding neurons, a phenomenon known as null-field suppression in neurology. Hence, it is crucial to assign higher weights to these neurons to value their contributions. The simAM module measures the linear separability between neurons to find the significant ones, and devises an energy function for each neuron, Equation (4) [20]:

$$e_t(w_t, b_t, \mathbf{y}, x_i) = (y_t - \hat{t})^2 + \frac{1}{M-1}\sum_{i=1}^{M-1}(y_o - \hat{x}_i)^2 \tag{4}$$

where $t$ is the target neuron and $x_i$ are the other, ordinary neurons in one channel of the input feature $X \in \mathbb{R}^{C \times H \times W}$. The two quantities $\hat{t}$ and $\hat{x}_i$ are derived from $t$ and $x_i$ by linear transformation:

$$\hat{t} = w_t t + b_t, \qquad \hat{x}_i = w_t x_i + b_t \tag{5, 6}$$

where $i$ is the spatial dimension index, $M = H \times W$ is the number of neurons in the current channel, $w_t$ is the transform weight, and $b_t$ represents the bias. Binarizing the labels ($y_t = 1$ for the target neuron, $y_o = -1$ for the others) and adding a regularization term to Equation (4), a new expression for the energy function is obtained as Equation (7):

$$e_t(w_t, b_t, \mathbf{y}, x_i) = \frac{1}{M-1}\sum_{i=1}^{M-1}\bigl(-1 - (w_t x_i + b_t)\bigr)^2 + \bigl(1 - (w_t t + b_t)\bigr)^2 + \lambda w_t^2 \tag{7}$$

Minimizing Equation (7) analytically gives:

$$w_t = -\frac{2(t - \mu_t)}{(t - \mu_t)^2 + 2\sigma_t^2 + 2\lambda}, \qquad b_t = -\frac{1}{2}(t + \mu_t)\, w_t \tag{8, 9}$$

where $\mu_t$ and $\sigma_t^2$ are the mean and variance computed over the other neurons in the channel. Assuming that all pixels in each channel follow the same distribution, the mean $\hat{\mu}$ and variance $\hat{\sigma}^2$ can be computed once over the whole channel, and the minimal energy becomes:

$$e_t^* = \frac{4(\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda} \tag{10}$$

A smaller $e_t^*$ means that neuron $t$ is more distinct from the surrounding neurons and therefore more important, so it should be given a higher weight. Once the per-neuron energies have been obtained, the feature enhancement is performed by applying the Sigmoid function as in Equation (11):

$$\widetilde{X} = \operatorname{sigmoid}\!\left(\frac{1}{E}\right) \odot X \tag{11}$$

where $E$ groups all $e_t^*$ across the channel and spatial dimensions.
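The parameter-free weighting can be sketched in a few lines of NumPy, using the closed-form minimal energy $e_t^*$ and the sigmoid refinement of Equation (11). This is a minimal sketch; $\lambda = 10^{-4}$ is the default suggested by the SimAM authors, and computing the inverse energy directly ($1/e_t^* = (t-\hat{\mu})^2/\bigl(4(\hat{\sigma}^2+\lambda)\bigr) + 1/2$) follows the common reference implementation.

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM weighting for a feature map x of shape (C, H, W)."""
    c, h, w = x.shape
    n = h * w - 1                                   # M - 1 neighbours per channel
    mu = x.mean(axis=(1, 2), keepdims=True)         # per-channel mean (mu_hat)
    d = (x - mu) ** 2                               # squared deviation of each neuron
    var = d.sum(axis=(1, 2), keepdims=True) / n     # per-channel variance (sigma_hat^2)
    # Inverse of the minimal energy e_t*: larger for more distinctive neurons
    e_inv = d / (4.0 * (var + lam)) + 0.5
    return x * (1.0 / (1.0 + np.exp(-e_inv)))       # sigmoid(1/E) * X, Eq. (11)
```

Because no weights are learned, the module adds no parameters to the network; neurons far from their channel mean receive weights closer to 1, while ordinary neurons are attenuated.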
To determine the best location for the simAM module, it was connected to CBL, CSP_1, and CSP_3 of the original network to test the network performance, and it was finally decided to insert the simAM module behind CSP_3 and named CSPS.

Loss Function
The YOLOv5s localization loss is based on the IoU Loss, proposed in 2016. However, the original IoU Loss has several drawbacks. It cannot measure the distance between two boxes when the predicted box and the ground-truth box do not intersect, and it cannot distinguish the relative position and size of two boxes that share the same IoU value [21]. To address these limitations, Hamid Rezatofighi introduced the GIoU (Generalized Intersection over Union) Loss at CVPR 2019 [22]. The GIoU Loss takes into account both overlapping and non-overlapping regions, providing a more comprehensive measure of the overlap between predicted and ground-truth boxes than the original IoU Loss. Following this, DIoU and CIoU were developed, each with their own advantages and disadvantages [23]. In an effort to generalize existing IoU loss functions, Jiabo He proposed a power transformation, applying a Box-Cox transformation to the IoU Loss [24]. By adjusting the value of the power $\alpha$, the Alpha-IoU Loss makes full use of the existing IoU, leading to improved target recognition accuracy. The transformation is described by Equation (12):

$$\mathcal{L}_{\alpha\text{-IoU}} = \frac{1 - IoU^{\alpha}}{\alpha}, \qquad \alpha > 0 \tag{12}$$

Most existing IoU-based loss functions, including all of those mentioned above, can be obtained by conditioning on $\alpha$; using multiple power parameters, the family generalizes to loss functions with additional penalty terms, as in Equation (13):

$$\mathcal{L}_{\alpha\text{-IoU}} = 1 - IoU^{\alpha_1} + \mathcal{P}^{\alpha_2}\bigl(B, B^{gt}\bigr) \tag{13}$$

where $\mathcal{P}(B, B^{gt})$ is the penalty term of the chosen IoU variant.
It has been found that the Alpha-IoU Loss is more robust on small datasets and under noisy interference, and regression accuracy can be improved simply by adjusting the value of $\alpha$. Therefore, in this paper, the localization loss function of the network is replaced by the Alpha-IoU Loss to improve recognition performance.
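As a sketch, the simplified power form of the family ($1 - IoU^{\alpha}$, with $\alpha = 3$ as commonly recommended by the Alpha-IoU authors) can be computed for a single box pair as follows; the axis-aligned (x1, y1, x2, y2) box format is an assumption for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def alpha_iou_loss(box_pred, box_true, alpha=3.0):
    """Simplified Alpha-IoU loss: 1 - IoU**alpha (alpha=1 recovers IoU loss)."""
    return 1.0 - iou(box_pred, box_true) ** alpha
```

With $\alpha > 1$ the power term penalizes low-IoU predictions more steeply, which is what gives high-IoU boxes more relative weight during regression.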

Small Target Prediction Head
When an image is input to the YOLOv5s network, the default input size is 640 × 640; if the input image is larger, it is scaled down, and after this compression the car tire marking points become harder to recognize. Therefore, on the basis of the original network, a small target prediction head is added to avoid the loss of features caused by image compression. The structure of the improved YOLOv5s network is shown in Figure 7.
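To illustrate why the extra head matters: each detection head predicts on a grid whose cell size equals the head's stride, so a finer stride lowers the size of targets the head can resolve. The sketch below assumes the added head sits at stride 4 (the P2 level commonly used in small-target YOLOv5 variants); the actual placement in this paper is defined by Figure 7.

```python
def head_grids(input_size=640, strides=(4, 8, 16, 32)):
    """Grid resolution of each detection head for a square input image.

    Strides 8/16/32 are the original YOLOv5s heads; stride 4 is the assumed
    extra small-target head, whose cells cover only 4x4 input pixels.
    """
    return {s: input_size // s for s in strides}
```

For a 640 × 640 input this gives 160 × 160, 80 × 80, 40 × 40, and 20 × 20 grids, so the added head quadruples the number of cells available for the smallest targets compared with the stride-8 head.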

Experimental Environment
The experimental setting for this study is shown in Table 2. As can be seen from the three comparison plots, the accuracy of the improved YOLOv5s rises significantly faster than that of the traditional YOLOv5s in the first 25 iterations; between 50 and 100 iterations accuracy continues to rise, and after 150 iterations both models stabilize. Finally, after 300 iterations, the traditional YOLOv5s reaches an mAP of 86%, while the improved YOLOv5s reaches 95.5%, an improvement of 9.5%. At the same time, the improved YOLOv5s achieves a 9% increase in precision and a 4% increase in recall compared with the traditional YOLOv5s. The improved YOLOv5s also converges faster, and recall does not decrease while precision improves. Therefore, the improved YOLOv5s model is significantly better than the traditional YOLOv5s model. The visualization curves generated during training of the improved YOLOv5s model are shown in Figure 11.
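For reference, the precision and recall quoted above are computed from detection counts in the usual way; this minimal sketch shows only those two metrics (mAP additionally averages precision over recall thresholds and classes):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, and false
    negative counts of a detector at a fixed confidence threshold."""
    precision = tp / (tp + fp)   # fraction of predictions that are correct
    recall = tp / (tp + fn)      # fraction of ground-truth objects found
    return precision, recall
```

A model can trade one metric for the other by moving its confidence threshold, which is why the improved network raising precision by 9% without lowering recall is significant.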

SimAM Module Validation
Currently, the CBAM attention mechanism is widely applied to YOLOv5 models [25]. To assess the effect of the simAM module on the model, YOLOv5s, YOLOv5s+CBAM, and the improved YOLOv5s were trained, and their precision, recall, and average precision were compared; the results are shown in Table 3. Although the CBAM attention module also improves network performance, it does not improve it as much as the simAM attention module; moreover, unlike the parameter-free simAM module, the CBAM module introduces additional parameters, which prolongs the training time of the model and can easily lead to overfitting.

Model Visualization Comparison Experiments
To further analyze the effectiveness of the YOLOv5s model and the improved YOLOv5s model for tire marking point recognition, two randomly selected image samples from the test set of the dataset were tested separately for the two trained models, and the results were visualized.Some of the results are shown in Figure 12.
From the results, it can be seen that the confidence levels of the improved YOLOv5s model are all higher than those of the traditional YOLOv5s model. For the clearer cases of Figure 12a,b, the confidence levels of the two models do not differ much: 0.95 and 0.92, respectively. For the smaller targets of Figure 12c,d, the difference between the two models is more pronounced.

Comparison of Test Results
The YOLO series and Faster R-CNN are the most common small target detection models. For the dataset of this study, YOLOv4, YOLOv5s, Faster R-CNN, and the improved YOLOv5s were chosen for comparison experiments. The results are shown in Table 4. It can be observed that in the task of tire marking point detection, the model proposed in this study outperforms the other models.

Conclusions
This paper proposes a deep-learning-based method for identifying small targets in car tire marking points, which improves the accuracy of small target marking point recognition in images. The research results are summarized as follows: 1. In image pre-processing, the MSRCR algorithm is introduced to improve the contrast of the identification points in the image; it adapts to different environments and reduces the difficulty of model training. 2. The YOLOv5s model is improved by adding a small target detection head, which improves the accuracy of the model for small target recognition. 3. The parameter-free simAM attention mechanism is added, which enhances the ability of the convolutional layers to highlight salient features and better exploits feature information. 4. The CIoU loss function of the original network is replaced by Alpha-IoU, which is more flexible than CIoU when updating parameters through back-propagation and reduces the loss between predicted and true values, thereby optimizing the model and yielding better recognition results.
With the above improvements, the resulting improved YOLOv5s model achieves a 9.5% improvement in mAP compared with the original YOLOv5s; precision is improved by 9%, recall by 4%, and convergence is accelerated. Compared with other small target detection models, the model in this paper shows clear advantages. Random test experiments verify the superiority of the improved YOLOv5s model for small target recognition, and the model can be applied to practical detection tasks.

Figure 3 .
Figure 3. Images of part of the original dataset: (a,b) for industrial environment collection, (c-f) for natural environment collection.

Figure 4 .
Figure 4. Comparison images: (a,c) for the original image, (b,d) for the processed image.

Figure 12 .
Figure 12. Comparison of results: (a,c,e,g) are improved YOLOv5s results, and (b,d,f,h) are YOLOv5s results.

Table 1 .
Comparison of methods.