Insulators and Defect Detection Based on the Improved Focal Loss Function

Abstract: Unmanned aerial vehicle (UAV) inspection has become the mainstream method of transmission line inspection, and the detection of insulator defects is an important part of it. To ensure both high accuracy and high detection speed, an improved YOLOv5 model is proposed for insulator defect detection. The algorithm starts from weights trained on conventional large-scale datasets and improves accuracy through feature-mapping-based transfer learning. It employs the Focal loss function and proposes a dynamic weight assignment method. Compared with the traditional empirical value method, the dynamic method better matches the sample distribution of the data set, improves the accuracy on difficult-to-classify samples, and saves a lot of time. The experimental results show that the average accuracy for insulators and their defects is 98.3%, 5.7% higher than the original model, while the accuracy and recall rate of insulator defects are improved by 5.7% and 7.9%, respectively. The algorithm improves the accuracy and recall of the model and enables faster detection of insulator defects.

(RCNN). By using k-means clustering, the Focal loss function is introduced, and the ROI mining module is added to solve the class imbalance in the classification stage. Experiments show that high accuracy of bird's nest detection can be achieved.


Introduction
Routine inspection of transmission lines mainly relies on manual inspection, helicopter inspection, robot inspection, and unmanned aerial vehicle (UAV) inspection [1]. With the development of technology, UAVs have gradually become an important means of transmission line inspection because of their low cost, convenient operation, and high efficiency. Power supply companies collect a large number of inspection images during daily UAV inspections. However, these images are currently examined mainly by hand, which is inefficient and less accurate. Therefore, using deep learning technology to detect insulator defects quickly and accurately is a new direction in the development of power inspection [2].
In recent years, with the development of deep learning theory and the improvement of computer hardware performance, research on transmission line defect detection based on deep learning has become a hot topic. At present, two main kinds of neural networks are commonly used in target detection. The first is the two-stage target detection model based on region extraction, such as faster region-based convolutional neural networks (Faster R-CNN) [3]. The second is the single-stage target detection model based on regression, such as "You Only Look Once" (YOLO) [4] and the Single Shot MultiBox Detector (SSD) [5]. Both kinds of model are widely used in current research. For example, Manninen et al. [6] proposed a new method for assessing the condition of overhead transmission lines, which automatically isolates transmission poles, disaggregates components, detects defects, and determines the health index of concrete structures and insulators. Compared to traditional foot patrol visual inspections, this approach greatly improves efficiency and reduces costs. Hosseini et al. [7] have developed a model for automatically estimating and locating the ...

Li et al. [19] have designed an improved cross-entropy loss function that pays more attention to difficult samples and used it with Faster R-CNN. The experimental results show that the precision on difficult samples is improved effectively and that the total precision also improves. Dai et al. [20] chose the fast and efficient lightweight YOLO v5 as the base network. The robustness of the model was improved by repetitive broadening, focusing, and smoothing of BCE strategies, and the imbalance of the positive-negative sample ratio was resolved. On images of 59 crop disease types across 10 crops, the model showed an average identification accuracy of 94.24%, an average inference time of 1.563 milliseconds per sample, and a model size of just 2 MB. Compared with the original model, the model size was reduced by 88% and the inference time by 72%. Cao et al. [21] proposed a high-performance algorithm, YOLO-UA, which realizes real-time detection and traffic statistics by adjusting the network structure, optimizing the loss function, and introducing weight regularization. The experimental results show that the YOLO-UA model has high accuracy, precision, and recall in different weather scenarios and is robust to changes in scene and weather. Liu et al. [22] proposed a Cross Stage Partial Dense YOLO (CSPD-YOLO) model based on YOLO-v3 and the Cross Stage Partial Network. The CSPD-YOLO model uses a feature pyramid network and an improved loss function to improve the accuracy of insulation fault detection. The average accuracy of the CSPD-YOLO model was 4.9% and 1.8% higher than that of YOLO-v3 and YOLO-v4, respectively.
To sum up, existing methods in the literature introduce the Focal loss function to address the uneven distribution of positive and negative samples and of difficult and easy samples. However, most of them set the two weight factors of the Focal loss function to empirical values. Training with empirical values has several problems. First, weights determined from experience are difficult to adjust dynamically according to the ratio of positive to negative samples and of difficult to easy samples. Second, a model trained with the recommended empirical values may not reach the best achievable accuracy. Finally, retraining after repeatedly adjusting the two weight factors takes a lot of time and computational resources.
In order to solve the above problems, this paper uses the YOLOv5 as the basic model, introduces the pre-trained weights trained on large-scale data sets for transfer learning, and uses the Focal loss function to replace the cross-entropy loss function. The two weighting factors in the Focal loss function are redesigned so that they can change dynamically based on the dataset. Finally, the training result of the original model is compared with the training result of the improved model. The improved detection model was also compared experimentally with Faster R-CNN, SSD, YOLOv3, and YOLOv4. The results show that the method proposed in this paper has high accuracy and short training time and can be used as a reference for the defect detection of transmission line insulators.

Introduction to Algorithm and Network Structure
The YOLOv5 [23] model was presented in 2020 and performs at a relatively high level in both detection speed and accuracy. The model offers different network depths and feature map widths to suit different accuracy and speed requirements. The official code of YOLOv5 provides five versions: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Because the insulator dataset is small and the requirement for recognition speed is high, YOLOv5s is selected as the basic training model in this paper. Figure 1 shows the specific structure of the YOLOv5 network model.
The YOLOv5 network is divided into four parts: input, backbone network, neck network, and output. At the input, adaptive anchor boxes are calculated by K-means clustering to improve the localization accuracy for the small defects of broken insulators. The backbone extracts image features, mainly using the Focus and Cross Stage Partial (CSP) network structures, which are new relative to previous generations. The Focus structure slices the incoming image, reducing the computation. The CSP structure [24] accelerates inference and training while maintaining accuracy. In the feature fusion module, FPN [25] and PANet [26] multi-scale feature fusion output detection results at different feature levels, which not only improves the detection rate and the localization and recognition accuracy for targets at each scale, but also combines with the CSP structure to speed up feature fusion. Finally, the target box is regressed with the CIoU loss function [27], and the binary cross-entropy loss function is used for classification and confidence. The detection output section predicts on the image features, generates bounding boxes and categories, and eliminates redundant target boxes by non-maximum suppression (NMS) [28].

The loss function of YOLOv5 consists of three parts, classification loss, rectangular (bounding-box) loss, and confidence loss, which are weighted and summed to form the final loss function, as shown in Equation (1). Here a, b, and c are the weights of the three loss terms; usually the confidence loss receives the largest weight, followed by the rectangular loss and the classification loss.
The classification loss and confidence loss used by YOLOv5 are both binary cross-entropy losses, as shown in Equation (2), where N is the total number of categories, y_i is the true value of the current category, and p(y_i) is the predicted probability of the current category.
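Equations (1) and (2) can be sketched as follows from the definitions in the text. The subscripts cls, box, and obj (classification, rectangular, and confidence loss) are labels assumed here for illustration, and Equation (2) is written in the standard binary cross-entropy form; the exact typesetting of the original equations is an assumption.

```latex
\begin{align}
Loss &= a \cdot Loss_{cls} + b \cdot Loss_{box} + c \cdot Loss_{obj} \tag{1} \\
Loss_{BCE} &= -\frac{1}{N} \sum_{i=1}^{N} \Bigl[ y_i \log p(y_i) + (1 - y_i) \log\bigl(1 - p(y_i)\bigr) \Bigr] \tag{2}
\end{align}
```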
The classification loss is mainly used to predict the category of each of the three prediction boxes in every grid cell. The confidence loss is mainly used to predict the confidence of each prediction box, that is, its reliability. The greater the confidence value, the more reliable the prediction box is, that is, the closer it is to the true minimum bounding box of the target.
Binary cross-entropy measures the difference between the predicted result of a classification model and the true value. For a positive sample (y_i = 1), the closer the predicted value p(y_i) is to 1, the closer the loss is to 0; that is, the smaller the difference between the predicted result and the true value, the smaller the loss. Conversely, the closer the predicted p(y_i) is to 0, the greater the difference between the predicted result and the true value, and the greater the loss.
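The behavior described above can be checked numerically. The following is a minimal sketch in plain Python, not the YOLOv5 implementation; the function name `binary_cross_entropy` and the clamping constant `eps` are choices made for this illustration.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# A positive sample (y = 1) predicted with increasing probability:
# the loss shrinks toward 0 as the prediction approaches the true value.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"p = {p:.2f}  loss = {binary_cross_entropy([1], [p]):.4f}")
```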

Improved YOLOv5 Model
To detect insulator defect locations more efficiently, pretrained weights are first introduced into YOLOv5. Second, the Focal loss function is introduced to counter the drop in defect detection rate caused by the imbalance of positive and negative samples and by the difficulty of classifying insulators and defects, and the Focal loss function is optimized by dynamically calculating its weight factors.

Pretrained Weight Transfer
At present, target detection algorithms based on deep learning need sufficient training samples and large-scale training to achieve relatively good detection results. In this paper, the dataset of insulators and their defects is small, and the defect samples are especially few, so the model is prone to overfitting, with low accuracy and recall. Individuals have limited computing resources and cannot train on large-scale datasets. Therefore, this paper uses transfer learning to acquire prior knowledge from COCO128, a large-scale data set trained by companies with large computing resources. Transferring the corresponding weight parameters from COCO128 to the dataset of this paper accelerates the convergence of the model's loss function, so that the accuracy and other indicators quickly reach a higher level. The main transfer learning methods and their applicable scenarios are shown in Table 1. Because the source dataset contains a large number of different categories and differs from the insulator defect dataset, this paper adopts a transfer learning method based on feature mapping.

Table 1. Scenarios for transfer learning methods.

Type of transfer learning method | Applicable scenario
Instance-based transfer | The source data are close or similar to the target data
Feature-mapping-based transfer | The source data differ significantly from the target data
Model-parameter-based transfer | The source data differ less from the target data

Focal Loss
Focal loss is a loss function based on binary cross-entropy [29]. It is a dynamically scaled cross-entropy loss that reduces the weight of easily distinguishable samples in training. It helps the network quickly focus on the difficult samples, both positive and negative, that are useful for training. Its formula is shown in Equation (3).
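A sketch of Equation (3), assuming the standard α-balanced form of the Focal loss from [29] written with the symbols used in this section:

```latex
\begin{equation}
Loss_{fl} = -\alpha \, y_i \bigl(1 - p(y_i)\bigr)^{\gamma} \log p(y_i)
            - (1 - \alpha)(1 - y_i) \, p(y_i)^{\gamma} \log\bigl(1 - p(y_i)\bigr) \tag{3}
\end{equation}
```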
where Loss_fl is the Focal loss function, the weight factor α regulates the balance between positive and negative samples, and the weight factor γ regulates the balance between difficult and easy samples. y_i is the true value of the label (1 for a positive sample, otherwise a negative sample), and p(y_i) is the predicted value output by the network model.

The attenuation curve of the Focal loss function is shown in Figure 2. The smaller the difference between the predicted and true values, the smaller the loss, indicating that the sample is easily classified. The greater the difference between the predicted and true values, the greater the loss, indicating that the sample is more difficult to predict correctly. When γ is 0, the loss function is the cross-entropy loss function, and its value changes comparatively smoothly with the predicted value. In this case, if there are a large number of easy-to-classify samples, their contribution will dominate the whole loss function, making the model difficult to optimize. As γ increases, a sample with a small difference between predicted and real values is easily classified and contributes little to the loss, whereas a sample with a large difference is difficult to classify and contributes much. In this way, during model optimization the loss of the difficult samples dominates more strongly, which improves the accuracy on difficult samples.
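The effect of γ described above can be illustrated with a few lines of plain Python. This is a minimal per-sample sketch that assumes the standard α-balanced Focal loss of [29]; the function name `focal_loss`, the example probabilities 0.95 (easy) and 0.30 (hard), and the clamping constant `eps` are choices made for this illustration.

```python
import math

def focal_loss(y, p, alpha=0.25, gamma=2.0, eps=1e-7):
    """Alpha-balanced Focal loss for one sample with label y (0 or 1) and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so that log(0) never occurs
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)

# Compare a hard positive (p = 0.30) with an easy positive (p = 0.95).
# With gamma = 0 the loss is alpha-weighted cross-entropy; as gamma grows,
# the hard sample's loss becomes larger relative to the easy sample's loss.
for gamma in (0.0, 1.0, 2.0):
    hard = focal_loss(1, 0.30, gamma=gamma)
    easy = focal_loss(1, 0.95, gamma=gamma)
    print(f"gamma = {gamma:.1f}  hard/easy loss ratio = {hard / easy:.1f}")
```

Note that with γ = 0 this form still carries the factor α, so it equals plain cross-entropy only up to that constant weight.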

Dynamic Weight Calculation
In the Focal loss function, there is no standard value for the weight factors α and γ, so the values need to be selected according to the sample distribution of the dataset. The empirical values of 0.25 for α and 2 for γ are commonly used in the results presented in the literature [15]. However, empirical values do not necessarily suit all datasets; the factors need to be adjusted according to the dataset itself and to the changing distribution of positive and negative samples and of difficult samples during training. Adjusting α and γ can yield a better detection network, but doing so by hand means repeatedly interrupting and restarting training, which prolongs the network training time. Therefore, in this paper the two hyperparameters α and γ are designed as parameters that are adjusted dynamically during network training. α is defined as Equation (4).
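A sketch of Equation (4), reconstructed from the description in this section (the positive-sample ratio of each of the three feature maps, summed and then averaged); the exact typeset form is an assumption:

```latex
\begin{equation}
\alpha = \frac{1}{3} \sum_{i=1}^{3} \frac{P_i}{P_i + N_i} \tag{4}
\end{equation}
```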
where P_i is the number of positive samples and N_i is the number of negative samples in the i-th feature map. At the output of the YOLOv5 network, three feature maps of different sizes are generated in each iteration and matched with nine predesigned prior bounding boxes. A feature map holds features at one scale, extracted from the input image by convolution. The three feature maps of different scales are used to predict the targets in the image and to generate the corresponding target box information.
Every three prior bounding boxes are matched with one feature map. When the aspect ratio of a prior bounding box to the real box is lower than the threshold, the box is a positive sample; when it is higher than the threshold, the box is a negative sample. Therefore, the numbers of positive and negative samples differ across the three feature maps. By counting the positive and negative samples on each feature map, the proportion of positive samples in the overall sample can be calculated. In the algorithm, the ratio of positive samples to all samples on each feature map is calculated first. Second, the ratios obtained on the three feature maps are summed. Finally, the sum is averaged to give the value of α. The ratio of positive to negative samples varies with the feature maps generated in each iteration, and α varies with it. Changing α gives positive and negative samples different weights: positive samples are given a larger weight, and negative samples a smaller weight.
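The three steps above (per-map ratio, sum, average) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `dynamic_alpha` and the sample counts are hypothetical, and the handling of an empty feature map (ratio taken as 0) is an assumption.

```python
def dynamic_alpha(pos_counts, neg_counts):
    """Average, over the feature maps, of the positive-sample share P_i / (P_i + N_i)."""
    ratios = []
    for p, n in zip(pos_counts, neg_counts):
        total = p + n
        # Assumption: a feature map with no samples contributes a ratio of 0.
        ratios.append(p / total if total > 0 else 0.0)
    return sum(ratios) / len(ratios)

# Hypothetical positive/negative counts on the three output scales in one iteration.
alpha = dynamic_alpha(pos_counts=[40, 55, 30], neg_counts=[60, 45, 70])
print(f"alpha = {alpha:.4f}")
```

Because the counts are recomputed in every iteration, α follows the current sample distribution instead of staying at a fixed empirical value.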
In the definition of γ, true is the true value of each predicted object (0 or 1) and pred is the prediction confidence. γ is mainly used to balance difficult and easy samples. Whether a sample is difficult or easy depends mainly on the difference between the predicted value and the actual value, and this difference lies between 0 and 1. Therefore, γ in this paper is obtained by amplifying the difference between the predicted value and the actual value with an exponential function of base e. This limits the value range of γ to between 1 and 2.7, which gives γ a more flexible value space than the fixed value of 2.0.

During each iteration, the network makes a probabilistic prediction for each prediction box, and γ is calculated from the difference between each prediction and the true value. When the predicted value differs greatly from the real value, the sample is difficult to classify and the calculated γ is larger, so the contribution of the difficult sample to the loss function can be improved. When the difference between the predicted value and the true value is small, the sample is easily classified and the calculated γ is small, thus reducing the contribution of the easy sample to the loss function.

The dynamic Focal loss function proposed in this paper removes the manual setting of hyperparameters. Training requires neither a deep understanding of the distribution of the dataset nor knowledge about the convergence of neural network training. The weights adjust automatically according to the sample distribution calculated during each training iteration, which highlights the learning focus of different stages and steers convergence toward the global optimum.
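The text fixes the range of γ (1 to about 2.7) and its dependence on the difference between prediction and truth. One form consistent with that description is γ = e^|true − pred|, and the sketch below assumes it; the function names `dynamic_gamma` and `dynamic_focal_loss`, and the use of the α-balanced Focal loss form, are likewise assumptions made for illustration.

```python
import math

def dynamic_gamma(true, pred):
    """Assumed form: gamma = exp(|true - pred|), which lies in [1, e] for values in [0, 1]."""
    return math.exp(abs(true - pred))

def dynamic_focal_loss(true, pred, alpha, eps=1e-7):
    """Focal loss for one sample, with gamma computed from the sample itself."""
    gamma = dynamic_gamma(true, pred)
    p = min(max(pred, eps), 1.0 - eps)  # clamp so that log(0) never occurs
    if true == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)

# A poorly predicted positive gets a larger gamma than a well predicted one.
for pred in (0.2, 0.6, 0.9):
    g = dynamic_gamma(1, pred)
    loss = dynamic_focal_loss(1, pred, alpha=0.45)
    print(f"pred = {pred:.1f}  gamma = {g:.3f}  loss = {loss:.4f}")
```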

Experiment and Dataset
The transmission line insulator defect detection model is aimed at problems such as too small insulator defect size in transmission line inspection images, some insulators that are not significant in the real environment, serious overlapping and occlusion of insulators, and slow model detection. This paper proposes an improved YOLOv5 algorithm model to achieve fast and accurate detection of insulator defects during transmission line inspection. The insulator defect detection process is shown in Figure 3.

Dataset Construction
The data on insulators and their defects in the experiment came in part from the China Power Line Insulator Dataset (CPLD), which contains 848 aerial and synthetic defect images taken by drones. The other part consists of 333 images of defects in glass insulators from National Grid Power's inspection data. Together the two sources provide 1181 images. Insulator defects usually include damage, fragmentation, missing sheets, and other conditions. Due to the limitations of the insulator defect dataset, this paper focuses on missing insulator sheets, the most common insulator defect. Target detection tasks require a large amount of data, so the 1181 insulator images are expanded to 2280 images by random brightness adjustment, random shear, random translation, and rotation by a positive angle of 10°; some of the results are shown in Figure 4. From the constructed insulator defect dataset, 80% of the images are selected as the training set and 20% as the test set. Because the images in the constructed dataset have different resolutions, all images are rescaled to 640 × 640 to meet the resolution requirements of the YOLO framework. The numbers of labels and label boxes in the dataset are shown in Figure 5.
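The 80%/20% split described above can be sketched as follows. The text does not say how the split was drawn, so the seeded random shuffle, the function name `split_dataset`, and the file names are assumptions made for illustration.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of the items with a fixed seed and cut it into train and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(round(len(items) * train_fraction))
    return items[:cut], items[cut:]

# Hypothetical file names standing in for the 2280 augmented images.
image_names = [f"img_{i:04d}.jpg" for i in range(2280)]
train, test = split_dataset(image_names)
print(len(train), len(test))  # 1824 456
```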

All images were annotated with the image annotation tool Labelme. Insulators are labeled "insulator" and defect locations are labeled "defect"; the number of labels in each category is shown in Table 2. This completes the construction of the dataset.

Deep Learning Environment Configuration
The hardware environment configuration of this experiment is shown in Table 3. The software environment is Windows 11, PyTorch 1.1, Python 3.7, and CUDA 11.2. The batch size is 32, the initial learning rate is 0.01, the weight decay coefficient is 0.0005, and the total number of epochs is 100.

Evaluation Metrics
The experimental results were evaluated using precision (P), recall (R), and mean average precision (mAP). Precision measures the classification effect of the model between different categories, recall measures the overall effect of the model across different categories, and mAP measures the overall accuracy of the model. The formulas are as follows:
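The standard definitions of these metrics can be sketched as below. The symbols TP, FP, and FN (true positives, false positives, false negatives), P(R) (precision as a function of recall), and C (number of categories) are assumed notation; the original formulas may be typeset differently.

```latex
\begin{align}
P &= \frac{TP}{TP + FP}, &
R &= \frac{TP}{TP + FN}, \\
AP &= \int_{0}^{1} P(R)\, dR, &
mAP &= \frac{1}{C} \sum_{c=1}^{C} AP_c
\end{align}
```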

Comparison of Transfer Learning
The pretrained weights have a large impact on detection performance. The experimental results in Table 4 show that using the Focal loss function directly, without pretrained weights, greatly reduces the accuracy of the model. The main reason is that, lacking any base weights, the model focuses excessively on positive and difficult samples and learns less from easy samples, so the accuracy on easy-to-separate samples is poor and the overall accuracy of the model falls. When transfer learning is used without the Focal loss function, learning concentrates on the insulator samples, which have a large number of labels, and too little is learned from the small number of defect samples, so the defect accuracy is lower than the insulator accuracy. When the Focal loss function is introduced into YOLOv5 on the basis of transfer learning, the model learns the characteristics of the dataset effectively. The Focal loss function shifts the focus of the model to positive and difficult samples, effectively improving the accuracy and recall of defect recognition.

Comparison of Weighting Factors
Different values of α and γ in the Focal loss function will have a greater impact on the model training process, resulting in differences in detection performance. Table 5 shows the influence of the YOLOv5 model on the accuracy and recall of insulators and their defects detection under different α values. Table 6 shows the influence of the YOLOv5 model on the accuracy and recall of insulators and their defects detection under different γ values. It can be seen from the experimental results that when α is set to 0.5, mAP reaches 98%, which is better than the 0.25 value suggested by the literature. When γ is set to 1.1, mAP reaches 97.7%, which is also better than the value of 2.0 suggested by the literature. Through the analysis of the data set samples, the results of the experiment can be improved to a certain extent by setting the values of α and γ reasonably.  However, each time the value is taken, new training of the model is required. A good weight value can be obtained through a large number of comparative experiments, which requires a lot of hard and soft resources and time. Figure 6 records the changing trends of α and γ during the training process (where α and γ are the mean values in each round of training). It can be seen from Figure 6a that the value range of α in this paper is calculated to be stable from 0.447 to 0.449 according to the number of positive and negative samples. It is close to the α value of 0.5 for the optimal effect in Table 5. Prove the validity of the dynamic α value. The value of γ in Figure 6b shows a trend of gradual decay. The network recognition effect is poor in the early stage, and the gap between the prediction and the true value is large, so the value of γ is large. In the later stage, with the emphasis on difficult samples, the network recognition effect is good, and the prediction effect is good. The gap between the prediction and the true value is small, and there are few difficult samples, so the γ value is small. 
The decay of the γ value reflects the process of difficult samples from more to less, which proves the effectiveness of the Focal loss algorithm with dynamic weights.  Figure 6 records the changing trends of α and γ during the training process (where α and γ are the mean values in each round of training). It can be seen from Figure 6a that the value range of α in this paper is calculated to be stable from 0.447 to 0.449 according to the number of positive and negative samples. It is close to the α value of 0.5 for the optimal effect in Table 5. Prove the validity of the dynamic α value. The value of γ in Figure 6b shows a trend of gradual decay. The network recognition effect is poor in the early stage, and the gap between the prediction and the true value is large, so the value of γ is large. In the later stage, with the emphasis on difficult samples, the network recognition effect is good, and the prediction effect is good. The gap between the prediction and the true value is small, and there are few difficult samples, so the γ value is small. The decay of the γ value reflects the process of difficult samples from more to less, which proves the effectiveness of the Focal loss algorithm with dynamic weights.  Table 7 shows the comparison results between the method of dynamically changing weights proposed in this paper and the best results of multiple empirical values of Focal  Table 7 shows the comparison results between the method of dynamically changing weights proposed in this paper and the best results of multiple empirical values of Focal loss. It can be seen from Table 7 that the overall mAP value of the YOLOv5 model using the dynamic weight calculation method is higher than that of the empirical method. The mAP is increased by 0.3%, the accuracy and recall rate of defects remain similar to the empirical value method, and the detection accuracy and recall rate of insulators are increased by 1.2% and 0.7%, respectively. 
More importantly, the dynamic value calculation method needs only one training run to obtain an effect similar to the best of the multiple empirical values, which saves a lot of time and computing resources.
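To make the roles of α and γ concrete, the following minimal sketch implements the standard binary Focal loss, -α_t (1 - p_t)^γ log(p_t), with α and γ passed in on every call so that they can be recomputed during training. The function `hypothetical_gamma` and its `gamma_max` parameter are illustrative assumptions and are not the update rule used in this paper; they only mimic the qualitative behavior described above, where γ is large when the gap between the prediction and the true value is large. The α value of 0.448 is taken from the stable range reported for Figure 6a, and the sample predictions are invented for illustration.

```python
import math


def focal_loss(probs, labels, alpha, gamma, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def hypothetical_gamma(probs, labels, gamma_max=2.0):
    """Illustrative rule only (NOT the paper's formula):
    gamma grows with the mean absolute gap between label and prediction."""
    gap = sum(abs(y - p) for p, y in zip(probs, labels)) / len(probs)
    return gamma_max * gap


# Invented example predictions: one positive and three negative samples.
labels = [1, 0, 0, 0]
early = [0.3, 0.6, 0.5, 0.4]    # early training: large prediction gap
late = [0.9, 0.1, 0.05, 0.1]    # late training: small prediction gap

gamma_early = hypothetical_gamma(early, labels)
gamma_late = hypothetical_gamma(late, labels)

# alpha = 0.448 lies in the stable range reported for Figure 6a.
loss_early = focal_loss(early, labels, alpha=0.448, gamma=gamma_early)
loss_late = focal_loss(late, labels, alpha=0.448, gamma=gamma_late)
```

Under this assumed rule, γ decays as the predictions improve, which matches the trend described for Figure 6b. With γ = 0 and α = 0.5 the function reduces to half of the ordinary cross-entropy loss, which is a convenient sanity check.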

Comparison of Results of Other Algorithms
The comparison results of different algorithms are shown in Table 8. Faster R-CNN has a higher recall rate for insulators and defects but a lower detection accuracy. SSD has high accuracy for insulator and defect detection but low recall. The mAP of YOLOv3 is high, but the model is too large. The mAP of YOLOv4 is close to that of YOLOv5, but its model is slightly larger. Compared with these four algorithms, YOLOv5 strikes a balance between mAP and model size. Compared with Faster R-CNN, SSD, YOLOv3, YOLOv4, and YOLOv5, the improved algorithm proposed in this paper improves the overall mAP by 2.5%, 7.4%, 2.6%, 5.9%, and 5.7%, respectively. In particular, it improves the accuracy and recall rate of difficult-to-detect insulator defects.

Visualization of Experimental Results
To demonstrate the detection effect of the improved algorithm in practice, the improved model is used for actual testing, as shown in Figure 7. The insulators are marked with a red frame, and the defects are marked with a pink frame. The visualization results show that the model using the cross-entropy loss function detects insulators well and with higher confidence, but it fails to detect defects and small insulators. The model with the Focal loss function introduced in this paper has lower confidence, but it detects the difficult defect parts and small insulators. This shows that the model in this paper has a good detection effect on samples that are difficult to detect and classify.

Conclusions
To better detect defects in transmission line insulators, this paper proposes an improved YOLOv5 model based on Focal loss. First, we combine the CPLD data set and the National Grid Power's inspection data into a mixed data set and then expand it through a variety of data enhancement methods. Second, transfer learning is performed with pre-trained weights obtained on large-scale data sets, which reduces the training time and speeds up the convergence of the model. Finally, the Focal loss function is introduced into the loss function and the weight factor is redesigned. During training, the weight factor is dynamically updated according to the ratio of positive and negative samples in the feature map and the difference between the actual value and the predicted value. By adjusting the loss function, the new algorithm increases the overall contribution of positive samples and difficult samples, makes the model focus more on insulator defects, and improves the model's ability to detect them. The experimental results show that the average accuracy of the algorithm in this paper reaches 98.3%, which is 5.7 percentage points higher than that of the original model, and that the accuracy for defective insulators in particular is greatly improved.
