Article

SDA-YOLO: Multi-Scale Dynamic Branching and Attention Fusion for Self-Explosion Defect Detection in Insulators

School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou 310018, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(15), 3070; https://doi.org/10.3390/electronics14153070
Submission received: 1 July 2025 / Revised: 26 July 2025 / Accepted: 30 July 2025 / Published: 31 July 2025

Abstract

To enhance the performance of UAVs in detecting insulator self-explosion defects during power inspections, this paper proposes an insulator self-explosion defect recognition algorithm, SDA-YOLO, based on an improved YOLOv11s network. First, a small object detection layer (SODL) is added to YOLOv11 to fuse shallow features with deeper features, improving the model’s focus on small-sized self-explosion defect features. An oriented bounding box (OBB) detection head is also employed to reduce interference from complex backgrounds. Second, the Diverse Branch Block (DBB) is incorporated into the C3k2 module in the backbone to extract target features through a multi-branch parallel convolutional structure. Finally, the AIFI module replaces the C2PSA module, effectively directing and aggregating information between channels to improve detection accuracy and inference speed. The experimental results show that the mean average precision of SDA-YOLO reaches 96.0%, 6.6% higher than that of the YOLOv11s baseline model. While maintaining high accuracy, SDA-YOLO reaches an inference speed of 93.6 frames/s, meeting the requirements of real-time detection of insulator faults.

1. Introduction

The safe operation of power transmission lines is an important guarantee of stable power supply and is closely tied to people’s lives and social development. Insulators are a key component of transmission lines, providing electrical insulation and mechanical support in power transmission. The performance of insulators and the rationality of their configuration directly affect the safe and stable operation of the line [1]. However, insulators exposed to the external environment for extended periods develop fractures, self-explosion, flashover discharge, and other defects under the complex interaction of the natural environment and other factors. These defects compromise the safe and steady operation of the transmission system: they can trip the transmission line, trigger extensive blackouts, damage equipment, or even cause a large-scale power system failure, with serious social and economic consequences. As a result, detecting insulator defects has grown in importance within power inspection. Traditional manual inspection mainly involves climbing the power tower and visually examining and recording each component. This method entails a heavy workload, low detection efficiency, and a tendency toward error, and the climbing operation itself exposes inspection personnel to safety hazards. With the advancement of UAV intelligent inspection technology [2], using UAVs to perform inspection tasks has become the standard method of electric power inspection in recent years. Operated by qualified professionals, a drone’s professional camera can capture high-quality pictures from various angles during the inspection process, offering increased efficiency and safety as well as lower labor costs.
After acquiring the images captured during UAV inspection, the images need to be processed, i.e., subjected to target detection. Traditional machine learning methods rely on handcrafted feature extraction (e.g., the Hough transform [3], SIFT [4], and GLCM [5]) and classical classifiers (e.g., SVM [6] and Adaptive Boosting [7]). However, the contour features of insulator self-explosion defects cannot be efficiently and precisely extracted with these techniques because of the intricacy of the transmission line background (which frequently contains interference such as cables, trees and plants, and buildings) and the blurring of small target images. Deep learning-based target detection algorithms, with their superior feature extraction capabilities and higher detection accuracy in complicated scenes, have therefore surpassed traditional machine learning techniques and become the primary approach to target detection.
Deep learning-based target detection techniques fall into two basic types: one-stage detection algorithms and two-stage detection algorithms. A two-stage algorithm first generates candidate boxes, extracts the features of each candidate box, and then performs classification and bounding box regression on each candidate region. Regions with CNN features (R-CNN) [8] introduced the two-stage object detection methodology in 2014, and prominent models, including Fast R-CNN [9], Faster R-CNN [10], and Mask R-CNN [11], followed in sequence. Previous researchers frequently enhanced two-stage target detection algorithms to achieve greater accuracy. Zhao et al. [12] enhanced the feature pyramid network and implemented it in Faster R-CNN for insulator defect detection, improving the accuracy of identifying flaws such as ruptured or dislodged insulators. Zhou et al. [13] improved Mask R-CNN by incorporating an attention mechanism into the backbone network, augmenting the model’s feature representation capabilities and its focus on small-sized targets, thereby increasing detection accuracy. Lu et al. [14] incorporated GIoU and Soft-NMS into Faster R-CNN, effectively improving the localization accuracy and detection recall of insulators in aerial images. Tang et al. [15] refined the network architecture based on Faster R-CNN and employed RoIAlign for candidate region alignment, significantly enhancing the model’s accuracy and robustness in insulator defect detection. While two-stage target detection exhibits higher accuracy, its more intricate model structure leads to reduced inference speed, complicating its ability to fulfill real-time detection requirements.
One-stage target detection algorithms perform classification and bounding box regression directly on the entire image. To enhance detection accuracy while preserving rapid detection speed, numerous contemporary studies have focused on one-stage target detection methodologies. Prominent one-stage models include You Only Look Once (YOLO) [16], the Single-Shot Multibox Detector (SSD) [17], and RetinaNet [18], among others. YOLO is the quintessential one-stage object detection method, initially introduced by Joseph Redmon and colleagues in 2016. Since its inception, YOLO has undergone continual iteration, including YOLOv3 [19], YOLOv5, YOLOv8, and YOLOv11 [20]. Liu et al. [21] enhanced the YOLOv3 network with a Dense-Block structure to address insulator detection in intricate backgrounds, augmenting detection accuracy via multi-scale feature fusion. Han et al. [22] proposed an enhanced lightweight Tiny-YOLOv4 algorithm that integrates the self-attention mechanism with ECA-Net to decrease model complexity substantially; the technique attains a detection speed of 94 frames per second while preserving high accuracy. Wang et al. [23] proposed an enhanced YOLOv5-based approach, ML-YOLOv5, which ensures efficient and precise insulator defect detection in real time by using depth-separable convolution and an improved feature fusion module, among other innovations. Zhang et al. [24] introduced the enhanced BaS-YOLOv5 model, which incorporates BiFPN and SimAM modules to augment multi-scale feature fusion and representation capabilities, significantly improving insulator defect detection accuracy and fulfilling real-time power inspection demands. He et al. [25] proposed an enhanced MFI-YOLO model derived from YOLOv8 for identifying multi-type insulator faults in intricate backgrounds; the integration of MSA-GhostBlock and ResPANet modules enhances feature extraction and multi-scale fusion, significantly improving the detection accuracy of faults such as self-explosion, breakage, and flashover. Zhang et al. [26] proposed an insulator defect detection algorithm utilizing an enhanced YOLOv8s framework, incorporating the MLKA attention module and the lightweight C2f-GSC structure to augment feature extraction and detection efficiency, while employing the SIoU loss function to optimize model performance, balancing accuracy and real-time capability.
In the current task of detecting self-explosion defects in insulators, existing models encounter numerous problems. Two-stage target detection involves processes such as candidate box generation, resulting in more complex architectures and reduced inference performance. In contrast, one-stage target detection has a more straightforward structure and offers superior inference speed; nonetheless, its detection accuracy for small-sized faults, such as self-explosion defects, is inadequate. During electric power inspections, photos captured by UAVs feature intricate background components, such as towers, foliage, and buildings, which may hinder recognition and impede the model’s ability to concentrate on regions of self-explosion flaws. To solve the above problems, this paper proposes an improved network, SDA-YOLO, based on YOLOv11s.
This paper’s primary contributions are summarized as follows.
  • To augment the model’s detection capability for small targets, a supplementary small object detection layer (SODL) is incorporated into YOLOv11s, which fuses shallow and deep features to enhance the emphasis on small target characteristics. Additionally, an oriented bounding box (OBB) detection head is devised in the head section to refine detection efficacy for small-sized targets.
  • Integrating the DBB module into the C3k2 module within the YOLOv11s model backbone enables feature extraction via a multi-branch parallel convolutional architecture, enhancing the model’s capacity to detect targets of varying scales and increasing detection accuracy in intricate environments.
  • The AIFI module is employed to substitute the C2PSA module in the YOLOv11s model backbone; this approach facilitates the guidance and aggregation of information across channels, thus enabling the model to concentrate on critical regions and diminish superfluous characteristics. It can significantly enhance detection accuracy and inference speed without altering the model’s computational requirements.

2. Related Work

2.1. YOLOv11 Algorithm

YOLOv11 is a new one-stage target detection algorithm released by Ultralytics in 2024, and its model structure is shown in Figure 1. In comparison to other prevalent baseline models, YOLOv11s has fewer parameters and reduced computing expense, and it remains competitive regarding detection accuracy. Also, YOLOv11s incorporates an enhanced FPN/PAN architecture for superior target feature extraction. The modular and scalable architecture of YOLOv11s establishes a solid basis for future model enhancements and improved detection efficacy, while also being more conducive to lightweight processing and deployment of the models. Consequently, YOLOv11s serves as the baseline model for improvement in this paper.
The YOLOv11 model comprises three primary components: backbone, neck, and head. The backbone captures features from the input image and produces multi-scale feature maps for the subsequent neck and head components. The backbone architecture of YOLOv11 comprises several Conv, C3k2, SPPF, and C2PSA components. The Conv module comprises Conv2d, BatchNorm2d, and SiLU activation functions for fundamental feature extraction from the image. The C3k2 module, pivotal in YOLOv11, is founded on the C3 architecture from YOLOv5 and integrates design concepts from the ELAN module of YOLOv7 [27], leading to additional optimization and refinement. Figure 2 illustrates the specific architecture, where True and False denote the two C3k2 configurations (C3k2 (True) uses the C3 module, while C3k2 (False) uses the bottleneck module). In contrast to the conventional C3 module, C3k2 employs numerous small convolutional kernels to substitute certain standard convolutions, enhancing the detection of small targets. Additionally, it incorporates several parallel branches to extract features at various scales, optimizing accuracy while maintaining a lightweight design.
The SPPF module is a spatial pyramid pooling layer designed to extract feature information at various image scales. Notably, YOLOv11 incorporates the C2PSA module near the end of the backbone; this module integrates cross-stage connectivity with spatial attention, enhancing the model’s ability to handle intricate scenarios. The neck adopts the same framework as the Feature Pyramid Network (FPN) [28] and Path Aggregation Network (PAN) [29] to integrate feature information across various depths. This structure significantly improves the representation of multi-scale targets via an up–down bidirectional information transfer mechanism, yielding steadier and more accurate performance, particularly in small target recognition and complicated scenes. The head component integrates the feature information from the backbone and neck for target classification and positional regression. The detection head employs a decoupled design and integrates Depthwise Separable Convolution (DWConv) [30] to enhance efficiency and efficacy: it separates the classification and regression branches to independently acquire category information and location features, minimizing mutual interference among tasks.
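As a concrete reference for the building blocks above, a minimal PyTorch sketch of the Conv unit (Conv2d followed by BatchNorm2d and SiLU) is shown below. This is an illustrative re-implementation rather than the Ultralytics source code, and the channel and kernel arguments are placeholders.

```python
import torch
import torch.nn as nn

class ConvUnit(nn.Module):
    """Illustrative Conv block: Conv2d -> BatchNorm2d -> SiLU."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# A 640 x 640 input halves its resolution under stride 2
x = torch.randn(1, 3, 640, 640)
print(ConvUnit(3, 32, k=3, s=2)(x).shape)  # torch.Size([1, 32, 320, 320])
```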

2.2. Oriented Bounding Box (OBB)

In traditional target detection tasks, a Horizontal Bounding Box (HBB) is commonly used to label and regress the location of targets. Although this method is widely used and relatively simple for data annotation, it struggles when the background is complex or the targets are dense. For example, insulator strings in electric power inspection images often exhibit obvious directional, non-horizontal distribution characteristics, so an HBB label frequently encloses too much irrelevant background or overlaps other targets, degrading detection accuracy and localization. Gottschalk et al. [31] proposed the OBB-Tree framework, which laid the foundation for subsequent rotated target detection. In YOLO detection, we adopt the Oriented Bounding Box (OBB) mechanism, which uses rotated box labeling to describe the actual shape of the target accurately. As illustrated in Figure 3, the OBB method aligns more closely with the geometric configuration of the actual insulator self-explosion defect target than the traditional HBB method. This alignment effectively diminishes background interference and is adept at recognizing small targets with varying orientations or overlapping presence.
The OBB eight-parameter model is a rotated box articulated through four vertex coordinates, denoted in the format (x1, y1, x2, y2, x3, y3, x4, y4). The IoU computation for the OBB differs from that of the HBB and is typically obtained as the ratio of polygonal intersection to union. Let the prediction box be designated as A, the ground-truth box as B, and the area be denoted as Area. The Intersection over Union (IoU) for Oriented Bounding Boxes (OBB) is calculated as follows:
$$\mathrm{IoU}_{\mathrm{OBB}} = \frac{\mathrm{Area}(A \cap B)}{\mathrm{Area}(A \cup B)}$$
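The computation follows directly from this definition. The Python sketch below uses the shapely library as an illustrative polygon backend; this dependency is our choice for clarity, not part of the YOLO pipeline, where rotated-IoU kernels are typically implemented natively for speed.

```python
from shapely.geometry import Polygon

def obb_iou(box_a, box_b):
    """box_a, box_b: vertex lists [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]."""
    poly_a, poly_b = Polygon(box_a), Polygon(box_b)
    inter = poly_a.intersection(poly_b).area   # Area(A ∩ B)
    union = poly_a.union(poly_b).area          # Area(A ∪ B)
    return inter / union if union > 0 else 0.0

# Example: two unit squares offset by 0.5 along x -> IoU = 1/3
print(obb_iou([(0, 0), (1, 0), (1, 1), (0, 1)],
              [(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)]))
```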

2.3. Small Object Detection Layer (SODL) [32]

In the backbone of YOLOv11, the resolution of the input image decreases from 640 to 20 through five sequential downsamplings, yielding five detection scales: P1-320 × 320, P2-160 × 160, P3-80 × 80, P4-40 × 40, and P5-20 × 20. However, because insulator defects are small and the pictures contain complex background factors, YOLOv11’s downsampling inevitably loses some features. Moreover, the default minimum detection layer of YOLOv11 starts from P3; when the detection target is an insulator self-explosion defect, little feature information remains at this scale, so practical features are more challenging to extract, which easily leads to missed and false detections.
To address the inadequate precision in detecting insulator self-explosion defects, we add a small object detection layer (SODL) to YOLOv11, which effectively fuses shallow and deep features. The feature map feeding this layer is downsampled only twice, so it retains rich edge and texture information. Meanwhile, after the up-sampling operations in the neck, features from different layers can be further fused to enhance the interaction of multi-scale information, as sketched below.
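The following PyTorch sketch illustrates the upsample-and-concatenate pattern of the added P2 path under assumed channel widths; it is a schematic of the fusion idea, not the exact SDA-YOLO graph.

```python
import torch
import torch.nn as nn

class P2Fusion(nn.Module):
    """Schematic P2 fusion: upsample the P3 neck feature and concatenate it
    with the shallow P2 backbone feature (160 x 160 for a 640 x 640 input),
    then reduce the channels with a 1 x 1 fusion convolution. Channel
    widths are illustrative placeholders."""
    def __init__(self, c_p2=64, c_p3=128, c_out=64):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Conv2d(c_p2 + c_p3, c_out, kernel_size=1)

    def forward(self, p2, p3):
        # p2: (B, 64, 160, 160) shallow feature; p3: (B, 128, 80, 80)
        return self.fuse(torch.cat([p2, self.up(p3)], dim=1))

p2 = torch.randn(1, 64, 160, 160)
p3 = torch.randn(1, 128, 80, 80)
print(P2Fusion()(p2, p3).shape)  # torch.Size([1, 64, 160, 160])
```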

2.4. Diverse Branch Block (DBB)

DBB [33] is a structural enhancement module that derives its core idea from structural re-parameterization, first proposed in RepVGG [34]. Structural re-parameterization seeks to enhance model performance without augmenting inference delay by reconfiguring the network architecture to exhibit distinct representations during training and inference phases. This is demonstrated by integrating many convolutional layers to create a multi-branch architecture during the training phase. The multi-branch architecture is restructured during the inference phase into a 3 × 3 convolutional layer. This method streamlines the inference process while augmenting the model’s expressive capacity via a complicated structure throughout the training phase.
Insulator self-explosion defect detection faces many challenges, including small defect sizes, complex backgrounds, and high detection accuracy requirements. Based on the above principle, we apply structural re-parameterization within the YOLOv11 framework. Specifically, the DBB module substitutes for the Conv module in the bottleneck within C3k2. The original bottleneck consists of single-branch convolution; following the DBB fusion idea, we design a four-branch structure to replace it, and the superimposed branch outputs are finally passed through the SiLU activation function. The formulas are shown below:
$$Y_1 = \mathrm{BN}\left(\mathrm{Conv}_{1\times 1}(X)\right)$$
$$Y_2 = \mathrm{BN}\left(\mathrm{Conv}_{3\times 3}(X)\right)$$
$$Y_3 = \mathrm{BN}\left(\mathrm{Conv}_{3\times 3}\left(\mathrm{Conv}_{1\times 1}(X)\right)\right)$$
$$Y_4 = \mathrm{BN}\left(\mathrm{AvgPool}_{3\times 3}\left(\mathrm{Conv}_{1\times 1}(X)\right)\right)$$
$$Y = \mathrm{SiLU}\left(\sum_{i=1}^{4} Y_i\right)$$
where BN denotes the batch normalization layer, Conv the convolution operation, AvgPool the average pooling operation, SiLU the activation function, X the module’s input features, Yi (i = 1, 2, 3, 4) the output features of the four branches, and Y the total output features.
The four branches are the 1 × 1 branch, the 3 × 3 branch, the 1 × 1–3 × 3 branch, and the 1 × 1–AvgPool branch [34], each consisting of convolution and BatchNorm. The 1 × 1 branch adjusts the channel dimensions and extracts local features; the 3 × 3 branch improves the model’s perception of spatial information; the 1 × 1–3 × 3 branch combines channel compression with the advantages of spatial feature extraction; and the 1 × 1–AvgPool branch [35] strengthens the ability to capture global information. To increase inference speed, the four branches are equivalently rearranged into a single 3 × 3 convolutional layer whose output passes through the SiLU activation function. Figure 4 displays the DBB structure.
Integrating the DBB module into C3k2 enables the model to capture features at various scales in the insulator image via a multi-branch structure during the training phase, thereby enhancing the identification of insulator self-explosion flaws and improving detection accuracy. Furthermore, during the inference phase, the equivalent reorganization property of the DBB module can significantly diminish the computing complexity and guarantee the model’s efficacy in real-world applications.
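The training-time form of the four branches can be sketched in PyTorch as below, following the formulas above. The inference-time step that folds the branches into one 3 × 3 convolution (see Ding et al. [33]) is omitted, and the channel sizes are placeholders.

```python
import torch
import torch.nn as nn

class DBBTrain(nn.Module):
    """Training-time four-branch block per the formulas above."""
    def __init__(self, c_in, c_out):
        super().__init__()
        # Y1 = BN(Conv1x1(X))
        self.branch1 = nn.Sequential(
            nn.Conv2d(c_in, c_out, 1, bias=False), nn.BatchNorm2d(c_out))
        # Y2 = BN(Conv3x3(X))
        self.branch2 = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out))
        # Y3 = BN(Conv3x3(Conv1x1(X)))
        self.branch3 = nn.Sequential(
            nn.Conv2d(c_in, c_out, 1, bias=False),
            nn.Conv2d(c_out, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out))
        # Y4 = BN(AvgPool3x3(Conv1x1(X)))
        self.branch4 = nn.Sequential(
            nn.Conv2d(c_in, c_out, 1, bias=False),
            nn.AvgPool2d(3, stride=1, padding=1),
            nn.BatchNorm2d(c_out))
        self.act = nn.SiLU()

    def forward(self, x):
        # Y = SiLU(Y1 + Y2 + Y3 + Y4)
        return self.act(self.branch1(x) + self.branch2(x)
                        + self.branch3(x) + self.branch4(x))

print(DBBTrain(64, 64)(torch.randn(1, 64, 80, 80)).shape)  # (1, 64, 80, 80)
```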

2.5. Adaptive Interaction Feature Integration (AIFI)

In comparison to YOLOv8, YOLOv11 incorporates C2PSA near the end of the backbone; nonetheless, this module’s fusion approach remains largely static. Insulator self-explosion defects frequently appear in intricate environmental contexts (e.g., towers, sky, and forests), where the target region has indistinct boundaries and small dimensions, requiring the detection model to exhibit both local detail recognition and global structural awareness.
The AIFI module was initially introduced in RT-DETR [36]. It is a component of the efficient hybrid encoder in the RT-DETR model, designed to enhance the efficiency and efficacy of feature extraction through intra-scale feature interaction based on the attention mechanism. The AIFI module employs the Adaptive Interaction Mechanism [37], enabling dynamic adjustment of the feature fusion strategy based on the contextual information of the input image, thereby integrating multi-scale feature information more effectively and exhibiting enhanced robustness and generalization in complex scenarios.
Figure 5 illustrates the operational principle of the AIFI module.
First, a feature map of dimensions B × C × H × W is provided as input, where B denotes the batch size, C the number of channels, and H and W the spatial dimensions. The input feature map is then flattened into a two-dimensional feature sequence containing H × W positions, each a C-dimensional vector. While generating the feature sequence, 2D sine–cosine position coding [38] is added at each spatial position to retain spatial information, forming a position encoding of the same length as the feature sequence. The 2D sine–cosine position coding is formulated as follows:
$$PE_{(x,y),\,2k} = \sin\!\left(\frac{x}{10000^{2k/d}}\right)$$
$$PE_{(x,y),\,2k+1} = \cos\!\left(\frac{x}{10000^{2k/d}}\right)$$
$$PE_{(x,y),\,2k+d/2} = \sin\!\left(\frac{y}{10000^{2k/d}}\right)$$
$$PE_{(x,y),\,2k+1+d/2} = \cos\!\left(\frac{y}{10000^{2k/d}}\right)$$
where PE represents the position encoding, k signifies the kth channel in the dimension, and d indicates the overall dimension of the position encoding. Subsequently, the feature sequence and position encoding are transmitted together to the Transformer encoder block, which processes them sequentially through five layers, namely LayerNorm, Multi-head Attention, Residual + LayerNorm, Feedforward Net, and Residual + LayerNorm, to obtain a fused 2D feature map; finally, a reshaping operation restores the dimensions B × C × H × W to form a new fused feature.
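A minimal sketch of this 2D sine–cosine encoding follows; it groups the sin/cos channels rather than interleaving them, which is equivalent to the formulas above up to a channel permutation.

```python
import torch

def posemb_sincos_2d(h, w, d, temperature=10000.0):
    """Build a (h*w, d) 2D sine-cosine position encoding: the first d/2
    channels encode x, the last d/2 encode y. d must be divisible by 4."""
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    k = torch.arange(d // 4, dtype=torch.float32)
    omega = 1.0 / (temperature ** (2.0 * k / d))      # 1 / 10000^(2k/d)
    x_ang = xs.flatten()[:, None] * omega[None, :]    # (h*w, d/4)
    y_ang = ys.flatten()[:, None] * omega[None, :]
    return torch.cat([x_ang.sin(), x_ang.cos(),
                      y_ang.sin(), y_ang.cos()], dim=1)

pe = posemb_sincos_2d(20, 20, 256)   # matches a flattened 20 x 20 feature map
print(pe.shape)                      # torch.Size([400, 256])
```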
Based on the above principle, the AIFI module enhances the spatial modeling capability of the model by adding positional coding, adapting to the fuzzy and dispersed characteristics of insulator self-explosion defects in complex backgrounds. Compared with the C2PSA module, the AIFI module more readily captures weak defect features, effectively establishes context dependencies, and improves detection accuracy and robustness, making it especially suitable for complex inspection image scenarios.

3. Model Improvement

To enhance the UAV’s capability to identify insulator self-explosion flaws during power inspections, we propose the SDA-YOLO model, whose structure is illustrated in Figure 6.
First, we incorporate a small object detection layer P2 to enhance the extraction of shallow details and integrate them with deep features, augmenting the model’s ability to recognize small-target self-explosion faults. Second, the C3k2_DBB module replaces the original C3k2 module in the backbone. The DBB fusion process enhances the diversity of feature extraction for self-explosion defects by replacing the original single-branch convolution structure with a multi-branch parallel convolution structure; the parallel branches are consolidated into a single convolutional branch during the inference phase to increase inference speed. Finally, the AIFI module supersedes the original C2PSA module in the backbone. AIFI achieves stronger feature fusion and expression capability by adaptively establishing interactions between multiple channels and spatial regions, and it better perceives local details and global structure, demonstrating superior small target recognition, especially in complex background scenes. In the head section, OBB is used to label and regress the target’s position, which better fits the detection target, thereby significantly improving target localization accuracy and boundary fitting.

4. Experiment

4.1. Dataset

The picture data utilized in this paper are sourced from State Grid Zhejiang Power Supply Company (Hangzhou, China). The images were taken by UAVs during daily inspections; the backgrounds include mountains, forests, sky, town buildings, and power pylons, and the weather conditions include sunny and cloudy days. After data augmentation, there are 2962 images of power tower insulators in total. The image data are labeled using the RoLabelImg annotation tool and randomly divided into training, validation, and test sets in the ratio of 7:1:2, as shown in Table 1 and sketched below.
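A minimal sketch of the random 7:1:2 split is shown below; the directory path and file extension are hypothetical placeholders.

```python
import random
from pathlib import Path

random.seed(0)
images = sorted(Path("dataset/images").glob("*.jpg"))  # hypothetical path
random.shuffle(images)
n = len(images)                                # 2962 in our dataset
n_train, n_val = int(0.7 * n), int(0.1 * n)    # 2073 and 296
train = images[:n_train]
val = images[n_train:n_train + n_val]
test = images[n_train + n_val:]                # remaining 593 images
```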

4.2. Experimental Environment and Experimental Parameters

The experiment was performed using the Ubuntu 20.04 operating system with Python 3.10, Pytorch 2.2.2, and CUDA 12.1. The NVIDIA GeForce RTX 3060 was utilized for training, validation, and testing purposes.
In this experiment, the resolution of the preprocessed images was set to 640 × 640, the number of epochs to 300, the number of workers to 4, and the batch size to 16. The model was optimized with Stochastic Gradient Descent (SGD) at a learning rate of 0.01, with momentum and weight decay configured at 0.937 and 0.0005, respectively.
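For reference, a training launch with these hyperparameters might look as follows, assuming the Ultralytics Python interface; the model and dataset YAML names are placeholders, and the SDA-YOLO modifications themselves require a custom model definition.

```python
from ultralytics import YOLO

# Hypothetical config names; SDA-YOLO itself needs a customized model YAML.
model = YOLO("yolo11s-obb.yaml")
model.train(
    data="insulator_obb.yaml",   # dataset config (placeholder)
    imgsz=640, epochs=300, batch=16, workers=4,
    optimizer="SGD", lr0=0.01, momentum=0.937, weight_decay=0.0005,
)
```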

4.3. Evaluation Index

This section introduces several critical metrics for a comprehensive evaluation of model performance, including Precision (P), Recall (R), Average Precision (AP), mean Average Precision (mAP), Frames Per Second (FPS), Parameters, and Giga Floating-Point Operations (GFLOPs). P, R, AP, and mAP are significant metrics for assessing target detection accuracy, calculated as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$AP = \int_{0}^{1} p(r)\,dr$$
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
P is the proportion of samples predicted by the model as positive that truly belong to the positive category, serving mainly as an indicator of false detections. True Positives (TP) refer to the count of samples accurately identified as positive, whereas False Positives (FP) denote the count of samples erroneously classified as positive. R signifies the proportion of all positive samples that the model accurately identified, primarily used to assess missed detections. False Negatives (FN) is the number of positive samples that the model erroneously classifies as negative. AP signifies the average precision over varying recall rates, where p(r) is the precision corresponding to recall r. mAP signifies the mean of the AP over all categories, reflecting the model’s proficiency in identifying various target categories, with N symbolizing the total number of categories. These quantities can be computed as in the sketch below.
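A minimal NumPy sketch of these metrics follows, assuming the per-class precision–recall points have already been accumulated from TP/FP/FN counts; the interpolation scheme is the common all-point variant, which particular evaluators may implement differently.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

def average_precision(recall, precision):
    """Area under the precision-recall curve: AP = integral of p(r) dr.
    `recall` must be sorted in ascending order."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # enforce non-increasing precision
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

def mean_average_precision(ap_per_class):
    """mAP = mean of per-class AP values."""
    return float(np.mean(ap_per_class))

print(precision_recall(tp=90, fp=10, fn=5))    # (0.9, 0.947...)
```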
The FPS indicates the number of frames processed per second by the model, signifying its inference speed. Parameters indicate the quantity of model parameters, signifying complexity and storage demands. The GFLOPs indicate the number of floating-point operations necessary for the model to execute a single forward inference, illustrating the model’s computational complexity.

4.4. Experimental Results

To validate the superiority of the SDA-YOLO model, we perform tests on this dataset and compare it with other prominent models from the YOLO series. The findings of the comparison are presented in Table 2 (OBB detection is used in the following experiments).
Based on the data comparison in the above table, we can see that the SDA-YOLO model performs well.
For the core metric mAP, our SDA-YOLO model achieves a significant improvement. Compared with YOLOv5n, YOLOv5s, and YOLOv8n, the mAP of SDA-YOLO improves by 14.9%, 8.5%, and 8.4%, respectively. Compared to YOLOv8s, whose Params and GFLOPs are close to ours, SDA-YOLO has a 7.5% higher mAP. Compared to YOLOv11n and YOLOv11s, SDA-YOLO achieves improvements of 8.8% and 6.6%, respectively.
For R, SDA-YOLO also provides a large improvement, significantly reducing the probability of missed detection. Compared with YOLOv5n, YOLOv5s, YOLOv8n, YOLOv8s, YOLOv11n, and YOLOv11s, SDA-YOLO improves R by 13.4%, 9.7%, 11.4%, 8.6%, 13.6%, and 7.8%, respectively.
For FPS, the improvements over the baseline model increase SDA-YOLO’s parameter count, and its computational complexity rises accordingly, which inevitably lowers the FPS. However, the FPS of the improved SDA-YOLO still reaches 93.6 frames/s, which satisfies the criteria for real-time detection. Although the FPS of the baseline model YOLOv11s reaches 106.3 frames/s, its detection accuracy remains at a significant disadvantage compared with SDA-YOLO.
To alleviate the constraints of the current dataset, we conduct data augmentation to replicate extreme weather events as illustrated in Figure 7. We perform trials with the baseline model YOLOv11s and the enhanced model SDA-YOLO in visually blocked interference settings. The experimental results depicted in Figure 8 demonstrate that SDA-YOLO substantially surpasses YOLOv11s under these conditions. In comparison to the model’s performance under non-interference situations, the accuracy decline of SDA-YOLO is comparatively minimal, suggesting that the model exhibits a degree of robustness.
A comprehensive analysis of the above indexes in the comparison experiments shows that the SDA-YOLO model we designed performs excellently for insulator self-explosion defect detection.
We perform ablation experiments on this dataset to assess the validity of SODL, DBB, and AIFI inside the SDA-YOLO model. S-YOLO refers to the model that exclusively integrates the SODL module. SD-YOLO refers to the model that integrates the SODL and DBB components. SDA-YOLO refers to the model that integrates SODL, DBB, and AIFI. The experimental findings are presented in Table 3.
The ablation experiment results indicate that the mAP of S-YOLO, SD-YOLO, and SDA-YOLO improves sequentially, surpassing the baseline model YOLOv11s with values of 91.9%, 94.5%, and 96.0%, respectively. The mAP of the S-YOLO model, with the addition of SODL, increases by 2.5%, demonstrating improved accuracy on small-sized insulator self-explosion regions. However, the added detection layer takes high-resolution feature maps (160 × 160) as input, increasing the convolutional computation during inference; GFLOPs therefore rise significantly and FPS falls to 79.3 frames/s, reducing model efficiency substantially. SD-YOLO is derived from S-YOLO by using the C3k2_DBB module to supplant the original C3k2 module in the backbone. The DBB module augments the model’s expressive capacity through diverse branching while markedly decreasing memory usage and computational demands during the inference phase. In comparison to S-YOLO, its FPS is enhanced by around 12.5% and its mAP increases by 2.6%. SDA-YOLO substitutes the AIFI module for the C2PSA module of the SD-YOLO model; AIFI employs an active attention mechanism and a multi-scale information fusion strategy, enhancing detection accuracy and inference speed without substantially increasing the model parameters and GFLOPs. In comparison to SD-YOLO, the mAP increases by 1.5% and the FPS rises by approximately 4.9%. Despite SDA-YOLO’s lower FPS compared to the baseline model, its mAP surpasses that of YOLOv11s by 6.6% while still fulfilling the criteria for real-time monitoring.
The ablation experiment illustrates the validity of each module in SDA-YOLO, and their integration yields significant overall performance enhancements. Figure 9 illustrates the mAP and R curves of various models in the comprehensive identification procedure of insulators and associated self-explosion flaws. The figure illustrates that, in comparison to YOLOv11s, S-YOLO, and SD-YOLO, our SDA-YOLO exhibits superior mAP and R, enhanced accuracy in real detection, and significantly decreased missed detections.
This paper employs the gradient-weighted class activation mapping (Grad-CAM) technique [39] to produce heatmaps at the nineteenth layer of the network (the P2 detection layer), providing a more intuitive comparison of the attention regions of the different models in the ablation experiments.
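A minimal hook-based Grad-CAM sketch in the spirit of Selvaraju et al. [39] is shown below; `model`, `target_layer`, and `score_fn` are assumptions standing in for our network, its nineteenth layer, and a selector that reduces the detector output to a single scalar confidence.

```python
import torch

def grad_cam(model, target_layer, image, score_fn):
    """Hook-based Grad-CAM sketch: global-average-pooled gradients weight
    the target layer's activations; all arguments are caller-supplied."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    score = score_fn(model(image))   # reduce detector output to one scalar
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    w = grads["g"].mean(dim=(2, 3), keepdim=True)      # channel weights
    cam = torch.relu((w * acts["a"]).sum(dim=1))       # weighted activation map
    return cam / (cam.max() + 1e-8)                    # normalized heatmap
```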
Figure 10 compares the heatmaps produced by the S-YOLO, SD-YOLO, and SDA-YOLO models on the test images. The heatmaps indicate that the S-YOLO model effectively identifies insulator self-explosion defects yet pays insufficient attention to some small-sized targets; the SD-YOLO model attends to small-sized self-explosion defects more than S-YOLO but also attends to irrelevant background. Conversely, the SDA-YOLO model not only pays the greatest attention to small-sized self-explosion defects but also greatly reduces attention to irrelevant background. The results indicate that our ablation tests are effective and that the SDA-YOLO model possesses superior feature extraction and target localization capabilities, enabling more precise identification of the insulator self-explosion defect area under external interference.
Figure 11 and Figure 12 show the original images and the corresponding detection results of the YOLOv11s and SDA-YOLO models. From the original images in Figure 11a–c, we can see insulator self-explosion defects of larger size against different backgrounds; the rotated detection box fits the detected object better, effectively reducing background pixels within the box, and the SDA-YOLO model detects the defects with higher confidence. The original images in Figure 12a–c contain several small-sized insulator self-explosion defects. Because the targets are small and blurry and background factors interfere, the YOLOv11s detection results are not ideal and include missed detections. The SDA-YOLO model fits the targets effectively and detects all of them, proving that it performs exceptionally well in detecting insulator self-explosion defects in complex environments and can largely avoid missed detections.
To assess the generalization capability of the SDA-YOLO method, it is compared with contemporary advanced algorithms on the DOTA dataset, with the experimental findings presented in Table 4. DOTA is an extensive dataset created for remote sensing image target detection tasks, released in 2018 by the Institute of Automation of the Chinese Academy of Sciences and other entities. It is mainly used to assess the efficacy of target detection algorithms on remote sensing imagery characterized by intricate backgrounds and multi-scale targets. The dataset comprises 15 target categories and many small-sized objects that are challenging to identify, aligning with the algorithmic study presented in this work. Because the images in the DOTA dataset are too large for the capacity of our graphics card during training, we split them into over 12,000 images and set the input size to 640 × 640 to evaluate the SDA-YOLO model’s efficacy in detecting small targets under low-resolution (blurry image) conditions.
The experimental data in Table 4 indicate that the mAP of the SDA-YOLO model on the DOTA dataset surpasses all other models, reaching 70.4%, an increase of 3.0% over the baseline model YOLOv11s. The model’s accuracy is enhanced across twelve categories, namely SH, ST, BC, GT, HA, BR, LV, SV, HE, RA, SB, and SP, with AP values of 93.9%, 81.6%, 45.3%, 59.6%, 81.8%, 61.0%, 84.3%, 71.7%, 30.3%, 72.4%, 41.2%, and 71.1%, respectively, and is almost equal on two categories, PL and BD. The difficult-to-detect small target categories BC, GT, BR, HE, and SB improve by 2.1%, 2.7%, 8.7%, 7.8%, and 3.0% over YOLOv11s. The results illustrate that SDA-YOLO significantly improves small target detection accuracy with low-resolution image inputs. According to the experimental findings on the DOTA dataset, SDA-YOLO also has superior performance compared to other models.

5. Conclusions

This paper constructs a dataset for inspecting self-explosion defects in power pole tower insulators, comprising 2962 RGB photos, and designs the SDA-YOLO model to identify such defects. A small target detection layer is incorporated into the YOLOv11s model to augment its feature extraction for small self-explosion flaws and enhance detection accuracy. The C3k2_DBB module then supplants the original C3k2 module in the backbone; this module introduces the DBB mechanism to acquire intricate spatial features via a multi-branch structure, enhancing the efficacy of feature extraction. To improve identification accuracy and inference speed, the AIFI module supplants the original C2PSA module; it dynamically captures the correlation among multi-level features through an interaction method, improving information selection during feature fusion. The experimental results illustrate the validity of the SDA-YOLO model: it attains an mAP of 96.0% and a detection speed of 93.6 frames per second on the self-explosion dataset, demonstrating superior performance compared to analogous models, and it retains excellent detection accuracy under interference conditions. The performance of SDA-YOLO on the DOTA dataset demonstrates its robustness and adaptability.
The dataset in this paper has some limitations and does not include other inspection pictures from different areas. The model developed in this study concentrates on the detection of self-explosion flaws in insulators and excludes the identification of other defects, such as fouling, shedding, and cracking, hence presenting certain limits in practical applications. In our forthcoming research, we will gather insulator photos from various areas to augment the dataset’s diversity and include samples of more kinds of insulator flaws to expand the model’s practical application. We intend to investigate more efficient feature extraction methods to reduce the model’s complexity while maintaining detection accuracy, and implement it on the UAV edge platform to assess the model’s inference performance and practical application.

Author Contributions

Conceptualization, Z.Y. and W.X.; methodology, Z.Y.; software, Z.Y.; validation, Z.Y., W.X., and N.C.; formal analysis, Y.C.; investigation, Z.Y.; resources, K.W.; data curation, Y.C. and M.X.; writing—original draft preparation, Z.Y.; writing—review and editing, W.X.; visualization, N.C.; supervision, K.W. and H.X.; project administration, E.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no funding.

Data Availability Statement

Upon reasonable request, the data used in this study can be provided.

Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their valuable suggestions on improving this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Han, S.; Hao, R.; Lee, J. Inspection of insulators on high-voltage power transmission lines. IEEE Trans. Power Deliv. 2009, 24, 2319–2327. [Google Scholar] [CrossRef]
  2. Jenssen, R.; Roverso, D. Automatic autonomous vision-based power line inspection: A review of current status and the potential role of deep learning. Int. J. Electr. Power Energy Syst. 2018, 99, 107–120. [Google Scholar] [CrossRef]
  3. Ballard, D.H. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognit. 1981, 13, 111–122. [Google Scholar] [CrossRef]
  4. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  5. Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 2007, SMC-3, 610–621. [Google Scholar] [CrossRef]
  6. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  7. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
  8. Chen, C.; Liu, M.Y.; Tuzel, O.; Xiao, J. R-CNN for small object detection. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 214–230. [Google Scholar]
  9. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1–9. [Google Scholar] [CrossRef]
  11. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  12. Zhao, W.; Xu, M.; Cheng, X.; Zhao, Z. An insulator in transmission lines recognition and fault detection model based on improved faster RCNN. IEEE Trans. Instrum. Meas. 2021, 70, 1–8. [Google Scholar] [CrossRef]
  13. Zhou, M.; Wang, J.; Li, B. ARG-Mask RCNN: An infrared insulator fault-detection network based on improved Mask RCNN. Sensors 2022, 22, 4720. [Google Scholar] [CrossRef]
  14. Lu, W.; Zhou, Z.; Ruan, X.; Yan, Z.; Cui, G. Insulator detection method based on improved Faster R-CNN with aerial images. In Proceedings of the 2021 2nd International Symposium on Computer Engineering and Intelligent Communications (ISCEIC), Nanjing, China, 6–8 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 417–420. [Google Scholar]
  15. Tang, J.; Wang, J.; Wang, H.; Wei, J.; Wei, Y.; Qin, M. Insulator defect detection based on improved faster R-CNN. In Proceedings of the 2022 4th Asia Energy and Electrical Engineering Symposium (AEEES), Chengdu, China, 25–28 March 2022; pp. 541–546. [Google Scholar]
  16. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  17. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  18. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  19. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  20. Khanam, R.; Hussain, M. YOLOv11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar]
  21. Liu, C.; Wu, Y.; Liu, J.; Sun, Z. Improved YOLOv3 network for insulator detection in aerial images with diverse background interference. Electronics 2021, 10, 771. [Google Scholar] [CrossRef]
  22. Han, G.; He, M.; Zhao, F.; Xu, Z.; Zhang, M.; Qin, L. Insulator detection and damage identification based on improved lightweight YOLOv4 network. Energy Rep. 2021, 7, 187–197. [Google Scholar] [CrossRef]
  23. Wang, T.; Zhai, Y.; Li, Y.; Wang, W.; Ye, G.; Jin, S. Insulator defect detection based on ML-YOLOv5 algorithm. Sensors 2023, 24, 204. [Google Scholar] [CrossRef] [PubMed]
  24. Zhang, Y.; Dou, Y.; Yang, K.; Song, X.; Wang, J.; Zhao, L. Insulator defect detection based on BaS-YOLOv5. Multimed. Syst. 2024, 30, 212. [Google Scholar] [CrossRef]
  25. He, M.; Qin, L.; Deng, X.; Liu, K. MFI-YOLO: Multi-fault insulator detection based on an improved YOLOv8. IEEE Trans. Power Deliv. 2024, 39, 168–179. [Google Scholar] [CrossRef]
  26. Zhang, L.; Li, B.; Cui, Y.; Lai, Y.; Gao, J. Research on improved YOLOv8 algorithm for insulator defect detection. J. Real-Time Image Proc. 2024, 21, 22. [Google Scholar] [CrossRef]
  27. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  28. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  29. Wang, W.; Xie, E.; Song, X.; Zang, Y.; Wang, W.; Lu, T.; Yu, G.; Shen, C. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8440–8449. [Google Scholar]
  30. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Wey, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  31. Gottschalk, S.; Lin, M.C.; Manocha, D. OBBTree: A hierarchical structure for rapid interference detection. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 4–9 August 1996; pp. 171–180. [Google Scholar]
  32. Wu, K.; Chen, Y.; Lu, Y.; Yang, Z.; Yuan, J.; Zheng, E. SOD-YOLO: A high-precision detection of small targets on high-voltage transmission lines. Electronics 2024, 13, 1371. [Google Scholar] [CrossRef]
  33. Ding, X.; Zhang, X.; Han, J.; Ding, G. Diverse branch block: Building a convolution as an inception-like unit. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2021; pp. 10886–10895. [Google Scholar]
  34. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742. [Google Scholar]
  35. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 2002, 86, 2278–2324. [Google Scholar] [CrossRef]
  36. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974. [Google Scholar]
  37. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  38. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
  39. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  40. Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 19–21 May 2021; Volume 35, pp. 3163–3171. [Google Scholar]
  41. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
Figure 1. The structure diagram of YOLO11.
Figure 2. The structure diagram of bottleneck, C3, C3k2. C3k2 (False) means using the bottleneck module, C3k2 (True) means using the C3 module.
Figure 3. Comparison of HBB and OBB labeling. (a) Represents HBB labeling. (b) Represents OBB labeling.
Figure 4. The DBB module training and reasoning structure.
Figure 5. The AIFI module function diagram.
Figure 6. The structure diagram of SDA-YOLO.
Figure 7. Effect pictures before and after adding interference. (a) The original image. (b) The interference image.
Figure 8. Comparison of detection accuracy under different conditions. Column A represents the detection accuracy of the SDA-YOLO model under both original and interference conditions. Column B represents the detection accuracy of SDA-YOLO and YOLOv11s models under interference conditions.
Figure 9. The mAP and R curves of different models in the detection process of insulators and self-explosion defects. (a) The total mAP curve. (b) The total R curve.
Figure 10. Comparison of heatmaps based on Grad-CAM technology. (ad) The original image (the red box in the image is the location of the insulator self-explosion defect), the heatmaps generated by the S-YOLO model, the heatmaps generated by the SD-YOLO model, and the heatmaps generated by the SDA-YOLO model, respectively. The more reddish the color of the heat map, the more attention the model pays to the area.
Figure 11. (ac) represent three sets of insulator inspection images of transmission lines in different background environments with obvious insulator self-explosion defects. The first column is the three original images, the second is the YOLOv11s model detection result image, and the third is the SDA-YOLO model detection result image.
Figure 12. (ac) represent three sets of insulator inspection images of transmission lines in different background environments with insulator self-explosion defects with small and inconspicuous sizes. The first column is the three original images, the second is the YOLOv11s model detection result image, and the third is the SDA-YOLO model detection result image.
Table 1. Dataset segmentation.
Dataset | Training Set | Validation Set | Test Set
Our dataset | 2073 | 296 | 593
Table 2. Comparative experiment.
Model (OBB) | P (%) | R (%) | mAP@0.5 (%) | Params (M) | GFLOPs | FPS
YOLOv5n | 99.7 | 78.9 | 81.1 | 2.6 | 7.3 | 121.1
YOLOv5s | 98.9 | 82.6 | 87.5 | 9.4 | 24.8 | 109.9
YOLOv8n | 95.5 | 80.9 | 87.6 | 3.1 | 8.3 | 118.1
YOLOv8s | 99.8 | 83.7 | 88.5 | 11.4 | 29.4 | 109.4
YOLOv11n | 98.8 | 78.7 | 87.2 | 2.7 | 6.6 | 116.8
YOLOv11s | 98.9 | 84.5 | 89.4 | 9.7 | 22.3 | 106.3
SDA-YOLO | 98.1 | 92.3 | 96.0 | 10.9 | 30.3 | 93.6
Table 3. Ablation experiment results. ✓ means add the corresponding module, ✗ means do not add the corresponding module.
Model (OBB) | SODL | DBB | AIFI | mAP@0.5 (%) | Params (M) | GFLOPs | FPS
YOLOv11s | ✗ | ✗ | ✗ | 89.4 | 9.7 | 22.3 | 106.3
S-YOLO | ✓ | ✗ | ✗ | 91.9 | 9.8 | 30.2 | 79.3
SD-YOLO | ✓ | ✓ | ✗ | 94.5 | 9.8 | 30.2 | 89.2
SDA-YOLO | ✓ | ✓ | ✓ | 96.0 | 10.9 | 30.3 | 93.6
Table 4. DOTA dataset comparison experiment, where PL, SH, ST, BD, TC, BC, GT, HA, BR, LV, SV, HE, RA, SB, and SP stand for plane, ship, storage-tank, baseball-diamond, tennis-court, basketball-court, ground-track-field, harbor, bridge, large-vehicle, small-vehicle, helicopter, roundabout, soccer-ball-field, and swimming-pool.
Model (OBB) | PL | SH | ST | BD | TC | BC | GT | HA | BR | LV | SV | HE | RA | SB | SP | mAP@0.5 (%)
R3Det [40] | 70.9 | 46.8 | 42.1 | 48.8 | 80.7 | 26.3 | 38.7 | 41.3 | 26.0 | 42.5 | 19.6 | 7.3 | 40.1 | 24.4 | 29.0 | 39.0
Oriented_RCNN [41] | 79.3 | 70.0 | 44.5 | 54.7 | 81.1 | 28.1 | 50.4 | 51.4 | 34.5 | 57.2 | 25.7 | 19.2 | 43.3 | 45.3 | 29.4 | 47.6
YOLOv5n | 93.4 | 91.5 | 72.3 | 76.8 | 91.4 | 39.5 | 50.2 | 79.6 | 46.6 | 83.0 | 69.0 | 4.7 | 67.0 | 29.0 | 65.2 | 63.9
YOLOv5s | 94.8 | 93.5 | 77.1 | 74.9 | 93.6 | 47.0 | 56.4 | 81.9 | 54.6 | 84.2 | 68.0 | 18.6 | 70.3 | 38.6 | 66.7 | 66.7
YOLOv8n | 93.5 | 92.0 | 73.1 | 72.8 | 92.5 | 40.5 | 51.2 | 80.5 | 48.3 | 82.4 | 66.8 | 21.7 | 63.2 | 36.5 | 63.9 | 63.9
YOLOv8s | 94.6 | 91.2 | 73.0 | 74.0 | 93.4 | 42.8 | 55.5 | 80.3 | 55.8 | 82.4 | 66.6 | 20.2 | 70.0 | 37.9 | 66.7 | 67.0
YOLOv11n | 93.1 | 92.3 | 72.1 | 74.8 | 91.9 | 35.3 | 51.9 | 78.5 | 50.2 | 83.5 | 69.4 | 13.5 | 64.2 | 36.7 | 64.9 | 64.8
YOLOv11s | 95.2 | 92.7 | 75.9 | 74.4 | 93.2 | 43.2 | 56.9 | 80.2 | 52.3 | 83.3 | 67.9 | 22.5 | 70.2 | 38.2 | 65.6 | 67.4
SDA-YOLO | 95.0 | 93.9 | 81.6 | 74.3 | 92.5 | 45.3 | 59.6 | 81.8 | 61.0 | 84.3 | 71.7 | 30.3 | 72.4 | 41.2 | 71.1 | 70.4