Article

YOLOv11-Based UAV Foreign Object Detection for Power Transmission Lines

1 School of Yonyou Digital and Intelligence, Nantong Institute of Technology, Nantong 226001, China
2 School of Resources Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
3 School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an 710072, China
4 School of Software, Northwestern Polytechnical University, Xi’an 710129, China
5 Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(18), 3577; https://doi.org/10.3390/electronics14183577
Submission received: 30 July 2025 / Revised: 28 August 2025 / Accepted: 1 September 2025 / Published: 9 September 2025

Abstract

Foreign objects on transmission lines pose a significant threat to power grid security, while conventional manual inspection methods are inefficient and carry safety risks. To overcome the challenges of detecting foreign objects in complex environments, this paper proposes an enhanced YOLOv11_SDI detection framework with two key contributions. Firstly, a novel hierarchical Spatial-channel Dynamic Inference (SDI) module is integrated into YOLOv11, employing an adaptive feature fusion mechanism to enhance multi-scale representation. Secondly, a lightweight spatial attention unit is introduced to improve region-of-interest localization without compromising computational efficiency. In addition, the publicly available FOTL_Drone dataset is expanded to 5980 UAV images through systematic data augmentation, covering six critical foreign object categories. Comprehensive experiments validate the model’s superior performance, achieving state-of-the-art 95.2% mAP@0.50 with only 3.74 M parameters, demonstrating its potential for practical transmission line inspection applications.

1. Introduction

With the rapid development of electric power infrastructure and the continuous promotion of urban and rural construction, the transmission network, as the core carrier of electric power transmission [1,2], faces increasing challenges to its safe operation. Owing to the natural environment and human activities, bird nesting materials, kites, balloons and plastic waste frequently appear on transmission lines and towers [3,4]. These exogenous materials not only significantly increase the incidence of phase-to-phase short-circuit accidents but may also trigger chain reactions such as insulation degradation and local field strength distortion. This seriously threatens the stable operation of the power grid [5,6,7]. Automatic detection of foreign objects can provide effective early warnings for operation and maintenance personnel.
Currently, researchers worldwide have proposed many methods to detect foreign objects on transmission lines. These methods can be divided into two categories. The first category comprises traditional machine learning methods [8,9,10]. They require complex image preprocessing steps, such as noise elimination and image enhancement, and rely on manually defined and extracted features (e.g., HOG, SIFT). Although such methods can achieve acceptable detection results in specific scenarios, hand-crafted features struggle to adapt to complex environments and to effectively differentiate between foreign objects with different morphologies, resulting in erroneous detections [11,12,13].
The second category comprises deep learning approaches [14,15,16]. Jiang et al. [17] address the persistent small-target omission in aerial power-line inspection by grafting a lightweight Convolutional Block Attention Module onto YOLOv5s, redirecting scarce computational capacity to informative regions while retaining the CIoU loss for tighter regression. Li et al. [18] proposed a hybrid CNN–Transformer architecture that innovatively combines convolutional spectral–spatial feature extraction with self-attention-based temporal modeling, significantly advancing dense time-series crop classification by simultaneously addressing sensor heterogeneity and long-range phenological dependency capture. Guan et al. [19] proposed an enhanced YOLOv8 framework integrating deformable convolutions and dynamic attention to address occlusion challenges in X-ray security inspection. Yu et al. [20] optimized the hyperparameters of YOLOv7 via genetic algorithms and introduced spatial depth conversion convolution, which improves the detection of small targets and low-resolution images. These methods perform well on their respective datasets; however, they only handle relatively simple scenes containing a small number of foreign objects.
The motivation of this paper is to propose a method that can adapt to complex scenes containing many common foreign objects on transmission lines. To achieve this goal, this paper introduces a hierarchical Spatial-channel Dynamic Inference (SDI) module into the YOLO detector. The module effectively addresses multiscale detection problems in complex scenarios through an innovative hierarchical spatial-channel attention mechanism. The SDI module can effectively suppress background interference and enhance the recognition of low-contrast targets. Experimental results demonstrate that this module significantly improves detection accuracy. Some detection results are given in Figure 1.
The main contributions of this paper can be summarized as follows:
  • This paper proposes an end-to-end detector, YOLOv11_SDI, for the task of foreign object detection on transmission lines. The novelty lies in integrating the SDI module into the YOLOv11 network.
  • This paper incorporates a spatial attention unit to co-optimize multi-layer features and achieve adaptive fusion of semantic information and detailed features. It enhances the model’s attention to critical regions with high efficiency.
  • Experiments conducted on the EFOD_Drone dataset demonstrate the effectiveness of the proposed YOLOv11_SDI model, which achieves an average precision of 94.1% and a mAP@0.50 of 95.2%, outperforming existing mainstream methods.

2. Related Work

2.1. Foreign Object Detection Methods

Various methods have been proposed for the task of foreign object detection on transmission lines. They can be divided into traditional approaches based on hand-crafted features [21,22] and deep learning-based approaches. In recent years, deep learning-based methods have become the mainstream owing to their excellent detection capabilities.
Guo et al. [23] proposed a model integrating deformable residual convolutions and adaptive multiscale fusion with learnable positional encoding. The method significantly improves detection accuracy for small objects in complex backgrounds. Ji et al. [24] proposed FusionNet to reconcile the persistent trade-off between speed and accuracy in severe-weather foreign object detection on power lines, where rain and fog obscure small targets. Sun et al. [25] presented ST-YOLOv8, an enhanced YOLOv8 model that targets the low recall and localization errors for small, occluded foreign objects on transmission lines by integrating Swin Transformer, AFPN and Focal-SIoU loss to strengthen global context, multi-scale fusion and hard-sample learning, achieving efficient automatic detection on a real-world Jilin power-line dataset.
Su et al. [7] introduced a data-composition pipeline that pastes foreground objects onto real power-line backgrounds and designed an edge-oriented detector, EpNet, to overcome the severe data scarcity and background clutter in transmission line foreign object detection. Wang et al. [26] introduced BiFPN to counter scarce data and cluttered drone views in transmission line foreign object detection, lifting mAP@0.50 to 89.6% with a compact 3 M-parameter design that outperforms heavier counterparts, while leaving very small targets and dataset breadth for further refinement. Han et al. [27] proposed TD-YOLO, a UAV-oriented lightweight YOLOv7-Tiny that cuts parameters by 74.8% via Ghost modules while recovering small-target recall through scSE-PA fusion and an NWD-CIoU loss, gaining 0.71% mAP at 23.5 FPS on a Jetson NX (NVIDIA Corporation, Santa Clara, CA, USA), at the cost of a minor accuracy gap versus the full-scale model. Bae et al. [28] developed YOLO-RACE, a model incorporating CARAFE for content-aware upsampling and ResCBAM for joint channel-spatial attention, and demonstrated significant improvements in detecting small and densely packed objects on multiple challenging datasets. This work underscores the value of dynamic feature reconstruction and attentive feature refinement, principles that align with and motivate our SDI module design.
Existing studies have made significant progress in enhancing the performance of target detection. Wang et al. [29] developed a Receptive Field Module and Mutual Fusion Decoder to optimize semantic information utilization. Alamri et al. [30] enhanced state-of-the-art detectors (Fast R-CNN and YOLO) through contextual information integration. Further advancing backbone design, Wang et al. [31] proposed FADC-ResNet, featuring dynamic expansion rate adjustment and adaptive frequency sensing for improved multi-scale target recognition. Mao et al. [32] introduced a dual-branch sparse decoder with reparameterized orthogonal downsampling to efficiently detect infrared small targets while reducing computational redundancy, achieving state-of-the-art performance with significantly improved inference speed. Li et al. [33] push UAV object detection toward on-board real-time operation by deploying a distilled YOLOv5s on the DJI M300 RTK, reaching 33.9 FPS while sustaining 92.1% mAP on the bird nest dataset, although these gains come at the cost of daylight-only RGB reliance and reduced robustness under occlusion or adverse weather.
Despite these advancements, critical limitations persist regarding computational complexity and deployment efficiency. Current methods often prove impractical for resource-constrained applications, particularly in transmission line inspection scenarios where real-time processing and edge deployment are essential.

2.2. Foreign Object Detection Datasets

The datasets are very critical for the learning methods [16]. Chen et al. [34] released a new railroad transmission line foreign object dataset: 14,615 images containing 40,541 annotations across four common object types, generated by ChatGPT-4 (24 May 2023 release, OpenAI API)-driven text-to-image synthesis. Wang et al. [26] presented FOTL_Drone, a curated UAV-view dataset of 1495 annotated images capturing six common foreign object classes on transmission lines, harvested from web searches, video frames, and drone flights. Wang et al. [35] built a new transmission line dataset that augments CPLID with 400 images sourced from power-supply bureaus, expanding to 1600 samples via rotation, translation and cropping. It comprises 1749 nests, 4961 insulators and 3912 dampers, offering richer and more balanced annotations than existing public sets.
Zhu et al. [36] released an 8000-image transmission line monitoring set captured by tower-mounted cameras, encompassing 10,852 annotated foreign objects against complex backgrounds. Electric Power Science Research Institute of the Yunnan Branch [37] supplied a drone-acquired transmission line foreign object dataset, whose images range from 640 × 640 to 2738 × 2270 pixels and are annotated across six classes—trash, twig, nest, kite, bird and balloon.
Overall, the released datasets provide training resources for the foreign object detection task. However, they still lack images covering multiple object classes, extreme weather conditions, and varied geographical environments.

3. Methodology

3.1. The Enhanced YOLOv11 with Spatial-Channel Dynamic Inference

The task of foreign object detection in transmission line inspection scenarios presents several formidable technical challenges that significantly impact detection performance. First and foremost, the multiscale nature of target objects—ranging from large electrical equipment to small birds or debris—necessitates a robust feature extraction framework capable of handling substantial scale variations. Additionally, the detection of small objects is particularly challenging due to their limited pixel representation in high-altitude drone imagery, often resulting in insufficient feature discriminability. Furthermore, complex background interference, including cluttered vegetation, varying weather conditions, and intricate transmission line infrastructure, frequently leads to false positives and missed detections in conventional approaches. These challenges are further exacerbated by real-time processing requirements for unmanned aerial vehicle based inspection systems.
To address these critical limitations, we propose YOLOv11_SDI, an enhanced object detection framework that combines computational efficiency with state-of-the-art detection performance. Building upon the YOLOv11 architecture, our solution introduces several key innovations specifically designed to overcome the aforementioned challenges in transmission line monitoring scenarios. The proposed framework maintains the computational advantages of one-stage detection while significantly improving detection accuracy through advanced feature fusion mechanisms.
As illustrated in Figure 2, our proposed architecture follows an end-to-end detection pipeline that processes input images and generates precise detection outputs, including bounding box coordinates and classification labels. The system’s core innovation lies in its novel detection head architecture, which incorporates our specially designed Spatial-channel Dynamic Inference (SDI) module. This module serves as a plug-and-play enhancement element that can be seamlessly integrated into the YOLOv11 framework.
The SDI module employs a sophisticated spatial-channel dual-attention mechanism that operates on multiple levels of the feature hierarchy. This mechanism performs two complementary functions: (1) spatial attention weights focus on discriminative regions containing potential objects of interest, while (2) channel attention emphasizes the most informative feature maps. Through this dual-path attention approach, the module effectively refines multilevel encoder features, preserving critical spatial details while enhancing semantic representation.
Furthermore, the SDI module implements an adaptive feature fusion strategy that dynamically balances high-level semantic information with low-level spatial details. This is achieved through a learnable weighting mechanism that optimally combines features from different scales based on their relative importance for the detection task. The module’s ability to preserve spatial information is particularly crucial for maintaining detection accuracy of small objects, while its dynamic nature ensures computational efficiency by avoiding redundant feature processing.
Figure 3 provides the SDI structure. First, in the spatial dimension, the feature map is processed by an improved spatial attention module $\varphi_i^s$, which generates a refined spatial weight matrix by emphasizing subtle spatial relationships and key regions, thereby enhancing the detection sensitivity for small and occluded objects. Simultaneously, in the channel dimension, a channel attention module $\phi_i^c$ is employed to dynamically recalibrate channel weights by modeling the dependencies between them. The two attention modules enable the model to effectively integrate fine-grained spatial details with global channel information. This forms a robust feature representation tailored for detecting diverse foreign objects in complex transmission line scenarios, as follows:
$$f_i^1 = \phi_i^c\left(\varphi_i^s\left(f_i^0\right)\right),$$
where $f_i^1$ denotes the processed feature map of layer $i$, and $\varphi_i^s$ and $\phi_i^c$ denote the spatial and channel attention modules of layer $i$, respectively. We then apply a $1 \times 1$ convolution to reduce the channel number of $f_i^1$ to a preset hyperparameter $c$, obtaining the feature map $f_i^2 \in \mathbb{R}^{H_i \times W_i \times c}$, where $H_i$ and $W_i$ denote the height and width of the feature map. This channel dimensionality reduction not only controls the complexity of the model but also promotes information fusion between different channel features while keeping the spatial dimensions ($H_i \times W_i$) unchanged.
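A minimal PyTorch sketch of this attention-and-reduction stage is given below. It is an illustrative reconstruction only: the CBAM-style spatial attention, the SE-style channel attention, and all class and variable names are our assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: weight each location using pooled channel statistics."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)            # (B, 1, H, W)
        max_map = x.max(dim=1, keepdim=True).values      # (B, 1, H, W)
        weights = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * weights                               # phi_s(f_i^0)

class ChannelAttention(nn.Module):
    """SE-style channel attention: recalibrate channels from globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)                            # phi_c(.)

class SDIAttentionStage(nn.Module):
    """f_i^2 = Conv1x1( phi_c( phi_s( f_i^0 ) ) ): Equation (1) followed by channel reduction to c."""
    def __init__(self, in_channels: int, c: int):
        super().__init__()
        self.spatial = SpatialAttention()
        self.channel = ChannelAttention(in_channels)
        self.reduce = nn.Conv2d(in_channels, c, kernel_size=1)

    def forward(self, f0):
        f1 = self.channel(self.spatial(f0))              # Equation (1)
        return self.reduce(f1)                           # f_i^2 with c channels
```

Applying the spatial weighting before the channel recalibration mirrors the attention ordering that the ablation in Section 4.4 finds most effective.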
In addition, we feed the optimized feature maps into the decoder for further processing. In each layer $i$ of the decoder, we dynamically adjust the size of the feature maps using the expected output resolution of that layer as a reference. For the $i$th level of the decoder, we perform an up-sampling or down-sampling operation on the input feature maps so that their spatial resolution matches that of $f_i^2$. This process can be formalized as
$$f_{ij}^3 = \begin{cases} D\left(f_j^2, (H_i, W_i)\right) & \text{if } j < i, \\ I\left(f_j^2\right) & \text{if } j = i, \\ U\left(f_j^2, (H_i, W_i)\right) & \text{if } j > i, \end{cases}$$
where $D$, $I$ and $U$ stand for adaptive average pooling, identity mapping and bilinear interpolation of $f_j^2$ to the resolution $H_i \times W_i$, respectively, and the indices satisfy $1 \le i, j \le M$, where $M$ denotes the total number of decoder layers; this ensures that feature maps can be flexibly aligned across different layers.
To further optimize the adjusted feature representation, we apply a $3 \times 3$ convolutional layer to each resolution-adapted feature map $f_{ij}^3$, and the computational process can be expressed as
$$f_{ij}^4 = \theta_{ij}\left(f_{ij}^3\right),$$
where $\theta_{ij}$ represents the parameters of the smoothing convolution, and $f_{ij}^4$ represents the $j$th smoothed feature map of the $i$th layer. After all layer-$i$ feature maps have been adjusted to the same resolution, we perform an element-wise Hadamard product over the adjusted feature maps to enhance the semantic information and detail expression of the layer-$i$ features, as follows:
$$f_i^5 = H\left(\left[f_{i1}^4, f_{i2}^4, \ldots, f_{iM}^4\right]\right).$$
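To make the data flow of Equations (2)–(4) concrete, the following PyTorch sketch implements the alignment, smoothing, and Hadamard fusion steps, reusing the attention stage sketched above. It is illustrative only: the module and variable names are our assumptions, the resize branch is chosen by comparing actual spatial sizes (equivalent to the $j < i$ and $j > i$ cases), and the actual YOLOv11_SDI implementation may differ.

```python
import torch.nn as nn
import torch.nn.functional as F

class SDIFusion(nn.Module):
    """Align every level's f_j^2 to level i's resolution, smooth with 3x3 convolutions,
    and fuse by an element-wise (Hadamard) product, following Equations (2)-(4)."""
    def __init__(self, c: int, num_levels: int):
        super().__init__()
        # One smoothing convolution theta_ij per source level j (Equation (3)).
        self.smooth = nn.ModuleList(
            [nn.Conv2d(c, c, kernel_size=3, padding=1) for _ in range(num_levels)]
        )

    def forward(self, feats, i):
        """feats: list of f_j^2 tensors with shape (B, c, H_j, W_j); i: index of the target level."""
        H_i, W_i = feats[i].shape[-2:]
        fused = None
        for j, f_j in enumerate(feats):
            if f_j.shape[-2:] == (H_i, W_i):                         # identity I(.)
                aligned = f_j
            elif f_j.shape[-2] > H_i:                                # finer level: downsample D(.)
                aligned = F.adaptive_avg_pool2d(f_j, (H_i, W_i))
            else:                                                    # coarser level: upsample U(.)
                aligned = F.interpolate(f_j, size=(H_i, W_i),
                                        mode="bilinear", align_corners=False)
            smoothed = self.smooth[j](aligned)                       # f_ij^4
            fused = smoothed if fused is None else fused * smoothed  # Hadamard product, Equation (4)
        return fused                                                 # f_i^5
```

In the full detector, one such fusion block would replace a concatenation node in the YOLOv11 head; which node to replace is the subject of the variants studied in Section 3.2.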

3.2. The Variants YOLOv11_SDI

To rigorously evaluate the architectural impact of different integration approaches for our proposed Spatial-channel Dynamic Inference (SDI) module, we conducted a comprehensive ablation study using the YOLOv11 baseline architecture as our experimental platform. Specifically, we systematically investigated five distinct embedding configurations by inserting the SDI module at critical junctions throughout the network’s feature hierarchy, resulting in five architecturally varied implementations (designated as 1-YOLOv11_SDI through 5-YOLOv11_SDI).
This carefully designed ablation framework serves two primary scientific purposes: first, it quantitatively disentangles the relationship between module placement depth and detection performance; second, it identifies the optimal architectural configuration that maximizes multi-scale feature interaction while maintaining computational efficiency essential for real-time UAV applications. Each variant was subjected to identical training protocols and evaluated on our standardized EFOD_Drone benchmark dataset under controlled experimental conditions to ensure statistically valid comparisons. The five integration strategies, illustrated in Figure 4, represent strategically chosen insertion points spanning the network’s feature extraction pipeline.
This ablation systematically disentangles the influence of SDI depth and placement, revealing the architectural locus that maximizes multi-scale feature interaction while preserving inference efficiency. By benchmarking all variants on the identical EFOD_Drone split, we ensure fair comparison and establish an empirically grounded guideline for integrating lightweight dynamic inference modules in UAV-based power-line inspection networks. The five head structures of YOLOv11 are illustrated in Figure 4. This systematic evaluation not only provides empirical evidence for optimal SDI placement but also establishes generalizable design principles for incorporating dynamic inference modules in power-line inspection networks.

4. Experiments

4.1. Evaluation Metrics and Dataset

Evaluation metrics are employed to measure the quantitative performance of models. In this paper, four metrics are selected, namely, precision, recall, mean average precision (mAP), and inference time.
$$IoU = \frac{area(C) \cap area(G)}{area(C) \cup area(G)}$$
$$AP_{class} = \int_0^1 P(R)\,dR$$
$$mAP = \frac{\sum_{i=1}^{N} AP_i}{N}$$
$$Precision = \frac{TP}{TP + FP} \times 100\%$$
$$Recall = \frac{TP}{TP + FN} \times 100\%$$
The evaluation employs standard detection metrics where TP, FP, and FN denote true positives, false positives, and false negatives, respectively, while N represents the number of classes. For per-class accuracy assessment, AP is calculated as the area under the precision–recall curve. The overall model performance is measured by mean average precision (mAP), with mAP@0.50 (IoU threshold at 0.5) and mAP@0.50:0.95 (average across IoU thresholds from 0.5 to 0.95 in 0.05 increments) being the primary metrics in this study.
Among them, the mAP metric contains two key variants: mAP@0.50, which measures detection performance at an IoU threshold of 0.5, primarily assessing basic detection capabilities, and mAP@0.50:0.95, which reflects the average performance across the range of IoU thresholds from 0.5 to 0.95 (with a step size of 0.05), providing a more comprehensive view of the model’s performance [38]. Moreover, we conduct all experiments under the standardized configuration detailed in Table 1.
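As a reference for how these quantities are computed, the sketch below evaluates IoU, precision, recall, AP, and mAP from already-matched detections. It is a simplified, framework-independent illustration; the results reported in this paper come from the standard evaluation pipeline of the detection toolchain rather than this snippet.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall(tp, fp, fn):
    """Precision and recall in percent, from counted TP, FP, FN."""
    return tp / (tp + fp) * 100.0, tp / (tp + fn) * 100.0

def average_precision(precisions, recalls):
    """AP as the area under the precision-recall curve (all-point interpolation)."""
    order = np.argsort(recalls)
    r = np.concatenate(([0.0], np.asarray(recalls, dtype=float)[order], [1.0]))
    p = np.concatenate(([1.0], np.asarray(precisions, dtype=float)[order], [0.0]))
    # Make precision monotonically non-increasing before integrating.
    for k in range(len(p) - 2, -1, -1):
        p[k] = max(p[k], p[k + 1])
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

def mean_average_precision(ap_per_class):
    """mAP: per-class AP summed and divided by the number of classes N."""
    return float(np.mean(ap_per_class))
```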
To evaluate the effectiveness of the proposed method, a foreign object detection dataset is essential. The original FOTL_Drone public dataset [26] is designed for foreign object detection on transmission lines. It covers six types of typical foreign objects with 1495 annotated images. Figure 5 illustrates representative samples, encompassing both static threats like kites and dynamic targets such as fire, alongside their brightness, cropping and flipping augmentations displayed in the subsequent three columns. It is uniformly labeled using the labeling tool according to the PASCAL VOC standard. However, limited by the scale and diversity of the original data, it is difficult to meet the requirements of training high-robustness detection models for complex power inspection scenarios.
To address this issue, we expand the dataset through effective data augmentation strategies. Three key strategies are adopted: random cropping (crop ratio range: 0.7–1.3 of the original size), random flipping (horizontal, vertical, or both, each applied with probability 1/3, so that no augmented sample is left unflipped), and brightness adjustment (±20% variation). The random cropping strategy enriches the distribution of target positions and local feature samples. The random flipping strategy improves the model’s invariance to object orientation and mirrored viewpoints. The brightness adjustment strategy simulates the lighting changes across different inspection time periods and weather conditions. With these strategies, the dataset size increases from 1495 to 5980 images, which significantly improves data diversity and effectively alleviates the problem of insufficient sample representation in the original dataset. The expanded dataset maintains coverage of the six typical foreign object categories and is split into training and test sets at an 8:2 ratio, which is of great significance for improving the robustness of models in complex scenes. Table 2 shows the proportion of each category in the expanded dataset. It can be seen that the samples of each category are relatively balanced.
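A minimal sketch of such an augmentation pipeline under the stated parameters (crop ratio 0.7–1.3, flip variants each with probability 1/3, ±20% brightness) is given below. The function is our own illustrative construction; bounding-box handling and any further details of the actual pipeline are not shown.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

def augment(image: Image.Image) -> Image.Image:
    """Apply the three augmentations described above to a single image.
    Bounding-box coordinates would need the matching geometric transforms (omitted here)."""
    w, h = image.size

    # 1. Random cropping: window scaled by a factor in [0.7, 1.3] of the original size,
    #    clamped to the image bounds (factors > 1.0 effectively keep the full frame).
    scale = random.uniform(0.7, 1.3)
    cw, ch = min(int(w * scale), w), min(int(h * scale), h)
    x0, y0 = random.randint(0, w - cw), random.randint(0, h - ch)
    image = image.crop((x0, y0, x0 + cw, y0 + ch))

    # 2. Random flipping: horizontal, vertical, or both, each chosen with probability 1/3.
    choice = random.choice(("horizontal", "vertical", "both"))
    if choice in ("horizontal", "both"):
        image = ImageOps.mirror(image)   # left-right flip
    if choice in ("vertical", "both"):
        image = ImageOps.flip(image)     # top-bottom flip

    # 3. Brightness adjustment: +/-20% variation.
    return ImageEnhance.Brightness(image).enhance(random.uniform(0.8, 1.2))
```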

4.2. The Selection of Feature Enhancement Module

This study systematically evaluates the performance of five feature enhancement modules in the YOLOv11 framework for the challenge of multiscale target detection from the UAV perspective. The experimental results in Table 3 show that the SDI module exhibits the best balance of detection accuracy and computational efficiency, with an average mAP@0.50 of 95.2%, and the inference time is controlled at 107.04 ms.
This study comprehensively evaluates five feature enhancement modules for UAV-based multiscale target detection. BiFPN [26] demonstrates superior computational efficiency at 99.42 ms through bidirectional cross-scale connections, yet its lack of explicit attention mechanisms results in a suboptimal 94.5% mAP@0.50 for occluded targets. GoldYOLO [39] incorporates static channel attention but suffers from performance inconsistency, notably achieving only 90.7% precision in fire detection due to inflexible feature adaptation. The iEMA module’s adaptive attention design [40] comes at the cost of 113.98 ms processing time and a lower mAP@0.50 of 94.0%, as its recurrent mechanism inadequately balances spatial-channel relationships. The GAM module [37] processes channel and spatial attention sequentially, achieving 94.4% precision in nest detection but exhibiting a 119.54 ms inference latency and a 92.8% average mAP@0.50, reflecting the trade-off between its global attention benefits and sequential processing limitations.
In summary, considering both detection performance and real-time requirements, the SDI module is selected for the foreign object detection task and embedded into YOLOv11 in this paper. The next section presents the model selection.

4.3. Model Selection

Table 4 shows the performance of the YOLOv11_SDI models trained on the original dataset, and Table 5 shows the results on the expanded dataset. It can be readily concluded that the data augmentation strategies are very effective in expanding the original FOTL_Drone dataset. The model trained on the expanded dataset achieves substantial improvements in precision, recall, and mAP.
According to the results listed in Table 4, Table 5 and Table 6, most of the YOLOv11_SDI models outperform the baseline YOLOv11 in terms of precision, recall, and mAP@0.50 across all six categories. It can be clearly observed that 1-YOLOv11_SDI demonstrates a significant advantage in overall detection performance. This demonstrates that introducing the SDI module in the feature fusion stage can effectively enhance the model’s feature characterization ability. Notably, the inference time of 1-YOLOv11_SDI remains between 120 and 147 ms while maintaining high detection accuracy.
Compared to the other variants, 2-YOLOv11_SDI and 3-YOLOv11_SDI show divergent characteristics. 2-YOLOv11_SDI maintains a mAP@0.50 similar to that of 1-YOLOv11_SDI in the Fire and Person categories, but it shows a performance drop of 1–3% in categories such as Nest and Kite, and its inference time is not significantly optimized. 3-YOLOv11_SDI, by contrast, offers better inference efficiency, but its recall metrics are generally lower than those of the other variants.
The 4-YOLOv11_SDI and 5-YOLOv11_SDI exhibit different optimization characteristics. The 4-YOLOv11_SDI achieves the highest mAP@0.50:0.95 of 78.9% in the Kite category, demonstrating the effectiveness of the SDI module in large object detection tasks. The 5-YOLOv11_SDI excels in computational efficiency, with an average inference time reduction of 15–20% compared to the baseline.
Analyzing from the perspective of multiscale detection capability, the significant variation in object size from the UAV viewpoint places stringent requirements on the model. It can be seen from Table 6 that 1-YOLOv11_SDI achieves excellent mAP@0.50 of 95.7% and 95.8% in challenging categories such as Balloon and Fire, indicating that fusing the SDI module at the low-level feature stage can effectively retain detailed information and avoid information decay in subsequent layers. In contrast, the all-level fusion strategy adopted by 5-YOLOv11_SDI has the lowest computational cost, but it may cause a decrease of 1–2% in mAP@0.50 across all categories due to feature confusion, indicating that this scheme has a negative impact on detection performance compared to 1-YOLOv11_SDI.
Considering the detection accuracy, generalization ability, and real-time requirements, 1-YOLOv11_SDI achieves a good trade-off on all evaluation metrics. The variant not only achieves an improvement of 3.5% in mAP@0.50 but also demonstrates a top accuracy of 96.3% on the most challenging small object detection tasks (e.g., kite). This superiority primarily stems from replacing the earliest concatenation operation with our SDI module, which preserves crucial low-level spatial details before semantic abstraction. The adaptive weighting mechanism enhances discriminative features at this critical stage, effectively amplifying subtle patterns essential for small object detection in complex backgrounds. Thus, 1-YOLOv11_SDI is selected as our model.

4.4. Ablation Study

To validate the effectiveness of the SDI module, including its core components and the impact of spatial-channel attention ordering, we conducted an ablation study on the EFOD_Drone dataset, with results summarized in Table 7. The study compares the baseline YOLOv11, variants with individual spatial attention or channel attention, and two full SDI variants that differ in attention execution order. The baseline YOLOv11 serves as a reliable reference to quantify the contributions of each component and the influence of attention ordering in the SDI module.
The spatial attention component, designed to capture local discriminative regions (e.g., edge details of kites or texture of nests), alone increases mAP@0.50 to 93.5% and mAP@0.50:0.95 to 72.8%, demonstrating its ability to enhance spatial localization of small, irregular targets in cluttered transmission line backgrounds. In contrast, the channel attention component, focused on global feature discrimination across channels (e.g., distinguishing fire’s spectral characteristics from the sky), increases mAP@0.50 to 93.1%, which is 0.9% higher than the baseline, highlighting its value in refining feature discrimination by recalibrating channel weights to prioritize informative signals.
The SDI variant with spatial-channel ordering achieves 94.1% precision, 94.4% recall and 95.2% mAP@0.50, improvements of 3.4%, 3.4% and 3.0% over the baseline, respectively. Its mAP@0.50:0.95 reaches 74.4%, 4.3% above the baseline. The channel-spatial variant also outperforms the baseline but lags slightly behind, with 94.1% mAP@0.50 and 73.5% mAP@0.50:0.95. Both variants incur modest increases in inference time and GFLOPs, yet their performance gains, especially the spatial-channel design's superiority in the key mAP metrics, justify this trade-off, as high accuracy is a core requirement for transmission line inspection.

4.5. Our Model’s Training

Figure 6 comprehensively illustrates the training dynamics and performance evolution of our proposed YOLOv11_SDI model.
The upper portion of the figure presents the integrated precision–recall–F1 curves across six distinct target categories, plotted against varying confidence thresholds. These curves provide critical insights into the model’s discriminative capabilities for different object classes, revealing category-specific detection characteristics and performance boundaries.
Figure 7 illustrates the key metric curves of the proposed YOLOv11_SDI model during the training process. The bounding box loss curve demonstrates the model’s progressive refinement of localization accuracy, while the classification loss reflects its evolving discriminative power. Notably, the simultaneous convergence of all loss components indicates stable training dynamics without optimization conflicts. The mAP trajectories further validate consistent performance gains throughout the training process. The comprehensive training curves not only confirm the model’s robust convergence properties but also establish a transparent framework for performance analysis and algorithmic improvement in transmission line inspection tasks.

4.6. Quantitative Comparison with Other Approaches

To validate the efficacy of our proposed model, we conducted a comprehensive comparison with seven mainstream object detection methods, including YOLOv5 [41], YOLOv7 [42], YOLOX [43], YOLOv8n [44], YOLOv11 [45], RetinaNet [46], and Gold_YOLO [47], on the EFOD_Drone dataset. These models span a wide range of architectural designs, from lightweight networks to computationally intensive frameworks. Our evaluation focuses on three critical metrics: precision, recall, and mAP@0.50, alongside model complexity (parameter count).
It can be concluded from Table 8 that our approach demonstrates three key advantages over existing methods:
  • Superior Accuracy: Achieves the highest mAP@0.50 (95.2%), outperforming all competitors, including YOLOv8n (92.4%) and YOLOX (91.2%).
  • Optimal Efficiency: With only 3.74 M parameters, our model strikes a better accuracy–efficiency balance than larger models (e.g., YOLOv7, 37.2 M) while surpassing lighter models (e.g., YOLOv8n, 3.2 M) in performance.
  • Robust Feature Learning: The highest precision (94.1%) and recall (94.4%) indicate robustness against false positives and missed detections, critical for drone-based inspections in complex environments.
For the quantitative performance analysis:
  • Precision–Recall Trade-Off: Our method improves precision by +1.4% (vs. YOLOv8n) and +2.0% (vs. YOLOX), reducing false alarms. Simultaneously, it boosts recall by +3.9% (vs. YOLOv8n) and +3.4% (vs. YOLOX), enhancing object coverage.
  • mAP@0.50 Dominance: The 95.2% mAP@0.50 signifies a +2.8% absolute gain over YOLOv8n (92.4%) and a +4.0% gain over YOLOX (91.2%), despite comparable parameter counts. Notably, our model outperforms YOLOv7 (85.3% mAP) and RetinaNet (87.1% mAP) by 9.9% and 8.1%, respectively, despite their roughly 10× larger sizes.
  • Efficiency–Accuracy Pareto Frontier: As illustrated in Table 8, our method resides on the optimal Pareto front, achieving higher accuracy with fewer parameters than all alternatives. For instance, compared to Gold_YOLO (5.6M, 87.5% mAP), our model reduces parameters by 33% while improving mAP by 7.7%. Against YOLOv11 (7.5 M, 92.2% mAP), we use 50% fewer parameters yet deliver +3.0% higher mAP. Notably, while RT-DETR achieves competitive performance (88.2% mAP), its parameter size (41.96 M) is 11.2× larger than our model, making it impractical for resource-constrained UAV deployments.
Our experiments confirm that the proposed model not only advances detection accuracy but also redefines efficiency benchmarks for UAV-based power line inspections. The consistent superiority in precision, recall, and mAP@0.50—coupled with lightweight deployment—positions our method as a practical and scalable solution for real-world applications.

4.7. Visual Results

Figure 8 presents qualitative detection results of the proposed model across diverse challenging scenarios. The visualization highlights three key challenges in transmission line inspections:
  • Low-Contrast Targets: Objects such as personnel and fires (e.g., eighth row) exhibit dark features due to sky-dominated backgrounds, yet remain detectable.
  • Occlusion Handling: The model successfully addresses obstructions caused by transmission infrastructure (e.g., towers and wires), demonstrating resilience to partial occlusions.
  • Small-Scale Fire Detection: Critically, the system accurately identifies even small-scale fire incidents (e.g., drone-induced fires), which are vital for early hazard prevention.
Additionally, the model maintains consistent performance under multiple scale variations, adverse lighting conditions and partial occlusions. Particularly noteworthy is that two detection boxes appear for the same fire target in Figure 8. This occurs because the SDI module fuses low-level flame details with high-level semantics, and the dynamic combustion of fire produces divergent responses at different scales; when two such responses have comparable confidence and their spatial overlap stays below the NMS threshold (0.45), both high-confidence boxes are retained. These results empirically validate the model’s robustness for real-world UAV inspections and its potential to enhance transmission line monitoring systems.
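This behaviour can be reproduced with a standard greedy non-maximum suppression pass, as in the sketch below, where the box coordinates are hypothetical: two high-confidence boxes on the same flame both survive whenever their mutual IoU stays below the 0.45 threshold.

```python
import torch
from torchvision.ops import nms

# Two plausible detections of the same flame at different scales, as (x1, y1, x2, y2).
boxes = torch.tensor([[100., 120., 180., 230.],    # tight box around the flame core
                      [ 85.,  95., 220., 275.]])   # wider box covering the surrounding halo
scores = torch.tensor([0.91, 0.88])

# IoU of the two boxes is about 0.36, below the 0.45 threshold, so neither suppresses the other.
keep = nms(boxes, scores, iou_threshold=0.45)
print(keep)  # tensor([0, 1]) -> both boxes are retained
```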

5. Discussion

Existing transmission line foreign object datasets primarily include 4–6 common object categories (e.g., nests, kites) while omitting rare but critical hazards such as industrial debris and ice accretion. While these datasets encompass more than 90% of reported incidents [27], their limited scope fails to account for real-world long-tail distributions. Notably, the exclusion of ice-related objects may compromise model reliability during winter operations. Future research should focus on expanding category diversity through utility collaborations and synthetic data generation techniques [33].
In terms of real-world deployment, despite strong benchmark performance, practical UAV-based detection systems face significant challenges [16]. Current models’ computational demands often exceed edge device capabilities for sustained aerial operations. Environmental factors cause mAP degradation in adverse weather, while wildlife interactions present ecological concerns. Variable lighting, seasonal vegetation changes, and underrepresented object types further impact reliability, necessitating research into adaptive multi-modal systems that balance accuracy with real-world constraints.
Although the proposed YOLOv11_SDI achieves high mAP scores in detecting foreign objects on transmission lines, we acknowledge that certain detection failures remain. As illustrated in Figure 9, one typical type of error is missed detection, indicated by the red arrows. In Figure 9a, the target is relatively small and indistinct, leading to its omission. In Figure 9b, the flame appears blackened and charred due to contact with other foreign objects—a scenario that is relatively uncommon in the training dataset—also resulting in a missed detection.
These cases highlight the need for further improvement in detecting obscured and small targets. Future efficiency improvements could also leverage sparsity-based optimization strategies, as suggested by recent advances in sparse representation learning [49,50]. Among these, Yang et al. [51] proposed QueryDet, a two-step feature-pyramid-based detector optimized for small object detection. This design balances the detection benefits of high-resolution features and the efficiency of avoiding background redundant computation—effectively reducing costs while boosting small-object performance [52]. These methods demonstrate that exploiting the inherent sparsity of foreground objects can significantly reduce computational redundancy in background regions. Incorporating such sparse sampling techniques may further enhance our framework’s efficiency for real-time UAV deployment scenarios.

6. Conclusions

In this study, we present YOLOv11_SDI, an end-to-end foreign object detection framework tailored for transmission line inspection. The proposed method incorporates a novel spatial-channel synergistic attention mechanism to enhance multiscale feature fusion, achieving state-of-the-art performance with 95.2% mAP@0.50 on the benchmark dataset. To mitigate the limitations of insufficient sample diversity in the public FOTL_Drone dataset, we employ a comprehensive data augmentation pipeline comprising random cropping, random flipping, and brightness adjustment, thereby improving data variability and model generalizability.
For future research, we plan to develop an adaptive illumination enhancement module to boost detection robustness under low-light conditions, and investigate small-target super-resolution techniques to further improve recognition accuracy for minute objects. These advancements are expected to strengthen the framework’s applicability in real-world transmission line monitoring scenarios, particularly for UAV-based autonomous inspections.

Author Contributions

Conceptualization, D.G. and B.W.; methodology, B.W. and Y.Y.; validation, Y.Y. and C.L.; formal analysis, C.L. and H.Z.; investigation, D.G.; resources, B.W.; data curation, C.L. and H.Z.; writing—original draft preparation, Y.Y. and D.G.; writing—review and editing, B.W. and Y.Y.; visualization, Y.Y. and B.W.; supervision, B.W. and H.Z.; project administration, B.W. and H.Z.; funding acquisition, D.G. and B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Key Research and Development Program of China under Grant 2024YFB3908500, the Youth Fund of the National Natural Science Foundation of China under number 62406249, the Natural Science Basic Research Program of Shaanxi under number 2024JC-YBQN-0612, and the National Key Laboratory of Space Target Awareness under number STA2024KGJ0202. This work is also supported by the National Natural Science Foundation of China under number 62576281, the Guangdong Provincial Key Laboratory under the number 2023B1212060076, the Science and Research Project of Nantong Institute of Technology under the number WP202535, and the Science and Technology Plan Project of Nantong, under the number JC2023023.

Data Availability Statement

Data is available at: https://github.com/yyh7979/EFOD_Drone, accessed on 24 August 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SDI: Spatial-channel Dynamic Inference
FOTL_Drone: Foreign Object detection on Transmission Lines from a Drone-view
EFOD_Drone: Foreign Object detection on Transmission Lines from a Drone-view (the expanded FOTL_Drone dataset)

References

  1. Zhang, S.; Li, H.C.; Song, Y.; Yan, C.; Wang, J. YOLOv5-Based Foreign Object Detection Algorithm for Transmission Lines. In Proceedings of the 2024 6th Asia Symposium on Image Processing (ASIP), Tianjin, China, 13–15 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 40–46. [Google Scholar]
  2. Wang, G. FMH-YOLO: Detecting Foreign Objects on Transmission Lines via Enhanced Yolov8. Acad. J. Eng. Technol. Sci. 2025, 8, 8–17. [Google Scholar] [CrossRef]
  3. Lu, Y.; Li, D.; Li, D.; Li, X.; Gao, Q.; Yu, X. A Lightweight Insulator Defect Detection Model Based on Drone Images. Drones 2024, 8, 431. [Google Scholar] [CrossRef]
  4. Li, Z.; Zhang, Y.; Wu, H.; Suzuki, S.; Namiki, A.; Wang, W. Design and Application of a UAV Autonomous Inspection System for High-Voltage Power Transmission Lines. Remote Sens. 2023, 15, 865. [Google Scholar] [CrossRef]
  5. Faisal, M.A.A.; Mecheter, I.; Qiblawey, Y.; Fernandez, J.H.; Chowdhury, M.E.; Kiranyaz, S. Deep Learning in Automated Power Line Inspection: A Review. Appl. Energy 2025, 385, 125507. [Google Scholar] [CrossRef]
  6. Wang, J.; Jin, L.; Li, Y.; Cao, P. Application of End-to-End Perception Framework Based on Boosted DETR in UAV Inspection of Overhead Transmission Lines. Drones 2024, 8, 545. [Google Scholar] [CrossRef]
  7. Su, J.; Su, Y.; Zhang, Y.; Yang, W.; Huang, H.; Wu, Q. EpNet: Power Lines Foreign Object Detection with Edge Proposal Network and Data Composition. Knowl.-Based Syst. 2022, 249, 108857. [Google Scholar] [CrossRef]
  8. Dai, L.; Zhang, X.; Gardoni, P.; Lu, H.; Liu, X.; Krolczyk, G.; Li, Z. A New Machine Vision Detection Method for Identifying and Screening Out Various Large Foreign Objects on Coal Belt Conveyor Lines. Complex Intell. Syst. 2023, 9, 5221–5234. [Google Scholar] [CrossRef]
  9. Li, J.; Nie, Y.; Cui, W.; Liu, R.; Zheng, Z. Transmission Line Foreign Object Detection Based on Improved YOLOv3 and Deployed to the Chip. In Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence, Hangzhou, China, 18–20 September 2020; pp. 100–104. [Google Scholar]
  10. Xu, L.; Song, Y.; Zhang, W.; An, Y.; Wang, Y.; Ning, H. An Efficient Foreign Objects Detection Network for Power Substation. Image Vis. Comput. 2021, 109, 104159. [Google Scholar] [CrossRef]
  11. Chen, B.; Liu, L.; Zou, Z.; Shi, Z. Target Detection in Hyperspectral Remote Sensing Image: Current Status and Challenges. Remote Sens. 2023, 15, 3223. [Google Scholar] [CrossRef]
  12. Hao, J.; Yan, G.; Wang, L.; Pei, H.; Xiao, X.; Zhang, B. A Lightweight Transmission Line Foreign Object Detection Algorithm Incorporating Adaptive Weight Pooling. Electronics 2024, 13, 4645. [Google Scholar] [CrossRef]
  13. Yao, N.; Zhu, L.F. A Novel Foreign Object Detection Algorithm Based on GMM and K-Means for Power Transmission Line Inspection. J. Phys. Conf. Ser. 2020, 1607, 012014. [Google Scholar] [CrossRef]
  14. Benelmostafa, B.-E.; Medromi, H. PowerLine-MTYOLO: A Multitask YOLO Model for Simultaneous Cable Segmentation and Broken Strand Detection. Drones 2025, 9, 505. [Google Scholar] [CrossRef]
  15. Wu, Y.; Zhao, S.; Xing, Z.; Wei, Z.; Li, Y.; Li, Y. Detection of Foreign Objects Intrusion into Transmission Lines Using Diverse Generation Model. IEEE Trans. Power Deliv. 2023, 38, 3551–3560. [Google Scholar] [CrossRef]
  16. Zhang, X.; Li, J. A Survey on Detecting Foreign Objects on Transmission Lines Based on UAV Images. In Proceedings of the International Conference on Intelligent Robotics and Applications, Xi’an, China, 31 July–2 August 2024; Springer Nature: Singapore, 2024; pp. 388–402. [Google Scholar]
  17. Jiang, T.; Li, C.; Yang, M.; Wang, Z. Improved YOLOv5s Algorithm for Object Detection with an Attention Mechanism. Electronics 2022, 11, 2494. [Google Scholar] [CrossRef]
  18. Li, Z.; Chen, G.; Zhang, T. A CNN-Transformer Hybrid Approach for Crop Classification Using Multi-Temporal Multisensor Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 847–858. [Google Scholar] [CrossRef]
  19. Guan, F.; Zhang, H.; Wang, X. An Improved YOLOv8 Model for Prohibited Item Detection with Deformable Convolution and Dynamic Head. J. Real-Time Image Process. 2025, 22, 84. [Google Scholar] [CrossRef]
  20. Yu, Y.; Lv, H.; Chen, W.; Wang, Y. Research on Defect Detection for Overhead Transmission Lines Based on the ABG-YOLOv8n Model. Energies 2024, 17, 5974. [Google Scholar] [CrossRef]
  21. Su, Y.; Sun, R.; Lin, G.; Wu, Q. Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 7004–7014. [Google Scholar]
  22. Su, Y.; Deng, J.; Sun, R.; Lin, G.; Wu, Q. A Unified Transformer Framework for Group-Based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5879–5895. [Google Scholar] [CrossRef]
  23. Guo, X.; Bao, Y.; Jiang, H.; Feng, Z.; Sun, Y. RDBL-Net: Detection of Foreign Objects on Transmission Lines Based on Positional Encoding Multiscale Feature Fusion. Int. J. Sens. Netw. 2025, 47, 61–71. [Google Scholar] [CrossRef]
  24. Ji, C.; Jia, X.; Huang, X.; Zhou, S.; Chen, G.; Zhu, Y. FusionNet: Detection of Foreign Objects in Transmission Lines During Inclement Weather. IEEE Trans. Instrum. Meas. 2024, 73, 5021218. [Google Scholar] [CrossRef]
  25. Sun, H.; Shen, Q.; Ke, H.; Duan, Z.; Tang, X. Power Transmission Lines Foreign Object Intrusion Detection Method for Drone Aerial Images Based on Improved YOLOv8 Network. Drones 2024, 8, 346. [Google Scholar] [CrossRef]
  26. Wang, B.; Li, C.; Zou, W.; Zheng, Q. Foreign Object Detection Network for Transmission Lines from Unmanned Aerial Vehicle Images. Drones 2024, 8, 361. [Google Scholar] [CrossRef]
  27. Han, G.; Wang, R.; Yuan, Q.; Zhao, L.; Li, S.; Zhang, M.; He, M.; Qin, L. Typical Fault Detection on Drone Images of Transmission Lines Based on Lightweight Structure and Feature-Balanced Network. Drones 2023, 7, 638. [Google Scholar] [CrossRef]
  28. Bae, M.H.; Park, S.W.; Park, J.; Jung, S.H.; Sim, C.B. YOLO-RACE: Reassembly and Convolutional Block Attention for Enhanced Dense Object Detection. Pattern Anal. Appl. 2025, 28, 90. [Google Scholar] [CrossRef]
  29. Wang, J.; Guo, Y.; Tan, X.; Lan, Y.; Han, Y. Enhancing Green Guava Segmentation with Texture Consistency Loss and Reverse Attention Mechanism Under Complex Background. Comput. Electron. Agric. 2025, 216, 110308. [Google Scholar] [CrossRef]
  30. Alamri, F.; Pugeault, N. Improving Object Detection Performance Using Scene Contextual Constraints. IEEE Trans. Cogn. Dev. Syst. 2020, 14, 1320–1330. [Google Scholar] [CrossRef]
  31. Wang, S.; Jiang, H.; Yang, J.; Ma, X.; Chen, J. AMFEF-DETR: An End-to-End Adaptive Multiscale Feature Extraction and Fusion Object Detection Network Based on UAV Aerial Images. Drones 2024, 8, 523. [Google Scholar] [CrossRef]
  32. Mao, Q.; Li, Q.; Wang, B.; Zhang, Y.; Dai, T.; Chen, C.P. SpirDet: Towards Efficient, Accurate and Lightweight Infrared Small Target Detector. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5006912. [Google Scholar] [CrossRef]
  33. Li, H.; Dong, Y.; Liu, Y.; Ai, J. Design and Implementation of UAVs for Bird’s Nest Inspection on Transmission Lines Based on Deep Learning. Drones 2022, 6, 252. [Google Scholar] [CrossRef]
  34. Chen, Z.; Yang, J.; Feng, Z.; Zhu, H. RailFOD23: A Dataset for Foreign Object Detection on Railroad Transmission Lines. Sci. Data 2024, 11, 72. [Google Scholar] [CrossRef]
  35. Wang, S.; Tan, W.; Yang, T.; Zeng, L.; Hou, W.; Zhou, Q. High-Voltage Transmission Line Foreign Object and Power Component Defect Detection Based on Improved YOLOv5. J. Electr. Eng. Technol. 2024, 19, 851–866. [Google Scholar] [CrossRef]
  36. Zhu, J.; Guo, Y.; Yue, F.; Yuan, H.; Yang, A.; Wang, X.; Rong, M. A Deep Learning Method to Detect Foreign Objects for Inspecting Power Transmission Lines. IEEE Access 2020, 8, 94065–94075. [Google Scholar] [CrossRef]
  37. Wang, Z.; Yuan, G.; Zhou, H.; Ma, Y.; Ma, Y. Foreign-Object Detection in High-Voltage Transmission Line Based on Improved YOLOv8m. Appl. Sci. 2023, 13, 12775. [Google Scholar] [CrossRef]
  38. Yue, G.; Liu, Y.; Niu, T.; Liu, L.; An, L.; Wang, Z.; Duan, M. Glu-YOLOv8: An Improved Pest and Disease Target Detection Algorithm Based on YOLOv8. Forests 2024, 15, 1486. [Google Scholar] [CrossRef]
  39. Buhari, A.M.; Ooi, C.P.; Baskaran, V.M.; Baskaran, V.M.; Phan, R.C.; Wong, K.; Tan, W.H. Invisible Emotion Magnification Algorithm (IEMA) for Real-Time Micro-Expression Recognition with Graph-Based Features. Multimed. Tools Appl. 2022, 81, 9151–9176. [Google Scholar] [CrossRef]
  40. Jordan, S. Using E-Assessment to Learn About Learning. In Proceedings of the CAA 2013 International Conference, Southampton, UK, 9–10 July 2013; pp. 1–12. [Google Scholar]
  41. Jocher, G.; Stoken, A.; Borovec, J.; Liu, C.; Hogan, A.; Diaconu, L.; Poznanski, J.; Yu, L.; Rai, P.; Ferriday, R.; et al. Ultralytics/yolov5: V3.0. Zenodo. 2020. Available online: https://zenodo.org/records/3983579 (accessed on 31 August 2025).
  42. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 7464–7475. [Google Scholar]
  43. Zhang, Y.; Xu, W.; Yang, S.; Xu, Y.; Yu, X. Improved YOLOX Detection Algorithm for Contraband in X-Ray Images. Appl. Opt. 2022, 61, 6297–6310. [Google Scholar] [CrossRef] [PubMed]
  44. Wang, Z.; Hua, Z.; Wen, Y.; Zhang, S.; Xu, X.; Song, H. E-YOLO: Recognition of Estrus Cow Based on Improved YOLOv8n Model. Expert Syst. Appl. 2024, 238, 122212. [Google Scholar] [CrossRef]
  45. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
  46. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2980–2988. [Google Scholar]
  47. Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism. Adv. Neural Inf. Process. Syst. 2024, 36, 51094–51112. [Google Scholar]
  48. Li, T.; Zhu, C.; Wang, Y.; Li, J.; Cao, H.; Yuan, P.; Gao, Z.; Wang, S. LMFC-DETR: A Lightweight Model for Real-Time Detection of Suspended Foreign Objects on Power Lines. IEEE Trans. Instrum. Meas. 2025, 74, 2539319. [Google Scholar] [CrossRef]
  49. Xiao, C.; Xiao, C.; An, W.; Zhang, Y.; Su, Z.; Li, M.; Sheng, W.; Pietikäinen, M.; Liu, L. Highly Efficient and Unsupervised Framework for Moving Object Detection in Satellite Videos. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 11532–11539. [Google Scholar] [CrossRef] [PubMed]
  50. Kavukcuoglu, K.; Ranzato, M.A.; LeCun, Y. Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition. arXiv 2020, arXiv:1010.3467. [Google Scholar]
  51. Yang, C.; Huang, Z.; Wang, N. QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13668–13677. [Google Scholar]
  52. Wu, S.; Xiao, C.; Wang, Y.; Yang, J.; An, W. Sparsity-Aware Global Channel Pruning for Infrared Small-target Detection Networks. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5615011. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram and identification results of foreign object detection for transmission lines. Left: Schematic diagram of six types of foreign objects (generated using the AI tool “Jimeng AI” with the prompt: “Generate a clear and scientific schematic diagram showing six types of foreign objects near transmission lines: nests, kites, balloons, fire, person, and monkeys. The style should be minimalist and labeled”; the output was subsequently optimized by the authors). Right: Identification results of some foreign objects by our proposed detector.
Figure 2. Architecture of the YOLOv11_SDI. It has a detection head based on our proposed spatial-channel dynamic inference module for foreign object detection on transmission lines.
Figure 3. The spatial-channel dynamic inference network.
Figure 4. Insertion of SDI at different positions of the head structure of YOLOv11. (a) Represents the original structure of YOLOv11 head, while (bf) depict the integration of the SDI structure at five different locations.
Figure 5. Examples of data augmentation: selected original samples (Left, Column 1) and their augmented versions (Right, Columns 2–4).
Figure 6. Evolution of the SDI module’s performance in the EFOD_Drone dataset during the training process.
Figure 7. Curves of the key indicators of the YOLOv11_SDI model on the EFOD_Drone dataset during training.
Figure 8. Visualization of detection results of the YOLOv11_SDI model.
Figure 9. Missed detection failures. (a) A partially occluded balloon that was not detected; (b) a small fire that was also undetected. Red arrows indicate the missed objects.
Table 1. Experimental environment.
Device | Configuration
Operating System | Windows 11
GPU | NVIDIA GeForce RTX 3080 (12 GB)
GPU Accelerator | CUDA v11.6
Scripting Language | Python v3.8
Framework | PyTorch v2.0
Compilers | Anaconda3 (v2023.09-0, 64-bit); PyCharm Community Edition v2023.3.2
Target detection algorithm | YOLOv11
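As a reference for reproducing the baseline side of the experiments in this environment, the following minimal sketch trains and validates a stock YOLOv11 model with the Ultralytics Python API. The dataset YAML name, epoch count, and image size are placeholder assumptions, and the SDI-modified head described in this paper is not part of the off-the-shelf package.

from ultralytics import YOLO

# Load a stock YOLOv11 nano checkpoint (the unmodified baseline).
model = YOLO("yolo11n.pt")

# Train on the expanded dataset; "efod_drone.yaml", 100 epochs, and a
# 640-pixel input size are placeholder assumptions, not the authors' settings.
model.train(data="efod_drone.yaml", epochs=100, imgsz=640, device=0)

# Validate and print mAP@0.50 and mAP@0.50:0.95.
metrics = model.val(data="efod_drone.yaml")
print(metrics.box.map50, metrics.box.map)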
Table 2. Proportions of each category in the EFOD_Drone dataset.
Category | Nest | Kite | Balloon | Fire | Person | Monkey
Proportion | 23.2% | 13.8% | 15.2% | 15.4% | 16.3% | 16.1%
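The proportions in Table 2 can be reproduced by counting objects in the YOLO-format annotation files. The short sketch below shows one way to do this; the directory layout and the class-index mapping are assumptions about how the dataset is organized, not details stated in the paper.

import os
from collections import Counter

# Assumed class-index mapping; adjust to match the dataset's YAML file.
CLASS_NAMES = ["nest", "kite", "balloon", "fire", "person", "monkey"]

def class_proportions(label_dir):
    """Count YOLO-format labels (one 'class x y w h' line per object) and return proportions."""
    counts = Counter()
    for name in os.listdir(label_dir):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name)) as f:
            for line in f:
                if line.strip():
                    counts[int(line.split()[0])] += 1
    total = sum(counts.values())
    return {CLASS_NAMES[i]: counts[i] / total for i in counts}

print(class_proportions("EFOD_Drone/labels/train"))  # hypothetical path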
Table 3. Detailed comparison between YOLOv11_SDI and other improved models on the EFOD_Drone dataset.
Class | Model | Precision | Recall | mAP@0.50 | mAP@0.50:0.95 | Inference (ms) | GFLOPS
Nest | SDI | 93.3 | 93.0 | 95.9 | 76.7 | 120.33 | 11.5
Nest | BiFPN | 93.6 | 92.2 | 94.3 | 71.6 | 106.28 | 10.8
Nest | GoldYOLO | 93.0 | 93.4 | 94.8 | 74.1 | 136.08 | 12.5
Nest | iEMA | 91.1 | 92.3 | 95.5 | 71.1 | 122.71 | 11.8
Nest | GAM | 94.4 | 90.9 | 93.1 | 76.4 | 129.12 | 12.0
Kite | SDI | 95.0 | 94.1 | 96.3 | 77.7 | 107.56 | 11.5
Kite | BiFPN | 91.1 | 90.8 | 93.7 | 75.7 | 101.10 | 10.8
Kite | GoldYOLO | 92.6 | 93.8 | 95.5 | 73.4 | 126.62 | 12.5
Kite | iEMA | 93.2 | 87.0 | 95.3 | 75.5 | 110.98 | 11.8
Kite | GAM | 93.5 | 91.4 | 94.1 | 76.2 | 112.25 | 12.0
Balloon | SDI | 96.2 | 91.6 | 95.7 | 77.3 | 147.38 | 11.5
Balloon | BiFPN | 93.1 | 92.7 | 95.4 | 74.3 | 136.80 | 10.8
Balloon | GoldYOLO | 95.3 | 91.2 | 94.5 | 75.3 | 154.16 | 12.5
Balloon | iEMA | 93.1 | 88.7 | 94.2 | 74.1 | 152.91 | 11.8
Balloon | GAM | 94.5 | 89.9 | 92.1 | 76.3 | 139.81 | 12.0
Fire | SDI | 93.8 | 93.8 | 95.8 | 70.1 | 133.13 | 11.5
Fire | BiFPN | 91.0 | 91.5 | 95.9 | 70.8 | 135.93 | 10.8
Fire | GoldYOLO | 90.7 | 93.3 | 94.5 | 67.8 | 143.16 | 12.5
Fire | iEMA | 89.7 | 90.1 | 94.8 | 64.3 | 147.36 | 11.8
Fire | GAM | 92.1 | 90.7 | 93.7 | 69.3 | 145.94 | 12.0
Person | SDI | 89.7 | 90.9 | 94.2 | 74.2 | 65.78 | 11.5
Person | BiFPN | 87.4 | 90.8 | 94.0 | 70.1 | 57.60 | 10.8
Person | GoldYOLO | 86.3 | 91.3 | 93.4 | 73.0 | 66.29 | 12.5
Person | iEMA | 89.1 | 87.1 | 92.4 | 69.2 | 52.80 | 11.8
Person | GAM | 88.9 | 87.2 | 91.7 | 68.5 | 70.83 | 12.0
Monkey | SDI | 94.4 | 88.9 | 93.3 | 71.4 | 68.08 | 11.5
Monkey | BiFPN | 90.3 | 88.6 | 93.6 | 69.7 | 58.81 | 10.8
Monkey | GoldYOLO | 94.2 | 87.1 | 92.5 | 72.1 | 70.31 | 12.5
Monkey | iEMA | 91.6 | 88.1 | 91.9 | 67.2 | 67.20 | 11.8
Monkey | GAM | 92.5 | 89.9 | 92.1 | 68.92 | 69.73 | 12.0
Average | SDI | 93.7 | 92.1 | 95.2 | 74.6 | 107.04 | 11.5
Average | BiFPN | 91.1 | 91.1 | 94.5 | 72.0 | 99.42 | 10.8
Average | GoldYOLO | 92.0 | 91.7 | 94.2 | 72.6 | 116.10 | 12.5
Average | iEMA | 91.3 | 88.9 | 94.0 | 70.2 | 113.98 | 11.8
Average | GAM | 92.4 | 89.9 | 92.8 | 70.6 | 119.54 | 12.0
Table 4. The average performance comparison of different models on the original FOTL_Drone dataset.
Model | Precision | Recall | mAP@0.50 | mAP@0.50:0.95 | Inference (ms) | GFLOPS
YOLOv11 | 88.9 | 89.2 | 89.2 | 56.1 | 87.15 | 8.2
1-YOLOv11_SDI | 92.1 | 89.8 | 91.0 | 65.1 | 107.04 | 11.5
2-YOLOv11_SDI | 91.3 | 87.6 | 89.5 | 63.4 | 104.44 | 11.2
3-YOLOv11_SDI | 89.9 | 84.7 | 83.4 | 54.3 | 97.56 | 10.5
4-YOLOv11_SDI | 90.6 | 89.8 | 89.7 | 59.6 | 102.03 | 10.9
5-YOLOv11_SDI | 87.1 | 84.6 | 85.9 | 52.1 | 88.83 | 9.1
Table 5. Comparison of average performance of models on the expanded EFOD_Drone dataset.
Model | Precision | Recall | mAP@0.50 | mAP@0.50:0.95 | Inference (ms) | GFLOPS
YOLOv11 | 90.7 | 91.0 | 92.2 | 70.1 | 87.15 | 8.2
1-YOLOv11_SDI | 94.1 | 94.4 | 95.2 | 74.4 | 107.04 | 11.5
2-YOLOv11_SDI | 93.2 | 92.3 | 95.5 | 72.3 | 104.44 | 11.2
3-YOLOv11_SDI | 93.1 | 89.4 | 93.9 | 72.4 | 97.56 | 10.5
4-YOLOv11_SDI | 94.3 | 92.8 | 94.9 | 72.7 | 102.03 | 10.9
5-YOLOv11_SDI | 92.7 | 91.2 | 94.7 | 71.6 | 88.83 | 9.1
Table 6. Detailed comparative analysis of the original YOLOv11 with five variants on the EFOD_Drone dataset.
Model | Category | Precision | Recall | mAP@0.50 | mAP@0.50:0.95 | Inference (ms)
YOLOv11 | Nest | 90.6 | 91.0 | 93.3 | 70.8 | 105.93
YOLOv11 | Kite | 91.9 | 92.8 | 92.8 | 69.6 | 100.78
YOLOv11 | Balloon | 90.2 | 89.2 | 91.3 | 76.0 | 94.56
YOLOv11 | Fire | 89.2 | 90.9 | 89.8 | 65.4 | 106.78
YOLOv11 | Person | 86.5 | 89.7 | 88.7 | 69.2 | 54.86
YOLOv11 | Monkey | 90.7 | 87.4 | 90.4 | 65.1 | 59.96
1-YOLOv11_SDI | Nest | 93.3 | 93.0 | 95.9 | 76.7 | 120.33
1-YOLOv11_SDI | Kite | 95.0 | 94.1 | 96.3 | 77.7 | 107.56
1-YOLOv11_SDI | Balloon | 96.2 | 91.6 | 95.7 | 77.3 | 147.38
1-YOLOv11_SDI | Fire | 93.8 | 93.8 | 95.8 | 70.1 | 133.13
1-YOLOv11_SDI | Person | 89.7 | 90.9 | 94.2 | 74.2 | 65.78
1-YOLOv11_SDI | Monkey | 94.4 | 88.9 | 93.3 | 71.4 | 68.08
2-YOLOv11_SDI | Nest | 91.6 | 91.6 | 95.0 | 72.9 | 122.84
2-YOLOv11_SDI | Kite | 92.5 | 91.2 | 94.7 | 77.1 | 114.74
2-YOLOv11_SDI | Balloon | 94.6 | 89.5 | 94.8 | 76.2 | 139.91
2-YOLOv11_SDI | Fire | 93.8 | 90.7 | 95.5 | 68.3 | 133.05
2-YOLOv11_SDI | Person | 89.6 | 91.8 | 94.1 | 71.9 | 57.26
2-YOLOv11_SDI | Monkey | 91.9 | 87.6 | 92.3 | 69.8 | 58.86
3-YOLOv11_SDI | Nest | 91.0 | 91.3 | 94.8 | 70.3 | 117.34
3-YOLOv11_SDI | Kite | 90.7 | 90.0 | 94.5 | 75.1 | 100.69
3-YOLOv11_SDI | Balloon | 92.3 | 89.2 | 93.7 | 73.3 | 141.69
3-YOLOv11_SDI | Fire | 93.4 | 89.6 | 95.4 | 64.8 | 123.93
3-YOLOv11_SDI | Person | 88.0 | 91.3 | 93.9 | 69.7 | 47.38
3-YOLOv11_SDI | Monkey | 94.7 | 85.4 | 92.1 | 67.2 | 53.75
4-YOLOv11_SDI | Nest | 93.2 | 93.3 | 96.0 | 75.0 | 120.03
4-YOLOv11_SDI | Balloon | 93.9 | 89.7 | 94.3 | 75.9 | 140.01
4-YOLOv11_SDI | Fire | 96.1 | 92.4 | 95.2 | 69.4 | 128.33
4-YOLOv11_SDI | Person | 91.5 | 91.3 | 94.5 | 74.5 | 58.74
4-YOLOv11_SDI | Monkey | 93.7 | 89.3 | 93.1 | 71.1 | 60.22
5-YOLOv11_SDI | Nest | 90.6 | 93.6 | 94.3 | 71.6 | 88.16
5-YOLOv11_SDI | Kite | 90.2 | 92.8 | 95.1 | 75.5 | 95.42
5-YOLOv11_SDI | Balloon | 95.2 | 89.9 | 94.4 | 74.4 | 132.48
5-YOLOv11_SDI | Fire | 90.7 | 90.7 | 94.9 | 67.4 | 109.53
5-YOLOv11_SDI | Person | 87.1 | 89.2 | 92.5 | 69.3 | 49.98
5-YOLOv11_SDI | Monkey | 89.2 | 87.1 | 91.7 | 67.7 | 57.41
Table 7. Ablation and ordering validation for the SDI module: evaluating component roles and spatial-channel sequence effects.
Model | Precision | Recall | mAP@0.50 | mAP@0.50:0.95 | Inference (ms) | GFLOPS
YOLOv11 | 90.7 | 91.0 | 92.2 | 70.1 | 87.15 | 8.2
Spatial attention | 92.9 | 92.6 | 93.5 | 72.8 | 100.56 | 10.3
Channel attention | 92.6 | 91.8 | 93.1 | 71.3 | 102.18 | 10.8
Channel–spatial order | 93.6 | 93.8 | 94.1 | 73.5 | 107.04 | 11.5
Spatial–channel order (Ours) | 94.1 | 94.4 | 95.2 | 74.4 | 107.04 | 11.5
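To make the ordering ablation in Table 7 concrete, the sketch below shows a minimal PyTorch block that applies spatial attention before channel attention, the sequence that performs best in the table. It is an illustrative CBAM/SE-style composition under assumed layer sizes, not the authors' released SDI implementation.

import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    """Illustrative spatial-then-channel attention (assumed design, not the exact SDI module)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Spatial branch: 7x7 convolution over channel-pooled maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)
        # Channel branch: squeeze-and-excitation style bottleneck.
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Spatial attention first (the ordering favoured in Table 7) ...
        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map, _ = torch.max(x, dim=1, keepdim=True)
        x = x * torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        # ... then channel attention on the spatially re-weighted features.
        return x * self.channel_fc(x)

# Swapping the two steps in forward() gives the "Channel–spatial order" variant of Table 7.
feat = torch.randn(1, 256, 40, 40)
out = SpatialChannelAttention(256)(feat)  # output keeps the input shape (1, 256, 40, 40)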
Table 8. Quantitative results of multiple models on the EFOD_Drone dataset.
Model | Parameters (M) | Precision (%) | Recall (%) | mAP@0.50 | GFLOPS
YOLOv5 [41] | 7.0 | 91.4 | 90.9 | 91.8 | 16.5
YOLOv7 [42] | 37.2 | 87.2 | 81.2 | 85.3 | 105.3
YOLOX [43] | 5.02 | 92.1 | 91.0 | 91.2 | 12.8
YOLOv8n [44] | 3.2 | 92.7 | 90.5 | 92.4 | 8.3
YOLOv11 [45] | 7.5 | 90.7 | 91.0 | 92.2 | 15.2
RetinaNet [46] | 37.7 | 85.8 | 85.6 | 87.1 | 98.7
Gold_YOLO [47] | 5.6 | 87.5 | 82.1 | 87.5 | 13.5
RT-DETR [48] | 41.96 | 87.9 | 86.3 | 88.2 | 112.4
Ours | 3.74 | 94.1 | 94.4 | 95.2 | 11.5