Article

YOLOv8-ECCα: Enhancing Object Detection for Power Line Asset Inspection Under Real-World Visual Constraints

by Rita Ait el haj 1,*, Badr-Eddine Benelmostafa 1,* and Hicham Medromi 2

1 System Architecture Team (EAS), Engineering Research Laboratory (LRI), National High School of Electricity and Mechanic (ENSEM), Hassan II University, Casablanca 20100, Morocco
2 Research Foundation for Development and Innovation in Science and Engineering, Casablanca 20250, Morocco
* Authors to whom correspondence should be addressed.
Algorithms 2026, 19(1), 66; https://doi.org/10.3390/a19010066
Submission received: 16 September 2025 / Revised: 29 November 2025 / Accepted: 27 December 2025 / Published: 12 January 2026

Abstract

Unmanned Aerial Vehicles (UAVs) have revolutionized power-line inspection by enhancing efficiency and safety and by enabling predictive maintenance through frequent remote monitoring. Central to automated UAV-based inspection workflows is the object detection stage, which transforms raw imagery into actionable data by identifying key components such as insulators, dampers, and shackles. However, the real-world complexity of inspection scenes poses significant challenges to detection accuracy. For example, the InsPLAD-det dataset—characterized by over 30,000 annotations across diverse tower structures and viewpoints, with more than 40% of components partially occluded—illustrates the visual and structural variability typical of UAV inspection imagery. In this study, we introduce YOLOv8-ECCα, a novel object detector tailored for these demanding inspection conditions. Our contributions include: (1) integrating CoordConv, selected over deformable convolution for its efficiency in preserving fine spatial cues without heavy computation; (2) adding Efficient Channel Attention (ECA), preferred to SE or CBAM for its ability to enhance feature relevance using only a single 1D convolution and no dimensionality reduction; and (3) adopting Alpha-IoU, chosen instead of CIoU or GIoU to produce smoother gradients and more stable convergence, particularly under partial overlap or occlusion. Evaluated on the InsPLAD-det dataset, YOLOv8-ECCα achieves an mAP@50 of 82.75%, outperforming YOLOv8s (81.89%) and YOLOv9-E (82.61%) by +0.86% and +0.14%, respectively, while maintaining real-time inference at 86.7 FPS—exceeding the baseline by +2.3 FPS. Despite these improvements, the model retains a compact footprint (28.5 GFLOPs, 11.1 M parameters), confirming its suitability for embedded UAV deployment in real inspection environments.

1. Introduction

Unmanned Aerial Vehicles (UAVs) have become an increasingly reliable option for power-line inspection in industrial environments, offering tangible improvements in operational efficiency, safety, and cost control [1]. By enabling frequent and remote monitoring of electrical infrastructure, UAV-based systems help mitigate power interruptions, reduce human exposure to hazardous sites, and support more proactive maintenance strategies. When coupled with deep-learning-based object detection, these platforms become even more valuable: they can automatically identify and localize key components of transmission lines, thereby accelerating inspection workflows and ensuring traceable documentation for future assessments and infrastructure planning [2].
Within an automated UAV inspection pipeline, the detection stage plays a particularly central role. It represents the first step where raw aerial images are translated into structured, actionable information through the precise localization of critical components such as insulators, dampers, yokes, spacers, and lightning-rod shackles. These elements often exhibit substantial intra-class variation in shape, material (e.g., glass versus polymer), and scale, which complicates their recognition. The reliability of all subsequent modules—ranging from defect classification to 3D reconstruction and change detection—hinges directly on the accuracy of this initial detection step. As depicted in Figure 1, detection effectively acts as the gateway through which downstream analyses operate. Errors at this stage tend to propagate throughout the entire pipeline, potentially compromising the integrity of the inspection. Given this pivotal function, our work places particular emphasis on strengthening detection under realistic and often challenging field conditions, ensuring that the system’s foundation remains both robust and resilient.
Despite the growing maturity of detection models, identifying components in real-world UAV imagery remains difficult due to the inherent visual complexity of outdoor environments. Inspection scenes typically involve multi-scale object distributions, frequent occlusions, dense structural arrangements, variable lighting, and highly diverse camera viewpoints. These factors collectively hinder the performance of conventional detectors. To address such challenges, we introduce YOLOv8-ECCα, an enhanced version of the YOLOv8 architecture specifically adapted to the visual variability and scale dynamics encountered in power-line inspection.
Our contributions focus on three targeted improvements. First, we incorporate CoordConv layers into the backbone’s C2f modules to strengthen the encoding of positional information and improve robustness to viewpoint distortions. Second, we embed an Efficient Channel Attention (ECA) module at the junction between the backbone and the neck, enabling the network to better highlight meaningful feature channels and suppress background clutter. Third, we replace the default CIoU loss with Alpha-IoU, a more flexible regression loss known to stabilize optimization and improve bounding-box alignment under low-overlap or partially occluded scenarios. Importantly, these additions preserve the lightweight and real-time characteristics of YOLOv8; given that the baseline has already demonstrated strong performance on embedded hardware [3], we hypothesize that YOLOv8-ECCα remains similarly deployable on UAV-integrated processors.
Our evaluation relies on the InsPLAD-det dataset (https://github.com/andreluizbvs/InsPLAD (accessed on 25 December 2025)), a comprehensive benchmark comprising seventeen types of power-line assets and six categories of defects—including corrosion, breakage, and foreign objects such as bird nests—captured under diverse viewpoints and environmental conditions. Experimental results indicate that our model consistently outperforms established detectors such as YOLOv7, YOLOv8, and YOLOv9, while sustaining high inference speeds compatible with real-time inspection requirements.

2. Related Work

Numerous artificial intelligence models have been proposed for visual object detection, ranging from classical convolutional neural networks (CNNs) to more recent transformer-based architectures. Traditional CNN-based detectors such as Faster R-CNN, SSD, and RetinaNet laid the groundwork for modern detection systems, while transformer-based models—including DETR, DINO, and Swin Transformer—introduced a paradigm shift by relying on self-attention and global context modeling [4,5]. These transformer-based approaches have achieved impressive accuracy across various domains, including aerial and industrial inspection, largely thanks to their capacity for scale invariance, robustness to object deformation, and modeling of long-range dependencies.
Despite these strengths, transformer detectors generally require substantial computational resources, both in terms of memory and inference time. This limitation often restricts their use on lightweight or embedded platforms such as UAVs. For this reason, the YOLO (You Only Look Once) family remains the most widely deployed solution in real-time industrial applications and UAV-based inspection tasks [6]. Since YOLOv5, the framework has evolved rapidly—through YOLOv6, YOLOv7, YOLOv8, and more recently YOLOv10 [7]—each iteration introducing architectural refinements such as anchor-free heads, decoupled classification/regression branches, more efficient neck structures, and improved training strategies, all while maintaining the real-time performance that characterizes YOLO models.
Within the domain of power line inspection, a growing number of studies have customized YOLO-based detectors to better address the complexities of real-world aerial imagery [8]. These challenges include multi-scale object distribution, large variations in object size within the same class, cluttered backgrounds, occlusions from structural elements or vegetation, perspective distortions, and inconsistent lighting. Several research efforts have centered on improving YOLOv5. For example, Chen et al. [9] enhanced YOLOv5s by reclustering anchor boxes using k-means, integrating CBAM attention into the backbone, and employing focal loss to mitigate class imbalance. Their improved model reached a 98.1% mAP while retaining real-time inference. Similarly, Hu et al. (2023) [10] strengthened YOLOv5s for “self-exploding” insulator faults by adding a BiFPN module for multi-scale fusion, an SPD (space-to-depth) operation for better small-feature extraction, and dual attention mechanisms (CBAM and SimAM). These refinements delivered higher precision (+2.0%), recall (+0.9%), and mAP (+1.5%) compared to the original YOLOv5s, particularly in the detection of tiny defects under complex backgrounds.
Zhang et al. (2024) [11] developed Insulator-YOLO, a YOLOv5-based variant optimized for small-defect detection and efficiency. Their modifications included replacing the CSPDarknet backbone with a lightweight GhostNetV2, adding SE and CBAM attention modules, and incorporating a bi-directional feature pyramid (BiFPN). Loss optimization combined CIoU with normalized Wasserstein distance (NWD) and focal loss to address class imbalance. These enhancements improved mAP from 86.45% to 89.65%, outperforming YOLOv5, YOLOv7, and YOLOv8 on their insulator datasets.
Beyond YOLOv5, other researchers have leveraged the capacity of YOLOv7 models. Bojian Chen et al. (2024) [12] introduced ID-YOLOv7, which integrates a novel Edge Detailed Shape Data Augmentation (EDSDA) method to highlight insulator contours, alongside a Cross-Channel and Spatial Multi-Scale Attention (CCSMA) module and a redesigned Re-BiC neck for richer feature fusion. They also incorporated an improved MPDIoU loss to refine localization. These innovations increased YOLOv7’s mAP by +7.2%—reaching 85.7% on a Su-22 kV insulator dataset—and achieved 90.3% mAP at 53 FPS on PASCAL VOC, illustrating its robustness in complex visual conditions. In another lightweight-focused effort, Yulu Zhang et al. (2023) [13] proposed a compact YOLOv7 variant containing Depthwise Separable Convolutions fused with SE attention (DSC-SE) and GSConv layers in the neck, significantly reducing model size and computation. The resulting architecture reached 95.2% accuracy while weighing only 7.9 MB—running at 13 FPS on a Jetson Nano, demonstrating feasibility for UAV-embedded deployments.
The introduction of YOLOv8 further expanded the landscape of one-stage detectors for power line inspection. Zhang et al. (2025) [14] proposed MRB-YOLOv8, which incorporates Multi-Spectral Channel Attention and replaces YOLOv8’s standard C2f modules with Receptive Field Attention Convolution (RFAConv) [15]. Additional improvements included a weighted BiFPN and an extra detection scale, enabling more robust detection of defects such as flashover, breakage, and self-explosion. Their enhanced model achieved significant gains of +3.2% mAP@50 and +3.6% mAP@50:95 over the baseline. In a similar spirit, Du Zhang et al. (2023) [16] introduced PAL-YOLOv8, a pruned YOLOv8-nano model optimized for small targets. They integrated PKI-Blocks to simplify C2f structures, applied an Adown downsampling layer inspired by YOLOv9, used Focal-SIoU loss, and performed layer-wise pruning. The resulting model, weighing just 2.7 MB (3.9 GFLOPs), achieved 95.0% mAP@50—surpassing YOLOv8-nano by 5.5% and outperforming YOLOv9-tiny by 2.6%.
Collectively, these works highlight a consistent trend: integrating attention mechanisms (CBAM, ECA, SE), advanced multi-scale fusion structures (BiFPN, FPN variants), specialized loss functions (focal loss, IoU variants), and lightweight convolutional blocks (GhostNet, DSC-SE, GSConv) can markedly improve the performance of YOLO-based detectors in the demanding context of power line inspection.
Following this direction, YOLOv8-ECCα incorporates three targeted enhancements to strengthen detection in UAV imagery. First, CoordConv layers are integrated into the YOLOv8 backbone to explicitly encode positional information, improving sensitivity to spatial alignment—a key factor in mitigating perspective distortions and varying object orientations. Second, an Efficient Channel Attention (ECA) module adaptively reweights feature channels to better separate relevant insulator features from complex backgrounds, drawing inspiration from prior ECA-based YOLO improvements but applied within the YOLOv8 architecture. Third, Alpha-IoU is adopted for bounding-box regression, providing a more flexible gradient landscape and improving localization for both small and large targets. Together, these elements—CoordConv, ECA, and Alpha-IoU—form the “ECCα” design and push performance forward for UAV-based power line asset detection.

3. Materials and Methods

3.1. Justification for Selecting YOLOv8 as the Baseline Model

In developing a detection model for UAV-based inspection of electrical infrastructure, selecting an appropriate baseline architecture is a crucial design choice. Although recent detectors—ranging from YOLOv9 to YOLOv12—have introduced incremental improvements, our decision to build upon YOLOv8 [17] is both intentional and pragmatic. YOLOv8 offers a balanced combination of architectural maturity, community support, and deployment readiness, three aspects that are particularly important in real-world aerial inspection applications.
Unlike some experimental architectures that emerge from academic prototypes but lack long-term maintenance, YOLOv8 benefits from continuous development by Ultralytics, the creators of the YOLO ecosystem. This active maintenance ensures stability, reliable tooling, and wide community adoption. For field-oriented applications—where reproducibility, reliability, and technical support are often just as important as model accuracy—such an ecosystem provides a clear advantage. YOLOv8 is therefore not only an accurate detector but also a production-ready framework that integrates smoothly into end-to-end UAV inspection pipelines with minimal engineering overhead.
From an architectural standpoint, YOLOv8 follows a modular design composed of three main components [18]: the backbone, the neck, and the detection head.
a. Backbone
The backbone performs feature extraction, progressively converting the input image into increasingly abstract feature representations. YOLOv8 employs a sequence of Conv → BatchNorm → SiLU blocks interleaved with C2f (Cross-Stage Partial with fusion) modules. As illustrated in the upper-right subfigure of Figure 2, each C2f module splits the input feature map, processes part of it through two successive Bottleneck layers, and then concatenates the outputs before a final convolutional fusion. This structure enhances gradient flow, encourages multi-path feature representation, and avoids unnecessary redundancy, all while maintaining high computational efficiency and a relatively small parameter count.
At the end of the backbone, YOLOv8 incorporates the SPPF (Spatial Pyramid Pooling—Fast) module, which expands the receptive field without reducing spatial resolution. As shown in Figure 2, SPPF applies three sequential 5 × 5 max-pooling layers whose outputs are stacked to encode broader spatial context. This step enriches the high-level semantic representation—capturing patterns such as shape, structural arrangement, and category-level cues—before the features are passed to the neck.
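The SPPF mechanism described above can be illustrated with a minimal PyTorch sketch. This is a simplified rendition for clarity, not the exact Ultralytics implementation; the channel sizes and the 1×1 projection convolutions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """SPPF sketch: three chained 5x5 max-pools whose outputs are
    concatenated with the input, widening the receptive field without
    reducing spatial resolution."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1, 1)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, 1, 1)
        # stride 1 + padding k//2 keeps the spatial size unchanged
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)   # effective receptive field ~5x5
        y2 = self.pool(y1)  # ~9x9 after two chained pools
        y3 = self.pool(y2)  # ~13x13 after three
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```

Because each pooling stage reuses the previous output, three small 5×5 pools approximate the larger pooling windows of the original SPP at a fraction of the cost.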
b. Neck
The neck is responsible for multi-scale feature fusion, a critical requirement for aerial imagery where both large structures (e.g., transmission cables) and very small objects (e.g., insulators) must be detected simultaneously. YOLOv8 adopts a PAN-FPN–inspired structure [19,20], combining top-down and bottom-up feature flows. Feature maps from different stages are upsampled, merged across scales via concatenation, and further processed by C2f blocks to mix contextual information.
This bidirectional fusion strengthens the interaction between low-level spatial details and high-level semantic features, enabling the model to maintain fine-grained localization while preserving global context across multiple resolutions.
c. Detection Head and Loss
The detection head in YOLOv8 follows a decoupled design, where object classification, objectness estimation, and bounding box regression are processed through separate branches. This architectural choice helps to minimize interference between tasks and allows each branch to be optimized more effectively for its respective objective.
For bounding box localization, YOLOv8 adopts the Complete Intersection over Union (CIoU) loss [21], an extension of standard IoU that incorporates both positional and geometric alignment. This loss encourages predicted boxes to not only overlap well with the ground truth but also align closely in terms of center location and aspect ratio. The CIoU loss is formally defined as:
$\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU}(B, B^{*}) + \dfrac{\rho^{2}(c, c^{*})}{d^{2}} + \alpha v$
where
  • IoU(B, B*) denotes the intersection-over-union between the predicted bounding box B and the ground-truth box B*;
  • ρ²(c, c*) is the squared Euclidean distance between the center points c and c*, promoting accurate centering of predictions;
  • d is the diagonal length of the smallest enclosing box covering both B and B*, serving as a normalization factor;
  • v measures the divergence between the aspect ratios of the two boxes;
  • α is a scaling factor that balances the influence of the aspect-ratio penalty based on the current IoU.
This comprehensive loss structure is particularly suited to aerial inspection tasks, where the shape, scale, and position of detected components (e.g., insulators, clamps, cables) vary significantly and often appear in complex or distorted perspectives.
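The CIoU loss defined above can be computed term by term for axis-aligned boxes. The following is a minimal sketch assuming (x1, y1, x2, y2) box coordinates; the `eps` stabilizer and the no-grad treatment of the trade-off weight α follow common practice rather than a specific reference implementation.

```python
import math
import torch

def ciou_loss(box_pred, box_gt, eps=1e-7):
    """CIoU loss sketch for boxes in (x1, y1, x2, y2) format."""
    # intersection area
    x1 = torch.max(box_pred[..., 0], box_gt[..., 0])
    y1 = torch.max(box_pred[..., 1], box_gt[..., 1])
    x2 = torch.min(box_pred[..., 2], box_gt[..., 2])
    y2 = torch.min(box_pred[..., 3], box_gt[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    w1 = box_pred[..., 2] - box_pred[..., 0]
    h1 = box_pred[..., 3] - box_pred[..., 1]
    w2 = box_gt[..., 2] - box_gt[..., 0]
    h2 = box_gt[..., 3] - box_gt[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # squared center distance rho^2(c, c*)
    rho2 = ((box_pred[..., 0] + box_pred[..., 2] - box_gt[..., 0] - box_gt[..., 2]) ** 2
            + (box_pred[..., 1] + box_pred[..., 3] - box_gt[..., 1] - box_gt[..., 3]) ** 2) / 4
    # squared diagonal d^2 of the smallest enclosing box
    cw = torch.max(box_pred[..., 2], box_gt[..., 2]) - torch.min(box_pred[..., 0], box_gt[..., 0])
    ch = torch.max(box_pred[..., 3], box_gt[..., 3]) - torch.min(box_pred[..., 1], box_gt[..., 1])
    d2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio divergence v and its adaptive weight alpha
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / d2 + alpha * v
```

For a perfect prediction the IoU term goes to 1 and both penalty terms vanish, so the loss approaches zero.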
To further improve localization accuracy, YOLOv8 integrates Distribution Focal Loss (DFL) [22] into the regression pipeline. Rather than predicting continuous bounding box offsets directly, DFL treats localization as a classification task over discrete bins, allowing the model to learn a distribution over sub-pixel positions. The loss is computed as:
$L_{\mathrm{DFL}} = -\sum_{b=1}^{B} p_b \log(\hat{p}_b)$
where
  • p_b is the soft target probability assigned to bin b, typically computed by linear interpolation between the two neighboring bins;
  • p̂_b is the model’s predicted probability for bin b;
  • B is the total number of bins (e.g., 16 or 32).
This bin-based representation enables the model to achieve sub-pixel precision in bounding box localization—a critical requirement when detecting fine-grained infrastructure details, such as cable joints, micro-cracks, or small electrical components.
For classification and objectness confidence, YOLOv8 uses Binary Cross-Entropy (BCE) for multi-label classification, and either BCE or Focal Loss for objectness estimation, depending on the degree of class imbalance in the dataset. Focal Loss is particularly effective in sparse detection scenarios, as it reduces the contribution of well-classified background examples and focuses training on harder, informative samples.
All loss components are combined into a unified training objective:
$L_{\mathrm{total}} = \lambda_{\mathrm{cls}} L_{\mathrm{cls}} + \lambda_{\mathrm{obj}} L_{\mathrm{obj}} + \lambda_{\mathrm{box}} L_{\mathrm{reg}}$
where
  • Lcls: classification loss (Binary Cross-Entropy);
  • Lobj: objectness loss (Focal Loss or BCE);
  • Lreg = LCIoU + LDFL: composite regression loss.
The scalar coefficients λcls, λobj, and λbox are hyperparameters used to balance the relative importance of the three tasks.
In summary, YOLOv8 combines architectural efficiency with strong detection performance and real-time feasibility, making it an ideal candidate for field deployment. Its modular structure allows for targeted enhancements, and its lightweight profile ensures compatibility with edge devices. In the next section, we build upon this foundation to present YOLOv8-ECCα, a specialized variant tailored to address the visual and operational challenges of power line inspection.

3.2. YOLOv8-ECCα: Enhanced Architecture

Our improved model, dubbed YOLOv8-ECCα, aims to bolster the detector’s robustness to the complex visual challenges of power line inspection (e.g., multi-scale objects, occlusion, perspective distortion, cluttered backgrounds) while maintaining a lightweight design. The name “ECCα” reflects the three modular enhancements introduced: Efficient Channel Attention (ECA), CoordConv layers, and an α-IoU loss. Rather than fundamentally altering the YOLOv8 architecture, we insert these components as streamlined plugins to strengthen feature learning and localization. The overarching goal is to improve the model’s generalization to difficult scenarios without inflating the model size or compromising real-time inference. Below, we detail each added component, its role, and the rationale for its inclusion in the YOLOv8-ECCα architecture (see Figure 3).

3.2.1. CoordConv-Augmented C2F

To improve spatial reasoning in object detection tasks under real-world conditions, we integrate Coordinate Convolution (CoordConv) layers into the C2f blocks of the YOLOv8 backbone, specifically replacing the standard 3 × 3 convolution at the input of each stage (P2–P4). This design choice directly addresses a known limitation of traditional convolutional neural networks: their translation invariance, which leads to a lack of awareness of absolute spatial positions within an image.
As illustrated in Figure 4, a conventional convolutional layer (left) takes input feature maps of shape w × h × C and applies convolution operations without incorporating explicit spatial coordinate information. This design works well for many tasks but becomes problematic when object localization depends on where an object appears in the image, such as along the frame’s edges or under varying perspectives in UAV-acquired imagery.
In contrast, the CoordConv layer (right side of Figure 4) augments the input tensor by concatenating two additional channels: the i-coordinate (horizontal position) and the j-coordinate (vertical position). This results in a feature map of size w × h × (C + 2), which is then fed into the convolutional operation. This modification gives the model explicit access to positional information, allowing it to learn spatially aware feature representations.
In our implementation, each CoordConv layer was defined using a 3 × 3 kernel, stride = 1, and padding = 1, ensuring compatibility with the original YOLOv8 backbone dimensions. The integration increased the total parameter count by less than +0.02 M and had a negligible effect on GFLOPs (+0.01 G), confirming its lightweight nature.
This mechanism is particularly advantageous in UAV-based power line inspection for the following reasons:
  • Edge Object Localization: In wide-angle drone images, critical components such as insulators or dampers may appear near the borders of the frame. CoordConv improves the network’s ability to detect such edge-located targets by embedding coordinate priors.
  • Small Target Sensitivity: The small scale and sparse distribution of components in aerial images make them difficult to detect with regular convolutions. CoordConv enhances sensitivity to spatial variation, reducing false negatives.
  • Viewpoint Robustness: UAV perspectives vary significantly due to pitch, yaw, and altitude changes. By learning from absolute positions, the model gains resilience against rotation and perspective distortion—challenges common in real-world drone inspection.
Overall, the use of CoordConv—visualized in Figure 4—enables the model to better capture geometric relationships and positional priors, enhancing feature alignment across scales and improving detection accuracy in complex visual scenes. Importantly, this improvement is achieved with minimal computational overhead, preserving the model’s suitability for real-time UAV deployment.

3.2.2. Efficient Channel Attention (ECA) Module

To enhance feature discrimination while maintaining low computational overhead, we integrate the Efficient Channel Attention (ECA) module at the interface between the backbone and the neck of the YOLOv8-ECCα architecture. The ECA mechanism, originally proposed by Wang et al. (2020) [23], replaces the costly squeeze-and-excitation (SE) block with a streamlined alternative that maintains accuracy without introducing fully connected layers or dimensionality reduction.
As illustrated in Figure 5, the ECA block begins by applying a global average pooling (GAP) operation to the input feature map X, compressing each spatial channel into a single scalar value. Instead of learning channel dependencies via dense layers, ECA employs a 1D convolutional layer with a dynamically determined kernel size k = ψ(C), which captures local cross-channel interactions. The resulting attention weights are then passed through a sigmoid activation and multiplied element-wise with the original feature map, yielding the recalibrated output X^.
This efficient design ensures that important feature channels—those carrying semantic information about power line components—are enhanced, while channels associated with noise or background clutter are suppressed. The module introduces only a negligible number of additional parameters, making it especially suitable for real-time UAV-based deployment, where computational resources are constrained.
In UAV imagery used for power line inspection, ECA provides three principal benefits:
Complex Background Suppression: Drone-captured scenes often exhibit visually rich and cluttered backgrounds (e.g., vegetation, sky, infrastructure). ECA selectively boosts feature responses that correspond to relevant objects (e.g., insulators, spacers), improving signal-to-noise separation.
Multi-Scale Feature Enhancement: Located at the transition to the neck—where multi-scale fusion occurs—ECA strengthens channels that carry salient cues from small or occluded objects. This helps preserve detail in high-level feature maps and supports consistent detection across scales.
Real-Time Efficiency: Unlike heavier attention mechanisms, ECA’s kernel-based strategy avoids complex operations and supports fast inference. This allows the model to maintain high detection precision without sacrificing speed, ensuring suitability for embedded UAV systems.
As illustrated in Figure 3, the ECA block is inserted after the final C2f modules in the backbone and before feature fusion operations begin in the neck. This specific placement is strategically motivated: the backbone is responsible for extracting hierarchical semantic features from the input image, while the neck aggregates multi-scale information from these features for final detection. By inserting ECA at the transition point, the model can selectively enhance semantically meaningful channels just before multi-scale fusion occurs. This ensures that the most informative signals—particularly those related to small or occluded objects—are preserved and emphasized as they propagate through the subsequent stages.
By integrating ECA at this critical junction, our YOLOv8-ECCα model gains robustness in complex visual environments while preserving the lightweight footprint necessary for on-board inference.

3.2.3. α-IoU Losses for Bounding Box Regression

While YOLOv8 originally employs the CIoU loss to guide bounding box regression, this formulation, despite its geometric completeness, can suffer from gradient saturation under low-overlap scenarios—common in UAV-based imagery where occlusion, perspective distortion, or small object scale frequently hinder accurate alignment. To address this limitation, we upgrade the localization loss by introducing α-IoU [24], a generalized version of IoU-based losses that embeds a tunable exponent α into the IoU term.
The intuition behind α-IoU is to reshape the loss surface by applying a power transformation to the IoU score, yielding:
$\mathcal{L}_{\alpha\text{-IoU}} = 1 - \mathrm{IoU}^{\alpha}$
This flexible formulation enables stronger or weaker penalization depending on the value of α. Specifically, values α < 1 attenuate the penalty for low-IoU predictions—allowing the model to continue learning from partial overlaps—while α > 1 emphasizes precision when overlaps are already high. In our setting, this adaptability is particularly beneficial for the aerial inspection domain, where components are often partially visible, distant, or geometrically distorted.
In our experiments, we empirically set α = 1.5, after limited tuning in the range [1.0–2.0], which yielded the most stable convergence and highest mean IoU on the validation set. Although the original α-IoU study recommends α ≈ 3 for large-scale generic detection tasks, we found that such a steep setting led to unstable gradients under the low-overlap, small-object conditions typical of UAV-based imagery. The softer exponent (α = 1.5) preserved gradient flow for partially visible or distant components—achieving a balanced trade-off between precision and recall.
By replacing CIoU with α-IoU in the regression branch, we provide the model with a more informative and smoother gradient landscape during training. This results in improved convergence behavior and better bounding box alignment, especially for hard-to-localize objects such as small shackles, partially occluded insulators, or components observed from extreme angles. Importantly, this modification is purely in the training objective and incurs no additional computational cost during inference, making it suitable for real-time UAV deployments.

3.3. Dataset

To evaluate our approach under these real-world constraints, we employ the InsPLAD-det dataset, a realistic and publicly available benchmark specifically designed for UAV-based power line inspection tasks [25]. This dataset encompasses 17 distinct asset categories commonly encountered in power transmission infrastructure, including insulators, shackles, yokes, dampers, ID plates, and spacers. As shown in Table 1, it comprises over 28,000 annotated instances distributed across a wide range of components, with significant variation in both object scale and frequency. The total number of images and annotations per category highlights the diversity and richness of the dataset, making it well-suited for evaluating fine-grained object detection models in adverse settings.
An illustration of sample scenes from the InsPLAD-det dataset is shown in Figure 6. These examples highlight key visual challenges: high object density, overlapping structures, and the presence of background clutter such as vegetation and metallic noise. The camera viewpoints also vary considerably, ranging from lateral and top-down views to oblique perspectives, further testing the robustness of detection models.
Overall, the InsPLAD dataset provides a comprehensive and challenging testbed for developing and benchmarking detection methods that aim to operate reliably under realistic UAV inspection conditions.

4. Experiments

4.1. Environment Configuration

The training and evaluation procedures were carried out in a stable computing environment designed to meet the requirements of deep learning–based object detection. All experiments were performed on a Windows 10 (64-bit) system, which provided a consistent and reliable platform for model development and testing.
The hardware setup included an Intel Core i5-10400F processor running at 2.90 GHz, offering sufficient multi-core performance for data preprocessing and general pipeline management. GPU-accelerated training was conducted on an NVIDIA GeForce RTX 2080 Ti with 11 GB of VRAM, enabling efficient parallel computation and smooth handling of high-resolution image inputs.
Model development was carried out using Python 3.7, a widely adopted language within the machine learning community due to its rich ecosystem of libraries and tools. PyTorch 1.8 served as the primary deep learning framework, selected for its dynamic computation graph and flexibility in implementing custom architectures.
GPU acceleration leveraged CUDA 10.1, which optimized matrix operations and significantly reduced training time by fully utilizing the capabilities of the RTX 2080 Ti.

4.2. Models

This section presents the models selected for comparative evaluation, focusing on state-of-the-art YOLO architectures that reflect different design generations and levels of complexity. The objective is to benchmark the proposed YOLOv8-ECCα model against established detectors that are widely recognized for their performance in object detection tasks, particularly under real-world conditions involving multi-object, multi-scale scenes.
The selected models include:
  • YOLOv7-L: a large variant of the YOLOv7 family, known for its strong accuracy due to a well-balanced backbone and neck design. YOLOv7 [26] introduces re-parameterized convolutions and efficient training strategies that improve performance on standard benchmarks.
  • YOLOv8m: the medium variant of YOLOv8, representing the default performance-capacity trade-off in the Ultralytics YOLOv8 lineup. It serves as a direct architectural sibling to our baseline (YOLOv8s), with increased depth and width for better feature representation.
  • YOLOv9-E/s [27]: a more recent architecture incorporating generalized efficient layer aggregation and task-aligned detection head, designed to push accuracy while maintaining computational efficiency. It reflects the latest trends in feature fusion and decoupled head design.
These models were chosen because they are representative of the recent evolution of the YOLO family and have proven effective on various public benchmarks. Although they differ in size and computational load, they provide meaningful reference points for assessing the accuracy and robustness of our model under identical evaluation conditions. It is important to note, however, that our primary comparison was performed against the YOLOv8s baseline, which shares the same model scale and computational footprint as the proposed YOLOv8-ECCα (≈11 M parameters, ≈28 GFLOPs). This ensures a fair and balanced evaluation of efficiency and accuracy, avoiding any artificial advantage that could arise from comparing models of different sizes.
The comparisons with larger variants such as YOLOv8-m, YOLOv7-L, and YOLOv9-E are included solely for contextual understanding, demonstrating that our improved small-scale model can effectively compete with—and in some cases outperform—these heavier architectures in both precision and speed.
In addition, practical deployment aspects were taken into account. The YOLOv8s backbone was selected as the baseline for YOLOv8-ECCα because it offers an ideal balance between the precision of larger models (e.g., YOLOv8m/L) and the compactness of lightweight variants (e.g., YOLOv8n or YOLO-Nano). Furthermore, the YOLOv8 framework natively supports export to optimized deployment formats such as ONNX, TensorRT, and TFLite, which facilitate model compression and execution on embedded hardware. These export capabilities, described in the Ultralytics documentation, enable substantial reductions in model size (up to 3×) and inference latency while preserving accuracy, thus reinforcing the suitability of YOLOv8-ECCα for UAV-based inspection scenarios.

4.3. Metrics

To provide a comprehensive and balanced assessment of model performance, we rely on a set of widely adopted metrics that capture both detection accuracy and computational efficiency. These indicators are particularly important in UAV-based inspection of electrical infrastructure, where precision, responsiveness, and scalability are essential.
Our primary accuracy measure is the mean Average Precision at an IoU threshold of 0.50 (mAP@0.50). This metric evaluates the model’s ability to correctly localize and classify objects when there is at least 50% overlap between predicted and ground-truth bounding boxes. It is a standard benchmark in object detection and offers a high-level view of overall detection quality.
To better characterize the trade-off between false positives and false negatives, we also report the F1-score, defined as the harmonic mean of precision and recall. This scalar value provides a balanced perspective on the model’s capability to correctly identify objects while avoiding misclassification. In addition, we separately analyze recall—reflecting the ability to retrieve all relevant objects—and precision, which measures how many predicted positives are correct. High recall is particularly important in safety-critical inspection tasks to prevent missed detections, whereas high precision helps reduce unnecessary alerts or interventions.
Beyond accuracy-oriented metrics, we evaluate the computational profile of each model. GFLOPs (Giga Floating Point Operations) provide an estimate of theoretical complexity by indicating the number of operations required during a forward pass—an especially relevant metric when considering deployment on hardware-constrained systems. We also report the total number of trainable parameters, which reflects the model’s memory footprint and potential suitability for embedded or real-time applications.
Finally, inference speed is quantified using Frames per Second (FPS), which measures how many images the model can process every second. This metric is crucial for real-time inspection workflows, where latency and responsiveness directly influence operational feasibility.
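As a concrete illustration of how the accuracy metrics relate, the short sketch below computes precision, recall, and F1 from raw detection counts; the counts are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts: 80 true positives, 16 false positives, 20 missed objects.
p, r, f1 = precision_recall_f1(tp=80, fp=16, fn=20)
# p = 80/96 ~ 0.833, r = 80/100 = 0.800, f1 ~ 0.816
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two components, which is why we report precision and recall separately as well.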

4.4. Training Process

All models were trained under consistent experimental settings to ensure a fair and reproducible comparison. The training process spanned 100 epochs, allowing sufficient time for convergence and stabilization of model performance as shown in Figure 7. The initial learning rate was set to 0.01, and the optimization was carried out using the Adam optimizer, selected for its adaptive learning capabilities and robustness. A momentum parameter of 0.937 and a weight decay coefficient of 5 × 10−4 were applied to improve generalization and mitigate overfitting. The batch size was set to 4, balancing memory efficiency and gradient stability.
All experiments were conducted using the default Ultralytics YOLOv8 training configuration, ensuring consistency and reproducibility across models. The input image resolution was fixed at 640 × 640 pixels, and training employed a cosine learning rate schedule with an initial learning rate of 0.01. Standard data augmentation techniques were applied, including mosaic augmentation, random horizontal flipping, HSV color-space adjustment, and random scaling, to enhance robustness to illumination changes, object orientation, and background variability. These hyperparameters correspond to the empirically optimized default configuration of the Ultralytics YOLOv8 framework [28], widely validated by both its developers and the research community to provide stable convergence and strong generalization across diverse object detection tasks. Minor adjustments were made to accommodate hardware constraints, notably setting the batch size to 4 on a GPU equipped with 11 GB of VRAM, while all other parameters remained consistent with the official configuration.
The dataset was partitioned into training, validation, and test sets in a 7:2:1 ratio using a stratified random sampling strategy applied to the list of image labels rather than the raw image tensors. The implementation leveraged the StratifiedShuffleSplit utility from the scikit-learn library, which preserves the per-class annotation ratios of the InsPLAD-det dataset and prevents overrepresentation of dominant categories such as glass insulators. This approach ensured that rare classes—such as polymer insulator tower shackles or spacers—were adequately represented in all subsets. A fixed random seed (42) was used to guarantee reproducibility and to maintain consistent evaluation across all experimental runs.
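Our implementation relies on scikit-learn's StratifiedShuffleSplit; the dependency-free sketch below reproduces the same idea, per-class shuffling followed by proportional slicing, with illustrative class names and counts rather than the actual InsPLAD-det label lists.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.7, 0.2, 0.1), seed=42):
    """Split sample indices into train/val/test while preserving class ratios.

    `labels[i]` is the (dominant) class of image i. Mirrors the intent of
    scikit-learn's StratifiedShuffleSplit used in our pipeline.
    """
    by_class = defaultdict(list)
    for idx, cls in enumerate(labels):
        by_class[cls].append(idx)

    rng = random.Random(seed)  # fixed seed for reproducibility
    train, val, test = [], [], []
    for cls, idxs in by_class.items():
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test

# Toy example: a dominant class and a rare one both appear in every subset.
labels = ["glass_insulator"] * 70 + ["spacer"] * 10
train, val, test = stratified_split(labels)
```

Splitting each class independently is what guarantees that rare categories such as spacers contribute samples to all three subsets, which a plain random split does not.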

5. Results

5.1. Comparative Analysis of Our Contribution Against Recent Advancements

This section presents a comparative analysis of our proposed model, YOLOv8-ECCα, against four widely recognized YOLO variants: YOLOv7-L, YOLOv8m, YOLOv8s, and YOLOv9-E. The objective is to assess detection performance and computational efficiency under a unified benchmark using key metrics: mAP@50, F1-score, recall, precision, GFLOPs, number of parameters, and inference speed (FPS). The results are summarized in Table 2.
As shown in Table 2, the proposed YOLOv8-ECCα achieves the best overall performance across all evaluated metrics. It attains an mAP@50 of 82.75%, an F1-score of 82.00%, a recall of 79.92%, and a precision of 83.67%. Compared with the second-best model YOLOv9-E (82.61% mAP, 81.00% F1, 79.21% recall, 82.95% precision), YOLOv8-ECCα provides slightly higher accuracy (+0.14% mAP) and F1 (+1%), confirming more stable detection confidence while maintaining similar recall. When benchmarked against the baseline YOLOv8s (81.89% mAP, 79.50% F1), the proposed model yields an improvement of +0.86% mAP and +2.5% F1, together with a 2.2% gain in precision, showing that the added CoordConv, ECA, and Alpha-IoU modules substantially reinforce feature discrimination and localization under challenging UAV inspection scenes.
Additionally, comparison with YOLOv9-s (a similarly sized model)—which delivers 82.30% mAP, 81.10% F1, 79.10% recall, 82.40% precision, and 65.5 FPS—further strengthens the fairness of the evaluation. Although YOLOv9-s exhibits slightly better accuracy than YOLOv8-s, YOLOv8-ECCα still achieves superior performance in both accuracy and throughput, validating its competitive advantage among similarly sized models.
Notably, YOLOv8-ECCα achieves this superior accuracy while maintaining the lowest computational cost. With only 28.52 GFLOPs, it reduces compute demands by 27.7% compared to YOLOv8m and is 2.5× more efficient than YOLOv9-E (72.62 GFLOPs). This efficiency reflects the contribution of lightweight modules—CoordConv, ECA, and Alpha-IoU—which enhance representation power without unnecessary overhead.
In terms of model size, YOLOv8-ECCα maintains a compact architecture with only 11.13 million parameters, making it 57% smaller than YOLOv8m and nearly 4.7× smaller than YOLOv7-L. This compactness facilitates faster loading, reduced memory usage, and easier deployment—especially important for UAV systems or constrained platforms, even though edge deployment is not the primary focus of this work.
The most striking result lies in inference speed. YOLOv8-ECCα reaches 86.73 FPS, outperforming YOLOv8s (84.71 FPS), and running at more than double the speed of YOLOv9-E and YOLOv7-L, and 68% faster than YOLOv8m. This real-time capability makes it well-suited for high-throughput UAV inspections or large-scale offline analysis pipelines.
In summary, YOLOv8-ECCα achieves the best trade-off between detection performance and computational efficiency. It lies on the Pareto front, offering state-of-the-art accuracy with significantly fewer operations and parameters than its competitors. While models like YOLOv9-E offer competitive accuracy, their high computational cost limits their scalability in real-world deployments. YOLOv8-ECCα thus emerges as a compelling solution for practical power line inspection tasks where both speed and precision are critical.
To further validate the robustness of our model, we conducted a multi-seed evaluation using three different random seeds (0, 46, and 2025), as shown in Table 3. Across all runs, our model consistently outperformed the YOLOv8s baseline on every evaluation metric, including mAP@50, F1-score, recall, and precision. The best results were obtained with seed 46, and even in the least favorable configuration (seed 2025) the proposed model maintained a clear advantage, indicating that the gains are not tied to a particular initialization and that optimization is more stable than for the baseline. To assess the significance of these improvements, we performed a paired t-test across seeds, which evaluates whether the per-seed performance differences between models are consistent rather than attributable to chance. The gains in mAP@50, F1-score, and precision are statistically significant (p < 0.05), indicating that the observed increases are unlikely to be random fluctuations. Recall also improved consistently, but its difference did not reach significance, likely because the small number of seeds (N = 3) limits statistical power. Overall, these findings confirm that our method provides reproducible and statistically supported improvements over YOLOv8s, strengthening confidence in the reliability and general applicability of the proposed architecture. We acknowledge, however, that a larger number of random seeds (e.g., 20 or more) would further increase statistical power; this was precluded by computational constraints.
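The seed-level logic of the paired t-test can be sketched as follows. The per-seed mAP@50 values below are hypothetical stand-ins, and in practice `scipy.stats.ttest_rel` performs the equivalent computation and returns an exact p-value.

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Paired t statistic over per-seed score differences (a - b)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # t = mean(d) / (stdev(d) / sqrt(n)); requires non-identical differences.
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical per-seed mAP@50 for the proposed model and the baseline.
ours = [82.70, 82.90, 82.60]
base = [81.85, 81.95, 81.80]
t = paired_t(ours, base)
# With n - 1 = 2 degrees of freedom, two-sided p < 0.05 requires |t| > 4.303.
```

Pairing by seed removes the shared run-to-run variation, so the test only asks whether the per-seed gap is consistently positive, which is why it remains informative even with very few seeds.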
To gain deeper insight into the behavior of YOLOv8-ECCα, we perform a per-class detection analysis on the InsPLAD-det dataset. As illustrated in Figure 8, the model achieves an overall mean Average Precision (mAP@50) of 82.75%, with notable variation across the 17 power-line component classes. The vertical axis of the figure corresponds to the class indices defined as follows: class 0 represents glass insulators; classes 1 to 4 correspond to various shackle types; classes 5 and 6 denote lightning-rod suspension and polymer insulators; and the remaining classes encompass dampers, ID plates, yokes, and spacers.
The best-performing classes include the tower ID plate (class 13, AP = 1.00), polymer insulator upper shackle (class 9, AP = 0.99), and glass insulator (class 0, AP = 0.98). Other categories, such as yoke suspension, spiral damper, and stockbridge damper, also exceed 0.95 AP, reflecting high reliability for large, planar, and visually distinctive components with consistent textures and clear background separation.
Conversely, detection performance decreases for several small or visually ambiguous components. For example, the polymer insulator tower shackle (class 8) attains only 0.56 AP, while spacer (class 10), glass-insulator small shackle (class 2), and glass-insulator big shackle (class 1) range between 0.63 and 0.71 AP. These parts are often partially occluded by conductors or cross-arms, metallically reflective, and similar in shape or material to neighboring elements, making them harder to distinguish under UAV imaging conditions. Moreover, their small pixel footprint and dataset imbalance amplify localization errors, leading to reduced IoU scores even when detections are qualitatively correct.
To illustrate these limitations, Figure 9 presents representative error-analysis examples highlighting missed detections for the glass-insulator small shackle (class 2, AP = 0.67). White circles mark the regions where the detector failed to identify the component, despite its partial visibility. These omission cases typically occur when the object overlaps with metallic cross-arms or background clutter of similar chromaticity, leading to weak feature activation in the neck layers. Such examples confirm that the remaining detection gaps mainly stem from spatial occlusion, background interference, and small-scale targets—common challenges in UAV-based inspection imagery. This visual analysis complements the quantitative per-class AP results, offering a more detailed understanding of model errors and their underlying causes.
This variability is consistent with the physical inspection challenges inherent to aerial imagery—occlusion, viewpoint diversity, and high-contrast lighting—all of which degrade feature distinctiveness. The overall results confirm that the proposed architectural enhancements (CoordConv and Efficient Channel Attention) strengthen both spatial and channel sensitivity. Nonetheless, the remaining low-AP classes highlight potential avenues for improvement, such as multi-scale context aggregation, occlusion-aware attention mechanisms, and targeted data augmentation to increase robustness against small-object and cluttered-scene conditions.

5.2. Ablation Study: Contribution of Each Component

To assess the individual and cumulative contributions of each architectural modification, we conduct a comprehensive ablation study based on the YOLOv8s backbone. Four configurations are evaluated: the baseline YOLOv8s, YOLOv8s with CoordConv, YOLOv8s with CoordConv and Efficient Channel Attention (ECA), and the final version incorporating CoordConv, ECA, and Alpha-IoU—referred to as YOLOv8-ECCα. The evaluation spans both detection quality and computational efficiency, using key metrics including mAP@50, F1-score, recall, precision, GFLOPs, parameter count, and inference speed (FPS). The results are summarized in Table 4.
Each component incrementally improves detection performance. The baseline YOLOv8s reaches 81.89% mAP@50, with an F1-score of 0.78, recall of 77.74%, and precision of 81.47%. Introducing CoordConv yields immediate benefits: mAP@50 increases to 82.29%, while precision improves by over 1.5% and recall by 0.59%. This confirms the effectiveness of coordinate-aware convolutions in improving spatial sensitivity, especially for small and precisely located components—an essential requirement for power line asset detection.
Incorporating the Efficient Channel Attention (ECA) module further boosts accuracy to 82.63% mAP@50, with recall reaching 79.87% and F1-score rising to 0.81. These gains reflect ECA’s ability to enhance channel-wise feature discrimination by emphasizing informative signals while suppressing irrelevant noise—a significant advantage in cluttered aerial scenes typical of UAV inspections.
Replacing the standard CIoU loss with Alpha-IoU in the final configuration further enhances localization precision. The full model, YOLOv8-ECCα, achieves the best performance across all metrics: 82.75% mAP@50, 0.82 F1-score, 83.67% precision, and 79.92% recall. The superiority of Alpha-IoU, which gives more weight to high-overlap predictions, is particularly relevant in inspection scenarios where precise bounding box alignment is critical for fault identification and maintenance planning.
Importantly, all configurations remain highly efficient. The parameter count is essentially constant at ~11.13 M, and complexity remains stable at around 28.52 GFLOPs. The addition of CoordConv also yields a small boost in inference speed, with FPS increasing from 84.71 to 88.45. Even in the final configuration, FPS remains high at 86.73, demonstrating that the enhancements do not compromise real-time applicability.
Overall, the ablation study highlights the complementary roles of the proposed modules: CoordConv improves spatial alignment, ECA strengthens feature discrimination, and Alpha-IoU enhances localization quality. Together, they result in a lightweight yet high-performing detector that meets the demanding requirements of UAV-based power line inspection, combining speed, accuracy, and efficiency in a single architecture.
Figure 10 provides a visual comparison of the detection performance between the standard YOLOv8 model and the enhanced YOLOv8-ECCα variant across several typical asset categories found in power transmission infrastructure. The results are illustrated using colored bounding boxes overlaid on the input images; these colors are arbitrary and do not convey any semantic meaning.
The first row displays predictions from a baseline model, YOLOv8, without any additional modules. While the model is capable of detecting multiple components simultaneously—including those with varying sizes and spatial orientations—it frequently exhibits missed detections and inaccuracies. Some bounding boxes are either absent or poorly aligned, compromising the spatial precision required for reliable inspection. In scenarios involving partially occluded components or visually complex backgrounds, the model tends to produce ambiguous results, revealing its limitations in real-world operational conditions. These deficiencies reflect familiar challenges in drone-based inspections, where factors such as occlusion, background clutter, wiring density, and diverse camera angles significantly impact detection robustness.
In contrast, the second row shows predictions produced by the YOLOv8-ECCα model. This enhanced architecture consistently demonstrates precise object localization, even under visually challenging conditions. The model is capable of isolating components accurately despite the presence of small structural details and overlapping elements. Notably, YOLOv8-ECCα successfully detects objects that are partially occluded by metallic structures, indicating strong robustness to occlusion. It also effectively recognizes chains of insulators observed from oblique viewpoints, maintaining well-proportioned and tightly fitted bounding boxes.
Overall, this visual comparison underscores the superior performance of YOLOv8-ECCα in complex scenarios, particularly in terms of:
  • Encoding spatial relationships and maintaining robustness to viewpoint distortions.
  • Enhancing object detection in cluttered or congested environments.
  • Improving bounding box alignment in cases with low overlap or partial occlusion.
These improvements collectively enhance robustness against stochastic visual noise and uncertain detection conditions—an aspect that aligns with the general error-analysis framework discussed in [29].
These results highlight the potential of YOLOv8-ECCα as a reliable model for automated inspection tasks in outdoor settings, where accurate and robust object detection is critical.

6. Discussion

While YOLOv8-ECCα demonstrates clear gains in detection accuracy under demanding visual conditions—particularly in scenes characterized by clutter, occlusion, and strong intra-class variation—it is important to reflect on the broader implications of these results for real-world deployment. One of the strengths of the proposed model lies in its architectural simplicity. Despite the added modules, YOLOv8-ECCα remains aligned with the lightweight design philosophy of YOLOv8-s, a model already recognized for its real-time performance on embedded devices such as NVIDIA Jetson platforms and other edge AI accelerators [30,31].
The enhancements introduced in this work—namely CoordConv layers, the Efficient Channel Attention (ECA) module, and the Alpha-IoU loss—improve feature sensitivity and localization accuracy without imposing significant computational overhead. As reflected in our benchmarks, YOLOv8-ECCα maintains nearly identical FLOPs and parameter counts to its YOLOv8-s baseline, yet consistently achieves higher mAP, F1-score, and recall. Its inference speed, which exceeds 86 FPS on the GPU, remains well above the minimum threshold required for real-time processing, thereby confirming the model’s operational suitability for UAV-based inspection and related time-critical applications.
Beyond algorithmic efficiency, practical deployment considerations must also be addressed. In addition to the model’s lightweight architecture, further reductions in complexity can be achieved through compression strategies such as pruning and quantization, as well as conversion to deployment-oriented formats like ONNX or TFLite. These formats facilitate integration with optimized inference engines (e.g., TensorRT, CoreML, Coral TPU), reducing memory requirements and energy consumption—two essential factors for UAV or edge-AI environments.
Although we did not conduct evaluations directly on embedded hardware, the close computational footprint between YOLOv8-ECCα (28.52 GFLOPs, 11.13 M parameters) and YOLOv8-s—a model already validated on Jetson and other resource-constrained devices [32,33]—suggests that similar performance can be expected in embedded settings. This assumption is further reinforced by prior studies demonstrating the efficiency of CoordConv, ECA, and Alpha-IoU on limited hardware [34]. Nevertheless, a full validation will require targeted experimentation.
To this end, future work will include direct benchmarking on devices such as Jetson Nano, Jetson Orin, and Google Coral TPU. Such evaluations will provide concrete information regarding latency, power consumption, and thermal behavior under real-world operational constraints, ensuring not only accuracy but also practical deployability.
A second limitation concerns dataset diversity. While InsPLAD-det offers a realistic and well-annotated benchmark, its geographic and environmental variability remains limited. This is symptomatic of a broader challenge in this research domain: the scarcity of publicly available datasets that capture the full spectrum of real-world inspection conditions, including weather variations, seasonal effects, and structural differences across power networks. Although we selected InsPLAD-det for its completeness and realism, future efforts will aim to broaden data diversity. This can be achieved by integrating samples from different climatic regions and infrastructure types, and by employing augmentation strategies that simulate adverse weather (e.g., rain, fog, glare, or low illumination). Cross-dataset evaluation, once additional datasets become available, will also be essential to assess model robustness across operational contexts.
We also acknowledge that GPU memory consumption was not directly measured in this study. While FLOPs and parameter size provide helpful indicators of computational efficiency, memory usage may vary significantly depending on runtime configuration and batch size. Future experiments will therefore incorporate detailed memory profiling to better evaluate the model’s suitability for deployment under varying hardware constraints.
In summary, YOLOv8-ECCα strikes a compelling balance between detection performance and computational efficiency. Its lightweight nature, inherited from YOLOv8-s, coupled with targeted architectural enhancements, positions it as a strong candidate for real-time inspection tasks and potential embedded deployment. Moving forward, we plan to validate its behavior on edge devices, investigate its robustness across diverse operational domains, and continue exploring architectural refinements to further enhance scalability. This includes evaluating alternative backbones such as MobileNetV3-Lite or GhostNet, which could reduce complexity while preserving accuracy.

7. Conclusions

This study introduces YOLOv8-ECCα, a lightweight yet effective deep learning model designed to meet the specific challenges of UAV-based power line inspection. Building on the robustness of the YOLOv8 architecture, the proposed approach incorporates three focused enhancements: an Efficient Channel Attention (ECA) module to better balance channel-wise feature importance, CoordConv layers to preserve spatial priors and improve geometric consistency, and the Alpha-IoU loss to strengthen bounding-box regression through adaptive power-based penalization. Together, these elements help address the major obstacles encountered in aerial inspection imagery, notably multi-scale variability, occlusion, and high intra-class diversity.
The model was evaluated on the InsPLAD dataset, a comprehensive real-world benchmark comprising seventeen component categories captured under varied environmental and operational conditions. Experimental results show that YOLOv8-ECCα consistently surpasses recent state-of-the-art detectors—including YOLOv7-L, YOLOv8-m, and YOLOv9—in both accuracy and inference speed. In particular, the model achieves higher mAP@50 and F1-score while requiring fewer parameters and GFLOPs, and maintains an inference rate above 86 FPS, underscoring its suitability for time-sensitive industrial inspection workflows.
Although embedded deployment is not the central focus of this work, it is worth noting that YOLOv8—our architectural foundation—has already demonstrated strong performance on edge platforms such as NVIDIA Jetson and mobile GPUs. Since YOLOv8-ECCα retains an almost identical computational footprint, its compatibility with embedded systems is a reasonable expectation. This positions the model as a promising candidate for on-board UAV inference, where memory and latency constraints remain critical considerations.
Future research will aim to validate this assumption through experiments conducted directly on embedded hardware and to explore additional optimization strategies such as pruning, quantization, and lightweight backbone substitution. Expanding the training data to incorporate broader weather scenarios, more diverse backgrounds, and additional defect types will also be essential to strengthen generalization across different operational contexts.
In summary, YOLOv8-ECCα offers a well-balanced combination of accuracy, speed, and computational efficiency, addressing several key challenges in automated power line inspection. Its modular design and strong performance suggest that it may also be readily transferable to related applications, including infrastructure monitoring, agricultural asset detection, and autonomous navigation, highlighting its potential as a scalable and versatile detection framework.

Author Contributions

Conceptualization, R.A.e.h. and B.-E.B.; methodology, R.A.e.h. and B.-E.B.; software, R.A.e.h. and B.-E.B.; validation, R.A.e.h. and B.-E.B.; formal analysis, R.A.e.h. and B.-E.B.; investigation, R.A.e.h. and B.-E.B.; resources, R.A.e.h. and B.-E.B.; data curation, R.A.e.h. and B.-E.B.; writing—original draft preparation, R.A.e.h. and B.-E.B.; writing—review and editing, R.A.e.h. and B.-E.B.; visualization, R.A.e.h.; supervision, H.M.; project administration, H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

There are no restrictions on the sharing of relevant data in this study.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT-4o solely for language refinement, including grammar, phrasing, and text editing. The AI tool did not participate in the research design, data analysis, experimental development, scientific interpretation, or the generation of technical content. All AI-assisted text was fully reviewed, verified, and revised by the authors, who take full responsibility for the final scientific content. All authors have agreed to this acknowledgment.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Diagram illustrating the central role of the detection stage in UAV-based inspection: Yellow rectangles indicate Regions of Interest (ROIs) automatically extracted for various downstream utility tasks, including but not limited to inventory, anomaly detection, and geolocated inspection.
Figure 2. Diagrammatic Representation of YOLOv8 Network Architecture.
Figure 3. Diagrammatic Representation of YOLOv8-ECCα Architecture.
Figure 4. Explanatory Diagram of CoordConv.
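Figure 4 illustrates the CoordConv operation. As a complement to the diagram, the core of CoordConv, appending normalized coordinate channels to a feature map before the convolution, can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the code used in the paper; the channel-first (C, H, W) layout and the [-1, 1] normalization are assumptions.

```python
import numpy as np

def add_coord_channels(feat):
    """Append normalized y/x coordinate channels to a feature map.

    feat: array of shape (C, H, W). Returns an array of shape (C + 2, H, W).
    The subsequent convolution then sees explicit spatial position in
    addition to translation-invariant content, which is what lets
    CoordConv preserve fine spatial cues at negligible extra cost.
    """
    c, h, w = feat.shape
    # Row coordinates, normalized to [-1, 1], broadcast across columns.
    ys = np.linspace(-1.0, 1.0, h).reshape(h, 1).repeat(w, axis=1)
    # Column coordinates, normalized to [-1, 1], broadcast across rows.
    xs = np.linspace(-1.0, 1.0, w).reshape(1, w).repeat(h, axis=0)
    return np.concatenate([feat, ys[None], xs[None]], axis=0)
```

In a detector, the standard convolution that follows simply takes C + 2 input channels, so the only parameter overhead is two extra input channels in one layer.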
Figure 5. Explanatory Diagram of Efficient Channel Attention (ECA) Module.
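Figure 5 depicts the ECA module: a global average pool produces a per-channel descriptor, a single 1D convolution across channels produces attention weights, and no dimensionality reduction is involved. The NumPy sketch below follows that structure; it is illustrative only, and the uniform averaging kernel is a stand-in for the 1D weights that are learned in the real module.

```python
import numpy as np

def eca(feat, gamma=2, b=1):
    """Efficient Channel Attention on a (C, H, W) feature map.

    A single 1D convolution over the pooled channel descriptor replaces
    the fully connected bottleneck of SE, so there is no dimensionality
    reduction. The kernel size adapts to the channel count via
    k = |(log2(C) + b) / gamma|, rounded up to the nearest odd integer.
    """
    c = feat.shape[0]
    t = int(abs((np.log2(c) + b) / gamma))
    k = t if t % 2 else t + 1
    desc = feat.mean(axis=(1, 2))                     # global average pool -> (C,)
    kernel = np.ones(k) / k                           # illustrative fixed kernel (learned in practice)
    padded = np.pad(desc, k // 2, mode="edge")
    conv = np.convolve(padded, kernel, mode="valid")  # 1D conv across channels -> (C,)
    w = 1.0 / (1.0 + np.exp(-conv))                   # sigmoid gate in (0, 1)
    return feat * w[:, None, None]                    # rescale channels
```

For C = 64 channels this yields k = 3, i.e., each channel weight depends only on itself and its two neighbors, which is why the module adds essentially no parameters or FLOPs.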
Figure 6. Sample inspection scenes from the InsPLAD-det dataset, illustrating the diversity of power-line components and their annotated bounding boxes. The red and green boxes highlight detected or labeled objects (e.g., insulators, shackles, dampers, and fittings). The color of the boxes has no semantic meaning and is used only for visual clarity to distinguish overlapping annotations within the same image.
Figure 7. Validation loss evolution and convergence behavior of YOLOv8 and its enhanced variants, evaluated on the validation set.
Figure 8. Per-class detection performance on the InsPLAD dataset using YOLOv8-ECCα (mAP = 82.75%). The class indices shown on the vertical axis correspond, respectively, to the following power line components: 0—glass insulator, 1—glass insulator big shackle, 2—glass insulator small shackle, 3—glass insulator tower shackle, 4—lightning rod shackle, 5—lightning rod suspension, 6—polymer insulator, 7—polymer insulator lower shackle, 8—polymer insulator tower shackle, 9—polymer insulator upper shackle, 10—spacer, 11—spiral damper, 12—stockbridge damper, 13—tower ID plate, 14—vari-grip, 15—yoke, 16—yoke suspension.
Figure 9. Per-class detection performance on the InsPLAD dataset using YOLOv8-ECCα (mAP = 82.75%). The class indices shown on the vertical axis correspond, respectively, to the following power line components: 0—glass insulator.
Figure 10. Illustrative comparison of model predictions YOLOv8 (top)/YOLOv8-ECCα (bottom) improvement on the validation set.
Table 1. Asset overview in InsPLAD-det: Number of Images of the 17 assets, and Total Annotations per asset.
Asset Category | Number of Images | Number of Annotations
Damper—Spiral | 943 | 1020
Damper—Stockbridge | 1761 | 6953
Glass Insulator | 2778 | 2978
Glass Insulator Big Shackle | 152 | 296
Glass Insulator Small Shackle | 143 | 263
Glass Insulator Tower Shackle | 106 | 195
Lightning Rod Shackle | 112 | 195
Lightning Rod Suspension | 709 | 710
Tower ID Plate | 242 | 242
Polymer Insulator | 3173 | 3244
Pol. Insulator Lower Shackle | 1760 | 1824
Pol. Insulator Upper Shackle | 1691 | 1692
Pol. Insulator Tower Shackle | 567 | 567
Spacer | 93 | 94
Vari-grip | 560 | 1008
Yoke | 1661 | 1661
Yoke Suspension | 2716 | 6520
Total Images/Annotations: 10,607/28,933.
Table 2. Comparative results of our contribution against recent advancements.
Model | mAP@50 (%) | F1 (%) | Recall (%) | Precision (%) | GFLOPs | Params | FPS
YOLOv7-L | 78.62 | 77.40 | 76.10 | 78.80 | 37.637 | 52.401 M | 41.96
YOLOv8m | 82.24 | 80.20 | 78.00 | 82.50 | 39.441 | 25.857 M | 51.73
YOLOv8s | 81.89 | 79.50 | 77.74 | 81.47 | 28.538 | 11.136 M | 84.71
YOLOv9-s | 82.30 | 81.10 | 79.10 | 82.40 | 26.421 | 7.1 M | 65.50
YOLOv9-E | 82.61 | 81.00 | 79.21 | 82.95 | 72.624 | 48.6455 M | 42.36
Ours (YOLOv8-ECCα) | 82.75 | 82.00 | 79.92 | 83.67 | 28.522 | 11.132 M | 86.73
Table 3. Multi-seed evaluation of Ours vs. YOLOv8-s.
Seed | Model | mAP@50 (%) | F1 (%) | Recall (%) | Precision (%)
0 | Ours | 82.75 | 82.00 | 79.92 | 83.67
0 | YOLOv8s | 81.89 | 79.50 | 77.74 | 81.47
46 | Ours | 82.90 | 82.20 | 80.35 | 83.95
46 | YOLOv8s | 81.60 | 79.20 | 77.30 | 81.20
2025 | Ours | 83.25 | 82.70 | 80.85 | 84.65
2025 | YOLOv8s | 82.05 | 80.10 | 78.10 | 82.05
Table 4. Performance Metrics from Ablation study of YOLOv8s Enhancements with CoordConv, ECA, and Alpha-IoU.
Model | mAP@50 (%) | F1 | Recall (%) | Precision (%) | GFLOPs | Params | FPS
YOLOv8s | 81.89 | 0.78 | 77.74 | 81.47 | 28.538 | 11.136 M | 84.71
YOLOv8s + CoordConv | 82.29 | 0.79 | 78.33 | 83.07 | 28.519 | 11.132 M | 88.45
YOLOv8s + CoordConv + ECA | 82.63 | 0.81 | 79.87 | 83.47 | 28.522 | 11.132 M | 86.73
YOLOv8s + CoordConv + ECA + Alpha-IoU (YOLOv8-ECCα) | 82.75 | 0.82 | 79.92 | 83.67 | 28.522 | 11.132 M | 86.73
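The final ablation step swaps the regression loss for Alpha-IoU, whose basic form is L = 1 − IoU^α. The sketch below illustrates the loss family on axis-aligned boxes; it is a minimal pen-and-paper example, not the training implementation, and the default α = 3 follows the recommendation of the original Alpha-IoU work rather than a setting confirmed here.

```python
def alpha_iou_loss(box_a, box_b, alpha=3.0):
    """Alpha-IoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Raising IoU to a power alpha > 1 reweights gradients toward
    high-IoU examples, which smooths optimization under partial
    overlap or occlusion; alpha = 1 recovers the plain IoU loss.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection width/height, clamped at zero for disjoint boxes.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    return 1.0 - iou ** alpha
```

For two boxes with IoU = 1/3, the plain IoU loss is 2/3 while the α = 3 loss is 1 − 1/27 ≈ 0.963, showing how the power term flattens the penalty for low-overlap pairs and concentrates gradient signal near high overlap.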