An Enhanced YOLOv8 Model with Symmetry-Aware Feature Extraction for High-Accuracy Solar Panel Defect Detection

Lin, Xiaoxia; Xiao, Xinyue; Sun, Lin; Yang, Xiaodong; Leng, Chunwei; Li, Yan; Niu, Zhenyu; Meng, Yingzhou; Gong, Weihao

doi:10.3390/sym17071052

by Xiaoxia Lin ¹, Xinyue Xiao ¹, Lin Sun ¹,*, Xiaodong Yang ¹, Chunwei Leng ², Yan Li ², Zhenyu Niu ², Yingzhou Meng ¹ and Weihao Gong ¹

¹ College of Intelligent Equipment, Shandong University of Science and Technology, Taian 271001, China

² Hanqing Data Consulting Co., Ltd., Zibo 255000, China

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(7), 1052; https://doi.org/10.3390/sym17071052

Submission received: 3 June 2025 / Revised: 23 June 2025 / Accepted: 1 July 2025 / Published: 3 July 2025

(This article belongs to the Section Engineering and Materials)

Abstract

The growing popularity of solar panels is crucial for global decarbonization, but harsh environmental conditions can lead to defects such as cracks, fingerprints, and short circuits. Existing methods face the challenge of detecting multi-scale defects while maintaining real-time performance. This paper proposes a solar panel defect detection model, DCE-YOLO, based on YOLOv8. The model incorporates a C2f-DWR-DRB module for multi-scale feature extraction, where the parallel DRB branch models spatial symmetry through symmetric-rate dilated convolutions, improving robustness and consistency. The COT attention module strengthens long-range dependencies and fuses local and global contexts to achieve symmetric feature representation. The lightweight and efficient detection head improves detection speed and accuracy. The CIoU loss function is replaced with WIoU, and a non-monotonic dynamic focusing mechanism is used to mitigate the effect of low-quality samples. Experimental results show that compared with the YOLOv8 benchmark, DCE-YOLO achieves a 2.1% performance improvement on mAP@50 and a 4.9% performance improvement on mAP@50-95. Compared with recent methods, DCE-YOLO exhibits broader defect coverage, stronger robustness, and a better performance-efficiency balance, making it highly suitable for edge deployment. The synergistic interaction between the C2f-DWR-DRB module and COT attention enhances the detection of symmetric and multi-scale defects under real-world conditions.

Keywords:

defect detection; YOLOv8; multi-scale information; attention mechanism; WIoU

1. Introduction

Despite the numerous environmental problems caused by the use of fossil fuels, global demand for energy continues to grow. This demand is especially driven by the rapid economic growth of developing countries, resulting in increased greenhouse gas emissions and exacerbated pollution problems [1]. Solar energy, as a clean and sustainable energy source, has been expanding its scope of application and is gradually becoming one of the core strategies to mitigate climate change and achieve carbon neutrality. Photovoltaic (PV) power generation, the primary method of harnessing solar energy, is advancing rapidly. Solar panels are the core component of a photovoltaic power system, and their performance and quality directly determine the system’s efficiency [2].

However, increased reliance on solar energy also brings technical challenges, particularly with regard to the reliability of photovoltaic components. In practical applications, solar panels often exhibit various defects due to several factors. Material limitations, mechanical and thermal deviations during processing, and prolonged operation contribute to defects such as cracks, fingers, black cores, and thick lines [3]. These defects can reduce energy conversion efficiency, lead to power degradation [4,5], and even cause the failure of entire modules, thereby seriously affecting the long-term operation and economic performance of PV systems and increasing maintenance costs. Efficient and accurate defect detection has therefore become crucial to ensure solar panel quality and enhance PV system reliability.

In order to address the above challenges, researchers have explored various optimization strategies for solar energy systems, including material improvements and advanced monitoring technologies. Other areas of solar energy technology have accumulated valuable optimization experiences. For example, parabolic trough solar collectors (PTSC) have improved system robustness by adopting material optimization measures such as nanofluids and selective coatings [6], providing a reference for preventive maintenance of photovoltaic systems. Intelligent image recognition technology has also brought new opportunities for improving defect detection efficiency. Image processing methods based on deep learning, such as the staged reconstruction technique of generative adversarial networks (GAN), have demonstrated significant advantages in extracting micro-features [7]. These methods can further optimize the accuracy of crack recognition.

These challenges are further exacerbated in distributed photovoltaic scenarios, where system constraints and environmental complexity require higher detection accuracy. Green building and low-carbon city concepts have promoted the increased deployment of distributed photovoltaic (PV) systems. These systems are commonly installed in built environments such as industrial parks, commercial rooftops, and residential buildings. However, the available installation area in such environments is limited, and the system structures are typically more compact. This imposes higher requirements on PV module quality, power generation efficiency, and fault detection capabilities [8,9,10]. The performance of PV power generation is closely linked to the quality of solar panels [2,3], and even minor defects can negatively affect output power and operational stability. Furthermore, solar panels operate outdoors for extended periods under harsh conditions such as high temperatures, dust, humidity, and corrosive environments. Without timely maintenance, they are susceptible to cracking, hot spots, and other forms of functional degradation [11]. These factors reduce the operational lifespan of the modules and impair overall system performance. Therefore, developing a reliable and efficient method for defect detection is critical to ensuring the stable operation of centralized and distributed PV systems.

Currently, solar panel defect detection methods mainly include manual, physical, and automated inspection techniques based on machine vision [12]. Among these, manual visual inspection suffers from high subjectivity and low efficiency. Physical inspection methods typically employ sensors such as ultrasonic, laser, and electrical probes to collect surface condition data from solar panels [13,14]. However, these methods can be intrusive and may potentially damage the panels. In contrast, machine vision approaches—based on image processing and feature extraction—offer enhanced objectivity and automation. For example, Tsai et al. [15] proposed an anisotropic diffusion-based detection method that effectively segments microcracks using image enhancement and morphological operations. Kang et al. [16] employed a Kalman filtering algorithm to analyze electrical parameters and identify abnormal power declines, although their method could not localize specific faults. Al-Waisy et al. [17] developed a hybrid classification system by combining Inception-V3 and ResNet50 models for defect classification. Venkatesh et al. [18] proposed a fault identification method using aerial imagery. Their approach integrated CNN-based feature extraction with a decision tree classifier to achieve fault detection.

In recent years, deep learning-based methods have become the dominant paradigm for solar panel defect detection due to their superior feature extraction capabilities. For instance, Cao et al. [19] proposed the YOLOv5s-GBC algorithm to enhance detection accuracy and later developed YOLOv8-gd [20], which achieves a lightweight design through depth-wise separable convolution and a BiFPN structure. Huang et al. [12] introduced YOLOv5-BDL, integrating an improved LCA attention mechanism, while Zhang et al. [21] incorporated a deformable convolutional CSP module and an ECA attention mechanism into YOLOv5. Notably, Rohith et al. [22] developed SparkNet with an innovative Fire Modules architecture, which achieved 95% detection accuracy for surface contaminants through its squeeze-expand operations. In the field of hot spot detection, Khang et al. [23] used the RetinaNet framework to process thermal imaging data and achieve automatic identification of hot spots in photovoltaic modules. Bassil et al. [24] proposed a hybrid detection framework that combines the feature extraction capabilities of EfficientNetB7 and visual Transformers to achieve an accuracy rate of 97% in photovoltaic panel dust recognition. Karakan et al. [25] built a multi-architecture comparison system, in which SqueezeNet achieved the best accuracy for monocrystalline (97.82%) and polycrystalline (96.29%) cells in EL image defect classification. Similarly, Li et al. [26] proposed GBH-YOLOv5, which enhances detection performance for small objects. In addition to convolutional neural networks, transformer architectures have shown promising potential in PV applications. Dwivedi et al. [27] were the first to apply Vision Transformer (ViT) models to solar panel detection. Zhuang et al. [28] proposed a multi-component attention convolution method that improves feature extraction. To address the issue of data imbalance, Jiang et al. [29] utilized generative adversarial networks (GANs) for data augmentation. Tang et al. [30] enhanced the recognition of linear defects by integrating the Hessian matrix into the convolutional framework. Moreover, Zhang et al. [31] proposed a saliency-guided neural network to improve the segmentation of electroluminescence (EL) images.

Beyond single-modality approaches, multimodal fusion techniques have also gained traction to enhance robustness and fault detection accuracy. Di Tommaso et al. [10] developed an automated multi-stage model based on the YOLOv3 network and computer vision techniques to process thermal and visible images for detecting various defects. Lei et al. [32] introduced a Deeplab-YOLO approach that combines Deeplabv3+ for image segmentation with YOLOv5 for defect detection, specifically targeting hot spots on infrared images of photovoltaic panels. Zhao et al. [33] proposed the PV-UNet model, which demonstrates strong defect localization capabilities in remote sensing images. Furthermore, with the widespread adoption of UAV technology, UAV-based PV inspection systems have emerged as a research hotspot [34,35], enabling effective coverage of large power station areas and the identification of localized, small-scale defects. These advancements have significantly improved the precision, reliability, and practicality of photovoltaic fault detection.

Despite significant progress, existing research on defect detection in solar panels is still limited in the defect types it covers. For example, Cao et al. [19] primarily focused on two basic defect types. The hybrid detection framework proposed by Bassil et al. [24] only determines the presence or absence of dusty cracks and spots. YOLOv8-gd [20] covered seven defect types: black core sheets, black spot sheets, short circuit black sheets, over-soldered sheets, broken grids, bright and dark sheets, and hidden cracks. However, it fails to cover critical geometric defects such as star cracks, which often severely impact the structural integrity of the components. Zhang et al. [21] and Jiang et al. [29] focused on limited defect types such as cracks, fingerprints, and scratches. Li et al. [26] examined five types of cell-level anomalies, including damaged cells and cells with obvious bright spots. However, they did not cover critical electrical safety hazards like short circuits.

In contrast, the DCE-YOLO model proposed in this study achieves more comprehensive and detailed defect coverage. It covers seven typical and representative defect types, including cracks, finger breaks, black cores, thick lines, star cracks, horizontal dislocations, and short circuits. Among these, star cracks and horizontal dislocations, as key geometric defects, have not been adequately addressed in previous studies. Meanwhile, short circuits, as severe electrical faults, remain a blind spot in many existing detection methods. Such defects not only directly impact the performance and lifespan of photovoltaic modules but also pose risks to operational safety and maintenance costs.

Therefore, DCE-YOLO achieves significant breakthroughs in defect type diversity and practical application value, providing a robust technical foundation for comprehensive detection and precise maintenance of photovoltaic modules. The main contributions of this paper are as follows:

This method addresses YOLOv8’s shortcomings in managing complex backgrounds and long-range dependencies by integrating the Contextual Transformer (COT) attention mechanism to enhance spatial contextual relationships. It also improves symmetry-aware feature representation by fusing static local and dynamic global context cues, thereby improving the model’s robustness in complex environments and its ability to capture structured defect patterns.
The backbone network integrates the C2f-DWR-DRB module, which combines a scalable receptive field design with a standard convolution structure. Its parallel Dilated Reparam Block (DRB) branch employs symmetric rate-configured dilated convolutions, enabling better modeling of the spatial symmetry commonly found in solar panel defect patterns. This integration improves multi-scale feature extraction while reducing computational complexity, achieving a better balance between accuracy and efficiency.
The detection head of YOLOv8 is improved, and a lightweight detection head Detect-Efficient is designed, which optimizes the utilization of computing resources and improves the detection accuracy and speed of the model. This makes the model more suitable for real-time deployment on edge devices.
The Wise-IoU (WIoU) loss function is used instead of the original loss function to solve the problem of uneven sample quality, enhancing the model’s robustness to noisy or low-quality annotations and improving convergence stability during training.

2. Related Work

2.1. Symmetry-Aware Feature Extraction

Symmetry-aware feature extraction refers to computational methods that explicitly encode geometric patterns (such as bilateral, radial, or periodic symmetry) in visual data. This concept is particularly relevant to detecting structural defects in photovoltaic (PV) panel images. In PV detection, three common types of symmetry exist: bilateral symmetry in grid cracks and fingerprint stains; radial symmetry in star-shaped cracks; and periodic symmetry in uniformly spaced cell malfunctions.

Although symmetry-aware learning has succeeded in other fields, its application in photovoltaic defect detection remains limited. For example, Li et al. [36] proposed PCBSSD, a self-supervised method that leverages rotational and reflective symmetry in PCB components to detect defects without annotations. In biometric design, recent studies [37] have prioritized symmetry in facial modeling for ergonomic customization, highlighting the broader relevance of symmetry modeling.

In solar-related vision tasks, attention mechanisms have been introduced to enhance sensitivity to subtle defects. SD-YOLO [38] integrates the Convolutional Block Attention Module (CBAM) to improve the detection of low-contrast dirt such as guano, achieving a 40.2% improvement on mAP@50. However, CBAM focuses on local saliency rather than explicitly modeling geometric symmetry. Similarly, Efficient Channel Attention (ECA) [39] was integrated into the EfficientNet-V2 backbone to enhance small defect detection under noisy infrared imaging. Although effective, it enhances the discriminative ability of general features rather than encoding symmetry.

These examples demonstrate that current methods primarily focus on local context modeling without explicitly incorporating symmetry priors, such as bilateral, radial, or periodic patterns, often inherent to PV defects. This paper proposes a symmetry-aware architecture for more robust and structure-aware defect detection to address this gap.

2.2. Multi-Scale Defect Detection

Multi-scale defect detection refers to the ability of a model to identify and accurately locate defects that exhibit significant differences in size, texture, and spatial location, which is particularly critical in photovoltaic (PV) panel inspection. Defect types range from microcracks to large-area cell misalignments, imposing stringent requirements on the model’s scale perception capabilities.

In recent years, several YOLO-based models have incorporated multi-scale feature fusion mechanisms into their structural design to enhance detection performance. For example, the YOLOv5s-GBC model [19] combines attention mechanisms with a bidirectional feature pyramid network (BiFPN) to enhance defect recognition capabilities across different scales. On an electroluminescence (EL) image dataset, its detection accuracy and inference speed improved by 2% and 20.3%, respectively, compared to YOLOv5. Similarly, the YOLOv8-GD model [20] optimizes the YOLOv8 backbone network by integrating depth-wise convolutions (DW-Conv) with grouped shuffle convolutions (GSConv) and introduces the BiFPN structure to achieve efficient multi-scale fusion, improving mAP@0.5 and mAP@0.5–0.95 by 4.2% and 5.7%, respectively.

In addition to the YOLO series, two-stage detectors have also progressed in multi-scale modeling. A study based on Faster R-CNN [40] introduced a three-branch dilated convolutional module, shuffle-based cross-layer connections, and a similar non-maximum suppression mechanism. This approach achieved effective fusion of cross-resolution features, improving the detection accuracy and robustness for defects of different scales.

Meanwhile, the Transformer architecture has been introduced to enhance multi-scale perception capabilities. The DPiT model [41] integrates a multi-scale aggregation module with cross-window self-attention (CW-MSA) into the Swin Transformer. This approach fuses deep and shallow features through attention mechanisms, achieving a Top-1 classification accuracy of 91.7% on the ELPV dataset. However, this method does not incorporate geometric structure priors, resulting in limited modeling capabilities for structural defects.

In summary, current multi-scale detection methods have significantly progressed in structural optimization and feature fusion. However, most methods have not effectively combined the spatial structural prior features commonly found in photovoltaic images, especially in real-world scenarios with diverse defect morphologies and large-scale spans. The robustness and generalization ability of the models still have room for improvement. Therefore, further exploration of detection methods that integrate structural constraints and scale adaptability has important research and application value.

2.3. YOLOv8

YOLOv8 has four components: Input, Backbone, Neck, and Head. The structure of YOLOv8 is shown in Figure 1.

Input: The images in the dataset vary in size. YOLOv8 preprocesses them by first resizing the long edge to 640 pixels while maintaining the aspect ratio, then padding the short edge with gray to form a 640 × 640 square. This strategy minimizes distortion and preserves object proportions. Subsequently, pixel values are normalized to [0, 1] to standardize inputs, which stabilizes gradient updates and improves training convergence.
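The resize-and-pad step described above can be sketched in a few lines. This is a minimal illustration of the geometry only, not the actual YOLOv8 preprocessing code: the function names `letterbox_geometry` and `normalize_pixel` are hypothetical, and splitting the padding evenly between the two sides is an assumption.

```python
def letterbox_geometry(width, height, target=640):
    """Return the resized size and per-side padding for a target x target square."""
    # Scale so that the long edge becomes `target`, keeping the aspect ratio.
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Pad the short edge; the padding is split between the two sides (assumption).
    pad_w, pad_h = target - new_w, target - new_h
    left, top = pad_w // 2, pad_h // 2
    right, bottom = pad_w - left, pad_h - top
    return (new_w, new_h), (left, top, right, bottom)


def normalize_pixel(value):
    """Map an 8-bit intensity (0..255) to the range [0, 1]."""
    return value / 255.0


size, padding = letterbox_geometry(1024, 512)
print(size, padding)  # (640, 320) (0, 160, 0, 160)
```

For a 1024 × 512 image, the long edge is scaled to 640 and the short edge to 320, so 160 rows of gray padding are added above and below.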

Backbone: The Backbone network in YOLOv8 extracts multi-scale feature maps from the input image. It has three parts: Conv, C2f, and Spatial Pyramid Pooling Fast (SPPF). The Conv module has three functional components: Conv2d, BatchNorm2d, and the SiLU activation function. The C2f module draws on the design concept of the E-ELAN in YOLOv7. It enhances gradient flow and feature diversity through denser cross-stage connectivity and feature-splitting operations while remaining lightweight. This improvement significantly increases the convergence speed and model performance compared to the C3 module in YOLOv5. The SPPF module, which is based on SPPNet, improves computational efficiency while maintaining the ability to extract features at different scales. It applies several max-pooling layers with 5 × 5 kernels in sequence, instead of the parallel max-pooling operations with progressively larger kernels used in the original SPP approach [42].
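The reason sequential 5 × 5 pooling can stand in for larger pooling windows is that chaining stride-1 max-pooling layers grows the effective window: two passes cover 9 samples and three passes cover 13. The sketch below checks this in one dimension for simplicity; the helper `max_pool_1d` and the test signal are illustrative assumptions, and the 2D case follows the same argument along each axis.

```python
def max_pool_1d(values, kernel):
    """Stride-1 max pooling with 'same' output length (windows are clipped at the borders)."""
    half = kernel // 2
    n = len(values)
    return [max(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

# SPPF-style: apply the same small pooling window three times in sequence.
p1 = max_pool_1d(signal, 5)
p2 = max_pool_1d(p1, 5)
p3 = max_pool_1d(p2, 5)

# SPP-style: pool the input directly with progressively larger windows.
print(p2 == max_pool_1d(signal, 9))   # True
print(p3 == max_pool_1d(signal, 13))  # True
```

The sequential form reuses each intermediate result, so it produces the same three scales with less computation.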

Neck: To enhance the model’s capability in identifying objects across varying scales, the neck component refines and combines the hierarchical feature maps output by the backbone. Drawing inspiration from both the Feature Pyramid Network (FPN) [43] and the Path Aggregation Network (PAN) [44], it establishes a bidirectional architecture designed for multi-scale feature integration. The FPN paths convey rich semantic information of deep features from the top down, while the PAN paths convey high-resolution localization information of shallow features from the bottom up. This bidirectional fusion of multi-scale features significantly improves the robustness and accuracy of detection, especially enhancing the detection of small targets. In addition, YOLOv8 simplifies the network architecture without compromising the quality of feature mapping. It eliminates the traditional 1 × 1 downsampling layer and instead uses interpolation methods to adjust the feature map resolution.

Head: YOLOv8 uses a Decoupled-Head structure, eliminating the previous objectness branch. The detection head is directly decoupled into two independent paths: one for the classification task and the other for the regression task, a design that reduces inter-task interference. In addition, YOLOv8 employs an anchor-free approach to directly predict the target’s centroid coordinates and width–height offsets (x, y, w, h). This replaces the traditional anchor box-based offset prediction, thus eliminating the reliance on predefined anchor parameters. This approach simplifies the training process while improving the model’s ability to adapt to targets of different sizes.

3. Methodology

This paper proposes a new architecture, DCE-YOLO, integrating C2f-DWR-DRB, Contextual Transformer (COT) Attention, and Detect-Efficient modules into the YOLOv8 framework. Specifically, a COT attention mechanism is added after the SPPF module in the backbone to enhance contextual relationships across different spatial locations in the feature map. The DWR-DRB module replaces the original Bottleneck structure within the C2f module to enable more effective extraction of multi-scale contextual information. A lightweight detection head is introduced to improve both detection speed and accuracy. Furthermore, Wise-IoU (WIoU) is adopted in place of the traditional CIoU loss function to enhance overall detection performance. The structure of DCE-YOLO is illustrated in Figure 2.

3.1. C2f-DWR-DRB

In the task of solar panel defect detection, YOLOv8 exhibits certain limitations in processing multi-scale features. This is especially evident when dealing with symmetric or repetitive structural patterns commonly observed in defects such as grid-like cracks or fingerprint traces. Traditional convolutional neural networks often rely on deeper hierarchies or more complex structures to address multi-scale variations, which can compromise the real-time performance of the model. To enhance YOLOv8’s capability in handling multi-scale features, this study proposes replacing the Bottleneck module in the original C2f structure with a DWR-DRB module, forming a new C2f-DWR-DRB module.

By integrating the DWR and DRB structures, the proposed module can efficiently extract contextual information across multiple scales, thereby enhancing the backbone network’s feature extraction capability. Embedding the C2f-DWR-DRB module within the backbone minimizes computational complexity while improving detection performance. The original C2f module structure in YOLOv8 is illustrated in Figure 3a, while the modified C2f-DWR-DRB structure proposed in this study is shown in Figure 3b.

3.1.1. DWR

The DWR (Dilation-wise Residual) [45] module adopts a residual connection architecture, as shown in Figure 4. This module divides the feature extraction process into two stages, regional residual and semantic residual, which efficiently capture multi-scale contextual information and integrate feature maps generated by multi-scale perception. In the first stage, the input features are processed by a 3 × 3 convolution layer, followed by batch normalization (BN) and a ReLU activation function. This generates a series of feature maps of varying sizes, referred to as region activations. This stage primarily extracts local features, laying the foundation for subsequent multi-scale processing. In the second stage, three 3 × 3 depth-wise convolution (DConv) layers with different dilation rates are applied, which keeps the convolution inexpensive along the depth (channel) direction. This multi-dilation-rate design constructs receptive fields of different scales: small dilation rates focus on local detail features, medium dilation rates capture medium-range context, and large dilation rates model more global spatial relationships. Through this multi-scale strategy, the module filters and retains important information at each scale, a process called semantic reconstruction. Subsequently, a 1 × 1 convolution layer maps the concatenated high-dimensional features to a low-dimensional space, thereby reducing model complexity. Finally, a residual connection adds the dimension-reduced features to the original input features, achieving feature reuse.

It should be noted that this paper optimizes and adjusts the convolution structure of the DWR module while retaining its core structure and multi-scale processing mechanism. Specifically, we replace the multi-scale depth separable convolution in the original design with standard convolution and the DRB module. This modification reduces computational complexity and improves computational efficiency. The improved structure, together with the DRB module, constitutes the DWR-DRB module. The detailed design will be discussed in Section 3.1.3.

3.1.2. DRB

The DRB (Dilated Reparam Block) [46] module enhances a large-kernel convolutional layer by combining a non-dilated large-kernel convolution with multiple dilated small-kernel convolutional layers. Specifically, features are extracted from the input feature map using a non-dilated large-kernel convolution and multiple dilated small-kernel convolutions. Each convolutional layer learns a different representation of the features, and the outputs are summed after their respective batch normalization (BN) layers. When converting the DRB module to a single large-kernel convolutional layer for inference, each branch needs to be merged into the large-kernel convolution. To do so, a convolutional layer with dilation rate r > 1 is converted by a specific function into an equivalent non-dilated layer, with an appropriate amount of zero padding on both sides (ignoring input pixels in a dilated convolution is equivalent to inserting additional zero entries into the convolution kernel). The original convolution kernel is $W \in \mathbb{R}^{k \times k}$, and the kernel obtained after inserting zeros, called the equivalent convolution kernel, is $W' \in \mathbb{R}^{((k - 1) r + 1) \times ((k - 1) r + 1)}$.

Based on this design, this paper constructs two DRB modules to improve the DWR module. The first structure employs a double-parallel-layer configuration, as illustrated in Figure 5a. This structure uses dilation rates of r = (1, 2) and convolution kernel sizes of k = (3, 3) to simulate a large kernel convolution with an effective size of K = 5. The second is a three-parallel-layer structure, shown in Figure 5b, which uses dilation rates of r = (1, 2, 3) and kernel sizes of k = (5, 3, 3) to enhance a large kernel convolution with K = 7.
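The zero-insertion argument can be checked numerically. The sketch below is an illustration under simplifying assumptions rather than the authors' implementation: it uses a single channel, ignores batch normalization (assumed already folded into the weights), and treats only the listed parallel branches of the double-parallel configuration. The helper names and the example kernels `k_a` and `k_b` are hypothetical.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between the taps of a square k x k kernel."""
    k = len(kernel)
    size = (k - 1) * rate + 1  # equivalent kernel size
    out = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i * rate][j * rate] = kernel[i][j]
    return out


def merge_branches(branches, size):
    """Sum the centred, zero-padded equivalent kernels of (kernel, rate) branches."""
    merged = [[0.0] * size for _ in range(size)]
    for kernel, rate in branches:
        eq = dilate_kernel(kernel, rate)
        off = (size - len(eq)) // 2  # centre the smaller kernel
        for i, row in enumerate(eq):
            for j, value in enumerate(row):
                merged[i + off][j + off] += value
    return merged


def conv2d_same(image, kernel, rate=1):
    """Stride-1 cross-correlation with zero 'same' padding and dilation `rate`."""
    h, w, k = len(image), len(image[0]), len(kernel)
    c = (k - 1) * rate // 2  # offset of the kernel centre
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i * rate - c, x + j * rate - c
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += kernel[i][j] * image[yy][xx]
    return out


def add_maps(a, b):
    """Element-wise sum of two feature maps."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


# Double-parallel configuration from the text: r = (1, 2), k = (3, 3), K = 5.
k_a = [[1.0, 2.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 2.0]]
k_b = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
image = [[float((3 * y + 5 * x) % 7) for x in range(8)] for y in range(8)]

separate = add_maps(conv2d_same(image, k_a, 1), conv2d_same(image, k_b, 2))
fused = conv2d_same(image, merge_branches([(k_a, 1), (k_b, 2)], 5))
print(separate == fused)  # True

# Equivalent sizes for the three-parallel configuration: r = (1, 2, 3), k = (5, 3, 3).
print([(k - 1) * r + 1 for k, r in zip((5, 3, 3), (1, 2, 3))])  # [5, 5, 7]
```

Summing the branch outputs during training and applying one merged kernel at inference give identical results, which is why the extra branches add no inference cost.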

3.1.3. DWR-DRB

To improve the efficiency and multi-scale contextual modeling capabilities of the DWR module in solar cell defect detection, this paper proposes the DWR-DRB structure, as shown in Figure 6. This structure first replaces the multi-scale depth-wise separable convolutions in the DWR module with standard 3 × 3 convolutions to reduce computational complexity. Two dilated reparameterized blocks (DRB) are then introduced to construct equivalent receptive field sizes of K = 5 and K = 7, enhancing the module’s ability to semantically model defects of different scales and structures. The DWR module employs a two-stage structure of “regional residuals” and “semantic residuals.” It first uses shallow convolutions to capture local activation information and then combines convolutional branches with different dilation rates to integrate deep semantic context, effectively enhancing the multi-level expression of spatial structure. The DRB module further employs a parallel convolutional structure with multiple dilation rates, r = (1, 2) or r = (1, 2, 3), to concurrently model local and long-range feature dependencies across multiple receptive field scales, thereby establishing cross-scale contextual feature fusion capabilities. The DRB structure with K = 5 provides fine-grained perception capabilities for local defects such as finger, black_core, and short_circuit, while the structure with K = 7 is suitable for capturing larger-scale structural defects such as crack, star_crack, and horizontal_dislocation, enhancing the model’s structured understanding of overall defect morphology. Meanwhile, the symmetric receptive fields constructed by DRB facilitate the explicit modeling of common defect structures such as bilateral symmetry, radial symmetry, and periodic distribution. This design improves the model’s response to repetitive textures and edge microstructures. The DWR and DRB structures provide context-aware and structural modeling capabilities at different scales, respectively. Their combination achieves more comprehensive semantic fusion and geometric symmetry perception, while complementing the COT structure discussed later to collectively enhance model performance.

3.2. COT Attention

Traditional Convolutional Neural Networks (CNNs) exhibit limitations in capturing long-distance contextual information. These networks struggle to effectively extract and integrate semantic information at a distance. Consequently, YOLOv8 demonstrates suboptimal performance when handling complex contexts and long-distance dependencies. Thus, this paper introduces a Contextual Transformer (COT) attention mechanism, the structure of which is shown in Figure 7.

The COT attention mechanism module enhances feature representation by fusing static and dynamic contexts. The module first performs a k × k convolution operation on the input feature map to extract local static context information, denoted as K₁. The static context K₁ is then concatenated with the query vector Q. The dynamic multi-head attention weight matrix is generated by two 1 × 1 convolutional layers. Unlike the traditional self-attention mechanism, this method is based on a local correlation matrix at each position. This method reinforces attention learning through the interaction between queries and keys while integrating static context. Subsequently, the dynamic attention weight matrix is multiplied by the value vector V, processed via a 1 × 1 convolution, to obtain the global dynamic context information denoted as K₂. Ultimately, the model’s ability to capture spatial relationships in the feature map is significantly enhanced by fusing the static context K₁ and the global context K₂, enabling effective modeling of dynamic dependencies among the input features. The fusion of local static and global dynamic contexts allows the model to better capture spatial symmetries that frequently appear in solar panel defect patterns, such as bilateral or radial symmetry, thereby improving symmetry-aware contextual representation.
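The data flow described above can be summarized compactly. The following is a sketch in the notation of this section, not a formula taken from the paper: the symbols X (input feature map), W_v, W_θ, and W_δ (1 × 1 convolution weights), the use of ⊛ for the local multiplication, and the additive form of the final fusion are assumptions made for illustration.

```latex
\begin{aligned}
K_1 &= \mathrm{Conv}_{k \times k}(X)
  && \text{(static local context)} \\
A &= [K_1, Q]\, W_\theta W_\delta
  && \text{(attention weights from two } 1 \times 1 \text{ convolutions)} \\
V &= X W_v
  && \text{(values from a } 1 \times 1 \text{ convolution)} \\
K_2 &= V \circledast A
  && \text{(dynamic global context)} \\
Y &= K_1 + K_2
  && \text{(fused output, assumed additive)}
\end{aligned}
```

Written this way, it is visible that the attention weights A depend on both the query and the static context K_1, rather than on isolated query-key pairs as in conventional self-attention.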

The COT attention mechanism focuses on enhancing the contextual relationships between different locations in the feature map. It captures relationships between features at the local scale and models long-distance dependencies at the global scale. The C2f-DWR-DRB module enhances the model’s ability to precisely detect and locate flaws of various sizes by effectively aggregating multi-scale contextual information. Global and local interactions are combined to enhance the expressive power of feature representations, enabling a richer portrayal of contextual information in intricate visual scenes. This improves the model’s accuracy in flaw detection.

3.3. Detect-Efficient

YOLOv8 performs well in target detection tasks due to its efficient network structure and refined detection head design. The YOLOv8 detection head consists of two branches: one for regression and one for classification. These branches extract features using two consecutive 3 × 3 convolutions and a single 1 × 1 convolution, respectively. Finally, the model computes the bounding box loss (Bbox.loss) and classification loss (Cls.loss) separately. Although the two consecutive 3 × 3 convolutions can extract richer features, they may also introduce feature redundancy and cause some loss of fine-grained details.

To this end, this paper designs a lightweight detection head, Detect-Efficient, which optimizes the utilization of computing resources and improves the detection accuracy and speed of the model. Detect-Efficient first performs feature extraction through a 1 × 1 convolution and a 3 × 3 convolution and then divides into two branches. Each branch performs a 1 × 1 convolution and calculates Bbox.loss and Cls.loss, respectively, and the structure is shown in Figure 8. Detect-Efficient has lower computational complexity than YOLOv8’s detection head.
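To give a rough sense of why sharing the feature-extraction layers lowers the cost, the sketch below counts convolution weights for the two head layouts as they are described in the text. It is an illustration under simplifying assumptions, not the authors' implementation: every intermediate layer is assumed to have the same hypothetical width `c`, biases and normalization layers are ignored, and the function names are invented for this example.

```python
def conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k convolution (bias and normalization ignored)."""
    return k * k * c_in * c_out


def decoupled_head_weights(c: int, num_classes: int, reg_max: int) -> int:
    """Two separate branches, each with two 3x3 convolutions and one 1x1 output layer."""
    stem_per_branch = conv_weights(c, c, 3) + conv_weights(c, c, 3)
    reg_out = conv_weights(c, 4 * reg_max, 1)  # box branch: 4 sides x reg_max bins
    cls_out = conv_weights(c, num_classes, 1)  # class branch: one score per class
    return 2 * stem_per_branch + reg_out + cls_out


def shared_stem_head_weights(c: int, num_classes: int, reg_max: int) -> int:
    """Shared 1x1 + 3x3 stem, followed by one 1x1 output layer per branch."""
    shared = conv_weights(c, c, 1) + conv_weights(c, c, 3)
    reg_out = conv_weights(c, 4 * reg_max, 1)
    cls_out = conv_weights(c, num_classes, 1)
    return shared + reg_out + cls_out


if __name__ == "__main__":
    C = 64  # hypothetical channel width, chosen only for illustration
    print("decoupled head  :", decoupled_head_weights(C, num_classes=7, reg_max=16))
    print("shared-stem head:", shared_stem_head_weights(C, num_classes=7, reg_max=20))
```

Under these assumptions the 3 × 3 layers dominate the count, so replacing four of them with one shared 3 × 3 layer outweighs the extra output channels introduced by a larger reg_max.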

Additionally, the Distribution Focal Loss (DFL) contributes substantially to bounding box regression by discretizing coordinate predictions to improve localization precision. This study raises the reg_max value from 16 to 20 to improve detection performance. A larger reg_max increases the number of discrete bins used to represent each coordinate, and therefore the output dimension per coordinate in the regression head, which allows more accurate coordinate prediction. Although this adjustment introduces additional parameters and slightly increases computational cost, it leads to better localization accuracy. With proper optimization strategies, the impact on inference speed remains minimal, making the trade-off worthwhile.
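The effect of the bin count can be sketched as follows. The example assumes the common DFL-style decoding, in which each box side is predicted as a softmax distribution over reg_max bins and decoded as the expected bin index. The function names are hypothetical, and the code illustrates the idea rather than the authors' exact head.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_decode(logits):
    """Decode one box side as the expected bin index of its distribution."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


def regression_channels(reg_max):
    """Output channels of the regression branch: 4 box sides x reg_max bins."""
    return 4 * reg_max


if __name__ == "__main__":
    for reg_max in (16, 20):
        peaked = [0.0] * reg_max
        peaked[5] = 10.0  # hypothetical logits with a sharp peak at bin 5
        print(reg_max, regression_channels(reg_max), dfl_decode(peaked))
```

With reg_max = 16 the regression branch emits 64 channels per location, and with reg_max = 20 it emits 80, which is where the additional parameters and computation mentioned above come from.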

In most cases, many high-resolution images must be processed in real time to identify solar panel defects. By sharing the feature-extraction convolutions between the two branches, Detect-Efficient significantly decreases computational complexity while increasing detection speed.

3.4. WIoU

The YOLOv8 network uses Complete Intersection over Union (CIoU) Loss as its bounding box loss function. This loss function optimizes the center-point distance, overlap area, and aspect ratio difference between the predicted and ground truth boxes, thereby improving how accurately the predicted box fits the target. However, CIoU Loss does not account for imbalance in the quality of the training samples. For instance, the number of samples for some common defects is much larger than that for rare defects, which may reduce the training efficiency of the network. In addition, because the matching between predicted and ground truth boxes is largely random at the early stage of training, unstable gradient updates may further affect optimization and even reduce the final detection performance. To solve this problem, Wise-IoU (WIoU) [47] enhances the model’s robustness to low-quality data by using a dynamic non-monotonic focusing mechanism and a gradient gain technique. The effect of low-quality samples is reduced, and the model’s focus on ordinary-quality anchor boxes is strengthened, which improves overall detector performance.

For the object detection task, IoU is used to measure the overlap between the ground truth bounding boxes and the predicted bounding boxes. The data used to calculate IoU is shown in Figure 9. To clarify the equations used, the variables involved are defined as follows: W_t and H_t are the width and height of the intersection area between the target and prediction frames; w and h are the width and height of the prediction frame; w_gt and h_gt are the width and height of the target frame; (x, y) are the prediction frame’s center coordinates; (x_gt, y_gt) are the target frame’s center coordinates; W_g and H_g are the width and height of the smallest bounding frame that contains both the prediction and the target frames. All the above variables are illustrated in Figure 9.

The WIoUv1 loss function [47] consists of an IoU-based term and a distance-based term. Its complete formulation is provided in Equation (1), where L_{IoU} is defined in Equation (2) and R_{WIoU} in Equation (3).

L_{WIoUv1} = R_{WIoU} L_{IoU}, \tag{1}

L_{IoU} = 1 - IoU = 1 - \frac{W_{t} H_{t}}{w h + w_{gt} h_{gt} - W_{t} H_{t}}, \tag{2}

R_{WIoU} = \exp\left(\frac{(x - x_{gt})^{2} + (y - y_{gt})^{2}}{(W_{g}^{2} + H_{g}^{2})^{*}}\right). \tag{3}

The superscript * indicates that W_g and H_g are excluded from the gradient computation to ensure numerical stability and remove potential barriers to loss function convergence.
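Equations (1)–(3) can be checked numerically with a few lines of code. The sketch below is a forward-only illustration using the variable definitions given for Figure 9. The function name and the (center x, center y, width, height) box layout are assumptions made for this example. Because no gradients are computed here, the detachment indicated by the superscript * has no visible effect, and boxes are assumed to have positive width and height.

```python
import math


def wiou_v1(pred, target):
    """Return (loss, r_wiou, iou) for boxes given as (x_center, y_center, width, height)."""
    x, y, w, h = pred
    x_gt, y_gt, w_gt, h_gt = target

    # Corner coordinates of both boxes.
    px1, py1, px2, py2 = x - w / 2, y - h / 2, x + w / 2, y + h / 2
    tx1, ty1, tx2, ty2 = x_gt - w_gt / 2, y_gt - h_gt / 2, x_gt + w_gt / 2, y_gt + h_gt / 2

    # Intersection width and height (W_t, H_t), clamped at zero when there is no overlap.
    w_t = max(0.0, min(px2, tx2) - max(px1, tx1))
    h_t = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = w_t * h_t
    union = w * h + w_gt * h_gt - inter
    iou = inter / union
    l_iou = 1.0 - iou  # Equation (2)

    # Width and height of the smallest enclosing box (W_g, H_g).
    w_g = max(px2, tx2) - min(px1, tx1)
    h_g = max(py2, ty2) - min(py1, ty1)

    # Equation (3): distance-based term; W_g and H_g would be detached during training.
    r_wiou = math.exp(((x - x_gt) ** 2 + (y - y_gt) ** 2) / (w_g ** 2 + h_g ** 2))

    return r_wiou * l_iou, r_wiou, iou  # Equation (1)


if __name__ == "__main__":
    print(wiou_v1((1.0, 1.0, 2.0, 2.0), (2.0, 1.0, 2.0, 2.0)))
```

For two identical boxes the loss is zero and R_WIoU equals 1. As the centers move apart, R_WIoU grows, scaling up the IoU loss.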

Conventional bounding box loss functions may suffer from vanishing gradients when the intersection width W_t and height H_t are both zero, that is, when the predicted and target boxes do not overlap. In WIoUv1, the distance term R_WIoU ∈ [1, e) significantly amplifies the L_IoU ∈ [0, 1] of ordinary-quality anchor boxes. Conversely, L_IoU significantly reduces R_WIoU for high-quality anchor boxes, so that less emphasis is placed on the center-point distance when an anchor box already coincides well with the target box. Because of the dynamic nature of WIoU, the quality assessment criteria for anchor boxes are adaptively modified, allowing the gradient gain allocation strategy to be adjusted continuously during training.

In solar panel defect detection tasks, image diversity and complexity present challenges. Blurred or small-sized defects are low-quality examples that may introduce noise into the training process and provide unreliable gradient signals. WIoU loss reduces the relative weight of low-quality samples by assigning adaptive weights to samples of different quality (e.g., critical targets and medium-quality samples), thereby mitigating their negative impact on parameter updates. This allows the model to focus more on the detection accuracy of key objects and enhances its handling of medium-quality samples, ultimately improving overall detection performance.

4. Experimentation

4.1. Dataset

The solar panel defect dataset contains 4447 annotated images covering seven key defect types: crack, finger, black_core, thick_line, star_crack, horizontal_dislocation, and short_circuit. Each image is annotated with the location and category of the defect, and a single image may contain multiple defect types. The distribution is as follows: Crack (1009 images, 1026 labels), Finger (1502 images, 2949 labels), Black_core (962 images, 1247 labels), Thick_line (775 images, 981 labels), Star_crack (122 images, 134 labels), Horizontal_dislocation (266 images, 798 labels), and Short_circuit (492 images, 492 labels). The category distribution is highly uneven: Star_crack has only 134 instances, while Finger has nearly 3000. Given the limited size of the dataset and this severe imbalance, we used an 8:1:1 split for training, validation, and testing, because a sufficiently large and diverse training set is crucial for effective model learning, especially for rare defect types. Although the validation and test sets each receive only 10% of the data (445 images), the subsets were created through random sampling so that each retains a representative distribution of all defect types. This method enables reliable evaluation of the model’s generalization performance while maximizing the training samples to enhance stability and robustness.
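A random 8:1:1 split of this kind can be sketched as follows. The function name, the fixed seed, and the rounding rule are assumptions made for illustration, since the text does not describe the exact procedure. Note that a purely random split preserves class proportions only in expectation, which matters most for rare classes.

```python
import random


def split_811(items, seed=0):
    """Shuffle items and split them into train/val/test subsets at roughly 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n = len(items)
    n_val = round(0.1 * n)
    n_test = round(0.1 * n)
    n_train = n - n_val - n_test  # the remainder goes to training
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_811(range(4447))
    print(len(train), len(val), len(test))
```

With 4447 images this rounding rule yields 3557 training images and 445 images each for validation and testing.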

To enhance the model’s robustness to common disturbances in actual photovoltaic images (such as geometric changes and uneven lighting), various data augmentation strategies were introduced during training. These included random affine transformations (rotation, scaling, and translation) to simulate image geometric deformations, as well as brightness and contrast perturbations to simulate non-uniform lighting. These augmentation techniques significantly improved the model’s generalization ability in complex scenes. Additionally, the multi-branch structure and context modeling mechanism designed inherently possess spatial consistency preservation capabilities, which further mitigated the impact of the aforementioned disturbances on detection accuracy.
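The two augmentation families mentioned above can be illustrated with a small, dependency-free sketch. The parameter ranges, function names, and the choice to rotate and scale about the image center are hypothetical, because the text does not report the actual augmentation settings. In a detection pipeline, the same affine matrix would also have to be applied to the bounding box coordinates.

```python
import math
import random


def random_affine_matrix(rng, img_w, img_h, max_deg=10.0, max_translate=0.1,
                         scale_range=(0.9, 1.1)):
    """Sample a 2x3 matrix: rotate and scale about the image center, then translate."""
    angle = math.radians(rng.uniform(-max_deg, max_deg))
    s = rng.uniform(*scale_range)
    tx = rng.uniform(-max_translate, max_translate) * img_w
    ty = rng.uniform(-max_translate, max_translate) * img_h
    cx, cy = img_w / 2, img_h / 2
    a, b = s * math.cos(angle), s * math.sin(angle)
    # p' = s * R * (p - c) + c + t, written out as a 2x3 matrix.
    return [
        [a, -b, cx - a * cx + b * cy + tx],
        [b, a, cy - b * cx - a * cy + ty],
    ]


def apply_affine(m, point):
    """Apply a 2x3 affine matrix to an (x, y) point, e.g. a bounding box corner."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def jitter_pixel(value, contrast, brightness):
    """Contrast/brightness perturbation of one intensity value, clipped to [0, 255]."""
    return max(0.0, min(255.0, contrast * value + brightness))


if __name__ == "__main__":
    rng = random.Random(0)
    m = random_affine_matrix(rng, 640, 640)
    print(apply_affine(m, (100.0, 200.0)))
    print(jitter_pixel(128, contrast=1.2, brightness=-10))
```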

4.2. Model Selection

YOLOv8 is an advanced target detection algorithm available in five model sizes that meet different needs. YOLOv8n is the smallest and fastest model but provides relatively low accuracy. The YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x models are more accurate than YOLOv8n but are larger and slower, require more computational resources, and are more challenging to deploy. Considering all factors, YOLOv8n is chosen as the baseline model in this paper.

In terms of resolution selection, this study systematically evaluated the performance of three scales ranging from 640 × 640 to 1280 × 1280 through experimental testing. The results are shown in Table 1, and 640 × 640 was ultimately determined as the optimal input resolution. The experimental results reveal the following: First, in terms of detection accuracy, the model at 640 × 640 resolution reaches 90% mAP@50 and 63.7% mAP@50-95. Compared with 800 × 800 resolution, mAP@50 is only reduced by 0.5%, but mAP@50-95 is increased by 2.6%. Compared with 1280 × 1280 resolution, mAP@50 and mAP@50-95 increase by 1.4% and 2.0%, respectively, showing better comprehensive detection performance. Second, in terms of computational efficiency, the 640 × 640 resolution achieved an inference speed of 107.1 FPS, representing improvements of 26.2% and 88.5% compared to 800 × 800 and 1280 × 1280, respectively. Additionally, it significantly reduced GPU memory usage, enhancing the feasibility of industrial deployment. The effectiveness of this choice is attributed to the innovative multi-scale feature extraction design of the model. The C2f-DWR-DRB module enhances detection capabilities for small objects through an expandable receptive field. The COT attention mechanism effectively compensates for information loss caused by resolution limitations through global–local feature fusion. This design enables the model to maintain detection accuracy while significantly enhancing practical application value.

4.3. Experimental Environment

The experimental setup for solar panel defect detection uses the following hardware: an Intel Core i7-12700H processor (12th generation) operating at 2.30 GHz, 32 GB of RAM, and an NVIDIA GeForce RTX 3080 Ti GPU. The software environment is a 64-bit Windows 11 operating system with Python 3.8 as the programming language. The implementation uses PyTorch 2.0.1 as the deep learning framework, together with key software libraries including NumPy 1.24.4, OpenCV 4.9.0, and CUDA 10.0.

4.4. Indicators for Model Evaluation

In the YOLO series of models, the primary evaluation metrics include P, R, AP, and mAP. P (precision) is the proportion of samples predicted as positive that are truly positive. R (recall) is the proportion of actual positive samples that are correctly detected. AP (average precision) is the average of the precision values across different recall levels, obtained by calculating the area under the P-R curve. mAP (mean average precision) is the average of the AP values across all categories.
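These definitions can be made concrete with a short sketch. It assumes the widely used all-point interpolation of the P-R curve; the exact interpolation used by a given evaluation toolkit may differ, and the function names are hypothetical. In the usual convention, mAP@50 evaluates AP at an IoU threshold of 0.5, while mAP@50-95 averages AP over thresholds from 0.5 to 0.95 in steps of 0.05.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(recalls, precisions):
    """Area under the P-R curve (all-point interpolation); recalls sorted ascending."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangle areas between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


if __name__ == "__main__":
    print(precision_recall(tp=8, fp=2, fn=2))
    print(average_precision([0.5, 1.0], [1.0, 0.5]))
```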

4.5. Experiments on the C2f-DWR-DRB Module

In the DWR-DRB architecture, DRB modules with different kernel sizes replace the multi-rate dilated depth-wise convolution in the DWR module structure. This study evaluated two modified configurations. The first, C2f-DWR, replaces the bottleneck module in the standard C2f module with the DWR module. The second, C2f-DWR-DRB, replaces the bottleneck module with the DWR-DRB module. The experimental results are shown in Table 2. Compared with C2f-DWR, the C2f-DWR-DRB model is slightly lighter on every cost measure: the model size decreases from 6.2 MB to 6.1 MB, computational complexity from 8.1 GFLOPs to 8.0 GFLOPs, and the parameter count from 2,966,245 to 2,901,925 (64,320 fewer parameters, about 2.2%). These reductions improve the model’s operational efficiency while maintaining detection accuracy, making it more suitable for deployment on resource-constrained edge devices.

4.6. Experiments on the Detect-Efficient Module

Three configurations were tested to evaluate the impact of the detection head: the original YOLOv8 detection head (reg_max = 16), the same head with increased regression granularity (reg_max = 20), and the proposed lightweight detection head (reg_max = 20). The experimental results are shown in Table 3. Increasing reg_max from 16 to 20 in the YOLOv8 detection head improves recall and mAP@50-95, indicating higher localization accuracy. However, it also leads to a slight decrease in precision and mAP@50, while GFLOPs, parameter count, and inference time increase. In contrast, replacing the detection head with the proposed Detect-Efficient design achieves consistent improvements across all key metrics compared to the reg_max = 16 baseline: precision increases by 0.2%, recall by 2.6%, mAP@50 by 0.4%, and mAP@50-95 by 2.4%. Computational cost is also reduced, with GFLOPs decreasing by 1.3 and inference time shortened by 0.4 ms. These results demonstrate that Detect-Efficient benefits not only from more refined bounding box regression (reg_max = 20) but also from improved computational efficiency through its optimized structure. This makes it more suitable for real-time or edge device deployment without sacrificing accuracy.

4.7. Error Analysis

The confusion matrix for DCE-YOLO is shown in Figure 10, which visualizes how the model performs on the data. In addition to the seven types of defects, the figure includes a background category. Each row represents the actual category, while each column represents the predicted category. The rightmost column shows the total misdetection probability for each type. The last row represents the total missed detection probability. Taking crack as an example, the correct prediction rate is 87%, with a 10% missed detection rate. The probabilities of being misclassified as finger and star crack are 3% and 1%, respectively. The main diagonal of the confusion matrix represents the probability that predicted categories align with actual categories. Values closer to 1.0 indicate better model performance. Figure 10 shows that five categories achieve correct prediction probabilities of 85% or higher, confirming the accuracy of the proposed model.

4.8. Graphical Analysis

The performance of DCE-YOLO and YOLOv8 under the same parameter settings (input resolution 640 × 640 px) and training conditions is compared in Figure 11. DCE-YOLO initially shows slightly lower mAP@50 and mAP@50-95 than YOLOv8 during the early training stages but gains a clear advantage in the later stages. The improvement in mAP@50-95 is particularly pronounced, which supports the effectiveness of the proposed method. In addition, DCE-YOLO trains more stably: its training curve fluctuates less and is smoother in the interval of 50–150 epochs. As the number of training epochs increases, the performance advantage of DCE-YOLO continues to widen, and the model maintains a stable lead in the later stage of training. These results indicate that DCE-YOLO effectively improves the overall detection performance of the model.

4.9. Comparative Experiments

To validate the proposed algorithm, we conducted comparative experiments on the same solar panel defect detection dataset with multiple mainstream object detection models, including YOLOv5n, YOLOv9t, YOLOv10n, YOLO11n, YOLO12n, Faster R-CNN, SSD, and RT-DETR. The experimental results are shown in Table 4. The proposed DCE-YOLO achieves the highest recall, mAP, and F1 among all compared methods. Compared with the classical anchor-based detectors Faster R-CNN (two-stage) and SSD (single-stage), DCE-YOLO improves mAP by 28.5% and 2.0%, respectively, highlighting its superior localization and classification capabilities on complex defect patterns. Even compared with a recent transformer-based model (RT-DETR), DCE-YOLO still achieves a 1.0% higher mAP, demonstrating its efficiency and accuracy on relatively small datasets with structural defect features. To enable a fair comparison with lightweight YOLO variants, the depth and width parameters of YOLOv5n to YOLO12n were standardized. At the same model scale, DCE-YOLO achieves mAP scores that are 1.6%, 1.6%, 2.3%, 2.3%, and 2.9% higher than YOLOv5n, YOLOv9t, YOLOv10n, YOLO11n, and YOLO12n, respectively. These improvements are attributed to the enhanced feature extraction, symmetric perception design, and optimized detection head of the proposed architecture, which collectively contribute to higher detection robustness and better generalization in real-world photovoltaic detection tasks.

4.10. Ablation Experiments

The ablation experiments verify the contribution of each improved module by measuring performance as the modules are added to the network. In Table 5, which reports the results, each improved model X denotes the network with a given set of modules; added modules are marked with “√” and omitted modules with “×”. Introducing the C2f-DWR-DRB module alone brings clear improvements: Precision increases by 6.2%, mAP@50 by 1.2%, and mAP@50-95 by 0.7%, while computational complexity falls to 8.0 GFLOPs. After the COT attention mechanism is added, mAP@50 increases by 1.4% and mAP@50-95 by 1.4%, while Precision remains high (88.5%). Adding the Detect-Efficient detection head yields the largest single gain: Recall rises to 87.9%, mAP@50 increases by 1.8%, mAP@50-95 increases by 3.8%, and GFLOPs fall from 8.1 to 7.2. Finally, introducing the WIoU loss function gives the best overall balance: relative to the YOLOv8n baseline, Precision increases by 5.8%, Recall by 3.7%, mAP@50 by 2.1%, and mAP@50-95 by 4.9%. Together, the modules improve the final model on the key metrics of Precision, Recall, and mAP@50 while preserving its computational efficiency, which validates the proposed improvements for solar panel defect detection.

4.11. Visualization Results

Figure 12 illustrates the detection results of the optimized model. A better model produces more valid detection boxes with higher confidence. In terms of the number of detections, the three images in Figure 12b contain more valid detection boxes than those in Figure 12a. In terms of confidence, the detection boxes in Figure 12b mostly have higher confidence than those in Figure 12a. These results show that the improved algorithm extracts target features accurately and obtains better detection results.

4.12. Computational Cost Analysis

To further evaluate the practical feasibility of DCE-YOLO, we analyzed its computational complexity and runtime performance. As shown in Table 5, the proposed DCE-YOLO achieves a low computational cost of 7.2 GFLOPs, which is 0.9 GFLOPs lower than the original YOLOv8n even though it incorporates the C2f-DWR-DRB and COT modules. The average inference time per image at a resolution of 640 × 640 is 2.9 ms, enabling real-time processing at approximately 107 FPS. The complete model was trained on an NVIDIA RTX 3080 Ti GPU for 200 epochs with a batch size of 16, taking approximately 2.5 h in total. The testing time on the validation set of 445 images was less than 2 s, highlighting the model’s efficiency and suitability for edge deployment. The experimental results demonstrate that DCE-YOLO achieves a good balance between computational cost and detection performance.

5. Discussion

Solar panel defect detection faces two major challenges: insufficient multi-scale feature extraction and severe class imbalance. To address these issues, this paper proposes the DCE-YOLO model, which integrates several architectural improvements, including the C2f-DWR-DRB module, COT attention mechanism, lightweight detection head (Detect-Efficient), and WIoU loss function. This design balances accuracy and efficiency, providing a comprehensive solution for photovoltaic defect detection.

The DCE-YOLO architecture explicitly considers the geometric priors of solar panel defects, such as cracks, short circuits, and grid breaks, which typically exhibit bilateral or radial symmetry. The DRB module uses convolutions with different dilation rates to create symmetric receptive fields, while the DWR structure fuses semantic and regional information. This design significantly enhances the model’s ability to recognize bilateral and radial symmetric defect structures (e.g., star_crack and horizontal_dislocation). The C2f-DWR-DRB module adopts a multi-branch design, enabling parallel modeling at different receptive field scales. This effectively improves the model’s adaptability to small-scale defects (e.g., finger defects) and large-scale defects (e.g., cracks and horizontal dislocations). Experiments show that introducing this module improves mAP@50 by 1.2% and mAP@50-95 by 0.7%. It also reduces the computational cost to 8.0 GFLOPs, demonstrating a good balance between detection accuracy and efficiency.

The COT attention mechanism addresses the limitations of traditional YOLO models in context modeling. The COT module captures spatial dependencies between distant regions in an image by introducing the interaction and fusion of static local context and dynamic global context. This effectively enhances the detection capability of fine-grained defects on complex backgrounds. After incorporating the COT mechanism, mAP@50 and mAP@50-95 increased by 0.4% and 0.7%, respectively.

Detect-Efficient is a lightweight detection head designed to improve the accuracy of bounding box regression. Compared to the original detection head in YOLOv8, Detect-Efficient achieves a better trade-off between accuracy and computational efficiency, making it particularly suitable for industrial deployment and real-time applications.

Additionally, the WIoU loss function employs a dynamic reweighting mechanism to reduce the influence of low-quality samples on gradient updates. This enables the model to focus more effectively on ordinary-quality samples, which is particularly beneficial for underrepresented classes such as star_crack and short_circuit, significantly improving recall.

Experimental results demonstrate that DCE-YOLO outperforms mainstream models such as YOLOv8n, YOLOv9t, and YOLOv10n on key metrics, including mAP@50, mAP@50-95, recall, and F1-score. The lightweight and efficient detection head also significantly reduces computational costs, enabling the model to be deployed smoothly on edge devices such as NVIDIA Jetson Xavier. Compared to larger models like YOLOv8l, DCE-YOLO better meets the low-power and low-latency requirements of real-world photovoltaic system monitoring tasks.

The model demonstrates strong generalization capabilities in image testing and real-world inspection scenarios. In the future, we plan to explore knowledge distillation and semi-supervised learning strategies to improve performance in low-annotation environments. We also aim to extend DCE-YOLO to support multimodal inputs, such as infrared and multispectral images, to achieve broader applications in photovoltaic defect detection.

Although DCE-YOLO demonstrates strong performance and architectural innovation, the method still has certain limitations. First, while the proposed C2f-DWR-DRB module and COT attention enhance the extraction of multi-scale and contextual features, the model primarily relies on RGB visual information. This limits its ability to detect defects with weak textures or low visual contrast, such as hidden cracks or temperature-induced anomalies, which can be better captured by infrared or electroluminescence imaging. Second, class imbalance remains a concern: although the WIoU loss mitigates this issue to some extent, rare classes like star_crack and short_circuit remain underrepresented, potentially affecting generalization performance in real-world deployments. Third, although the model is lightweight compared to standard YOLO variants, the integration of multiple modules (e.g., COT, DRB) increases its architectural complexity. This added complexity may pose challenges for deployment on ultra-constrained devices or embedded systems with strict latency and memory budgets. Finally, the current design does not incorporate domain adaptation or transfer learning strategies, which may be essential for adapting to different panel types, imaging conditions, and on-site environments. These limitations suggest that future work should explore multimodal fusion, efficient architecture search, and learning from limited or unlabeled data to further improve robustness and scalability.

6. Conclusions

Given the rapid expansion of the photovoltaic sector, enhanced quality inspection methods are essential for the widespread adoption of solar panels. The inefficiency and subjectivity of conventional manual inspection hinder the ability to meet modern industry requirements. As a result, deep learning-based automated flaw detection technology has become a significant research focus. This paper introduces the DCE-YOLO approach to address the challenges of multi-scale feature extraction and sample imbalance in contemporary detection tasks. This results in improved solar panel defect detection performance. The model is enhanced in several respects compared to YOLOv8n. Detection accuracy and inference speed are improved by refining the design of the detection head, optimizing the feature extraction module, effectively utilizing contextual information, and modifying the loss function. The model achieves 92.1% mAP@50 and 68.6% mAP@50-95 on the solar panel dataset, demonstrating substantial performance improvements.

This work presents a reliable and effective approach for assessing the quality of solar panels with various practical applications. With the rapid development of photovoltaic technology and the popularization of distributed photovoltaic systems, this model can be widely applied in unmanned aerial vehicle inspection systems, intelligent monitoring devices, and other scenarios, providing reliable technical support for the efficient automated detection of photovoltaic power plants. To advance intelligent inspection technologies in the solar sector, our study will investigate more efficient feature extraction architectures and explore the optimization of future model deployment on edge devices.

Author Contributions

The authors confirm their contribution to the paper as follows: Writing—original draft, X.X.; Writing—review & editing, X.L., L.S., X.Y., C.L., Y.L., Z.N., Y.M. and W.G.; conceptualization: X.X., X.L. and X.Y.; methodology: L.S.; software: Y.M.; formal analysis: X.X. and W.G.; investigation: C.L. and Y.L.; funding acquisition: Z.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Shandong Province Science and Technology Small and Medium-sized Enterprises Innovation Ability Improvement Project (Grant no. 2024TSGC0285) and the Taian Science and Technology Innovation Development Project (Grant no. 2023GX027). Lin Sun was supported by the Shandong Province Science and Technology Small and Medium-sized Enterprises Innovation Ability Improvement Project (Grant no. 2024TSGC0285), and Xiaoxia Lin was supported by the Taian Science and Technology Innovation Development Project (Grant no. 2023GX027).

Data Availability Statement

The data that support the findings of this study are available from the author Xinyue Xiao, upon reasonable request.

Conflicts of Interest

Authors Chunwei Leng, Yan Li, and Zhenyu Niu are employed by Hanqing Data Consulting Co., Ltd. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Aramendia, E.; Brockway, P.E.; Taylor, P.G.; Norman, J.B. Exploring the effects of mineral depletion on renewable energy technologies net energy returns. Energy 2024, 290, 130112.
Parthiban, R.; Ponnambalam, P. An Enhancement of the Solar Panel Efficiency: A Comprehensive Review. Front. Energy Res. 2022, 10, 937155.
Istratov, A.A.; Hieslmair, H.; Vyvenko, O.F.; Weber, E.R.; Schindler, R. Defect recognition and impurity detection techniques in crystalline silicon for solar cells. Sol. Energy Mater. Sol. Cells 2002, 72, 441–451.
Tsai, D.-M.; Wu, S.-C.; Chiu, W.-Y. Defect Detection in Solar Modules Using ICA Basis Images. IEEE Trans. Ind. Inform. 2013, 9, 122–131.
Dhimish, M.; d’Alessandro, V.; Daliento, S. Investigating the Impact of Cracks on Solar Cells Performance: Analysis Based on Nonuniform and Uniform Crack Distributions. IEEE Trans. Ind. Inform. 2022, 18, 1684–1693.
Al-Rabeeah, A.Y.; Seres, I.; Farkas, I. Recent Improvements of the Optical and Thermal Performance of the Parabolic Trough Solar Collector Systems. Facta Univ. Ser. Mech. Eng. 2022, 20, 73–94.
Edalatpanah, S.A.; Marinkovic, D.; Parandavar, Z. Novel GAN-Based Image Completion: Addressing Structure and Texture Consistency in Missing Regions. Comput. Eng. Technol. Innov. 2024, 1, 1–10.
Han, D.; Du, M.-H.; Dai, C.-M.; Sun, D.; Chen, S. Influence of defects and dopants on the photovoltaic performance of Bi₂S₃: First-principles insights. J. Mater. Chem. A 2017, 5, 6200–6210.
Pratt, L.; Govender, D.; Klein, R. Defect detection and quantification in electroluminescence images of solar PV modules using U-net semantic segmentation. Renew. Energy 2021, 178, 1211–1222.
Di Tommaso, A.; Betti, A.; Fontanelli, G.; Michelozzi, B. A multi-stage model based on YOLOv3 for defect detection in PV panels based on IR and visible imaging by unmanned aerial vehicle. Renew. Energy 2022, 193, 941–962.
Ingenhoven, P.; Belluardo, G.; Makrides, G.; Georghiou, G.E.; Rodden, P.; Frearson, L.; Herteleer, B.; Bertani, D.; Moser, D. Analysis of Photovoltaic Performance Loss Rates of Six Module Types in Five Geographical Locations. IEEE J. Photovolt. 2019, 9, 1091–1096.
Huang, J.; Zeng, K.; Zhang, Z.; Zhong, W. Solar panel defect detection design based on YOLO v5 algorithm. Heliyon 2023, 9, e18826.
Li, Q.; Wang, W.; Ma, C.; Zhu, Z. Detection of physical defects in solar cells by hyperspectral imaging technology. Opt. Laser Technol. 2010, 42, 1010–1013.
Israil, M.; Anwar, S.A.; Abdullah, M.Z. Automatic detection of micro-crack in solar wafers and cells: A review. Trans. Inst. Meas. Control 2012, 35, 606–618.
Tsai, D.-M.; Chang, C.-C.; Chao, S.-M. Micro-crack inspection in heterogeneously textured solar wafers using anisotropic diffusion. Image Vis. Comput. 2010, 28, 491–501.
Kang, B.-K.; Kim, S.-T.; Bae, S.-H.; Park, J.-W. Diagnosis of Output Power Lowering in a PV Array by Using the Kalman-Filter Algorithm. IEEE Trans. Energy Convers. 2012, 27, 885–894.
Al-Waisy, A.S.; Ibrahim, D.; Zebari, D.A.; Hammadi, S.; Mohammed, H.; Mohammed, M.A.; Damasevicius, R. Identifying defective solar cells in electroluminescence images using deep feature representations. PeerJ Comput. Sci. 2022, 8, e992.
Venkatesh, S.N.; Sugumaran, V.; Subramanian, B.; Josephin, J.S.F.; Varuvel, E.G. A comparative study on bayes classifier for detecting photovoltaic module visual faults using deep learning features. Sustain. Energy Technol. Assess. 2024, 64, 103713.
Cao, Y.; Pang, D.; Yan, Y.; Jiang, Y.; Tian, C. A photovoltaic surface defect detection method for building based on deep learning. J. Build. Eng. 2023, 70, 106375.
Cao, Y.; Pang, D.; Zhao, Q.; Yan, Y.; Jiang, Y.; Tian, C.; Wang, F.; Li, J. Improved YOLOv8-GD deep learning model for defect detection in electroluminescence images of solar photovoltaic modules. Eng. Appl. Artif. Intell. 2024, 131, 107866.
Zhang, M.; Yin, L. Solar Cell Surface Defect Detection Based on Improved YOLO v5. IEEE Access 2022, 10, 80804–80815.
Rohith, G.; Manish, D.S.; Narasimhan, A.R.; Dhavale, A.U.; John, R.R. SparkNet-A Solar Panel Fault Detection Deep Learning Model. IEEE Access 2025, 13, 75599–75617.
Khang, N.P.H.; Triet, N.M.; Van Tuan, H.; Nhan, N.C. Applying RetinaNet Machine Learning Models for Hot-Spot Detection in Thermal Images of Photovoltaic Panel. IEEJ Trans. Electr. Electron. Eng. 2025. ahead of print. [Google Scholar] [CrossRef]
Bassil, J.; Noura, H.N.; Salman, O.; Chahine, K.; Guizani, M. Efficient combination of deep learning and tree-based classification models for solar panel dust detection. Intell. Syst. Appl. 2025, 26, 200509. [Google Scholar] [CrossRef]
Karakan, A. Detection of Defective Solar Panel Cells in Electroluminescence Images with Deep Learning. Sustainability 2025, 17, 1141. [Google Scholar] [CrossRef]
Li, L.; Wang, Z.; Zhang, T. GBH-YOLOv5: Ghost Convolution with BottleneckCSP and Tiny Target Prediction Head Incorporating YOLOv5 for PV Panel Defect Detection. Electronics 2023, 12, 561. [Google Scholar] [CrossRef]
Dwivedi, D.; Babu, K.V.S.M.; Yemula, P.K.; Chakraborty, P.; Pal, M. Identification of surface defects on solar PV panels and wind turbine blades using attention based deep learning model. Eng. Appl. Artif. Intell. 2024, 131, 107836. [Google Scholar] [CrossRef]
Zhuang, J.; Peng, Q.; Wu, F.; Guo, B. Multi-component attention-based convolution network for color difference recognition with wavelet entropy strategy. Adv. Eng. Inform. 2022, 52, 101603. [Google Scholar] [CrossRef]
Jiang, F.; Huang, Q.; Zhang, S.; Liang, J.; Wu, Z.; Zhu, H. An Enhancement Generative Adversarial Networks Based on Feature Moving for Solar Panel Defect Identification. IEEE Sens. J. 2023, 23, 23744–23752. [Google Scholar] [CrossRef]
Tang, W.; Yang, Q.; Hu, X.; Yan, W. Convolution neural network based polycrystalline silicon photovoltaic cell linear defect diagnosis using electroluminescence images. Expert Syst. Appl. 2022, 202, 117087. [Google Scholar] [CrossRef]
Zhang, J.; Shen, Y.; Jiang, J.; Fang, S.; Chen, L.; Yan, T.; Li, Z.; Zhang, K.; Wei, H.; Guo, W. Automatic Detection of Defective Solar Cells in Electroluminescence Images via Global Similarity and Concatenated Saliency Guided Network. IEEE Trans. Ind. Inform. 2023, 19, 7335–7345. [Google Scholar] [CrossRef]
Lei, Y.; Wang, X.; An, A.; Guan, H. Deeplab-YOLO: A method for detecting hot-spot defects in infrared image PV panels by combining segmentation and detection. J. Real-Time Image Process. 2024, 21, 52. [Google Scholar] [CrossRef]
Zhao, Z.; Chen, Y.; Li, K.; Ji, W.; Sun, H. Extracting Photovoltaic Panels From Heterogeneous Remote Sensing Images with Spatial and Spectral Differences. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 5553–5564. [Google Scholar] [CrossRef]
Wang, Z.; Zheng, P.; Bahadir Kocer, B.; Kovac, M. Drone-Based Solar Cell Inspection with Autonomous Deep Learning. In Infrastructure Robotics: Methodologies, Robotic Systems and Applications; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2024; pp. 337–365. [Google Scholar]
Meribout, M.; Kumar Tiwari, V.; Pablo Peña Herrera, J.; Najeeb Mahfoudh Awadh Baobaid, A. Solar panel inspection techniques and prospects. Measurement 2023, 209, 112466. [Google Scholar] [CrossRef]
Li, J.; Da, F.; Yu, Y. PCBSSD: Self-supervised symmetry-aware detector for PCB displacement and orientation inspection. Measurement 2025, 243, 116342. [Google Scholar] [CrossRef]
Zhang, J.; Chen, L.; Ren, H.; Luximon, Y.; Li, P. Face2Wear: An automatic and user-friendly facewear personalization framework with 3D symmetry-aware face registration using RGB-D selfies. Comput.-Aided Des. 2025, 185, 103888. [Google Scholar] [CrossRef]
Naeem, U.; Chadda, K.; Vahaji, S.; Ahmad, J.; Li, X.; Asadi, E. Aerial Imaging-Based Soiling Detection System for Solar Photovoltaic Panel Cleanliness Inspection. Sensors 2025, 25, 738. [Google Scholar] [CrossRef]
Yang, X.; Li, Y.; Yang, L.; Zhang, Y.; Wang, X.; Zhang, Q. High-noise solar panel defect identification method based on the improved EfficientNet-V2. J. Renew. Sustain. Energy 2024, 16, 053704. [Google Scholar] [CrossRef]
Wang, Y.; Hou, T.; Zhang, X.; Shangguan, H.; Zhang, P.; Li, J.; Wei, B. Surface defect detection of solar cell based on similarity non-maximum suppression mechanism. Signal Image Video Process. 2023, 17, 2583–2593. [Google Scholar] [CrossRef]
Xie, X.; Liu, H.; Na, Z.; Luo, X.; Wang, D.; Leng, B. DPiT: Detecting Defects of Photovoltaic Solar Cells with Image Transformers. IEEE Access 2021, 9, 154292–154303. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. arXiv 2016, arXiv:1612.03144. [Google Scholar]
Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. arXiv 2018, arXiv:1803.01534. [Google Scholar]
Wei, H.; Liu, X.; Xu, S.; Dai, Z.; Dai, Y.; Xu, X. DWRSeg: Rethinking Efficient Acquisition of Multi-scale Contextual Information for Real-time Semantic Segmentation. arXiv 2022, arXiv:2212.01173. [Google Scholar]
Ding, X.; Zhang, Y.; Ge, Y.; Zhao, S.; Song, L.; Yue, X.; Shan, Y. UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition. arXiv 2023, arXiv:2311.15599. [Google Scholar]
Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]

Figure 1. Architecture of the baseline YOLOv8 model.

Figure 2. Architecture of the proposed DCE-YOLO model.

Figure 3. Comparison between module structures: (a) original C2f; (b) modified C2f-DWR-DRB.

Figure 4. The structure of the DWR module.

Figure 5. DRB module structures: (a) DRB module with two parallel layers; (b) DRB module with three parallel layers.

Figure 6. The structure of the DWR-DRB model.

Figure 7. The structure of the COT attention mechanism.

Figure 8. Lightweight Detect-Efficient head design.

Figure 9. IoU calculation between predicted and ground truth bounding boxes.

Figure 10. Confusion matrix of DCE-YOLO model.

Figure 11. Comparison of model mAP curves: (a) mAP@50; (b) mAP@50-95.

Figure 12. Visualization results: (a) YOLOv8n; (b) DCE-YOLO.

Table 1. Metrics variation with resolution scaling on solar panel defect detection.

Resolution (pixels)	Precision	Recall	mAP@50	mAP@50-95	FPS	Inference time (ms)
640 × 640	0.834	0.856	0.900	0.637	107.08	2.8
800 × 800	0.853	0.845	0.905	0.627	84.95	4.0
1280 × 1280	0.806	0.826	0.886	0.617	56.82	6.8

Table 2. Performance comparison between C2f-DWR and C2f-DWR-DRB modules.

Model	Weights (MB)	GFLOPs	Parameters
YOLOv8n-C2f_DWR	6.2	8.1	2,966,245
YOLOv8n-C2f_DWR_DRB	6.1	8.0	2,901,925

Table 3. Ablation results of Detect-Efficient module.

Model	reg_max	Precision	Recall	mAP@50	mAP@50-95	GFLOPs	Parameters	Inference time (ms)
YOLOv8n + COT + C2f_DWR_DRB	16	0.885	0.853	0.914	0.651	8.5	3,478,949	3.0
YOLOv8n + COT + C2f_DWR_DRB	20	0.882	0.877	0.910	0.667	9.1	3,612,729	3.2
YOLOv8n + COT + C2f_DWR_DRB + Detect-Efficient	20	0.887	0.879	0.918	0.675	7.2	3,627,353	2.6

Table 4. Comparative experimental results.

Model	Precision	Recall	mAP@50	F1
Faster R-CNN	0.5417	0.743	0.636	0.587
SSD	0.910	0.731	0.901	0.789
RT-DETR	0.880	0.891	0.911	0.885
YOLOv5n	0.859	0.887	0.905	0.873
YOLOv8n	0.834	0.856	0.900	0.877
YOLOv9t	0.828	0.889	0.905	0.857
YOLOv10n	0.881	0.854	0.898	0.876
YOLO11n	0.853	0.852	0.898	0.852
YOLO12n	0.919	0.779	0.892	0.843
DCE-YOLO	0.892	0.893	0.921	0.892
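
The F1 column in Table 4 can be related to the precision and recall columns with a short calculation. The sketch below assumes the standard harmonic-mean definition of F1, which this part of the article does not restate, and the function name `f1_score` is illustrative. Reported values need not be reproduced exactly if F1 was evaluated at a different confidence threshold than precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs transcribed from Table 4, given as fractions.
rows = {
    "RT-DETR": (0.880, 0.891),
    "DCE-YOLO": (0.892, 0.893),
}

# Round to three decimals to match the precision used in the table.
f1_by_model = {name: round(f1_score(p, r), 3) for name, (p, r) in rows.items()}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1:.3f}")
```

For these two rows the computed values agree with the F1 entries listed in the table.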

Table 5. Results of ablation experiments.

Model	C2f-DWR-DRB	COT	Detect-Efficient	WIoU	Precision	Recall	mAP@50	mAP@50-95	GFLOPs
YOLOv8n	×	×	×	×	0.834	0.856	0.900	0.637	8.1
Improved1	√	×	×	×	0.896	0.852	0.912	0.644	8.0
Improved2	√	√	×	×	0.885	0.853	0.914	0.651	8.5
Improved3	√	√	√	×	0.887	0.879	0.918	0.675	7.2
Improved4	√	√	√	√	0.892	0.893	0.921	0.686	7.2
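
The improvements quoted in the abstract (2.1 points on mAP@50 and 4.9 points on mAP@50-95) can be recovered from the first and last rows of Table 5. The following is a minimal sketch of that arithmetic; it treats the table entries as fractions, and the names `baseline`, `dce_yolo`, and `gains_in_points` are illustrative rather than taken from the article.

```python
# Metric values transcribed from Table 5 (fractions in the range 0 to 1).
baseline = {"precision": 0.834, "recall": 0.856, "mAP@50": 0.900, "mAP@50-95": 0.637}
dce_yolo = {"precision": 0.892, "recall": 0.893, "mAP@50": 0.921, "mAP@50-95": 0.686}


def gains_in_points(base: dict, improved: dict) -> dict:
    """Absolute gain per metric, expressed in percentage points."""
    return {name: round((improved[name] - base[name]) * 100, 1) for name in base}


gains = gains_in_points(baseline, dce_yolo)

for name, delta in gains.items():
    print(f"{name}: {delta:+.1f} points")
```

These are absolute differences in percentage points, not relative changes; a relative gain would divide each difference by the baseline value.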

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
