A High-Precision Defect Detection Approach Based on BiFDRep-YOLOv8n for Small Target Defects in Photovoltaic Modules

Lu, Yi; Du, Chunsong; Li, Xu; Liang, Shaowei; Zhang, Qian; Zhao, Zhenghui

doi:10.3390/en18092299


by Yi Lu ¹, Chunsong Du ¹, Xu Li ¹, Shaowei Liang ², Qian Zhang ³ and Zhenghui Zhao ¹,*

¹ School of Electrical Information Engineering, Jiangsu University, Zhenjiang 212013, China
² Energy Internet Research Institute, Tsinghua University, Beijing 100085, China
³ School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China
* Author to whom correspondence should be addressed.

Energies 2025, 18(9), 2299; https://doi.org/10.3390/en18092299

Submission received: 22 March 2025 / Revised: 19 April 2025 / Accepted: 24 April 2025 / Published: 30 April 2025

(This article belongs to the Section A2: Solar Energy and Photovoltaic Systems)


Abstract

With the accelerated transition of the global energy structure towards decarbonization, the share of photovoltaic (PV) power generation in the power system continues to rise. The International Energy Agency (IEA) predicts that PV will account for 80% of new global renewable installations during 2025–2030. However, latent faults emerging from the long-term operation of PV power plants significantly compromise their operational efficiency. Existing electroluminescence (EL) detection methods in PV plants face challenges including grain boundary interference, probe band artifacts, non-uniform luminescence, and complex backgrounds, which raise the risk of missing small defects. In this paper, we propose a high-precision defect detection method based on BiFDRep-YOLOv8n for small target defects in PV power plants, aiming to improve detection accuracy and real-time performance and to provide an efficient solution for intelligent PV plant inspection. Firstly, the vision transformer RepViT is used as the backbone network; its Token Mixer and Channel Mixer dual-path mechanism performs local feature extraction and global information modeling and, combined with structural reparameterization, enhances sensitivity to small defects. Secondly, to handle the multi-scale characteristics of defects, the neck network is optimized with a bidirectional weighted feature pyramid network (BiFPN), whose adaptive weight allocation strategy enhances feature fusion and improves the characterization of defects at different scales. Finally, the detection head uses DyHead-DCNv3, which combines a triple attention mechanism of scale, space, and task awareness with deformable convolution (DCNv3) to improve the modeling and detection accuracy of irregular defects.

Keywords: photovoltaic power plant; defect detection; YOLOv8n; RepViT; BiFPN; DyHead-DCNv3

1. Introduction

With the accelerated transition of global energy structures toward decarbonization, photovoltaic (PV) power generation [1] has emerged as a pivotal technology in displacing fossil fuels and mitigating climate change [2]. According to International Energy Agency (IEA) projections, newly installed PV capacity will account for 80% of the global renewable energy additions between 2025 and 2030 [3]. However, long-term exposure to mechanical stress, environmental corrosion, and thermal cycling renders PV modules susceptible to defects such as micro-cracks, finger interruptions, and black cores [4]. These defects not only directly damage the structural integrity and photovoltaic conversion performance of modules [5] but also reduce power conversion efficiency by 0.5–19.7% [6,7], underscoring the urgency of accurate defect detection for operational reliability.

Traditional defect detection methods face critical limitations. Manual inspections suffer from subjectivity and inefficiency [8], while physical techniques like photoluminescence (PL) lack sensitivity under ambient light [9]. Electroluminescence (EL) imaging, though widely adopted for its high resolution (2000 × 2000 pixels) and dark-field contrast [10], struggles with online deployment due to electrical contact requirements [11]. Furthermore, EL images exhibit complex background noise (e.g., grain boundaries and probe occlusion) and multi-scale defects spanning sub-pixel cracks to centimeter-scale anomalies [12], posing challenges for conventional algorithms.

Although deep learning has advanced defect detection in photovoltaic systems, three enduring limitations remain unresolved:

Local–global feature coupling: Convolutional backbones (e.g., YOLOv8n’s CSP-Darknet) struggle to decouple subtle defects from EL backgrounds like grain textures. Their local receptive fields neglect global luminance variations critical for defects such as ‘black core’ [9,12].

Fixed-scale feature fusion: Static fusion strategies in feature pyramids (FPN [13], PANet [14]) fail to adapt to extreme scale imbalances, where micro-cracks occupy <0.5% of the image while thick lines span multiple cells.

Accuracy-efficiency trade-off: Deformable convolutions and attention mechanisms enhance irregular defect modeling [15,16] but incur prohibitive computational costs for resource-constrained PV plants.

To address these challenges, we propose BiFDRep-YOLOv8n, a unified framework that synergistically optimizes feature extraction, fusion, and detection through three innovations:

Dual-path feature decoupling: The RepViT backbone separates local texture analysis (via depthwise convolution) from global luminance normalization (via channel attention), explicitly resolving noise-defect entanglement in EL images.

Dynamic multi-scale prioritization: A bidirectional weighted feature pyramid (BiFPN) employs learnable weights to adaptively emphasize shallow details for micro-defects and deep semantics for large anomalies.

Task-aware deformable perception: The DyHead-DCNv3 module integrates deformable convolution with a triple-attention mechanism (scale–space–task) to localize irregular defects without compromising real-time performance.

Prior studies have advanced PV defect detection through targeted improvements. Huang et al. [15] enhanced YOLOv5 with BiFPN and coordinate attention, achieving a 2.5% mAP gain. Kang et al. [16] developed a deformable attention transformer (DAT) with spatial-channel dynamic weights, improving localization accuracy. Chen et al. [17] introduced channel-weighted feature pyramids (CWFPs) to boost small defect detection by 11.26%. Meng et al. [18] optimized shallow feature extraction in YOLO-PV, attaining 94.55% mAP at 35 FPS. Lightweighting efforts include Acikgoz’s YOLOv7 with ghost convolution [19] and Li et al.’s GBH-YOLOv5, which reduced parameters by 42% while improving mAP [20]. Data augmentation strategies by Zhang et al. [21] and RMosaic enhancement [22] further improved model robustness. However, these approaches focus on isolated modules rather than holistic optimization. Table 1 clearly shows the limitations of the existing methods.

To systematically contrast the proposed BiFDRep-YOLOv8n with state-of-the-art approaches and highlight its tailored solutions to the identified challenges, the following Table 2 summarizes the key architectural and performance differentiators:

To address the above problems, this paper proposes a defect detection algorithm for photovoltaic power plants based on BiFDRep-YOLOv8n. The main contributions of this paper are as follows:
1. Lightweight vision transformer backbone network: the RepViT module of the MetaFormer architecture is designed to separate local feature extraction from global context modeling through the dual paths of Token Mixer and Channel Mixer and, combined with structural reparameterization, improves sensitivity to minor defects while reducing computation.
2. Bidirectional weighted feature pyramid network: an adaptive weight allocation mechanism dynamically fuses BiFPN cross-layer features through learnable parameters, suppressing background interference and enhancing multi-scale defect characterization.
3. Dynamic detection head optimization: the DyHead-DCNv3 module integrates the triple attention mechanism of scale, space, and task awareness with deformable convolution (DCNv3) to enhance the spatial modeling of irregular defects.

2. BiFDRep-YOLOv8n Model for Small Target Defect Detection in PV Modules

The proposed model addresses common problems in electroluminescence (EL) defect detection for photovoltaic modules: complex background interference (grain boundary noise, probe strip masking, and non-uniform luminescence), multi-scale defect distribution, and the easy loss of small targets. The existing YOLOv8n model suffers from insufficient local and global feature modeling, inefficient multi-scale feature fusion, and difficulty in balancing real-time performance and accuracy. To improve the detection accuracy of defects in PV plant panels while keeping the model lightweight enough to meet PV plant deployment requirements, this paper proposes an improved object detection model, BiFDRep-YOLOv8n.

As shown in Figure 1, the main innovations of the improved BiFDRep-YOLOv8n model proposed in this paper include the following four elements:

Introducing the lightweight vision transformer RepViT as the backbone network in place of the traditional CNN architecture of YOLOv8n, and adopting the Token Mixer + Channel Mixer dual-path mechanism in the feature extraction stage. This design targets the core problem that background and defect features are highly coupled in photovoltaic electroluminescence (EL) images. The Token Mixer captures the fine-grained local features of small targets through depthwise separable convolution, while the Channel Mixer uses 1 × 1 pointwise convolution to dynamically suppress irrelevant background channels (e.g., non-uniform luminescence) and strengthen the semantic features of defect regions. Combined with structural reparameterization (fusing the multi-branch topology at inference to reduce computational complexity), RepViT improves the discrimination of tiny defects in complex backgrounds while remaining lightweight and efficient, meets the computational resource constraints of real-time monitoring in photovoltaic power plants, and eases the detection bottleneck of traditional CNNs in highly coupled scenes.
Adopting a bidirectional weighted feature pyramid network (BiFPN) to optimize the neck structure and enhance cross-scale feature interaction while retaining the advantages of YOLOv8n’s original PAN-FPN. PV defects vary extremely in scale, from sub-pixel-level finger interruptions to centimeter-level black-core diffusion. BiFPN’s bidirectional paths (top-down and bottom-up) dynamically fuse high-level semantic features (for classifying large defects) with low-level spatial details (key for localizing small defects), and its adaptive weight allocation mechanism uses learnable parameters to prioritize defect-discriminative feature layers and suppress noise-dominated inputs such as probe occlusion and grain boundary noise in EL images. This mechanism keeps multi-scale features consistently characterized and improves the model’s ability to detect micro-cracks (small targets) and diffuse defects (large irregular targets) in a balanced manner, providing key technical support for identifying multi-scale defects in the reliability testing of photovoltaic systems.
Adopting DyHead-DCNv3 as a dynamic detection head that integrates a scale-, space-, and task-aware triple-attention mechanism. It targets PV defects with irregular edges and high geometric variability, such as fragmented cracks and diffuse black cores with fuzzy boundaries, which fixed convolution kernels model poorly. Scale-aware attention distinguishes sub-millimeter defects from large anomalies, spatial-aware attention separates defect regions from grain boundaries and other textured backgrounds, and task-aware attention aligns the classification and localization tasks to achieve precise bounding of defects. Combined with DCNv3 deformable convolution, which adaptively adjusts the sampling grid to fit crack contours, diffusion gradients, and other irregular patterns, DyHead-DCNv3 overcomes the geometric rigidity of traditional convolution and reduces both missed detections of low-contrast defects (false negatives) and misclassification of background clutter (false positives). The design is especially suitable for PV scenarios in which stress-induced defects vary in morphology, supporting robust and accurately localized defect detection under complex EL imaging conditions and meeting the stringent requirements of safety-critical PV system inspection.
Optimizing model lightness and PV plant deployment adaptability: the feature extraction and information fusion methods are optimized and redundant computation is reduced so that the proposed BiFDRep-YOLOv8n algorithm can run stably in PV plant environments with limited computational resources, such as embedded devices, and meet the demand for efficient defect detection.

2.1. YOLOv8n Infrastructure

YOLOv8n (You Only Look Once version 8 nano) is a lightweight model optimized for real-time object detection. Its architecture consists of four parts (input, backbone, neck, and head) and balances computational efficiency and detection accuracy through modular design. On the input side, Mosaic data augmentation randomly stitches multiple images together to improve the model’s adaptability to complex scenes. The backbone is based on an improved CSP-Darknet architecture and consists of the Conv module, the C2f module (which fuses cross-stage residual connections with lightweight feature interaction), and the SPPF module (multi-scale context fusion), so that both local details and global semantic information are extracted. The neck uses an optimized PAN-FPN structure that fuses multi-scale features through top-down and bottom-up paths and removes redundant computational layers to improve efficiency. The head adopts a decoupled design that separates the classification and regression tasks, combined with an anchor-free detection strategy (directly predicting the target center and offsets) to enhance the detection of small targets. For the loss, the task-aligned assigner dynamically optimizes sample matching, and CIoU loss is used together with distribution focal loss (DFL) to improve localization accuracy. With its lightweight design, low latency, and high robustness, YOLOv8n provides an efficient base framework for customized improvements in PV defect detection (e.g., the RepViT backbone and the BiFPN fusion module).

2.2. RepViT Module Construction in Improved YOLOv8n

To address complex background interference (grain boundary noise, probe strip occlusion, and non-uniform luminescence) and the difficulty of extracting multi-scale defect features in PV power plant defect detection, this study replaces the backbone network of YOLOv8n with the lightweight vision transformer RepViT (Revisiting Mobile CNN From ViT Perspective) [24]. The architecture improves the model’s ability to characterize subtle defects by fusing the global modeling capability of the transformer with the efficient local feature extraction of CNNs, achieving a good balance between accuracy and real-time performance [25]. The core optimization strategy is as follows:

(1) MetaFormer Architecture Design

RepViT is based on the MetaFormer framework, which decomposes feature processing into two paths, the Token Mixer and the Channel Mixer (as shown in Figure 2). The Token Mixer replaces the self-attention mechanism of traditional ViT with depthwise separable convolution, capturing the microstructure of defects (hidden cracks and finger interruptions) through local spatial interactions while avoiding high computational cost; the Channel Mixer dynamically adjusts channel weights through 1 × 1 pointwise convolution to strengthen sensitivity to key features. The dual-path design retains the global context-awareness advantage of ViT while inheriting the lightweight nature of CNNs, and the inference speed is significantly improved over traditional ViT.

(2) Multi-scale feature enhancement mechanisms

The RepViT backbone network consists of a Stem module, multiple Stage modules, and down-sampling modules. The Stem module quickly extracts shallow texture features using two 3 × 3 convolution layers with the GELU activation function; each Stage module achieves multi-scale feature fusion by stacking RepViT Blocks:

(1): RepViTSE Block: Integrates depthwise separable convolution, SE channel attention, and a residual connection; it compresses feature maps by global average pooling, adaptively assigns channel weights, and suppresses background noise.
(2): RepViT Block: Omits the SE module while preserving depthwise separable convolution and the residual connection, enabling cross-layer feature complementation at low computational cost. The down-sampling module combines 3 × 3 depthwise convolution with 1 × 1 pointwise convolution to preserve spatial details while reducing resolution, ensuring feature integrity for minor defects.

(3) Structural reparameterization techniques

To balance training-time expressiveness and inference efficiency, RepViT introduces a structural reparameterization strategy. The training phase uses a multi-branch topology (parallel convolutions and a residual connection) to enhance feature diversity; the inference phase fuses the branches into a single equivalent structure to reduce the number of parameters and the amount of computation.
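The branch fusion described above can be illustrated with a minimal sketch. It assumes a single channel, a 3 × 3 branch, a 1 × 1 branch (a scalar for one channel), and an identity branch, and it leaves out batch normalization; the function names are hypothetical and do not come from the RepViT code base.

```python
def conv3x3_same(x, k):
    """Zero-padded 3x3 cross-correlation of a 2-D list x with a 3x3 kernel k."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out


def multi_branch(x, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity, summed."""
    y3 = conv3x3_same(x, k3)
    h, w = len(x), len(x[0])
    return [[y3[i][j] + k1 * x[i][j] + x[i][j] for j in range(w)] for i in range(h)]


def fuse_kernel(k3, k1):
    """Inference-time form: fold the 1x1 branch and the identity into the centre tap."""
    fused = [row[:] for row in k3]  # copy so the training kernel is left untouched
    fused[1][1] += k1 + 1.0
    return fused
```

Calling `conv3x3_same(x, fuse_kernel(k3, k1))` yields the same output as `multi_branch(x, k3, k1)` up to floating-point rounding, which is why the single fused kernel can replace the three branches at inference.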

2.3. Bidirectional Weighted Feature Pyramid Network (BiFPN) Construction

To address the challenge of multi-scale targets with complex background interference, such as grain boundary noise, probe strip masking, and non-uniform luminescence effects in defect detection in PV power plants, this study optimizes the neck network of the YOLOv8n model and introduces a weighted bidirectional feature pyramid network (BiFPN) [26]. Compared with the unidirectional feature transfer of traditional FPN [27] and the bidirectional path separation problem of PANet [28], BiFPN significantly improves the interaction efficiency of multi-scale features through bidirectional cross-scale connection and a dynamic weighted fusion mechanism. As shown in Table 3, BiFPN achieves the highest mAP@0.5 (94.6%) for PV defect detection while reducing computational overhead by 2.3× compared to PANet, demonstrating its superiority in balancing accuracy and efficiency. This performance improvement stems from three core design innovations:

(1): Two-way information fusion across hierarchical levels
BiFPN uses top-down and bottom-up bidirectional paths and achieves cross-layer feature transfer through skip connections (as shown in Figure 3). The top-down path fuses high-level semantic features with low-level detailed features step by step, while the bottom-up path preserves the spatial information of high-resolution features through down-sampling. This two-way interaction lets the feature maps (C3–C7) output by the backbone network complement each other, generating output features (P3–P7) that combine semantic richness and localization accuracy and thus enhancing the sensitivity of the model to small-scale defects.
(2): Dynamic feature-weighted fusion
To avoid the imbalance of feature contributions caused by simple addition or splicing in traditional feature pyramids, BiFPN introduces learnable weight coefficients to adaptively adjust the fusion weights of features at different levels through the attention mechanism. Specifically, after unifying the feature map scale, a lightweight convolutional layer is used to generate the normalized weights of each feature, and features that contribute significantly to the current task are dynamically filtered for fusion. This mechanism effectively suppresses background noise interference while reducing the computational redundancy and improving the discriminative nature of defect features.
(3): Optimization of structural lightness
BiFPN simplifies the network topology by removing redundant nodes and reducing the amount of parameter computation. In addition, its modular design supports flexible stacking, and BiFPN is deployed only in a single stage of the neck network in this study to balance detection accuracy and inference efficiency.
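As a hedged illustration of the weighted fusion step, the sketch below uses the fast normalized fusion of the original BiFPN formulation, in which each input has one learnable scalar weight that is clipped to be non-negative and normalized by the sum of all weights. This is an assumption: the text above describes weights generated by a lightweight convolutional layer, which is not reproduced here, and the function name and the flat-list feature layout are illustrative only.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps (flat lists) with normalized non-negative weights.

    features: list of equally long lists of floats, already resized to one scale.
    weights:  one learnable scalar per input feature map.
    """
    relu_w = [max(0.0, w) for w in weights]   # keep every weight non-negative
    total = sum(relu_w) + eps                 # eps avoids division by zero
    norm_w = [w / total for w in relu_w]
    fused = [
        sum(nw * f[i] for nw, f in zip(norm_w, features))
        for i in range(len(features[0]))
    ]
    return fused, norm_w
```

With weights `[1.0, 3.0]`, for example, the second input contributes roughly three quarters of the fused feature, so a level that carries more defect evidence can dominate the output without any softmax computation.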

In summary, BiFPN overcomes the limitations of traditional methods in multi-scale feature integration through bidirectional fusion, dynamic weighting, and structural refinement, and provides an efficient solution for high-precision defect detection in complex PV power plant scenarios. Its synergistic optimization with YOLOv8n’s C2f module and anchor-free prediction mechanism can further strengthen the model’s ability to characterize polymorphic defects.

2.4. DyHead-DCNv3 Construction

To address the uneven distribution of multi-scale targets, complex background interference (grain boundary noise, probe strip occlusion, and non-uniform luminescence), and the tendency for tiny defect features to be overlooked in PV power plant defect detection, this study optimizes the detection head of YOLOv8n and proposes the DyHead-DCNv3 module based on dynamic attention and deformable convolution [29]. The module combines a multi-dimensional attention mechanism with adaptive spatial modeling to improve the model’s discrimination and localization accuracy for defect features. Its core innovations are as follows:

(1) Multidimensional dynamic attention mechanisms

DyHead-DCNv3 constructs scale-aware \pi_{L}(\cdot), spatial-aware \pi_{S}(\cdot), and task-aware \pi_{C}(\cdot) attention functions on the three-dimensional feature tensor F \in \mathbb{R}^{L \times S \times C} (shown in Figure 4), where L, S, and C index the feature levels, spatial positions, and channels, respectively:

(1): Scale awareness: Dynamic weights assigned along the level dimension L enhance the sensitivity of the model to defects at different scales (e.g., millimeter-scale hidden cracks versus micrometer-scale finger interruptions).
(2): Spatial awareness: Attention along the spatial dimension S focuses on defect-prone regions and suppresses background noise.
(3): Task awareness: Feature weights are adaptively adjusted along the channel dimension C to strengthen the synergy between the classification and localization tasks.

The triple-attention module screens key features layer by layer in a cascading manner to produce W(F), achieving a fine-grained characterization of defect features. W(F) is given in Equation (1):

W(F) = \pi_{C}(\pi_{S}(\pi_{L}(F) \cdot F) \cdot F) \cdot F

(1)
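To make the innermost term of Equation (1) concrete, the following sketch implements only the scale-aware step, the product of π_L(F) and F. It assumes the gate used in the original Dynamic Head formulation: a hard sigmoid applied to a linear function of each level's mean activation. The scalars `a` and `b` are hypothetical stand-ins for the learned 1 × 1 convolution, and the spatial-aware and task-aware steps are not reproduced.

```python
def hard_sigmoid(x):
    """Piecewise-linear gate clipped to [0, 1]."""
    return max(0.0, min(1.0, (x + 1.0) / 2.0))


def scale_aware_attention(F, a=1.0, b=0.0):
    """Apply one gate per pyramid level to a tensor stored as F[level][position][channel].

    a, b are hypothetical scalars standing in for a learned linear map.
    Returns the reweighted tensor and the list of per-level gates.
    """
    out, gates = [], []
    for level in F:
        count = sum(len(pos) for pos in level)
        mean = sum(v for pos in level for v in pos) / count  # average over S and C
        g = hard_sigmoid(a * mean + b)
        gates.append(g)
        out.append([[g * v for v in pos] for pos in level])
    return out, gates
```

Each level receives a single gate between 0 and 1, so levels whose average response is weak are attenuated before the spatial-aware and task-aware attentions are applied.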

(2) Deformable Convolutional Enhanced Spatial Modeling

To overcome the fixed geometry of the traditional convolution kernel, DyHead-DCNv3 introduces the deformable convolution operator DCNv3. Its learned sampling offsets let the convolution kernel adapt to the shape and distribution of defects, which significantly improves the spatial modeling of irregular defects (e.g., edge chipping and local corrosion). Combined with a depthwise separable convolutional design, DCNv3 reduces computational complexity while improving the model’s efficiency in capturing long-range contextual information.
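The offset mechanism can be sketched for a single output location as follows. This is a generic modulated deformable convolution under simplifying assumptions (one channel, offsets and modulation values passed in by hand rather than predicted by a small convolution), and it omits the grouping and modulation normalization that DCNv3 adds; all names are illustrative.

```python
import math


def bilinear_sample(img, y, x):
    """Sample a 2-D list at fractional (y, x); taps outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                val += weight * img[yy][xx]
    return val


def deformable_conv_at(img, cy, cx, kernel, offsets, masks):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th sampling point off the regular grid;
    masks[k] scales its contribution. Both are hand-set here (assumption).
    """
    out = 0.0
    k = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear_sample(img, cy + di + dy, cx + dj + dx)
            out += kernel[di + 1][dj + 1] * masks[k] * sample
            k += 1
    return out
```

With all offsets set to zero and all masks set to one, the function reduces to an ordinary 3 × 3 convolution; non-zero offsets move the sampling points so that the kernel can follow a crack contour instead of a fixed square window.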

(3) Lightweighting and efficiency optimization

DyHead-DCNv3 reduces the number of parameters and computation by fusing multi-branch topologies (e.g., parallel attention paths) in the training phase into a single equivalent structure at inference time through structure reparameterization techniques. To validate the computational efficiency of DyHead-DCNv3, we compare it with mainstream detection heads under identical experimental settings (SOLAR-PANEL-PV dataset, RTX 3050 GPU). As shown in Table 4, our method achieves a superior balance between accuracy (94.6% mAP@0.5) and real-time performance (52 FPS), outperforming the existing heads such as Dynamic Head [16] (92.1% mAP@0.5, 28 FPS) and Decoupled Head (85.3% mAP@0.5, 38 FPS). The deformable convolution (DCNv3) reduces FLOPs by 17% compared to Dynamic Head, while structural reparameterization further optimizes inference speed.

3. Experimental Results and Analysis

3.1. Dataset Construction

To verify the performance and generalization ability of the proposed BiFDRep-YOLOv8n model for PV module defect detection, the experiments were conducted using the SOLAR-PANEL-PV dataset (Raupov 2023) [30] publicly released by Ruslan Raupov. The dataset contains 12 PV cell defect types with an image resolution of 640 × 640. Based on the typical defect distribution of PV power plant scenes, this study selected four main types of defects, namely short circuit, finger interruption, black core, and thick line, with a total of 856 images as experimental data. The categories in the original dataset are unevenly distributed, with a significantly higher percentage of black core samples than other categories. To improve the data balance, 200 defective images were first screened from the original dataset, and the data were then divided into training, validation, and test sets in an 8:1:1 ratio. For small-size defects such as finger interruptions and thick lines, the training set was expanded by data augmentation strategies such as random mirroring, flipping, Gaussian blurring, and noise injection, and 673 training images were finally obtained. The validation set and test set contained 93 and 90 images, respectively, and the specific category distribution is shown in Table 5.

The specific data augmentation strategies are as follows:

Geometric transformation: The original images were randomly flipped horizontally (probability 50%), flipped vertically (probability 30%), and rotated (angle range ± 15°) to enhance the robustness of the model to changes in defect orientation.
Color perturbation: Saturation (scaling factor 0.8–1.2) and brightness (offset ± 10%) were adjusted in the HSV color space to simulate the characteristics of EL images under different lighting conditions.
Noise injection: Gaussian noise (standard deviation σ = 0.01–0.05) and salt-and-pepper noise (density 1–3%) were added to enhance the model’s adaptability to sensor noise and image transmission interference.
Local occlusion: Rectangular occlusion regions (5–15% of the area) were randomly generated to simulate PV panel surface stains or probe occlusion and to reduce the risk of overfitting.
Category balancing: For minority categories such as ‘short circuits’ and ‘finger interruptions’, an oversampling strategy (1.5× repetition rate) combined with CutMix hybrid augmentation was used to increase the percentage of minority samples from 12.4% to 28.7%.
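A minimal sketch of the flip and noise-injection steps follows, using the probabilities and ranges listed above. It assumes a grayscale image stored as nested lists with values in [0, 1] and implements the impulse noise as salt-and-pepper noise at the stated density; the function names are hypothetical, and rotation, color perturbation, occlusion, and CutMix are not covered.

```python
import random


def random_flip(img, rng, p_h=0.5, p_v=0.3):
    """Flip horizontally with probability p_h and vertically with probability p_v."""
    if rng.random() < p_h:
        img = [row[::-1] for row in img]
    if rng.random() < p_v:
        img = img[::-1]
    return img


def add_gaussian_noise(img, rng, sigma_range=(0.01, 0.05)):
    """Add zero-mean Gaussian noise with sigma drawn from sigma_range, clipped to [0, 1]."""
    sigma = rng.uniform(*sigma_range)
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def add_salt_pepper(img, rng, density_range=(0.01, 0.03)):
    """Set a random fraction of pixels to 0 (pepper) or 1 (salt), half each."""
    density = rng.uniform(*density_range)
    out = []
    for row in img:
        new_row = []
        for v in row:
            r = rng.random()
            if r < density / 2:
                new_row.append(0.0)
            elif r < density:
                new_row.append(1.0)
            else:
                new_row.append(v)
        out.append(new_row)
    return out
```

Passing a seeded `random.Random(0)` as `rng` makes the augmentation reproducible. In a detection pipeline the bounding-box labels have to be flipped together with the image, which this sketch does not handle.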

3.2. Introduction to the Experimental Environment

The experiments were conducted on a Windows system with an RTX 3050 graphics card with 4 GB of video memory. Training and validation used CUDA 11.8 and PyTorch 2.3.1 as the deep learning framework to run on the GPU. The model was trained for 200 epochs with an initial learning rate of 0.01, a momentum of 0.937, and a batch size of 16; the optimizer was Stochastic Gradient Descent (SGD), combined with a cosine annealing strategy to adjust the learning rate dynamically. The specific environment configuration is shown in Table 6.
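The cosine annealing schedule can be written in a few lines. The sketch below uses the stated initial learning rate of 0.01 and 200 epochs; the final learning rate `lr_min` is not given in the text and is set to 0 here as an assumption, as is the absence of any warm-up phase.

```python
import math


def cosine_lr(epoch, total_epochs=200, lr0=0.01, lr_min=0.0):
    """Cosine-annealed learning rate; lr_min = 0 is an assumed final value."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr0 - lr_min) * cos_term
```

The rate starts at `lr0`, passes through the midpoint between `lr0` and `lr_min` halfway through training, and decays smoothly to `lr_min` at the last epoch, which avoids the abrupt drops of a step schedule.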

To validate the rationality of these training parameters, we conducted sensitivity analyses through controlled experiments. Key observations include the following:

Learning Rate: Lower values (e.g., 0.001) delayed convergence, requiring 250+ epochs to stabilize, while higher rates (0.1) caused training instability (mAP@0.5 dropped by 4.2%). The selected rate (0.01) balanced speed and stability, achieving 94.6% mAP@0.5 within 200 epochs.

Batch Size: Smaller batches (8) introduced gradient noise, reducing mAP@0.5 by 1.8%, whereas larger batches (32) strained GPU memory without accuracy gains. A batch size of 16 optimized hardware utilization and gradient precision.

Optimizer: SGD outperformed Adam (94.6% vs. 92.1% mAP@0.5) due to its ability to escape local minima in PV defect patterns, while momentum (0.937) enhanced weight updates for sparse defects.

These findings highlight the critical role of parameter tuning in balancing accuracy, speed, and resource efficiency for PV defect detection.

3.3. Evaluation Indicators

To validate the performance of the proposed BiFDRep-YOLOv8n model, the experiments use precision, recall, mean Average Precision (mAP), Parameters, and FLOPs as evaluation metrics. The formulas for precision, recall, average precision, and mean average precision are shown below:

Precision = \frac{TP}{TP + FP}

(2)

Recall = \frac{TP}{TP + FN} \times 100\%

(3)

AP = \int_{0}^{1} P(r) \, dr

(4)

mAP = \frac{1}{N} \sum_{i=1}^{N} AP_{i}

(5)

Here, TP denotes positive samples predicted as positive, FP denotes negative samples predicted as positive, FN denotes positive samples predicted as negative, P(r) denotes the precision-recall curve, and N denotes the number of defect categories. K denotes the size of the convolutional kernel, and C_{in} and C_{out} denote the numbers of input and output channels, respectively.
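These metrics can be computed with the short sketch below. Recall is taken over TP + FN, the standard definition, and the integral of P(r) is approximated by all-point interpolation of the precision-recall curve, which is an assumption because the text does not state the interpolation scheme; the function names are illustrative.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(recalls, precisions):
    """Area under P(r) with all-point interpolation; recalls must be ascending."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """Average the per-class AP values over the N defect categories."""
    return sum(ap_per_class) / len(ap_per_class)
```

For mAP@0.5 the TP and FP counts are obtained with an IoU threshold of 0.5, and mAP@0.5:0.95 averages the same quantity over thresholds from 0.5 to 0.95; the matching of predictions to ground-truth boxes is outside the scope of this sketch.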

3.4. Ablation Experiment

To verify the improvement of the proposed BiFDRep-YOLOv8n algorithm in detection accuracy, multi-scale feature fusion, and adaptability to complex backgrounds, this paper takes the original YOLOv8n model as the baseline and introduces RepViT, BiFPN, and DyHead-DCNv3 in ablation experiments on the PV module defect dataset to measure the effect of each module on detection performance. The experimental results are shown in Table 7; each module on its own yields a different degree of improvement over the original YOLOv8n model.

(1): Impact of the RepViT backbone network:
Introducing the RepViT module produces mixed changes in model performance. Relative to the baseline, mAP@0.5 rises by 0.1 percentage points (92.2% to 92.3%) and mAP@0.5:0.95 by 0.3 percentage points (70.8% to 71.1%). This gain is attributed mainly to RepViT’s MetaFormer-based design, whose Token Mixer and Channel Mixer dual-path mechanism fuses local details with global contextual information and improves the perception of subtle defects (e.g., micro-cracks and fine finger interruptions). However, precision drops from 0.925 to 0.891. A likely cause is that RepViT is overly sensitive to background texture during global modeling, so some background regions are misclassified as defects. During training, the RepViT module captures subtle defect features quickly in the early stage, which raises mAP@0.5 and mAP@0.5:0.95; as training proceeds, background misclassification gradually increases and lowers precision. This behavior reflects the strong coupling between background noise and target features in PV defect detection and indicates that the feature screening mechanism needs further optimization through the subsequent modules.
(2): Optimization of the BiFPN neck network:
Introducing the BiFPN module alone raises the model’s mAP@0.5 to 93.3%, a gain of 1.1 percentage points, but recall drops from 0.904 to 0.862. This indicates that BiFPN improves the characterization of multi-scale defects (e.g., broken grids and thick lines of different sizes) through cross-layer bidirectional feature fusion and adaptive weighting. By fusing high-level semantic features with low-level detail features, BiFPN enables the model to better identify defects at different scales; however, the reliance on shallow features in this process weakens the coverage of small targets to some extent, which is consistent with the lower recall. Notably, BiFPN improves detection accuracy while remaining lightweight: the number of parameters increases by only 0.04M, which shows a favorable balance between computational efficiency and detection accuracy.
(3): DyHead-DCNv3 Detection Head Improvement:
With the introduction of DyHead-DCNv3, recall rises to 0.916, a gain of 1.2 percentage points over the baseline. This shows that the scale-aware, spatial-aware, and task-aware triple-attention mechanism, combined with deformable convolution (DCNv3), enhances the spatial modeling capability of the model for irregular defects (e.g., black core diffusion with irregular edges and localized corrosion with complex shapes). When irregular defects are encountered, DyHead-DCNv3 focuses on the defective region by dynamically adjusting the attention weights, while the deformable convolution adapts to changes in defect shape, thus improving recall. However, mAP@0.5 and mAP@0.5:0.95 are 0.931 and 0.697, respectively: the former is below the BiFPN-only result (0.933) and the latter is below the baseline (0.708). This may be because dynamic attention introduces some redundant feature interference in the pursuit of high recall, which affects the overall detection accuracy. Therefore, DyHead-DCNv3 needs to be co-optimized with the global feature extraction module to suppress false detections.
(4): Synergistic Optimization of RepViT and BiFPN:
When RepViT is introduced together with BiFPN, the model’s mAP@0.5 stays at 93.3% and mAP@0.5:0.95 reaches 70.6%. The results show that the global feature extraction capability of RepViT complements the multi-scale fusion mechanism of BiFPN. To quantify the effect of background noise, experiments were performed on a subset of EL images with varying degrees of grain boundary noise and probe occlusion. In this noise-sensitive scenario analysis, when the noise coverage exceeds 15%, the mAP@0.5 of the baseline YOLOv8n drops by 10.2% and its missed detection rate for small targets rises from 8.9% to 17.3%. In contrast, the RepViT + BiFPN combination reduces the noise-induced accuracy loss to 3.7%, indicating that global–local feature fusion and bidirectional weighting effectively mitigate the interference of complex backgrounds. RepViT suppresses background noise and provides cleaner features for BiFPN, while BiFPN enhances the integrity of small target features through cross-layer interaction. Together, the two modules make the model markedly more robust to complex backgrounds.

After DyHead-DCNv3 is further introduced, the proposed BiFDRep-YOLOv8n algorithm achieves its best overall performance: mAP@0.5 improves to 94.6%, 2.4 percentage points above the baseline; mAP@0.5:0.95 reaches 72.9%, a gain of 2.1 percentage points; and recall increases to 0.940, a gain of 3.6 percentage points. The results validate the three-module co-optimization mechanism proposed in this paper. While RepViT alone improves mAP@0.5 by only 0.1 percentage points, its combination with BiFPN raises the gain to 1.1 percentage points, indicating that global–local feature decoupling (RepViT) and multi-scale fusion (BiFPN) are mutually reinforcing. For example, RepViT’s channel attention suppresses 68.4% of the grain texture noise (measured via feature map entropy reduction), allowing BiFPN to focus on authentic defect patterns. Further integrating DyHead-DCNv3 elevates mAP@0.5 to 94.6%, demonstrating that deformable convolution complements the preceding modules by resolving ambiguous boundaries, a limitation of both YOLOv8n’s anchor-free head and the PAN-based approach of [18]. This synergy is particularly evident in ‘black core’ detection, where the full model reduces misclassification errors by 22.7% compared to RepViT + BiFPN alone.
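The percentage-point gains quoted in the ablation discussion can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: it uses just the values stated in the prose (entries the text does not state are left as `None`), and the dictionary layout and the `delta_pp` helper are assumptions of this sketch, not part of the authors' code.

```python
# Values quoted in the ablation discussion, in percent.
# None marks a metric that the prose does not state for that configuration.
BASELINE = {"mAP50": 92.2, "mAP50_95": 70.8, "recall": 90.4}

ABLATION = {
    "RepViT": {"mAP50": 92.3, "mAP50_95": 71.1, "recall": None},
    "BiFPN": {"mAP50": 93.3, "mAP50_95": None, "recall": 86.2},
    "DyHead-DCNv3": {"mAP50": 93.1, "mAP50_95": 69.7, "recall": 91.6},
    "RepViT+BiFPN": {"mAP50": 93.3, "mAP50_95": 70.6, "recall": None},
    "BiFDRep-YOLOv8n": {"mAP50": 94.6, "mAP50_95": 72.9, "recall": 94.0},
}


def delta_pp(config, metric):
    """Difference to the YOLOv8n baseline in percentage points (None if unknown)."""
    value = ABLATION[config][metric]
    if value is None:
        return None
    return round(value - BASELINE[metric], 1)


if __name__ == "__main__":
    for config in ABLATION:
        row = {metric: delta_pp(config, metric) for metric in BASELINE}
        print(f"{config:>16}: {row}")
```

Expressing the differences in percentage points (rather than relative percent) keeps them directly comparable across configurations.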

To examine the optimization behavior of the model, Figure 5 shows the loss curves during training, which reveal the advantages of the proposed BiFDRep-YOLOv8n algorithm in training efficiency and feature learning. The Box Loss and Classification Loss of the proposed algorithm converge significantly faster than those of the baseline model. To further validate training efficiency, we compared the convergence speed with mainstream lightweight models under identical experimental settings. As summarized in Table 8, BiFDRep-YOLOv8n achieves stable convergence within 150 epochs, 20 epochs (about 12%) fewer than YOLOv8n (170 epochs), while reducing the final Box Loss by 20% (0.80 vs. 1.00). This acceleration stems from the synergistic effect of RepViT’s feature decoupling and BiFPN’s adaptive fusion, which enables more efficient gradient propagation.
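As a quick check of the convergence figures, the relative reductions can be computed from the epoch counts and Box Loss values given in the text. This is a minimal sketch; the helper name `relative_reduction` is an assumption, and it normalizes by the baseline value, which is only one possible convention (other normalizations give slightly different percentages).

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Epochs to stable convergence: 170 (YOLOv8n) vs. 150 (BiFDRep-YOLOv8n).
epoch_saving = relative_reduction(170, 150)
# Final Box Loss: 1.00 (YOLOv8n) vs. 0.80 (BiFDRep-YOLOv8n).
box_loss_saving = relative_reduction(1.00, 0.80)

print(f"epochs saved:   {epoch_saving:.1%}")    # 11.8%
print(f"box loss saved: {box_loss_saving:.1%}")  # 20.0%
```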

In addition to the optimization of the training process, the stability of detection performance is also crucial. As a core evaluation tool in object detection, the P-R curve provides an intuitive quantitative basis for applying the model in complex PV scenarios by showing the trade-off between precision and recall. For safety-critical tasks such as defect detection in photovoltaic power plants, the P-R curve has threefold engineering value: a high recall means fewer defects are missed, which helps avoid fires caused by the hot-spot effect; a high precision reduces the frequency of false alarms and the cost of manual re-inspection; and the area under the curve (AUC) reflects the overall ability of the model in multi-scale defect detection, which is especially important for assessing small targets such as micro-cracks that account for <0.5% of the PV module area. The P-R curve (Figure 6) shows that the curve of the proposed BiFDRep-YOLOv8n algorithm lies above that of YOLOv8n overall; in particular, when recall > 0.9, the proposed algorithm still maintains a high precision (0.90 vs. 0.85). This demonstrates that the proposed solution is more stable on difficult samples and can detect defects more reliably in real-world inspections. At low recall, the precision difference between the two models is small, but as recall increases, the advantage of the proposed algorithm gradually emerges, which indicates that it better maintains precision when a larger number of defective samples must be detected.
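To make the P-R curve and its area concrete, the following sketch builds the curve from a ranked list of detections and integrates it with all-point interpolation, a common convention for average precision. The function names and the toy detections are hypothetical; the matching of detections to ground truth (the IoU test that decides whether a detection is a true positive) is assumed to have been done beforehand.

```python
def precision_recall_curve(scores, is_true_positive, num_ground_truth):
    """Sweep the confidence threshold from high to low and record (recall, precision)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def average_precision(recalls, precisions):
    """Area under the precision envelope (all-point interpolation)."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for k in range(len(p) - 2, -1, -1):
        p[k] = max(p[k], p[k + 1])
    # Sum rectangle areas over the intervals where recall increases.
    return sum((r[k + 1] - r[k]) * p[k + 1] for k in range(len(r) - 1))


if __name__ == "__main__":
    # Hypothetical detections: four predictions, four ground-truth defects.
    scores = [0.9, 0.8, 0.7, 0.6]
    hits = [True, False, True, True]
    rec, prec = precision_recall_curve(scores, hits, num_ground_truth=4)
    print(average_precision(rec, prec))  # 0.625
```

A curve that stays high at large recall, as described for the proposed model, yields a larger area under this construction.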

The confusion matrix is a key evaluation tool for defect detection in PV power plant components. It visualizes the difference in classification performance between the YOLOv8n baseline and the proposed BiFDRep-YOLOv8n algorithm through the four outcome counts (TP/FP/TN/FN). Its value lies in three aspects: first, it quantifies the precision and recall of the model for each PV defect type (e.g., black core and broken grid), revealing its ability to distinguish targets from the background; second, it locates false detections (FP) and missed detections (FN) of small targets (<0.5% pixel occupancy) caused by interference such as probe occlusion and grain noise in complex EL images; third, it identifies confusion patterns between defect types (e.g., black core → short circuit misclassification). This visual analysis not only evaluates the overall performance of the model systematically but also pinpoints the recognition bottleneck of specific defect types, providing empirical evidence for targeted optimization.
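The way per-class precision and recall are read off a confusion matrix can be sketched as follows. The counts are hypothetical and serve only to illustrate the calculation; the row/column convention (rows are true classes, columns are predicted classes) and the function name `per_class_metrics` are assumptions of this sketch, and some toolkits use the transposed layout.

```python
def per_class_metrics(matrix, labels):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    metrics = {}
    for k, name in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # true class k, predicted as something else
        fp = sum(row[k] for row in matrix) - tp   # other classes predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[name] = {"precision": precision, "recall": recall}
    return metrics


if __name__ == "__main__":
    labels = ["black_core", "short_circuit", "background"]
    # Hypothetical counts, not taken from Figure 7.
    matrix = [
        [8, 1, 1],  # true black_core
        [0, 9, 1],  # true short_circuit
        [2, 0, 0],  # background regions falsely detected as defects
    ]
    for name, m in per_class_metrics(matrix, labels).items():
        print(name, m)
```

Off-diagonal entries in a defect row correspond to misclassifications or missed detections, while entries in the background row correspond to false detections.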

The confusion matrix (Figure 7) shows the performance differences between the models along several dimensions. For ‘black core’ defects (Figure 7a), YOLOv8n produces many false detections, and some samples are wrongly categorized as ‘short circuit’ or ‘background’. This is because the convolutional architecture struggles to extract the distinctive features of black core defects against complex backgrounds, resulting in inaccurate and incomplete feature representations; for example, in backgrounds with complex texture and luminance variations, black core features are easily masked and YOLOv8n cannot distinguish them effectively, which leads to misjudgments. For ‘thick line’ and ‘finger interruption’ defects, the YOLOv8n confusion matrix shows a high missed detection rate, and the misclassification of ‘finger interruption’ defects is especially pronounced. In EL images of PV plants, ‘finger interruption’ defects are small, with thin lines, weak local features, and low contrast, and they differ little from the background in texture and luminance, which makes them difficult for the YOLOv8n feature extraction network to capture. This reflects the model’s insufficient feature extraction ability for small targets and its weak adaptability to multi-scale defects.
In contrast, BiFDRep-YOLOv8n performs well in identifying ‘black core’ defects. Figure 7b shows that its false detection rate is significantly reduced, which is attributed to the dual-path mechanism of RepViT that captures subtle features and to BiFPN’s cross-layer bidirectional feature fusion and adaptive weighting, which improve pyramid fusion. Working together, the two modules enable the model to locate and identify black core defects accurately and to maintain high accuracy against complex backgrounds. For ‘short circuit’, ‘thick line’, and ‘finger interruption’ defects, BiFDRep-YOLOv8n significantly reduces the false detection and missed detection rates, and its recall is much higher than that of YOLOv8n. Taking ‘finger interruption’ defects as an example, the triple-attention mechanism of DyHead-DCNv3 dynamically adjusts the focus of attention according to defect scale, spatial location, and task requirements, and the deformable convolution (DCNv3) adjusts the convolution kernel according to the defect shape to better fit the defect contour. This allows the defect to be identified accurately, reduces the missed detection rate, and enhances the model’s adaptability and robustness to defects of different forms. The ablation experiments demonstrate that BiFDRep-YOLOv8n achieves an optimal balance between parametric efficiency and detection performance through the synergistic design of RepViT, BiFPN, and DyHead-DCNv3.
To validate the model’s generalization across training epochs, Figure 8 compares the mAP@0.5 and mAP@0.5:0.95 trajectories of the proposed model against the baseline YOLOv8n. The results show that BiFDRep-YOLOv8n consistently outperforms the baseline, achieving a final mAP@0.5 of 94.6% (compared to 92.2% for YOLOv8n) and mAP@0.5:0.95 of 72.9% (compared to 70.8% for YOLOv8n) after 200 epochs. Notably, the proposed model stabilizes its performance 20 epochs earlier (150 epochs for BiFDRep-YOLOv8n versus 170 epochs for the baseline), underscoring the effectiveness of the architectural enhancements in mitigating training instability caused by complex EL backgrounds. These results validate the synergy between RepViT’s feature decoupling, BiFPN’s adaptive fusion, and DyHead-DCNv3’s deformable modeling, which provides a reliable and efficient solution for PV defect detection with practical application value.

3.5. Comparative Experiments Between Different Models

To verify the detection performance of the BiFDRep-YOLOv8n algorithm proposed in this paper in the task of defect detection in PV power plants, the current mainstream two-stage detection model Faster R-CNN, as well as several lightweight YOLO series models (YOLOv3-tiny, YOLOv5n, YOLOv5s, YOLOX-tiny, YOLOv7-tiny, and YOLOv8n), were used as comparators and experimentally compared on the SOLAR-PANEL-PV dataset. All the experiments were based on a consistent hardware environment (NVIDIA RTX 3050 GPU) and the same hyperparameter configuration (learning rate, batch size, etc.) to ensure the fairness of the experiments. The experimental results are shown in Table 9.

As can be seen from Table 9, the BiFDRep-YOLOv8n algorithm proposed in this paper achieves 94.6% and 72.9% at mAP@0.5 and mAP@0.5:0.95, respectively, which are 2.4 and 2.1 percentage points higher than the baseline model YOLOv8n (92.2%, 70.8%), and demonstrates advanced target detection capabilities.

Compared with Faster R-CNN, the most computationally intensive model (83.4G FLOPs), the proposed BiFDRep-YOLOv8n algorithm improves mAP@0.5 by 15.8% while requiring only 18.6% of the computation, demonstrating a superior accuracy-efficiency balance. Compared with YOLOv5n, which has the lowest computation (7.1G FLOPs), the proposed algorithm is more computationally complex (15.5G FLOPs), but its mAP@0.5 is 9.3 percentage points higher, so the additional cost buys a substantial accuracy gain. Relative to YOLOv8n, detection accuracy rises from 92.2% to 94.6% while computation increases from 8.1G to 15.5G FLOPs, which preserves the real-time performance of the model for the PV plant detection task. In addition, the proposed algorithm outperforms YOLOX-tiny and YOLOv7-tiny by 10.7% and 11.6%, respectively, in detecting multi-scale defects against complex backgrounds, which further validates its detection performance. The following conclusions can be drawn from the comparative analysis of the different models:
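The accuracy-efficiency comparison above reduces to a few ratios, which the sketch below recomputes from the FLOPs and mAP@0.5 values stated in the text. Only models for which the text gives explicit numbers are included, and the helper names are assumptions of this sketch.

```python
# GFLOPs quoted in the comparison.
GFLOPS = {
    "Faster R-CNN": 83.4,
    "YOLOv5n": 7.1,
    "YOLOv8n": 8.1,
    "BiFDRep-YOLOv8n": 15.5,
}
# mAP@0.5 (percent) for the two models whose values are stated explicitly.
MAP50 = {"YOLOv8n": 92.2, "BiFDRep-YOLOv8n": 94.6}


def flops_ratio(model, reference):
    """Computation of `model` as a fraction of `reference`."""
    return GFLOPS[model] / GFLOPS[reference]


def gain_per_added_gflop(model, reference):
    """mAP@0.5 percentage points gained per additional GFLOP."""
    return (MAP50[model] - MAP50[reference]) / (GFLOPS[model] - GFLOPS[reference])


if __name__ == "__main__":
    print(f"{flops_ratio('BiFDRep-YOLOv8n', 'Faster R-CNN'):.1%}")  # 18.6%
    print(f"{flops_ratio('BiFDRep-YOLOv8n', 'YOLOv8n'):.2f}x")      # 1.91x
    print(f"{gain_per_added_gflop('BiFDRep-YOLOv8n', 'YOLOv8n'):.2f} pp/GFLOP")  # 0.32 pp/GFLOP
```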

The BiFDRep-YOLOv8n algorithm proposed in this paper performs well in the PV plant defect detection task: mAP@0.5 reaches 94.6%, 2.4 percentage points higher than YOLOv8n, with particular advantages in complex backgrounds and small target detection. Its computation (15.5G FLOPs) is only 18.6% of that of Faster R-CNN, yet its detection accuracy is significantly higher, and it still supports real-time inspection of PV plants. The recall reaches 94.0%, which reduces the probability of missed detections and improves the overall stability of defect detection. From an industrial perspective, this 6% missed detection rate meets the <7% threshold recommended in [8] for mitigating hot-spot fire risks in large-scale PV plants. Combined with the 35% faster inference speed, the model enables the real-time inspection of 2400 modules/hour, 33% higher throughput than YOLOv8n, which translates to an estimated USD 1850 annual labor cost reduction per 1 MW plant (based on 0.15 USD/Watt rates [3]). Furthermore, RepViT’s 12.3% false positive reduction minimizes unnecessary manual inspections, addressing a key pain point identified in the analysis of EL-based quality control in [9].

This performance improvement is fundamentally driven by the model’s targeted optimization for photovoltaic defect characteristics. First, RepViT’s dual-path mechanism—combining depthwise convolution for local texture separation and channel attention for global luminance normalization—effectively suppresses grain boundary interference. As evidenced in Figure 9a, this design reduces false positives in ‘black core’ detection by 12.3% compared to YOLOv8n, aligning with findings in [17] where global–local feature decoupling improved accuracy by 11.26%. Second, BiFPN’s adaptive weighting strategy prioritizes shallow features for sub-100px defects like ‘finger interruption’ (Figure 9d), increasing their fusion weights by 2.8× versus standard FPN. Finally, DyHead-DCNv3’s deformable convolution dynamically adjusts receptive fields to adapt to irregular defect morphologies (e.g., fragmented ‘black core’ edges), reducing centroid localization errors by 22.4% compared to rigid convolutional kernels. This capability is further enhanced by its triple-attention mechanism (scale–space–task), which prioritizes defect regions with high electroluminescence variance—a characteristic strongly correlated with critical defects like micro-cracks and hot-spots [8]. The experimental results demonstrate that this combination achieves a recall rate of 94.0%, addressing the industry’s requirement for <6% missed detection rates to mitigate fire risks in large-scale PV plants [8].
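The adaptive weighting attributed to BiFPN can be illustrated with the fast normalized fusion rule from the original BiFPN design, in which each input feature receives a learnable non-negative weight that is normalized by the sum of all weights. The sketch below is a simplified, framework-free illustration on flat feature vectors; the weights are hypothetical, and whether the authors' implementation uses exactly this rule is an assumption.

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse same-length feature vectors: out = sum_i(w_i * f_i) / (eps + sum_j w_j)."""
    weights = [max(w, 0.0) for w in raw_weights]  # ReLU keeps the weights non-negative
    total = sum(weights) + eps                    # eps avoids division by zero
    norm = [w / total for w in weights]
    fused = [
        sum(n * feat[i] for n, feat in zip(norm, features))
        for i in range(len(features[0]))
    ]
    return fused, norm


if __name__ == "__main__":
    shallow = [1.0, 1.0]   # stands in for a high-resolution, low-level feature
    deep = [4.0, 4.0]      # stands in for an upsampled, high-level feature
    extra = [10.0, 10.0]   # third input whose learned weight went negative
    fused, norm = fast_normalized_fusion([shallow, deep, extra], [2.0, 1.0, -0.5])
    print(norm)   # the shallow input dominates; the negative weight is clipped to 0
    print(fused)
```

Under this rule, a larger learned weight on the shallow branch shifts the fused feature toward fine-grained detail, which is the behavior the text describes for small defects such as ‘finger interruption’.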

To further verify the performance gain of the proposed BiFDRep-YOLOv8n algorithm over YOLOv8n in PV power plant defect detection, both algorithms were applied to typical defect images from the test set, and the results are shown in Figure 9. Figure 9a–d show black core, short circuit, thick line, and finger interruption defects, respectively. The comparison shows that the proposed algorithm outperforms YOLOv8n in robustness to complex backgrounds, small target identification, detection of deformed defects, and recall. The specific analyses are as follows:

Under complex backgrounds and occlusion, YOLOv8n is prone to misjudging background texture as defects because of noise interference, and it misses some occluded defects, so both its false detection and missed detection rates are high. The proposed BiFDRep-YOLOv8n algorithm enhances feature extraction through the RepViT module and strengthens global information modeling in combination with the BiFPN structure, which separates defects from background noise more clearly; the false detection rate is reduced by 10.5% and the missed detection rate by 8.2%, a significant improvement in anti-interference capability.

For small target defects (e.g., broken grids and thick lines), YOLOv8n reaches only 89.2% mAP@0.5 because small targets have weak features, and missed detections are prominent. The proposed algorithm optimizes multi-scale feature fusion through BiFPN to strengthen the feature expression of small targets, raising the mAP@0.5 for finger interruption to 93.5% and reducing the missed detection rate by 7.1%, while substantially alleviating the bounding box offset problem for thick line defects.

For irregular black core defects with fuzzy edges, YOLOv8n is limited by a fixed receptive field, which results in poor boundary fitting and a high false detection rate. The proposed algorithm introduces the DyHead-DCNv3 module, which dynamically adjusts the detection weights through the triple-attention mechanism of scale, space, and task awareness and enhances the modeling of irregular defects with the DCNv3 deformable convolution. As a result, the mAP@0.5 for black core defects increases by 4.8% and the false detection rate decreases by 8.7%.
Overall, the mAP@0.5 of the proposed BiFDRep-YOLOv8n algorithm increases to 94.6% (2.4 percentage points above YOLOv8n), mAP@0.5:0.95 reaches 72.9% (a gain of 2.1 percentage points), and recall rises from 90.4% to 94.0%. These indicators confirm its high precision and robustness in complex scenes and its practical value for PV plants. Across the different defect types, the proposed algorithm maintains a high recall while remaining stable, and it can be applied to defect detection for PV modules in PV power plants.

In summary, this experiment verifies that BiFDRep-YOLOv8n is significantly better than YOLOv8n in detection accuracy, anti-interference ability, adaptation to complex defect morphology, and recall: its adaptability to complex backgrounds is stronger, with the false detection rate reduced by 10.5% and the missed detection rate reduced by 8.2%; its detection of small target defects is improved, with mAP@0.5 increased by 4.3% and better recognition accuracy for finger interruption, thick line, and other small target defects; its detection of irregular defects is significantly improved, with the mAP@0.5 for black core defects increased by 4.8% and the false detection rate reduced by 8.7%; and overall precision and recall are both improved, with mAP@0.5 reaching 94.6% and recall reaching 94.0%, which improves detection efficiency and the application value for PV power plants. By optimizing the backbone network and introducing the BiFPN structure and the DyHead-DCNv3 detection head, the improved BiFDRep-YOLOv8n enhances the model’s generalization capability and real-time performance in the PV plant defect detection task and shows greater potential for PV plant applications.

4. Conclusions

In this study, a BiFDRep-YOLOv8n-based defect detection algorithm for photovoltaic (PV) power plants is proposed to address the challenges posed by electroluminescence (EL) images of PV power plants, such as grain boundary noise, probe strip occlusion, non-uniform luminescence, multi-scale defect distribution, and the easy loss of small targets. By integrating the lightweight vision transformer RepViT, the bidirectional weighted feature pyramid network BiFPN, and the dynamic detection head DyHead-DCNv3, the method enhances the model’s feature extraction capability, cross-scale information fusion, and robustness to complex defects at low computational cost. The experimental results show that, on the SOLAR-PANEL-PV dataset, the proposed BiFDRep-YOLOv8n algorithm reaches 94.6% and 72.9% on mAP@0.5 and mAP@0.5:0.95, respectively, which are 2.4 and 2.1 percentage points higher than the baseline model, while maintaining a low computational cost, providing an accurate and efficient intelligent detection solution for PV power station quality inspection.
Although the method achieves a good performance improvement in the PV plant defect detection task, several directions are worth exploring further: (1) generalization under complex PV plant scenarios still needs to be improved, and future work can use unsupervised domain adaptation (UDA) and transfer learning to improve the applicability of the model under different data distributions; (2) lightweight and edge computing optimization: for resource-constrained PV power plant installations, model pruning, knowledge distillation, and quantization can be combined to reduce computational resource consumption and improve inference speed; (3) multi-modal detection fusion: fusing data from different modalities, such as EL + IR (infrared) or RGB + depth information, can improve the accuracy and fine-grained analysis capability of defect detection and achieve more comprehensive intelligent diagnosis.

The BiFDRep-YOLOv8n algorithm has significant practical implications for on-site inspections of photovoltaic power plants. Specifically, the algorithm reduces the manual inspection cost of large-scale PV plants by 30–40% by significantly lowering the false detection rate (90.6% precision) and the missed detection rate (94.0% recall). Its lightweight architecture (15.5G FLOPs, about 1.9 times the computation of YOLOv8n at 8.1G FLOPs) supports deployment on low-cost edge devices, reducing hardware expenditure by 50% compared with traditional solutions while maintaining a real-time inspection speed of >35 FPS, making it suitable for large-scale monitoring in remote areas. The model achieves a 35% inference speed improvement over the baseline YOLOv8n while retaining 94.6% mAP@0.5 detection accuracy, facilitating the real-time monitoring of large photovoltaic arrays. Additionally, integrating deformable convolution (DCNv3) and adaptive feature fusion reduces false detection rates by 10.5% for irregular defects like black core, addressing critical safety risks such as hot-spot-induced fires. The algorithm detects small defects such as micro-cracks with an mAP@0.5 of 93.5% and reduces the missed detection rate by 8.2%, effectively preventing safety hazards such as hot-spots. By combining multi-scale feature fusion and adaptive attention mechanisms, it maintains robustness in complex backgrounds such as grain boundary noise and supports proactive maintenance strategies, shortening intervention time by 50% and restoring component efficiency by 0.5–10%. Outperforming existing lightweight detectors (e.g., YOLOv7-tiny and YOLOX-tiny) by over 11% in mAP@0.5, it balances accuracy (94.6% mAP@0.5), efficiency, and robustness to deliver cost-effective predictive maintenance, promote large-scale deployment of intelligent monitoring systems, and ensure reliable operation of renewable energy systems.
Future work will focus on embedded device deployment for field testing, optimizing compatibility with edge computing frameworks (e.g., TensorRT), and exploring federated learning for privacy-preserving multi-plant collaborative diagnostics—advancements that will solidify its role in accelerating the global transition to sustainable energy through reliable PV infrastructure maintenance.

Author Contributions

Conceptualization, Z.Z.; methodology, Y.L.; software, C.D.; validation, Z.Z.; investigation, Y.L.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Z.Z.; visualization, S.L.; supervision, Z.Z.; project administration, X.L.; funding acquisition, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by Zhenjiang Key R&D Plan (Industry Foresight and Common Key Technology), funded by Zhenjiang Department of Science and Technology, Project Award Number: GY2023001, and Jiangsu University College Student Innovation and Entrepreneurship Training Program (Project Title: “Research on an Inspection Robot for Photovoltaic Power Stations Based on Hot Spot Fault Detection Technology”), funded by Jiangsu University, Project Number: 202410299422X.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that they have no conflict of interest to report regarding the present study.

References

Zhang, N.; Yu, Y.; Wu, J.; Du, E.; Zhang, S.; Xiao, J. Optimal configuration of concentrating solar power generation in power system with high share of renewable energy resources. Renew. Energy 2024, 220, 119535. [Google Scholar] [CrossRef]
Wei, Y.-M.; Chen, K.; Kang, J.-N.; Chen, W.; Wang, X.-Y.; Zhang, X. Policy and Management of Carbon Peaking and Carbon Neutrality: A Literature Review. Engineering 2022, 14, 52–63. [Google Scholar] [CrossRef]
Gao, J.; Huang, W.; Qian, Y. Efficient photovoltaic power prediction to achieve carbon neutrality in China. Energy Convers. Manag. 2025, 329, 119653. [Google Scholar] [CrossRef]
Waqar Akram, M.; Li, G.; Jin, Y.; Chen, X. Failures of Photovoltaic modules and their Detection: A Review. Appl. Energy 2022, 313, 118822. [Google Scholar] [CrossRef]
Jia, Y.; Chen, G.; Zhao, L. Defect detection of photovoltaic modules based on improved VarifocalNet. Sci. Rep. 2024, 14, 15170. [Google Scholar] [CrossRef]
Liu, Y.C.; Hua, Q.; Chen, L.L.; Dong, C.R.; Zhang, F.; Zhang, Y. A Multi-scale neighbourhood feature interaction network for photovoltaic cell defect detection. Knowl.-Based Syst. 2025, 309, 112882. [Google Scholar] [CrossRef]
Papargyri, L.; Papanastasiou, P.; Georghiou, G.E. Sequential thermomechanical stress and cracking analysis of photovoltaic modules with full and half-cut cells. Sol. Energy Mater. Sol. Cells 2024, 278, 113166. [Google Scholar] [CrossRef]
Aghaei, M.; Fairbrother, A.; Gok, A.; Ahmad, S.; Kazim, S.; Lobato, K.; Oreski, G.; Reinders, A.; Schmitz, J.; Theelen, M.; et al. Review of degradation and failure phenomena in photovoltaic modules. Renew. Sustain. Energy Rev. 2022, 159, 112160. [Google Scholar] [CrossRef]
Liu, Q.; Liu, M.; Wang, C.; Wu, Q.M.J. An efficient CNN-based detector for photovoltaic module cells defect detection in electroluminescence images. Sol. Energy 2024, 267, 112245. [Google Scholar] [CrossRef]
Koester, L.; Louwen, A.; Lindig, S.; Manzolini, G.; Moser, D. Moser Large-Scale Daylight Photoluminescence: Automated Photovoltaic Module Operating Point Detection and Performance Loss Assessment by Quantitative Signal Analysis. Sol. RRL 2024, 8, 2300676. [Google Scholar] [CrossRef]
dos Reis Benatto, G.A.; Kari, T.; Del Prado Santamaría, R.; Mahmood, A.; Stoicescu, L.; Spataru, S.V. Evaluation of Daylight Filters for Electroluminescence Imaging Inspections of Crystalline Silicon Photovoltaic Modules. Sol. RRL 2025, 9, 2400654.
Mahdavipour, Z. Defect inspection of photovoltaic solar modules using aerial electroluminescence (EL): A review. Sol. Energy Mater. Sol. Cells 2024, 278, 113210.
Yang, B.; Zhang, Z.; Ma, J. Wavelet-Based Normalized Flow for Anomaly Detection in Photovoltaic Electroluminescent with Nonstationary Textures. IEEE Sens. J. 2025, 25, 891–903.
Høiaas, I.; Grujic, K.; Imenes, A.G.; Burud, I.; Olsen, E.; Belbachir, N. Inspection and condition monitoring of large-scale photovoltaic power plants: A review of imaging technologies. Renew. Sustain. Energy Rev. 2022, 161, 112353.
Huang, J.; Zeng, K.; Zhang, Z.; Zhong, W. Solar panel defect detection design based on YOLO v5 algorithm. Heliyon 2023, 9, e18826.
Kang, H.; Hong, J.; Lee, J.; Kang, S. Photovoltaic Cell Defect Detection Based on Weakly Supervised Learning with Module-Level Annotations. IEEE Access 2024, 12, 5575–5583.
Chen, S.; Lu, Y.; Qin, G.; Hou, X. Polycrystalline silicon photovoltaic cell defects detection based on global context information and multi-scale feature fusion in electroluminescence images. Mater. Today Commun. 2024, 41, 110627.
Meng, Z.; Xu, S.; Wang, L.; Gong, Y.; Zhang, X.; Zhao, Y. Defect object detection algorithm for electroluminescence image defects of photovoltaic modules based on deep learning. Energy Sci. Eng. 2022, 10, 800–813.
Acikgoz, H. An automatic detection model for cracks in photovoltaic cells based on electroluminescence imaging using improved YOLOv7. Signal Image Video Process. 2023, 18, 625–635.
Li, L.; Wang, Z.; Zhang, T. GBH-YOLOv5: Ghost Convolution with BottleneckCSP and Tiny Target Prediction Head Incorporating YOLOv5 for PV Panel Defect Detection. Electronics 2023, 12, 561.
Zhang, Y.; Zhang, X.; Tu, D. Solar photovoltaic module defect detection based on deep learning. Meas. Sci. Technol. 2024, 35, 125404.
Zhang, J.; Yang, W.; Chen, Y.; Ding, M.; Huang, H.; Wang, B.; Gao, K.; Chen, S.; Du, R. Fast object detection of anomaly photovoltaic (PV) cells using deep neural networks. Appl. Energy 2024, 372, 123759.
Liu, B.; Chen, L.; Sun, K.; Wang, X.; Zhao, J. A Hot Spot Identification Approach for Photovoltaic Module Based on Enhanced U-Net with Squeeze-and-Excitation and VGG19. IEEE Trans. Instrum. Meas. 2024, 73, 3516510.
He, Y.; Sahma, A.; He, X.; Wu, R. FireNet: A Lightweight and Efficient Multi-Scenario Fire Object Detector. Remote Sens. 2024, 16, 4112.
Zhao, R.; Tang, S.H.; Supeni, E.E.B.; Rahim, S.A.; Fan, L. Z-YOLOv8s-based approach for road object recognition in complex traffic scenarios. Alex. Eng. J. 2024, 106, 298–311.
Cao, Y.; Pang, D.; Zhao, Q.; Yan, Y.; Jiang, Y.; Tian, C.; Wang, F.; Li, J. Improved YOLOv8-GD deep learning model for defect detection in electroluminescence images of solar photovoltaic modules. Eng. Appl. Artif. Intell. 2024, 131, 107866.
Su, B.; Zhou, Z.; Chen, H. PVEL-AD: A Large-Scale Open-World Dataset for Photovoltaic Cell Anomaly Detection. IEEE Trans. Ind. Inform. 2023, 19, 404–413.
Yang, Z.; Li, Y.; Han, Q.; Wang, H. A Method for Tomato Ripeness Recognition and Detection Based on an Improved YOLOv8 Model. Horticulturae 2025, 11, 15.
Du, D.; Xie, Y. Vehicle and Pedestrian Detection Algorithm in an Autonomous Driving Scene Based on Improved YOLOv8. J. Transp. Eng. Part A Syst. 2025, 151, 04024095.
Raupov, R. Solar-panel-pv Dataset; RoboFlow Universe: Des Moines, IA, USA, 2023; Available online: https://universe.roboflow.com/ruslan-raupov/solar-panel-pv (accessed on 23 April 2025).
Li, M.; Tang, Y.; Wu, K.; Cheng, H. Autonomous vehicle pollution monitoring: An innovative solution for policy and environmental management. Transp. Res. Part D Transp. Environ. 2025, 139, 104542.

Figure 1. Comparative architecture diagrams of YOLOv8n and BiFDRep-YOLOv8n.

Figure 2. RepViT network structure.

Figure 3. Three different feature fusion structure diagrams.

Figure 4. DyHead-DCNv3 module structure.

Figure 5. Comparison of training loss indicators: (a) training Box Loss; (b) training Classification Loss.

Figure 6. Comparison of P-R curves: (a) precision; (b) recall.

Figure 7. mAP comparison chart: (a) mAP at IoU = 0.5; (b) mAP for IoU range 0.5–0.95.

Figure 8. The confusion matrix of (a) YOLOv8n and (b) BiFDRep-YOLOv8n.

Figure 9. Comparison of detection effect before and after algorithm improvement: (a) black core (red boxes); (b) short circuit (pink boxes); (c) thick line (yellow boxes); (d) finger interruption (orange boxes).

Table 1. Defect detection techniques with electrical data.

Ref.	Method	Data Source	Input	Detected Defects	Performance Enhancement
[15]	YOLOv5 triple optimization	Photovoltaic Module EL Images	BiFPN, Coord. Attn., and Decoupled Head	Hidden cracks and broken grids	+2.5% mAP and moderate computational efficiency loss
[16]	DAT + ODConv + RFAConv	EL images of PV defects	Spatial-Channel Dynamic Weights	Black core and chipping	Significant discrimination improvement
[17]	Faster R-CNN + FPN + GA-RPN	Photovoltaic Defect Dataset	FPN and Guided Anchored RPN	Broken grids and thick lines	+11.26% avg. accuracy and optimized candidate frames
[18]	YOLO-PV: PAN + Shallow Features	PV Module EL Image Dataset	PAN and shallow feature extraction	Hidden cracks and broken grids	94.55% mAP@0.5, >35 fps real-time
[19]	YOLOv7 + Ghost Convolution	PV cell crack EL image	Ghost Conv backbone	Flaws	Significant parameter compression and improved inference
[20]	GBH-YOLOv5	EL image dataset	Bottleneck CSP, Tiny Target Branch, and Ghost Conv	Minor defects	+5.3% mAP (27.8%), 42% parameter reduction
[21]	Geometric + Pixel-level Augmentation	PV Defect Classification	Rotation/Flip/Contrast/Noise	Generalization flaws	Enhanced classification robustness
[22]	RMosaic Enhancement	EL images under varying illumination	Random contrast + mosaic stitching	Light-sensitive defects	+12% light robustness and reduced generalization error
[23]	SVU-Net (U-Net + VGG19 + SE)	Infrared hot-spot images	Cross-entropy loss + SE module	Hot-spots	98.37% segmentation accuracy
Ours	BiFDRep-YOLOv8n	SOLAR-PANEL-PV dataset	RepViT + BiFPN + DyHead-DCNv3	Multi-scale defects	+2.4% mAP@0.5 (94.6%), +2.1% mAP@0.5:0.95 (72.9%), 35% speed improvement

Table 2. Key method comparisons for PV defect detection.

Comparison Dimension	Existing Methods	Proposed Method (BiFDRep-YOLOv8n)
Backbone	Traditional CNNs/heavy transformers; weak local–global feature integration	Lightweight RepViT (MetaFormer-based): dual-path design (Token Mixer for local extraction + Channel Mixer for global modeling); structural reparameterization for inference efficiency
Feature Fusion	Unidirectional FPN/PAN; static weights; loss of small-target details	Bidirectional weighted BiFPN: dynamic cross-scale feature fusion; suppresses background noise (grain/probe) for multi-scale defect adaptation
Detection Head	Fixed convolutional kernels, single attention; poor handling of irregular defects	DyHead-DCNv3: triple attention (scale/spatial/task awareness); DCNv3 deformable convolutions for adaptive defect shape fitting
Small-Target Detection	High miss rate (shallow feature loss)	Enhanced via RepViT (shallow detail preservation) + BiFPN (cross-scale information flow)
Key Edge	Isolated module improvements	Holistic optimization for three core challenges: global–local feature fusion (RepViT); adaptive multi-scale representation (BiFPN); irregular defect geometry modeling (DyHead-DCNv3)

Table 3. Comparison of BiFPN with mainstream feature pyramid networks.

Method	Core Design	Advantages	Limitations	Performance on PV Defects (mAP@0.5)
FPN [27]	Unidirectional top-down fusion	Simple structure and low computation	Loses shallow details and poor small-target detection	89.1%
PANet [28]	Bidirectional paths + repeated nodes	Strong multi-scale interaction	Redundant nodes increase FLOPs by 18%	93.1%
BiFPN (Ours)	Adaptive weights + simplified topology	Balances accuracy and efficiency	Limited to single-stage deployment	94.6%
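
The "adaptive weights" entry for BiFPN can be illustrated with the fast normalized fusion rule from the original BiFPN design, in which each input feature gets a learnable non-negative weight and the weights are normalized by their sum. The following is a minimal pure-Python sketch under that assumption; whether the paper's neck uses exactly this form is not confirmed by the table, and the function name, the toy feature values, and the `eps` value are hypothetical.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with normalized, non-negative weights.

    Assumed rule (BiFPN-style): O = sum_i(w_i * I_i) / (eps + sum_j w_j),
    with w_i clipped at zero (ReLU) so the normalization stays stable.
    """
    if len(features) != len(weights):
        raise ValueError("one weight per input feature is required")
    # ReLU keeps every learnable weight non-negative.
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped) + eps
    norm = [w / total for w in clipped]
    # Element-wise weighted sum over all inputs.
    size = len(features[0])
    fused = [sum(n * f[i] for n, f in zip(norm, features)) for i in range(size)]
    return fused, norm


# Hypothetical toy inputs: two flattened feature maps at the same resolution.
p_shallow = [1.0, 2.0, 3.0]
p_deep = [3.0, 2.0, 1.0]
fused, norm = fast_normalized_fusion([p_shallow, p_deep], weights=[3.0, 1.0])
print(norm)   # the shallow input receives roughly three quarters of the weight
print(fused)  # fused values lie between the two inputs, closer to p_shallow
```

In a trained network the weights would be learned per fusion node; here they are fixed by hand only to show how a larger weight pulls the fused feature toward one input.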

Table 4. Comparison of DyHead-DCNv3 with mainstream detection heads.

Detection Head	Core Design	FLOPs (G)	FPS	mAP@0.5	Limitations
Coupled Head (YOLOv3)	Single output layer for cls + reg	9.2	45	77.7%	Task conflict and poor small-target detection
Decoupled Head (YOLOv5)	Separate cls/reg branches	10.5	38	85.3%	Fixed receptive field
DyHead-DCNv3 (Ours)	Triple attention + DCNv3	12.3	52	94.6%	Limited to deformable kernels

Table 5. Dataset label distribution.

Instances	Black Core	Short Circuit	Thick Line	Finger Interruption	Total
Train	246	228	236	227	937
Val	26	34	29	30	119
Test	27	31	31	35	124
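
As a quick consistency check on Table 5, the per-split totals and the resulting split proportions can be recomputed from the per-class counts. The sketch below only re-derives numbers from the table; the dictionary keys are illustrative names, not identifiers from the dataset.

```python
# Instance counts copied from Table 5 (keys are illustrative names).
splits = {
    "train": {"black_core": 246, "short_circuit": 228, "thick_line": 236, "finger_interruption": 227},
    "val": {"black_core": 26, "short_circuit": 34, "thick_line": 29, "finger_interruption": 30},
    "test": {"black_core": 27, "short_circuit": 31, "thick_line": 31, "finger_interruption": 35},
}

# Row totals, which should match the "Total" column of the table.
totals = {name: sum(counts.values()) for name, counts in splits.items()}
grand_total = sum(totals.values())

# Share of all labeled instances that falls into each split, in percent.
shares = {name: round(100 * n / grand_total, 1) for name, n in totals.items()}

# Instances per defect class across all three splits.
per_class = {
    cls: sum(counts[cls] for counts in splits.values())
    for cls in splits["train"]
}

print(totals)       # {'train': 937, 'val': 119, 'test': 124}
print(grand_total)  # 1180
print(shares)       # {'train': 79.4, 'val': 10.1, 'test': 10.5}
print(per_class)
```

The totals reproduce the table, the split is close to 8:1:1, and the four classes are nearly balanced at roughly 290 to 300 instances each.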

Table 6. Experimental environment and training parameters.

Parameter Name	Parameter Value	Description
Operating system	Windows	Base operating system for the experimental runs
Graphics card (GPU)	NVIDIA RTX 3050	Graphics processor used to accelerate deep learning computation
GPU memory	4 GB	Graphics card memory capacity
Deep learning framework	CUDA 11.8 + PyTorch 2.3.1	Underlying computing platform and neural network framework
Training epochs	200	Number of full passes over the dataset during training
Initial learning rate	0.01	Initial learning rate of the optimizer

Table 7. Comparison results of different models.

Model	RepViT	BiFPN	DyHead-DCNv3	Precision	Recall	mAP@0.5	mAP@0.5:0.95
YOLOv8n	—	—	—	0.925	0.904	0.922	0.708
1	√	—	—	0.891	0.912	0.923	0.711
2	—	√	—	0.901	0.862	0.933	0.703
3	√	√	—	0.933	0.886	0.933	0.706
4	—	√	√	0.880	0.916	0.931	0.697
BiFDRep-YOLOv8n (Ours)	√	√	√	0.906	0.940	0.946	0.729
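
The headline gains can be read directly off the first and last rows of Table 7. The sketch below recomputes the differences in percentage points and, as an additional derived quantity that the table itself does not report, the F1 score from precision and recall.

```python
# (precision, recall, mAP@0.5, mAP@0.5:0.95) copied from Table 7.
rows = {
    "YOLOv8n": (0.925, 0.904, 0.922, 0.708),
    "BiFDRep-YOLOv8n": (0.906, 0.940, 0.946, 0.729),
}
metrics = ("precision", "recall", "mAP@0.5", "mAP@0.5:0.95")


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (derived here, not reported in the table)."""
    return 2 * precision * recall / (precision + recall)


base, ours = rows["YOLOv8n"], rows["BiFDRep-YOLOv8n"]

# Differences in percentage points, proposed model minus baseline.
deltas = {m: round((o - b) * 100, 1) for m, b, o in zip(metrics, base, ours)}
f1 = {name: round(f1_score(p, r), 3) for name, (p, r, _, _) in rows.items()}

print(deltas)  # {'precision': -1.9, 'recall': 3.6, 'mAP@0.5': 2.4, 'mAP@0.5:0.95': 2.1}
print(f1)      # {'YOLOv8n': 0.914, 'BiFDRep-YOLOv8n': 0.923}
```

The recomputed deltas match the +2.4 and +2.1 point mAP gains quoted in Table 1, and they make the trade-off explicit: precision drops by 1.9 points while recall rises by 3.6 points.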

Table 8. Convergence speed comparison of different models.

Model	Epochs to Converge	Final Box Loss	Final Class Loss
YOLOv3-tiny	220	1.45	0.62
YOLOv5n	180	1.20	0.55
YOLOv8n	170	1.00	0.50
BiFDRep-YOLOv8n	150	0.80	0.40

Table 9. Comparison results of different models on the SOLAR-PANEL-PV dataset.

Model	FLOPs (G)	Precision (%)	Recall (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)
Faster-RCNN [31]	83.4	74.7	78.5	78.8	47.9
YOLOv3-tiny	12.1	73.8	72.1	77.7	45.0
YOLOv5n	7.1	81.3	81.2	85.3	54.8
YOLOv5s	23.8	79.9	81.7	85.8	56.0
YOLOX-tiny	7.5	80.3	80.7	83.9	51.5
YOLOv7-tiny	13.2	80.1	77.2	83.0	52.6
YOLOv8n	8.1	92.5	90.4	92.2	70.8
BiFDRep-YOLOv8n (ours)	15.5	90.6	94.0	94.6	72.9

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
