An Improved YOLOv8n Method for Small Thermal Defect Detection of Photovoltaic Modules in UAV Infrared Inspection

He, Tengfei; Mao, Zhongyuan; Zhong, Yuanchang

doi:10.3390/rs18121986

by Tengfei He, Zhongyuan Mao and Yuanchang Zhong *

School of Electronic Engineering, Chongqing University, Chongqing 400044, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(12), 1986; https://doi.org/10.3390/rs18121986 (registering DOI)

Submission received: 9 May 2026 / Revised: 4 June 2026 / Accepted: 9 June 2026 / Published: 15 June 2026

Highlights

What are the main findings?

A lightweight task-specific YOLOv8n-based detector is proposed for small thermal defect detection of photovoltaic modules in UAV infrared inspection.
The proposed method improves detection accuracy and localization quality while maintaining only 1.03 M parameters, a 2.4 MB model size, and real-time inference performance.

What are the implications of the main findings?

The study demonstrates that task-oriented network redesign is effective for detecting small, weak, and boundary-ambiguous thermal defects in UAV infrared photovoltaic inspection.
The proposed method offers a lightweight and practical solution for real-time photovoltaic module inspection under complex infrared backgrounds.

Abstract

To address missed detections, false alarms, and deployment limitations in thermal defect detection of photovoltaic modules from unmanned aerial vehicle (UAV) infrared images, this paper proposes an improved detection method based on You Only Look Once version 8 nano (YOLOv8n). The proposed method is optimized according to the characteristics of UAV infrared photovoltaic inspection, including small thermal targets, weak and diffuse thermal responses, complex backgrounds, and lightweight deployment requirements. Specifically, a P2 shallow feature layer is introduced to enhance fine-grained feature perception for small thermal defects, while Ghost Convolution (GhostConv) is incorporated into the backbone to reduce model complexity. In addition, C2f-Large Separable Kernel Attention (C2f-LSKA) is embedded in the neck to strengthen contextual and spatial feature modeling under complex infrared backgrounds, and Wise-IoU version 3 (WIoUv3) is adopted to improve bounding box regression and localization stability for boundary-ambiguous thermal anomalies. Experiments are conducted on a self-constructed UAV infrared thermal imaging dataset. From nearly 10,000 inspection images, 3000 representative images are selected and manually annotated, covering typical challenges such as small hot spots, low-contrast defects, complex background interference, and diffuse abnormal temperature-rise regions. Compared with the baseline YOLOv8n, the proposed method improves Precision, Recall, mean average precision at an IoU threshold of 0.5 (mAP@0.5), and mean average precision averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95) by 5.1, 11.4, 9.6, and 13.2 percentage points, respectively, while reducing the number of parameters and model size by 65.8% and 61.9%, respectively. These results indicate that the proposed method improves detection accuracy and localization quality under the evaluated UAV infrared inspection setting while maintaining lightweight characteristics.

Keywords:

UAV infrared inspection; photovoltaic modules; thermal defect detection; YOLOv8n; lightweight object detection

1. Introduction

In recent years, driven by the continued advancement of the carbon peaking and carbon neutrality goals as well as the rapid growth of the renewable energy industry, the installed capacity of photovoltaic (PV) power generation has increased substantially. Meanwhile, PV power stations are evolving toward larger-scale, more centralized, and increasingly intelligent operation [1,2,3]. As the core components of PV power generation systems, PV modules are inevitably subjected to manufacturing imperfections, environmental degradation, thermal cycling, partial shading, and electrical mismatch during long-term operation, which may give rise to thermal defects such as local overheating, hot spots, and abnormal temperature rise [4]. These defects not only degrade the power generation efficiency of PV modules and accelerate their performance deterioration, but may also trigger encapsulation aging, local burn damage, and even severe fire hazards [5]. Therefore, the development of rapid, accurate, and non-contact approaches for detecting thermal defects in operating PV modules has become a critical issue in the intelligent operation and maintenance of PV power stations [6].

Conventional inspection of PV power stations mainly relies on manual examination assisted by handheld thermal imagers. However, such an approach typically suffers from low inspection efficiency, high labor intensity, strong dependence on operator experience, and limited coverage, making it difficult to satisfy the practical requirements of high-frequency and high-precision inspection in large-scale PV power stations [7]. In contrast, unmanned aerial vehicle (UAV) platforms offer significant advantages, including high mobility, wide inspection coverage, flexible deployment, and superior operational efficiency. When integrated with infrared thermography, UAVs enable long-range and non-contact inspection of large-scale PV arrays under non-shutdown conditions, allowing rapid acquisition of surface temperature field information from PV modules and thereby facilitating effective identification of hot spots, locally abnormal heating regions, and potentially faulty modules. Therefore, thermal defect detection methods for PV modules in UAV-based infrared inspection scenarios have emerged as an important research direction in the intelligent operation and maintenance of PV power stations [8,9,10].

In recent years, deep learning-based object detection algorithms have been increasingly applied to photovoltaic thermal defect detection. Among them, You Only Look Once (YOLO)-based methods have attracted increasing attention owing to their favorable balance between detection accuracy and inference efficiency [11,12,13]. For example, Hong et al. introduced an infrared image enhancement strategy and an improved lightweight YOLOv8n model to improve the recognition accuracy of solar panel defects in infrared images [14]. However, such methods mainly emphasize image enhancement and lightweight detection, while the joint treatment of small thermal targets, blurred boundaries, and complex UAV inspection backgrounds remains insufficient. Xie et al. proposed ST-YOLO based on YOLOv8s for photovoltaic module defect detection using infrared thermal imaging, improving feature extraction through structural modifications [15]. Nevertheless, the repeated downsampling and generic feature-fusion process in YOLO-based detectors may still weaken fine-grained thermal cues that are critical for small hot spots and locally abnormal temperature-rise regions. More recently, Ma et al. developed LFS-YOLO for PV panel defect detection using UAV infrared sensors, considering practical issues such as defect morphology variation, unclear boundary features, and small-target defects [16]. Although these studies have promoted the development of automatic PV thermal defect detection, existing methods still face challenges in simultaneously preserving shallow small-target features, suppressing false alarms caused by repetitive PV array backgrounds and thermal noise, improving localization stability for boundary-ambiguous thermal anomalies, and maintaining model compactness for UAV inspection scenarios.

In addition, prompt learning has recently shown potential in remote sensing interpretation tasks, including prompt-guided instance segmentation and change captioning. These studies indicate that prompt-based mechanisms can improve task adaptation and semantic guidance in complex remote sensing scenes. However, their application to UAV-based infrared PV thermal defect detection remains relatively underexplored, especially under lightweight deployment constraints [17,18]. Therefore, it is of considerable theoretical significance and practical value to investigate a task-oriented lightweight detection framework for UAV-based infrared photovoltaic inspection [19,20,21].

To address the above challenges, this paper proposes an improved YOLOv8n-based detection algorithm for thermal defect detection of PV modules in UAV-based infrared inspection scenarios. The proposed method is intended to enhance the feature representation capability for small thermal defect targets, strengthen feature extraction and discrimination under complex backgrounds, and reduce model complexity while maintaining high detection performance. To this end, targeted optimizations are conducted at both the network architecture and loss function levels, thereby achieving a coordinated improvement in detection accuracy and inference efficiency. Experimental results demonstrate that the proposed method effectively improves the detection performance of infrared thermal defects and can provide reliable technical support for the intelligent and automated operation and maintenance of PV power stations.

The main contribution of this work lies in developing a task-specific lightweight detection framework for UAV-based infrared thermal defect inspection of photovoltaic modules. Different from directly applying a generic object detector to infrared PV images, the proposed framework is optimized according to the coupled characteristics of this task, including small-scale thermal defects, weak and diffuse thermal responses, repetitive PV array backgrounds, and deployment constraints on UAV inspection platforms. Specifically, the detection scale configuration, lightweight feature extraction path, contextual feature-fusion module, and bounding-box regression loss are jointly adapted to improve small-target perception, background robustness, localization stability, and model compactness. In addition, an expanded UAV infrared PV thermal defect dataset with representative challenging cases is constructed, and the effectiveness of the proposed framework is validated through ablation studies, comparisons with recent lightweight detectors, stricter mAP@0.5:0.95 evaluation, repeated-run statistics, and side-by-side qualitative comparisons.

The main contributions of this paper are summarized as follows:

(1) A task-specific lightweight detection framework is developed for UAV-based infrared thermal defect inspection of photovoltaic modules. The framework is designed to jointly address small-target thermal defect perception, recognition robustness under repetitive PV array backgrounds, localization stability for boundary-ambiguous anomalies, and lightweight deployment requirements.

(2) A multi-scale detection structure is redesigned for small thermal defects by introducing a P2 shallow feature detection branch and removing the redundant large-target branch under the present UAV inspection setting. This design enhances fine-grained thermal feature perception while reducing unnecessary computational overhead for the dominant defect scales in the constructed dataset.

(3) A lightweight and context-enhanced feature representation strategy is constructed by incorporating Ghost Convolution (GhostConv) into the backbone and C2f-Large Separable Kernel Attention (C2f-LSKA) into the neck feature-fusion stage. This design reduces model complexity while strengthening contextual and spatial feature discrimination under complex infrared backgrounds.

(4) A more rigorous experimental validation protocol is established by expanding the annotated UAV infrared PV thermal defect dataset to 3000 images, covering representative challenging cases such as small hot spots, low-contrast defects, complex background interference, and diffuse abnormal temperature-rise regions. Extensive ablation experiments, comparisons with recent lightweight detectors, mAP@0.5:0.95 evaluation, repeated-run statistics, and side-by-side qualitative results are provided to validate the effectiveness and stability of the proposed method.

2. YOLOv8 Network Architecture

YOLOv8 represents the latest advancement in the YOLO series of object detection algorithms. Building upon the strengths of previous versions, it further improves detection performance and inference speed through a series of structural optimizations and functional enhancements. To accommodate diverse application requirements, YOLOv8 provides five model variants, namely N, S, M, L, and X, according to different scaling coefficients [22]. According to the official Ultralytics implementation, YOLOv8 adopts advanced backbone and neck architectures together with an anchor-free split head. In the YOLOv8n configuration adopted in this study, the backbone is mainly composed of Conv, C2f, and Spatial Pyramid Pooling-Fast (SPPF) modules, the neck performs multi-scale feature fusion through upsampling, concatenation, and C2f blocks, and the head outputs predictions at three scales through the Detect layer [23]. As illustrated in Figure 1, the adopted YOLOv8n baseline progressively generates feature maps from P1 to P5 and performs detection on the P3, P4, and P5 feature levels.

In the backbone stage, the YOLOv8n configuration adopted in this study mainly uses stacked Conv and C2f modules together with an SPPF module for hierarchical feature extraction. After five downsampling operations, the input image is transformed into five feature maps at different scales, denoted as P1 to P5, enabling the network to capture objects of different sizes while maintaining a balance between detection accuracy and computational efficiency. Compared with YOLOv5, YOLOv8 replaces the original C3 module with the C2f structure. Through richer skip connections and feature split operations, the C2f module enhances gradient flow, thereby enabling the model to learn and preserve critical feature information more effectively. In addition, an SPPF module is deployed at the end of the backbone. By sequentially stacking three max-pooling layers, the SPPF module captures multi-scale receptive field information. Compared with the conventional SPP structure, SPPF reduces computational cost and inference latency while maintaining the diversity and richness of feature representation, thereby improving the adaptability and detection accuracy of the model for input features of different scales.
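The effect of stacking max-pooling layers in SPPF can be illustrated with a minimal one-dimensional sketch. It assumes stride-1 pooling with a kernel size of 5, which matches the default of the Ultralytics SPPF module but is not stated in the text above; the helper names are hypothetical. Under these assumptions, two stacked poolings cover the same window as a single pooling of size 9, and three cover a window of size 13, which is how a single small kernel yields several receptive-field sizes.

```python
def max_pool_1d(values, kernel_size):
    """Stride-1 max pooling with 'same' output length (out-of-range positions are ignored)."""
    half = kernel_size // 2
    n = len(values)
    return [max(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def sppf_branches(values, kernel_size=5):
    """Return the input and three sequentially pooled copies (the branches SPPF concatenates)."""
    p1 = max_pool_1d(values, kernel_size)
    p2 = max_pool_1d(p1, kernel_size)
    p3 = max_pool_1d(p2, kernel_size)
    return [values, p1, p2, p3]


# Toy 1-D "feature row"; the values are arbitrary illustration data.
signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
_, p1, p2, p3 = sppf_branches(signal)

# Stacked small poolings reproduce single larger poolings.
same_as_k9 = p2 == max_pool_1d(signal, 9)
same_as_k13 = p3 == max_pool_1d(signal, 13)
```

Reusing the output of each pooling step is what makes the sequential form cheaper than running three independent poolings of increasing size.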

In the neck stage, YOLOv8 adopts a Path Aggregation Network–Feature Pyramid Network (PAN-FPN) architecture [24] to fuse features from different hierarchical levels. This structure enhances the interaction between shallow detailed features and deep semantic features, optimizes the information flow across feature layers, and consequently improves the detection performance for multi-scale targets.

In the head stage, YOLOv8 employs a decoupled head design, which separates the classification task from the bounding box regression task, thereby improving detection accuracy and training stability. Moreover, YOLOv8 supports multi-scale prediction based on feature maps with downsampling factors of 8, 16, and 32, further enhancing its flexibility and accuracy in detecting targets of different sizes.

3. Improved Network Design

The improved network is designed according to the specific visual and deployment characteristics of UAV-based infrared PV inspection rather than by simply stacking independent modules. The overall design follows four task-driven objectives: preserving fine-grained thermal cues for small defects, reducing redundant computation for lightweight deployment, enhancing contextual discrimination under repetitive PV array backgrounds, and improving localization stability for boundary-ambiguous thermal anomalies. The proposed improvements are integrated into YOLOv8n in a structured manner at different stages of the network. Specifically, a P2 shallow feature layer is introduced into the detection head to enhance small-target thermal defect perception, while the original P5 large-target detection branch is removed as a task-specific structural simplification under the present UAV infrared inspection setting. As a result, the improved detector performs prediction on the P2, P3, and P4 feature levels. GhostConv is introduced into the backbone to reduce redundant computation and improve deployment efficiency. In the neck, part of the original C2f blocks are replaced by the proposed C2f-LSKA blocks to enhance contextual representation under complex backgrounds. In addition, the original bounding-box regression loss is replaced with Wise-IoU version 3 (WIoUv3) during training. The overall integration of these components into the adopted YOLOv8n baseline is illustrated in Table 1. In this design, the overall feature-extraction, feature-fusion, and prediction pipeline of YOLOv8n is retained, while the modifications are limited to the task-specific detection scale configuration, lightweight convolution replacement, neck-stage contextual enhancement, and bounding-box regression loss.

3.1. P2-Based Detection Head for Small Thermal Defects

In UAV-based infrared inspection scenarios for PV modules, thermal defect targets are generally manifested only as local high-temperature bright spots or anomalous temperature-rise regions due to variations in flight altitude, imaging perspective, and the projected scale of modules in the captured images. As a result, these targets usually exhibit characteristics such as small size, blurred boundaries, and irregular shapes. In the original YOLOv8 model, the input image is processed through multiple downsampling operations to generate five feature layers at different scales, namely P1, P2, P3, P4, and P5, while three detection heads are mainly constructed based on P3, P4, and P5, corresponding to feature maps of 80 × 80, 40 × 40, and 20 × 20, respectively. However, for small thermal defects in UAV-based infrared inspection images, repeated downsampling tends to cause the loss of shallow positional information and fine-grained thermal features, thereby limiting the capability of the original three-scale detection head structure to adequately perceive and accurately localize such targets.

To address the above issue, a P2-based detection head with a feature resolution of 160 × 160 is further introduced into the original network architecture. This design enables the model to better preserve and exploit the richer spatial positional cues and fine-grained local thermal anomaly details embedded in shallow features. As a result, the proposed network achieves improved representation and discrimination of small-scale hot spots and locally abnormal temperature-rise regions in infrared images, thereby alleviating the deficiency of the original YOLOv8 in small-target thermal defect detection. The architecture of the improved detection head with the P2 branch is illustrated in Figure 2.

The improved detection structure performs prediction on three feature levels, corresponding to P2, P3, and P4 with feature resolutions of 160 × 160, 80 × 80, and 40 × 40, respectively. Compared with the original YOLOv8n detection head based on P3, P4, and P5, the proposed structure introduces the high-resolution P2 branch to strengthen small-target thermal defect perception and removes the original P5 branch to reduce unnecessary computation for the dominant defect scales in the constructed dataset. The 160 × 160 P2 branch is specifically used to preserve fine-grained shallow thermal cues, while the P3 and P4 branches provide complementary feature representations for small- and medium-scale thermal anomalies. It should be noted that not all thermal anomalies in UAV infrared PV inspection are strictly small targets. Diffuse abnormal temperature-rise regions may also appear in some images. However, under the current acquisition condition with an approximate flight altitude of 100 m, most annotated thermal defects are projected as small or medium-scale regions in the resized 640 × 640 images. Therefore, the removal of the original P5 large-target detection branch is adopted as a task-specific structural simplification rather than a universal design choice for all inspection scenarios. The purpose is to allocate more representation capacity to shallow and middle-level features that are more relevant to the dominant defect scales in the constructed dataset, while reducing unnecessary computational overhead. The ablation results further show that this simplification does not impair the overall detection performance under the evaluated dataset, although its applicability to scenarios with a higher proportion of large-area thermal anomalies should be further verified.
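The feature resolutions quoted above follow directly from the pyramid strides. The short sketch below reproduces them for a 640 × 640 input, assuming that level P_n has a stride of 2^n (consistent with the downsampling factors of 8, 16, and 32 for P3 to P5); the function names are hypothetical.

```python
def feature_map_sizes(input_size, levels):
    """Map each pyramid level P_n to (stride, feature-map side length), with stride = 2**n."""
    return {f"P{n}": (2 ** n, input_size // 2 ** n) for n in levels}


def total_cells(sizes):
    """Total number of prediction locations over all detection levels."""
    return sum(side * side for _, side in sizes.values())


baseline = feature_map_sizes(640, levels=(3, 4, 5))  # original YOLOv8n head
improved = feature_map_sizes(640, levels=(2, 3, 4))  # P2 added, P5 removed

baseline_cells = total_cells(baseline)
improved_cells = total_cells(improved)
```

With these settings the improved head predicts on 33,600 grid locations compared with 8,400 for the baseline, and each P2 cell covers a 4 × 4 pixel patch, which is why the shallow branch is better suited to defects that occupy only a few pixels.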

3.2. GhostConv Module

GhostConv, proposed by Huawei in 2020, was first introduced in the GhostNet architecture [25]. Its core idea is to generate only a portion of the intrinsic feature maps through standard convolution and then produce additional representative feature maps by using inexpensive linear transformations, thereby reducing model parameters and computational cost while preserving feature representation capability.

As illustrated in Figure 3, the input feature map is first processed by a set of standard convolution kernels to generate part of the output feature maps. Subsequently, inexpensive linear operations are applied to these feature maps to produce another part of the output features. In practice, such inexpensive operations can be implemented by depthwise convolutions or small-kernel convolutions. In this manner, each original feature map can be used to derive multiple additional feature maps. Although the newly generated feature maps differ from the original ones in terms of information representation, they are complementary in the channel dimension and jointly enhance the model’s capability to represent the input data. Finally, the feature maps produced by standard convolution and those generated by inexpensive operations are concatenated along the channel dimension through a Concat module to form the final output feature map. Owing to this design, GhostConv effectively reduces the number of model parameters and computational complexity while maintaining strong feature representation capability, making it particularly suitable for deployment scenarios with limited computational resources, such as mobile and embedded devices. Therefore, the motivation of introducing GhostConv is to improve deployment efficiency by reducing redundant computation while preserving effective feature representation.

Assume that the kernel size of the standard convolution is k × k, the spatial resolution of the input feature map is H × W, and the numbers of input and output channels are C_in and C_out, respectively. Under these definitions, the computational complexity of GhostConv throughout the entire feature generation process can be formulated as

\mathrm{FLOPs}_{\mathrm{conv}} = k \times k \times C_{in} \times H \times W \times \frac{C_{out}}{2}

(1)

\mathrm{FLOPs}_{\mathrm{Cheap}} = H \times W \times \frac{C_{out}}{2}

(2)

\mathrm{FLOPs}_{\mathrm{GhostConv}} = \mathrm{FLOPs}_{\mathrm{conv}} + \mathrm{FLOPs}_{\mathrm{Cheap}}

(3)

where FLOPs_conv denotes the computational cost of generating half of the feature maps by standard convolution in the first stage, and FLOPs_Cheap denotes the computational cost of generating the other half through inexpensive operations in the second stage.

If standard convolution is directly applied to the input feature map to generate output feature maps of the same size as those produced by GhostConv, the computational cost of the entire convolution process can be expressed as follows:

\mathrm{FLOPs}_{\mathrm{Conv}} = k \times k \times C_{in} \times H \times W \times C_{out}

(4)

Based on Equations (3) and (4), the computational cost ratio of GhostConv to standard convolution can be further derived as follows:

\frac{\mathrm{FLOPs}_{\mathrm{GhostConv}}}{\mathrm{FLOPs}_{\mathrm{Conv}}} = \frac{k^{2} + 1}{2 k^{2}}

(5)

When the convolution kernel size is 3 × 3, Equation (5) gives (9 + 1)/18 ≈ 0.56; that is, the computational cost of GhostConv in feature extraction is approximately 56% of that of standard convolution, indicating a substantial reduction in computational overhead.
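The cost comparison can be checked numerically with a small sketch. The functions below transcribe Equations (1) to (5); the layer dimensions in the example (64 input channels, 128 output channels, an 80 × 80 feature map) are hypothetical values chosen for illustration only. Note that evaluating Equations (1) to (4) term by term gives a ratio of 1/2 + 1/(2k²C_in), which approaches 1/2 as C_in grows, while Equation (5) gives 10/18 for k = 3; both values fall within the 50–60% range discussed above.

```python
def flops_standard(k, c_in, c_out, h, w):
    """Equation (4): standard convolution producing all c_out channels."""
    return k * k * c_in * h * w * c_out


def flops_ghost(k, c_in, c_out, h, w):
    """Equations (1)-(3): half of the channels by standard conv, half by cheap operations."""
    primary = k * k * c_in * h * w * c_out / 2   # Equation (1)
    cheap = h * w * c_out / 2                    # Equation (2)
    return primary + cheap                       # Equation (3)


def ratio_eq5(k):
    """Equation (5): closed-form cost ratio of GhostConv to standard convolution."""
    return (k * k + 1) / (2 * k * k)


# Hypothetical layer: 3x3 kernel, 64 -> 128 channels, 80x80 feature map.
k, c_in, c_out, h, w = 3, 64, 128, 80, 80
literal_ratio = flops_ghost(k, c_in, c_out, h, w) / flops_standard(k, c_in, c_out, h, w)
closed_form_ratio = ratio_eq5(k)
```

In either reading, GhostConv costs a little more than half of the standard convolution it replaces, since the cheap branch adds only a small overhead on top of the halved primary convolution.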

In the adopted implementation, GhostConv is used to replace the corresponding standard convolution operations in the lightweight backbone path, while keeping the overall feature extraction pipeline of YOLOv8n unchanged. This design reduces computational burden without altering the basic multi-scale detection framework.

3.3. C2f-LSKA Module

During UAV-based infrared inspection of PV modules, the detection of thermal defect targets is highly susceptible to complex background interference, including ground textures, shadow effects, repetitive array structures of PV modules, and infrared thermal noise. In addition, the task is further challenged by significant target-scale variations and blurred boundaries of thermal spots. To address these issues, a Large Separable Kernel Attention (LSKA) module is introduced in this study. By leveraging separable large-kernel convolutions, the LSKA module enhances the modeling of correlations among different regions of the feature map, thereby improving the representation capability of thermal defect targets in complex scenes while controlling the number of parameters and computational complexity.

Specifically, the LSKA module decomposes a large-kernel convolution into three components, namely depthwise convolution (DW-Conv), depthwise dilated convolution (DW-D-Conv), and channel convolution (1 × 1 convolution). Subsequently, the two-dimensional kernels in the depthwise convolution and depthwise dilated convolution are each further factorized into two one-dimensional kernels, where an N × 1 kernel performs convolution along the vertical direction and a 1 × N kernel performs convolution along the horizontal direction. Through this decomposition strategy, the module achieves an effect approximately equivalent to that of large-kernel convolution while reducing parameter count and computational overhead, as illustrated in Figure 4. This design also enables the model to capture richer long-range spatial dependency information from the input feature maps, thereby enhancing the recognition capability for thermal defect targets under complex backgrounds.
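The parameter saving from the one-dimensional factorization can be sketched with a simple weight count. An N × N depthwise kernel holds N² weights per channel, whereas the N × 1 and 1 × N pair holds 2N. The channel count (64) and the kernel sizes (5 for DW-Conv and 7 for DW-D-Conv) in the example are assumptions for illustration and are not taken from the configuration used in this study; biases and the 1 × 1 convolution are ignored because they are unchanged by the factorization.

```python
def depthwise_2d_weights(channels, n):
    """Weights of a depthwise N x N convolution (one N x N kernel per channel)."""
    return channels * n * n


def depthwise_separable_1d_weights(channels, n):
    """Weights of the factorized pair: one N x 1 and one 1 x N kernel per channel."""
    return channels * 2 * n


# Hypothetical setting: 64 channels, DW-Conv with N = 5 and DW-D-Conv with N = 7.
channels = 64
kernel_sizes = (5, 7)

weights_2d = sum(depthwise_2d_weights(channels, n) for n in kernel_sizes)
weights_1d = sum(depthwise_separable_1d_weights(channels, n) for n in kernel_sizes)
```

For this hypothetical setting the depthwise weights drop from 4736 to 1536, and the gap widens as N increases because the cost grows linearly rather than quadratically with kernel size.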

To achieve an effective integration of the C2f structure and the LSKA module, an LSKA structure is designed in this study, as illustrated in Figure 5. Specifically, the input feature map is first normalized by batch normalization, followed by nonlinear activation using the GELU function, and is then fed into the LSKA module for feature enhancement. The resulting features are finally fused with the original input feature map through a residual connection. Notably, two 1 × 1 convolution layers are introduced at the two ends of the LSKA module. The first one is used to reduce the channel dimension of the feature map, while the second one restores the channel dimension to match that of the input. This design reduces the number of model parameters and computational overhead while further enhancing the feature representation capability.

In this study, the Bottleneck units in the original C2f module are replaced with the proposed LSKA structure, thereby forming a new C2f-LSKA module, as illustrated in Figure 6. The original C2f module mainly focuses on capturing local features from the input data. However, its limitation lies in its emphasis on modeling only small local regions, making it difficult to establish sufficient correlations among different regions of the feature map. As a consequence, its feature representation capability and robustness may be constrained in complex visual scenarios. In contrast, the proposed C2f-LSKA module integrates the C2f structure with the LSKA mechanism, which significantly enhances the model’s ability to capture contextual semantic information, long-range dependencies, and spatial structural characteristics. As a result, the proposed module improves both the recognition robustness and the feature representation capability for infrared thermal defect targets under complex backgrounds. Thus, the proposed C2f-LSKA module is motivated by the need to enhance contextual modeling and robustness in complex UAV infrared inspection backgrounds.

In the adopted YOLOv8n architecture, the proposed C2f-LSKA blocks are deployed in the neck feature-fusion stage by partially replacing the original C2f modules. This partial replacement strategy is adopted to balance contextual feature enhancement and lightweight deployment. Specifically, C2f-LSKA is introduced into selected neck positions to strengthen long-range dependency modeling and background discrimination, while the remaining original C2f modules are retained to preserve efficient local feature extraction and feature reuse. In contrast, full replacement may introduce additional computational burden without necessarily providing a better accuracy-efficiency trade-off. Therefore, partial replacement is adopted in the final model, and its effectiveness is further verified by the comparison with full neck replacement in the ablation study.

3.4. Loss Function Improvement

A well-designed loss function plays a crucial role in improving the overall performance of an object detection model. In the original YOLOv8 framework, bounding box regression is optimized by a combination of Distribution Focal Loss (DFL) and Complete Intersection over Union (CIoU) loss. Among them, CIoU constrains the bounding box prediction process by jointly considering the center-point distance between the predicted box and the ground-truth box, the overlap region, and the aspect ratio discrepancy. However, in the task of small thermal defect detection from UAV-based infrared inspection images, CIoU still exhibits several limitations.

First, thermal defects are typically manifested only as localized high-temperature anomalous regions with small target scales, making slight variations in IoU insufficiently sensitive to the actual localization error. Second, thermal spots often exhibit irregular shapes and blurred boundaries, which limits the adaptability of CIoU to such geometric variations. Third, the balancing term in CIoU is formulated in a fixed manner, making it difficult to dynamically adapt the optimization emphasis to thermal defect samples of different scales and quality levels. As a result, its effectiveness in complex scenarios may be constrained. The corresponding formulations are given in Equations (6)–(8):

L_{CIoU} = L_{IoU} + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v

(6)

L_{IoU} = 1 - IoU

(7)

v = \frac{4}{\pi^{2}} \left( \arctan \frac{w^{gt}}{h^{gt}} - \arctan \frac{w}{h} \right)^{2}

(8)

The definitions of the involved parameters are illustrated in Figure 7. Specifically, IoU denotes the intersection over union between the predicted box and the ground-truth box; ρ(b,b_gt) represents the Euclidean distance between the center points of the predicted box b and the ground-truth box b_gt, and c denotes the diagonal length of the minimum enclosing box covering both boxes; α is a balancing factor used to weight the penalty term for shape mismatch; v is used to measure the similarity in aspect ratio between the predicted box and the ground-truth box; h and w denote the height and width of the predicted box, respectively; h_gt and w_gt denote the height and width of the ground-truth box, respectively.
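A minimal sketch of the CIoU loss for a single pair of boxes is given below to make the three terms concrete. It uses the squared center distance normalized by the squared enclosing-box diagonal, as in the standard CIoU definition, and it assumes the usual form of the balancing factor, α = v / ((1 − IoU) + v), which is not written out in the text above. Boxes are given as (center x, center y, width, height) with positive width and height; the function name is hypothetical.

```python
import math


def ciou_loss(pred, gt):
    """CIoU loss for two boxes in (cx, cy, w, h) format with positive w and h."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2

    # Intersection over union.
    inter_w = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    inter_h = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / union

    # Squared center distance and squared diagonal of the minimum enclosing box.
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    c2 = (max(p_x2, g_x2) - min(p_x1, g_x1)) ** 2 + (max(p_y2, g_y2) - min(p_y1, g_y1)) ** 2

    # Aspect-ratio term, Equation (8), and the assumed balancing factor alpha.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return (1 - iou) + rho2 / c2 + alpha * v


# Same-shape boxes shifted by one pixel: only the IoU and distance terms contribute.
shifted = ciou_loss((11.0, 10.0, 4.0, 4.0), (10.0, 10.0, 4.0, 4.0))
identical = ciou_loss((10.0, 10.0, 4.0, 4.0), (10.0, 10.0, 4.0, 4.0))
```

For the shifted 4 × 4 boxes, the IoU is 0.6 and the loss is 0.4 + 1/41, which shows how strongly a one-pixel offset is penalized when the target itself is only a few pixels wide.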

In this study, WIoUv3 is adopted as the bounding box regression loss [26]. Compared with CIoU, WIoUv3 introduces a dynamic non-monotonic focusing mechanism, which enables the model to pay greater attention to localization quality during training and thus improves its sensitivity to small targets. In addition, WIoUv3 reduces the emphasis on low-quality anchor boxes in the later stage of training, thereby alleviating the influence of harmful gradients on parameter updates and improving both training efficiency and optimization stability. Furthermore, WIoUv3 places greater emphasis on the accuracy of center position regression, which is beneficial for improving the localization precision of small-scale thermal defect targets.

For the UAV-based infrared photovoltaic inspection task investigated in this study, these characteristics of WIoUv3 enable the model to more accurately regress the locations and boundaries of small-scale thermal defect targets while reducing the interference of low-quality samples during training. As a result, the proposed method further improves localization accuracy, training stability, and generalization performance in infrared thermal defect detection. The corresponding formulations are given in Equations (9)–(11).

L_{WIoUv3} = \left(1 - \frac{W_{i} H_{i}}{S_{u}}\right) \exp\left(\frac{(x - x_{gt})^{2} + (y - y_{gt})^{2}}{(W_{g}^{2} + H_{g}^{2})^{*}}\right) γ

(9)

γ = \frac{β}{δ α^{β - δ}}

(10)

β = \frac{L_{IoU}^{*}}{{\bar{L}}_{IoU}} \in [0, + \infty)

(11)

where β denotes the outlier degree of the anchor box, with a smaller value indicating a higher-quality anchor box; γ is the non-monotonic focusing coefficient used to suppress the interference of low-quality anchor boxes during training; α and δ are hyperparameters that shape γ as a function of β (this α is distinct from the balancing factor α in Equation (6)); (x, y) and (x_{gt}, y_{gt}) denote the center coordinates of the predicted box and the ground-truth box, respectively; W_{i} and H_{i} denote the width and height of the overlap region between the two boxes; W_{g} and H_{g} represent the width and height of the minimum enclosing box covering both the predicted box and the ground-truth box, respectively; S_{u} denotes the area of the union of the predicted box and the ground-truth box, so that the first factor in Equation (9) equals L_{IoU} in Equation (7); the superscript * indicates that the term is detached from the computational graph and treated as a constant during backpropagation; L_{IoU}^{*} denotes the IoU loss of the current anchor box, and \bar{L}_{IoU} denotes the mean IoU loss.

For the infrared thermal defect detection task, the dynamic non-monotonic focusing mechanism of WIoUv3 is beneficial for handling small targets with blurred boundaries. In UAV infrared PV images, weak hot spots and diffuse abnormal temperature-rise regions often exhibit fuzzy thermal transitions, and extremely low-quality predicted boxes may introduce unstable regression gradients. WIoUv3 uses the outlier degree β to adaptively adjust the regression weight in a non-monotonic manner, so that ordinary-quality samples with useful localization information receive more attention, whereas extremely low-quality samples caused by ambiguous boundaries or thermal noise are relatively suppressed. This helps improve localization stability for boundary-ambiguous thermal defects. In this study, the hyperparameters of the focusing coefficient in Equation (10) were set as α = 1.9 and δ = 3.0; together they determine how the regression weight γ varies with the outlier degree β. This setting provides a moderate focusing effect and was kept fixed in all experiments to ensure fair comparison among different ablation variants. During training, WIoUv3 is incorporated by replacing the original CIoU-based bounding-box regression loss in YOLOv8, while the classification-related loss, distribution focal loss, optimizer, training schedule, and other settings remain consistent with the adopted YOLOv8n baseline unless otherwise specified.
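The behavior of Equations (9)–(11) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the training code used in the study: the names `focusing_gain` and `wiou_v3_loss` are hypothetical, the mean IoU loss is passed in as an argument instead of being tracked during training, and the two focusing values reported above (1.9 and 3.0) are assigned to α and δ of Equation (10), which is an assumption about how they map onto the formula. Gradient detachment (the superscript *) has no effect in a pure-Python evaluation and is only noted in comments.

```python
import math

def focusing_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic focusing coefficient of Equation (10)."""
    return beta / (delta * alpha ** (beta - delta))

def wiou_v3_loss(box, box_gt, mean_l_iou, alpha=1.9, delta=3.0, eps=1e-9):
    """WIoUv3 loss for boxes given as (cx, cy, w, h). Hypothetical helper."""
    x, y, w, h = box
    xg, yg, wg, hg = box_gt
    x1, y1, x2, y2 = x - w / 2, y - h / 2, x + w / 2, y + h / 2
    xg1, yg1, xg2, yg2 = xg - wg / 2, yg - hg / 2, xg + wg / 2, yg + hg / 2
    # Overlap size (W_i, H_i) and union area S_u.
    w_i = max(0.0, min(x2, xg2) - max(x1, xg1))
    h_i = max(0.0, min(y2, yg2) - max(y1, yg1))
    s_u = w * h + wg * hg - w_i * h_i
    l_iou = 1 - (w_i * h_i) / (s_u + eps)
    # Size of the minimum enclosing box (detached from the graph in training).
    w_g = max(x2, xg2) - min(x1, xg1)
    h_g = max(y2, yg2) - min(y1, yg1)
    distance_term = math.exp(((x - xg) ** 2 + (y - yg) ** 2) / (w_g ** 2 + h_g ** 2 + eps))
    # Outlier degree of Equation (11); the numerator is also detached in training.
    beta = l_iou / (mean_l_iou + eps)
    return l_iou * distance_term * focusing_gain(beta, alpha, delta)

# The gain rises for moderate outlier degrees and falls again for large ones.
for beta in (0.1, 1.5, 3.0, 6.0):
    print(beta, round(focusing_gain(beta), 3))
```

With α = 1.9 the gain peaks near β = 1/ln α ≈ 1.56 and equals 1 at β = δ, so both very good boxes (small β) and very poor boxes (large β) receive a reduced weight under these assumed settings.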

Accordingly, the adoption of WIoUv3 is motivated by the need for more stable localization learning for small thermal defects with ambiguous boundaries. In summary, targeted improvements are introduced into the backbone, neck, and head of the YOLOv8 network in this study, resulting in an enhanced model for thermal defect detection in UAV-based infrared inspection scenarios. The overall architecture of the improved network is illustrated in Figure 8.

4. Experimental Results and Analysis

4.1. Dataset Construction and Experimental Settings

In this study, a self-constructed UAV-based infrared thermography dataset is employed for experimental validation. The original dataset contains nearly 10,000 inspection images, all of which were acquired by a DJI Mavic 3T UAV (DJI, Shenzhen, China) equipped with an infrared thermal imaging camera at a flight altitude of approximately 100 m. To ensure consistency in model input, all images were resized to a unified input resolution of 640 × 640.

The constructed dataset mainly contains typical thermal defect targets on PV module surfaces, including hot spots and locally abnormal temperature-rise regions. From the original inspection images, 3000 representative images were selected and manually annotated by the research team to establish an infrared thermal defect detection dataset for PV arrays, with the aim of balancing annotation cost and sample representativeness. Of these, 2400 images were used for model training, 300 for validation, and 300 for testing. All annotations were uniformly converted into the normalized (cx, cy, w, h) format. Examples of UAV-based infrared inspection images and their corresponding thermal defect annotations are shown in Figure 9.

The annotated dataset covers several representative challenging cases commonly encountered in UAV-based infrared PV inspection, including 1270 images with small hot spots, 820 images with low-contrast thermal defects, 1040 images involving complex background interference, and 710 images with diffuse abnormal temperature-rise regions. These categories are not mutually exclusive, since one image may contain multiple challenging characteristics, which is why the four counts sum to 3840 rather than 3000. The attribute definitions and corresponding annotation rules are summarized in Table 2. The selected images also cover certain variations in defect scale, target location, and background complexity encountered in UAV-based infrared inspection scenarios. Therefore, the constructed dataset is sufficient to support a comparative evaluation of different detection models under representative task conditions.

It should be noted that the current dataset was collected using a single UAV platform and infrared camera at a relatively consistent flight altitude of approximately 100 m. Therefore, the experimental results mainly reflect the detection performance under the investigated UAV infrared inspection setting. Although the selected images cover representative thermal defect patterns and background conditions observed in the collected inspection data, the cross-platform and cross-scene generalization capability of the proposed detector still requires further validation using data acquired by different UAV platforms, infrared sensors, flight altitudes, seasons, and PV power stations.
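As a small illustration of the normalized (cx, cy, w, h) annotation format mentioned in this subsection, the sketch below converts one box from pixel corner coordinates. The function name and the assumption that the raw annotations are stored as (x_min, y_min, x_max, y_max) in pixels are hypothetical; the text does not state the original annotation format.

```python
def to_normalized_cxcywh(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel corner box to normalized (cx, cy, w, h). Hypothetical helper."""
    cx = (x_min + x_max) / 2 / img_w   # box center, as a fraction of image width
    cy = (y_min + y_max) / 2 / img_h   # box center, as a fraction of image height
    w = (x_max - x_min) / img_w        # box width, as a fraction of image width
    h = (y_max - y_min) / img_h        # box height, as a fraction of image height
    return cx, cy, w, h

# A 320 x 320 pixel box in the lower half of a 640 x 640 image.
print(to_normalized_cxcywh(160, 320, 480, 640, 640, 640))
```

Because all four values are fractions of the image size, the labels stay valid when the images are resized to the 640 × 640 network input.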

The hardware platform and experimental environment settings used for model training in this study are summarized in Table 3.

4.2. Evaluation Metrics

To comprehensively evaluate the detection performance of the proposed improved model, Precision (P), Recall (R), mean average precision at an IoU threshold of 0.5 (mAP@0.5), mean average precision averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95), the number of parameters, model size, giga floating-point operations (GFLOPs), and frames per second (FPS) are adopted as evaluation metrics in this study. Among them, Precision is used to measure the proportion of samples predicted as positive that are actually positive, and its formulation is given in Equation (12).

P = \frac{TP}{TP + FP}

(12)

where TP denotes the number of true positives, i.e., the targets correctly detected by the model, and FP denotes the number of false positives, i.e., the non-target samples that are incorrectly identified as targets.

Recall is used to measure the proportion of actual positive samples that are correctly detected by the model, and its formulation is given in Equation (13):

R = \frac{TP}{TP + FN}

(13)

where FN denotes the number of false negatives, i.e., the targets that are actually present but are not correctly detected by the model.
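Equations (12) and (13) translate directly into code. The following sketch uses a hypothetical helper name and made-up counts purely for illustration; the zero-denominator handling is an added convention, not something specified in the text.

```python
def precision_recall(tp, fp, fn):
    """Precision and Recall from detection counts (Equations (12) and (13))."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0  # guard against no predictions
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0     # guard against no ground truth
    return precision, recall

# Illustrative counts only: 90 correct detections, 10 false alarms, 30 misses.
print(precision_recall(tp=90, fp=10, fn=30))
```

In this example the false alarms lower Precision while the missed defects lower Recall, which is why the two metrics are reported separately.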

Average Precision (AP) summarizes the precision of the model over all recall levels for one category, and mAP is the mean value of AP over all categories. In this study, both mAP@0.5 and mAP@0.5:0.95 are adopted for performance evaluation. Among them, mAP@0.5 is used as the primary metric to reflect the overall detection effectiveness for thermal defect targets with small scales and ambiguous boundaries, while mAP@0.5:0.95 is further introduced as a stricter metric to evaluate localization performance under multiple IoU thresholds. In this way, the evaluation protocol can more comprehensively reflect both the detection capability and the localization quality of the proposed method for UAV-based infrared thermal defect detection. The formulation of AP is given in Equation (14).

AP = \int_{0}^{1} p(r) \, dr

(14)

where p(r) denotes the precision at recall level r. The mAP is then calculated as shown in Equation (15).

mAP = \frac{1}{N} \sum_{i = 1}^{N} AP_{i}

(15)

where N denotes the total number of categories, and AP_i denotes the AP of the i-th category.
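In practice the integral in Equation (14) is evaluated numerically from a finite set of precision and recall points. The sketch below uses all-point interpolation with a monotone precision envelope, which is one common convention and an assumption here, since the text does not specify the interpolation scheme. The function names and the sample points are illustrative.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve; recalls must be sorted ascending."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall levels.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

def mean_average_precision(ap_values):
    """Mean of the per-category AP values (Equation (15))."""
    return sum(ap_values) / len(ap_values)

ap = average_precision([0.5, 1.0], [1.0, 0.5])
print(ap, mean_average_precision([ap, 0.25]))
```

For mAP@0.5:0.95, the same computation would be repeated with the true-positive matching threshold set to each IoU value from 0.5 to 0.95 and the results averaged.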

In addition to the number of parameters and model size, GFLOPs are further adopted to quantify the theoretical computational complexity of the model. All GFLOPs values are calculated under the same input resolution of 640 × 640 to ensure a fair comparison among different models.

4.3. Test Results and Analysis

4.3.1. Ablation Study

To explicitly evaluate the motivation and contribution of each proposed component, ablation experiments were designed and conducted on the self-constructed dataset. The original YOLOv8n was first adopted as the baseline model (Model 1).

On this basis, the P2-based detection head, the removal of the large-target detection head, GhostConv, C2f-LSKA, and WIoUv3 were successively introduced to construct eight improved variants. In this way, the effect of each modification on detection accuracy, model complexity, and deployment efficiency can be isolated and quantitatively analyzed, as summarized in Table 4.

In Table 4, Model 1 denotes the original YOLOv8n algorithm. Model 2 adds an extra-small target detection head to the baseline model, improving Precision, Recall, mAP@0.5, and mAP@0.5:0.95 by 2.0, 9.0, 7.9, and 9.9 percentage points, respectively. To independently validate the effect of GhostConv under the complete detection-head setting, Model 3 further incorporates GhostConv while retaining the large-target detection head. Compared with Model 2, Model 3 changes Precision, Recall, mAP@0.5, and mAP@0.5:0.95 from 89.2%, 88.5%, 92.2%, and 69.0% to 89.5%, 88.7%, 92.5%, and 69.7%, respectively. Meanwhile, the number of parameters, model size, and GFLOPs are reduced by 30.7%, 29.7%, and 26.2%, respectively, and the FPS increases from 115 to 118. These results indicate that GhostConv can independently reduce model complexity while maintaining stable detection performance when the large-target detection head is retained.

After the large-target detection head is removed from Model 2, Model 4 achieves Precision, Recall, mAP@0.5, and mAP@0.5:0.95 of 87.4%, 87.3%, 90.9%, and 66.6%, respectively, while the number of parameters and model size are reduced by 40.7% and 39.7% relative to Model 1. Compared with Model 2, mAP@0.5 and mAP@0.5:0.95 decrease by 1.3 and 2.4 percentage points, yet all four accuracy metrics remain above those of the baseline Model 1. This result suggests that, under the current UAV infrared inspection setting with an acquisition altitude of approximately 100 m, removing the large-target detection head has only a limited effect on the detection of the dominant thermal defect targets in the dataset, while it effectively reduces model complexity. Therefore, this modification should be understood as a task-specific trade-off for the present scenario rather than a universally optimal design for all target scales.

Based on Model 4, Model 5 is further improved by introducing GhostConv. Compared with Model 4, its Precision, Recall, mAP@0.5, and mAP@0.5:0.95 increase by 2.3, 2.2, 1.8, and 3.5 percentage points, respectively, while the number of parameters and model size are reduced by 31.4% and 26.3%, respectively.

Model 6 introduces C2f-LSKA on the basis of Model 4. Compared with Model 4, its Precision, Recall, mAP@0.5, and mAP@0.5:0.95 increase by 2.2, 1.1, 1.2, and 2.1 percentage points, respectively, while the number of parameters and model size are reduced by 21.3% and 18.4%, respectively. To further examine the C2f-LSKA replacement strategy, Model 7 adopts full neck replacement; however, it achieves detection performance comparable to that of Model 6 while increasing computational cost (GFLOPs rise from 8.8 to 9.6) and reducing inference speed (FPS falls from 118 to 116), indicating that partial replacement provides a more balanced strategy. Model 8 incorporates both GhostConv and C2f-LSKA. Relative to Model 4, its Precision, Recall, mAP@0.5, and mAP@0.5:0.95 are improved by 3.9, 1.8, 1.9, and 3.9 percentage points, respectively, while the number of parameters and model size are reduced by 42.3% and 36.8%, respectively.

On this basis, Model 9 further replaces the bounding-box regression loss with WIoUv3, thereby yielding the final model proposed in this study. Its Precision, Recall, mAP@0.5, and mAP@0.5:0.95 reach 92.3%, 90.9%, 93.9 ± 0.2%, and 72.3 ± 0.3%, respectively, achieving the best detection performance among all ablation variants. Compared with Model 8, Model 9 improves mAP@0.5 and mAP@0.5:0.95 by 1.1 and 1.8 percentage points, respectively, which indicates that WIoUv3 contributes not only to overall detection performance but also to localization quality under stricter IoU thresholds. From the perspective of computational complexity, the introduction of the P2-based detection head increases the GFLOPs from 8.6 to 12.2, whereas the removal of the large-target detection head and the use of GhostConv reduce the computational burden. As a result, the final model still maintains a relatively low computational complexity of 7.1 GFLOPs while achieving the best detection performance. Although the proposed improvements introduce additional feature extraction and fusion operations, Model 9 still achieves an inference speed of 121 FPS, which remains sufficient for real-time deployment in practical UAV infrared inspection scenarios.

The normalized comparison of the evaluation metrics between Model 9 and Model 1 is shown in Figure 10. Specifically, P, R, mAP@0.5, and mAP@0.5:0.95 are improved by 5.1, 11.4, 9.6, and 13.2 percentage points, respectively, while the number of parameters and model size are reduced by 65.8% and 61.9%, respectively. Figure 11 further presents the variation curves of Precision and mAP@0.5 during the training process for the proposed model and the original YOLOv8n model. It can be observed that the proposed model obtains higher final Precision and mAP@0.5 values than the baseline model.

These results further demonstrate the effectiveness of the proposed improvements in enhancing detection accuracy while preserving lightweight characteristics and low computational complexity. The repeated-run results show that the standard deviations remain small (at most 0.5 percentage points in Table 4) and that the relative ranking of the ablation models is generally consistent across different random seeds.
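The headline comparison between Model 1 and Model 9 mixes two kinds of quantities: accuracy gains are differences in percentage points, whereas parameter and size savings are relative reductions. The short sketch below recomputes both from the values listed in Table 4; the dictionary layout and the helper name are illustrative assumptions.

```python
def relative_reduction(before, after):
    """Percentage reduction of `after` relative to `before`."""
    return 100.0 * (before - after) / before

# Values copied from Table 4 for the baseline (Model 1) and the final model (Model 9).
table4 = {
    1: {"params": 3_007_453, "size_mb": 6.3, "map50": 84.3, "map5095": 59.1},
    9: {"params": 1_029_853, "size_mb": 2.4, "map50": 93.9, "map5095": 72.3},
}
base, final = table4[1], table4[9]

param_cut = relative_reduction(base["params"], final["params"])   # relative, in %
size_cut = relative_reduction(base["size_mb"], final["size_mb"])  # relative, in %
map50_gain = final["map50"] - base["map50"]                       # percentage points
map5095_gain = final["map5095"] - base["map5095"]                 # percentage points

print(f"{param_cut:.1f}% {size_cut:.1f}% {map50_gain:.1f} {map5095_gain:.1f}")
```

Rounded to one decimal place, these quantities agree with the 65.8% and 61.9% reductions and the 9.6 and 13.2 percentage-point gains reported in the text.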

4.3.2. Comparative Experiments

As shown in Table 5, the proposed method achieves competitive detection performance among the compared models, with an mAP@0.5 of 93.9 ± 0.2% and an mAP@0.5:0.95 of 72.3 ± 0.3%. Compared with YOLOv8n, the proposed method improves mAP@0.5 and mAP@0.5:0.95 by 9.6 and 13.2 percentage points, respectively. Compared with the recent lightweight detectors YOLOv10n and YOLO11n, the proposed method improves mAP@0.5 by 7.4 and 6.6 percentage points, respectively, and improves mAP@0.5:0.95 by 9.9 and 9.2 percentage points, respectively. Compared with the domain-specific YOLOS-PV baseline, the proposed method achieves slightly lower mAP values but substantially reduces the number of parameters, model size, and GFLOPs while improving inference speed. These results indicate that the proposed method provides a favorable balance between detection accuracy, localization quality, and lightweight deployment efficiency.

In terms of model complexity, the proposed method contains only 1,029,853 parameters and has a model size of 2.4 MB, which are substantially lower than those of most compared detectors. Although the proposed method was evaluated on the same RTX 2080 Ti GPU platform as the compared models and achieved an inference speed of 121 FPS, this result should be interpreted as a reference throughput under the adopted experimental environment rather than a direct onboard deployment speed. In practical UAV inspection systems, the actual inference speed may be affected by the onboard edge device, power budget, memory capacity, image transmission pipeline, and inference engine optimization. Nevertheless, the relatively small model size, low parameter count, and moderate GFLOPs indicate that the proposed detector has favorable potential for subsequent edge-device deployment.

To further provide an intuitive comparison of the detection behavior of different models, representative qualitative detection results are shown in Figure 12. For clarity, five representative detectors are selected for visualization, including Faster R-CNN, YOLOv8n, YOLOv10n, YOLO11n, and the proposed method, while the complete quantitative comparison of all evaluated models is reported in Table 5. The selected UAV infrared inspection images cover several challenging cases, including small hot spots, diffuse abnormal temperature-rise regions, complex background interference, and low-contrast thermal defects.

As shown in Figure 12, different models exhibit distinct detection responses under representative UAV infrared inspection scenarios. In the small-hot-spot case, Faster R-CNN produces relatively lower-confidence predictions, whereas the YOLO-based detectors achieve more stable responses. Compared with YOLOv8n, YOLOv10n, and YOLO11n, the proposed method provides slightly higher confidence responses for the small thermal defect regions, indicating improved sensitivity to fine-grained thermal anomalies. In the diffuse abnormal temperature-rise case, all compared models can detect the main abnormal regions to some extent, but the proposed method maintains a more stable response for boundary-ambiguous thermal anomalies. For the complex-background case, the repetitive photovoltaic module textures and surrounding thermal interference may affect the detection responses of baseline detectors, whereas the proposed method still maintains reliable detection results. In the low-contrast thermal-defect case, the proposed method provides a higher-confidence response than Faster R-CNN and YOLOv8n, suggesting better robustness to weak thermal anomaly patterns.

Overall, the side-by-side qualitative comparison demonstrates that the proposed method not only improves the quantitative detection metrics but also provides more stable visual detection responses in representative UAV infrared inspection scenarios involving small targets, diffuse boundaries, complex backgrounds, and low-contrast defects.

As noted above, the runtime evaluation in this study was conducted on a unified GPU-based experimental platform rather than a dedicated onboard or edge deployment device, so the reported FPS values are mainly intended for fair relative comparison among different models under the same test condition. Within this setting, the proposed method achieves 121 FPS while maintaining high detection accuracy and low model complexity, which indicates favorable real-time deployment potential for UAV-based infrared inspection scenarios. In future work, dedicated runtime validation on practical edge or embedded deployment platforms will be carried out.

5. Conclusions

This paper proposes an improved YOLOv8n-based target detection algorithm for surface thermal defects of PV modules. First, a new extra-small target detection layer is introduced into the original three-scale detection framework, while the largest-scale detection layer is removed, thereby improving the detection accuracy of small-scale thermal defect targets while reducing model complexity. Second, a lightweight GhostConv module is incorporated to further reduce the number of model parameters and compress the model size. Third, a new C2f-LSKA module is designed in the neck network to replace part of the original C2f modules, thereby enhancing the model’s capability to capture contextual semantic information and model spatial structural features. Finally, WIoUv3 is adopted as the bounding-box regression loss to improve regression stability and localization accuracy during training. Experimental results on the expanded self-constructed UAV-based infrared thermography dataset show that, compared with the baseline YOLOv8n model, the proposed method improves Precision, Recall, mAP@0.5, and mAP@0.5:0.95 by 5.1, 11.4, 9.6, and 13.2 percentage points, respectively, while reducing the number of parameters and model size by 65.8% and 61.9%, respectively. In the comparative experiments, the proposed method achieves competitive mAP@0.5 and mAP@0.5:0.95 among the compared detectors, reaching 93.9 ± 0.2% and 72.3 ± 0.3%, respectively. These results indicate improved detection accuracy and localization quality under the evaluated UAV infrared inspection setting while maintaining lightweight characteristics.

Several limitations should be acknowledged. The current experiments were conducted on a self-constructed UAV infrared dataset acquired using a single UAV platform and infrared camera at a relatively consistent flight altitude. Thus, the reported results mainly reflect the effectiveness of the proposed method under the evaluated inspection configuration, while its cross-platform, cross-sensor, cross-altitude, and cross-scene generalization still requires further validation. In addition, the runtime evaluation was performed on a GPU-based experimental platform rather than a dedicated onboard edge device. Future work will focus on multi-site data collection, compatible public-dataset validation, edge-platform deployment tests, and robustness improvement under weak low-contrast defects, diffuse abnormal temperature-rise regions, and complex thermal background interference. Prompt learning and foundation-model-based remote sensing interpretation will also be explored to incorporate defect-location priors, thermal anomaly descriptions, and expert guidance for improving cross-scene adaptation in UAV infrared PV inspection.

Author Contributions

Conceptualization, T.H. and Y.Z.; methodology, T.H.; software, T.H.; validation, T.H. and Z.M.; formal analysis, T.H.; investigation, T.H. and Z.M.; resources, Y.Z.; data curation, T.H. and Z.M.; writing—original draft preparation, T.H.; writing—review and editing, Z.M. and Y.Z.; visualization, T.H.; supervision, Y.Z.; project administration, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number U22B2095, and the State Grid Chongqing Electric Power Research Institute, grant number SGCQDK00SBJS2200303. The APC was funded by the above grants.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to project-related restrictions and privacy considerations of the inspected photovoltaic power station.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Jumaboev, S.; Jurakuziev, D.; Lee, M. Photovoltaics Plant Fault Detection Using Deep Learning Techniques. Remote Sens. 2022, 14, 3728.
2. Boubaker, S.; Kamel, S.; Ghazouani, N.; Mellit, A. Assessment of Machine and Deep Learning Approaches for Fault Diagnosis in Photovoltaic Systems Using Infrared Thermography. Remote Sens. 2023, 15, 1686.
3. Mellit, A.; Kalogirou, S. Recent advances in the application of infrared thermographic imaging and embedded artificial intelligence for fault diagnosis and predictive maintenance of photovoltaic plants: Challenges and future directions. Renew. Sustain. Energy Rev. 2025, 223, 116057.
4. de Oliveira, A.K.V.; Aghaei, M.; Rüther, R. Automatic inspection of photovoltaic power plants using aerial infrared thermography: A review. Energies 2022, 15, 2055.
5. Pruthviraj, U.; Kashyap, Y.; Baxevanaki, E.; Kosmopoulos, P. Solar Photovoltaic Hotspot Inspection Using Unmanned Aerial Vehicle Thermal Images at a Solar Field in South India. Remote Sens. 2023, 15, 1914.
6. Sabry, A.H.; Bıyıkoğlu, A.; Çamdali, U. A comprehensive review of unmanned aerial vehicle-based thermal imaging and deep learning for PV power plant anomaly detection and performance assessment. Eng. Appl. Artif. Intell. 2026, 163, 113070.
7. Li, R.; Yan, W.; Xia, C. Dual-Branch Diffusion Detection Model for Photovoltaic Array and Hotspot Defect Detection in Infrared Images. Remote Sens. 2025, 17, 1084.
8. Kuo, C.-F.J.; Chen, S.-H.; Huang, C.-Y. Automatic detection, classification and localization of defects in large photovoltaic plants using unmanned aerial vehicles (UAV) based infrared (IR) and RGB imaging. Energy Convers. Manag. 2023, 276, 116495.
9. Mellit, A. An embedded solution for fault detection and diagnosis of photovoltaic modules using thermographic images and deep convolutional neural networks. Eng. Appl. Artif. Intell. 2022, 116, 105459.
10. Hong, F.; Song, J.; Meng, H.; Wang, R.; Fang, F.; Zhang, G. A novel framework on intelligent detection for module defects of PV plant combining the visible and infrared images. Sol. Energy 2022, 236, 406–416.
11. Wang, B.; Chen, Q.; Wang, M.; Chen, Y.; Zhang, Z.; Liu, X.; Gao, W.; Zhang, Y.; Zhang, H. PVF-10: A high-resolution unmanned aerial vehicle thermal infrared image dataset for fine-grained photovoltaic fault classification. Appl. Energy 2024, 376, 124187.
12. Prasshanth, C.; Narayanan, S.B.; Sridharan, N.V.; Vaithiyanathan, S. Fault detection in photovoltaic systems using unmanned aerial vehicle-captured images and rough set theory. Sol. Energy 2025, 290, 113348.
13. Li, J.; Tong, T.; Li, D.; Yuan, X.; Liu, P.; Zhang, J.; Zhu, X.; Zhao, D.; Fang, H. A dynamically adaptive and high-efficiency small object detection network for infrared thermographic images in online monitoring of solar photovoltaic panel defects. Energy 2025, 335, 138129.
14. Hong, Y.; Pan, R.; Su, J.; Li, M. Infrared image detection of defects in lightweight solar panels based on improved MSRCR and YOLOv8n. Infrared Phys. Technol. 2024, 141, 105473.
15. Xie, H.; Yuan, B.; Hu, C.; Gao, Y.; Wang, F.; Wang, C.; Wang, Y.; Chu, P. ST-YOLO: A defect detection method for photovoltaic modules based on infrared thermal imaging and machine vision technology. PLoS ONE 2024, 19, e0310742.
16. Ma, Z.; Guo, H.; Wen, H.; Cao, Y. LFS-YOLO: A PV panel defect detection algorithm for drone infrared sensors. IEEE Sens. J. 2025, 25, 19592–19601.
17. Chen, K.; Liu, C.; Chen, H.; Zhang, H.; Li, W.; Zou, Z.; Shi, Z. RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–17.
18. Liu, C.; Zhao, R.; Chen, J.; Qi, Z.; Zou, Z.; Shi, Z. A decoupling paradigm with prompt learning for remote sensing image change captioning. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–18.
19. Guo, Y.; Wang, X.; Lin, Z. RFE-YOLO: A study on photovoltaic module fault detection algorithm based on multimodal feature fusion. Sensors 2025, 25, 6774.
20. Hong, Y.; Wang, L.; Su, J.; Li, Y.; Fang, S.; Li, W.; Li, M.; Wang, H. CEMP-YOLO: An infrared overheat detection model for photovoltaic panels in UAVs. Digit. Signal Process. 2025, 161, 105072.
21. Pan, W.; Sun, X.; Wang, Y.; Cao, Y.; Lang, Y.; Qian, Y. Enhanced photovoltaic panel defect detection via adaptive complementary fusion in YOLO-ACF. Sci. Rep. 2024, 14, 26425.
22. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios. Sensors 2023, 23, 7190.
23. Zhang, R.; Deng, B.; Cheng, X.; Zhao, H. GCS-YOLOv8: A Lightweight Face Extractor to Assist Deepfake Detection. Sensors 2024, 24, 6781.
24. Hu, J.-F.; Sun, J.; Lin, Z.; Lai, J.-H.; Zeng, W.; Zheng, W.-S. APANet: Auto-Path Aggregation for Future Instance Segmentation Prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3386–3403.
25. Chen, C.; Chen, Z.; Li, H.; Wang, Y.; Lei, G.; Wu, L. Research on Defect Detection in Lightweight Photovoltaic Cells Using YOLOv8-FSD. Sensors 2025, 25, 843.
26. Ding, S.; Jing, W.; Chen, H.; Chen, C. YOLO-Based Defects Detection Algorithm for EL in PV Modules with Focal and Efficient IoU Loss. Appl. Sci. 2024, 14, 7493.

Figure 1. YOLOv8n network architecture.

Figure 2. Architecture of the improved detection head with the P2 branch.

Figure 3. Principle of GhostConv design.

Figure 4. LSKA architecture.

Figure 5. The structure of LSKA.

Figure 6. C2f-LSKA architecture.

Figure 7. Loss function parameter diagram.

Figure 8. Improved YOLOv8 network architecture diagram.

Figure 9. Partial UAV infrared inspection images and thermal defect samples.

Figure 10. Normalized comparison of evaluation metrics between YOLOv8n and the improved model.

Figure 11. Training-process comparison between YOLOv8n and the proposed method. (a) Precision curves during training; (b) mAP@0.5 curves during training.

Figure 12. Qualitative side-by-side comparison of detection results produced by different models on representative UAV infrared inspection images. From top to bottom, the four rows correspond to small hot spots, diffuse abnormal temperature-rise regions, complex background interference, and low-contrast thermal defects, respectively.

Table 1. Integration strategy of the proposed modules in the adopted YOLOv8n framework.

Module	Main Motivation	Integration Position	Main Role
P2-based detection head	Preserve fine-grained shallow thermal features for small defects	Head	Improve small-target detection
GhostConv	Reduce computational complexity	Backbone	Improve lightweight deployment efficiency
C2f-LSKA	Enhance contextual representation under complex backgrounds	Neck	Improve robustness and feature discrimination
WIoUv3	Improve localization for ambiguous small thermal defects	Training loss	Improve bounding-box regression stability

Table 2. Definition criteria and image numbers of representative challenging attributes in the constructed dataset.

Challenging Attribute	Definition Criterion	Parameter Range or Criterion	Number of Images
Small hot spots	A_r = w_b h_b/(WH)	A_r ≤ 0.005	1270
Low-contrast thermal defects	C_n = |I_d − I_b|/255	C_n ≤ 0.10	820
Complex background interference	Background interference criterion	Complex PV textures, shadows, ground thermal patterns, or non-defect thermal responses near the defect region	1040
Diffuse abnormal temperature-rise regions	Relative area and boundary-diffusion criterion	A_r > 0.005 with gradual thermal transition or visually ambiguous boundary	710

Note: W and H denote the width and height of the resized input image, respectively; w_b and h_b denote the width and height of the annotated bounding box; A_r denotes the relative area ratio of the defect box; I_d denotes the mean gray-level intensity inside the annotated defect region; I_b denotes the mean gray-level intensity of the surrounding local module background; and C_n denotes the normalized local contrast.
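The two quantitative criteria in Table 2 can be illustrated with a minimal Python sketch. The function names, the tag strings, and the 640 × 640 input size in the usage lines are assumptions made for illustration only; the thresholds 0.005 and 0.10 are taken from Table 2. The qualitative criteria (complex background interference and boundary diffusion) are not modeled here.

```python
# Minimal sketch of the two quantitative criteria in Table 2.
# Function names and tag strings are hypothetical; thresholds come from Table 2.

SMALL_AREA_MAX = 0.005    # A_r <= 0.005 -> small hot spot
LOW_CONTRAST_MAX = 0.10   # C_n <= 0.10  -> low-contrast thermal defect


def relative_area_ratio(w_b, h_b, W, H):
    """A_r = (w_b * h_b) / (W * H): box area relative to the resized image area."""
    return (w_b * h_b) / (W * H)


def normalized_contrast(I_d, I_b):
    """C_n = |I_d - I_b| / 255 for mean 8-bit gray levels of defect and background."""
    return abs(I_d - I_b) / 255.0


def tag_attributes(w_b, h_b, W, H, I_d, I_b):
    """Return the quantitative attribute tags that apply to one annotated box."""
    tags = []
    if relative_area_ratio(w_b, h_b, W, H) <= SMALL_AREA_MAX:
        tags.append("small_hot_spot")
    if normalized_contrast(I_d, I_b) <= LOW_CONTRAST_MAX:
        tags.append("low_contrast")
    return tags


if __name__ == "__main__":
    # Assumed example: a 40 x 50 px box in a 640 x 640 image,
    # mean defect gray level 130, mean background gray level 110.
    print(relative_area_ratio(40, 50, 640, 640))
    print(normalized_contrast(130, 110))
    print(tag_attributes(40, 50, 640, 640, 130, 110))
```

In this assumed example the box satisfies both criteria, so a single annotation can carry more than one attribute tag.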

Table 3. Experimental environment configuration.

Name	Configuration
Operating System	Linux
Development Environment	CUDA 11.1
Processor (CPU)	Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50 GHz
Graphics Card (GPU)	RTX 2080 Ti (11 GB)
Epochs	150
Batch size	8

Table 4. Results of ablation experiments.

Model Number	+Tiny Head	-Large Head	GhostConv	C2f-LSKA	WIoUv3	P/%	R/%	mAP@0.5/%	mAP@0.5:0.95/%	Parameters	Model Size/MB	GFLOPs	FPS/(f·s⁻¹)
1	×	×	×	×	×	87.2	79.5	84.3 ± 0.4	59.1 ± 0.5	3,007,453	6.3	8.6	130
2	√	×	×	×	×	89.2	88.5	92.2 ± 0.3	69.0 ± 0.4	3,085,436	6.4	12.2	115
3	√	×	√	×	×	89.5	88.7	92.5 ± 0.3	69.7 ± 0.4	2,138,698	4.5	9.0	118
4	√	√	×	×	×	87.4	87.3	90.9 ± 0.4	66.6 ± 0.5	1,784,452	3.8	8.1	126
5	√	√	√	×	×	89.7	89.5	92.7 ± 0.3	70.1 ± 0.4	1,224,713	2.8	6.5	124
6	√	√	×	P	×	89.6	88.4	92.1 ± 0.3	68.7 ± 0.4	1,405,261	3.1	8.8	118
7	√	√	×	F	×	89.8	88.7	92.0 ± 0.3	68.8 ± 0.4	1,350,481	3.0	9.6	116
8	√	√	√	P	×	91.3	89.1	92.8 ± 0.3	70.5 ± 0.3	1,029,853	2.4	7.1	121
9	√	√	√	P	√	92.3	90.9	93.9 ± 0.2	72.3 ± 0.3	1,029,853	2.4	7.1	121

Note: The reported mAP@0.5 and mAP@0.5:0.95 values are presented as mean ± standard deviation over three independent runs with different random seeds. “√” denotes that the corresponding component is used, whereas “×” denotes that it is not used. In the C2f-LSKA column, “P” denotes partial replacement of C2f modules in the neck, and “F” denotes full replacement of all C2f modules in the neck. The best value in each metric column is shown in bold.
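The headline gains reported in the Abstract can be reproduced from rows 1 and 9 of Table 4. The following Python sketch is only an arithmetic check; the dictionary keys and variable names are chosen here for illustration, and the mAP entries use the reported means without the standard deviations.

```python
# Arithmetic check of the baseline (row 1) versus final model (row 9) in Table 4.
# Key names are hypothetical; the numbers are copied from the table.

baseline = {"P": 87.2, "R": 79.5, "mAP50": 84.3, "mAP50_95": 59.1,
            "params": 3_007_453, "size_mb": 6.3}
proposed = {"P": 92.3, "R": 90.9, "mAP50": 93.9, "mAP50_95": 72.3,
            "params": 1_029_853, "size_mb": 2.4}

# Accuracy metrics: absolute differences in percentage points.
gains_pp = {k: round(proposed[k] - baseline[k], 1)
            for k in ("P", "R", "mAP50", "mAP50_95")}

# Complexity metrics: relative reductions in percent.
param_reduction = round(100 * (1 - proposed["params"] / baseline["params"]), 1)
size_reduction = round(100 * (1 - proposed["size_mb"] / baseline["size_mb"]), 1)

if __name__ == "__main__":
    print(gains_pp)
    print(param_reduction, size_reduction)
```

The differences come out to 5.1, 11.4, 9.6, and 13.2 percentage points, and the reductions to 65.8% and 61.9%, matching the values stated in the Abstract.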

Table 5. Results of comparative experiments.

Models	mAP@0.5/%	mAP@0.5:0.95/%	Parameters	Model Size/MB	GFLOPs	FPS/(f·s⁻¹)
Faster R-CNN	89.8 ± 0.4	65.0 ± 0.6	47,267,456	180.7	121	17.45
SSD	73.6 ± 0.6	47.8 ± 0.7	23,745,908	90.6	32.8	17.53
YOLOv3-tiny	74.4 ± 0.5	49.5 ± 0.8	12,128,178	24.4	13.2	169.49
YOLOv5n	82.7 ± 0.4	57.2 ± 0.6	2,503,139	5.3	4.5	125.00
YOLOv6n	83.6 ± 0.5	58.8 ± 0.6	4,233,843	8.7	11.2	142.85
YOLOv8n	84.3 ± 0.4	59.1 ± 0.5	3,007,453	6.3	8.6	129.87
YOLOv10n	86.5 ± 0.3	62.4 ± 0.5	2,781,368	5.8	6.7	130.33
YOLO11n	87.3 ± 0.3	63.1 ± 0.5	2,624,389	5.6	6.5	127.45
YOLOS-PV	94.6 ± 0.4	73.2 ± 0.5	46,571,268	178	90	20.2
Ours	93.9 ± 0.2	72.3 ± 0.3	1,029,853	2.4	7.1	121.00

Note: The best value in each metric column is shown in bold.
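One way to read Table 5 is to ask which compared models are simultaneously less accurate and larger than the proposed model. The Python sketch below is an illustrative reading aid rather than part of the reported evaluation: the dominance test, the variable names, and the restriction to three columns (mean mAP@0.5, mean mAP@0.5:0.95, parameter count) are assumptions made here, and the standard deviations are ignored.

```python
# Illustrative dominance check on Table 5 (mean values only).
# Each entry: (mAP@0.5 in %, mAP@0.5:0.95 in %, number of parameters).

TABLE5 = {
    "Faster R-CNN": (89.8, 65.0, 47_267_456),
    "SSD": (73.6, 47.8, 23_745_908),
    "YOLOv3-tiny": (74.4, 49.5, 12_128_178),
    "YOLOv5n": (82.7, 57.2, 2_503_139),
    "YOLOv6n": (83.6, 58.8, 4_233_843),
    "YOLOv8n": (84.3, 59.1, 3_007_453),
    "YOLOv10n": (86.5, 62.4, 2_781_368),
    "YOLO11n": (87.3, 63.1, 2_624_389),
    "YOLOS-PV": (94.6, 73.2, 46_571_268),
}
OURS = (93.9, 72.3, 1_029_853)


def is_dominated(row, ref):
    """True if ref is at least as accurate on both mAP metrics and no larger."""
    return row[0] <= ref[0] and row[1] <= ref[1] and row[2] >= ref[2]


dominated = [name for name, row in TABLE5.items() if is_dominated(row, OURS)]
not_dominated = [name for name in TABLE5 if name not in dominated]

# Trade-off against the only model that is not dominated.
map50_gap_pp = round(TABLE5["YOLOS-PV"][0] - OURS[0], 1)
param_ratio = round(TABLE5["YOLOS-PV"][2] / OURS[2], 1)

if __name__ == "__main__":
    print(dominated)
    print(not_dominated, map50_gap_pp, param_ratio)
```

Under these assumptions, only YOLOS-PV is not dominated: it leads by 0.7 percentage points in mAP@0.5 while using roughly 45 times as many parameters.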

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

He, T.; Mao, Z.; Zhong, Y. An Improved YOLOv8n Method for Small Thermal Defect Detection of Photovoltaic Modules in UAV Infrared Inspection. Remote Sens. 2026, 18, 1986. https://doi.org/10.3390/rs18121986

