1. Introduction
Growing concerns over energy security and the accelerating global transition to clean energy have positioned photovoltaic (PV) power generation as a cornerstone of the low-carbon energy system. According to the 2024 BP World Energy Outlook [1] and the World Energy Statistics Yearbook 2024 [2], newly installed solar and wind capacity reached 276 GW globally in 2023, of which 75% (346 GW) came from solar. Cumulative PV capacity reached 1412.09 GW, corresponding to a year-on-year increase of 32.4%.
PV panels operate outdoors for extended periods and are therefore susceptible to environmental stressors, which generate typical defects such as hotspots, partial shading, and bypass-diode failures [3,4]. These defects reduce generation efficiency and may cause thermal runaway or fire. Efficient and low-cost defect-detection technology is thus critical for maintaining plant performance over its lifetime. Existing approaches fall broadly into two categories: (i) monitoring based on electrical characteristics (output voltage, current, and power) [5,6,7] and (ii) detection based on infrared image analysis. Electrical methods require thousands of sensors for large plants, leading to prohibitive cost and declining efficiency at scale. Image-based methods, in contrast, are non-contact and deployment-friendly, and have become the mainstream research direction.
Traditional computer-vision approaches have been widely investigated. Tsanakas et al. [8] applied Canny edge detection to thermal images for hotspot identification. Ngo et al. [9] combined K-means clustering with DBSCAN for contour extraction, but edge information is easily obscured in cluttered backgrounds. Chen et al. [10] employed single-channel thresholding and colour-space statistics, which improved generalisation but still suffered from information loss due to the statistical descriptors. With the rise of deep learning, Wu et al. [11] adopted an improved LeNet-5 for fault classification, and Greco et al. [12] applied the YOLO framework across 18 datasets. Deep models outperform traditional pipelines in feature representation but typically require millions of parameters [13], imposing significant hardware costs.
Existing methods therefore face a dual bottleneck: traditional algorithms lack accuracy under complex backgrounds, whereas deep networks are too heavy for edge deployment; moreover, few studies achieve component-level localisation beyond coarse bounding-box detection. More specifically, traditional pipelines rely on hand-crafted thresholds whose stability degrades when irradiance, viewing angle, or array layout changes between sites, while deep detectors that achieve state-of-the-art accuracy on benchmark PV datasets (e.g., YOLO-based and Faster R-CNN-based variants) generally adopt horizontal bounding boxes that include background pixels from neighbouring arrays, raising both false-detection and miss rates in densely arranged plants. In addition, most published works report only image-level or array-level outputs and do not return the row–column index of the faulty module that field crews actually need for on-site replacement, which leaves a concrete gap between reported accuracy and operational utility.
To address these gaps, this study proposes a two-stage automatic defect-recognition method that combines a lightweight deep detector with traditional image-processing cues, followed by a component-level localisation strategy. The main contributions are summarised as follows:
(1) A UAV-based infrared inspection platform is established and a large-scale PV infrared image dataset is constructed under standardised conditions and released for non-commercial research use.
(2) An improved rotating-box detector, YOLO-CLO, is built upon YOLOv8-OBB: a lightweight C3m module replaces the original C2f module, and a shared-convolution LSCD-OBB detection head is introduced. The detector enables real-time, high-precision extraction of individual PV arrays and is complemented by a multi-feature thresholding pipeline for defect classification.
(3) A component-level row–column localisation method is proposed, integrating UAV GNSS metadata, the Hough transform, and an improved K-means clustering scheme, enabling pixel-to-module mapping within any detected faulty array.
Compared with prior YOLO-based PV detectors and Hough-based line-extraction methods, the novelty of this work lies in three specific design choices rather than in the use of these building blocks themselves: (i) the C3m module re-engineers the C2f/C3 design by replacing one CBS sub-block with a plain convolution and inserting a Mish activation after branch fusion, giving a different parameter–accuracy operating point from existing C2f variants; (ii) the LSCD-OBB head shares the two 3 × 3 Conv_GN blocks across the regression and classification branches while keeping the angle branch independent, and introduces a learnable per-head scale factor to compensate for the resulting feature-scale mismatch, a configuration not adopted in standard YOLOv8-OBB; and (iii) the localisation module couples UAV-logged GNSS metadata, Hough-line extraction and prior-knowledge-initialised K-means clustering into a unified pipeline that returns absolute array coordinates and intra-array row–column indices, addressing the operational gap identified above.
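To make the parameter saving of branch sharing in design choice (ii) concrete, the count below is an idealised illustration under hypothetical assumptions: each Conv_GN block is a bias-free 3 × 3 convolution with C input and C output channels followed by a GroupNorm layer with affine weight and bias, both branches have the same width C, and one learnable scale scalar is added per detection head. The 1 × 1 prediction layers and the independent angle branch are identical in both designs and are omitted.

$$p_{\text{block}} = 9C^2 + 2C, \qquad p_{\text{separate}} = 4\,p_{\text{block}}, \qquad p_{\text{shared}} = 2\,p_{\text{block}} + 1$$

$$\Delta p = p_{\text{separate}} - p_{\text{shared}} = 2\,(9C^2 + 2C) - 1$$

For C = 64, p_block = 36 992 and the saving is 73 983 parameters per head, i.e. roughly half of the 3 × 3 parameters in the two branches. In the stock YOLOv8 head the regression and classification branches generally have different widths, so the actual saving will differ from this idealised figure.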
4. Component-Level Defect Localisation
4.1. Overall Localisation Framework
The proposed localisation strategy comprises two sub-processes: array-level GNSS positioning and module-level row–column indexing. Array-level positioning takes the faulty-array image and the plant ledger as inputs. The cropped array is first mapped back to the original infrared image; the pixel offset between the array centroid and the image centre is computed; the UAV-logged GNSS of the image centre is retrieved; and the ledger entry whose GNSS offset to the image centre matches the pixel offset identifies the faulty array’s absolute position.
Module-level indexing requires only the faulty-array image. Component connection lines are detected and fitted to partition the array into rows and columns, yielding a module index matrix and a module-vertex coordinate matrix. Matching the defect pixel coordinates against these matrices returns the row–column index of the defective module, achieving component-level defect localisation. The integrated localisation workflow is shown in Figure 11.
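The final matching step can be sketched as a lookup of the defect pixel among sorted segmentation-line positions. This is a minimal sketch, not the authors' implementation: it assumes an upright (de-rotated) array crop in which the segmentation lines are axis-aligned and the array's outer edges are included among the lines; the function names module_index and module_vertices are hypothetical.

```python
from bisect import bisect_right


def module_index(defect_uv, row_lines, col_lines):
    """Map a defect pixel (u, v) to a 1-based (row, column) module index.

    row_lines: y-positions of horizontal segmentation lines, including the
    array's top and bottom edges; col_lines: x-positions of vertical lines,
    including the left and right edges. Returns None outside the grid.
    """
    u, v = defect_uv
    rows, cols = sorted(row_lines), sorted(col_lines)
    if not (rows[0] <= v < rows[-1] and cols[0] <= u < cols[-1]):
        return None
    # bisect_right counts the boundaries at or before the pixel, which equals
    # the 1-based index of the grid cell that contains it.
    return bisect_right(rows, v), bisect_right(cols, u)


def module_vertices(row_lines, col_lines):
    """Vertex-coordinate matrix: entry [i][j] is the (x, y) grid intersection."""
    rows, cols = sorted(row_lines), sorted(col_lines)
    return [[(x, y) for x in cols] for y in rows]
```

For example, with row lines at y = 0, 50, 100 and column lines at x = 0, 40, 80, 120, a defect at pixel (50, 60) falls in row 2, column 2. A pixel lying exactly on an internal boundary is assigned to the lower-right module, which is an arbitrary convention of this sketch.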
4.2. GNSS Matching of Faulty PV Arrays
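The GNSS metadata logged by the UAV (latitude, longitude, altitude, camera attitude) are parsed directly from the infrared image header. With the top-left corner of the image as the origin, east as the +x direction and south as the +y direction, let the image-centre pixel coordinates be (u_c, v_c) and its geographic coordinates be (lat_c, lon_c). Let the centroid pixel coordinates of the faulty array be (u_a, v_a). The pixel distance and azimuth (measured clockwise from north) of the array centre relative to the image centre are then given by:
$$d_{\text{pixel}} = \sqrt{(u_a - u_c)^2 + (v_a - v_c)^2}, \qquad \varphi_{\text{pixel}} = \operatorname{atan2}\left(u_a - u_c,\; v_c - v_a\right)$$
For each candidate array in the ledger with centroid coordinates (lat_i, lon_i), the geographic distance d_geo to the image centre is computed using the haversine formula:
$$d_{\text{geo}} = 2R \arcsin\sqrt{\sin^2\frac{\Delta lat}{2} + \cos lat_c \,\cos lat_i \,\sin^2\frac{\Delta lon}{2}}$$
where R = 6371 km is the Earth's radius, Δlat = lat_i − lat_c and Δlon = lon_i − lon_c. The corresponding azimuth (initial bearing from the image centre to the candidate array) is:
$$\varphi_{\text{geo}} = \operatorname{atan2}\left(\sin\Delta lon \,\cos lat_i,\; \cos lat_c \sin lat_i - \sin lat_c \cos lat_i \cos\Delta lon\right)$$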
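The ledger entry whose (d_geo, φ_geo) best matches (d_pixel, φ_pixel) within a pre-set tolerance, after d_pixel has been converted to a ground distance using the ground sampling distance implied by the flight altitude, is identified as the faulty array, completing array-level GNSS localisation.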
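The matching step can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes a nadir-looking camera with a north-up image, a known ground sampling distance (gsd_m_per_px, hypothetical) for converting d_pixel into metres, and a ledger held as a dict from array id to centroid (lat, lon); the tolerances and the combined cost are hypothetical choices.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # R = 6371 km, as in the text


def pixel_offset(centre_px, array_px):
    """Pixel distance and azimuth (deg clockwise from north) of the array centroid.

    Image axes: +x points east and +y points south, so the northward component is -dv.
    """
    du = array_px[0] - centre_px[0]
    dv = array_px[1] - centre_px[1]
    return math.hypot(du, dv), math.degrees(math.atan2(du, -dv)) % 360.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Forward azimuth from point 1 to point 2, in degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def match_array(centre_ll, centre_px, array_px, ledger, gsd_m_per_px,
                dist_tol_m=2.0, az_tol_deg=10.0):
    """Return the ledger id whose (d_geo, phi_geo) best matches the pixel offset."""
    d_pix, phi_pix = pixel_offset(centre_px, array_px)
    d_expected = d_pix * gsd_m_per_px  # hypothetical GSD conversion to metres
    best_id, best_cost = None, float("inf")
    for array_id, (lat_i, lon_i) in ledger.items():
        d_err = abs(haversine_m(*centre_ll, lat_i, lon_i) - d_expected)
        phi_geo = initial_bearing_deg(*centre_ll, lat_i, lon_i)
        # Wrap the angular difference into [0, 180] degrees.
        az_err = abs((phi_geo - phi_pix + 180.0) % 360.0 - 180.0)
        if d_expected < dist_tol_m:
            az_err = 0.0  # azimuth is ill-defined for arrays near the image centre
        if d_err <= dist_tol_m and az_err <= az_tol_deg:
            cost = d_err / dist_tol_m + az_err / az_tol_deg
            if cost < best_cost:
                best_id, best_cost = array_id, cost
    return best_id
```

With two hypothetical ledger entries placed 20 m east and 20 m north of the image centre, an array centroid 100 px to the right of the centre at 0.2 m/px matches the eastern entry, because both candidates agree in distance but only one agrees in azimuth. This is also why using the azimuth alongside the distance helps to separate adjacent arrays.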
4.3. Module-Level Row–Column Localisation
A single PV array typically occupies around 38 m² in utility-scale plants, so array-level GNSS positioning alone is insufficient to guide rapid on-site repair. To refine the localisation, an improved Hough-line detector combined with prior-knowledge-initialised K-means clustering extracts the array-segmentation lines, from which the module index matrix and the module-vertex coordinate matrix are constructed. Matching the defect pixel coordinates against these matrices returns the exact row–column index of each defective module.
(1) Hough-Line Detection with High/Low-Frequency Enhancement. The Hough transform [22] is widely used for line detection and is well suited to PV array segmentation. The conventional pre-processing (grayscale conversion and denoising) is augmented here by enhancing both frequency bands: adaptive local histogram equalisation first amplifies high-frequency details, and a second equalisation emphasises low-frequency content. The two results are fused with equal weights of 0.5 to balance detail preservation against noise suppression. Image-space points are then transformed into parameter-space curves, a voting process in parameter space accumulates line candidates, and threshold-based extraction yields the final lines, which are drawn back in image space.
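For reference, the voting stage can be sketched in pure Python using the standard parameterisation ρ = x cos θ + y sin θ. The sketch covers only the accumulator and a simple greedy non-maximum suppression over binarised edge points; the frequency enhancement, denoising and edge extraction are omitted, and the vote threshold and suppression window are hypothetical parameters rather than values from this study.

```python
import math
from collections import defaultdict


def hough_lines(points, theta_step_deg=1.0, vote_threshold=10, nms_window=(3, 2)):
    """Standard (rho, theta) Hough voting over edge points with greedy peak picking.

    points: iterable of (x, y) edge-pixel coordinates.
    Returns a list of (theta_deg, rho, votes), strongest first.
    """
    n_theta = int(round(180.0 / theta_step_deg))
    trig = [(math.cos(math.radians(i * theta_step_deg)),
             math.sin(math.radians(i * theta_step_deg))) for i in range(n_theta)]
    acc = defaultdict(int)  # (theta_index, rho) -> vote count
    for x, y in points:
        # Each image-space point becomes a sinusoid in parameter space.
        for t_idx, (c, s) in enumerate(trig):
            acc[(t_idx, round(x * c + y * s))] += 1

    peaks = []
    for (t_idx, rho), votes in sorted(acc.items(), key=lambda kv: -kv[1]):
        if votes < vote_threshold:
            break  # cells are sorted, so all remaining cells have fewer votes
        # Non-maximum suppression: skip cells close to an already accepted peak.
        if any(abs(t_idx - pt) <= nms_window[0] and abs(rho - pr) <= nms_window[1]
               for pt, pr, _ in peaks):
            continue
        peaks.append((t_idx, rho, votes))
    return [(t_idx * theta_step_deg, rho, votes) for t_idx, rho, votes in peaks]
```

Even with suppression, one physical line can yield more than one peak (for instance θ ≈ 0° with ρ = 3 and θ ≈ 179° with ρ = −3 for the same vertical line), which illustrates why a subsequent clustering step is useful for merging near-duplicate candidates.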
(2) Prior-Knowledge K-means Clustering. Given the regular geometry of PV arrays, the cluster numbers are fixed: K_x = 3 for horizontal lines and K_y = 14 for vertical lines. Standard K-means [23] initialises the cluster centres randomly and can therefore converge to local optima. An improved initialisation is adopted instead: the image height is divided into three equal intervals whose centres serve as the initial horizontal-line centres, and the image width is divided into fourteen equal intervals whose centres serve as the initial vertical-line centres. Each data point x_j is then assigned to the cluster C_k of the nearest centre μ_k by Euclidean distance, and each centre is updated as the mean of its members:
$$k^{*}(x_j) = \arg\min_{k} \lVert x_j - \mu_k \rVert_2, \qquad \mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_j \in C_k} x_j$$
Iteration continues until the cluster centres stabilise or the maximum iteration count is reached. The averaged line segments of each cluster yield the final segmentation lines that define the PV module grid. The fixed cluster numbers K_x = 3 and K_y = 14 reflect the module layout of the PV arrays in the present dataset. For PV plants with a different number of modules per array, these values are not transferable as-is, which limits the direct generalisation of the current implementation. Two practical adaptations are possible without changing the underlying algorithm. First, since plant operators typically know the per-array module layout in advance, K_x and K_y can be read from the plant ledger together with the array centroid coordinates already used for GNSS matching, and the same prior-knowledge initialisation applies. Second, for arbitrary or unknown layouts, K_x and K_y can be estimated automatically, either by counting the dominant peaks in the histogram of detected line positions along each axis or by selecting K with an internal-validity criterion (e.g., the elbow point of the within-cluster sum of squared distances, or the silhouette score), at the cost of additional computation. A data-driven K-selection mechanism along these lines is left for future work and is revisited in the Discussion.
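The prior-knowledge initialisation and the subsequent Lloyd iterations can be sketched as follows, under the assumption that each detected line has been reduced to one scalar position (its y-position for horizontal lines, its x-position for vertical lines), so that the Euclidean distance reduces to a 1-D absolute difference. The helper names are hypothetical, and wcss is included only to illustrate the elbow criterion mentioned above.

```python
def prior_init_centres(extent, k):
    """Centres of k equal intervals over [0, extent): the prior-knowledge initialisation."""
    step = extent / k
    return [step * (i + 0.5) for i in range(k)]


def kmeans_1d(values, centres, max_iter=100, tol=1e-6):
    """Lloyd iterations on scalar line positions, starting from the given centres."""
    centres = list(centres)
    for _ in range(max_iter):
        clusters = [[] for _ in centres]
        for x in values:
            # Assignment step: nearest centre (Euclidean distance in 1-D).
            nearest = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)
        # Update step: mean of members; an empty cluster keeps its previous centre.
        updated = [sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)]
        shift = max(abs(a - b) for a, b in zip(updated, centres))
        centres = updated
        if shift < tol:
            break  # centres have stabilised
    return centres


def wcss(values, centres):
    """Within-cluster sum of squares, usable for an elbow-style search over K."""
    return sum(min((x - c) ** 2 for c in centres) for x in values)
```

For an image of height H and width W, the horizontal centres would start from prior_init_centres(H, 3) and the vertical centres from prior_init_centres(W, 14). Because the starting centres already sit near the expected line positions, the assignment tends to stabilise within a few iterations, and the result is deterministic for a given input.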
4.4. Localisation Results
Representative diagnostic-report outputs are shown in Figure 12. The row–column indices of the identified modules match the ground truth, confirming that the proposed localisation strategy accurately maps image-plane defect coordinates to physical module indices within an array.
A quantitative evaluation of the localisation accuracy was carried out on the test-set images that contain identified defective modules. Two aspects were assessed: (i) array-level GNSS matching, which compares the array identifier returned by the haversine-based ledger query in Section 4.2 against the ground-truth identifier recorded during data collection; and (ii) module-level row–column indexing, which compares the (row, column) tuple returned by the Hough + K-means grid against manual annotation. Across the evaluated cases, the proposed pipeline returned the correct array identifier for the large majority of faulty arrays, with failures concentrated in images where two adjacent arrays produced ledger entries within the matching tolerance; the dominant error mode for module-level indexing was an off-by-one error in the column direction, caused by missing inter-module gap lines at array edges that bias the K-means cluster centres. A larger-scale quantitative localisation study with per-row and per-column accuracy and a breakdown of GNSS-matching errors will be reported in follow-up work as additional ground-truth annotations become available.
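The two accuracy measures described above, together with the dominant column off-by-one error mode, can be computed with a small scoring helper such as the following. It is a hypothetical sketch: the record layout (LocalisationCase) is an assumption, the example records in any usage are invented for illustration, and it does not reproduce any result from this study.

```python
from typing import NamedTuple, Tuple


class LocalisationCase(NamedTuple):
    pred_array: str           # array id returned by the ledger query
    true_array: str           # ground-truth array id recorded during data collection
    pred_rc: Tuple[int, int]  # (row, column) returned by the Hough + K-means grid
    true_rc: Tuple[int, int]  # manually annotated (row, column)


def localisation_scores(cases):
    """Array-level and module-level accuracy plus the column off-by-one rate.

    Assumes a non-empty list of cases.
    """
    n = len(cases)
    array_ok = sum(c.pred_array == c.true_array for c in cases)
    module_ok = sum(c.pred_rc == c.true_rc for c in cases)
    # Same row, column shifted by exactly one module.
    col_off_by_one = sum(
        c.pred_rc[0] == c.true_rc[0] and abs(c.pred_rc[1] - c.true_rc[1]) == 1
        for c in cases
    )
    return {
        "array_accuracy": array_ok / n,
        "module_accuracy": module_ok / n,
        "column_off_by_one_rate": col_off_by_one / n,
    }
```

Reporting the off-by-one rate separately from exact-match accuracy makes the edge-related bias of the K-means centres visible, since such errors would otherwise be indistinguishable from gross mislocalisations.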
5. Discussion
The proposed framework combines a lightweight rotating detector with traditional image-processing cues and a hybrid GNSS–Hough–K-means localisation strategy, addressing three recurrent bottlenecks in PV defect inspection: background noise introduced by horizontal bounding boxes, the heavy computational cost of deep detectors, and the absence of component-level localisation in existing pipelines. Relative to the YOLOv8-OBB baseline, YOLO-CLO has 25.31% fewer parameters and requires 19.73% fewer GFLOPs while increasing FPS by 17.37%, indicating that the C3m module and the shared-convolution LSCD-OBB head deliver a favourable accuracy–efficiency trade-off for edge deployment.
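As a consistency check, combining these relative changes with the absolute YOLO-CLO figures reported in the Conclusions (8.52 M parameters, 23.6 GFLOPs, 59.88 FPS) gives the implied baseline values. This back-calculation assumes that the reductions and the FPS gain are all expressed relative to the baseline; the resulting baseline numbers are derived here, not separately reported.

$$P_{\text{base}} = \frac{8.52\ \text{M}}{1 - 0.2531} \approx 11.41\ \text{M}, \qquad G_{\text{base}} = \frac{23.6}{1 - 0.1973} \approx 29.40\ \text{GFLOPs}, \qquad \text{FPS}_{\text{base}} = \frac{59.88}{1 + 0.1737} \approx 51.02$$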
The multi-feature defect-detection stage benefits from the explicit physical priors of PV thermography: hotspots and diode failures exceed the upper temperature bound derived from the array mean, whereas obstructions fall below the lower bound and preserve the geometric outline of the occluding object. The combination of gradient, grayscale, temperature, and morphological features is therefore discriminative enough to reach 100% accuracy on diode failures and 96.97% on hotspots without a learning-based classifier. The lower obstruction accuracy (88.89%) is attributable to the wider shape and size variability of occluders, which occasionally pushes them past the fixed irregularity-index threshold; adaptive thresholding remains a promising direction.
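The bound test underlying this reasoning can be sketched as a per-pixel comparison against the array mean. The sketch assumes fixed thermal differences above and below the mean; delta_hot and delta_cold are hypothetical placeholders rather than the thresholds of Section 3.3, and the subsequent gradient, grayscale and morphological checks (including the irregularity index) are not shown.

```python
from statistics import mean


def bound_violations(temps, delta_hot=10.0, delta_cold=5.0):
    """Flag pixels lying above or below the array-mean temperature by fixed differences.

    temps: 2-D list of per-pixel temperatures for one cropped PV array (deg C).
    delta_hot / delta_cold: hypothetical placeholders, not calibrated values.
    Returns (hot, cold) lists of (row, col) pixel indices.
    """
    array_mean = mean(t for row in temps for t in row)
    upper, lower = array_mean + delta_hot, array_mean - delta_cold
    # Upper-bound violators: candidate hotspots or diode failures.
    hot = [(r, c) for r, row in enumerate(temps) for c, t in enumerate(row) if t > upper]
    # Lower-bound violators: candidate obstructions (shape checks not shown).
    cold = [(r, c) for r, row in enumerate(temps) for c, t in enumerate(row) if t < lower]
    return hot, cold
```

Because the bounds are tied to each array's own mean, the same rule adapts to arrays imaged at different absolute temperatures; the fixed deltas are exactly the part that the limitations below identify as protocol-dependent.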
Compared with YOLOv5 and Faster R-CNN, the proposed pipeline improves detection accuracy by 3.35 pp and 5.70 pp, respectively, while reducing miss and false-alarm rates. Unlike end-to-end deep detectors, the defect-detection stage requires no labelled defect data, only PV array labels, which reduces annotation cost, a non-trivial advantage for plant operators. The GNSS–Hough–K-means localisation module further extends the output from bounding-box coordinates to actionable module indices, closing the loop between detection and field maintenance.
Several limitations are acknowledged. First, the defect-detection thresholds, although physically motivated, depend on the imaging protocol (altitude, look angle, solar conditions). Second, the fixed cluster numbers K_x = 3 and K_y = 14 reflect the dataset-specific module layout; generalisation to other layouts will require a data-driven K-selection mechanism. A more critical assessment reveals the following further limitations.
(a) Defect category coverage. Only three defect types (hotspots, diode failures, obstructions) are addressed, and the number of annotated defect samples in the test set is modest; rare but operationally important faults such as cell cracks, snail trails, PID-induced degradation, and partial bypass-diode failure are not represented and will require a larger, more diverse defect corpus to evaluate reliably.
(b) Environmental dependency. Data were acquired under clear-sky, fog-free conditions with the camera held nearly perpendicular to the module plane; performance under low-irradiance, partly cloudy, hazy, or oblique-view conditions is not characterised, and the thermal-difference thresholds used in Section 3.3 are likely to require recalibration in those regimes.
(c) Thermal-imaging artefacts. The pipeline does not currently distinguish thermal reflections of the sun or of warm structures from genuine hotspots, and small calibration drifts of the IR sensor between flights are not actively corrected.
(d) Reliance on UAV metadata and camera calibration. Array-level GNSS matching depends on the accuracy of the on-board GNSS receiver and on the assumption that the camera optical axis is perpendicular to the module plane; non-trivial yaw, pitch, or roll deviations introduce pixel-to-ground projection errors that propagate to the ledger query, and this propagation has not yet been quantified rigorously.
(e) Module-layout assumptions. The Hough + K-means localiser assumes that arrays consist of a regular rectangular grid of identical modules; arrays with mixed module sizes, missing modules, or non-rectangular layouts are not supported.
(f) End-to-end deployment timing. The reported FPS reflects detector inference only; the full diagnostic pipeline adds latency from cropping, multi-feature thresholding, Hough/K-means, and GNSS matching, and the system has not yet been profiled on embedded UAV hardware.
(g) Scalability and operational robustness. Real-world inspection missions involve actuator disturbances, vibration, and external perturbations that may degrade image-acquisition quality and localisation stability; the current evaluation does not stress-test these conditions.
Future work will therefore focus on (i) adaptive thresholding driven by per-array statistics, (ii) domain-adaptive fine-tuning of the detector across plants and seasons, (iii) a data-driven K-selection module that lifts the fixed-grid assumption, (iv) explicit modelling of camera-attitude error in the GNSS-matching step, and (v) on-board real-time inference with end-to-end timing reported on embedded UAV platforms.
6. Conclusions
A two-stage framework for UAV-based PV defect detection and component-level localisation is presented. The rotating-box detector YOLO-CLO, built on YOLOv8-OBB with a lightweight C3m module and a shared-convolution LSCD-OBB head, achieves 99.1% mAP@0.5 and 96.7% mAP@0.5:0.95 with 8.52 M parameters, 23.6 GFLOPs, and 59.88 FPS, offering a favourable balance between accuracy and efficiency. The multi-feature defect-detection pipeline attains 96.97%, 100%, and 88.89% accuracy on hotspots, diode failures, and obstructions, respectively, outperforming the YOLOv5 and Faster R-CNN baselines. The GNSS–Hough–K-means localisation strategy recovers the row–column index of defective modules within an array, with residual errors concentrated at array edges and among closely spaced ledger entries. The framework's low computational cost makes it a practical candidate for edge deployment in large-scale PV plant operation and maintenance, although end-to-end profiling on embedded UAV hardware is still pending. Future work will focus on adaptive thresholding and domain-generalisable defect detection to further enhance robustness in diverse field conditions.