Article

Transfer Learning-Driven Large-Scale CNN Benchmarking with Explainable AI for Image-Based Dust Detection on Solar Panels

1 Department of Computer Science, National University of Computer and Emerging Sciences (NUCES-FAST), Jamrud Road 160 Industrial Estate Road, Phase 1 Hayatabad, Peshawar 25100, Pakistan
2 Interdisciplinary Center for Digital Humanities and Social Sciences, Friedrich-Alexander University, 91054 Erlangen, Germany
Information 2026, 17(1), 52; https://doi.org/10.3390/info17010052
Submission received: 8 December 2025 / Revised: 27 December 2025 / Accepted: 3 January 2026 / Published: 6 January 2026

Abstract

Solar panel power plants are typically established in regions with maximum solar irradiation, yet these conditions result in heavy dust accumulation on the panels, causing significant performance degradation and reduced power output. The paper addresses this issue via an image-based dust detection solution powered by deep learning, particularly convolutional neural networks (CNNs). Most such solutions use state-of-the-art CNNs either as backbones/feature extractors or propose custom models built upon them. Given such a reliance, future research requires a comprehensive benchmarking of CNN models to identify the ones that achieve superior performance in classifying clean vs. dusty solar panels, with respect to both accuracy and efficiency. To this end, we evaluate 100 CNN models that belong to 16 families for image-based dust detection on solar panels, where the pre-trained models of these CNN architectures are used to encode solar panel images. Upon these image encodings, we then train and test a linear support vector machine (SVM) to determine the best-performing models in terms of classification accuracy and training time. The use of such a simple classifier ensures a fair comparison, where the encodings do not benefit from the classifier itself and their performance reflects each CNN's ability to capture the underlying image features. Experiments were conducted on a publicly available dust detection dataset, using stratified shuffle-split with 70–30, 80–20, and 90–10 splits, repeated 10 times. The models convnext_xxlarge and resnetv2_152 achieved the best classification rates of above 90%, with resnetv2_152 offering superior efficiency, a finding also supported by feature analyses such as t-SNE and UMAP and by explainable AI (XAI) techniques such as LIME visualizations. To demonstrate generalization capability, we tested the image encodings of resnetv2_152 on an unseen real-world image dataset captured via a drone camera, achieving a remarkable accuracy of 96%. Consequently, our findings guide the selection of optimal CNN backbones for future image-based dust detection systems.

1. Introduction

Image classification, a subfield of computer vision, was traditionally based on hand-crafted feature extraction [1], such as color histograms [2], edge detection filters [3], local feature descriptors [4], and texture measures [5]. Once extracted from images, these features were used to train machine learning algorithms, such as support vector machines (SVMs) or k-nearest neighbors (k-NN), for classification. However, the rise of deep learning algorithms, particularly convolutional neural networks (CNNs) [1], has transformed image classification by automating feature extraction [6] and allowing hierarchical representations to be learned directly from raw image data [7]. This shift improved both the accuracy and scalability of classification frameworks [8,9].
The devastating drawbacks of fossil-fuel-based power generation [10] have led to a global shift toward renewable energy. Figure 1 illustrates this trend, showing that solar power remains the top contributor to sustainable energy. Figure 1a highlights solar photovoltaic (PV) as the dominant source of renewable energy in 2023, with an over 60% share across most regions except Africa. Figure 1b shows solar PV’s exponential growth in global capacity since 2019, while Figure 1c depicts a 44.2% drop in solar module prices in 2024, signaling significant technological advancement. These trends position solar PV as the cornerstone of renewable strategies and a key alternative to traditional power generation.
Despite the promising growth of solar PV, its operational efficiency is highly sensitive to environmental factors. For efficient power generation, solar panels are often installed in locations that receive extensive sunlight, such as arid and semi-arid regions [11]. However, in such areas, dust accumulation on the surface of solar panels is a persistent problem that reduces PV efficiency due to sunlight obstruction [12]. This issue becomes even more severe because these regions receive relatively low annual rainfall, which limits the natural cleaning of panels. Consequently, maintaining optimal solar panel performance in these environments requires innovative solutions to minimize the impact of dust and ensure consistent energy output.
To address these challenges, traditional manual cleaning methods are often employed; however, while effective, they are labor-intensive, water-intensive, and costly when performed regularly without assessing actual need. Automated cleaning systems [13], though promising, can also be resource-intensive if not optimized for need-based operation. Consequently, recent advancements [14] propose the use of automated image-based monitoring systems to identify exactly when panels require cleaning, thus ensuring efficient and timely maintenance. This approach offers a sustainable solution that reduces operational costs and resource consumption while maintaining optimal energy output. This paper proposes to extend this line of research and perform a comprehensive evaluation of the recently proposed CNN-based deep learning models for the task of dust detection on solar panels. Our approach is to encode/represent images using CNN models that are pre-trained on large-scale generic datasets such as ImageNet. These image encodings are then used for the task of image classification to detect dusty solar panels. We propose to evaluate the performance of 100 pre-trained CNN models both with respect to classification accuracy and computational efficiency. In addition to performance evaluation, we employ explainable AI (XAI) to verify whether the image classification results are driven by meaningful visual evidence on the solar panels rather than spurious background cues. This analysis helps in understanding the underlying causes of misclassifications and in assessing the reliability/quality of the image encodings. Nonetheless, such XAI-based verification is diagnostic in nature, while the design of corrective or mitigation strategies lies beyond the scope of this work.

Research Gap, Questions, Contributions, and Scope

Although CNNs have been widely employed for image-based dust detection on solar panels using standard architectures such as ResNet, VGG, and DenseNet, there exists no comprehensive benchmarking of recent state-of-the-art CNN models on publicly available solar panel datasets. As a result, the research community lacks a clear understanding of which CNN variants are most effective in terms of both classification accuracy and computational efficiency for this specific task. Furthermore, prior studies often focus on proposing new models or adopting existing ones without systematically comparing them across common evaluation settings. This limits fair comparisons and makes it difficult to understand the trade-offs involved when selecting a CNN backbone for real-world deployment. Additionally, many studies emphasize accuracy while ignoring key efficiency metrics such as inference time and FLOPs, which are essential for scalable and real-time deployment in the solar energy sector.
To address these limitations, this paper rigorously benchmarks 100 pre-trained CNN models from 16 architecture families on a real-world solar panel dust detection dataset. We assess each model based on its classification performance and computational cost to establish a reliable reference for future work in this domain.
Accordingly, the following research questions are addressed in the paper:
  • RQ-1: Which pre-trained CNN architectures achieve the highest classification accuracy for image-based dust detection on solar panels?
  • RQ-2: How do the evaluated CNN models compare in terms of computational efficiency, including training time, inference time for a single image, and FLOPs?
  • RQ-3: Which CNN models offer the best trade-off between classification performance and computational cost, making them suitable for real-time or resource-constrained deployment?
To answer these questions, the paper makes the following key contributions:
  • It presents the first large-scale benchmarking of 100 pre-trained CNN models from 16 architecture families for image-based dust detection on solar panels, providing a comprehensive comparative analysis of backbone performance.
  • It reports classification accuracy along with computational metrics such as training time and FLOPs to highlight the practical trade-offs between model performance and efficiency.
  • The benchmarking is conducted using a reproducible and modular evaluation pipeline, offering a standardized baseline for future research in solar panel monitoring using computer vision.
To frame these contributions appropriately, we briefly outline the scope and limitations of this paper:
  • We do not follow a multi-class classification approach for identifying “dirty” solar panels, although the dirt may be of several types, such as bird droppings, debris, leaves, occlusion, and defects. Instead, we perform image-based binary classification to classify the solar panels as either “clean” or “dirty”. The details on the number of images per dirt type are given in Section 3.
  • We do not propose a novel CNN architecture, nor do we aim to achieve the best results on the current dataset by fine-tuning a given CNN architecture to mitigate misclassifications. Rather, our focus is on benchmarking pre-trained CNN backbones, which is considered a compulsory preliminary step. This is because newly proposed CNNs are often derived from well-known architectures such as ResNet or VGG. Identifying the best backbone for a given problem can then lead to improved results through the discriminative features it extracts. Given the vast number of available CNN architectures, evaluating all of them is impractical, while evaluating too few would fail to capture meaningful comparative insights. To ensure fairness, diversity, and comprehensiveness, we therefore evaluate 100 representative models from 16 distinct CNN families, providing a solid empirical foundation for future model development in this domain.
  • We do not employ any additional sensing modalities such as infrared or electrostatic/deposition sensors, nor do we attempt to measure or estimate physical dust-layer thickness (μm), mass density (mg/cm2), or other quantitative deposition metrics. Our proposed classification pipeline is based on RGB images, where the labels are assigned purely by visual criteria.
  • We do not conduct any end-to-end energy or cost-savings analysis of vision-based cleaning systems. Consequently, the paper does not deal with or present hardware deployment, real-time inference pipeline, or field-trial validation.
  • By clearly bounding the contribution of this paper to a comprehensive, reproducible comparison of 16 state-of-the-art CNN backbones, this paper provides a solid foundation upon which future research can achieve the following:
    integrate the top-performing models into full monitoring systems;
    extend the image dataset to accommodate the dirt types for multi-class classification;
    incorporate additional sensors;
    quantify operational and energy benefits.

2. Related Work

This section is divided into two subsections. Section 2.1 presents an overview of recent studies that highlight the impact of dust pollution on the power generation capability of solar panels. Section 2.2 reviews the latest image-based dust detection methods, which are essential components of a comprehensive dust monitoring and removal system for solar panels.

2.1. Impact of Dust Accumulation on Solar Panel Efficiency: A Global Perspective

To underscore the global relevance of dust accumulation on solar panel performance, we present findings from recent studies conducted in diverse regions:
  • India: In Dehradun, a 45-day exposure led to an average efficiency drop of 22.5% in polysilicon solar panels [15].
  • Jordan: At the Jordan University of Science and Technology, natural dust accumulation over three months resulted in a 13% decline in PV module output power [16].
  • Pakistan: Studies in Islamabad and Bahawalpur revealed that dust densities of 6.388 g/m2 and 4.365 g/m2 led to efficiency reductions of 15.08% and 12.61%, respectively [17].
  • China: Research indicated that a one-micron dust layer could cause a 25.5% reduction in PV module efficiency, with a 70-day exposure leading to a 21.47% power output decrease [18].
  • Global Review: A comprehensive analysis reported that dust accumulation could reduce PV efficiency by up to 64%, with factors like tilt angle and environmental conditions playing significant roles [19].
These studies highlight the critical impact of dust on solar panel performance across various climatic regions, emphasizing the need for effective dust mitigation strategies.

2.2. Image-Based Dust Detection on Solar Panels

To mitigate the performance degradation of solar PVs caused by dust accumulation, researchers have devised solutions based on various technologies such as electrodynamic dust removal (EDR) [20], digital image processing [21], and more recently, image-based deep learning [14]. Since this paper extends the deep learning-based line of research, this section discusses some recent methods that utilize deep learning.
MobileNetV1, V2, and V3 are used for classifying dusty and clean solar panel surfaces, where MobileNetV1 achieves the best accuracy of 91.25%, demonstrating its feasibility for real-time applications [22]. A custom CNN model named “SolNet” is proposed to detect dusty solar panels and is shown to surpass state-of-the-art CNN models such as AlexNet and ResNet on a dataset that lacks challenges such as background clutter, partial occlusion, and viewpoint changes [23]. A two-stage, end-to-end pipeline using a customized YOLOv5 model is proposed to detect and localize dust and bird droppings in UAV images of solar panels [24]. A comparison of semantic segmentation methods for dust detection demonstrates that deep learning methods outperform traditional machine learning in accuracy and robustness [25]. Addressing the unique challenges of offshore environments, a further study presents a framework that leverages image processing (HLS colorspace) and deep learning (Mask R-CNN) for effective dust detection on floating solar panels [26]. Deep learning combined with multi-view virtual reality imaging is proposed to provide a comprehensive assessment of dust accumulation patterns on solar panels [27]. Various deep learning architectures such as ResNet, EfficientNet, InceptionNetV3, MobileNetV2, and VGG are evaluated for the detection of dust on solar panels, identifying models that strike a balance between accuracy and computational efficiency [28]. CASolarNet [29] incorporates channel attention into an EfficientNet-based architecture and is shown to enhance dust detection accuracy on solar panels. Various deep learning strategies such as EfficientNetB3, ResNet50, MobileNet, VGG19, Xception, InceptionResNetV2, VGG16, ResNet101, DenseNet201, and EfficientNetB7 are comprehensively evaluated to detect dust on solar panels, underscoring the importance of timely maintenance to maintain optimal performance [30]. A novel method for classifying photovoltaic (PV) system losses using thermographic non-destructive tests (TNDTs) combined with a CNN is proposed [31]. The method is shown to rapidly detect efficiency losses caused by dirt and malfunctions, achieving 98% accuracy with significant advantages in speed, cost reduction, and minimized electricity production losses. DVNET [32], an end-to-end solar panel dust detection model based on light transmittance estimation, is proposed. It uses image processing and attention mechanisms and is shown to accurately quantify dust density and generate transmittance maps, thus demonstrating efficiency, scalability, and real-world applicability. An improved Adam optimizer for dust detection on solar photovoltaic panels is presented that integrates warmup technology and cosine annealing to enhance training stability and generalization [33]. The proposed method is shown to outperform the standard Adam optimizer across multiple architectures, including ResNet-18, VGG-16, and MobileNetV2, demonstrating superior convergence and accuracy, with potential economic and strategic benefits for solar energy maintenance. Previously, the current authors also investigated pre-trained CNN models such as AlexNet, DenseNet, ResNet, and MobileNet combined with variants of the SVM classifier for dust detection; in this investigation, DenseNet169 achieved notable accuracy, suggesting a cost-effective maintenance solution [14].
Since our main focus is on the use of CNN models, both custom and pre-trained, Table 1 summarizes the most relevant works that have utilized or evaluated multiple CNN models for the task of image-based dust detection on solar panels.
However, none of these studies have systematically compared a broad range of CNN architectures for dust detection, particularly with respect to balancing classification accuracy and computational cost. To address this gap, we present an extensive benchmark of 100 CNN models spanning 16 architecture families, evaluating each model’s dust-detection accuracy and computational efficiency.

3. Dataset

The dataset originates from a publicly available online repository [39] that comprises images depicting both clean and dusty solar panels. Overall, the dataset contains 1068 images, with 405 images representing clean solar panels and 664 images representing panels affected by dust and other contaminants. However, the originally uploaded dataset included more images than these 1068: 1493 images depict clean solar panels, while 1069 images belong to dusty solar panels. We performed a manual inspection of the complete dataset for potential duplicates, as there were images of the same scene captured at varying scales. This may have occurred because both a thumbnail and the original image were downloaded by the web crawler, each with a different name. Such cleaning is of utmost importance because the images are re-sized to a fixed size before being encoded with a pre-trained model. Hence, variants of the same image at different scales would all be re-sized to the same dimensions and then encoded, leading to biased classification results.
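Although our inspection was manual, such scale-variant duplicates can also be flagged automatically; the following is a minimal sketch of one such check, assuming the third-party imagehash package and a hypothetical folder layout, rather than the procedure actually used for this dataset.

```python
# Illustrative sketch only: flag likely duplicates across scales with a
# perceptual hash, which is robust to re-sizing (pip install imagehash).
from pathlib import Path
from PIL import Image
import imagehash

DATASET_DIR = Path("solar_panels")  # hypothetical dataset folder

hashes = {}
for path in sorted(DATASET_DIR.rglob("*.jpg")):
    h = imagehash.phash(Image.open(path).convert("RGB"))
    for other, other_h in hashes.items():
        if h - other_h <= 4:  # Hamming-distance threshold (tunable)
            print(f"possible duplicate: {path.name} ~ {other.name}")
    hashes[path] = h
```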
This dataset comes with immense challenges due to image variations, some of which are depicted in Figure 2. The images are captured from a variety of viewpoints and under diverse daytime lighting conditions. Additionally, for the dusty panels, the dataset includes a range of images exhibiting different types and patterns of dirt accumulation. Interestingly, there are images where the cleaning process is shown, such that some solar panels are being cleaned while some of them are still dirty. Consequently, all these image variations make the image dataset challenging for recent state-of-the-art CNN models.
Based on these observations, the images in the dataset are classified purely on visual hallmarks rather than on quantifiable measurements such as dust-layer thickness in micrometers or mass density in milligrams per square centimeter. We therefore employ solely visual criteria for labeling “clean” and “dirty” panels in this Internet-sourced image dataset. The clean solar panel images predominantly depict new or freshly installed solar panels that have just undergone professional cleaning, characterized by uniform specular highlights across the entire glass surface and the absence of smudges, streaks, particulate clusters, or other irregularities under standard outdoor illumination. In contrast, the dusty solar panel images capture the real-world conditions in which a panel's surface is occluded by environmental and biological contaminants such as dust and dirt deposits, bird droppings, fungal growths, and organic debris like leaves or pollen, resulting in non-uniform reflectance and mottled shading over the affected areas. For a clearer picture, Table 2 shows the number of images in the dataset that represent each type of contaminant. This breakdown is provided for information only and is not used in any part of the experiments, because the paper strictly deals with binary image-based classification of solar panels, i.e., categorizing them as either clean or dirty, regardless of the type of dirt present on the panel. Such a so-called coarse-grained binary classification serves as a first step to identify dirty solar panels regardless of the type of contaminant, as the cleaning procedures can differ across dirt types. Consequently, training on a diverse array of dirty solar panel images can ensure that the model identifies “dirtiness” in all its forms, creating a foundation that can be extended into a multi-class classification framework using fine-grained classification [40].

4. Methodology

Here, we give a brief description of the proposed methodology, which involves encoding images using pre-trained CNN models and then using these encoded images to train and test a linear SVM via a stratified shuffle-split strategy, as shown in Figure 3.

4.1. Image Encoding Using Pre-Trained CNN Model

In the first part of the methodology, a pre-trained CNN model is used to encode the solar panel images. Instead of training from scratch, a CNN model in its pre-trained form can be used as a feature extractor, thus leveraging the knowledge it has already gained from large-scale image datasets such as ImageNet, a process more commonly known as “transfer learning”. For a given image, the output from the final convolutional stage is a set of feature maps, upon which a global average pooling (GAP) layer is applied. This operation reduces each feature map to a single scalar by computing the average across all spatial locations. Consequently, the GAP layer transforms the set of feature maps into a compact feature vector. This vector encapsulates the most salient visual features of the image while significantly reducing the number of parameters and mitigating any overfitting that may occur due to the high-dimensional feature maps. Figure 4 outlines the details of the CNN families and their variants used for image encoding, totaling 100 pre-trained models. Each CNN family, alongside its variants, is shown with the corresponding feature-vector dimensions obtained via GAP, the number of floating point operations in GFLOPs, and the parameter counts.
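For concreteness, the following is a minimal sketch of this encoding step using the timm model zoo; the checkpoint name and image path are illustrative assumptions rather than the exact configuration used in our experiments.

```python
# Minimal sketch of the encoding step, assuming the `timm` model zoo.
import timm
import torch
from timm.data import resolve_data_config, create_transform
from PIL import Image

# num_classes=0 removes the classifier head, so the forward pass returns the
# globally average-pooled (GAP) feature vector described above.
model = timm.create_model("resnet152", pretrained=True, num_classes=0).eval()

# Each model family expects its own input size and normalization.
config = resolve_data_config({}, model=model)
transform = create_transform(**config)

img = Image.open("panel.jpg").convert("RGB")  # hypothetical image path
with torch.no_grad():
    encoding = model(transform(img).unsqueeze(0))  # shape: (1, feature_dim)
print(encoding.shape, sum(p.numel() for p in model.parameters()))
```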

4.2. Classification Using a Linear SVM via Stratified Shuffle Split

The encoded image dataset is used as input to a linear support vector machine (SVM) classifier. The selection of a linear SVM is motivated by its reduced computational cost while being effective in high-dimensional spaces. In its linear setting, the SVM classifier learns to distinguish between the two classes, i.e., clean and dirty solar panels, by finding the optimal hyperplane that maximizes the margin between them. The dataset is partitioned into training and test sets using a stratified shuffle-split strategy that ensures a uniform class distribution across both sets. This avoids biases that may result from imbalanced class representations. We use three splitting ratios, namely 70% training with 30% testing, 80% training with 20% testing, and 90% training with 10% testing, to analyze the impact of training set size on the achieved results. Consequently, we report the classification accuracy, precision, recall, and F1-score, and we also measure the training time taken by the linear SVM on the image encodings generated by different CNN models.
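A minimal sketch of this classification stage follows, assuming the encodings and labels have been saved to disk by the previous step; the file names are hypothetical, and the exact SVM hyperparameters used in our experiments may differ from scikit-learn's defaults.

```python
# Minimal sketch of the classification stage: stratified shuffle-split,
# linear SVM, weighted metrics, and training-time measurement.
import time
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

X = np.load("encodings.npy")  # hypothetical files produced by the encoder
y = np.load("labels.npy")     # 0 = clean, 1 = dirty

for test_size in (0.30, 0.20, 0.10):  # the 70-30, 80-20, and 90-10 splits
    splitter = StratifiedShuffleSplit(n_splits=10, test_size=test_size,
                                      random_state=0)
    for train_idx, test_idx in splitter.split(X, y):
        svm = LinearSVC()
        start = time.perf_counter()
        svm.fit(X[train_idx], y[train_idx])
        train_time = time.perf_counter() - start

        y_pred = svm.predict(X[test_idx])
        acc = accuracy_score(y[test_idx], y_pred)
        prec, rec, f1, _ = precision_recall_fscore_support(
            y[test_idx], y_pred, average="weighted")
        print(f"test={test_size:.0%} acc={acc:.4f} f1={f1:.4f} "
              f"time={train_time:.2f}s")
```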
Nonetheless, the use of a pre-trained CNN for image encoding coupled with a lightweight SVM classifier is a deliberate choice intended to minimize the local processing burden. This architecture provides a streamlined pipeline that is well-suited for future deployment on edge-computing hardware, where high-dimensional encodings can be processed with low latency to meet the demands of real-time monitoring.

5. Results and Discussion

Due to the large number of CNNs evaluated, we present the results in a systematic manner, while giving their details in Appendix A. Since we evaluate 100 CNN models that belong to 16 families, we explain and visually demonstrate the variants that are the best performers among their family members for each of the three settings of the data split. The same is done for reporting the training time, where we show the variants that take the least time among family members.

5.1. Classification Accuracy

For the 100 pre-trained CNN models that span 16 families, we select, for each family, the best-performing variant based on mean classification accuracy across all three data splits (70–30, 80–20, and 90–10). The results are shown in Figure 5, where modern architectures such as ConvNeXt, ResNetV2, and RegNetY show superior performance by achieving mean accuracies above 91%, with ConvNeXt peaking at 92.43% on the 90–10 split. Other families, including DenseNet and ConvNeXtV2, maintain competitive accuracies in the high 80s, while mid-tier performance is observed with models like ResNet, DPN, and Inception-Next, which stabilize at around 86–87%. On the lower end, older or less optimized families, such as CS3DarkNet, ShuffleNet, VGG, and even traditionally popular models like MobileNetV2, Inception, and GoogLeNet, generally achieve accuracies in the low to mid-80s, and models like Wide-ResNet, Xception, ResNeXt, and EfficientNetV2 achieve around 75–79%.
While increasing the training data consistently boosts performance across families, the relative ranking among those families remains largely stable. This highlights that while data is important, architectural design remains a crucial factor. These results clearly show that state-of-the-art CNN architectures incorporating contemporary innovations in feature extraction are best suited as backbone feature extractors for the task of image-based dust detection on solar panels. Our results are particularly significant because many previous approaches utilize the older architectures shown in Table 1; while those models perform adequately with sufficient data, they ultimately struggle to achieve significant performance gains.

5.2. Training Time

Figure 6 illustrates the training time performance of various model families when their image encodings are used to train a linear SVM. In this case, for each of the 16 families, the variant that achieves the lowest training time is selected. Similar to the classification accuracy, the training time is also measured for three distinct data splits (70–30, 80–20, and 90–10). These splits are denoted as “split-70”, “split-80”, and “split-90”, respectively. The values shown in the figure (e.g., 0.45, 0.54, 0.69 for one family) represent the mean training time taken by the linear SVM when trained on feature vectors derived from image encodings produced by that specific model family. Notably, the figure emphasizes the computational efficiency of certain families, such as inception_next, cs3darknet, and mobilenet_v3. These models consistently achieve lower training times even as the amount of training data increases, demonstrating their feasibility for applications where rapid model deployment or frequent retraining is essential.
In addition to the raw training time metrics, we employ a stratified shuffle-split method to partition the dataset, ensuring a consistent class distribution (e.g., clean versus dirty solar panels) across both the training and test sets. Doing so mitigates the potential bias that may occur due to class imbalance, providing a fair basis for comparing the computational efficiency of each model family. Figure 6 shows that while training time scales with data size, the rate of growth varies significantly across model families. Some architectures exhibit only a modest rise, whereas others show a much steeper increase. On the one hand, this analysis highlights the importance of efficient feature extraction models for reducing computational load; on the other hand, it identifies CNN model families that provide an optimal balance between classification performance and training time. These combined insights enable future researchers to select the CNN backbone for their custom architectures to perform image-based recognition of dusty solar panels in a more accurate and computationally efficient manner.

5.3. Overall Verdict

The extensive evaluation of 100 pre-trained CNN models across 16 families demonstrates the trade-offs between classification accuracy and training time. Models from modern families such as ConvNeXt and ResNetV2 not only consistently deliver top-tier accuracy (exceeding 91% on average across various training splits) but also demonstrate efficient learning dynamics that minimize training time. In contrast, while models such as DenseNet, ConvNeXtV2, and ResNet maintain competitive accuracy in the high 80s, their training times are moderately higher, making them better candidates for scenarios where slight increases in computational cost can be justified by incremental gains in precision. In addition to these findings, certain lightweight models like MobileNetV2, Inception, and GoogLeNet, despite achieving slightly lower accuracy in the low to mid-80s, offer significant advantages in training and inference speed, making them appealing for real-time applications or environments with limited computational resources.

5.4. Comparison of the Best-Performing Variants

The best-performing CNN variants that consistently achieve a classification accuracy of above 90% throughout all three splits are resnetv2_152 and convnext_xxlarge. A further, detailed comparison of these two CNN models was carried out to determine a clear winner on the current dataset for image-based dust detection on solar panels. Since both models achieve their best result on the 90–10 data split, here we also split the dataset with a 90–10 stratified strategy. The images are encoded with each of the pre-trained models, while global average pooling is applied on their output feature maps to bring the encodings into vectorized form. These encodings are then used to train and test a linear SVM to show the classification accuracy achieved by each model. The detailed comparison of the two best CNN models is explained in the following.
  • Accuracy vs. Floating Point Operations (FLOPs): Figure 7a shows the accuracy vs. FLOP comparison of the two models, where resnetv2_152 clearly outperforms convnext_xxlarge with fewer FLOPs and better classification accuracy.
  • Accuracy vs. Parameter count: Figure 7b shows the same behavior for accuracy vs. parameter count, thus determining that resnetv2_152 is superior to convnext_xxlarge.
  • Confusion Matrices: Since the main aim is to detect dusty/dirty solar panels, convnext_xxlarge performs slightly better than resnetv2_152 at detecting dirty solar panels, as shown in Figure 7c.
  • Manifold analysis using t-SNE and UMAP: We also performed a manifold analysis for both models using t-SNE [41] and uniform manifold approximation and projection for dimension reduction (UMAP) [42] to show how effectively each CNN model discriminates between the features of the two classes (a minimal code sketch of this analysis is given after this list). The results are shown in Figure 7d for t-SNE and Figure 7e for UMAP, where resnetv2_152 can clearly be observed to cluster features of the same class better than convnext_xxlarge, hence providing strong evidence for its superior performance.
  • Performance across all three splits: To show the collective performance of the two best-performing models, we repeated the experiments with each of the three splits. We show the collective behavior of both CNN models in Figure 8a. The results highlight the stability and effectiveness of both models, with convnext_xxlarge displaying a slightly higher median and lower variance.
  • Other performance metrics: The precision–recall (PR) curves for the two models under the 90–10 split are shown in Figure 8b. Both models exhibit strong precision–recall performance, with resnetv2_152 maintaining higher precision at elevated recall levels. The receiver operating characteristic (ROC) curves under the same split are shown in Figure 8c, demonstrating excellent true positive rates for both models across all false positive rates, with convnext_xxlarge showing a marginal edge in ROC performance.
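The manifold analysis referenced above can be reproduced with a few lines of code. The following is a minimal sketch assuming the saved encodings and integer labels of one CNN (hypothetical file names) and the third-party umap-learn package.

```python
# Minimal sketch of the t-SNE/UMAP manifold analysis of one model's encodings.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
import umap  # pip install umap-learn

X = np.load("encodings.npy")  # hypothetical: encodings of one CNN
y = np.load("labels.npy")     # 0 = clean, 1 = dirty

emb_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)
emb_umap = umap.UMAP(n_components=2, random_state=0).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in zip(axes, (emb_tsne, emb_umap), ("t-SNE", "UMAP")):
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap="coolwarm", s=8)
    ax.set_title(title)
plt.show()
```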
In conclusion, while both resnetv2_152 and convnext_xxlarge outperform the other 98 CNN models, a detailed comparison shows that resnetv2_152 clearly outperforms convnext_xxlarge with respect to both classification accuracy and efficiency, and hence is highly recommended for the task of image-based dust detection on solar panels.

5.5. Qualitative Analysis Using Explainable-AI

We also performed a qualitative analysis to determine the image regions that represent the dusty solar panels. To this end, we use the explainable AI (XAI)-based local interpretable model-agnostic explanations (LIME) [43] to visualize the image regions that affect the decision of the SVM to classify an image as that of a dusty solar panel. The integration of explainable AI (XAI) frameworks is increasingly vital from the perspective of industrial applications to bridge the gap between “black-box” model performance and human-interpretable logic [44]. A qualitative check on the decisions taken by the SVM using the image encodings of a pre-trained CNN validates their capacity to accurately identify dust accumulation on the surface of solar panels. Nonetheless, our aim is not segmentation of the image regions that correspond to dusty solar panels, but to assess the qualitative performance of the underlying image encodings. Accordingly, our intention is not to benchmark XAI methods against human judgment, but to use them as a qualitative visual tool to demonstrate that the learned image encodings consistently highlight image regions corresponding to dust on solar panels. For these visualizations, the image encoding is performed with convnext_xxlarge, and the 90–10 split is used to train and test the SVM. We then select the test images whose labels are predicted with the highest scores by the trained SVM and use these images to perform the LIME visualization. Since we explicitly deal with image-based dust detection on solar panels, Figure 9 only depicts the results of LIME visualization on dusty solar panels, where the decisive image patches are enclosed in thick boundaries. The top two rows of Figure 9, i.e., (a) and (b), clearly show that the dusty patches have influenced the decision of the SVM in deciding the labels for these images. Some areas of the solar panels are clean, and it can clearly be observed that those image regions are not highlighted by the LIME visualization, which means that these patches contributed least, if at all, to the label decision for these solar panels. Similarly, the bottom two rows of Figure 9, i.e., (c) and (d), show the effect of the context: some of the context is captured in the case of (c), while none is captured in the case of (d), as that type of context, i.e., blue and clear sky, is mostly captured with clean solar panels. In addition to LIME, Figure 10 shows the qualitative analysis performed using randomized input sampling for explanation of black-box models (RISE) [45], demonstrating the effectiveness of the proposed method to detect the image patches related to dusty parts of the solar panels.
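A minimal sketch of the LIME check follows, assuming the lime package, the timm encoder from the earlier sketch, a probability-calibrated SVM (svm) trained on the encodings, and a test image (test_image) held as an H×W×3 uint8 array; all of these names are assumptions for illustration.

```python
# Minimal sketch of the LIME visualization over the CNN-encoding + SVM pipeline.
import numpy as np
import torch
import timm
from lime import lime_image
from skimage.segmentation import mark_boundaries
import matplotlib.pyplot as plt

encoder = timm.create_model("resnet152", pretrained=True, num_classes=0).eval()

def classifier_fn(images: np.ndarray) -> np.ndarray:
    """Map LIME-perturbed images to class probabilities: encode with the CNN,
    then apply the SVM. Model-specific normalization is omitted for brevity."""
    x = torch.from_numpy(images).permute(0, 3, 1, 2).float() / 255.0
    with torch.no_grad():
        feats = encoder(x).numpy()
    # `svm` is assumed probability-calibrated, e.g. CalibratedClassifierCV(LinearSVC())
    return svm.predict_proba(feats)

explainer = lime_image.LimeImageExplainer(random_state=0)
explanation = explainer.explain_instance(test_image, classifier_fn,
                                         top_labels=2, num_samples=1000)
img, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                           positive_only=True, num_features=5)
plt.imshow(mark_boundaries(img / 255.0, mask))
plt.show()
```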

5.6. Analyzing Misclassified Images of Dirty Solar Panels

A detailed visual inspection of the misclassified test images was performed to further investigate the causes of failure cases, particularly for dirty solar panels. The sole purpose of this inspection was to reveal the causes of misclassification, whether they are related to imaging conditions such as non-uniform illumination, varied orientation, background clutter, or due to the type of dirt or the solar panel itself. We are interested in finding dirty solar panels that should be cleaned to avoid power losses. If a clean panel is incorrectly classified as dirty, it may simply lead to an unnecessary cleaning, which, in this context, is still acceptable.
To this end, we encode the images using the best-performing pre-trained CNN model, which is resnetv2_152. This encoded image dataset is then randomly divided into training and test sets using an 80–20 strategy in a stratified manner to account for the imbalance that exists in the dataset, as there are more images of dirty solar panels than clean ones. A linear SVM is trained on the training set and then evaluated on the test set. The images of dirty solar panels that are wrongly classified as clean panels are shown in Figure 11. We are particularly interested in such examples to identify the conditions that lead to these wrong classifications.
The first misclassified example depicted in Figure 11a clearly shows that a larger area of the solar panel is clean than dirty, and hence it is classified as clean rather than dirty. Figure 11b contains many dirty solar panels. However, it may have been misclassified due to the bright and blue sky context, as the clean solar panels are often imaged with such a background, particularly in the images of advertisements. Figure 11c is most likely misclassified due to the image being taken in the evening and under dark conditions. Figure 11d,e are imaged with severe background clutter, where the solar panels cover far fewer image pixels than the background. Finally, Figure 11f,g contain clean solar panels that have been wrongly labeled in the image dataset as dirty solar panels. Hence, it is concluded from the analysis of the wrongly classified images of dirty solar panels that the misclassifications are mostly caused by the imaging conditions rather than the type of dirt present on the solar panels.

5.7. Testing on Unseen Real-World Dirty Solar Panels Image Dataset

To evaluate on a real-world unseen test dataset, we collected 100 images of solar panels installed on rooftops at various places in Pakistan using a drone camera. These solar panels are mostly covered with dust and, in some places, with bird droppings. We encoded our main dataset, i.e., the one discussed in Section 3, with resnetv2_152 and then trained a linear SVM on these encodings. All 100 images of the unseen test dataset were then encoded with resnetv2_152, and the trained SVM model was used to predict their labels. On this dataset, we achieve a classification accuracy of 96%, with just four images wrongly predicted as clean solar panels. All the wrong predictions and some of the correct predictions are shown in Figure 12.
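This cross-dataset protocol reduces to a few lines once the encodings exist; the sketch below assumes hypothetical file names for the saved resnetv2_152 encodings of both datasets.

```python
# Minimal sketch of the cross-dataset generalization test.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

X_main, y_main = np.load("main_enc.npy"), np.load("main_labels.npy")
X_drone, y_drone = np.load("drone_enc.npy"), np.load("drone_labels.npy")

svm = LinearSVC().fit(X_main, y_main)  # train on the full main dataset
y_hat = svm.predict(X_drone)           # predict on the unseen drone images
print("drone-set accuracy:", accuracy_score(y_drone, y_hat))
```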

6. Conclusions and Future Work

To address the backbone architecture selection problem for deep learning-powered image-based dust detection on solar panels, we performed a comprehensive and systematic evaluation of 100 pre-trained CNN models drawn from 16 diverse architecture families. Using a standardized pipeline comprising feature extraction via pre-trained models, vectorization of the output feature maps via global average pooling, and classification using a linear SVM, we assessed the models in terms of both classification accuracy and training-time efficiency. The evaluation is conducted under three stratified data splits, 70–30, 80–20, and 90–10, to ensure robustness and generalizability across varying amounts of training data.
Our findings reveal key insights that are essential for selecting the best backbone model for deep learning-based dust detection systems for solar panels. Certain model families, such as ResNetV2, ConvNeXt, and RegNet, stood out in terms of achieving high classification accuracy, making them highly suitable for applications where detection precision is crucial. Conversely, families like Inception-Next, CS3DarkNet, and MobileNetV3 exhibited superior training efficiency, requiring significantly less time to train the linear SVM on the encoded features. These results are particularly valuable for edge-computing scenarios or real-time applications where speed and resource constraints are critical.
Overall, our aim is to bridge a crucial gap in the literature by performing a large-scale, comparative benchmarking of CNN models to be used as backbones/feature extractors for image-based solar panel dust detection. By highlighting both the strengths and trade-offs of various architectures, we provide practical guidance to researchers and practitioners in selecting the most appropriate model based on the operational needs—whether that be maximizing accuracy, minimizing computation, or attaining a balance between the two.
In future work, we plan to extend both the datasets and the current pipeline for a multi-class classification problem to identify each of the dirt types using a novel CNN architecture based on resnetv2_152 as it achieved the best results both with respect to classification accuracy and efficiency. The future work may also explore the integration of lightweight transformers, ensemble strategies, or model pruning techniques to further improve real-time applicability while maintaining high accuracy.
Beyond the specific case of solar panels, the methodology presented here offers a versatile framework for monitoring various types of surface conditions. For instance, the proposed methodology could be adapted to target the cleaning of building facades, where identifying specific areas of pollution or staining can optimize maintenance resources. By providing a transparent image-based classification alongside qualitative analysis, our proposed pipeline ensures that such automated inspection systems remain reliable in diverse urban environments.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The image dataset used in this paper is publicly available and is described, with its reference, in Section 3.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Data-Split 70–30 Results.
Model | Accuracy | Precision | Recall | F1 Score | Time (s)
convnext_xxlarge | 91.43 | 91.51 | 91.43 | 91.43 | 1.38
resnetv2_152 | 91.37 | 91.44 | 91.37 | 91.38 | 2.16
regnet_y_32gf | 89.84 | 89.95 | 89.84 | 89.86 | 2.30
convnext_xlarge | 89.75 | 89.86 | 89.75 | 89.77 | 0.81
densenet161 | 89.53 | 89.57 | 89.53 | 89.52 | 2.32
densenet201 | 87.98 | 87.99 | 87.98 | 87.95 | 1.93
densenet169 | 87.88 | 87.93 | 87.88 | 87.88 | 1.64
dpn131 | 87.13 | 87.21 | 87.13 | 87.12 | 1.42
mnasnet1_0 | 87.10 | 87.12 | 87.10 | 87.06 | 1.26
convnextv2_tiny | 86.70 | 86.84 | 86.70 | 86.72 | 0.31
convnext_base | 86.60 | 86.65 | 86.60 | 86.59 | 0.80
inception_next_base | 86.39 | 86.52 | 86.39 | 86.41 | 0.45
dpn98 | 85.89 | 85.92 | 85.89 | 85.88 | 1.79
convnext_small | 85.83 | 85.92 | 85.83 | 85.84 | 0.60
inception_next_small | 85.83 | 85.92 | 85.83 | 85.82 | 0.32
dpn107 | 85.55 | 85.65 | 85.55 | 85.55 | 1.29
convnextv2_nano | 85.52 | 85.66 | 85.52 | 85.52 | 0.24
inception_next_tiny | 85.51 | 85.66 | 85.51 | 85.50 | 0.35
densenet121 | 85.42 | 85.52 | 85.42 | 85.43 | 0.93
mobilenet_v3_large | 85.39 | 85.48 | 85.39 | 85.38 | 0.88
resnet152 | 85.27 | 85.48 | 85.27 | 85.26 | 1.20
efficientnet_b0 | 85.26 | 85.41 | 85.26 | 85.27 | 1.27
convnext_tiny | 84.98 | 85.07 | 84.98 | 85.01 | 0.66
convnext_large | 84.92 | 84.97 | 84.92 | 84.91 | 1.33
efficientnet_b2 | 84.92 | 85.00 | 84.92 | 84.94 | 1.48
shufflenet_v2_x2_0 | 84.77 | 84.81 | 84.77 | 84.75 | 1.69
regnet_x_1_6gf | 84.42 | 84.52 | 84.42 | 84.39 | 0.90
regnet_x_800mf | 84.33 | 84.38 | 84.33 | 84.32 | 0.64
cs3darknet_x | 84.24 | 84.43 | 84.24 | 84.25 | 0.50
efficientnet_b1 | 84.18 | 84.28 | 84.18 | 84.20 | 1.38
mnasnet0_5 | 84.05 | 84.04 | 84.05 | 83.98 | 1.69
mnasnet0_75 | 84.05 | 84.18 | 84.05 | 84.06 | 1.58
mobilenet_v3_small | 84.05 | 84.24 | 84.05 | 84.09 | 0.58
regnet_x_3_2gf | 84.02 | 84.03 | 84.02 | 84.00 | 0.92
shufflenet_v2_x0_5 | 83.99 | 84.07 | 83.99 | 83.95 | 0.47
shufflenet_v2_x1_0 | 83.99 | 84.05 | 83.99 | 83.97 | 0.44
efficientnet_b3 | 83.93 | 84.16 | 83.93 | 83.93 | 1.55
efficientnet_v2_s | 83.93 | 83.99 | 83.93 | 83.91 | 1.27
dpn68 | 83.68 | 83.69 | 83.68 | 83.65 | 0.33
dpn92 | 83.68 | 83.92 | 83.68 | 83.68 | 2.20
mnasnet1_3 | 83.61 | 83.65 | 83.61 | 83.58 | 1.63
mobilenetv4_conv_small | 83.52 | 83.56 | 83.52 | 83.51 | 0.52
vgg16_bn | 83.52 | 83.58 | 83.52 | 83.47 | 3.51
vgg13 | 83.49 | 83.47 | 83.49 | 83.42 | 5.41
efficientnet_b4 | 83.18 | 83.34 | 83.18 | 83.21 | 1.88
alexnet | 83.12 | 83.13 | 83.12 | 83.02 | 8.66
vgg11_bn | 82.99 | 82.97 | 82.99 | 82.92 | 3.77
dpn68b | 82.8 | 82.96 | 82.8 | 82.82 | 0.29
regnet_x_400mf | 82.49 | 82.52 | 82.49 | 82.44 | 0.53
shufflenet_v2_x1_5 | 82.46 | 82.5 | 82.46 | 82.41 | 0.58
dpn68b | 82.24 | 82.39 | 82.24 | 82.28 | 0.3
convnext_nano | 82.21 | 82.39 | 82.21 | 82.26 | 0.35
vgg11 | 82.15 | 82.19 | 82.15 | 82.09 | 5.42
mobilenet_v2 | 81.87 | 81.87 | 81.87 | 81.85 | 1.28
vgg13_bn | 81.84 | 81.86 | 81.84 | 81.72 | 4.08
cs3darknet_focus_m | 81.71 | 81.66 | 81.71 | 81.61 | 0.26
resnet18 | 81.65 | 81.68 | 81.65 | 81.59 | 0.23
resnet34 | 81.56 | 81.81 | 81.56 | 81.57 | 0.25
efficientnet_b5 | 81.5 | 81.65 | 81.5 | 81.52 | 2.73
vgg16 | 81.43 | 81.35 | 81.43 | 81.26 | 5.21
efficientnet_b7 | 81.34 | 81.37 | 81.34 | 81.33 | 3
cs3darknet_focus_l | 81.28 | 81.37 | 81.28 | 81.28 | 0.42
densenetblur121d | 81.09 | 81.26 | 81.09 | 81.12 | 0.41
convnext_pico | 80.87 | 81.08 | 80.87 | 80.93 | 0.21
regnet_y_16gf | 80.56 | 80.64 | 80.56 | 80.49 | 5.9
regnet_y_400mf | 80.06 | 80.14 | 80.06 | 80.07 | 0.27
regnet_y_8gf | 80.06 | 80.15 | 80.06 | 80.01 | 1.62
resnetv2_50 | 80.06 | 80.22 | 80.06 | 80.07 | 1.33
convnextv2_pico | 79.41 | 79.5 | 79.41 | 79.36 | 0.31
vgg19_bn | 79.38 | 79.47 | 79.38 | 79.38 | 4.03
efficientnet_b6 | 79.06 | 79.1 | 79.06 | 79.06 | 2.82
regnet_y_800mf | 78.75 | 78.82 | 78.75 | 78.74 | 0.39
resnet101 | 78.72 | 78.92 | 78.72 | 78.75 | 1.52
inception_v3 | 78.6 | 78.86 | 78.6 | 78.66 | 3.2
convnextv2_huge | 78.47 | 78.53 | 78.47 | 78.4 | 2.82
regnet_y_1_6gf | 78.38 | 78.37 | 78.38 | 78.3 | 1.07
resnetv2_101 | 78.32 | 78.5 | 78.32 | 78.35 | 1.19
vgg19 | 78.26 | 78.11 | 78.26 | 78.11 | 5.42
efficientnet_v2_m | 77.97 | 78.14 | 77.97 | 78 | 1.76
googlenet | 77.85 | 77.93 | 77.85 | 77.84 | 1.24
resnet50 | 77.6 | 77.66 | 77.6 | 77.6 | 1.65
regnet_y_3_2gf | 77.01 | 77.02 | 77.01 | 76.97 | 1.03
wide_resnet50_2 | 77.01 | 77.16 | 77.01 | 77.04 | 1.68
wide_resnet101_2 | 76.67 | 76.76 | 76.67 | 76.63 | 1.77
convnextv2_base | 76.6 | 76.77 | 76.6 | 76.6 | 0.41
xception65 | 76.54 | 76.55 | 76.54 | 76.5 | 1.19
resnext101_32x8d | 76.14 | 76.13 | 76.14 | 76.08 | 1.79
regnet_x_32gf | 75.98 | 76 | 75.98 | 75.91 | 4.55
convnextv2_large | 75.95 | 75.81 | 75.95 | 75.84 | 0.88
regnet_x_16gf | 75.48 | 75.46 | 75.48 | 75.45 | 3.18
resnext50_32x4d | 75.33 | 75.45 | 75.33 | 75.33 | 1.6
convnextv2_femto | 74.15 | 74.38 | 74.15 | 74.21 | 0.43
convnextv2_atto | 74.05 | 74.11 | 74.05 | 73.97 | 0.75
inception_v4 | 73.83 | 74.12 | 73.83 | 73.88 | 1.12
regnet_x_8gf | 73.58 | 73.55 | 73.58 | 73.53 | 2.75
mobilenetv4_conv_medium | 73.18 | 73.92 | 73.18 | 73.35 | 0.75
efficientnet_v2_l | 69.75 | 69.84 | 69.75 | 69.66 | 1.99
xception71 | 69.59 | 69.5 | 69.59 | 69.39 | 1.56
xception41 | 67.48 | 67.48 | 67.48 | 67.36 | 1.62
mobilenetv4_conv_large | 66.14 | 66.06 | 66.14 | 66.04 | 1.2
Table A2. Data-Split 80–20 Results.
Model | Accuracy | Precision | Recall | F1 Score | Time (s)
convnext_xxlarge | 91.96 | 92.04 | 91.96 | 91.98 | 1.84
resnetv2_152 | 91.96 | 92.08 | 91.96 | 91.98 | 2.68
regnet_y_32gf | 89.86 | 89.92 | 89.86 | 89.86 | 3.85
convnext_xlarge | 89.67 | 89.77 | 89.67 | 89.69 | 1.08
densenet161 | 89.2 | 89.37 | 89.2 | 89.23 | 2.94
densenet169 | 87.94 | 87.95 | 87.94 | 87.92 | 2.06
densenet201 | 87.94 | 88.01 | 87.94 | 87.96 | 2.42
dpn131 | 87.62 | 87.87 | 87.62 | 87.66 | 1.97
convnext_base | 86.96 | 87.04 | 86.96 | 86.98 | 1.03
convnextv2_tiny | 86.96 | 87.38 | 86.96 | 87.04 | 0.39
convnextv2_nano | 86.92 | 87.1 | 86.92 | 86.96 | 0.32
mnasnet1_0 | 86.87 | 86.92 | 86.87 | 86.86 | 1.59
convnext_small | 86.82 | 86.9 | 86.82 | 86.83 | 0.73
inception_next_base | 86.64 | 86.83 | 86.64 | 86.68 | 0.54
mobilenet_v3_large | 86.26 | 86.35 | 86.26 | 86.28 | 1.09
dpn107 | 86.12 | 86.25 | 86.12 | 86.14 | 1.75
resnet152 | 86.12 | 86.28 | 86.12 | 86.16 | 1.59
dpn98 | 86.07 | 86.23 | 86.07 | 86.1 | 2.37
inception_next_small | 86.07 | 86.25 | 86.07 | 86.09 | 0.42
regnet_x_3_2gf | 85.93 | 86.02 | 85.93 | 85.92 | 1.13
densenet121 | 85.56 | 85.72 | 85.56 | 85.59 | 1.11
efficientnet_b3 | 85.56 | 85.72 | 85.56 | 85.53 | 1.91
efficientnet_b2 | 85.47 | 85.64 | 85.47 | 85.51 | 1.8
convnext_tiny | 85.23 | 85.48 | 85.23 | 85.31 | 0.83
mnasnet0_5 | 85.19 | 85.25 | 85.19 | 85.18 | 1.96
mnasnet0_75 | 85 | 85.15 | 85 | 85.02 | 1.71
efficientnet_b0 | 84.91 | 85.04 | 84.91 | 84.88 | 1.56
inception_next_tiny | 84.91 | 85.01 | 84.91 | 84.94 | 0.46
convnext_large | 84.81 | 85.12 | 84.81 | 84.89 | 1.68
dpn92 | 84.63 | 84.68 | 84.63 | 84.61 | 2.81
cs3darknet_x | 84.49 | 84.77 | 84.49 | 84.56 | 0.71
efficientnet_b4 | 84.39 | 84.59 | 84.39 | 84.43 | 2.41
regnet_x_800mf | 84.25 | 84.35 | 84.25 | 84.24 | 0.8
mnasnet1_3 | 84.11 | 84.16 | 84.11 | 84.08 | 1.91
mobilenetv4_conv_small | 84.02 | 84.04 | 84.02 | 84 | 0.76
regnet_x_1_6gf | 83.92 | 84.15 | 83.92 | 83.97 | 1.05
shufflenet_v2_x2_0 | 83.92 | 84.02 | 83.92 | 83.94 | 2.24
efficientnet_b1 | 83.83 | 83.91 | 83.83 | 83.85 | 1.61
mobilenet_v3_small | 83.83 | 84.09 | 83.83 | 83.89 | 0.76
dpn68 | 83.78 | 84 | 83.78 | 83.84 | 0.39
efficientnet_v2_s | 83.78 | 83.99 | 83.78 | 83.85 | 1.57
vgg16_bn | 83.64 | 83.66 | 83.64 | 83.61 | 4.07
vgg11_bn | 83.6 | 83.69 | 83.6 | 83.59 | 4.58
convnext_nano | 83.55 | 84.22 | 83.55 | 83.68 | 0.41
shufflenet_v2_x1_0 | 83.55 | 83.68 | 83.55 | 83.57 | 0.56
alexnet | 83.36 | 83.32 | 83.36 | 83.28 | 10.84
vgg13_bn | 83.36 | 83.44 | 83.36 | 83.32 | 5.28
dpn68b | 83.22 | 83.51 | 83.22 | 83.29 | 0.37
mobilenet_v2 | 83.08 | 83.13 | 83.08 | 83.04 | 1.6
regnet_x_400mf | 82.9 | 82.93 | 82.9 | 82.88 | 0.67
shufflenet_v2_x0_5 | 82.85 | 82.89 | 82.85 | 82.83 | 0.58
dpn68b | 82.8 | 82.99 | 82.8 | 82.85 | 0.38
vgg13 | 82.8 | 82.92 | 82.8 | 82.82 | 6.61
shufflenet_v2_x1_5 | 82.66 | 82.88 | 82.66 | 82.69 | 0.61
vgg11 | 82.48 | 82.57 | 82.48 | 82.44 | 6.6
cs3darknet_focus_l | 82.24 | 82.5 | 82.24 | 82.28 | 0.56
efficientnet_b5 | 82.24 | 82.32 | 82.24 | 82.25 | 3.29
densenetblur121d | 82.2 | 82.41 | 82.2 | 82.26 | 0.45
cs3darknet_focus_m | 82.15 | 82.26 | 82.15 | 82.16 | 0.44
efficientnet_b7 | 81.96 | 82.26 | 81.96 | 82.02 | 3.8
regnet_y_16gf | 81.87 | 81.95 | 81.87 | 81.84 | 7.3
resnet18 | 81.73 | 81.94 | 81.73 | 81.75 | 0.31
regnet_y_8gf | 81.45 | 81.57 | 81.45 | 81.47 | 2.45
convnext_pico | 81.35 | 81.49 | 81.35 | 81.37 | 0.27
resnet34 | 81.17 | 81.52 | 81.17 | 81.27 | 0.3
resnetv2_50 | 81.12 | 81.41 | 81.12 | 81.2 | 1.77
vgg16 | 81.07 | 81.12 | 81.07 | 81.04 | 6.37
efficientnet_b6 | 80.84 | 80.91 | 80.84 | 80.82 | 3.53
convnextv2_pico | 80.75 | 80.91 | 80.75 | 80.79 | 0.4
regnet_y_800mf | 80.75 | 80.92 | 80.75 | 80.72 | 0.45
inception_v3 | 80.14 | 80.3 | 80.14 | 80.15 | 4.03
resnetv2_101 | 79.67 | 79.88 | 79.67 | 79.72 | 1.66
vgg19_bn | 79.44 | 79.66 | 79.44 | 79.48 | 5.09
regnet_y_400mf | 78.97 | 79.2 | 78.97 | 79.04 | 0.31
convnextv2_huge | 78.93 | 78.83 | 78.93 | 78.81 | 3.38
convnextv2_base | 78.65 | 78.73 | 78.65 | 78.64 | 0.58
resnet101 | 78.6 | 78.79 | 78.6 | 78.65 | 1.98
resnet50 | 78.55 | 78.77 | 78.55 | 78.61 | 2.29
regnet_y_3_2gf | 78.27 | 78.63 | 78.27 | 78.36 | 1.36
googlenet | 77.99 | 78.13 | 77.99 | 77.99 | 1.46
vgg19 | 77.99 | 77.89 | 77.99 | 77.88 | 6.53
convnextv2_large | 77.94 | 78.06 | 77.94 | 77.9 | 1.13
efficientnet_v2_m | 77.9 | 78.18 | 77.9 | 77.91 | 2.1
regnet_y_1_6gf | 77.38 | 77.5 | 77.38 | 77.37 | 1.22
wide_resnet50_2 | 76.87 | 77.13 | 76.87 | 76.92 | 2.18
resnext101_32x8d | 76.54 | 76.81 | 76.54 | 76.59 | 2.39
regnet_x_16gf | 76.12 | 76.21 | 76.12 | 76.1 | 4.08
xception65 | 75.84 | 76.13 | 75.84 | 75.89 | 1.46
resnext50_32x4d | 75.7 | 76.05 | 75.7 | 75.82 | 2.07
convnextv2_femto | 75.65 | 75.98 | 75.65 | 75.73 | 0.7
convnextv2_atto | 75.56 | 75.53 | 75.56 | 75.5 | 1.13
wide_resnet101_2 | 75.33 | 75.52 | 75.33 | 75.32 | 2.39
mobilenetv4_conv_medium | 74.86 | 75.25 | 74.86 | 74.99 | 1.02
regnet_x_32gf | 74.63 | 74.47 | 74.63 | 74.51 | 5.8
inception_v4 | 74.25 | 74.75 | 74.25 | 74.4 | 1.43
regnet_x_8gf | 73.36 | 73.66 | 73.36 | 73.46 | 3.2
efficientnet_v2_l | 69.58 | 69.43 | 69.58 | 69.45 | 2.53
xception71 | 69.16 | 69.18 | 69.16 | 69.13 | 2.13
mobilenetv4_conv_large | 68.55 | 68.44 | 68.55 | 68.46 | 1.64
xception41 | 68.13 | 68.11 | 68.13 | 68.01 | 2.21
Table A3. Data-Split 90–10 Results.
Model | Accuracy | Precision | Recall | F1 Score | Time (s)
convnext_xxlarge | 92.43 | 92.49 | 92.43 | 92.43 | 2.11
resnetv2_152 | 91.78 | 91.92 | 91.78 | 91.8 | 3.21
regnet_y_32gf | 90.56 | 90.64 | 90.56 | 90.57 | 5.79
convnext_xlarge | 90.28 | 90.5 | 90.28 | 90.3 | 1.3
densenet161 | 89.81 | 90.02 | 89.81 | 89.82 | 3.51
densenet169 | 88.6 | 88.73 | 88.6 | 88.59 | 2.54
densenet201 | 88.32 | 88.57 | 88.32 | 88.34 | 2.93
convnextv2_tiny | 87.66 | 88.04 | 87.66 | 87.67 | 0.47
dpn131 | 87.66 | 87.78 | 87.66 | 87.65 | 2.19
resnet152 | 87.48 | 87.65 | 87.48 | 87.5 | 2.01
inception_next_tiny | 87.29 | 87.54 | 87.29 | 87.31 | 0.58
convnextv2_nano | 87.2 | 87.38 | 87.2 | 87.18 | 0.39
inception_next_small | 86.64 | 86.65 | 86.64 | 86.59 | 0.54
convnext_base | 86.54 | 86.6 | 86.54 | 86.5 | 1.21
dpn98 | 86.54 | 86.76 | 86.54 | 86.54 | 3.11
inception_next_base | 86.54 | 86.79 | 86.54 | 86.51 | 0.69
dpn107 | 86.26 | 86.36 | 86.26 | 86.22 | 2.35
mnasnet0_75 | 86.26 | 86.4 | 86.26 | 86.17 | 1.97
convnext_large | 86.17 | 86.3 | 86.17 | 86.16 | 2.08
mnasnet1_0 | 86.17 | 86.32 | 86.17 | 86.12 | 2.02
regnet_x_3_2gf | 86.17 | 86.41 | 86.17 | 86.15 | 1.36
efficientnet_b2 | 86.07 | 86.18 | 86.07 | 86.08 | 2.26
convnext_small | 85.89 | 86.02 | 85.89 | 85.88 | 0.88
efficientnet_b0 | 85.89 | 86.1 | 85.89 | 85.89 | 1.91
efficientnet_b1 | 85.89 | 86 | 85.89 | 85.84 | 2.02
dpn68 | 85.7 | 86.06 | 85.7 | 85.74 | 0.61
alexnet | 85.61 | 85.73 | 85.61 | 85.55 | 12.25
regnet_x_800mf | 85.42 | 85.6 | 85.42 | 85.38 | 0.96
efficientnet_b3 | 85.33 | 85.52 | 85.33 | 85.31 | 2.36
efficientnet_b4 | 85.24 | 85.42 | 85.24 | 85.23 | 2.96
convnext_tiny | 85.23 | 85.47 | 85.23 | 85.17 | 0.97
cs3darknet_x | 85.05 | 85.16 | 85.05 | 85.06 | 0.87
efficientnet_v2_s | 85.05 | 85.34 | 85.05 | 85.08 | 1.96
regnet_x_1_6gf | 84.95 | 85.2 | 84.95 | 84.96 | 1.29
mobilenet_v3_large | 84.77 | 84.9 | 84.77 | 84.7 | 1.32
vgg16_bn | 84.67 | 84.72 | 84.67 | 84.62 | 4.59
shufflenet_v2_x1_0 | 84.58 | 84.66 | 84.58 | 84.55 | 0.67
shufflenet_v2_x2_0 | 84.58 | 84.75 | 84.58 | 84.51 | 2.66
mnasnet1_3 | 84.39 | 84.37 | 84.39 | 84.35 | 2.11
dpn92 | 84.3 | 84.42 | 84.3 | 84.24 | 3.06
mobilenetv4_conv_small | 84.3 | 84.48 | 84.3 | 84.3 | 1.01
densenet121 | 84.11 | 84.12 | 84.11 | 84.03 | 1.36
mobilenet_v3_small | 84.02 | 84.27 | 84.02 | 84.04 | 0.92
mnasnet0_5 | 83.83 | 83.82 | 83.83 | 83.73 | 2.06
efficientnet_b5 | 83.74 | 83.95 | 83.74 | 83.75 | 3.83
efficientnet_b7 | 83.74 | 83.96 | 83.74 | 83.73 | 4.61
shufflenet_v2_x0_5 | 83.55 | 83.94 | 83.55 | 83.61 | 0.75
regnet_x_400mf | 83.18 | 83.37 | 83.18 | 83.16 | 0.82
dpn68b | 82.99 | 83.26 | 82.99 | 82.96 | 0.45
vgg11_bn | 82.99 | 83.06 | 82.99 | 82.83 | 5.41
vgg13_bn | 82.99 | 83.04 | 82.99 | 82.83 | 6.23
mobilenet_v2 | 82.9 | 82.95 | 82.9 | 82.78 | 1.97
vgg11 | 82.9 | 82.87 | 82.9 | 82.77 | 7.64
regnet_y_16gf | 82.43 | 82.59 | 82.43 | 82.35 | 8.9
shufflenet_v2_x1_5 | 82.34 | 82.46 | 82.34 | 82.29 | 0.75
convnextv2_pico | 82.24 | 82.49 | 82.24 | 82.26 | 0.53
densenetblur121d | 82.24 | 82.58 | 82.24 | 82.21 | 0.56
resnet34 | 82.24 | 82.7 | 82.24 | 82.25 | 0.37
resnetv2_50 | 82.24 | 82.82 | 82.24 | 82.25 | 2.17
regnet_y_8gf | 82.15 | 82.21 | 82.15 | 82.08 | 3.34
resnet18 | 82.06 | 82.29 | 82.06 | 82.07 | 0.37
cs3darknet_focus_l | 81.78 | 82 | 81.78 | 81.72 | 0.56
convnext_pico | 81.68 | 81.76 | 81.68 | 81.52 | 0.37
cs3darknet_focus_m | 81.59 | 81.87 | 81.59 | 81.43 | 0.41
inception_v3 | 81.5 | 81.6 | 81.5 | 81.44 | 4.64
regnet_y_800mf | 81.4 | 81.95 | 81.4 | 81.36 | 0.52
vgg13 | 81.21 | 81.3 | 81.21 | 81.18 | 7.81
convnext_nano | 80.93 | 81.39 | 80.93 | 81.04 | 0.49
dpn68b | 80.93 | 81.14 | 80.93 | 80.96 | 0.44
vgg16 | 80.75 | 80.98 | 80.75 | 80.56 | 7.65
resnet50 | 80.28 | 80.54 | 80.28 | 80.28 | 2.92
regnet_y_400mf | 80.09 | 80.39 | 80.09 | 80.13 | 0.42
convnextv2_huge | 79.63 | 79.69 | 79.63 | 79.58 | 4.08
efficientnet_b6 | 79.53 | 79.69 | 79.53 | 79.55 | 4.26
vgg19_bn | 79.35 | 79.46 | 79.35 | 79.16 | 6.19
googlenet | 79.25 | 79.44 | 79.25 | 79.24 | 1.8
resnetv2_101 | 79.25 | 79.67 | 79.25 | 79.22 | 2.03
efficientnet_v2_m | 78.6 | 78.6 | 78.6 | 78.49 | 2.36
convnextv2_base | 78.5 | 78.88 | 78.5 | 78.47 | 0.73
convnextv2_large | 77.85 | 77.95 | 77.85 | 77.65 | 1.53
regnet_y_3_2gf | 77.29 | 77.32 | 77.29 | 77.25 | 1.61
vgg19 | 77.2 | 77.19 | 77.2 | 76.91 | 7.83
resnet101 | 77.1 | 77.47 | 77.1 | 77.11 | 2.38
wide_resnet101_2 | 77.01 | 76.93 | 77.01 | 76.75 | 3.06
regnet_y_1_6gf | 76.73 | 77.06 | 76.73 | 76.7 | 1.44
xception65 | 76.63 | 76.62 | 76.63 | 76.45 | 1.83
resnext101_32x8d | 76.45 | 76.58 | 76.45 | 76.36 | 3.02
resnext50_32x4d | 76.45 | 76.59 | 76.45 | 76.42 | 2.43
regnet_x_32gf | 76.36 | 76.18 | 76.36 | 76.07 | 7.22
convnextv2_atto | 75.89 | 75.84 | 75.89 | 75.77 | 1.67
wide_resnet50_2 | 75.8 | 75.9 | 75.8 | 75.75 | 2.68
regnet_x_16gf | 75.42 | 75.34 | 75.42 | 75.34 | 5.01
convnextv2_femto | 75.14 | 75.35 | 75.14 | 75.16 | 1.07
regnet_x_8gf | 74.95 | 75.07 | 74.95 | 74.84 | 3.71
inception_v4 | 74.39 | 74.44 | 74.39 | 74.37 | 1.87
mobilenetv4_conv_medium | 74.39 | 74.57 | 74.39 | 74.39 | 1.38
mobilenetv4_conv_large | 69.06 | 68.65 | 69.06 | 68.68 | 2.26
efficientnet_v2_l | 68.88 | 69.07 | 68.88 | 68.62 | 3.18
xception71 | 67.85 | 67.79 | 67.85 | 67.7 | 2.86
xception41 | 65.98 | 65.54 | 65.98 | 65.55 | 2.89

References

  1. Toennies, K.D. An Introduction to Image Classification: From Designed Models to End-to-End Learning; Springer: Berlin/Heidelberg, Germany, 2024. [Google Scholar]
  2. Liu, G.H.; Yang, J.Y. Content-based image retrieval using color difference histogram. Pattern Recognit. 2013, 46, 188–198. [Google Scholar] [CrossRef]
  3. Castleman, K.R. Digital image processing; Prentice Hall Press: Upper Saddle River, NJ, USA, 1996. [Google Scholar]
  4. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–25 September 1999; Volume 2, pp. 1150–1157. [Google Scholar]
  5. Guo, Z.; Zhang, L.; Zhang, D. A completed modeling of local binary pattern operator for texture classification. IEEE Trans. Image Process. 2010, 19, 1657–1663. [Google Scholar] [CrossRef]
  6. Anwar, A.; Anwar, H.; Anwar, S. Towards Low-Cost Classification for Novel Fine-Grained Datasets. Electronics 2022, 11, 2701. [Google Scholar] [CrossRef]
  7. Torralba, A.; Isola, P.; Freeman, W.T. Foundations of Computer Vision; MIT Press: Cambridge, MA, USA, 2024. [Google Scholar]
  8. Anwar, H.; Anwar, S.; Zambanini, S.; Porikli, F. Deep ancient Roman Republican coin classification via feature fusion and attention. Pattern Recognit. 2021, 114, 107871. [Google Scholar] [CrossRef]
  9. Imran, M.; Anwar, H.; Tufail, M.; Khan, A.; Khan, M.; Ramli, D.A. Image-Based Automatic Energy Meter Reading Using Deep Learning. Comput. Mater. Contin. 2023, 75, 203–216. [Google Scholar] [CrossRef]
  10. Yi, J.; Zhang, G.; Yu, H.; Yan, H. Advantages, challenges and molecular design of different material types used in organic solar cells. Nat. Rev. Mater. 2024, 9, 46–62. [Google Scholar] [CrossRef]
  11. Pourasl, H.H.; Barenji, R.V.; Khojastehnezhad, V.M. Solar energy status in the world: A comprehensive review. Energy Rep. 2023, 10, 3474–3493. [Google Scholar] [CrossRef]
  12. Ding, R.; Cao, Z.; Teng, J.; Cao, Y.; Qian, X.; Yue, W.; Yuan, X.; Deng, K.; Wu, Z.; Li, S.; et al. Self-Powered Autonomous Electrostatic Dust Removal for Solar Panels by an Electret Generator. Adv. Sci. 2024, 11, 2401689. [Google Scholar] [CrossRef] [PubMed]
  13. Mondal, A.K.; Bansal, K. A brief history and future aspects in automatic cleaning systems for solar photovoltaic panels. Adv. Robot. 2015, 29, 515–524. [Google Scholar] [CrossRef]
  14. Alatwi, A.M.; Albalawi, H.; Wadood, A.; Anwar, H.; El-Hageen, H.M. Deep Learning-Based Dust Detection on Solar Panels: A Low-Cost Sustainable Solution for Increased Solar Power Generation. Sustainability 2024, 16, 8664. [Google Scholar] [CrossRef]
  15. Rathod, A.P.S.; Singh, R.K.; Kumar, S. Effect of Dust Accumulation on Efficiency of Solar Panels in Clement Town Region (Dehradun) India: An Empirical Study. NanoWorld J. 2023, 9, S1–S6. [Google Scholar] [CrossRef]
  16. Shariah, A.; Al-Ibrahim, E. Impact of Dust and Shade on Solar Panel Efficiency and Development of a Simple Method for Measuring the Impact of Dust in any Location. J. Sustain. Dev. Energy Water Environ. Syst. 2023, 11, 1110448. [Google Scholar]
  17. Rashid, M.; Yousif, M.; Rashid, Z.; Muhammad, A.; Altaf, M.; Mustafa, A. Effect of dust accumulation on the performance of photovoltaic modules for different climate regions. Heliyon 2023, 9, e23069. [Google Scholar] [CrossRef] [PubMed]
  18. Wang, J.; Hu, W.; Wen, Y.; Zhang, F.; Li, X. Dust deposition characteristics on photovoltaic arrays investigated through wind tunnel experiments. Sci. Rep. 2025, 15, 1582. [Google Scholar] [CrossRef]
  19. Yakubu, S.; Samikannu, R.; Gawusu, S.; Wetajega, S.D.; Okai, V.; Shaibu, A.K.S.; Workneh, G.A. A holistic review of the effects of dust buildup on solar photovoltaic panel efficiency. Sol. Compass 2025, 13, 100101. [Google Scholar] [CrossRef]
  20. Kawamoto, H. Electrodynamic dust removal technologies for solar panels: A comprehensive review. J. Electrost. 2025, 134, 104045. [Google Scholar] [CrossRef]
  21. Dantas, G.M.; Mendes, O.L.C.; Maia, S.M.; de Alexandria, A.R. Dust detection in solar panel using image processing techniques: A review. Res. Soc. Dev. 2020, 9, e321985107. [Google Scholar] [CrossRef]
  22. Abdelsattar, M.; Rasslan, A.A.A.; Emad-Eldeen, A. Detecting dusty and clean photovoltaic surfaces using MobileNet variants for image classification. SVU-Int. J. Eng. Sci. Appl. 2025, 6, 9–18. [Google Scholar]
  23. Onim, M.S.H.; Sakif, Z.M.M.; Ahnaf, A.; Kabir, A.; Azad, A.K.; Oo, A.M.T.; Afreen, R.; Hridy, S.T.; Hossain, M.; Jabid, T.; et al. SolNet: A convolutional neural network for detecting dust on solar panels. Energies 2023, 16, 155. [Google Scholar] [CrossRef]
  24. Naeem, U.; Chadda, K.; Vahaji, S.; Ahmad, J.; Li, X.; Asadi, E. Aerial Imaging-Based Soiling Detection System for Solar Photovoltaic Panel Cleanliness Inspection. Sensors 2025, 25, 738. [Google Scholar] [CrossRef]
  25. Cruz-Rojas, T.; Franco, J.A.; Hernandez-Escobedo, Q.; Ruiz-Robles, D.; Juarez-Lopez, J.M. A novel comparison of image semantic segmentation techniques for detecting dust in photovoltaic panels using machine learning and deep learning. Renew. Energy 2023, 217, 119126. [Google Scholar] [CrossRef]
  26. Gao, X.; Wang, T.; Liu, M.; Lian, J.; Yao, Y.; Yu, L.; Li, Y.; Cui, Y.; Xue, R. A framework to identify guano on photovoltaic modules in offshore floating photovoltaic power plants. Sol. Energy 2024, 274, 112598. [Google Scholar] [CrossRef]
  27. Oulefki, A.; Trongtirakul, T.; Agaian, S.; Benbelkacem, S.; Zenati, N. Multi-view VR imaging for enhanced analysis of dust accumulation on solar panels. Sol. Energy 2024, 279, 112708. [Google Scholar] [CrossRef]
  28. Bassil, J.; Noura, H.; Salman, O.; Chahine, K.; Guizani, M. Deep learning image classification models for solar panels dust detection. In Proceedings of the 2024 International Wireless Communications and Mobile Computing (IWCMC), Ayia Napa, Cyprus, 3–7 June 2024; pp. 1516–1521. [Google Scholar]
  29. Mohammed, H.M.; Alawi, A.E.B. CASolarNet: Channel Attention EfficientNet-based Model for Solar Panel Dust Detection. In Proceedings of the 2024 4th International Conference on Emerging Smart Technologies and Applications (eSmarTA), Sana’a, Yemen, 28–30 October 2024; pp. 1–4. [Google Scholar]
  30. Sefer, T.; Kaya, M. Detection of Dust on Solar Panels with Deep Learning. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilim. Derg. 2024, 27, 1451–1464. [Google Scholar] [CrossRef]
  31. Cipriani, G.; D’Amico, A.; Guarino, S.; Manno, D.; Traverso, M.; Di Dio, V. Convolutional neural network for dust and hotspot classification in PV modules. Energies 2020, 13, 6357. [Google Scholar] [CrossRef]
  32. Chen, L.; Fan, S.; Sun, S.; Cao, S.; Sun, T.; Liu, P.; Gao, H.; Zhang, Y.; Ding, W. A detection model for dust deposition on photovoltaic (PV) panels based on light transmittance estimation. Energy 2025, 322, 135284. [Google Scholar] [CrossRef]
  33. Shao, Y.; Zhang, C.; Xing, L.; Sun, H.; Zhao, Q.; Zhang, L. A new dust detection method for photovoltaic panel surface based on Pytorch and its economic benefit analysis. Energy AI 2024, 16, 100349. [Google Scholar] [CrossRef]
  34. Noura, H.N.; Chahine, K.; Bassil, J.; Abou Chaaya, J.; Salman, O. Efficient combination of deep learning models for solar panel damage and soiling detection. Measurement 2025, 251, 117185. [Google Scholar] [CrossRef]
  35. Ahmed, S.; Rashid, H.; Qadir, Z.; Tayyab, Q.; Senjyu, T.; Elkholy, M. Deep Learning-Based Recognition and Classification of Soiled Photovoltaic Modules Using HALCON Software for Solar Cleaning Robots. Sensors 2025, 25, 1295. [Google Scholar] [CrossRef] [PubMed]
  36. Prova, N.N.I. Improved Solar Panel Efficiency through Dust Detection Using the InceptionV3 Transfer Learning Model. In Proceedings of the 2024 8th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC), Dharan, Nepal, 5–7 September 2024; pp. 260–268. [Google Scholar]
  37. Tan, Y.; Liao, K.; Bai, X.; Deng, C.; Zhao, Z.; Zhao, B. Denoising Convolutional Neural Networks Based Dust Accumulation Status Evaluation of Photovoltaic Panel. In Proceedings of the 2019 IEEE International Conference on Energy Internet (ICEI), Nanjing, China, 27–31 May 2019; pp. 560–566. [Google Scholar]
  38. Zhang, W.; Archana, V.; Gandhi, O.; Rodríguez-Gallegos, C.D.; Quan, H.; Yang, D.; Tan, C.W.; Chung, C.; Srinivasan, D. SoilingEdge: PV soiling power loss estimation at the edge using surveillance cameras. IEEE Trans. Sustain. Energy 2023, 15, 556–566. [Google Scholar] [CrossRef]
  39. Sai, H. Solar Panel Dust Detection. 2023. Available online: https://www.kaggle.com/datasets/hemanthsai7/solar-panel-dust-detection (accessed on 13 June 2024).
  40. Zambanini, S.; Kampel, M. Coarse-to-fine correspondence search for classifying ancient coins. In Proceedings of the ACCV, Daejeon, Republic of Korea, 5–9 November 2012; pp. 25–36. [Google Scholar]
  41. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  42. Healy, J.; McInnes, L. Uniform manifold approximation and projection. Nat. Rev. Methods Prim. 2024, 4, 82. [Google Scholar] [CrossRef]
  43. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  44. Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
  45. Petsiuk, V.; Das, A.; Saenko, K. RISE: Randomized Input Sampling for Explanation of Black-box Models. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018. [Google Scholar]
Figure 1. Global renewable-based power generation. (a) The share of renewable energy capacity across technologies, by region; (b) annual additions of new renewable power-generation installations from 2010 to 2023, where a sharp increase can be noticed for solar PV from 2019 to 2023; (c) the yearly global percentage price drop of various solar panel types over the last five years. Data source: https://www.irena.org (accessed on 29 November 2025).
Figure 2. The challenges in the image dataset. The first row shows an interesting challenge where the cleaning process is depicted: some of the solar panels are cleaned while others are still dirty. These images are placed in the dirty class. The second row shows the variations found in the dirt itself; for instance, dust, bird droppings, and fungus are commonly found on solar panels. The third row shows another notable aspect of the clean solar panels: they are imaged against a clear sky and a green environment, which induces a strong bias in the images. If such an environment also appears in images of dirty solar panels, this may lead to misclassifications. The final row shows the variations in panel orientations. Overall, despite being fewer in number, the images in this dataset are very challenging due to these factors. Both the dataset and code can be accessed at https://github.com/hafeez-anwar/SolClean-Code (accessed on 2 January 2026).
Figure 3. The complete methodology. The solar panel images belong to two classes, namely clean and dirty. These images are encoded using a pre-trained CNN model that generates a dense multi-dimensional feature map, which is converted to a 1D vector by applying GlobalAveragePooling. Once encoded, the images are used to train and test a linear SVM using a StratifiedShuffleSplit strategy that preserves the class distribution, thus accounting for class imbalance. The evaluated split percentages are 70–30, 80–20, and 90–10; for each split percentage, the experiments are run 10 times due to the random splits, and the average results are reported.
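As a concrete illustration of the encoding step in Figure 3, the sketch below uses the timm library to load a pre-trained backbone and obtain the globally average-pooled 1D vector for a single image. The model identifier and image path are illustrative; the paper's backbones (e.g., resnetv2_152) would be substituted for "resnet50".

```python
# Minimal sketch of the encoding step, assuming the timm library.
import timm
import torch
from PIL import Image

# num_classes=0 makes timm return pooled features instead of class logits.
model = timm.create_model("resnet50", pretrained=True, num_classes=0)
model.eval()
cfg = timm.data.resolve_data_config({}, model=model)  # model-specific input size/normalization
transform = timm.data.create_transform(**cfg)

@torch.no_grad()
def encode(path: str) -> torch.Tensor:
    """Return the globally average-pooled 1D encoding of one image."""
    x = transform(Image.open(path).convert("RGB")).unsqueeze(0)  # (1, 3, H, W)
    return model(x).squeeze(0)                                   # (feature_dim,)

print(encode("panels/dirty/0001.jpg").shape)  # e.g., torch.Size([2048])
```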
Figure 4. Specifications of all the evaluated CNN models: the encoded vector size, floating-point operations (FLOPs), number of parameters, and the time in seconds each model takes to process a single image, also called the inference time.
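The quantities in Figure 4 can be measured generically for any backbone. The following is a sketch under the assumption that fvcore serves as the FLOP counter (one of several available choices) and that inference time is averaged over repeated forward passes after a warm-up:

```python
# Hedged sketch: measure parameters, FLOPs, feature size, and inference time.
import time
import timm
import torch
from fvcore.nn import FlopCountAnalysis  # assumed FLOP counter

model = timm.create_model("resnet50", pretrained=False, num_classes=0).eval()
dummy = torch.randn(1, 3, 224, 224)  # illustrative input resolution

n_params = sum(p.numel() for p in model.parameters())
flops = FlopCountAnalysis(model, dummy).total()

with torch.no_grad():
    for _ in range(5):                    # warm-up passes
        model(dummy)
    t0 = time.time()
    for _ in range(20):
        model(dummy)
    infer_s = (time.time() - t0) / 20     # mean single-image inference time
    feat_dim = model(dummy).shape[1]      # encoded vector size

print(f"params={n_params / 1e6:.1f}M  flops={flops / 1e9:.2f}G  "
      f"dim={feat_dim}  inference={infer_s * 1000:.1f} ms")
```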
Figure 5. Classification accuracies achieved across all three splitting strategies by the best variant of each family.
Figure 6. Training time taken across all three splitting strategies by the best variant of each family.
Figure 7. Detailed comparison of the best-performing models.
Figure 8. Comparative performance analysis of the top two pre-trained models, resnetv2_152 and convnext_xxlarge, for image-based dust detection on solar panels using linear SVM classification, in terms of result stability, PR, and ROC curves.
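The PR and ROC curves of Figure 8 can be drawn from the signed margins of the linear SVM. A minimal sketch, assuming a fitted LinearSVC `clf` and held-out encodings `X_test`, `y_test` (names carried over from the sketch following the results table):

```python
# Hedged sketch: PR and ROC curves from linear-SVM decision margins.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_curve, auc

scores = clf.decision_function(X_test)   # signed margins as ranking scores
prec, rec, _ = precision_recall_curve(y_test, scores)
fpr, tpr, _ = roc_curve(y_test, scores)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.plot(rec, prec)
ax1.set_xlabel("Recall"); ax1.set_ylabel("Precision"); ax1.set_title("PR curve")
ax2.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
ax2.set_xlabel("FPR"); ax2.set_ylabel("TPR"); ax2.set_title("ROC curve")
ax2.legend()
plt.tight_layout(); plt.show()
```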
Figure 9. (a–d) LIME visualization on images that are encoded with convnext_xxlarge.
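Heatmaps like those in Figure 9 can be produced with the lime package. The sketch below is illustrative: `model` and `transform` refer to the timm encoding sketch above, `clf` to the fitted LinearSVC, and the image path is hypothetical; the sigmoid mapping of SVM margins to pseudo-probabilities is an assumed choice, since LIME expects a probability-like classifier function.

```python
# Hedged sketch: LIME explanation of the encode-then-SVM pipeline.
import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image
from lime import lime_image
from skimage.segmentation import mark_boundaries

def encode_batch(batch: np.ndarray) -> np.ndarray:
    """Encode LIME's perturbed uint8 images with the CNN backbone."""
    with torch.no_grad():
        x = torch.stack([transform(Image.fromarray(img.astype(np.uint8)))
                         for img in batch])
        return model(x).numpy()

def predict_proba(batch: np.ndarray) -> np.ndarray:
    """Map linear-SVM margins to pseudo-probabilities via a sigmoid."""
    margins = clf.decision_function(encode_batch(batch))
    p_dirty = 1.0 / (1.0 + np.exp(-margins))
    return np.column_stack([1.0 - p_dirty, p_dirty])

image = np.array(Image.open("panels/dirty/0001.jpg").convert("RGB"))
explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(image, predict_proba,
                                         top_labels=1, num_samples=1000)
img, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                           positive_only=True, num_features=5)
plt.imshow(mark_boundaries(img / 255.0, mask)); plt.axis("off"); plt.show()
```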
Figure 10. (a–d) RISE visualization on images that are encoded with resnetv2_152.
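RISE [45] estimates saliency as the score-weighted average of random binary masks applied to the input. The following compact sketch implements that idea from scratch; `score_fn` is assumed to wrap the encode-plus-SVM pipeline sketched earlier and to return the dirty-class score of a masked image.

```python
# Hedged sketch of RISE saliency (per ref. [45]); parameter values are
# illustrative defaults, not the paper's settings.
import numpy as np
from skimage.transform import resize

def rise_saliency(img, score_fn, n_masks=2000, grid=8, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    H, W = img.shape[:2]
    sal = np.zeros((H, W))
    cell_h, cell_w = int(np.ceil(H / grid)), int(np.ceil(W / grid))
    for _ in range(n_masks):
        # Low-resolution binary grid, bilinearly upsampled, randomly shifted.
        small = (rng.random((grid, grid)) < p).astype(float)
        big = resize(small, ((grid + 1) * cell_h, (grid + 1) * cell_w), order=1)
        dy, dx = rng.integers(0, cell_h), rng.integers(0, cell_w)
        mask = big[dy:dy + H, dx:dx + W]
        # Accumulate the mask weighted by the model's score on the masked image.
        sal += score_fn(img * mask[..., None]) * mask
    return sal / (n_masks * p)   # normalize by expected mask coverage
```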
Figure 11. (a–g) Images of dirty solar panels wrongly classified as clean.
Figure 12. Correctly and wrongly classified dirty solar panels in the unseen test dataset.
Table 1. Summary of Research Papers on Dust Detection on Solar Panels Using Deep Learning.

S.No. | Year | Ref. | CNN Models Evaluated
1 | 2025 | [34] | ViTs and EfficientNet
2 | 2025 | [35] | ANN
3 | 2025 | [22] | MobileNetV1, V2, V3
4 | 2024 | [14] | DenseNet169, VGG16, ResNet50, and 17 others
5 | 2024 | [36] | InceptionV3
6 | 2024 | [28] | ResNet50, VGG16, InceptionV3
7 | 2024 | [29] | EfficientNet (with Channel Attention)
8 | 2024 | [33] | Custom CNN model
9 | 2023 | [23] | SolNet, VGG16, ResNet50, InceptionV3, MobileNetV2
10 | 2023 | [37] | DnCNN, VGG16, AlexNet, ResNet
11 | 2023 | [38] | ResNet50, MobileNet
12 | 2023 | [25] | UNet
Table 2. Number of dirty solar panel images categorized by the type of contaminant.

Type of Dirt | Number of Images
Bird droppings | 69
Debris | 7
Defect | 9
Dust | 524
Leaves | 4
Occlusions (human or cleaning tools) | 45
