Radiometric Infrared Thermography of Solar Photovoltaic Systems: An Explainable Predictive Maintenance Approach for Remote Aerial Diagnostic Monitoring

Abstract: Solar photovoltaic (SPV) arrays are crucial components of clean and sustainable energy infrastructure. However, SPV panels are susceptible to thermal degradation defects that can impact their performance, thereby necessitating timely and accurate fault detection to maintain optimal energy generation. The considered case study focuses on an intelligent fault detection and diagnosis (IFDD) system for the analysis of radiometric infrared thermography (IRT) of SPV arrays in a predictive maintenance setting, enabling remote inspection and diagnostic monitoring of the SPV power plant sites. The proposed IFDD system employs a custom-developed deep learning approach which relies on convolutional neural networks for effective multiclass classification of defect types. The diagnosis of SPV panels is a challenging task because of issues such as IRT data scarcity, the complexity of defect patterns, and low thermal image acquisition quality due to noise and calibration issues. Hence, this research carefully prepares a customized high-quality but severely imbalanced six-class thermographic radiometric dataset of SPV panels. In contrast to previous approaches, numerical temperature values in floating-point are used to train and validate the predictive models. The trained models display high accuracy for efficient thermal anomaly diagnosis. Finally, to build trust in the IFDD system, the process underlying the classification model is investigated with perceptive explainability, for portraying the most discriminant image features, and mathematical-structure-based interpretability, to achieve multiclass feature clustering.


Introduction
The worldwide renewables' share accounted for 28.2% of the total final energy consumption in 2020, which highlights the continuous growth and dominance of solar photovoltaics (SPV) in the global energy landscape. The cumulative installation capacity of SPV arrays reached 1185 gigawatts globally in 2022, topping the 1-terawatt mark and contributing approximately 6.2% to total electricity generation [1,2]. Moreover, the European Union (EU 27) achieved around 198.3 gigawatts of SPV capacity in 2022 [3][4][5]. SPV played a dominant role in the global energy transition in 2022: of the 362 gigawatts of net additional generation capacity, grid-connected SPV technology contributed 66% (239 gigawatts), surpassing all other renewable energy technologies combined. Considering the global solar tenders and records of 2021, a notable mention was the Saudi Arabian solar auction (600 megawatts), which set a world record for the lowest bid of 0.0104 USD/kWh. Later in Europe, Germany awarded a ground-mounted utility-scale SPV project of 1.952 gigawatts in March 2023 at an average bidding price of 0.0710 EUR/kWh [6,7].
Hence, ensuring the prolonged lifespan of an SPV energy plant, by monitoring its desired generation output, has an important techno-economic impact. Periodic inspections and regular operational maintenance play a pivotal role in timely defect diagnosis, by detecting thermal abnormalities and stresses appearing in SPV modules [8,9]. The degradation state (BS EN 13306:2017) resulting from thermal stresses in SPV panels hampers the plant's ability to meet the targeted energy output [10]. This necessitates predictive maintenance (PdM) for informed, strategic, and financial decision-making [11,12]. For nondestructive testing, evaluation, supervision, monitoring, diagnostics, inspection, and maintenance, the International Electrotechnical Commission (IEC TS 62446-3:2017) furnishes technical specifications for outdoor infrared thermography (IRT) [13]. Moreover, the training and certification of personnel are imperative for conducting an IRT survey, ensuring accurate data collection and precise information extraction [14][15][16].

Thermal Degradation and Aerial Thermographic Inspection
Given the fragile nature of SPV cells [17], temperature is a critical environmental factor that significantly accelerates mechanisms associated with SPV module degradation, particularly those linked to the rates of permeation, such as chemical reactions and diffusion. Generally, the behavior of an SPV module depends on its cell material and the chemistry of its components, with a nominal operating cell temperature of 42-43 ± 2 °C [18][19][20][21]. Notably, the temperature within the SPV module or cell may deviate from ambient temperature, primarily influenced by incident solar irradiance and wind. Broadly speaking, there are three main causes of thermal degradation. Firstly, there are deterioration processes, which include thermo-chemical reactions [22,23] and thermo-mechanical stresses [24,25]. Secondly, there are cyclic aging conditions [26,27], such as diurnal and seasonal temperature variations, as well as general aging effects. Finally, there is the issue of semi-blockage or non-uniform passage of sunlight through SPV glass, which can occur due to environmental factors [28,29] like partial shadowing or soiling, glass damage like partial cracks or breakage, as well as biological factors like bird droppings and vegetation debris. Heat dissipation from the cells is influenced by factors such as the thermal conductivity and geometry of surrounding materials, wind speed, and the installation configuration of the SPV module.
Additionally, discrepancies in thermal expansion coefficients among module materials can lead to differential expansion and contraction, generating thermo-mechanical stresses within the module structure. These stresses adversely impact the mechanical stability of crucial electrical components, including cells, solder joints, and interconnect ribbons, potentially causing issues such as deformation, delamination, and cell cracking. Further, cyclic thermo-mechanical stresses from diurnal and seasonal temperature fluctuations may lead to fatigue-induced failures among various module components [30][31][32][33]. Moreover, thermally induced failures are usually cascading and irreversible, acting as a domino effect resulting in a fait accompli. Hence, thermographic inspection plays a vital role in assessing the heat signature and temperature profile of the SPV panel's glass surface, aiding in the classification, prediction, and mitigation of potential damage and defects.
The quantity of raw field data grows as the surface area of the SPV arrays increases, which makes periodic manual diagnostic monitoring challenging, laborious, and time-consuming. Here, fast remote supervision and aerial thermographic inspections are performed periodically by deploying a drone-mounted (unmanned aerial vehicle, UAV) IRT camera to precisely locate and capture the thermal degradation patterns of SPV panels for operational decision-making [34,35].

Predictive Maintenance and Fault Diagnosis
PdM involves leveraging specialized skills in machine learning (ML) and artificial intelligence (AI) tools to pinpoint and eliminate failure symptoms and anomalies. These tools excel at identifying subtle patterns and deviations in data, providing insights into anomalies and underperformance [11]. The effective maintenance process is initiated by an accurate operational fault detection and diagnosis (FDD), which plays a critical role in ensuring the efficiency, quality, and reliability of both modern industrial systems and renewable energy systems. Traditional FDD methods, relying on human expertise and diagnostic experience, face challenges as the factory and electric power plant scale expands and the number of process variables increases [36]. In SPV systems, conventional FDD methods, including overcurrent-protection devices and ground-fault detection interrupters, encounter limitations in detecting specific faults due to factors such as low solar irradiance conditions, nonlinear output characteristics, the presence of maximum power point trackers (MPPT) in SPV inverters, and high fault impedances.
Consequently, there is a growing need to explore more intelligent fault detection and diagnosis (IFDD) techniques, leveraging AI-based methods as promising alternatives to conventional approaches [37]. Indeed, the adoption of ML and deep learning (DL) algorithms has the potential to overcome the limitations of traditional FDD systems. Particularly, AI-based systems tailored for the diagnosis of SPV-IRT images can analyze complex patterns, adapt to varying conditions, and accurately detect faults even in challenging scenarios. The shift towards IFDD ensures improved efficiency in anomaly classification tasks and opens the door to innovative advancements in PdM for renewable energy technology. In the broader context, traditional electricity-generating machines and renewable energy equipment are expected to operate continuously under specific conditions and environments to maintain the desired output. Functional safety and safety integrity levels are specified in the IEC 61508 series, which provides consistent guidelines for risk-free and fault-free smooth operations [38].
Hence, there is a low availability of faulty data instances for AI model training purposes. Addressing imbalanced data in IFDD, especially within complex and uncertain industrial environments, poses a significant challenge [39]. Current solutions often struggle to tackle these issues effectively, resulting in biased models and suboptimal performance in diagnosing fault instances and abnormal behaviors. Proactive inspection and maintenance approaches, supported by automatic predictive diagnostic monitoring tools, contribute to enhancing operational efficiency, prolonging equipment lifespan, and preventing costly disruptions [40,41].
Particularly, DL finds extensive applications in the localization, segmentation, detection, and diagnosis of SPV module images. However, it is accompanied by challenges such as high memory requirements, computational expenses, and the need for substantial quantities of image data for training [42].
SPV-IRT image datasets generally consist of images in pseudo-color/false-color, visible/RGB/converted true-color (three-channel), or grayscale (one-channel) palettes, captured by a visible-light and/or IRT camera that is mostly drone-mounted. They vary in image size (pixel resolution and quality), labeled classes, and quantity.
An analysis of the existing studies reported in Table 1 reveals that various representations of SPV infrared thermal images can be used for IFDD, including pseudo-color/false-color palette, visual/RGB/true color, masking/black and white, and monochrome/grayscale images (pixel value: 0-255). However, these representative thermal images mostly lack radiometric values.

Motivation and Contribution
The main contribution of this case study is twofold: (i) to further explore the preparation of a dataset that truly indicates an accurate temperature radiation representation of an SPV panel, allowing the recognition of relevant details for IFDD; (ii) to train, validate, and explain a reliable deep learning model for IFDD.
Indeed, the considered dataset comprises high-quality two-dimensional radiometric data (having floating-point temperature numerical values in degrees Celsius) obtained from publicly available raw aerial SPV array grayscale thermographic images. Then, a customized deep learning algorithm with explainability mappings, based on a CNN, is proposed to diagnose radiometric fault samples with high accuracy and efficiency.
The remainder of the paper is organized as follows: Section 2 describes the materials and methods employed for this research, starting from the data gathering and preparation to the training, validation, and explanation of the deep learning model. Then, the results are presented in Section 3 and discussed in Section 4, where the limitations of the proposed methodology are also considered. Finally, Section 5 draws the final remarks and portrays directions for future research.

SPV Data Characteristics
The availability and accessibility of a large, varied, and high-quality dataset play a pivotal role in ANN-based diagnostic monitoring. Indeed, conducting effective thermographic inspections of SPV arrays in the field requires compliance with technical guidelines, specific conditions, and equipment calibration [60]. These include: a solar irradiance of 500 W/m² or higher for adequate thermal contrast; uncooled microbolometer detectors with high sensitivity in the 8-14 µm waveband (long-wavelength infrared, LWIR); a camera thermal sensitivity of ≤0.08 K to distinguish small temperature differences on the SPV panel's glass surface; and a camera viewing angle within 5-60° (where 0° is perpendicular), which is an appropriate compromise to reduce glass reflections while keeping the relative emissivity high.
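The acquisition conditions listed above lend themselves to a simple pre-flight check. The following is a minimal sketch, not part of the original study: the class and function names (`SurveyConditions`, `check_survey_conditions`) are hypothetical, and only the three numeric thresholds stated in the text are encoded.

```python
from dataclasses import dataclass


@dataclass
class SurveyConditions:
    """Hypothetical container for the conditions of one IRT survey."""
    irradiance_w_m2: float   # solar irradiance in W/m^2
    netd_kelvin: float       # camera thermal sensitivity in K
    view_angle_deg: float    # viewing angle, 0 = perpendicular to the glass


def check_survey_conditions(c: SurveyConditions) -> list:
    """Return the list of violated conditions (empty when all are met)."""
    problems = []
    if c.irradiance_w_m2 < 500.0:
        problems.append("irradiance below 500 W/m^2")
    if c.netd_kelvin > 0.08:
        problems.append("thermal sensitivity worse than 0.08 K")
    if not (5.0 <= c.view_angle_deg <= 60.0):
        problems.append("viewing angle outside 5-60 degrees")
    return problems


ok_survey = check_survey_conditions(SurveyConditions(650.0, 0.05, 30.0))
bad_survey = check_survey_conditions(SurveyConditions(400.0, 0.10, 2.0))
```

Returning a list of violations rather than a single Boolean keeps the reason for rejecting a survey visible to the operator.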
Furthermore, some unique peculiarities and problems need to be dealt with in energy systems IFDD: (a) the detection of anomalies also in the presence of noise [61]; (b) the data distribution is highly skewed, as industrial equipment and energy-producing assets primarily operate under healthy conditions, resulting in fewer instances of faults [39]; (c) specific defects, such as defective bypass diodes and PID effect by shunted cells (PIDsc), which can lead to both open and short circuits that exhibit unique visual patchwork patterns [62,63].

Dataset Preparation and Visualization
A customized, highly imbalanced, six-class radiometric two-dimensional IRT dataset of SPV panels was prepared. The publicly available raw aerial IRT-grayscale image dataset of an SPV array lacks the necessary sorting and labeling of the defective SPV panels [54,64]. The raw SPV array dataset was visually assessed to identify the defect pattern of each SPV panel. Defective SPV panels were categorized and labeled according to IEC standards [13]. Sample defects are shown in Figure 1. The SPV panels were extracted and resized to dimensions of 60 × 100 pixels (radiometric data or thermal intensity points) from raw aerial grayscale IRT images of the SPV array, and each SPV panel was represented by floating-point temperature numerical values in degrees Celsius (temperature matrix) as shown in Figure 2. Then, the images were zero-padded to 66 × 106 pixels, as portrayed in Figure 3.
The dataset consisted of 2672 instances, with a significant class imbalance. The majority of instances (2647) belonged to the "Good" class (or normal/no anomaly), while the remaining 25 instances were equally distributed among five "Faulty" classes (or abnormal/anomaly data samples, with 5 instances of each minority class) as shown in Figure 4 with the pseudo-color visual depiction of the thermal values. The five faulty classes were faulty multistring, heated junction box, faulty substring, patchwork pattern, and hotspot effect. Offline data augmentation, using horizontal and vertical flipping, was used to enhance the visual diversity of the samples belonging to the minority classes. Finally, a min-max normalization was used to prepare the data for further processing stages by rescaling them into the [0, 1] range.
Raincloud plots were used to represent the distribution of temperature intensity data points across different classes as shown in Figures 5 and 6.
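The preparation steps described in this subsection (zero-padding from 60 × 100 to 66 × 106, flip-based augmentation, and min-max rescaling) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the padding is taken to be symmetric (3 pixels per side), the normalization is computed per image, and the temperature matrix is synthetic; none of these details is specified in the text, and the function names are hypothetical.

```python
import numpy as np


def pad_panel(temps, pad=3):
    """Zero-pad a temperature matrix on every side (60x100 -> 66x106 for pad=3)."""
    return np.pad(temps, pad_width=pad, mode="constant", constant_values=0.0)


def flip_augment(temps):
    """Return the original matrix plus its horizontal and vertical flips."""
    return [temps, np.fliplr(temps), np.flipud(temps)]


def min_max_normalize(x):
    """Rescale values into [0, 1] using the matrix's own min and max (assumption)."""
    lo, hi = float(x.min()), float(x.max())
    return (x - lo) / (hi - lo)


rng = np.random.default_rng(0)
panel = rng.uniform(25.0, 70.0, size=(60, 100))  # synthetic temperatures in deg C
padded = pad_panel(panel)
variants = flip_augment(padded)                   # applied to minority samples only
normalized = [min_max_normalize(v) for v in variants]
```

Note that when normalization follows zero-padding, the padded border fixes the minimum at 0, so the scaling is effectively a division by the maximum temperature; normalizing before padding would give a different result.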

Experimental Workflow
The experimental workflow employed in this study is shown in Figure 7. First, a drone-assisted IRT camera was deployed to remotely capture raw grayscale thermographic images of SPV arrays to monitor their operational health status. Then, the two-dimensional radiometric dataset was prepared as described in Section 2.2.
Subsequently, an ensemble of four CNN models was obtained via a 4-fold cross-validation, where each iteration was employed to train and validate one model of the ensemble. From the original dataset, 20% of the data (534 samples, 5 of which contained anomalies) were kept as a final hold-out test set. Hence, during each cross-validation iteration, 60% of the data (1604 samples, 15 of which contained anomalies) were used for training and 20% (534 samples, 5 of which contained anomalies) for validation. An ensemble was created by averaging the outputs of the four models and then tested on the hold-out set.
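The ensembling step described above, averaging the outputs of the four cross-validation models, can be illustrated with a short NumPy sketch. The probabilities below are invented toy values for three classes and two samples, and the function name `ensemble_predict` is hypothetical; the sketch assumes that each model outputs class probabilities (for example, from a softmax layer).

```python
import numpy as np


def ensemble_predict(prob_list):
    """Average class probabilities over models, then pick the most likely class."""
    stacked = np.stack(prob_list, axis=0)   # (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)       # (n_samples, n_classes)
    return mean_probs, mean_probs.argmax(axis=1)


# Toy outputs of four models for two samples and three classes.
m1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
m2 = np.array([[0.4, 0.5, 0.1], [0.1, 0.7, 0.2]])  # alone, picks class 1 for sample 0
m3 = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])
m4 = np.array([[0.5, 0.4, 0.1], [0.2, 0.6, 0.2]])

mean_probs, labels = ensemble_predict([m1, m2, m3, m4])
```

In this toy case a single model disagrees on the first sample, but the averaged probabilities still favor the class chosen by the other three, which is the smoothing effect that averaging is meant to provide.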
We also considered another experiment, in which we resampled the minority samples with the Synthetic Minority Over-sampling Technique (SMOTE) to obtain a balanced dataset. From the resampled dataset, 20% of the data (3182 samples) were kept as a final hold-out test set. Hence, during each cross-validation iteration, 60% of the data (9544 samples) were used for training and 20% (3181 samples) for validation. The classes had an even distribution in the resampled dataset.
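The core idea of SMOTE is to create a synthetic minority sample on the line segment between an existing minority sample and one of its nearest minority-class neighbors. The following NumPy sketch illustrates that interpolation only; it is not the implementation used in the study, the function name `smote_like` and the choice of `k=3` are assumptions, and the data are random stand-ins for flattened radiometric images.

```python
import numpy as np


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic rows by interpolating towards k-nearest neighbours."""
    rng = np.random.default_rng(seed)
    x = np.asarray(minority, dtype=float)
    # Pairwise Euclidean distances inside the minority class.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # indices of the k closest samples
    synthetic = np.empty((n_new, x.shape[1]))
    for row in range(n_new):
        i = rng.integers(len(x))                # random minority sample
        j = neighbours[i, rng.integers(k)]      # one of its k nearest neighbours
        lam = rng.random()                      # interpolation factor in [0, 1)
        synthetic[row] = x[i] + lam * (x[j] - x[i])
    return synthetic


rng = np.random.default_rng(1)
minority = rng.normal(size=(5, 8))   # 5 samples of one fault class, 8 toy features
synthetic = smote_like(minority, n_new=10)
```

With only a handful of samples per fault class, `k` must stay below the class size, and every synthetic point lies between two real ones, so the oversampled classes contain no patterns beyond those already present in the originals.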
Finally, XAI methods are applied with two principal aims: (i) to provide mathematical interpretability of the learned feature structures among the different classes; and (ii) to highlight the most relevant regions considered for the classification via the adoption of perceptive explanation mappings.

Overall, our approach aims not only to realize an end-to-end detection and diagnostic pipeline for SPV panel defects but also to provide a rationale for the decisions made by the system, with the aid of the employed XAI techniques.

Convolutional Neural Network Model
The architecture of the proposed CNN model, specifically designed for the multiclass classification (diagnosis) of the radiometric two-dimensional IRT dataset of SPV panels, is portrayed in Figure 8. The number of trainable parameters is reported in Table 2. The visual depiction of the row-wise pseudo-color feature map of the heated junction box of each CNN layer (Conv2d, Batch Normalization, and MaxPooling2d) is shown in Figure 9.
The network was trained for a maximum of 1000 epochs, with early stopping at a patience of 200 epochs, and with the following hyperparameters: a leaky ReLU activation function with an alpha of 0.6, a dropout rate of 0.25, a batch size of 64, the Lion optimizer with a learning rate of 0.0001, and categorical cross-entropy as the loss function. Class weights were employed to mitigate class imbalance for the experiment without SMOTE.
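The text does not state how the class weights were computed. A common choice, shown here purely as an assumption, is the "balanced" heuristic, where each class receives the weight n_samples / (n_classes × class_count). The sketch below applies it to a training fold of 1604 samples with 15 anomalies; the split of three samples per faulty class is inferred from the reported counts and is not stated in the source.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


fault_classes = [
    "faulty multistring",
    "heated junction box",
    "faulty substring",
    "patchwork pattern",
    "hotspot effect",
]
# Assumed composition of one training fold: 1589 good panels, 3 per fault class.
train_labels = ["good"] * 1589 + [c for c in fault_classes for _ in range(3)]
weights = balanced_class_weights(train_labels)
```

Under these assumptions the good class receives a weight of roughly 0.17 and each fault class roughly 89, so a misclassified faulty panel contributes several hundred times more to the loss than a misclassified good one.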

Explainable Artificial Intelligence Methods
The pursuit of interpretability and explainability, via the adoption of XAI techniques [65], particularly in DL models, is crucial for understanding how these models arrive at their predictions [66][67][68][69][70][71]. Two broad categories of methods are commonly employed to achieve explainability and interpretability [66,67]: perceptive explainability and mathematical interpretability. These two families of XAI methods complement each other in providing a comprehensive understanding of a DL model. The perceptive methods offer visual cues that are more intuitive for human interpretation, while the mathematical methods attempt to reveal the logic and mathematical structures underlying the model's representation of the original data points.

Perceptive Explainability
The objective of the XAI methods belonging to the realm of perceptive explainability is to provide a straightforward visual representation of the top contributing features that influence the final predictions. The approach is to visualize and highlight regions or features in the input data that have the most significant impact on the model's decision. The methods are used to study feature-level classification behavior, helping to understand the importance of specific regions or features in the input data for classification. The perceptive explainability methods applied in this study are activation maximization [72,73], SmoothGrad [74,75], and Grad-CAM (Gradient-weighted Class Activation Mapping) [69].
The activation maximization [72,73] method, introduced by Erhan et al. in 2009, treats the maximization of the activation of a given neuron as an optimization problem. Considering an input data point $x$ (a radiometric image in our context), a neural network with parameters $\theta$ (referring to both weights and biases), and $h_{ij}(\theta, x)$ the activation of a given neuron $i$ from a given layer $j$ in the network, activation maximization can be defined as the following optimization problem:

$$x^{*} = \arg\max_{x} \, h_{ij}(\theta, x)$$

When the specific indexes $i, j$ are dropped for generality, the activation can be denoted as $h(\cdot)$ instead of $h_{ij}(\cdot)$. A simple and straightforward way to find $x^{*}$ is to perform gradient ascent using an update rule such as [72,73]:

$$x \leftarrow x + \epsilon_{1} \frac{\partial h(\theta, x)}{\partial x}$$

where $\epsilon_{1}$ is the step size. It is important to define the initialization data point $x_{0}$, which can be either random noise or a real data point. In this work, we used radiometric images as initialization to ease the gradient ascent optimization process, avoiding convergence to local optima that do not offer particularly meaningful representations. Simonyan et al. [75] exploited the concept of activation maximization for CNNs and introduced the image-specific class saliency visualization. The basic idea is that in a linear model, higher values of the weights are associated with more important parameters. In a highly nonlinear model, such as a deep CNN, the first-order Taylor expansion can be used in place of the original model. While such techniques allowed them to portray saliency maps on the original images for a given CNN, the maps tended to be fairly noisy. Hence, Smilkov et al. [74] improved over the saliency method by averaging the saliency maps of several copies of the image perturbed with noise. The resulting saliency maps looked much smoother, and hence they called their method SmoothGrad.
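The gradient ascent procedure behind activation maximization, in which the input is repeatedly moved a step of size ϵ1 along the gradient of the activation, can be illustrated on a toy problem. In the sketch below, the "activation" is a simple concave function with a known maximizer and an analytic gradient, standing in for a real neuron whose gradient would be obtained by backpropagation; all names and values are hypothetical.

```python
import numpy as np


def activation(x, w):
    """Toy stand-in for h(theta, x): concave in x, maximized at x = w."""
    return float(w @ x - 0.5 * (x @ x))


def activation_grad(x, w):
    """Analytic gradient of the toy activation with respect to the input x."""
    return w - x


def activation_maximization(x0, w, step=0.1, n_steps=200):
    """Gradient ascent: x <- x + step * dh/dx, starting from x0."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * activation_grad(x, w)
    return x


w = np.array([1.0, -2.0, 0.5])     # plays the role of the fixed parameters theta
x0 = np.array([0.2, 0.1, -0.3])    # initialization (a real sample in the study)
x_star = activation_maximization(x0, w)
```

For a deep network the objective is not concave, so the result depends on the starting point, which is why the choice of initialization discussed above matters.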
Grad-CAM [69] exploits the gradient information flowing through a specific convolutional layer (in the original work, the authors focused on the last one, but it is not a mandatory choice) of the CNN to retrieve importance scores for each neuron involved in a particular classification. Hence, Grad-CAM allows one to generate a visual representation of the class activation of a model for a given input image and class, i.e., it creates easy-to-understand visualizations showing how the network decides on a specific class by highlighting important features in the form of a heatmap.
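Given the feature maps of the chosen convolutional layer and the gradients of the class score with respect to them, the Grad-CAM heatmap is a ReLU-rectified, gradient-weighted sum of the feature maps. The NumPy sketch below shows only this final combination step on hand-made arrays; in practice both inputs would come from a forward and backward pass through the trained CNN, and the rescaling to [0, 1] is a common display convention rather than part of the method.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Combine (K, H, W) feature maps and gradients into an (H, W) heatmap."""
    alphas = gradients.mean(axis=(1, 2))               # one importance weight per channel
    cam = np.tensordot(alphas, feature_maps, axes=1)   # weighted sum over channels
    cam = np.maximum(cam, 0.0)                         # ReLU keeps positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                          # rescale to [0, 1] for display
    return cam


# Two toy channels: one with positive and one with negative influence on the class.
feature_maps = np.array([[[1.0, 0.0], [0.0, 0.0]],
                         [[0.0, 0.0], [0.0, 1.0]]])
gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])
heatmap = grad_cam(feature_maps, gradients)
```

The location activated by the negatively weighted channel is suppressed by the ReLU, so only the region that supports the class remains highlighted.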
In our study, we exploited the activation maximization, the SmoothGrad, and the Grad-CAM methods to realize perceptive explanation mappings highlighting the salient regions to detect and diagnose the defective SPV panels. The conv2d_2 layer (Table 2) was used to extract all the perceptive explainability plots. We used 25 smooth samples for computing SmoothGrad.
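SmoothGrad with 25 samples amounts to averaging the gradient of the class score over 25 noisy copies of the input. The following is a minimal sketch assuming a user-supplied gradient function; here a toy quadratic score with an analytic gradient replaces the CNN, and the noise level is an arbitrary assumption (the text does not report it).

```python
import numpy as np


def smoothgrad(grad_fn, x, n_samples=25, noise_std=0.1, seed=0):
    """Average grad_fn over n_samples Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        total += grad_fn(noisy)
    return total / n_samples


w = np.array([[1.0, 2.0], [0.5, -1.0]])


def toy_grad(z):
    """Gradient of the toy score sum(w * z**2) with respect to z."""
    return 2.0 * w * z


x = np.array([[0.3, 0.7], [0.9, 0.1]])   # toy 2x2 "image"
saliency = smoothgrad(toy_grad, x)
```

For a score that is linear in the input, the gradient does not depend on the noise and the average equals the plain gradient; the smoothing only has an effect where the gradient fluctuates sharply around the input, as it does in deep CNNs.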

Mathematical Interpretability
The methods belonging to the domain of mathematical interpretability aim to provide insights into the internal workings of the models and reveal the features used in making final predictions. In this case, the approach exploits mathematical structures to reveal the underlying mechanisms of AI models and understand how the information is processed. These methods are used to study the clustering capabilities of the networks, offering insights into how the model groups or categorizes different types of data. Hence, they can be used to assess if the learned features are meaningful to the considered task. The mathematical interpretability methods applied in this research were t-distributed Stochastic Neighbor Embedding (t-SNE) [76] and Uniform Manifold Approximation and Projection (UMAP) [77].
The t-SNE [76] methodology is a variant of SNE that enables the visualization of high-dimensional data by mapping each original data point into a low-dimensional space of two or three dimensions. With respect to the traditional SNE, it offers two main advantages: (i) a cost function that is easier to optimize, together with the adoption of Student's t distribution for determining similarity between pairs of data points in the low-dimensional space; (ii) the incorporation of a heavy-tailed distribution to address the "crowding problem" in the low-dimensional space.
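The role of the Student's t distribution can be made concrete: in the low-dimensional space, t-SNE defines the similarity of points i and j as q_ij proportional to (1 + ||y_i − y_j||²)⁻¹, normalized over all pairs. The NumPy sketch below computes these similarities for a toy embedding; it shows only this one ingredient of the algorithm, not the full optimization, and the function name is hypothetical.

```python
import numpy as np


def student_t_similarities(y):
    """Low-dimensional t-SNE affinities q_ij for an (n, d) embedding y."""
    sq_dist = np.sum((y[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq_dist)   # Student's t kernel, one degree of freedom
    np.fill_diagonal(kernel, 0.0)    # self-similarities are excluded
    return kernel / kernel.sum()     # normalize over all pairs


y = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])   # toy 2-D embedding
q = student_t_similarities(y)
```

Because the kernel decays polynomially rather than exponentially, moderately distant points keep a non-negligible similarity, which lets dissimilar samples be placed far apart without being penalized as heavily as under a Gaussian.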
The UMAP [77] methodology is a nonlinear approach for reducing dimensionality that operates under the following three key assumptions: (i) data are uniformly distributed across an existing manifold; (ii) the topological structure of the manifold is maintained; (iii) the manifold is locally connected. UMAP consists of two principal stages: learning the structure of the manifold in a high-dimensional space and determining the corresponding representation in a low-dimensional space.
In our study, t-SNE and UMAP were used to unveil the mathematical structure of the features, displaying clusters among the healthy panels and the different defective ones. Particularly, t-SNE was exploited to perform dataset feature clustering starting from radiometric input test data, whereas UMAP was applied for achieving dataset feature clustering from output feature data elaborated by the dense_1 layer (Table 2) of the CNN model.
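The UMAP projections reported in the Results use the Hellinger metric. As a reminder of what this metric measures, the sketch below implements the textbook Hellinger distance between two discrete probability vectors. It assumes non-negative inputs that sum to one; whether the UMAP implementation used in the study normalizes the dense_1 features in the same way is not stated in the text.

```python
import numpy as np


def hellinger(p, q):
    """Textbook Hellinger distance between two discrete probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


same = hellinger([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
disjoint = hellinger([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
close = hellinger([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
```

The distance is bounded between 0 (identical vectors) and 1 (no overlapping support), which makes it a natural choice for comparing probability-like feature vectors.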

Performance Evaluation Metrics
Confusion matrices were used for quantitative evaluation of the results of the proposed workflow since they offer a clear representation of the model's performance among individual classes, so that errors in specific areas can be easily noted.
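The reason for inspecting confusion matrices rather than accuracy alone can be shown with a worked example based on the hold-out set size reported in this paper (534 samples, 5 of which contain anomalies). The sketch below evaluates a hypothetical trivial classifier that labels every panel as good; the assumption of one anomaly per fault class in the hold-out set is ours, and the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def per_class_recall(matrix):
    """Recall of each class: correct predictions divided by true instances."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Class 0 is "good"; classes 1-5 are the five fault types (one sample each, assumed).
y_true = [0] * 529 + [1, 2, 3, 4, 5]
y_pred = [0] * 534                      # trivial classifier: everything is "good"
cm = confusion_matrix(y_true, y_pred, n_classes=6)
acc = accuracy(cm)
recalls = per_class_recall(cm)
```

This trivial classifier reaches an accuracy of 529/534, about 99.06%, while its recall on every fault class is zero, which is exactly the failure mode that per-class inspection of the confusion matrix exposes.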
Learning curves were exploited to illustrate the model's training progress over epochs for the training and validation datasets. Examining these curves helps to track how learning performance changes over time, to identify issues such as overfitting or underfitting, and to guide decisions on training duration so that a well-fit model is obtained.

Experimental Resources
The experiments were performed on a computer server with 50 CPUs of Intel® (Santa Clara, California, United States of America) Xeon® Gold 6130 Processors with 22M Cache, 2.10 GHz. The server was equipped with 100 GB of memory (RAM). Further, the paid version resources of Google Colab (available online at https://colab.research.google.com/, last accessed 24 January 2024) were also utilized.

Results
This section presents the experimental findings of this research, encompassing the classification accuracy of the proposed model, the insights coming from the application of XAI techniques, and the trends observed during the training and validation of the considered model. To enable the comparison of the different perceptive explainability techniques among the iterations of the cross-validation, the results concerning activation maximization, SmoothGrad, and Grad-CAM are portrayed for each cross-validation model.
The results of applying activation maximization to the conv2d_2 layer, taking the original images of our dataset as the starting point, are portrayed in Figure 10. It is possible to observe that the maximum saliency region is highlighted in different portions of the image among the various cross-validation models. This is not particularly problematic since CNNs are approximately invariant to translation.
The outcomes of applying SmoothGrad to the original radiometric images of our dataset, focusing on the conv2d_2 layer, are illustrated in Figure 11. Several similarities can be discerned among the models trained during the cross-validation process. For instance, all the models correctly identified high-saliency regions near the areas with the highest values in the junction box and hotspot cases. The results of applying Grad-CAM to the original radiometric images, extracted from the conv2d_2 layer, are shown in Figure 12. For all the models, strongly activated regions of the class activation maps appear near the hotspot points. The pattern of such saliency regions is linked to the final classification performed by the network, which aids in understanding how the deep model makes its decisions.
The embedding of the input features of the good and faulty classes, obtained with the t-SNE technique, is depicted in Figure 13, while the CNN feature predictions are portrayed in Figure 14 using the UMAP technique (Hellinger metric). In the original feature space, there is no well-defined clustering between the good samples and the anomalous ones. After training, on the other hand, distinct clusters appear for the different anomaly types and the good samples.
The confusion matrices of the stratified 4-fold cross-validation are reported in Figure 15. The models achieved an accuracy of 99.81 ± 0.15% (mean ± std). However, the current dataset contains few anomaly samples, so the accuracy is skewed towards high values because our network recognized the good samples almost perfectly. The ensemble of the four cross-validation models achieved 100% accuracy, as shown in Figure 16. The learning curves are portrayed in Figure 17. The training process was inherently noisy owing to the severely imbalanced dataset, with skews appearing throughout the curves for the stratified 4-fold cross-validation models.
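To make the remark on skewed accuracy concrete, the following minimal sketch contrasts overall accuracy with per-class recall on a synthetic imbalanced label set, and shows one possible way of combining fold models. The ensembling rule (soft voting, i.e., averaging class probabilities) is an assumption, since the combination rule is not specified here; all arrays are illustrative placeholders and not data from this study.

```python
import numpy as np

def soft_vote(prob_stack):
    """Average class probabilities over models (assumed ensembling rule).

    prob_stack: array of shape (n_models, n_samples, n_classes).
    Returns the predicted class index for each sample.
    """
    return np.mean(prob_stack, axis=0).argmax(axis=1)

def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return float(np.mean(y_true == y_pred))

def per_class_recall(y_true, y_pred, n_classes):
    """Recall of each class; NaN when a class has no samples."""
    recalls = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            recalls[c] = np.mean(y_pred[mask] == c)
    return recalls

# Synthetic labels: 990 "good" samples (class 0) and 10 anomalies (class 1).
y_true = np.array([0] * 990 + [1] * 10)
# A trivial classifier that always predicts "good".
y_naive = np.zeros_like(y_true)

acc = accuracy(y_true, y_naive)
rec = per_class_recall(y_true, y_naive, n_classes=2)
print(f"accuracy = {acc:.2f}, recall per class = {rec}")

# Three hypothetical models voting on two samples with two classes.
prob_stack = np.array([
    [[0.6, 0.4], [0.3, 0.7]],
    [[0.4, 0.6], [0.2, 0.8]],
    [[0.7, 0.3], [0.6, 0.4]],
])
ensemble_pred = soft_vote(prob_stack)
print(ensemble_pred)
```

In this toy case the always-good classifier is correct on 99% of the samples while detecting no anomaly at all, which is why the confusion matrices are more informative than the accuracy alone on such data.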

Outcome of Resampling SPV Radiometric Dataset
The challenge of training DL models on an imbalanced dataset is the risk that the model does not properly account for the minority classes, which, in PdM scenarios, are those concerning faults or anomalies. In the first experiment, we successfully dealt with the imbalance by adopting the categorical cross-entropy with class weights.
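As an illustration of class weighting, the sketch below computes inverse-frequency weights with the common "balanced" heuristic, w_c = N / (K · n_c), where N is the number of samples, K the number of classes, and n_c the number of samples in class c. Whether this exact formula was used in the first experiment is not stated, so the heuristic, the function name, and the label counts are assumptions for illustration only.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: N / (K * n_c)} so that rare classes receive larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

# Hypothetical label vector: 0 = good, 1 and 2 = two anomaly classes.
labels = [0] * 90 + [1] * 6 + [2] * 4
weights = balanced_class_weights(labels)
for cls, w in sorted(weights.items()):
    print(cls, round(w, 3))
```

In Keras, a dictionary of this form can be passed to `Model.fit` through its `class_weight` argument, which scales each sample's loss contribution by the weight of its class.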
Here, the SMOTE algorithm was applied to oversample and balance the anomaly classes. It is evident from Figures 18 and 19 that the confusion matrix of each model obtained from the stratified 4-fold cross-validation displays an excellent capacity to diagnose the faulty samples. Indeed, the models from the individual iterations of cross-validation achieved an accuracy of 99.98 ± 0.03% (mean ± std), whereas the ensembled model achieved 100% accuracy on the hold-out test data. Model 2 in Figure 18, however, misclassified two samples of the majority (good) class on the test dataset. It is worth noting that the models obtained from the cross-validation in Figure 15 (class-weighted) and Figure 18 (SMOTE) are different owing to the adoption of oversampling in the second experiment. The learning curves for the models trained on the SMOTE-augmented dataset are portrayed in Figure 20, where it can be observed that model 2, despite the two misclassified samples, exhibits stable learning curves.
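The core idea of SMOTE can be sketched as follows: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified re-implementation for illustration, not the implementation used in the experiment (which is not specified here); the function name, the parameters, and the toy temperature values are assumptions.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    X_min: array (n_samples, n_features) holding one minority class,
           e.g., flattened radiometric panel images.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                      # starting sample per synthetic point
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour of each
    gap = rng.random((n_new, 1))                               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class with two features (hypothetical temperatures in °C).
X_minority = np.array([[30.0, 31.0], [32.0, 35.0], [31.0, 33.0], [34.0, 36.0]])
X_synth = smote_oversample(X_minority, n_new=6, k=2)
print(X_synth.shape)
```

Because every synthetic point is a convex combination of two real minority samples, it stays within the range spanned by that class. In a cross-validation setting, synthetic samples are usually generated from the training folds only, so that validation and test data contain real samples exclusively.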

Discussion
Renewables, and SPV arrays in particular, account for a steadily growing share of today's clean and sustainable energy production. Thermal anomalies in SPV arrays can reflect various deterioration processes due to thermo-mechanical stresses, thermo-chemical reactions, aging and seasonal issues, and environmental and biological effects, resulting in the loss of desired energy production. Hence, remote aerial diagnostic monitoring is an integral part of the PdM process of large-scale SPV electric power plants. The correct and timely detection and localized diagnosis of defects from radiometric IRT imaging plays a pivotal role in the long-term techno-economic impact of such plants.
Our model is simple and adaptable, with low computational effort required for training and inference. Furthermore, it displays convincing performance on the considered dataset. Indeed, an accuracy of 99.81 ± 0.15% was reached during the stratified 4-fold cross-validation procedure for the class-weight experiment, an accuracy of 99.98 ± 0.03% was attained during the stratified 4-fold cross-validation procedure for the SMOTE experiment, and an accuracy of 100% was achieved by the ensembled model on the hold-out test set for both experiments.
Finally, the application of XAI techniques provided a rationale for the working of the CNN model. The perceptive explainability methodologies showed that the model concentrated on the relevant regions of the input images, whereas the mathematical interpretability techniques showed that the learned features were a meaningful representation of the considered defective classes, allowing us to cluster the radiometric images in a low-dimensional feature space.
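For readers unfamiliar with how a class activation map highlights the relevant regions, the following minimal sketch reproduces the standard Grad-CAM weighting from precomputed arrays: the gradients are averaged per channel, the feature maps are combined with those weights, and only positive evidence is kept. The arrays are synthetic stand-ins for the output of a convolutional layer such as conv2d_2 and its gradients; they are assumptions for illustration and do not come from the trained models.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one image.

    activations: (H, W, C) feature maps of the chosen convolutional layer.
    gradients:   (H, W, C) gradient of the target class score w.r.t. those maps.
    """
    weights = gradients.mean(axis=(0, 1))                      # one importance weight per channel
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0.0)                                 # ReLU keeps positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam                     # scale to [0, 1] for display

# Synthetic feature maps: channel 0 fires at one location (a "hotspot"),
# channel 1 fires elsewhere.
acts = np.zeros((4, 4, 2))
acts[1, 2, 0] = 5.0
acts[3, 0, 1] = 4.0
# Synthetic gradients: the class score rises with channel 0 and falls with channel 1.
grads = np.zeros((4, 4, 2))
grads[..., 0] = 1.0
grads[..., 1] = -1.0

heatmap = grad_cam(acts, grads)
print(np.unravel_index(heatmap.argmax(), heatmap.shape))
```

The location supported by the positively weighted channel dominates the map, whereas the location driven by the negatively weighted channel is suppressed by the ReLU.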

Limitations
SPV energy-producing assets are expected to operate under healthy conditions and to achieve their targeted output continuously, which results in few faulty instances in real thermal datasets and in their two-dimensional radiometric representations. Owing to the scarcity of available datasets with labeled faulty instances, training DL models for the correct multiclass classification (or diagnosis) of defects is a challenging task, and this posed a limitation in conducting this research. Hence, we focused on the objective of developing a customized lightweight CNN model for a single-channel, highly imbalanced radiometric dataset. Further studies could be conducted to investigate the generalizability of such methods on larger and more diverse datasets.

Conclusions
The results obtained in this research encourage the adoption of deep learning techniques for PdM tasks. The broader implications on safety, energy production, and financial viability further highlight the significance of accurate radiometric fault diagnosis in the field of SPV energy systems. In the future, the collection of large, high-quality, and diverse datasets will enhance the generalizability and robustness of DL models for predictive maintenance.

Figure 1. Raw aerial IRT-grayscale images of the SPV array having a visually defective SPV panel labeled as (a) hotspot effect, (b) patchwork pattern, (c) faulty substring.

Figure 2. Extracted solar PV panel labeled as heated junction box: original and magnified region of 2D radiometric data, i.e., floating-point numerical temperature values [°C] (pseudo-color applied for visual depiction).

Figure 3. Original solar PV panel image of 100 × 60 pixels (left) and zero-padded 106 × 66 pixels (right) having maximum temperature [°C] with pseudo-color visual depiction (red corresponds to higher temperature values, whereas blue to lower temperature values).

Figure 4. Original dataset of 6-class solar PV panels with pseudo-color visual depiction (max temperature [°C]). Color bar represents temperature values corresponding to colors (higher temperature values in red shades, lower temperature values in blue shades). Data augmentation applied for visual diversity (multistring, substring, and hotspot).

Figure 7. Experimental workflow. First, the data are acquired with a drone-assisted IRT of SPV arrays. Then, an ensemble of CNN models is realized via a stratified 4-fold cross-validation. Finally, quantitative results are determined, and explainability techniques are used to unveil the mechanisms underlying the diagnostic process. Heatmaps have a jet color map, with red representing higher temperature (for radiometric images) or activation values (for CNN explanations), and blue depicting lower temperature or activation values.

Figure 8. The architecture of the proposed CNN model for the 6-class classification.

Figure 9. Visual depiction of the row-wise pseudo-color (red corresponds to higher activation values, whereas blue to lower ones) feature maps of the heated junction box for each CNN layered block (Conv2d, Batch Normalization, and MaxPooling2d).

Figure 10. Explainable deep learning: activation maximization from the last convolutional layer (conv2d_2) for the 6-class classification. Results for each iteration of the stratified 4-fold cross-validation are presented. Pseudo-color (jet color map; red corresponds to higher activation values, whereas blue to lower ones) is used for visual depiction.

Figure 11. Explainable deep learning: SmoothGrad from the last convolutional layer (conv2d_2) for the 6-class classification. Results for each iteration of the stratified 4-fold cross-validation are presented. Pseudo-color (jet color map; red corresponds to higher activation values, whereas blue to lower ones) is used for visual depiction.

Figure 12. Explainable deep learning: Grad-CAM from the last convolutional layer (conv2d_2) for the 6-class classification. Results for each iteration of the stratified 4-fold cross-validation are presented. Pseudo-color (jet color map; red corresponds to higher activation values, whereas blue to lower ones) is used for visual depiction.

Figure 13. Embedding with t-SNE (cluster map) of complete input radiometric dataset images. Each data point is depicted as a star-shaped point. Note that some points overlap.

Figure 14. Embedding with UMAP (feature clustering) of the ensembled test dataset predictions (Hellinger metric). Each data point is depicted as a star-shaped point. Note that some points overlap.

Figure 17. Categorical loss and accuracy of the training and validation of class-weighted test dataset for a 6-class output, for each iteration of the stratified 4-fold cross-validation.

Figure 20. Categorical loss and accuracy of the training and validation of the SMOTE test dataset for a 6-class output, for each iteration of the stratified 4-fold cross-validation.

Table 2. View of layers and trainable parameters of the proposed CNN "Sequential" model.
