Automatic Fault Detection in Photovoltaic Farms: solAIr, a Deep Learning-Based System for Thermal Images

Abstract: Renewable energy sources represent the main alternative to limit fossil fuel usage and pollution. For this reason, photovoltaic (PV) power plants represent one of the main systems adopted to produce clean energy. Monitoring the state of health of such systems is fundamental. However, current inspection techniques are time-demanding, cause stops to the energy generation, and often require laboratory instrumentation, thus being not cost-effective for frequent inspections. Moreover, PV plants are often located in inaccessible places, making any intervention dangerous. In this paper, we propose solAIr, an artificial intelligence system based on deep learning for the detection of anomalous cells in photovoltaic images obtained from unmanned aerial vehicles equipped with a thermal infrared sensor. The proposed anomaly cells detection system is based on the mask region-based convolutional neural network (Mask R-CNN) architecture, adopted because it simultaneously performs object detection and instance segmentation, which makes it suitable for the automated inspection task. The proposed system is trained and evaluated on the photovoltaic thermal images dataset, a publicly available dataset collected for this work. Furthermore, the performance of three state-of-the-art deep neural networks (DNNs), including UNet, FPNet and LinkNet, is compared and evaluated. Results show the effectiveness and suitability of the proposed approach in terms of the intersection over union (IoU) and the Dice coefficient.


Introduction
With the growing demand for a low-consumption economy and thanks to technological advances, photovoltaic (PV) energy generation has become paramount in the production of renewable energy. Renewable energy sources represent the main alternative to limit fossil fuel usage and pollution; for this reason, PV power plants are one of the main systems adopted to produce clean energy, and huge investments have been allocated by European countries to stimulate the use of so-called clean energy. In this context, monitoring the state of health of a system is crucial: detecting the degradation of solar panels is the only way to ensure good performance over time. Besides avoiding a waste of energy, the reason for maintaining a correct functional status of a plant is also economic: the long-term degradation of performance and overall reliability of PV plants can drastically reduce the expected revenues [1,2].
PV plants are increasingly extensive, composed of thousands of modules potentially affected by the following fault types: optical degradation or faults, electrical mismatches, and non-classified faults [3]. In recent decades, several detection methods have been developed, spanning electrical diagnostics, statistical inference from monitored control units, shading detection and so on. Commercial monitoring approaches ensure power loss detection in a portion of the PV field, while the accurate localization of faulty modules requires disassembling the strings, visual inspection and/or electrical characterization. The long-term performance and the overall reliability of PV modules strictly depend on faults arising during operation or occurring during transportation and installation [4,5].
An accurate and prompt detection of defects in PV modules is essential to guarantee an adequate lifetime and efficient power generation of the modules, and therefore the reliable functioning of PV plants [6]. Operation and maintenance (O&M) actions are performed to detect faults. However, O&M techniques are time-demanding, cause stops to the energy generation, and often require laboratory instrumentation, thus being not cost-effective for frequent inspections [7]. Moreover, it should be noted that PV plants are often located in inaccessible places, making any intervention dangerous.
In this regard, a strong contribution was given by the recent diffusion of unmanned aerial vehicles (UAVs) equipped with a thermal infrared sensor, which made this technique widely accessible and a de facto standard for the diagnosis of PV fields [8]. The inspection of a PV system using a thermal imaging camera makes it possible to identify any malfunction of the modules, as zones with different colors represent different operating temperatures. Infrared thermography (IRT) is very important for the analysis of PV plants, since it allows the acquisition of the operating temperature of each module, an important parameter for performance evaluations. However, even with powerful equipment, accelerating the detection of these anomalies is still challenging; in fact, fault detection is currently very time-consuming and error-prone, since it is generally performed through visual interpretation by an operator. Moreover, the current practice adopted by the majority of PV plant owners is to perform inspections sporadically, with random criteria and without controlling the overall health of the installation. These represent the main motivations behind the proposed approach [9].
Given the above reasons, in this work solAIr, a fast and accurate anomaly cells detection system, is developed, leveraging recent advances in deep learning. When dealing with the analysis of large image collections, deep learning-based approaches have been demonstrated to be more effective than the widely used machine learning approaches (e.g., support vector machines, k-nearest neighbors, decision trees, random forests and more) [10]. The use of deep neural networks (DNNs) allows a complete understanding of the image, guaranteeing greater accuracy and efficiency and discovering multiple levels of data representation. DNNs can extract the characteristics of images and automatically classify them from a large amount of image data [11,12]. Hence, the proposed anomaly cells detection system is based on the mask region-based CNN (Mask R-CNN) architecture [13]. This work extends a previous one proposed for the classification of anomalous PV images [14]. In the previous work, a classification task was addressed: for each image, the system determined whether at least one anomaly was present. In this work, instead, the detection task is addressed: for each image, the system returns the exact location of the anomalies it contains. The Mask R-CNN approach solves three tasks at the same time: localization, segmentation and classification of the objects in an image, generating a bounding box, a segmentation mask and a class. Most importantly, Mask R-CNN solves the segmentation task at the instance level, i.e., it generates a result for each object found. SolAIr was trained and evaluated on the photovoltaic thermal images dataset, a public dataset collected for this work. The dataset is an extension of the one published in the previous work [14]: initially, the dataset included only images of a portion of the Tombourke plant; now, it has been expanded with images of the entire plant.
In addition, for each image, a mask containing the segmentation of the faulty cells has been added. The thermal dataset is available (http://vrai.dii.univpm.it/content/photovoltaic-thermal-images-dataset) after compiling a request form in which applicants specify their research purposes. Furthermore, the performance of three state-of-the-art DNNs, including UNet [15], FPNet [16] and LinkNet [17], is compared and evaluated in this paper.
The main contributions of this paper can be summarized as follows: (i) a system based on deep learning for the anomaly detection and localization of damaged cells in PV thermal images, (ii) a newly annotated dataset that is publicly available for further experiments by the research community and (iii) a comparison of different deep learning methods that can serve as a benchmark for future experiments in the field.
The paper is organized as follows: Section 2 presents an overview of the related works for PV image processing; Section 3 introduces our approach, which consists of a UAV-based inspection system (Section 3.1), gives details on the photovoltaic thermal images dataset (Section 3.2) and introduces a DNN-based solution for anomaly cells detection in PV thermal images (Section 3.3). Section 4 presents the results, and Section 5 discusses the conclusions and future works.

Related Works
The latest technological improvements of digital cameras, in combination with affordable costs, have made PV inspection based on optical methods more and more popular. Specifically, electroluminescence (EL) and IRT imaging represent reliable methods for the qualitative characterization of PV modules. In recent years, several companies have developed systems based on EL techniques. Mondragon Assembly developed an EL inspection system equipped with three high-definition cameras, enabling easy identification of different defects, such as micro cracks, dark areas, finger problems and short-circuits (https://www.mondragon-assembly.com/solar-automationsolutions/solar-manufacturing-equipment/pv-module-testing/el-inspection/). MBJ implemented high-resolution, fully automated electroluminescence test systems for integration into production lines of PV panels, cells, modules or strings. Their system uses deep learning methods to ensure reliable automatic error detection (https://www.mbj-solutions.com/en/products/el-inspection-systems). In addition, AEPVI (Aerial PV Inspection) performs PV power plant inspections by using aerial EL testing systems. The evaluation of the images is automated and uses machine learning techniques to categorize module faults (http://www.aepvi.com/). Quantified Energy Labs performs quantitative electroluminescence analysis (QELA) to enable the use of EL for outdoor applications. On top of the QELA algorithms, they develop machine learning and artificial intelligence models to detect and analyze every module in PV plants and identify potential defects that might reduce the performance of the asset (https://qe-labs.com/). However, as stated in [3], EL-based methods present limitations with respect to IRT imaging which, by contrast, appears more suitable to provide quantitative information.
IRT imaging can provide information about the thermal signature and the exact physical location of an occurring fault, indicating the defective cell, group of cells or module (qualitative diagnosis). In turn, such a thermal signature can be used for quantitative diagnosis, by identifying the electrical power output losses of the impacted module in the form of dissipated heat. Besides this, thermal images can be obtained faster, with cost-effective tools and without interrupting energy generation [3].
For all these reasons, a fault detection method based solely on thermal data is presented in this paper. However, for the sake of completeness, this section provides the reader with the latest achievements in this field.
Rogotis et al. [18] proposed an early defect diagnosis method for PV modules exploiting spatio-temporal information derived from thermal images. The approach uses a global thermal image threshold determined by combining two thresholding techniques. Their approach is efficient and robust to noise and reflections due to the sun or clouds, but it is not able to detect junction boxes when another area of the panel is super-heated.
In [19], the authors propose the use of standard thermal imaging and the Canny edge detection operator to detect the PV module failures that cause the hot spot problem. Several field IRT measurements were used for the inspection of defective PV modules. Overall, the approach was efficient in detecting hot-spot formations in the defective cells of each analyzed module. However, this method suffers from an undesirable sensitivity to meaningless background objects. Kim et al. [20] also adopted the Canny edge operator and image segmentation techniques to process IRT images acquired with a UAV platform. Their approach compares the intensity characteristics of the individual polygons in the panel area.
Efficient and improved edge detection techniques are presented in [21], which reports significant advancements in the automated localization of defects.
In [5], the output IRT images derived from an aerial inspection were processed by the method of aero-triangulation, which uses photogrammetry and the global positioning system. Even though all occurring failures are correctly detected, this data treatment method is highly time- and resource-consuming. To solve this problem, some optimizations are currently being investigated.
An innovative thermal sensor that experimentally localizes heat sources and estimates the peak temperature using machine learning algorithms (ThermoNet) was introduced in [22]. The combination of the thermal sensor, called ThermoMesh, and ThermoNet allows high-speed, high-resolution detection of heat sources through conductive heat transfer.
In [23], the authors evaluated and implemented an automated detection method to inspect a PV plant using a UAV equipped with IRT, whereas in [24] the effectiveness of PV plant fault detection based on temperature profiles was studied. The latter also used a UAV equipped with an infrared camera to inspect the quality of photovoltaic systems in real operating conditions; the temperature distribution of the PV modules makes it possible to detect the defective ones. A useful approach to identify the presence of hot spots in real time was presented in [23], but this approach was effective only for the identification of the aforementioned type of defect, and not for other forms of failure.
Algorithms based on artificial neural networks (ANNs) have been proposed to detect anomalies in PV modules. Some recent studies have demonstrated that the use of deep learning can improve defect detection performance in aerial images of PV modules, thanks to its self-learning ability, fault tolerance and adaptability [1]. The work of [25] detects three types of PV faults (disconnected substrings, hot spots and disconnected strings) in infrared images acquired by a thermographic camera mounted on a UAV. The images are processed with digital image-processing methods and then used as samples for training a CNN. They demonstrated that the algorithm was able to detect faults that were not detectable with the image-processing techniques alone. Telemetry and IRT images were used to detect hot spots in the work of [26]. Their approach is based on a region-based recurrent convolutional neural network that, once trained, is used as a hot spot detector. The work of [27] compared the performance of hotspot detection in IRT images of PV modules using two approaches: the first is based on classical technology that uses the Hough line transformation and the Canny operator to detect hotspots; the second uses a deep learning model based on Faster R-CNN and transfer learning. The second approach obtained the best results. Close to the approach proposed in this article is the work of Dunderdale et al. [28]. To identify faulty modules, they combined a scale invariant feature transform (SIFT) descriptor with a random forest classifier. Moreover, to evaluate the performance of deep learning models, VGG-16 and MobileNet were implemented. Conversely, our study advances the state of the art, as it performs a segmentation task, with the advantage of identifying the correct location of each fault. Moreover, our approach exploits the raw thermal data. Finally, the results of the tested methods are compared using state-of-the-art metrics.
At a glance, the previous solution [14] for the classification of damaged PV images has been improved by applying recent object detection architectures to the anomaly cells detection task, namely: Mask R-CNN [13], UNet [15], FPNet [16] and LinkNet [17]. The details of the proposed methods are presented in the following sections.

Materials and Methods
The approach presented in [14], i.e., the classification of PV anomaly images, has been extended for the development of the proposed solAIr system. To the best of our knowledge, this is the first available dataset with thermal information specifically annotated for the management of PV plants. Indeed, the available state-of-the-art datasets include RGB [29] or electroluminescence [30] images, but thermal information is neglected. The framework for the anomaly cells detection system, together with the novel PV thermal image dataset used for evaluation, comprises three main components: the UAV-based inspection system, the mask region-based CNN (Mask R-CNN) architecture and the DNN-based solution (see Figure 1). The design of the defect detection system is based on the Mask R-CNN architecture, which was adopted to simultaneously perform object detection and instance segmentation, making it useful for the automated inspection task. Further details on the UAV-based inspection system and the DNN-based solution are given in the following sections, together with the evaluation metrics adopted for this task. Details on the data collection and ground truth labelling are discussed in Section 3.2. In the first step, a UAV is used to scan the PV system. The acquired frames are annotated and stored in the photovoltaic thermal images dataset. In the next step, the selected neural network (region-based convolutional neural network, R-CNN) is trained on a portion of the dataset. In the last step, the trained models are tested on the remaining portion of the dataset. For the final experimental evaluation, state-of-the-art metrics, such as the Dice coefficient and intersection over union (IoU), are used to compare the segmentation output of the networks with the relative ground truth.

UAV-Based Inspection System
The UAV-based inspection system is based on a Skyrobotic SR-SF6 drone equipped with a radiometric Flir Tau 2 640, a thermal camera with a resolution of 640 × 512 pixels and a focal length of 13 mm. The detailed UAV specifications and parameters adopted in this work are presented in Table 1. The analysis was carried out with a constant flight altitude of 50 m with respect to the surface of the panels.
The ThermalCapture (Tau core) hardware of the thermo-camera can work in different modes; in our case, thermal detections are available in two different temperature ranges: "high gain" and "low gain". For "high gain", the temperature range is between −25 °C and +135 °C. For "low gain", the temperature range is between −40 °C and +550 °C, but with a lower resolution than the former. All thermo-camera specifications can be found in [14]. Once the raw thermal data are acquired, they can be pre-processed with thermographic software, in our case ThermoViewer version 3.0.7 (https://thermalcapture.com/thermoviewer/). It is important that the settings in ThermoViewer match those of the Tau core to provide valid temperature output.

Photovoltaic Thermal Images Dataset
In this work, we provide a novel PV thermal image dataset (http://vrai.dii.univpm.it/content/photovoltaic-thermal-images-dataset). For its collection, a thermographic inspection of a ground-based PV system was carried out on a PV plant with a power of approximately 66 MW in Tombourke, South Africa. The thermographic acquisitions were made over seven working days, from 21 to 27 January 2019, with the sky predominantly clear and with maximum irradiation. This situation is optimal to highlight any abnormal behavior of entire panels or portions thereof.

Dataset Annotation
The images were captured during the inspection of the PV plants. The operator selected the images with one or more anomalous cells present; then, the associated binary mask was generated, in which white pixels indicate the anomalous cells. The detection of an anomalous cell is made only through the use of thermal data: the operator immediately identifies where the anomaly is located, because the cell has a temperature value that clearly differs from all the surrounding cells. This difference was evaluated with the ThermoViewer software (Figure 2). The thermal images, obtained from the raw radiometric data, associate a temperature value in degrees Celsius with each pixel. The images may present one or more anomalies, as depicted in Figure 3, and the operator creates a single mask that segments each anomalous cell. In the case of a portion of contiguous anomalous cells, the operator segments the whole portion as a single block. The pre-processing and annotation phase produced a dataset of 1009 thermal images, each with its respective mask. The thermal images and the binary masks have the same dimensions of 512 × 640 pixels. The input classes were chosen according to the following three types of annotation:
• Images with one anomalous cell (Figure 3a,b);
• Images with more than one anomalous cell (Figure 3c,d);
• Images with a contiguous series of anomalous cells (Figure 3e,f).
Figure 3. Examples of images from the dataset. Figures (a,c,e) are normalized thermal images. Figures (b,d,f) depict the corresponding masks, where the black color is the background containing all the cells without anomalies and the white indicates the cells with anomalies. The number of images per class is reported in Table 2.

Data Normalization
As already stated, thermal data have several advantages compared to RGB data. However, normalization and transformation into a grayscale image are required to obtain a single information channel. Table 3 shows the great variability of values within the thermal dataset: temperature values range between a minimum of 2.249 °C and a maximum of 103.335 °C, with a median equal to 44.21 °C. Figure 4 represents the histogram of temperatures of the whole dataset. Due to this great variability, the thermal dataset was normalized in the range between 0 and 1 and then transformed into grayscale images, i.e., with pixels having a value between 0 and 255. Examples of normalized thermal images are shown in Figure 3a,c,e.
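A minimal sketch of this normalization step follows. Per-frame min/max scaling is an assumption: the text does not state whether per-image or dataset-wide temperature bounds are used.

```python
import numpy as np

def normalize_thermal(thermal):
    """Normalize a raw thermal frame (per-pixel temperatures in Celsius)
    into [0, 1], then convert it to an 8-bit grayscale image.
    Per-frame min/max scaling is an assumption of this sketch."""
    t_min, t_max = float(thermal.min()), float(thermal.max())
    norm = (thermal - t_min) / (t_max - t_min)  # values in [0, 1]
    gray = (norm * 255).astype(np.uint8)        # grayscale values in [0, 255]
    return norm, gray
```

Applied to each 512 × 640 frame, this yields both the normalized map used for training and a viewable grayscale image.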

DNN-Based Solution for Anomaly Cells Detection
In this subsection, we introduce the proposed deep learning-based solution for PV anomaly cells detection. In particular, determining the presence and the exact position of anomalous cells in a PV image is addressed as a segmentation task.
Image segmentation techniques take an image as input and output a mask with the predicted anomalous cells. Since this is a binary segmentation, the mask has pixels with a value of 0 for the background and 1 for the anomalous cells. The DNNs specifically designed for image segmentation use convolutional neural networks for image classification as backbones for feature extraction; on top of these backbones, different kinds of feature combinations are constructed to achieve the segmentation result. CNNs are the most successful, well-known and widely used architectures in the deep learning domain, especially for computer vision tasks. They are a particular kind of neural network able to extract discriminant features from data through convolution operations, so they can also be used as feature extraction networks. Usually, a CNN is composed of three types of layers: convolutional layers, where a kernel of weights is convolved on the inputs to extract discriminant features; non-linear layers, which allow the network to model non-linear functions; and pooling layers, which reduce the dimensions of a feature map by using statistical operations (mean, max). The units of every layer are locally connected, i.e., units receive weighted inputs from a small neighborhood (receptive field) of units of the previous layer. A CNN architecture is usually built by stacking layers to form multi-resolution pyramids: the higher-level layers learn features from increasingly wider receptive fields. State-of-the-art CNN architectures are AlexNet, VGG, ResNet, MobileNet, and more recently, EfficientNet [31]. In this work, the backbone of the three segmentation networks is based on EfficientNet [31], which uses a mobile inverted bottleneck for the image classification task. Based on the features extracted by the backbone, the three segmentation methods compared for the development of our system are: UNet [15], LinkNet [17] and feature pyramid network (FPN) [32].
UNet is composed of a series of convolutional layers whose outputs are passed to corresponding deconvolutional layers. In particular, a contracting path and an expansive path are applied to generate the segmentation mask.
LinkNet was chosen because it is extremely fast; it is composed of a series of encoder and decoder blocks that break down the image and build it back up before passing it through a few final convolutional layers. The structure of the network has been designed to minimize the number of parameters, so that segmentation can be performed in real time. Instead of a simple contracting path and expanding path, a "link" is used, which connects the result of each step of the contracting path to the specular step of the expanding path.
Feature pyramid network (FPN) [32] creates a pyramid representation of the input image and applies the feature extraction network to it. It replaces the feature extractor of detectors like Faster R-CNN and generates multiple feature map layers (multi-scale feature maps) with better-quality information than the regular feature pyramid for object detection.
All these techniques give as output a single overall mask containing all the anomalous cells predicted in the same input image.
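As a concrete illustration, the single overall binary mask can be obtained by thresholding the per-pixel sigmoid output of any of these networks; a minimal sketch, where the 0.5 threshold is an assumption not stated in the text:

```python
import numpy as np

def to_binary_mask(prob_map, threshold=0.5):
    """Turn a network's per-pixel probability map into the binary mask
    described above: 1 = anomalous cell, 0 = background."""
    return (prob_map >= threshold).astype(np.uint8)
```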

Mask Region-Based CNN for Anomaly Cells Detection
Instance segmentation has been chosen for this work. The main reason behind this choice is that the operator obtains the exact position of each anomalous cell. Compared to a common segmentation task, instance segmentation produces a separate mask for each anomalous cell within the same image; conversely, image segmentation needs a further step to split the individual defective cells out of the overall mask mentioned above. Following this assumption, Mask R-CNN was proven to be an effective and accurate network for solving these problems [13]. It is based on Faster R-CNN [33] and has an additional branch for predicting segmentation masks on each Region of Interest (RoI) in a pixel-to-pixel manner. For each candidate object, this network generates three outputs: a class label, a bounding-box offset and an object mask. Additionally, it comprises two parts: a region proposal network (RPN), which proposes candidate objects with bounding boxes, and a binary mask classifier, which generates a mask for every class. Considering the specific case of anomaly cells detection, this network is not trained directly with the image masks; it needs the bounding boxes of the anomalous cells within the image: not a single mask, but the set of top-left and bottom-right coordinates of each bounding box. Furthermore, in order to be compared with the image segmentation techniques, it also needs a post-processing step: all the predicted anomalous cells have to be merged into one overall mask.
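The bounding boxes required by Mask R-CNN can be derived from the annotated binary masks; a minimal sketch, assuming each connected white region of the mask corresponds to one anomalous cell (or to one contiguous block of cells, which the annotation treats as a single region):

```python
import numpy as np
from scipy import ndimage

def mask_to_boxes(mask):
    """Extract top-left/bottom-right (inclusive) bounding-box coordinates,
    one per connected component of the binary ground-truth mask."""
    labeled, _ = ndimage.label(mask)  # label 4-connected white regions
    boxes = []
    for sl in ndimage.find_objects(labeled):
        y, x = sl
        boxes.append((y.start, x.start, y.stop - 1, x.stop - 1))
    return boxes
```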

Evaluation Metrics and Loss Function
The metrics taken into consideration vary according to the type of task to be solved and therefore the type of available output. For the image segmentation task, the output is a single overall mask containing all the defective cells segmented in the input image. In this case, the pixel-based metrics used in state-of-the-art works are accuracy, precision, recall and F1-score. However, these metrics can be misleading because the dataset is unbalanced: the pixels belonging to damaged areas are far fewer than those belonging to non-damaged areas. To address this problem, we used two more suitable metrics: the Jaccard index (Equation (1)) and the Dice coefficient (Equation (2)).
The Jaccard index is a similarity measure on sets [34]; in the segmentation task, the sets are the masks: the first is the one generated by the network and the second is the ground truth mask. It is defined as:

J(A, B) = |A ∩ B| / |A ∪ B| (1)

In Equation (1), A is the generated mask and B is the ground truth mask. The Dice coefficient is a measure of the overlap between two images, in this application the generated mask A and the ground truth mask B:

D(A, B) = 2|A ∩ B| / (|A| + |B|) (2)
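Both metrics can be computed directly from a pair of binary masks; a minimal numpy sketch:

```python
import numpy as np

def jaccard_index(a, b):
    """Jaccard index of two binary masks: |A ∩ B| / |A ∪ B|."""
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return intersection / union

def dice_coefficient(a, b):
    """Dice coefficient of two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    intersection = np.logical_and(a, b).sum()
    return 2 * intersection / (a.sum() + b.sum())
```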
These metrics are useful when, given an input image, the output is a single mask. This is not true for Mask R-CNN, where a mask for each identified damaged cell is obtained; in this case, a post-processing phase is used to combine the masks into an overall mask before calculating the metrics. The use of the Jaccard index and Dice coefficient, together with the publication of the thermal dataset, allows the scientific community to compare their approaches with the results of this work. For the training of the networks, starting from the metrics used to evaluate the performance, it is possible to use two cumulative loss functions, i.e., combinations of the following basic loss functions. The basic loss functions for training a network for image segmentation are the Jaccard loss function (Equation (3)) and the Dice loss function (Equation (4)):

Loss_Jaccard = 1 − J(A, B) (3)

Loss_Dice = 1 − D(A, B) (4)

In addition to these, we used the Focal loss [35], suitable for segmentation tasks with unbalanced datasets where the background has a far greater number of pixels than the foreground. The Focal loss definition, in Equation (5), uses the a posteriori probability p_t, which is the estimated probability for the class y = 1, where y = ±1:

Loss_Focal = −(1 − p_t)^γ log(p_t) (5)

The Focal loss uses a hyperparameter γ to tune the weight of different samples; the optimum value of γ, from [35], is 2.
These basic losses are combined to obtain the two different loss functions used to train the networks. The first is used to maximize the Dice and Jaccard coefficients, as detailed in Equation (6):

Loss_1 = Loss_Jaccard + Loss_Dice (6)

The second is used to maximize the Dice coefficient while minimizing the Focal loss over the different classes, as defined in Equation (7):

Loss_2 = Loss_Dice + Loss_Focal (7)
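A sketch of these loss functions over predicted probabilities (soft masks) follows. The smoothing term `eps` and the additive combination of the basic losses are assumptions of this sketch; γ = 2 follows [35].

```python
import numpy as np

def jaccard_loss(y_true, y_pred, eps=1e-7):
    """Jaccard loss: 1 minus a soft Jaccard index (`eps` avoids division by zero)."""
    inter = (y_true * y_pred).sum()
    union = y_true.sum() + y_pred.sum() - inter
    return 1.0 - (inter + eps) / (union + eps)

def dice_loss(y_true, y_pred, eps=1e-7):
    """Dice loss: 1 minus a soft Dice coefficient."""
    inter = (y_true * y_pred).sum()
    return 1.0 - (2 * inter + eps) / (y_true.sum() + y_pred.sum() + eps)

def focal_loss(y_true, y_pred, gamma=2.0, eps=1e-7):
    """Focal loss; p_t is y_pred where y_true = 1, and 1 - y_pred otherwise."""
    p_t = np.where(y_true == 1, y_pred, 1 - y_pred)
    return float((-((1 - p_t) ** gamma) * np.log(p_t + eps)).mean())

def loss_1(y_true, y_pred):
    # combined loss maximizing Dice and Jaccard (assumed to be a plain sum)
    return dice_loss(y_true, y_pred) + jaccard_loss(y_true, y_pred)

def loss_2(y_true, y_pred):
    # combined loss pairing Dice with Focal loss (assumed to be a plain sum)
    return dice_loss(y_true, y_pred) + focal_loss(y_true, y_pred)
```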

Results and Discussion
In this section, the results of the experiments conducted on the photovoltaic thermal images dataset are reported. In particular, two experiments were performed: the first is based on the performance comparison of the three image segmentation networks (U-Net, LinkNet and FPN), and the second involves Mask R-CNN for the instance segmentation task. Finally, a comparative analysis of the networks is carried out. The photovoltaic thermal images dataset was split into three subsets: 70% for training, 20% for validation and 10% for the final test. For both image and instance segmentation, the evaluation metrics used were the Dice and Jaccard indexes, as described in Section 3.5.
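The split described above can be sketched as follows; the shuffling and seed are assumptions, since the text does not detail the splitting procedure:

```python
import numpy as np

def split_dataset(n_images, seed=42):
    """Shuffle image indices and split them 70% / 20% / 10% into
    train, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_images)
    n_train = int(0.7 * n_images)
    n_val = int(0.2 * n_images)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

For the 1009-image dataset, this yields 706 training, 201 validation and 102 test images.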
In the first experiment, the performances of the three image segmentation networks were compared. These networks were implemented using TensorFlow and Keras, and the training was carried out for 100 epochs using Loss 1 (Equation (6)). The results achieved by these networks are summarized in Table 4 in terms of the Jaccard and Dice indexes. All networks achieved good and very similar performance: LinkNet slightly outperformed the others in terms of the Jaccard index, while U-Net was better in terms of the Dice index. For the second experiment, we trained and tested the Mask R-CNN network, also implemented in Keras and TensorFlow. In contrast to the other DNNs, this network has been specifically developed for instance segmentation. For this reason, it is important to make a few remarks about these comparisons. First of all, the input of the network is the ground truth of the anomalous cells in the form of bounding boxes instead of masks. Thus, starting from the masks, a preprocessing phase is necessary to calculate the coordinates of these bounding boxes. The polygons obtained and their positions in the reference image were saved in a JSON file. During the training, the batch size was fixed at 2 and the dataset was split as stated before. As described in Section 3.4, Mask R-CNN comprises several networks, and hence its loss function is defined as the sum of the losses of the different network components:

Loss_total = Loss_cls + Loss_box + Loss_mask (8)

where Loss_cls represents the loss of the classifier, Loss_box is the loss of the regressor, and Loss_mask is the loss of the segmentation branch.
The training was performed in three steps:
• Network trained from scratch;
• Network pretrained on the Microsoft Common Objects in Context (MS-COCO) dataset [36], then retraining all layers;
• Network pretrained on the MS-COCO dataset, then retraining only the layers of the head section (the classifier section).
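Selecting which layers to retrain is typically implemented by matching layer names against a regular expression, as done in the widely used Matterport Mask R-CNN implementation. The sketch below mirrors that convention; the layer names and regexes here are illustrative, not taken verbatim from the solAIr code:

```python
import re

# Schedules mapping a training mode to the layers left trainable.
# "heads" keeps only the FPN, RPN and Mask R-CNN head layers; the
# backbone stays frozen with its MS-COCO weights.
LAYER_REGEX = {
    "heads": r"(mrcnn_.*)|(rpn_.*)|(fpn_.*)",
    "all": r".*",
}

def select_trainable(layer_names, schedule="heads"):
    """Return the subset of layers that remain trainable under a schedule."""
    pattern = re.compile(LAYER_REGEX[schedule])
    return [name for name in layer_names if pattern.fullmatch(name)]

layers = ["conv1", "res2a_branch2a", "fpn_p5", "rpn_class", "mrcnn_mask"]
print(select_trainable(layers, "heads"))  # backbone layers excluded
print(select_trainable(layers, "all"))
```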
Retraining a network pretrained on another dataset is a transfer learning technique called fine tuning, widely adopted when datasets are small; it generally allows a network to be trained faster than training from scratch. This approach yielded excellent results in [37,38], which used a Mask R-CNN pretrained on the MS-COCO dataset for their tasks. Another difference with respect to the other networks is that Mask R-CNN outputs a separate mask for each predicted anomalous cell. To compute the Jaccard and Dice metrics, a post-processing phase is therefore needed to combine all the instance masks into a single overall mask. Table 5 reports the instance segmentation results on the photovoltaic thermal images test set using Mask R-CNN, for the three training approaches, in terms of the Jaccard and Dice metrics. The results show that using a pretrained network and retraining only the head part yields a good instance segmentation network: it achieved 0.499 on the Jaccard index and 0.605 on the Dice index, higher than the network trained from scratch. The tests also show that completely retraining a pretrained network can lead to worse results than training it from scratch.
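The mask-merging post-processing reduces to a pixel-wise union. A minimal sketch, assuming the common Mask R-CNN output layout of a stack of per-instance masks shaped (H, W, N):

```python
import numpy as np

def merge_instance_masks(instance_masks: np.ndarray) -> np.ndarray:
    """Collapse a stack of per-instance masks (H, W, N) into one
    overall binary mask by pixel-wise union, so that image-level
    Jaccard and Dice metrics can be computed against the ground truth."""
    if instance_masks.size == 0:
        # No detections: return an all-background mask.
        return np.zeros(instance_masks.shape[:2], dtype=np.uint8)
    return np.any(instance_masks > 0, axis=-1).astype(np.uint8)
```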
Finally, Table 6 presents a comparative analysis of the performance of the best networks of both segmentation approaches; the number of trainable parameters and the training time are also reported. For this comparison, the networks chosen were U-Net, for its Jaccard and Dice metrics, and Mask R-CNN pretrained on the MS-COCO dataset with only the head part retrained. The results reveal that U-Net outperformed the other approaches. However, Mask R-CNN has the key advantage of directly outputting the position of each single predicted cell. Conversely, U-Net outputs a single overall mask, but through a post-processing step based on image processing techniques it can easily be split into the individual predicted cells. Figure 5 depicts the training trend of the U-Net network with respect to the loss function, the Jaccard index and the Dice index. It can be noticed that after only 30 epochs the network tends to converge. Figure 6 allows a visual analysis of the results obtained by the U-Net network on the test set. It shows some examples of test images, their ground truth and the corresponding predicted mask. From the predicted mask, a human operator can easily determine the exact position of each anomalous cell within the plant (Figure 6a,b), which leads to many advantages in terms of time and effort. Figure 6c shows that the network may produce false positives in the predicted mask, i.e., some areas are misclassified as anomalous cells; these false positives usually have a very small area. Figure 6. U-Net performance on test set images, with ground truth and predicted mask. The masks of (a,b) have been correctly predicted; (c) depicts some misclassified areas.
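The post-processing that splits the U-Net overall mask into individual cells is essentially connected-component labelling; the same pass can drop the very small blobs noted above as typical false positives. A self-contained sketch using breadth-first search (the `min_area` threshold is an illustrative choice, not a value from the paper):

```python
from collections import deque
import numpy as np

def split_cells(mask: np.ndarray, min_area: int = 4):
    """Split a binary segmentation mask into individual predicted cells
    (4-connected components), discarding blobs smaller than min_area
    pixels, which are likely false positives."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    cells = []
    for sy, sx in zip(*np.where(mask > 0)):
        if seen[sy, sx]:
            continue
        # Breadth-first flood fill from an unvisited foreground pixel.
        queue, blob = deque([(sy, sx)]), []
        seen[sy, sx] = True
        while queue:
            y, x = queue.popleft()
            blob.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(blob) >= min_area:
            cells.append(blob)
    return cells
```

In practice the same result can be obtained with library routines such as `scipy.ndimage.label` followed by an area filter.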

Conclusions and Future Work
In this study, solAIr, an artificial intelligence UAV-based inspection system capable of detecting faults in large-scale PV plants, was presented. To achieve this, a deep-learning module designed to perform instance segmentation was developed. The proposed solution was evaluated against existing solutions through a comparative study, and the experimental results confirm its effectiveness and suitability for diagnosing thermal images of PV plants. In particular, the chosen networks obtained high values on the Jaccard and Dice indexes. The proposed approach for defect analysis can be an essential aid to assist operators in O&M operations, reducing the costs and errors arising from manual operations. Considering that inspections are nowadays entrusted to visual checks, our approach will both reduce the overall costs of PV module maintenance and increase the efficiency of PV plants. Since instance segmentation through deep learning had never been applied in this field before, this study advances the body of knowledge and opens up promising scenarios for the management of clean energy. The work also presents some drawbacks. First of all, we only deal with binary segmentation: a pixel can be classified either as a damaged cell or as background. Nevertheless, this issue can easily be overcome: the framework is already prepared for future multi-class segmentation, for example detecting the different types of cell anomalies described in Section 2. A further consideration concerns the dataset used: creating a mask that combines several defective cells (Figure 3f) could introduce an error into the training of the network, because the pixels joining the cells should not be part of the mask. Hence, the performance of the network will improve as soon as the masks for this type of defect are improved.
The output of the proposed experiments can easily be integrated within a dedicated geographical information system (GIS) specifically designed for operation and maintenance (O&M) activities in PV plants. Indeed, since each image is geolocated, the management of the detected faulty cells can be facilitated. Moreover, thermal data, which are processed with their raw values, still require a processing phase in the office. Given the good computational performance described in Table 6, on-board integration in the UAV platform can be foreseen for on-site inspection operations, with minimal implementation hurdles. Additionally, the robustness and reliability of the proposed UAV-based inspection system, along with the deep-learning anomaly cells detection solution, need to be further validated and improved through extensive field assessments. A further improvement can be made by exploiting the analysis of real-time electrical measurements of operating PV modules, obtained from the underlying system monitoring infrastructure. Such data can be used in conjunction with the proposed solution to improve the performance of the fault detection system. Author Contributions: Conceptualization, R.P. and M.P.; methodology, R.P. and M.P.; software, A.F.; validation, P.Z.; data curation, F.P.; writing-original draft preparation, R.P. and M.P.; visualization, A.F.; supervision, P.Z. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.