A Generative Adversarial Network-Based Fault Detection Approach for Photovoltaic Panel

Abstract: Photovoltaic (PV) panels are widely adopted and installed on residential rooftops and in photovoltaic power plants. However, long-term exposure to ultraviolet rays, high temperature and humid environments accelerates the oxidation of PV panels, which finally results in functional failure. The traditional fault detection approach for photovoltaic panels mainly relies on manual inspection, which is inefficient. Lately, machine vision-based approaches for fault detection have emerged, but the lack of negative samples usually results in low accuracy and hinders the wide adoption of machine vision-based approaches. To address this issue, we propose a semi-supervised anomaly detection model based on the generative adversarial network. The proposed model uses the generator network to learn the data distribution of the normal PV panel dataset during training. When abnormal PV panel data are put into the model in the test phase, the reconstructed image generated by the model does not equal the input image. Since the abnormal PV panel data do not obey the data distribution learned by the generator, the difference between the original image and its reconstructed image exceeds the given threshold. The model can therefore filter out faulty PV panels by checking the error value between the original image and its reconstructed image. The model adopts Gradient Centralization and the SmoothL1 loss function to improve its generalization performance. Meanwhile, we use the convolutional block attention module (CBAM) to make the model pay more attention to the defective area, which greatly improves the performance of the model. In this paper, the photovoltaic panel dataset is collected from a PV power plant located in Zhejiang, China.
We compare the proposed approach with state-of-the-art semi-supervised and unsupervised approaches (i.e., AnoGAN (Anomaly Detection with Generative Adversarial Networks), Zhao's method, GANomaly, and f-AnoGAN), and the result indicates that the Area Under Curve (AUC) increases by 0.06, 0.052, 0.041 and 0.035, respectively, significantly improving the accuracy of photovoltaic panel fault detection.


Introduction
Traditional thermal power generation uses heat energy generated by combustible materials (e.g., coal) and converts it into electric energy through power generation plants. The combustion of coal produces carbon dioxide, which is the main source of carbon emissions. Due to the rapid increase in carbon emissions, the concentration of global CO2 continues to rise, and the resulting greenhouse effect leads to global warming, which poses a great threat to the survival of human beings. Since 1992, a series of agreements, including the Paris Agreement, have been signed under the framework of the Intergovernmental Panel on Climate Change (IPCC). The overall goal of the Paris Agreement was to achieve a peak in global greenhouse gas emissions around 2020, and net zero greenhouse gas emissions in the second half of this century. Photovoltaic power generation is a technology that directly converts light energy into electric energy by using the photovoltaic effect at the semiconductor interface. Abundant solar radiant energy is an important energy source that is inexhaustible, pollution-free and cheap for human beings to use. Therefore, solar energy has become the focus of attention because of its unique advantages [1,2]. In the global promotion of the clean and low-carbon energy transition, the adoption of photovoltaic (PV) panels in clean energy is increasing year by year [3]. According to International Energy Agency (IEA) statistics, since 2018, the cumulative installed capacity of global PV power generation has reached 480 GW and is expected to reach 1721 GW in 2030 [4].
In order to increase the exposure time of PV panels, large-scale PV power plants are located in areas without apparent shade, such as the roofs of buildings, plains, hills and over fish ponds, as shown in Figure 1. Long-term exposure to ultraviolet rays, high temperature and humid environments accelerates the oxidation of PV panels, which results in functional failure [5]. Severe sealant delamination increases reflection, decreases irradiance, lets moisture sink into the module, and accelerates panel oxidation [6]. The accumulation of dust directly reduces light transmittance and affects the efficiency of the PV power generation system, sometimes by about 50% or, even worse, 80% [7]. Although snail trails have no direct effect on power generation efficiency, the invisible cell cracks beneath them usually reduce power output [8]. Therefore, it is necessary to inspect the status of photovoltaic panels regularly to find and replace defective photovoltaic panels in time. However, most photovoltaic panels are installed on roofs, fish ponds and hillsides over a wide area, so manual inspection is difficult to carry out. Recently, several artificial intelligence algorithms (e.g., deep convolutional neural networks) were introduced to facilitate the process of PV panel fault detection [9][10][11][12][13]. Unfortunately, such fully supervised learning-based methods have been proven unsuitable for large-scale PV panel fault detection due to the lack of defective samples [14][15][16]. At present, semi-supervised learning-based approaches [17][18][19][20] are being explored by many researchers. To solve these issues, by leveraging adversarial auto-encoders and conditional Generative Adversarial Networks (GAN), we propose a semi-supervised anomaly detection approach (named ppFDetector) to facilitate the process of fault detection in PV panels. The main contributions of this paper can be summarized as follows: (1) A novel semi-supervised generative adversarial network-based model is proposed.
Compared with the fully supervised learning model, the semi-supervised anomaly detection model does not require a large number of negative samples, which solves the problem that fault detection models cannot be trained when negative samples are unavailable. (2) The original optimization function is augmented with gradient centralization (GC) to regularize the weight and output spaces and prevent the model from overfitting. (3) The convolutional block attention module (CBAM) is used to make the model pay more attention to the defective area and improve the performance of the model. (4) The SmoothL1 loss is used to define the loss function, which combines the advantages of L1Loss and L2Loss to speed up the training of the model.
The remainder of this paper is organized as follows: Section 2 reviews the related literature. Section 3 details the technical implementation of ppFDetector. Experimental results using ppFDetector on the PV panel dataset, compared with state-of-the-art anomaly detection models, are presented in Section 4. We finally conclude the paper in Section 5.

Generative Adversarial Networks
The network structure of a GAN is composed of a generator network and a discriminator network. The generator receives a random variable and generates fake sample data; its purpose is to make the generated samples as close as possible to the real samples. The input of the discriminator consists of two parts: real data and data generated by the generator. Its output is a probability value, which represents the probability that the input comes from the true distribution: ideally 1 if the input comes from real data, and 0 otherwise. At the same time, the output of the discriminator is fed back to guide the generator's training. Ideally, when the discriminator cannot distinguish whether the input data are real or generated, i.e., it outputs a probability of 1/2 each time (equivalent to random guessing), the distribution of the generated data has matched the real one. At this point, the model is optimal. In practical applications, the generator network and discriminator network are usually implemented by deep neural networks. As the number of network layers increases, it becomes difficult for GAN networks to converge. In order to solve this problem, a wide variety of GAN variant models have emerged. Deep Convolutional Generative Adversarial Networks (DCGAN) [21] was the first major improvement on the GAN architecture. Its generator is an upsampling process, i.e., deconvolution, and its discriminator is a common convolutional network. It is more stable in training and produces higher-quality samples.
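The equilibrium described above can be illustrated numerically. The following NumPy sketch (an illustration, not any implementation from this paper) evaluates the standard GAN value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))] for given discriminator outputs, showing that when the discriminator outputs 1/2 everywhere, the value settles at −2 ln 2:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Standard GAN value function V(D, G), given discriminator outputs
    d_real = D(x) on real samples and d_fake = D(G(z)) on generated ones."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator yields a value near 0 (it wins the game).
strong = gan_value(np.array([0.99, 0.98]), np.array([0.01, 0.02]))

# At the ideal equilibrium the discriminator outputs 1/2 everywhere
# (random guessing) and the value function equals -2 * ln 2.
equilibrium = gan_value(np.full(4, 0.5), np.full(4, 0.5))
print(equilibrium)  # -2 * ln 2, about -1.3863
```

The gap between the two values is what adversarial training drives toward zero: the generator improves until the discriminator can do no better than chance.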
With the development of deep learning, generative adversarial networks can also perform image anonymization to protect training data. Ashish Undirwade and Sujit Das [22] proposed applying optimized noise to the latent space representation (LSR) of the original image, and then converting the input original image into a new synthetic image through generative adversarial networks.
At present, GAN is a promising network model that can be applied in many fields, such as image super-resolution, image translation and style conversion. Karnewar [23] proposed the Multi-Scale Gradient Generative Adversarial Network (MSG-GAN), a simple but effective technique for addressing training instability by allowing the flow of gradients from the discriminator to the generator at multiple scales. This technology provides a stable method for high-resolution image synthesis, and can replace the commonly used progressive growing technique.

PV Panel Fault Detection
PV panel fault detection roughly goes through the following phases: an initial phase, the phase of physical approaches and the phase of intelligent detection.
In the initial phase, manual inspection was the most primitive method for detecting PV panel faults. A worker checks an image of the PV panel taken by a camera for defects. Traditional manual inspection evaluates a single PV panel just by sight, which usually incurs high cost, a high error rate and low efficiency [24], and suffers from severe subjective bias across different defect types.
In the phase of physical approaches, Sawyer et al. [25] used laser scanning technology to aid the fault-detection process. This approach reveals the continuity of resistance in the crystalline silicon under forward-biased laser scanning; if a crack exists, there will be a discontinuity in resistance. Tsuzuki et al. [26] proposed the use of sound waves for fault detection, in which PV panels are made to vibrate to generate sound waves. Based on a comparative analysis with the sound waves generated by non-defective solar cells, fault detection can be realized. Physical detection [25][26][27] has high detection accuracy for specific types of PV panel faults, but it cannot effectively detect other faults.
In the intelligent phase of fault detection, intelligent PV panel fault detection based on machine vision is now being gradually adopted by users. Using machine vision detection, Bastari et al. [28] proposed a method for texture analysis of dark defect areas in electroluminescence images to classify defective and non-defective cells. Quater et al. [29] first reported a UAV inspection system to perform non-destructive inspection of large-scale PV power plants. However, the effect of fault diagnosis cannot meet the requirements of practical use, and many researchers are further studying fault diagnosis methods. Li et al. [30] first extracted features by calculating the first derivative of the Gaussian function on the collected images, and then performed defect detection through feature matching. This method verifies the effectiveness of fault analysis based on image processing to a certain extent, but strictly requires good quality of the collected images. However, due to various factors such as wind, the obtained image resolution is low, and the performance of this method is significantly reduced. In addition, traditional pattern recognition algorithms often fail to extract fault features with acceptable complexity from the acquired aerial images.
With the evolution of deep convolutional neural networks, the introduction of artificial intelligence algorithms into the intelligent patrol system [9] can improve its robustness and reliability [10]. Li et al. [11] proposed an intelligent diagnosis method for aerial PV module image defects based on the deep convolutional neural network (CNN). This method uses CNN to classify multiple deep features and diverse states. Compared with traditional methods, it can flexibly and reliably solve the problem of PV module images with low quality and distortions. Tang [12] proposed using a generative adversarial network combined with data augmentation to increase training samples; this achieves high classification accuracy, but cannot precisely locate defects. Zhao [13] presented deep learning-based automatic detection of multitype defects to fulfill the inspection requirements of the production line. These studies used CNN to solve PV panel fault detection, and the models were based on supervised learning. However, due to the lack of abnormal panel samples, supervised learning is not suitable for this kind of problem. Schlegl et al. [15] proposed an unsupervised anomaly detection method based on the generative adversarial network (AnoGAN) using only normal panel samples; it uses continuous iterative optimization to find a certain point in latent space. However, due to the need for iterative optimization, the AnoGAN algorithm is bound to consume a lot of time. Zhao et al. [16] proposed using a GAN and an auto-encoder to reconstruct the defect image, and then using the Local Binary Pattern (LBP) to extract the local contrast of the image to detect the defect. However, for images with complex backgrounds, Zhao's method cannot reconstruct and repair defective images well enough for further fault detection. f-AnoGAN [17] was also proposed, which can quickly map an image to a certain point in latent space and then use the Wasserstein GAN (WGAN) for anomaly detection.
However, its quantitative evaluation of anomaly segmentation accuracy serves only as a coarse indication.

ppFDetector Solution
In this paper, positive samples represent normal PV panel images without defects, and negative samples represent abnormal PV panel images with defects. As the diagram in Figure 2 shows, the workflow of ppFDetector is composed of GAN model training and an anomaly detection strategy. During GAN model training, the generator network is designed to learn the data distribution of positive samples in the PV panel dataset, and the discriminator network is designed for adversarial training, so as to reduce the error between the reconstructed image and its input original image. The generator can then capture the training data distribution within both the normal PV panel image and its latent space vector. In the anomaly detection strategy, the test PV panel image (including positive samples and negative samples) enters the trained encoder-decoder to generate the latent space vector z and its reconstructed image. Next, the reconstructed image enters the encoder to generate its latent space vector z̃, and the error M(z, z̃) between the latent space vectors z and z̃ is computed. If M(z, z̃) > θ, the proposed model judges the test image as an abnormal PV panel; otherwise it is normal. The details of the model are described as follows.
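The decision rule above can be sketched in a few lines of NumPy. The L1 form of the error M and the threshold value below are illustrative assumptions, since the text only specifies that the latent-vector error is compared against a threshold θ:

```python
import numpy as np

def anomaly_score(z, z_tilde):
    """Error M(z, z~) between the latent vector of the input image and the
    latent vector of its reconstruction (mean L1 distance, an assumption)."""
    return np.mean(np.abs(z - z_tilde))

def is_defective(z, z_tilde, theta):
    """Flag the panel as abnormal when the error exceeds the threshold."""
    return anomaly_score(z, z_tilde) > theta

# Toy latent vectors: a normal panel reconstructs almost perfectly,
# while a defective one does not.
z_normal, z_normal_rec = np.array([0.2, 0.5]), np.array([0.21, 0.49])
z_defect, z_defect_rec = np.array([0.2, 0.5]), np.array([0.9, -0.3])
theta = 0.1
print(is_defective(z_normal, z_normal_rec, theta))  # False
print(is_defective(z_defect, z_defect_rec, theta))  # True
```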

GAN Model
The network structure of the GAN model is composed of a generator network G and a discriminator network D. The generator and the discriminator are constructed by iterative adversarial training. Considering that the color and shape patterns of a PV panel image are usually simple and regular, the generator network in the GAN can better learn the data distribution of the PV panel image set and generate a higher-quality reconstructed image.

Generator Network
As shown in Figure 3, the generator network is mainly composed of an encoder network G_E and a decoder network G_D. The details of the encoder parameters and decoder parameters are shown in Tables 1 and 2, respectively. Firstly, the generator network G reads the input PV panel image x and passes it into the encoder network G_E. After x is passed through the convolutional layer, batch normalization layer, leaky linear rectification activation function layer and CBAM layer, and scaled down several times, x is compressed into a latent space vector z = G_E(x). The latent space vector z can theoretically be assumed to be the best representation of x with the smallest dimension. Next, the decoder network G_D enlarges z through the transposed convolution layer, batch normalization layer and linear rectification activation function layer. After upscaling, the latent space vector z is reconstructed into the reconstructed image x̃ corresponding to the original PV panel image x, i.e., x̃ = G(x).
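The repeated down-scaling in G_E and up-scaling in G_D can be traced with the standard convolution output-size formulas. The kernel size 4, stride 2, padding 1 and the 64-pixel input used below are typical DCGAN-style choices and are assumptions here, since Tables 1 and 2 (not reproduced in this excerpt) hold the actual layer parameters:

```python
def conv_out(size, k=4, s=2, p=1):
    """Output spatial size of a strided convolution (floor division)."""
    return (size + 2 * p - k) // s + 1

def deconv_out(size, k=4, s=2, p=1):
    """Output spatial size of the matching transposed convolution."""
    return (size - 1) * s - 2 * p + k

# Encoder G_E: each layer halves the spatial resolution of x ...
sizes = [64]
while sizes[-1] > 1:
    sizes.append(conv_out(sizes[-1]))
print(sizes)  # [64, 32, 16, 8, 4, 2, 1] -> compressed toward the latent vector z

# ... and decoder G_D mirrors it back to the reconstructed image x~.
up = [1]
while up[-1] < 64:
    up.append(deconv_out(up[-1]))
print(up)     # [1, 2, 4, 8, 16, 32, 64]
```

The exact symmetry of the two chains is what lets the decoder output match the input resolution, so the reconstruction error can be computed pixel by pixel.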

Encoder Subnet
The structure of the encoder subnet E consists of the convolutional layer, batch normalization layer, leaky linear rectification activation function layer and CBAM layer. Through the generator's encoder-decoder G_E-G_D, the reconstructed image x̃ is output. Putting x̃ into E obtains the encoded latent space vector z̃ = E(x̃) of the reconstructed image, and the fault detection of PV panels can be realized by capturing the error between z and z̃.

Discriminator Network
In ppFDetector, the original PV panel image x and its reconstructed image x̃ are put into the discriminator network D. After the convolutional layer, the batch normalization layer and the leaky linear rectification activation function layer, the input image is downscaled several times. After flattening, the compressed vector enters the fully connected layer to achieve binary classification (checking whether the input is the original PV panel image or the reconstructed one).

Loss Functions in GAN Model Training
In order to formalize the above assumptions, three loss functions are defined and combined to establish the objective function of the model. Each loss function optimizes a single subnet. At the same time, this paper uses SmoothL1Loss instead of the combination of L1Loss and L2Loss. The formula of SmoothL1Loss is as follows:

SmoothL1(t) = 0.5 t^2, if |t| < 1; |t| − 0.5, otherwise. (1)

From the above formula, when the absolute value of t is less than 1, SmoothL1Loss behaves as L2Loss. Because L2Loss squares the error, a smaller loss is obtained, which is conducive to model convergence. Otherwise, SmoothL1Loss is a translation of L1Loss. L1Loss is not sensitive to outliers, and the magnitude of the gradient can be controlled so that it does not easily explode during training. So SmoothL1Loss has the advantages of both L2Loss and L1Loss.
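A direct NumPy rendering of the piecewise definition above (with the conventional switch point at |t| = 1) makes the quadratic and linear regions explicit:

```python
import numpy as np

def smooth_l1(t):
    """SmoothL1(t): 0.5*t^2 when |t| < 1 (L2-like, small gradients near 0),
    |t| - 0.5 otherwise (L1-like, robust to outliers)."""
    a = np.abs(t)
    return np.where(a < 1.0, 0.5 * a ** 2, a - 0.5)

errors = np.array([0.1, 0.5, 1.0, 2.0, -3.0])
losses = smooth_l1(errors)  # 0.005, 0.125, 0.5, 1.5, 2.5
```

Note that the two branches agree at |t| = 1 (both give 0.5), so the loss and its gradient stay continuous across the switch point.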

Adversarial Loss
The generator network G needs to update its training parameters according to the classification accuracy of the discriminator network D (which determines whether the input is the original PV panel image or the reconstructed image). This section only refers to the generator part, but meanwhile, the discriminator is also trained and its weights are updated. In this paper, we update the generator network G based on the features of the intermediate layer of the discriminator, denoted as φ(x), as shown in Figure 3. Assuming that the independent variables x and x̃ = G(x) both obey the positive sample data distribution P_x, the adversarial loss function L_adv is the distance between the original PV panel image x and its reconstructed image x̃ under the smooth L1 distance. It is defined as follows:

L_adv = E_{x∼P_x} SmoothL1(φ(x) − φ(G(x))), (2)

where P_x is the data distribution of positive normal PV panel images; φ(x) is the output feature of the input PV panel image x in the intermediate layer of the discriminator; and G(x) is the reconstructed image of x in the generator. This loss is only activated with negative samples in the discriminator.

Contextual Loss
By optimizing the adversarial loss function L_adv in Equation (2), the generator G is able to produce images that are realistic enough to prevent the discriminator D from discriminating between real and fake. However, for the reconstruction task, the adversarial loss only forces the generator G to produce a reconstructed image x̃ that obeys the positive sample data distribution P_x. The reconstructed image x̃ may not correspond to its original PV panel image x, and thus the purpose of reconstructing the PV panel cannot be achieved.
In this paper, we define the contextual loss L_con by calculating the smooth L1 distance between the reconstructed image x̃ and its original PV panel image x, and the equation is as follows:

L_con = E_{x∼P_x} SmoothL1(x − G(x)), (3)

where G(x) is the reconstructed image output by the generator when the input is the PV panel image x.

Encoder Loss
The two losses defined in Equations (2) and (3) can help the generator produce reconstructed images that are both realistic and relevant to the original PV panel image. We also use the latent space vector z to replace the original PV panel image x as the data for detecting anomalies. In order to reduce the difference between the latent space vectors z and z̃, this paper defines the encoder loss L_enc, and its function equation is as follows:

L_enc = E_{x∼P_x} SmoothL1(G_E(x) − E(G(x))), (4)

where G_E(x) is the latent space vector z output from the encoder subnet G_E of the generator when the input is the PV panel image x, and E(G(x)) is the reconstructed latent space vector z̃ when the input PV panel image x firstly passes through the generator G and then passes through the encoder subnet E.
Up to now, the objective loss function L of the GAN model can be defined as follows:

L = L_adv + λ L_con + L_enc, (5)

where λ is a parameter for adjusting the sharpness of the reconstructed image.
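A toy computation of the combined objective might look as follows. The mean reduction of SmoothL1 and the λ value are illustrative assumptions (this excerpt does not state the value of λ):

```python
import numpy as np

def smooth_l1(t):
    # Equation (1), reduced by mean over all elements (an assumption).
    a = np.abs(t)
    return np.where(a < 1.0, 0.5 * a ** 2, a - 0.5).mean()

def objective(x, x_rec, phi_x, phi_x_rec, z, z_rec, lam=50.0):
    """L = L_adv + lam * L_con + L_enc, as in Equation (5); lam is hypothetical."""
    l_adv = smooth_l1(phi_x - phi_x_rec)   # discriminator-feature matching
    l_con = smooth_l1(x - x_rec)           # pixel-space reconstruction
    l_enc = smooth_l1(z - z_rec)           # latent-space consistency
    return l_adv + lam * l_con + l_enc

rng = np.random.default_rng(0)
x = rng.normal(size=16)      # stand-in image pixels
z = rng.normal(size=4)       # stand-in latent vector
phi = rng.normal(size=8)     # stand-in discriminator features
# A perfect reconstruction drives every term, hence the objective, to zero.
print(objective(x, x, phi, phi, z, z))  # 0.0
```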

Gradient Centralization
As the number of network layers increases, the gradients of a neural network become increasingly difficult to descend, and problems such as vanishing gradients, exploding gradients and failure to converge can occur. Optimization techniques are therefore crucial for the efficient training of neural networks.
In this paper, we adopt the idea of gradient centralization (GC) [31] to optimize the gradient directly. GC regularizes the weight space and the output feature space to avoid overfitting, which also improves the generalization performance of neural networks. As shown in Figure 4a, W denotes the weight, L denotes the loss function, ∇_W L denotes the weight gradient, and Φ_GC(∇_W L) denotes the centralized gradient. As shown in Figure 4a, replacing ∇_W L by Φ_GC(∇_W L) embeds GC into the existing network optimizer. GC computes the mean of each column of the gradient matrix/tensor and shifts each column to zero mean, as shown in Figure 4b.


GC Formula
As for the fully connected layer or convolutional layer, we assume that the gradients have been obtained by back propagation. For the weight vector W_i, its gradient is ∇_{W_i}L (i = 1, 2, . . . , N), where the subscript i denotes the i-th column vector of the gradient matrix. The operator of GC, denoted by Φ_GC, can be calculated as follows:

Φ_GC(∇_{W_i}L) = ∇_{W_i}L − µ_{∇_{W_i}L}, (6)

where µ_{∇_{W_i}L} = (1/M) ∑_{j=1}^{M} ∇_{W_{i,j}}L. As shown in Equation (6), we only need to calculate the mean of each column vector of the gradient matrix, then remove that mean from the column vector.
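Equation (6) amounts to subtracting the per-column mean from the gradient matrix. A minimal NumPy sketch:

```python
import numpy as np

def gradient_centralization(grad):
    """Phi_GC of Equation (6): remove the mean of each column of the gradient.
    grad has shape (M, N): M entries per weight vector W_i, N columns."""
    return grad - grad.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
g = rng.normal(size=(5, 3))          # gradient of a 5x3 weight matrix
g_gc = gradient_centralization(g)
print(np.allclose(g_gc.mean(axis=0), 0.0))  # True: every column is zero-mean
```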

Properties of GC
The following is a theoretical analysis of how GC can improve the generalization ability of the model.
(1) Weight space regularization: The projection of the weight gradient constrains the weight space to a hyperplane, as shown in Figure 5, in which P is the projection matrix of the hyperplane with normal vector e, and P∇_W L is the projected gradient. Firstly, the gradient ∇_W L is projected onto the hyperplane determined by e^T(W − W^t) = 0, in which W^t is the weight vector at the t-th iteration; then the weight W is updated along the −P∇_{W^t}L direction. It can be concluded that e^T W^{t+1} = e^T W^t = · · · = e^T W^0, i.e., e^T W is a constant during training, so GC regularizes the solution space of W, thus decreasing the possibility of overfitting. (2) Output space regularization: After the introduction of GC, a constant intensity change in an input feature causes a change in the output activation that is unrelated to the current weight vector. If the mean value of the initial weight vector converges to 0, then the output activation is insensitive to changes in the interference intensity of the input features, so the output feature space is more stable to changes in the training samples. In this paper, the ppFDetector model uses Gradient Centralization to regularize the weight space and output space in order to avoid overfitting.
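The invariance in point (1) can be checked numerically: with centralized gradients, e^T W (here realized as the column sums of W, taking e proportional to the all-ones vector) is unchanged by every update step. A small sketch under these assumptions:

```python
import numpy as np

def gradient_centralization(grad):
    # Phi_GC of Equation (6): subtract the per-column mean of the gradient.
    return grad - grad.mean(axis=0, keepdims=True)

rng = np.random.default_rng(2)
W = rng.normal(size=(5, 3))          # weight matrix, M=5 rows, N=3 columns
lr = 0.1
col_sums_before = W.sum(axis=0)      # proportional to e^T W for e = ones/sqrt(M)

for _ in range(10):                  # ten SGD-style steps
    g = rng.normal(size=W.shape)     # stand-in for a backpropagated gradient
    W = W - lr * gradient_centralization(g)

print(np.allclose(W.sum(axis=0), col_sums_before))  # True: e^T W is conserved
```

Because each centralized gradient has zero column sums, every update moves W only within the hyperplane e^T(W − W^0) = 0, which is exactly the constraint derived above.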


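The channel attention step (learning 'what' along the channel axis) can be sketched as follows. This is a minimal NumPy illustration of CBAM-style channel attention, not the paper's code; the MLP weights and reduction ratio are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """CBAM channel attention for a feature map of shape (C, H, W).
    w1: (C//r, C) and w2: (C, C//r) are the shared MLP weights."""
    avg = feat.mean(axis=(1, 2))   # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))     # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # shared two-layer MLP
    scale = sigmoid(mlp(avg) + mlp(mx))           # channel weights in (0, 1)
    return feat * scale[:, None, None]            # re-weight each channel

rng = np.random.default_rng(1)
feat = rng.normal(size=(16, 8, 8))
w1 = rng.normal(size=(4, 16)) * 0.1  # reduction ratio r = 4 (assumed)
w2 = rng.normal(size=(16, 4)) * 0.1
out = channel_attention(feat, w1, w2)
assert out.shape == feat.shape
```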

Spatial Attention
Adding the spatial attention module makes up for the limitations of channel attention to a certain extent, because spatial attention focuses on which parts of the input image carry the most useful information. To compute spatial attention, two feature maps are obtained through max pooling and average pooling along the channel axis, concatenated into a two-channel feature map, and passed through a standard 7 × 7 convolution for parameter learning, which finally produces a single-channel spatial weight map. The spatial attention module of CBAM is shown in Figure 8. In this paper, the CBAM module is added to make the ppFDetector model pay more attention to the defective area, which improves the performance of the proposed model.
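The spatial attention computation described above can be sketched as follows; a plain-NumPy illustration under our own assumptions (random conv weights, plain-loop convolution), not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat, kernel):
    """CBAM spatial attention for a feature map of shape (C, H, W).
    kernel: (2, 7, 7) weights of the 7x7 conv over the pooled maps."""
    # Pool along the channel axis and stack into a 2-channel map.
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    pad = np.pad(pooled, ((0, 0), (3, 3), (3, 3)))            # 'same' padding
    H, W = feat.shape[1:]
    attn = np.empty((H, W))
    for i in range(H):                 # plain-loop 7x7 convolution
        for j in range(W):
            attn[i, j] = np.sum(pad[:, i:i + 7, j:j + 7] * kernel)
    return feat * sigmoid(attn)[None]  # single-channel weight map applied

rng = np.random.default_rng(2)
feat = rng.normal(size=(16, 8, 8))
kernel = rng.normal(size=(2, 7, 7)) * 0.1
out = spatial_attention(feat, kernel)
assert out.shape == feat.shape
```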


Anomaly Detection Strategy
In the anomaly detection strategy, when a normal PV panel image is put into the trained model, the generator first encodes it into a latent space vector and then decodes it into the reconstructed image. Because the image obeys the data distribution learned by the generator, the error between the input image and its reconstructed image is smaller than the threshold defined by the anomaly model. However, when an abnormal PV panel image is put into the model, the reconstructed image generated by the model does not equal the input image. Since the image does not obey the data distribution learned by the generator, the error between the input image and its reconstructed image is larger than the threshold. Therefore, the model detects an abnormal PV panel by the error between the input image and its reconstructed image. However, the input image x is a three-channel color image stored in a matrix. If the error between the input image and its reconstructed image were calculated directly on the image matrices, the space and time complexity would be unacceptable. The latent space vector z can theoretically be regarded as the best representation of x with the smallest dimension. In order to reduce the time and space complexity of the proposed method, we use the error between the latent space vectors z and z̃ instead of the error between the input image and its reconstructed image.
In this paper, the error of the latent space vectors, denoted M(z, z̃), is calculated as the mean of the absolute differences between the latent space vectors z and z̃:

M(z, z̃) = (1/N) ∑_{i=1}^{N} |z_i − z̃_i|,

where N is the length of the vector z. Finally, if M(z, z̃) > θ, the proposed model judges the test image as an abnormal PV panel; otherwise, it is normal.
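The scoring rule above can be sketched in a few lines; the threshold value used below is a hypothetical placeholder, not the θ from the paper.

```python
import numpy as np

def anomaly_score(z, z_rec):
    """M(z, z~): mean absolute difference between the latent vector of
    the input image and that of its reconstruction."""
    return np.mean(np.abs(z - z_rec))

def is_abnormal(z, z_rec, theta=0.1):
    """theta is a hypothetical threshold for illustration only."""
    return anomaly_score(z, z_rec) > theta

z = np.zeros(100)                     # latent vector of length N = 100
assert not is_abnormal(z, z + 0.01)   # small error -> normal panel
assert is_abnormal(z, z + 0.5)        # large error -> abnormal panel
```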

Dataset
The dataset is collected from a PV power plant located in Zhejiang province. The original image resolution is 3840 × 2048. In order to speed up the training process, we split the original images into a set of small images (i.e., 32 × 32 images as shown in Figure 9a). By segmenting the original image, 32,000 small images are obtained, of which 25,600 small images are included under the training dataset and 6400 under test set. An additional 3200 abnormal images are obtained from the negative samples by splitting defective images, as shown in Figure 9b.
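The tiling step can be sketched as follows, assuming the image is already loaded as a NumPy array (the paper does not specify its I/O pipeline).

```python
import numpy as np

def split_into_tiles(image, tile=32):
    """Split an (H, W, 3) image into non-overlapping tile x tile patches."""
    H, W, C = image.shape
    return [image[i:i + tile, j:j + tile]
            for i in range(0, H, tile)
            for j in range(0, W, tile)]

# One 3840 x 2048 original image yields (2048/32) * (3840/32) patches.
image = np.zeros((2048, 3840, 3), dtype=np.uint8)
tiles = split_into_tiles(image)
assert len(tiles) == (2048 // 32) * (3840 // 32)
assert tiles[0].shape == (32, 32, 3)
```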


Model Construction
The construction of the generator and the discriminator in the generative adversarial network is a key task. The generator consists of an encoder and a decoder, and the discriminator is an encoder-like structure. The encoder and decoder parameters are detailed in Tables 1 and 2, respectively. The input image size of the encoder is 32 × 32 × 3, and 4 × 4 convolution kernels are used. The first three convolution layers use edge padding with a stride of 2, and each is followed by a batch normalization layer [33], a LeakyReLU activation layer [34] and a CBAM layer [32]. The last convolution layer directly outputs a latent space vector of size 1 × 1 × 100. The decoder mirrors the encoder structure: the first transposed convolution layer has no padding and a stride of 1, while the last three transposed convolution layers have a padding of 1 and a stride of 2. Each of the first three transposed convolution layers is followed by a batch normalization layer and a ReLU activation layer, and the output is the 32 × 32 × 3 reconstructed image.
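The encoder's spatial dimensions can be verified with the standard convolution output-size formula; a quick sketch (assuming padding 1 for the stride-2 layers, which is what reproduces the table's sizes):

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

# Three stride-2, padding-1 layers halve the map each time, then a
# final 4x4 convolution with no padding reduces 4x4 to 1x1.
size = 32
for _ in range(3):
    size = conv_out(size)                    # 32 -> 16 -> 8 -> 4
size = conv_out(size, stride=1, padding=0)   # 4 -> 1
assert size == 1   # the spatial size of the 1 x 1 x 100 latent vector
```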

In this paper, three experiments are performed on where the CBAM layer is embedded. First, the CBAM layer is embedded after the first convolution block of the encoder network G_E. Second, the CBAM layer is embedded after both the first and second convolution blocks of G_E. Third, the CBAM layer is embedded after the first, second and third convolution blocks, as shown in Figure 3. The results of the three embedding schemes are shown in Figure 10. The third scheme performs best, so the proposed ppFDetector model adopts it, and the AUC reaches its maximum value.

Model Training
As shown in Figure 2, in the model training, only the positive PV panel image x enters the encoder-decoder (S.2) to generate the reconstructed image x̃. The flow then divides into two branches. In the first branch, the reconstructed image x̃ enters the discriminator through S.4, and then goes through S.5 to determine whether the result of the discriminator is correct. In the second branch, the reconstructed image x̃ enters the encoder (S.3) to generate the latent space vector z̃ of the reconstructed image. Then, the adversarial loss, encoder loss and contextual loss (S.7) are calculated and back-propagated to the generator. The generator and the discriminator are constructed by iterative adversarial training.
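A minimal sketch of how the three losses in S.7 could be combined. The adversarial term is shown here as discriminator feature matching and the loss weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def smooth_l1(a, b, beta=1.0):
    """SmoothL1: quadratic for small errors, linear for large ones."""
    d = np.abs(a - b)
    return np.mean(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta))

def generator_loss(x, x_rec, z, z_rec, f_real, f_fake,
                   w_adv=1.0, w_con=50.0, w_enc=1.0):
    """Weighted sum of the three generator losses; the weights are
    illustrative assumptions, not the paper's values."""
    l_adv = np.mean((f_real - f_fake) ** 2)  # adversarial (feature matching)
    l_con = smooth_l1(x, x_rec)              # contextual (image) loss
    l_enc = np.mean((z - z_rec) ** 2)        # encoder (latent) loss
    return w_adv * l_adv + w_con * l_con + w_enc * l_enc

# A perfect reconstruction drives every term, and thus the loss, to zero.
x, z, f = np.zeros((3, 32, 32)), np.zeros(100), np.zeros(64)
assert generator_loss(x, x, z, z, f, f) == 0.0
```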



Model Validation
After the model training is completed, the model needs to be evaluated to ensure that the anomaly detection task is completed efficiently, accurately and at minimum cost. The model is therefore evaluated from the following perspectives: (1) We evaluate the similarity between the original image and the reconstructed image produced by the generator. As shown in Figure 11, during training the difference between the original image and the reconstructed image gradually shrinks as the number of iterations increases. (2) We compare the data distributions of the original image and its reconstructed image for the positive and negative samples, respectively. Since only the positive sample dataset X is fed into the generator G_{E−D} during training, G_{E−D} learns only the data distribution P_x of the normal PV panel images, while the data distribution P_y of the abnormal PV panel images remains unknown. When a normal PV panel image x is put into the generator, the reconstructed image x̃ is close to the input x because x obeys the distribution learned by the generator, and its distribution P_x̃ is infinitely close to P_x, as shown in Figure 12a. When an abnormal PV panel image y is put into the generator, G_{E−D} still encodes it into z_y and then decodes it into ỹ in a manner that obeys P_x. Due to the difference between P_x and P_y, y and ỹ differ, and the distribution P_ỹ differs from P_y, as shown in Figure 12b.


Model Checking
As shown in Figure 2, in the anomaly detection process, the test PV panel image x ∈ R^{256×256} is first split into a set of small images (i.e., 32 × 32 images as shown in Figure 9a). Each small image x ∈ R^{32×32} enters the encoder-decoder (T.2) to generate the latent space vector z and its reconstructed image. Next, the reconstructed image enters the encoder (T.3) to generate the latent space vector z̃. Then the error M(z, z̃) between the latent space vectors z and z̃ (T.4) is computed. If M(z, z̃) > θ, the small image is judged as an abnormal PV panel; otherwise, it is normal. Each small image is judged in turn, and finally the 64 small images x ∈ R^{32×32} are merged back into the original image x ∈ R^{256×256}.
As shown in Figure 14, (a) is the test image and (c) is a line chart of the latent vectors z of the test image blocks and their reconstructions. The horizontal axis represents the sequence number of the image block (64 in total), and the vertical axis represents the value of the latent vector z. Figure 14d is a scatter plot of the error M(z, z̃) for the test image: the green points are normal and the red points are abnormal. There is a clear decision boundary between the normal points and the abnormal points. When the outliers in M(z, z̃) are selected and the corresponding feature maps are marked as abnormal, the output is as shown in Figure 14b.
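Mapping the 64 per-tile scores back onto the 8 × 8 tile grid of the 256 × 256 test image can be sketched as follows; the threshold value is again a hypothetical placeholder.

```python
import numpy as np

def abnormal_mask(scores, theta, grid=8):
    """Map 64 per-tile scores M(z, z~) onto the 8 x 8 tile grid of the
    256 x 256 test image; True marks an abnormal tile."""
    return (np.asarray(scores) > theta).reshape(grid, grid)

scores = np.full(64, 0.02)          # 64 tile scores, mostly normal
scores[10] = 0.9                    # one defective tile
mask = abnormal_mask(scores, theta=0.1)
assert mask.sum() == 1 and mask[1, 2]   # tile 10 -> row 1, column 2
```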

Model Evaluation
In order to verify the effectiveness of the proposed method, this paper compares ppFDetector with AnoGAN [17], Zhao's method [18], f-AnoGAN [19], GANomaly [20], and pre-trained Vgg16. Considering that the minimum input size of Vgg16 is 48 × 48, we enlarge the images from 32 × 32 to 48 × 48. For the fully supervised pre-trained Vgg16 model, 10, 100 and 200 negative samples are respectively appended to the training set to explore the influence of the number of negative samples on accuracy.
We take the ROC (Receiver Operating Characteristic) curve, precision, accuracy, F1 and sensitivity scores as the performance metrics. Precision is the proportion of examples classified as positive that actually are positive; the higher the value, the better. Accuracy is the ratio of correctly classified samples to the total number of samples in the test dataset; again, higher is better. The F1 score can be regarded as a weighted average of precision and recall; it ranges from 0 to 1, and larger values indicate a better model. Sensitivity is the proportion of all positive examples that are correctly identified; the higher the sensitivity, the higher the probability that an actually abnormal sample is predicted as abnormal. As shown in Table 3, the performance of the fully supervised method (pre-trained Vgg16) improves as the proportion of negative samples in training increases. Although the precision and accuracy of Vgg16 (200) are higher than those of ppFDetector, ppFDetector does not require any negative samples during training, and so ppFDetector outperforms the pre-trained Vgg16 method when negative samples are limited. Table 4 compares ppFDetector with the semi-supervised methods (Zhao's method and GANomaly) and the unsupervised methods (AnoGAN and f-AnoGAN). It shows that ppFDetector, trained with positive samples only, is competitive with state-of-the-art semi-supervised and unsupervised methods, and achieves higher accuracy, precision, F1 score and sensitivity. Bold indicates the best performance among the compared models. Figure 15a compares the ROC curves of the semi-supervised and unsupervised methods; ppFDetector performs best in PV panel fault detection.
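The four metrics defined above follow directly from the confusion matrix; a short sketch with an illustrative (made-up) confusion matrix:

```python
def metrics(tp, fp, tn, fn):
    """Precision, accuracy, F1 and sensitivity from a confusion matrix."""
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # a.k.a. recall
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, accuracy, f1, sensitivity

# Illustrative counts, not results from the paper.
p, a, f1, s = metrics(tp=90, fp=10, tn=80, fn=20)
assert p == 0.9 and a == 0.85 and s == 90 / 110
```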
The AUC value of ppFDetector is 0.943, which is 0.041 higher than GANomaly, and 0.052, 0.06 and 0.035 higher than Zhao's method, AnoGAN and f-AnoGAN, respectively. Figure 15b illustrates the ROC curves of ppFDetector and the fully supervised method (pre-trained Vgg16) with different numbers of negative samples in training. ppFDetector outperforms pre-trained Vgg16 when negative samples are limited (i.e., 10 and 100), but when pre-trained Vgg16 is trained with 200 or more negative samples, its performance approaches that of ppFDetector. Figure 16a compares the training loss with and without GC over 50 rounds of training. The original network uses only Batch Normalization (BN) as optimization; as shown in Figure 16a, the training loss of BN+GC decreases faster than that of BN alone, so GC further speeds up convergence of the model during training. Figure 16b compares SmoothL1Loss with a manual combination of L2 loss and L1 loss: the loss using SmoothL1Loss is significantly lower, which shows that SmoothL1Loss makes the model more accurate. GC is applied to the conditional GAN of ppFDetector to regularize the weight space and output feature space, preventing overfitting and improving the generalization performance of the deep convolutional network. Figure 16c compares the loss with and without the CBAM module. CBAM first extracts features with channel attention and then with spatial attention, achieving an optimal attention extraction effect and thereby improving the performance of the model.


Conclusions
In this paper, with the aim of detecting PV panel faults in a fast and accurate manner, a novel PV panel fault detection approach, ppFDetector, is proposed. The ppFDetector does not require a large number of negative samples. It employs a generative adversarial network to extract image features and approximate the data distribution of the positive samples, so the model can generate reconstructed images extremely similar to the positive samples. By calculating the difference between the original image and its reconstructed image, ppFDetector can accurately determine whether a PV panel is abnormal. The anomaly detection model is also highly transferable: as long as a large number of positive samples is available, binary classification can be performed after the model is trained; i.e., ppFDetector solves the problem that models cannot be trained without negative samples. Moreover, ppFDetector accelerates model training while increasing generalization performance through the application of Gradient Centralization. The ppFDetector also uses CBAM and the SmoothL1 loss function to further improve the accuracy and robustness of PV panel fault detection, so that it can replace manual inspection. In the future, we will expand the abnormal types that ppFDetector can detect by collecting datasets with various types of defects (such as dust cover, snail trails and rust), and employ a CNN to classify the defect type after detecting abnormal areas.