Photovoltaic Array Fault Diagnosis Based on Gaussian Kernel Fuzzy C-Means Clustering Algorithm

In the fault diagnosis of a photovoltaic (PV) array, it is difficult to discriminate between single faults and compound faults with similar signatures. Furthermore, the data collected in actual field experiments contain strong noise, which lowers diagnostic accuracy. To solve these problems, a new eigenvector composed of the normalized PV voltage, the normalized PV current and the fill factor is constructed to characterize common PV array faults such as open-circuit, short-circuit and compound faults. Combining these three features reduces the interference of external meteorological conditions in fault identification. Obtaining the new eigenvectors requires a multi-sensor fault diagnosis system for the PV array, combined with a data-mining solution for classifying the operational state of the array. The selected sensors are temperature, irradiance, voltage and current sensors. Given the complexity of the fault data in the PV array, the kernel fuzzy C-means clustering method is adopted to identify the fault types. The Gaussian kernel fuzzy C-means clustering method (GKFCM) performs well on complex datasets, so it can effectively improve classification accuracy in the recognition process. The algorithm is divided into training and testing phases. In the training phase, the feature vectors of 8 different fault types are clustered to obtain the training core points. In the fault classification stage, new fault data are assigned to the class whose training core point lies at the minimum Euclidean distance. This strategy can diagnose not only single faults but also compound fault conditions.
Finally, simulation and field experiments demonstrate that the algorithm can effectively diagnose the 8 common faults in photovoltaic arrays.


Introduction
With the intensification of the energy and environmental crisis, clean energy plays an integral role in restraining global warming and has received increasing attention in industry. As a major clean energy technology, photovoltaic power generation has received worldwide attention in recent years, especially in developing countries. Online intelligent multisensor monitoring systems are essential for the stable operation of PV systems, and this direction is increasingly becoming a research hotspot in academia. Once the signals from the sensors have been acquired, various diagnostic techniques can be adopted to extract as much information as possible from them, and suitable decision-making strategies can then be built for failure detection in a PV array. A variety of studies have applied different diagnostic approaches to this industrial task. In [1,2], Takashima et al. used the acquired voltage and current signals to propose … quantities and the robustness of the GKFCM, the generalization ability of the fault diagnostic algorithm can be further guaranteed.

Gaussian Kernel Fuzzy C-Means Clustering Method
The Gaussian kernel fuzzy C-means clustering method contains two steps: first, the original space X is mapped to a high-dimensional space F by the Gaussian kernel function; then a clustering algorithm is used to classify the different datasets. Through this nonlinear mapping into the kernel space, GKFCM highlights the differences between sample features, which makes it suitable for classification problems in which the samples have similar characteristics [20][21][22][23][24][25][26]. The nonlinear mapping function is defined as follows:

$$\Phi : x_k \mapsto \Phi(x_k) \in F, \quad x_k \in X \qquad (1)$$

where x_k is a sample of the original space X. The objective function of the clustering algorithm is given by

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik}^{m} \left\| \Phi(x_k) - \Phi(\nu_i) \right\|^2 \qquad (2)$$

where ν_i is the core point of the ith cluster in the original sample space; c is the number of clusters; n is the number of samples in the original sample space; and μ_ik is the membership of the kth sample in the ith fault class. μ_ik satisfies three conditions: μ_ik ∈ [0, 1], $0 < \sum_{k=1}^{n} \mu_{ik} < n$ and $\sum_{i=1}^{c} \mu_{ik} = 1$ for k = 1, 2, …, n. m > 1 is the weighting exponent. The kernel function is defined as follows:

$$K(x, y) = \Phi(x)^{\mathrm{T}} \Phi(y) \qquad (3)$$

Therefore, the squared Euclidean distance in Equation (2) is given by

$$\left\| \Phi(x_k) - \Phi(\nu_i) \right\|^2 = K(x_k, x_k) + K(\nu_i, \nu_i) - 2K(x_k, \nu_i) \qquad (4)$$

Common kernel functions include the Gaussian, polynomial and sigmoid kernels. The Gaussian kernel function is adopted in this paper and is shown in Equation (5):

$$K(x_k, \nu_i) = \exp\left( -\frac{\left\| x_k - \nu_i \right\|^2}{2\sigma^2} \right) \qquad (5)$$

where σ is the width parameter of the Gaussian kernel. Since K(x, x) = 1 for the Gaussian kernel, Equation (4) reduces to 2(1 − K(x_k, ν_i)). Minimizing Equation (2) under the constraints above with the Lagrange multiplier method yields μ_ik and ν_i, as shown in Equations (6) and (7):

$$\mu_{ik} = \frac{\left( 1 / \left( 1 - K(x_k, \nu_i) \right) \right)^{1/(m-1)}}{\sum_{j=1}^{c} \left( 1 / \left( 1 - K(x_k, \nu_j) \right) \right)^{1/(m-1)}} \qquad (6)$$

$$\nu_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} K(x_k, \nu_i)\, x_k}{\sum_{k=1}^{n} \mu_{ik}^{m} K(x_k, \nu_i)} \qquad (7)$$
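The alternating updates of Equations (5) to (7) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the function name `gkfcm`, the farthest-point initialisation of the centres and the stopping rule (largest change in membership below `tol`) are assumptions, because the text does not specify them. The defaults mirror the simulation settings reported for the training phase (m = 2, at most 1000 iterations, threshold 10^−5); the kernel width `sigma` is a free parameter.

```python
import numpy as np


def gkfcm(X, c, m=2.0, sigma=1.0, max_iter=1000, tol=1e-5, seed=0):
    """Gaussian kernel fuzzy C-means following Eqs. (5)-(7) (sketch).

    X: (n, d) feature matrix. Returns centres V (c, d) and memberships U (c, n).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Assumed initialisation: farthest-point seeding from one random sample.
    idx = [int(rng.integers(n))]
    for _ in range(1, c):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    V = X[idx].copy()
    U = np.zeros((c, n))
    for _ in range(max_iter):
        # Gaussian kernel between every centre and every sample, Eq. (5): (c, n)
        sq = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-sq / (2.0 * sigma ** 2))
        # Kernel-induced distance 1 - K (Eq. (4) up to a factor 2), floored
        D = np.maximum(1.0 - K, 1e-12)
        # Membership update, Eq. (6); every column sums to 1
        W = D ** (-1.0 / (m - 1.0))
        U_new = W / W.sum(axis=0, keepdims=True)
        # Centre update, Eq. (7): average of samples weighted by u^m * K
        Wm = (U_new ** m) * K
        V = (Wm @ X) / Wm.sum(axis=1, keepdims=True)
        # Assumed stopping rule: largest membership change below tol
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    return V, U
```

Calling `gkfcm(X, c=8)` on the (V_norm, I_norm, FF) vectors would return eight core points and the membership matrix; mapping each cluster to a fault label is a separate step.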

The Fault Diagnosis Algorithm Based on GKFCM
The fault diagnosis method based on GKFCM is divided into two stages: training and testing. In the training stage, the training fault datasets are used to train the GKFCM, with the fault classification error as the measure of training performance. When the classification error reaches its minimum, the core points are calculated and obtained. The classification error is given by Equation (8), where C_{L_i} and |C_{L_i}| are the ith class fault dataset and its corresponding sample number, respectively; c is the number of fault classes in the dataset; and C_i is the set of samples assigned to the ith fault class after GKFCM clustering.
Sensors 2019, 19, 1520 4 of 15
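Equation (8) itself is not reproduced in this text, so the sketch below does not implement it. As a hypothetical stand-in, it measures the training-phase error as the fraction of samples that lie outside the majority true class of the cluster they were assigned to; the name `classification_error` and the majority-vote mapping from clusters to classes are assumptions.

```python
import numpy as np


def classification_error(true_labels, cluster_labels, c):
    """Fraction of samples whose cluster's majority class differs from their own.

    true_labels, cluster_labels: integer labels in 0..c-1 (hypothetical stand-in
    for Eq. (8), which is not reproduced in the text).
    """
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    wrong = 0
    for j in range(c):
        members = true_labels[cluster_labels == j]
        if members.size:
            # Samples outside the majority class of cluster j count as errors
            wrong += members.size - np.bincount(members, minlength=c).max()
    return wrong / true_labels.size
```

During training, this error could be evaluated after each clustering run so that the core points from the run with the lowest error are kept.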
In the testing phase, the similarity between the new fault data and the center points of the reference fault datasets is used to judge the fault category of the new data. The steps of the fault diagnosis are as follows:

• Use the fault datasets containing c classes as the training datasets.
• According to Equation (7), calculate the core points of the reference datasets when the classification error rate reaches its minimum.
• Use Equations (9) and (10) to determine the fault type of a new sample x_new in the testing datasets, where ρ_i is the similarity between x_new and the dataset C_{L_i}. The larger the value of ρ_i, the more likely it is that x_new belongs to the corresponding fault type [15]. Equation (8) is the judging criterion for classifying datasets in the Support Vector Machine (SVM), and this principle is used to discriminate the different fault datasets in this paper. λ is the category attribution threshold, whose value generally lies between 0 and 0.5 [15,16].
• The previous step can determine not only the known faults in the training datasets but also unknown fault types: if x_new belongs to an unknown fault, it is classified as the (c + 1)th class.

The flow chart of the proposed fault diagnostic method is shown in Figure 1.
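Because Equations (9) and (10) are not reproduced here, the testing step can only be sketched under an assumed similarity, ρ_i = exp(−‖x_new − o_i‖²/2σ²), which grows as the Euclidean distance to the core point o_i shrinks. The sample is assigned to the most similar core and labeled as the (c + 1)th (unknown) class when the best similarity falls below λ. The function `classify`, the similarity form and this threshold rule are hypothetical.

```python
import numpy as np


def classify(x_new, cores, sigma=1.0, lam=0.5):
    """Return a class index 0..c-1, or c for an unknown fault ((c+1)th class).

    cores: (c, d) core points from training. The Gaussian similarity and the
    threshold rule are assumptions, not the paper's Eqs. (9) and (10).
    """
    cores = np.asarray(cores, dtype=float)
    d2 = ((cores - np.asarray(x_new, dtype=float)) ** 2).sum(axis=1)
    rho = np.exp(-d2 / (2.0 * sigma ** 2))  # assumed similarity to each core
    best = int(rho.argmax())  # nearest core has the largest similarity
    return best if rho[best] >= lam else len(cores)
```

Returning `len(cores)` for a low-similarity sample is what lets the classifier flag faults absent from the training data instead of forcing them into a known class.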

Selection of the Fault Feature Quantities
To obtain good diagnostic accuracy, three characteristics are constructed and combined into the feature vector that characterizes the different fault types of the PV array [16][17][18][27].

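• The normalized PV voltage V_norm

$$V_{\mathrm{norm}} = \frac{V_{\mathrm{mpp}}}{m \times V_{\mathrm{oc-ref}}} \qquad (11)$$

where V_oc−ref is the open-circuit voltage of the reference PV module and m is the number of modules in each branch.

• The normalized PV current I_norm

$$I_{\mathrm{norm}} = \frac{I_{\mathrm{mpp}}}{n \times I_{\mathrm{sc-ref}}} \qquad (12)$$

where I_sc−ref is the short-circuit current of the reference PV module and n is the number of branches in the PV array.

• The fill factor FF

$$FF = \frac{V_{\mathrm{mpp}} \times I_{\mathrm{mpp}}}{V_{\mathrm{oc-array}} \times I_{\mathrm{sc-array}}} \qquad (13)$$

where V_oc−array and I_sc−array are the open-circuit voltage and short-circuit current of the PV array, respectively.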
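Equations (11) to (13) translate directly into code. The sketch below assumes the measurements are already available as floats or NumPy arrays; the function name `pv_features` and the example values (m = 4 modules per branch, n = 3 branches) are hypothetical, since the text does not state which dimension of the 4 × 3 array is which.

```python
def pv_features(v_mpp, i_mpp, v_oc_ref, i_sc_ref, v_oc_array, i_sc_array, m, n):
    """Return (V_norm, I_norm, FF) per Eqs. (11)-(13).

    m: modules per branch; n: number of branches. Works element-wise on arrays.
    """
    v_norm = v_mpp / (m * v_oc_ref)                      # Eq. (11)
    i_norm = i_mpp / (n * i_sc_ref)                      # Eq. (12)
    ff = (v_mpp * i_mpp) / (v_oc_array * i_sc_array)     # Eq. (13)
    return v_norm, i_norm, ff


# Illustrative numbers only, not measured data
v_norm, i_norm, ff = pv_features(120.0, 24.0, 37.5, 10.0, 150.0, 30.0, m=4, n=3)
```

Normalizing by the reference module's V_oc and I_sc, which are measured under the same weather, is what lets the first two features stay comparable across irradiance and temperature.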
The 4 × 3 PV array is the research object in this paper, and its operating conditions fall into four categories: normal, open-circuit, short-circuit and compound fault conditions, giving 8 fault types in total. c is the number of fault types and equals 8. The 8 fault types of the photovoltaic array are simulated, and the fault feature vectors X that characterize them are acquired as shown in Equation (14):

$$X = \left\{ X_1, X_2, \ldots, X_8 \right\}, \qquad X_j = \left\{ \left( V_{\mathrm{norm}}, I_{\mathrm{norm}}, FF \right)_{jk} \mid k = 1, 2, \ldots, n \right\} \qquad (14)$$

where X is the dataset of the 8 fault types and n is the number of samples of each fault.

Model of the Photovoltaic Array
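The single diode model (SDM), shown in Equation (15), is the most commonly used model [28][29][30]. In this paper it is used to build the 4 × 3 PV array so as to accurately mimic the behavior of solar cells and PV modules:

$$I = I_{ph} - I_0 \left[ \exp\left( \frac{V + I R_s}{a N_s V_t} \right) - 1 \right] - \frac{V + I R_s}{R_{sh}} \qquad (15)$$

where I and V are the output current and voltage of the module, I_ph is the photogenerated current, I_0 is the diode saturation current, R_s and R_sh are the series and shunt resistances, a is the diode ideality factor, N_s is the number of series-connected cells and V_t = kT/q is the thermal voltage.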
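Equation (15) is implicit in I, so a simulation has to solve it numerically at each operating voltage. A minimal sketch using Newton's method is shown below; the function name `sdm_current` and all parameter values are illustrative assumptions, not the module data of Table 1.

```python
import math


def sdm_current(V, I_ph, I_0, R_s, R_sh, a, N_s, T=298.15, iters=50):
    """Solve the single diode model, Eq. (15), for I at voltage V (Newton's method)."""
    k, q = 1.380649e-23, 1.602176634e-19
    Vt = N_s * a * k * T / q  # modified thermal voltage of the module
    I = I_ph  # start at the photocurrent; f(I_ph) <= 0 for V >= 0
    for _ in range(iters):
        e = math.exp((V + I * R_s) / Vt)
        # Residual of Eq. (15) and its derivative with respect to I
        f = I_ph - I_0 * (e - 1.0) - (V + I * R_s) / R_sh - I
        df = -I_0 * e * R_s / Vt - R_s / R_sh - 1.0
        step = f / df
        I -= step
        if abs(step) < 1e-12:
            break
    return I
```

Sweeping V from 0 to V_oc yields the module I–V curve, and series/parallel combinations of such modules give the array curve from which V_mpp and I_mpp are taken.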
The parameters of the PV module in the simulation are listed in Table 1, and the simulation model of the 4 × 3 PV array is shown in Figure 2. The wide-weather environmental conditions in the simulation experiment are set as follows: solar irradiance varies from 100 to 1000 W/m² in steps of 50 W/m², and backplane temperature varies from 0 °C to 40 °C in steps of 1 °C. The 8 fault types are simulated by different combinations of the air switches (AS) shown in Figure 2 and are described in detail in Table 2.
In the simulation experiment, 779 samples were acquired for each of the 8 fault types, giving 6232 fault samples in total.

Table 2. Descriptions of the fault types.
Case | Description | Label
Case 4 | one module in short-circuit condition | short1
Case 5 | two modules distributed in one string in short-circuit condition | short2
Case 6 | two modules distributed in two different strings respectively in short-circuit condition | s1s1
Case 7 | one module in short-circuit condition and another string in open-circuit condition | s1o1
Case 8 | two modules distributed in two different strings respectively in short-circuit condition and the other string in open-circuit condition | s1s1o1
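The per-fault sample count matches the size of the weather grid: 19 irradiance levels times 41 temperature levels gives 779 combinations, so it appears that one sample was simulated per (irradiance, temperature) pair. A quick check of the arithmetic, assuming that reading:

```python
# Assumed: one simulated sample per (irradiance, temperature) grid point
irradiances = range(100, 1001, 50)   # W/m^2: 100, 150, ..., 1000 -> 19 levels
temperatures = range(0, 41, 1)       # deg C: 0, 1, ..., 40 -> 41 levels
conditions = [(g, t) for g in irradiances for t in temperatures]

per_fault = len(conditions)          # 19 * 41 combinations per fault type
total = per_fault * 8                # over the 8 fault types
print(per_fault, total)
```

This gives 779 samples per fault type and 8 × 779 = 6232 samples overall.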

The Feature Characteristic Analysis of 8 Fault Types
The conventional feature vector (V_mpp, I_mpp) was used to characterize the 8 fault types in [6], and the distribution of (V_mpp, I_mpp) is shown in Figure 3. The 8 fault types clearly overlap severely and are difficult to differentiate. In [6,7], the eigenvectors of the 8 fault types are transformed into (V_norm, I_norm) to further discriminate the overlapping faults. The distribution of (V_norm, I_norm), shown in Figure 4, separates most of the faults; however, short1 and s1s1, as well as s1o1 and s1s1o1, still overlap. The distributions in Figures 3 and 4 indicate that the eigenvectors (V_mpp, I_mpp) and (V_norm, I_norm) have limitations in differentiating single faults from compound faults with similar characteristics. Further investigation showed that the fill factor (FF) differs between fault cases. Therefore, the normalized PV voltage V_norm, the normalized PV current I_norm and the fill factor FF were combined into a new eigenvector to further characterize the 8 fault types. The distribution of the feature vectors (V_norm, I_norm, FF) for the 8 faults is shown in Figure 5.
The distribution of the eigenvectors (V_norm, I_norm, FF) in Figure 5 demonstrates that the newly proposed feature vector clearly discriminates the 8 fault types. This new eigenvector is therefore appropriate for characterizing complex operating conditions, including single and compound faults with similar characteristics.
Although Figure 5 shows intuitively that the 8 faults can be separated by the eigenvectors (V_norm, I_norm, FF), the actual data acquisition process is usually accompanied by noise from the environment and the acquisition devices. The acquired fault datasets are therefore complex, and the differences between the faults are less distinct. Since GKFCM can effectively improve clustering performance on complex datasets, this clustering algorithm was adopted to cluster and identify the fault types in this article. New fault data are classified according to their maximum similarity to the core points of the reference datasets. Because each of the 8 fault types is relatively concentrated in Figure 5, a small part of the fault datasets collected under wide-weather conditions can serve as the reference datasets for training the GKFCM, which then identifies the new fault datasets. The environmental parameters during collection of the 8 fault datasets are listed in Table 3. In the training phase of the simulation, the GKFCM parameters were set as follows: the number of clusters was 8, the weighting exponent m was 2, the maximum number of iterations was 1000 and the iteration termination threshold (minimum similarity) was 10^−5. Figure 6 shows the distribution of the 8 faults in the simulated training datasets, with red points marking the core of each cluster. The center-point coordinates of the 8 clusters in the reference fault datasets are listed in Table 4.

In the testing stage, each dataset of the 8 fault types contained 110 samples. Equations (9) and (10) were used to calculate the similarity between the new fault datasets and the core points o_i, i = 1, 2, …, 8, of the reference fault datasets. λ was set to 0.5, and the classification results are listed in Table 5.

Table 5. Eight fault identification accuracy in testing phase of simulation experiment.
Fault Type | Sample Number for Identification | Identified Sample Number
normal | 110 | 110
open1 | 110 | 110
open2 | 110 | 110
short1 | 110 | 110
short2 | 110 | 110
s1s1 | 110 | 110
s1o1 | 110 | 110
s1s1o1 | 110 | 110

In summary, the overall diagnostic accuracy was 100% (880 of 880 test samples). The simulation experiment shows that the proposed method has good fault diagnostic performance.

The Field Experiment
In the field experiment, GSP-240 PV modules were used to build the 4 × 3 PV array for validating the proposed fault diagnostic method. The key specifications of the GSP-240 module are listed in Table 6, and the experimental platform is shown in Figure 7. The platform contained the 4 × 3 PV array, a 5 kW inverter, a series fuse, a combiner box and the data collection and recording system. The data acquisition system consists of a temperature sensor, a solar irradiance sensor and two pairs of current and voltage sensors. The first two sensors and one pair of current and voltage sensors were located on the reference module; they collected the meteorological information and the short-circuit current I_sc−ref and open-circuit voltage V_oc−ref of the reference module, respectively. The operating voltage V_mpp and current I_mpp of the PV array were measured by the other pair of current and voltage sensors, which were mounted at the input of the converter, as shown on the data acquisition board in Figure 7. In total, a 6-sensor system was developed for online monitoring of the PV array. In this experiment, eight fault datasets were created by different combinations of the air breakers and collected from 09:30 to 10:30 on 7 September 2018. Each fault dataset contains 180 samples, of which 60 per class were used as the training datasets and the remaining 120 per class as the testing datasets.
The ranges of irradiance and backplane temperature during the daytime on 7 September 2018 are shown in Figure 8.
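The per-class split of 180 samples into 60 for training and 120 for testing can be sketched as below. The text does not say how the 60 training samples were chosen, so the random per-class permutation and the name `split_per_class` are assumptions; a chronological split would be an equally plausible reading.

```python
import numpy as np


def split_per_class(X, y, n_train=60, seed=0):
    """Take n_train random samples of each class for training, the rest for testing."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        # Shuffle this class's indices, then cut at n_train
        idx = rng.permutation(np.flatnonzero(y == label))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```

Splitting inside each class keeps all 8 fault types equally represented in both sets, which the GKFCM core points rely on.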
The distribution of the eigenvectors (V_norm, I_norm) for the 8 fault types, shown in Figure 9, indicates that some single faults and compound faults with similar characteristics are difficult to discriminate in the two-dimensional space (V_norm, I_norm). In contrast, the feature vectors (V_norm, I_norm, FF) shown in Figure 10 discriminate the 8 fault types more clearly than (V_norm, I_norm). After training the GKFCM, the core points of the 8 fault training datasets with two- and three-dimensional features were calculated; they are shown as red points in Figures 9 and 10, respectively.
In the fault identification phase of GKFCM, 120 samples of each fault were used to validate the effectiveness of the algorithm. The Equation (8) … Table 7 and Table 8, respectively. Table 7 gives the results obtained by processing the two-dimensional datasets, and Table 8 gives the diagnostic performance on the three-dimensional fault datasets. In both tables, the first 8 faults in x_new belong to the corresponding categories. In addition, a line-to-line fault was defined as an unknown fault to validate the algorithm's ability to identify unknown faults. The diagnostic results in the last row of Tables 7 and 8 illustrate that the proposed method based on GKFCM can also identify the unknown fault accurately.

Table 7. Eight fault and unknown fault identification accuracy in testing phase of the field experiment.
Fault Type | Sample Number for Identification | Identified Sample Number
normal | 120 | 118
open1 | 120 | 119
open2 | 120 | 119
short1 | 120 | 72
short2 | 120 | 116
s1s1 | 120 | 67
s1o1 | 120 | 79
s1s1o1 | 120 | 66
unknown fault | 120 | 0

It can be seen in Tables 7 and 8 that all 8 fault types within the training datasets and the unknown faults are well discriminated. The results of this experiment demonstrate that the proposed algorithm has good diagnostic performance for the 8 common faults in the PV array, including 5 single faults and 3 compound faults.

To further validate the persuasiveness and reliability of the proposed clustering method, a three-layer BP neural network was built to detect the 8 faults and an unknown fault. The structure of the BP neural network is shown in Figure 11. The network was trained with the training datasets of the 8 fault types; in the testing phase, the remaining samples of the 8 fault types and the unknown fault samples were diagnosed by the trained network. The diagnostic results with two-dimensional and three-dimensional eigenvectors are shown in Tables 9 and 10, respectively. The parameters of the BP neural network were set as follows: the number of iterations was 1000, the learning rate was 0.01 and the minimum error was 0.01.

Figure 11. Basic structure of the BP neural network.

In the testing stage, the two-dimensional eigenvectors (V_norm, I_norm) were used to characterize the 8 fault types and the unknown faults. The corresponding results are shown in Table 9.

Table 9. Eight faults and unknown fault identification accuracy in testing phase of the field experiment.
Fault Type | Sample Number for Identification | Identified Sample Number
normal | 120 | 117
open1 | 120 | 117
open2 | 120 | 119
short1 | 120 | 73
short2 | 120 | 112
s1s1 | 120 | 55
s1o1 | 120 | 58
s1s1o1 | 120 | 96
unknown fault | 120 | all identified as normal

Meanwhile, the three-dimensional eigenvectors (V_norm, I_norm, FF) were used to characterize the 8 fault types and the unknown faults. The final results are shown in Table 10.
Compared with the results in Tables 8 and 10, the algorithms using the two-dimensional eigenvector have lower diagnostic accuracy, as detailed in Tables 7 and 9. These results demonstrate that the three-dimensional feature vector, with its additional third feature quantity, does improve the fault diagnostic accuracy.
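The size of the two-dimensional results can be quantified from the identified-sample counts in Tables 7 and 9. This short sketch computes per-class and overall accuracy over the 8 known faults only, leaving out the unknown-fault row; the variable and function names are illustrative.

```python
# Identified-sample counts for the 8 known faults, two-dimensional eigenvectors
TABLE_7_GKFCM_2D = {"normal": 118, "open1": 119, "open2": 119, "short1": 72,
                    "short2": 116, "s1s1": 67, "s1o1": 79, "s1s1o1": 66}
TABLE_9_BP_2D = {"normal": 117, "open1": 117, "open2": 119, "short1": 73,
                 "short2": 112, "s1s1": 55, "s1o1": 58, "s1s1o1": 96}
SAMPLES_PER_CLASS = 120


def per_class_accuracy(table, n=SAMPLES_PER_CLASS):
    return {fault: hits / n for fault, hits in table.items()}


def overall_accuracy(table, n=SAMPLES_PER_CLASS):
    return sum(table.values()) / (n * len(table))


for name, table in [("GKFCM, Table 7", TABLE_7_GKFCM_2D), ("BP, Table 9", TABLE_9_BP_2D)]:
    print(f"{name}: {overall_accuracy(table):.4f}")
```

For the known faults this gives 756/960 = 78.75% for GKFCM and 747/960 ≈ 77.81% for the BP network, with most errors concentrated in short1, s1s1, s1o1 and s1s1o1, the classes that overlap in the (V_norm, I_norm) plane.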
In addition, for the 8 known faults, the performance of the BP neural network in Tables 9 and 10 was similar to that of the proposed algorithm in Tables 7 and 8. However, the BP neural network identifies all unknown-fault samples as normal, which shows poor generalization in detecting unknown faults. In contrast, the GKFCM-based detection method still gives relatively better diagnostic results when unknown faults occur.
Overall, the results of the simulation and field experiments demonstrate that the strategy combining the new three-dimensional eigenvector with GKFCM not only achieves good diagnostic accuracy on the 8 fault types in the training datasets but also identifies unknown faults well. The presented algorithm therefore has good diagnostic accuracy and generalization capability.

Conclusions
A promising architecture for detecting 8 faults in a PV array has been presented. The diagnostic strategy comprises the new eigenvector, built from three feature quantities, and the GKFCM. The new feature vector, consisting of the normalized PV voltage, the normalized PV current and the fill factor, is proposed to discriminate 5 single faults and 3 compound faults with similar characteristics. The simulation and field experiments demonstrate that the proposed eigenvector has good clustering ability and can differentiate the 8 fault types under wide-weather conditions. Since the acquired fault datasets are accompanied by external and internal noise, the GKFCM was adopted to