A New PV Array Fault Diagnosis Method Using Fuzzy C-Mean Clustering and Fuzzy Membership Algorithm

Photovoltaic (PV) power station faults in the natural environment mainly occur in the PV array, and accurate fault diagnosis is of particular significance for safe and efficient PV power plant operation. This paper analyzes the electrical behavior characteristics of the PV array under fault conditions and proposes a novel PV array fault diagnosis method based on the fuzzy C-mean (FCM) and fuzzy membership algorithms. First, clustering analysis of PV array fault samples is conducted using the FCM algorithm; the analysis indicates a fixed relationship between the distribution characteristics of the cluster centers and the different fault types, so the fault samples can be classified effectively. The membership degrees between the fault data and the cluster centers are then determined by the fuzzy membership algorithm for the final fault diagnosis. Simulation analysis indicated that the diagnostic accuracy of the proposed method was 96%. Field experiments further verified the correctness and effectiveness of the proposed method. In summary, the FCM algorithm identifies the distribution features of the various fault types, the fuzzy membership algorithm determines whether the PV array operating parameters belong to a fault category, and the advantage of the proposed method is that it can separate fault data from normal operating data without prior knowledge.


Introduction
The photovoltaic (PV) power plant works in a tough natural environment, and PV array faults are complicated and varied, seriously affecting the safe, stable operation and economic benefits of the power station in a complex and dynamic manner. The DC monitoring resolution available to PV power plants has reached the PV array level. The resolution of certain smart PV power stations has even reached the module level. Identifying and giving early warning of DC faults using the module-level or array-level output data of the PV power station is critically significant for intelligent predictive maintenance of the PV power plant and for improving the overall operation level of the station [1].
The classification and diagnosis of PV array faults has become a popular research subject in recent years. Model-based algorithms and intelligence-based algorithms have drawn increasing attention. Model-based and multivariable statistical monitoring methods are the common methods for fault identification. For example, Stellbogen [2] compared actual and expected values through detection equipment for fault analysis; however, no method for setting thresholds between them was established. The model using PCA and other multivariate statistical monitoring methods for fault diagnosis [...] distinguish the influence degree of fault classifications based on a wide variety of data; it classifies individual sample points to realize fault diagnosis. This paper presents a novel approach to PV array fault diagnosis based on the FCM and fuzzy membership algorithms. The randomness and uncertainty of PV array fault characteristics are addressed by introducing fuzzy theory. Simulation and experimental analyses demonstrate that the proposed method classifies fault sample data soundly for efficient, accurate PV array fault diagnosis.
This paper is organized as follows: Section 2 discusses the fault characteristics of PV arrays. Section 3 introduces the basic theories, principles, and application methods of the FCM algorithm and the fuzzy membership algorithm for fault diagnosis. Section 4 reports simulation tests and Section 5 reports experimental tests conducted to validate the proposed diagnosis method. Section 6 provides concluding remarks.

Generation Mechanism of Typical Faults in PV Arrays
The actual operation of the PV power station is affected by multiple external factors such as solar radiation intensity, temperature, humidity, dust, hail, and snow, which constitute a harsh and highly fault-prone environment [23]. The PV array is an integral part of the PV system; its cost can account for about 40% of the power system as a whole. In this paper, four common faults or abnormal conditions of a laboratory PV plant are studied. The plant, taken as the research object, is configured as three parallel PV strings of 13 PV modules in series, as shown in Figure 1. Three common faults of the PV array are the hot spot phenomenon (partial blockage), open circuits, and short circuits of the PV module caused by junction box errors. Long-term shadow shading and module mismatch also accelerate degradation and introduce corresponding aging faults.
For simplicity, six operating modes of the PV array are referred to in this paper: normal operation, one-module short circuit, two-module short circuit, local shadow shading in one string group, local shadow shading in two string groups, and one-module open circuit. These modes are marked F1, F2, F3, F4, F5, and F6, respectively. Based on the typical fault modes of the PV array listed in Table 1, the physics-based mathematical model of the PV cell is established for the different fault types according to "Accurate model simulation research on PV cells, modules and arrays" [24]. The modeling results are shown in Figure 2, where the I-V and P-V curves describe the distribution characteristics of the PV array's electrical parameters under different fault conditions. The fault features are summarized in Table 1.

Fault Characteristic Parameter Selection
Changes in the PV array are similar under different fault modes and the same test conditions (light intensity and temperature), as shown in the I-V and P-V curves in Figure 2. This suggests that it is not feasible to diagnose faults in the PV array only by analyzing the I-V and P-V curves. Additional fault parameters must be selected to describe the working conditions of the PV power system.
Changes in the electrical parameters under different fault conditions were determined using the simulation model driven by the actual external environment input, as shown in Figure 3.
As shown in Figure 3, under different fault conditions, one or more output characteristics of the PV array change dramatically. The output characteristics of the PV array may therefore serve as fault characteristic parameters under different fault states and environments: $U_{oc}$, $I_{sc}$, $U_m$, $I_m$, and $P_m$, expressed in the form of the fault eigenvector ($U_{oc}$, $I_{sc}$, $U_m$, $I_m$, $P_m$). The parameters are described in Table 2.

Fuzzy C-Mean Clustering Algorithm
Fuzzy clustering is commonly applied in knowledge discovery, pattern recognition, and many other research fields. The FCM algorithm is one of the most widely used and successful fuzzy clustering algorithms. It improves on the hard C-mean (HCM) clustering algorithm and is the foundation upon which other fuzzy clustering analysis methods have been developed in theory and application.
FCM is a classification method as well as a clustering algorithm. The membership degree of each sample point is obtained iteratively by optimizing the objective function, and the class of each sample point is then determined to achieve automatic classification of the sample data. As discussed above, this method is commonly used in the fault diagnosis field [20].
Let the $n$ data samples be $X = \{x_i \mid i = 1, 2, \ldots, n\} = \{x_1, x_2, \ldots, x_n\}$. The $n$ data vectors $x_i$ are divided into $c$ fuzzy groups, and the $c$ cluster centers $v = \{v_1, v_2, \ldots, v_c\}$ are calculated so that the objective function reaches its minimum value. The degree to which each data point belongs to each group is given by its membership degree, which is any value in the [0, 1] interval, and the sum of the membership values of each sample point over all cluster centers is 1. The following two principles must therefore be satisfied:
$$\mu_{ik} \in [0, 1], \qquad \sum_{k=1}^{c} \mu_{ik} = 1, \quad i = 1, 2, \ldots, n \tag{1}$$
The general form of the FCM algorithm's objective function $J_b$ can be expressed as follows:
$$J_b(U, v) = \sum_{i=1}^{n} \sum_{k=1}^{c} (\mu_{ik})^{m} (d_{ik})^{2} \tag{2}$$
where $n$ is the number of samples; $c$ ($2 \le c \le n$) is the number of cluster centers; $U = [\mu_{ik}]$ is the membership matrix; $\mu_{ik}$ is the membership degree between sample $x_i$ and class $A_k$; $d_{ik} = \lVert x_i - v_k \rVert$ is the Euclidean distance between the $i$-th sample and the $k$-th cluster center; and $m$ is the weighted exponent.
The membership degree $\mu_{ik}$ between the sample $x_i$ and class $A_k$ is calculated as follows:
$$\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{ij} \right)^{2/(m-1)}} \tag{3}$$
When $d_{ik} = 0$ occurs, set $I_k = \{ i \mid 2 \le c < n;\ d_{ik} = 0 \}$; for all classes $i \notin I_k$, $\mu_{ik} = 0$.
The cluster centers are calculated as follows:
$$v_k = \frac{\sum_{i=1}^{n} (\mu_{ik})^{m} x_i}{\sum_{i=1}^{n} (\mu_{ik})^{m}} \tag{4}$$
The cluster centers and memberships are modified repeatedly according to Equations (3) and (4). When the algorithm converges, the cluster centers and the membership degree of each sample to each class are obtained and the fuzzy clustering division is complete. The FCM algorithm is thus a simple iterative process; the general steps for determining the cluster centers and the membership matrix with the FCM algorithm [20,25] are shown in Figure 4. The FCM algorithm converges rapidly with relatively few training samples, so it can facilitate fault diagnosis very efficiently. MATLAB (R2015b) also provides a rich set of functions for the FCM algorithm and is easy for fault diagnosis personnel to operate [20].
Energies 2018, 11, 238
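The iteration between the membership update and the cluster-center update can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB routine used in the paper: the function name `fcm`, the random initialization of the membership matrix, the `seed` argument, and the equal split of membership when a sample coincides with a center are all choices made here for the example.

```python
import math
import random


def fcm(samples, c, m=2.0, max_iter=1000, eps=1e-5, seed=0):
    """Fuzzy C-means on a list of equal-length feature vectors.

    Returns (centers, u), where u[i][k] is the membership of sample i
    in cluster k. Initialization and tie handling are assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    # Random initial memberships, normalized so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = []
    for _ in range(max_iter):
        # Cluster-center update: membership-weighted mean of the samples.
        centers = []
        for k in range(c):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * x[d] for w, x in zip(weights, samples)) / total
                for d in range(dim)
            ])
        # Membership update from Euclidean distances to the centers.
        new_u = []
        for x in samples:
            dist = [math.dist(x, v) for v in centers]
            if 0.0 in dist:
                # Singular case: the sample coincides with a center.
                zeros = [k for k, d in enumerate(dist) if d == 0.0]
                row = [1.0 / len(zeros) if k in zeros else 0.0
                       for k in range(c)]
            else:
                power = 2.0 / (m - 1.0)
                row = [
                    1.0 / sum((dist[k] / dist[j]) ** power for j in range(c))
                    for k in range(c)
                ]
            new_u.append(row)
        # Stop when the largest membership change falls below eps.
        change = max(abs(a - b)
                     for ra, rb in zip(new_u, u)
                     for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break
    return centers, u
```

Calling `fcm(samples, c=6)` on a list of five-element eigenvectors would return six cluster centers and a membership matrix whose rows each sum to 1.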

Membership Function Algorithm Based on Fuzzy Normal Distribution
Fuzzy sets are completely described by their corresponding membership functions. In classical sets, the membership degree between sets and elements can only be 0 or 1; in fuzzy sets, the membership degree between sets and elements can be any value in the [0, 1] interval. It can thus be used to describe the extent to which an element belongs to the concept in the domain $U$. The membership function is the most fundamental concept of fuzzy mathematics as it quantizes the necessary fuzzy sets [26][27][28].
To define the fuzzy set, let the fuzzy subset $A$ in the domain $U$ be characterized by the membership function $\mu_A$ and construct the following map:
$$\mu_A : U \to [0, 1], \quad x \mapsto \mu_A(x) \tag{5}$$
where $\mu_A$ is the membership function of the fuzzy subset and $\mu_A(x)$ is the membership degree of $x$ to $A$, which represents the degree to which the element $x$ of the domain $U$ belongs to the fuzzy subset $A$ and varies continuously on the closed interval [0, 1]. The closer $\mu_A(x)$ is to 1, the greater the extent to which $x$ belongs to $A$. The closer $\mu_A(x)$ is to 0, the lesser the degree to which $x$ belongs to $A$.
For the fault diagnosis of the PV array, the characteristic parameters vary within a certain range: the PV array is healthy when the parameters stay within a certain scope and faulty when they fall outside it. The typical normal distribution function is therefore selected to calculate the membership degree between the diagnosis samples and the cluster centers, so that the PV array can be diagnosed directly and clearly from the membership degree. Figure 5 shows the curve of the normal distribution membership function. The Gaussian (normal distribution) membership function is used to calculate the membership degree of each parameter:
$$\mu(x) = \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \tag{6}$$
where $\mu(x)$ is the membership degree of the parameter $x$; $\mu$ is the expected value of the distribution; and $\sigma$ is the width of the Gaussian function. According to the Gaussian function characteristics, 99.73% of the area under the function curve is within three standard deviations ($3\sigma$) of the expected value $\mu$. In this paper, $6\sigma$ is used as the function domain. The value of $\sigma$ was obtained as follows:
$$\sigma = \frac{\mu_{max} - \mu_{min}}{6} \tag{7}$$
where $\mu_{max}$ and $\mu_{min}$ are the maximum and minimum values of the parameters.
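As a small illustration of the Gaussian membership function and the 6σ rule described above, the following Python sketch computes σ from a parameter range and evaluates the membership degree. The function names and the numeric range used in the usage note are hypothetical and are not taken from the paper's tables.

```python
import math


def sigma_from_range(value_max, value_min):
    """Width of the Gaussian when the 6-sigma span equals the range."""
    return (value_max - value_min) / 6.0


def gaussian_membership(x, center, sigma):
    """Membership degree of x for a Gaussian centered on `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
```

For a hypothetical parameter ranging from 300 to 360, `sigma_from_range(360.0, 300.0)` gives a width of 10, and a value located exactly at the center has membership degree 1, decaying toward 0 as the value moves away from the center.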

Fault Diagnosis Based on FCM Algorithm and Fuzzy Membership Algorithm
The relationship between the fault category and the fault eigenvectors is established to improve the robustness of FCM-based fault diagnosis, considering the randomness and uncertainty of the fault eigenvectors. The fuzzy membership algorithm (a membership function based on the fuzzy normal distribution) is a distance algorithm that quantizes fuzzy sets to diagnose fault samples. The fault diagnosis is completed by measuring the membership degree between a fault sample and each fault mode.
The purpose of this study, as stated above, is to establish a novel PV array fault diagnosis technique based on the FCM clustering and fuzzy membership algorithms. The proposed method exploits the advantages of FCM (excellent classification ability) and of the membership function algorithm (excellent distance computing ability) to improve diagnostic accuracy. First, the FCM clustering algorithm is used to conduct clustering analysis of PV array fault samples and give the cluster centers of the various fault states. Then, the fuzzy membership function is designed and used to carry out the fault diagnosis of the PV array. Figure 6 shows the fault diagnosis framework of a PV array. The diagnosis proceeds in four steps.
Step 1: Several fault feature parameters are selected through fault analysis, and fault samples are collected under various fault modes from simulated (or measured) data to construct the fault sample matrix and obtain the fault sample sets. The clustering number is given by the number of fault types identified through fault analysis and is set as the input parameter of the FCM algorithm.
Step 2: The FCM clustering algorithm is used to classify the selected fault samples. The optimal cluster centers of the various fault states are obtained by adjusting the clustering number C, and the fault data sets are clustered by fault type, which completes the fault classification. In this process, FCM divides the existing fault data into several classes, matching the established number of fault types, each represented by a cluster center. Changes in the fault characteristics change the cluster centers. When new data are input, some parameters of the FCM algorithm can be adjusted to obtain new cluster centers following the process shown in Figure 6.
Step 3: The membership function algorithm based on the fuzzy normal distribution is used to diagnose faults from operation data by calculating membership degrees with respect to the cluster centers obtained in Step 2. Transforming the fuzzy membership function into a distance function quantizes the faults: the membership degree of the fault sample with respect to each cluster center (fault mode) is calculated for every characteristic parameter, and the total membership degree of each fault mode is then calculated by the weighted mean method.
Step 4: The total membership degrees under the various fault modes are sorted, and the fault mode with the largest membership degree is selected as the fault state of the diagnosed sample. The fault diagnosis is then complete.
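Steps 3 and 4 can be sketched as follows, assuming that the cluster centers from Step 2 and one Gaussian width per characteristic parameter are already available. The function name `diagnose`, the dictionary layout, and the equal weighting of the five parameters are illustrative assumptions for this sketch, not the paper's implementation.

```python
import math

FEATURES = ("Uoc", "Isc", "Um", "Im", "Pm")


def diagnose(sample, centers, sigmas):
    """Return (best_mode, totals) for one measured sample.

    sample:  dict feature -> measured value
    centers: dict mode -> dict feature -> cluster-center value
    sigmas:  dict feature -> Gaussian width for that feature
    """
    totals = {}
    for mode, center in centers.items():
        # Step 3: Gaussian membership of each parameter to this center.
        degrees = [
            math.exp(-((sample[f] - center[f]) ** 2)
                     / (2.0 * sigmas[f] ** 2))
            for f in FEATURES
        ]
        # Equal weights: the total is the plain mean over the parameters.
        totals[mode] = sum(degrees) / len(degrees)
    # Step 4: the mode with the largest total membership is the diagnosis.
    best_mode = max(totals, key=totals.get)
    return best_mode, totals
```

With a fault dictionary of six cluster centers, `diagnose` returns the label of the most likely mode together with the total membership degree of every mode, so the full ranking can still be inspected when two fault types are close.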

Formation of Fault Sample Data Sets
To simulate fault characteristics under different light intensities and temperatures in a typical PV system, a 3 × 13 PV array simulation model was built in MATLAB/Simulink (R2015b), as shown in Figure 7. The irradiance of the model was set to range from 900 W/m² to 1000 W/m² and the temperature from 25 °C to 45 °C to simulate the six fault modes of the PV array; the values of $U_{oc}$, $I_{sc}$, $U_m$, $I_m$, and $P_m$ for each fault mode were obtained accordingly. Fault data samples under the various fault modes were collected through multiple cycles of simulation. In each fault mode, 15 sample data points were randomly collected under different irradiation intensities, giving a total of 90 data samples across the six fault modes that constitute the fault sample matrix $X$:
$$X = \begin{bmatrix} U_{oc,1} & I_{sc,1} & U_{m,1} & I_{m,1} & P_{m,1} \\ U_{oc,2} & I_{sc,2} & U_{m,2} & I_{m,2} & P_{m,2} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ U_{oc,90} & I_{sc,90} & U_{m,90} & I_{m,90} & P_{m,90} \end{bmatrix}$$

FCM Algorithm Cluster Analysis
The selection of the clustering number C for the FCM algorithm is very important. Generally, C is significantly smaller than the total number of clustered samples, and C > 1. Following the analysis in Section 2.2, the PV array is taken as the basic fault diagnosis unit, and its state is classified into six classes: normal operation, one-module short circuit, two-module short circuit, local shadow shading in one string group, local shadow shading in two string groups, and one-module open circuit. The number of classes equals the clustering number C of the FCM algorithm. The parameters of the FCM algorithm (Equations (3) and (4)) are therefore the clustering number C = 6, weighted exponent m = 2, maximum iteration number L = 1000, and stopping iteration threshold ε = 10⁻⁵. The cluster centers of the six fault modes were obtained using the FCM function, as shown in Table 3. Each cluster center is the typical value of its fault mode and can be entered into the fault dictionary of the PV array diagnostic system. The cluster center rules under different fault modes were quantified as shown in Table 4 according to the initial diagnosis and the cluster centers of the six fault modes in the PV array system (Table 3).

Table 4. Cluster center rules (R1 to R5). R5: all the characteristic parameters are zero; 0 0 0 0 0 1.

The five rules corresponding to the results in Table 4, in comparison with the normal operation state, are as follows:
(1) When the open-circuit voltage drops about 32 V, the maximum-power-point voltage drops about 28 V, and the maximum power drops 185 W, the fault is diagnosed as F2, i.e., a one-module short-circuit fault.
(2) When the open-circuit voltage drops about 65 V, the maximum-power-point voltage drops about 52 V, and the maximum power drops 380 W, the fault is diagnosed as F3, i.e., a two-module short-circuit fault.
(3) When the open-circuit voltage drops about 4 V, the maximum-power-point voltage drops about 30 V, and the maximum power drops 200 W, the fault is diagnosed as F4, i.e., a one-module shaded fault.
(4) When the open-circuit voltage drops about 9 V, the maximum-power-point voltage drops about 58 V, and the maximum power drops 420 W, the fault is diagnosed as F5, i.e., a two-module shaded fault.
(5) When all the characteristic parameters are zero, the fault is diagnosed as F6, i.e., a one-module open-circuit fault.
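The five rules can be read as a lookup on the drops of the open-circuit voltage, maximum-power-point voltage, and maximum power relative to normal operation. The Python sketch below encodes the drop values quoted in the rules; the function name `match_rule`, the relative tolerance of 25%, and the choice to return `None` when no rule fits are assumptions made for this illustration and do not appear in the paper.

```python
# Drop signatures (Uoc drop in V, Um drop in V, Pm drop in W) from the rules.
RULES = {
    "F2": (32.0, 28.0, 185.0),
    "F3": (65.0, 52.0, 380.0),
    "F4": (4.0, 30.0, 200.0),
    "F5": (9.0, 58.0, 420.0),
}


def match_rule(normal, measured, tolerance=0.25):
    """Match measured values against the rule signatures.

    normal, measured: dicts with keys "Uoc", "Um", "Pm".
    tolerance: assumed maximum relative deviation from a signature.
    Returns a fault label, or None if no rule fits.
    """
    # Rule 5: every characteristic parameter is zero.
    if all(value == 0 for value in measured.values()):
        return "F6"
    drops = (
        normal["Uoc"] - measured["Uoc"],
        normal["Um"] - measured["Um"],
        normal["Pm"] - measured["Pm"],
    )
    best, best_err = None, float("inf")
    for mode, signature in RULES.items():
        # Worst relative deviation across the three drops.
        err = max(abs(d - s) / s for d, s in zip(drops, signature))
        if err < best_err:
            best, best_err = mode, err
    return best if best_err <= tolerance else None
```

Note that F2 and F4 have similar drops in maximum-power-point voltage and maximum power and are separated mainly by the open-circuit voltage drop (about 32 V versus about 4 V), which is why all three drops are compared together.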

Fault Diagnosis Using Fuzzy Membership Algorithm
The cluster centers of the PV array diagnosis system (Table 3) can be combined with the fuzzy membership function algorithm to calculate the membership degree between fault diagnosed samples and their cluster centers for complete fault diagnosis of the PV array.As mentioned above, the larger the membership degree, the more likely the diagnosis sample is to belong to the given fault state.
To apply the membership function algorithm based on the fuzzy normal distribution to PV array fault diagnosis, a sample is selected randomly and its diagnosed parameters are obtained: $U_{oc}$ = 361.0858 V, $I_{sc}$ = 7.7779 A, $U_m$ = 281.8601 V, $I_m$ = 7.0669 A, $P_m$ = 1987.7488 W. According to the deviation theory for σ introduced in Section 2.2, the standard deviations of the five fault characteristic parameters are calculated, and the membership functions of the open-circuit voltage ($U_{oc}$), the short-circuit current ($I_{sc}$), the maximum-power-point voltage ($U_m$), the maximum-power-point current ($I_m$), and the maximum power ($P_m$) are set up (Formula (10)). Substituting the parameters of the measured sample into Formula (10) gives the membership degree of the fault sample with respect to each fault mode and each cluster center, as shown in Table 5. The five characteristic parameters are given equal weight, so the weighted total membership degree in the last column of Table 5 is the average of the membership degrees of $U_{oc}$, $I_{sc}$, $U_m$, $I_m$, and $P_m$. Once the total membership degrees are sorted (Table 5), the largest membership degree is selected as the fault state of the diagnosed sample. As shown in the last column of Table 5, $\mu_{F3} > \mu_{F5} > \mu_{F4} > \mu_{F2} > \mu_{F1} > \mu_{F6}$. Within the sorted total membership degrees, F3 (two-module short-circuit fault) has the largest value; that is, the sample is diagnosed as being in the F3 fault state, which is consistent with the preset fault type and indicates that the proposed fault diagnosis method is effective and accurate.
Next, the irradiance range of the samples was expanded to between 700 W/m² and 1000 W/m², and 150 fault samples of the PV array were diagnosed with the proposed method. Only six of the 150 samples showed diagnostic errors. During actual diagnosis, some fault types are easily misjudged because of their similarity, which causes these errors. The error analysis of the fault samples is shown in Table 6. The diagnostic accuracy was 96%, indicating that the proposed method is effective and highly feasible.

Table 6 gives the misdiagnosis rate of the fault samples. The number of correctly diagnosed samples is marked in green, the number of wrongly diagnosed samples in orange, the misdiagnosis rate of each fault class in gray, and the misdiagnosis rate over all samples in blue.
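The quantities reported in an error-analysis table of this kind can be computed from a confusion matrix. The sketch below is only an illustration: the matrix is hypothetical (25 samples per class and an invented distribution of the six errors) and is chosen solely so that the totals match the 150 samples and six misdiagnoses stated in the text.

```python
def misdiagnosis_rates(confusion):
    """confusion[i][j] = number of samples of actual class i diagnosed as class j."""
    per_class = []
    for i, row in enumerate(confusion):
        per_class.append((sum(row) - row[i]) / sum(row))
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return per_class, (total - correct) / total

# Hypothetical 6-class matrix: 25 samples per class, 6 errors in total.
CONFUSION = [
    [25, 0, 0, 0, 0, 0],
    [0, 24, 1, 0, 0, 0],
    [0, 2, 23, 0, 0, 0],
    [0, 0, 0, 24, 1, 0],
    [0, 0, 0, 2, 23, 0],
    [0, 0, 0, 0, 0, 25],
]

per_class, overall = misdiagnosis_rates(CONFUSION)
accuracy = 1.0 - overall
print("per-class misdiagnosis rates:", per_class)
print(f"overall misdiagnosis rate: {overall:.2%}, accuracy: {accuracy:.2%}")
```

The diagonal holds the correctly diagnosed counts and the off-diagonal entries hold the wrong diagnoses, which matches the roles of the green and orange cells described above.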
To verify the adaptability of the proposed method to a new fault type, a new fault, F7, in which six PV modules in one string are shaded, was introduced. Because the FCM algorithm scales easily, the cluster number C is simply changed to 7, new cluster centers are obtained, and the fault types are identified by comparing membership degrees computed with the fuzzy membership function algorithm. The seven faults were re-simulated with the simulated PV array model established in this paper, and 175 fault samples of the PV array were selected and diagnosed with the proposed method. The diagnostic accuracy was 96.6%, which shows that the proposed method has good scalability and adaptability.
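To make the role of the cluster number C concrete, the following is a minimal pure-Python sketch of the standard FCM iteration (alternating center and membership updates). It is not the authors' code: the function name `fcm`, the fuzzifier default m = 2, the stopping tolerance, and the toy two-dimensional data are all assumptions made for illustration.

```python
import math
import random

def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-means on a list of equal-length tuples; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial membership matrix u[j][i]; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[j] * points[j][k] for j in range(n)) / w_sum
                for k in range(dim)
            ))
        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk)^(2 / (m - 1)).
        new_u = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            new_u.append([1.0 / sum((di / dk) ** exponent for dk in d) for di in d])
        shift = max(abs(new_u[j][i] - u[j][i]) for j in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u

# Toy 2-D data (hypothetical): two tight groups of three points each.
POINTS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (9.0, 10.0), (10.0, 9.0)]
centers, u = fcm(POINTS, c=2)
print(sorted(centers))
```

Under these assumptions, accommodating an additional fault class amounts to adding its samples to `points` and calling `fcm` again with a larger `c`; the resulting centers would then feed the membership-based diagnosis step.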

Comparison of Classification Algorithms
The K-Means algorithm is a popular hard C-means (HCM) clustering algorithm. Given a data set and a cluster number, K-Means iteratively assigns the data to clusters according to a specific distance function, and the membership degree of each sample can only be 0 or 1. The FCM algorithm improves on HCM by extending it to the fuzzy case: its membership degree can take any value in the interval [0, 1]. The FCM algorithm is therefore more suitable for extracting fault features and classifying fault data in PV array fault diagnosis.
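The difference between hard and fuzzy membership can be shown for a single point and fixed cluster centers. The sketch below uses the standard FCM membership formula with fuzzifier m; the function names, the value m = 2, and the two one-dimensional-style centers are assumptions chosen for illustration.

```python
import math

def hard_membership(point, centers):
    """K-Means-style assignment: 1 for the nearest center, 0 for all others."""
    d = [math.dist(point, c) for c in centers]
    nearest = d.index(min(d))
    return [1.0 if i == nearest else 0.0 for i in range(len(centers))]

def fuzzy_membership(point, centers, m=2.0):
    """FCM-style membership: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), values in [0, 1]."""
    d = [math.dist(point, c) for c in centers]
    if 0.0 in d:
        # The point coincides with one or more centers: share full membership among them.
        zeros = d.count(0.0)
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in d) for di in d]

CENTERS = [(0.0, 0.0), (4.0, 0.0)]
POINT = (1.0, 0.0)

hard = hard_membership(POINT, CENTERS)
fuzzy = fuzzy_membership(POINT, CENTERS)
print("hard: ", hard)
print("fuzzy:", fuzzy)
```

The point is 1 unit from the first center and 3 units from the second, so the hard assignment is all-or-nothing while the fuzzy memberships are about 0.9 and 0.1 and still sum to 1. That graded value is what lets a sample near a cluster boundary be treated differently from one at a cluster center.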
To verify the performance of the FCM algorithm, the K-Means and FCM algorithms were both used to classify the six fault types described in Table 1. Fifteen samples were taken for each fault, giving 90 fault samples in total.
The clustering results of the two algorithms are shown in Figure 8, which covers the six fault states of the PV array. Figure 8a shows that the K-Means clusters are mixed with one another, so the K-Means clustering result is not ideal. Figure 8b shows that the FCM clustering algorithm divides the data into six groups, with the data of each fault type aggregated around its cluster center. For the classification of PV array fault data, the FCM algorithm therefore clusters and classifies the fault data more accurately and effectively than the K-Means algorithm.
Table 7 compares the two algorithms by number of misclassified samples, running time, and accuracy. The running time of the FCM algorithm is longer than that of the K-Means algorithm, but its clustering accuracy is much higher, which indicates that the FCM algorithm has better clustering performance.
As a typical intelligent fault diagnosis method, the BP neural network is widely applied in fault diagnosis, but its fault detection performance depends mainly on its parameter settings and training data. To verify the performance of the proposed diagnostic method, it was compared with the BP neural network method. The 90 fault samples describing six typical fault types were used to train the BP neural network and the proposed method, and 24 typical fault data samples were used to test the two methods. The results are shown in Table 8. According to Table 8, the proposed method makes one diagnostic error, while the BP neural network method makes three. This comparison shows that the proposed algorithm is more accurate than the BP-based method. The BP neural network is less accurate because a general neural network needs a large number of fault samples for training, and the sparsity of fault samples in actual operating data limits the neural network diagnosis method.
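As a quick check of this comparison, and assuming that accuracy is simply the fraction of the 24 test samples diagnosed correctly, the error counts quoted above correspond to:

```latex
\text{Accuracy}_{\text{proposed}} = \frac{24 - 1}{24} \approx 95.8\%,
\qquad
\text{Accuracy}_{\text{BP}} = \frac{24 - 3}{24} = 87.5\%
```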

Dynamic Attribute of the Algorithm
To illustrate the adaptability of the proposed method to transient faults, a transient fault was set up in PV array 1 using the simulation model shown in Figure 6. The transient fault is a shadow fault that occurs in PV array 1 for a period of time during the day; the array operates normally for the rest of the day. The simulation conditions are shown in Figure 9. Table 9 shows the dynamic adaptability and diagnosis results of the proposed method for the transient fault. According to Figure 9, there are five status modes across the PV arrays: PV array 1 before the fault, PV array 1 during the fault, PV array 1 after the fault, PV array 2, and PV array 3. These modes are labeled M1, M2, M3, M4, and M5, respectively. Table 9 shows that the proposed method identifies the unknown transient fault effectively and that faults can be classified according to their feature distribution.
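One simple way to turn per-sample diagnoses into a before/during/after picture of a transient fault is to merge consecutive samples that share the same diagnosed label. The sketch below illustrates that idea only; it is not described in the paper, and the function name, the hourly timestamps, and the labels are hypothetical.

```python
from itertools import groupby

def label_segments(timeline):
    """Merge consecutive samples with the same diagnosed label into (label, start, end) segments."""
    segments = []
    for label, group in groupby(timeline, key=lambda item: item[1]):
        times = [t for t, _ in group]
        segments.append((label, times[0], times[-1]))
    return segments

# Hypothetical hourly diagnoses for one array: normal, shaded for a while, then normal again.
TIMELINE = [
    (9, "normal"), (10, "normal"),
    (11, "shadow"), (12, "shadow"), (13, "shadow"),
    (14, "normal"), (15, "normal"),
]

segments = label_segments(TIMELINE)
for label, start, end in segments:
    print(f"{label}: {start}:00 to {end}:00")
```

The three resulting segments play the same role as the before-fault, during-fault, and after-fault states of PV array 1 in the text.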

Experiment Analysis
To verify the correctness and effectiveness of the proposed method, experiments were carried out under short-circuit, open-circuit, and partial shading conditions. The fault conditions of the data samples cover a wide range of operating irradiances and temperatures. First, labeled data samples under different fault conditions were collected on an experimental platform. Then, tests and analyses were carried out with the proposed diagnosis method.

Experimental Description
To verify the correctness and effectiveness of the proposed diagnosis method under different environmental conditions, an empirical test platform for PV power generation was constructed. Figure 10 illustrates the system structure of the platform. The installed capacity of the platform is 9.555 kWp; thirty-nine PV modules are used, and their electrical parameters are shown in Table 10. To better analyze the influence of the external environment on PV power generation and performance, the platform includes a high-precision irradiance meter for measuring solar irradiance, a small weather station for measuring external environmental parameters such as global solar irradiance, temperature, and wind speed, a temperature sensor for measuring the operating temperature, a data collector for measuring current and voltage, an I-V scanner, and other instruments. The data are stored on a computer through a Supervisory Control And Data Acquisition (SCADA) system, which collects multiple operating parameters of the PV plant, such as the PV power generation and the currents and voltages on the AC and DC sides. Table 10 lists the specific parameters of the experimental platform. Tests under short-circuit, open-circuit, and partial shading conditions were run on the platform.
F2 (one module shorted), F3 (two modules shorted): the short-circuit fault is tested by short-circuiting some PV modules.
F4 (one module shaded), F5 (two modules shaded): the partial shading condition is tested by covering some PV modules with shield panels.
F6 (one module opened): the open-circuit fault is tested by open-circuiting some PV modules.

Experimental Result Analysis
Five typical fault data samples were selected from the five fault modes mentioned above, as shown in Table 13. The diagnosis results obtained with the proposed method are shown in Table 14. Again, a larger membership degree indicates a greater likelihood that the diagnosed sample is in the given fault state. As shown in Table 14, the five diagnosed samples fell into the F1, F2, F3, F4, and F5 states, again indicating that the proposed method yields correct results.

Conclusions
This paper proposed a novel PV array fault diagnosis method based on the FCM and fuzzy membership algorithms. The proposed method effectively detects short-circuit, open-circuit, partial shading, and other faults. Simulation analysis indicated that the diagnostic accuracy of the proposed method was 96%. Field experiments further verified the correctness and effectiveness of the proposed method and showed that it can complete PV array diagnosis. The innovations of this paper can be summarized as follows:
(1) The FCM algorithm describes the distribution characteristics of fault data effectively from a small number of fault samples, which avoids the difficulty of obtaining fault samples.
(2) Using the membership function of fuzzy mathematics as the fault diagnosis function quantifies the membership degree between a fault sample and each fault mode, and describes their degree of similarity clearly and objectively.
(3) The proposed method exploits the advantages of both FCM (excellent classification ability) and the membership function algorithm (excellent distance computing ability). It needs no additional equipment: operators can locate the faulty module quickly by measuring voltage, current, power, and other parameters.
(4) The distribution characteristics of the FCM cluster centers reflect the fault characteristics and can be used to update the membership function.
(5) The cluster centers obtained by the FCM algorithm can serve as the typical values of each fault state, from which a fault characteristic database can be established. Based on this database, combined with other intelligent methods, it will be much easier to develop new approaches to PV array fault diagnosis.

Figure 2. Output characteristic curves of PV array under different fault conditions, (a) I-V curve under different fault conditions; (b) P-V curve under different fault conditions.

Figure 3. PV array electrical parameters change under different fault conditions, (a) Change in the electrical parameter Uoc under different fault conditions; (b) Change in the electrical parameter Isc under different fault conditions; (c) Change in the electrical parameter Um under different fault conditions; (d) Change in the electrical parameter Im under different fault conditions; (e) Change in the electrical parameter Pm under different fault conditions.

Figure 5. The curve of Normal Distribution Membership Function.

Figure 6. Framework of proposed fault diagnosis technique.

Figure 8. Comparison of clustering results, (a) Clustering analysis results of six fault states based on K-Means algorithm; (b) Clustering analysis results of six fault states based on FCM algorithm.

Figure 9. The simulation of transient fault.
Im: Maximum power-point current of the array
Pm: Maximum power of the array
Shadow fault: caused the decline of Um, Im

Table 3. Clustering results of typical faults.

Table 4. Cluster center rules under different fault modes.

Table 5. Membership degree between fault sample and each cluster center.

Table 6. Error analysis of fault samples.

Table 7. The experimental comparison results of K-Means algorithm and FCM algorithm.

Table 8. The comparison results of different fault diagnosis methods for testing samples.

Table 9. The dynamic adaptability analysis of the algorithm.

Table 13. PV array fault diagnosis samples.

Table 14. Proposed method diagnosis results.