Fault Prediction and Early-Detection in PV Power Plants based on Self-Organizing Maps

In this paper a novel and flexible solution for fault prediction based on data collected from a Supervisory Control and Data Acquisition (SCADA) system is presented. Generic fault/status prediction is offered by means of a data-driven approach based on a self-organizing map (SOM) and the definition of an original Key Performance Indicator (KPI). The model has been assessed on a park of three photovoltaic (PV) plants with installed capacity up to 10 MW, and on more than sixty inverter modules of three different technology brands. The results indicate that the proposed method is effective in predicting incipient generic faults on average up to 7 days in advance, with a true positive rate up to 95%. The model is easily deployable for on-line monitoring of anomalies on new PV plants and technologies, requiring only the availability of historical SCADA data, a fault taxonomy and the inverter electrical datasheet.


Introduction
1.1. Motivation

The implementation of accurate and systematic preventive maintenance strategies is emerging as an essential tool to maintain high technical and economic performance of solar PV plants over time [?]. Analytical monitoring systems have been installed worldwide to promptly detect possible malfunctions through the assessment of PV system performance [2][3][4][5]. However, in addition to high customization costs and the need to collect and transmit a large number of physical variables, there appears to be a lack of automatic, unsupervised and accurate methodologies to perform such maintenance strategies. Due to the abundance of relevant data, and the difficulty in modeling many complex aspects of PV plants, statistical methods based on data mining and machine learning algorithms have recently emerged as a very promising approach both for fault prediction and early detection. However, few works can be found on this topic, and, especially in the field of power generation from renewable sources, most papers focus on equipment-level failures in wind farms [6,7], while the counterpart for PV plants is not as developed [8].

Paper contribution
The present paper describes a novel and flexible solution for inverter-level fault prediction based on a data-driven approach. In particular, its ability to predict or to rec-

In the paper we shall consider three PV plants, referred to in the following as plants A, B, and C, respectively, with an installed capacity ranging between 3 and 10 MW,

The operating facility is able to produce around 15 million kWh per year, corresponding to the annual energy needs of more than 7,500 households, thereby avoiding the emission of over 6,800 tonnes of CO2 into the atmosphere per year.

Plants B and C are located in Greece. Plant B is in the Xanthi region and is composed of strings of thin-film solar panels connected to seven inverter modules with a rated output power of 385 kW AC, which globally corresponds to an installed capacity of

of SCADA data. More specifically, a fault of the k-th type is assigned to timestamp $t_n$ if the following condition occurs:

$$t_{\mathrm{start},k} \le t_n \le t_{\mathrm{end},k}$$

where $t_{\mathrm{start},k}$ ($t_{\mathrm{end},k}$) is the initial (final) instant of the fault event. Once the O&M logs

where m and b are the slope and the intercept, respectively, of the linear approxima-

As different electrical (e.g., $P_{AC}$) and environmental (e.g., GTI) signals exhibit seasonal trends, it is convenient to remove such trends to prevent biased predictions. In order to remove the season-dependent variability from the input data, a detrending procedure has been applied, following a tailored approach for each variable. In particular, the training data of $T_{mod}$ have been deseasonalized by means of a least-squares fit of the best line $T_{fit}$ against $T_{amb}$, selecting only samples with low GTI to remove the effect of panel heating due to sunlight:

$$T_{fit} = m_T \, T_{amb} + b_T, \qquad \mathrm{GTI} < \mathrm{GTI}_{thr}$$

where $T_{fit}$ is the fitting temperature, $m_T$ is the regression slope, $b_T$ is the intercept and $\mathrm{GTI}_{thr} = 100$ W/m$^2$ is a heuristically determined threshold on the solar irradiance that identifies "low values of the GTI", i.e., values that do not give rise to relevant panel heating effects.
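The module-temperature detrending described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and array layout are hypothetical, the line is fitted with NumPy's `polyfit` on the samples whose GTI is below the 100 W/m^2 threshold quoted in the text, and the detrended signal is assumed to be the residual of the module temperature with respect to the fitted line, which the text does not state explicitly.

```python
import numpy as np

GTI_THR = 100.0  # W/m^2, the heuristic threshold quoted in the text


def fit_low_gti_line(t_amb, t_mod, gti, gti_thr=GTI_THR):
    """Least-squares line of T_mod against T_amb using low-GTI samples only.

    Returns (m_T, b_T), the regression slope and intercept.
    """
    mask = gti < gti_thr  # keep samples with negligible panel heating
    m_t, b_t = np.polyfit(t_amb[mask], t_mod[mask], deg=1)
    return m_t, b_t


def detrend_t_mod(t_amb, t_mod, m_t, b_t):
    """Assumed detrending step: subtract the fitted line from T_mod."""
    t_fit = m_t * t_amb + b_t
    return t_mod - t_fit


# Made-up values, for illustration only.
t_amb = np.array([0.0, 10.0, 20.0, 30.0])
gti = np.array([50.0, 50.0, 50.0, 500.0])
t_mod = np.array([1.0, 21.0, 41.0, 100.0])

m_t, b_t = fit_low_gti_line(t_amb, t_mod, gti)
residual = detrend_t_mod(t_amb, t_mod, m_t, b_t)
```

In such a scheme the slope and intercept would be estimated once on the training data and then reused unchanged on the test data, so that no information from the test period leaks into the fit.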

All the remaining input variables, apart from DC and AC voltages, have been de-  [11,12].

Finally, input data normalization is performed to avoid imbalance between heterogeneous quantities.
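The text does not specify which normalization is used. As a minimal sketch, assuming a per-variable z-score whose statistics are estimated on the training set only (both the choice of z-score and the function names are assumptions made for illustration), the step could look like this:

```python
import numpy as np


def fit_scaler(train):
    """Per-column mean and standard deviation of the training matrix.

    train has shape (n_samples, n_variables).
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # guard against constant columns
    return mean, std


def apply_scaler(x, mean, std):
    """Scale data with statistics estimated on the training set."""
    return (x - mean) / std


# Made-up values: two variables with very different ranges.
train = np.array([[0.0, 100.0], [2.0, 300.0]])
mean, std = fit_scaler(train)
scaled = apply_scaler(train, mean, std)
```

After scaling, both columns span the same range, so neither variable dominates a distance computed across heterogeneous quantities.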

The proposed approach consists in training a self-organizing map (SOM) [13,14]

to the SOM and are classified as "in control" or "out-of-control". For this purpose, we calculate the probability of cell occupancy for all the instances measured during the last 24 hours, and we compare it against the previously computed probability of cell occupancy. The procedure is now illustrated in more detail.
In this case we say that the input pattern r is mapped to the cell c. In order to assess the condition of newly observed state patterns to be monitored, we introduce the following KPI:

where d denotes a test day index, and the probability of cell occupancy during day d is

where $N_d = 24 \cdot 60/\nu$ is the total number of samples in a day, and $N_{i,d}$ is the number of

As a result, the KPI(d) value defined in equation (5)
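The mapping of patterns to cells and the daily probability of cell occupancy can be sketched as follows. The sketch rests on several assumptions: the SOM is represented by an already trained codebook of weight vectors, a pattern is mapped to the cell with the nearest weight vector in Euclidean distance (the usual SOM rule), the occupancy probability of a cell is taken as the fraction of the day's samples mapped to it, and ν is read as the sampling period in minutes. All function names are hypothetical, and the KPI itself is not computed here.

```python
import numpy as np


def best_matching_cells(patterns, codebook):
    """Index of the nearest codebook vector for each input pattern.

    patterns: (n_samples, n_features); codebook: (n_cells, n_features).
    """
    diff = patterns[:, None, :] - codebook[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)  # shape (n_samples, n_cells)
    return sq_dist.argmin(axis=1)


def cell_occupancy(cells, n_cells):
    """Fraction of samples mapped to each cell (assumed N_{i,d} / N_d)."""
    counts = np.bincount(cells, minlength=n_cells)
    return counts / len(cells)


def samples_per_day(nu_minutes):
    """N_d = 24 * 60 / nu, with nu assumed to be in minutes."""
    return 24 * 60 // nu_minutes


# Made-up codebook with three cells and four observed patterns.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
patterns = np.array([[0.1, 0.0], [0.9, 1.0], [1.2, 1.0], [4.0, 5.0]])

cells = best_matching_cells(patterns, codebook)
occupancy = cell_occupancy(cells, n_cells=len(codebook))
```

Comparing the occupancy vector of the last 24 hours with the one computed on the training data is what allows a day to be flagged as "in control" or "out-of-control".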

The proposed model has been trained on the training set as specified in Table 3, and in this section we discuss the outcome of the testing stage. In particular, our system has been able to identify a significant number of failure events, which we could validate using the available data, and a selection of the most interesting ones is discussed in more detail in this section.

as:

Roughly speaking, Eq.

when the model predicts an anomaly on August 23, which is followed by an actual registered fault that occurs the following day.

The performance over the whole test set is remarkable, with a TPR exceeding 93% (FNR < 7%) and an FPR of almost 13%.

Table 7 lists the most severe failures registered for inverter 3.5 of plant C in the testing period, from February 1 to July 27, 2016. As in the previous cases, Figure 6 shows the proposed KPI, the warning levels and the daily number of faults as a function of time for the same module. As can be seen in Table 5

The KPI also works accurately for plant C, as can be seen in the bottom plot of Figure 6. In fact, the TPR is almost 92% (FNR = 8%) and the FPR is roughly 1%.

The predictive capacity of the proposed method is summarized in Table 9, which reports the dates on which the faults occurred and the dates on which they had been predicted by the proposed KPI. On average, the KPI predicts incipient faults between 6 and 7 days before they are observed in practice. In addition to predicting the faults, the KPI exhibits excellent early-detection capabilities, signaling increasing warning levels as the faults evolve and reach more severe conditions.
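For reference, the rates quoted above follow the standard confusion-matrix definitions, and the average anticipation is a mean of date differences. The sketch below uses hypothetical function names and made-up counts and dates; it does not reproduce the paper's data, and how days are counted as true or false positives is not specified in the text.

```python
from datetime import date


def rates(tp, fn, fp, tn):
    """True positive, false negative and false positive rates."""
    tpr = tp / (tp + fn)  # fraction of faulty cases correctly flagged
    fnr = fn / (tp + fn)  # fraction of faulty cases missed (1 - TPR)
    fpr = fp / (fp + tn)  # fraction of healthy cases wrongly flagged
    return tpr, fnr, fpr


def mean_lead_days(pairs):
    """Average number of days between prediction and observed fault.

    pairs: iterable of (predicted_on, fault_on) dates.
    """
    leads = [(fault_on - predicted_on).days for predicted_on, fault_on in pairs]
    return sum(leads) / len(leads)


# Made-up counts and dates, for illustration only.
tpr, fnr, fpr = rates(tp=93, fn=7, fp=13, tn=87)
lead = mean_lead_days([
    (date(2016, 3, 1), date(2016, 3, 8)),
    (date(2016, 5, 10), date(2016, 5, 16)),
])
```

Note that TPR and FNR always sum to one, which is why the text reports them as complementary pairs.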

The proposed SOM-based monitoring system is now being installed in PV plants for online condition monitoring, and the preliminary feedback from plant operators is very positive. A full evaluation of the online system will be the subject of our future work. We are also currently developing a supervised fault-classification tool that we plan to integrate into the system in order to predict the specific class of fault, in addition to recognizing a generic faulty condition as in the present work.