A Monitoring System for Online Fault Detection and Classification in Photovoltaic Plants

Photovoltaic (PV) energy use has been increasing recently, mainly due to new policies all over the world to reduce the use of fossil fuels. PV system efficiency is highly dependent on environmental variables and is also affected by several kinds of faults, which can lead to severe energy loss throughout the operation of the system. In this sense, we present a Monitoring System (MS) that measures electrical and environmental variables to produce instantaneous and historical data, making it possible to estimate parameters that are related to the plant efficiency. Additionally, using the same MS, we propose a recursive linear model to detect faults in the system, using irradiance and temperature on the PV panel as input signals and power as output. The accuracy of the fault detection for a 5 kW power plant used in the test is 93.09%, considering 16 days and around 143 hours of faults in different conditions. Once a fault is detected by this model, a machine-learning-based method classifies each fault into one of the following cases: short-circuit, open-circuit, partial shadowing, and degradation. Using the same days and faults applied in the detection module, the accuracy of the classification stage is 95.44% for an Artificial Neural Network (ANN) model. By combining detection and classification, the overall accuracy is 92.64%. Such a result represents an original contribution of this work, since other related works do not present the integration of a fault detection and classification approach with an embedded PV plant monitoring system, allowing for the online identification and classification of different PV faults, besides real-time and historical monitoring of electrical and environmental parameters of the plant.


Introduction
The growth of renewable energy (RE) sources has been increasingly significant in the last decade. For instance, in 2018, without considering large hydro-power plants, an outstanding amount of 190 GW total new capacity installed was reached worldwide, being 55% of the total power capacity that was installed during that year [1]. Among most frequent renewable sources, solar photovoltaic (PV) has shown the largest growth and it is responsible for around 39% of installed capacity over the course

Fault Detection
In general, fault detection for PV systems is based on modeling the system and comparing the model output with real-acquired data, indicating a fault event every time the difference between the model and the acquired data is above some predefined threshold [16]. The modeling step is normally divided into dynamic or static. Static models do not consider time as an independent variable and, due to that, they are normally referred to as non-memory models. Dynamic models, on the other hand, take time variations into account. In PV modeling, static models are the most common, with PV cells represented by the Single Diode Model [33].
In [5,34,35], a static model that is based on a single diode model is considered in the modeling process to detect faults and predict energy production. However, the main limitation of this group of models is the representation of a generic and static PV cell. By simplifying the PV cell to a generic and static one, individual characteristics and the dynamics of different PV systems may be disregarded, compromising the modeling of certain phenomena and, consequently, the identification of faults that occur in short intervals of time. Nevertheless, from the diagnosis point of view, the static model is appropriate for detecting aging and degradation issues, due to the long-term characteristics of such faults [35].
In [14,15], a statistical approach based on multivariate exponentially weighted moving average charts is proposed for fault detection in order to overcome those limitations of single-diode models. The authors generated residuals of the array's DC current, voltage, and power, considering temperature and irradiance as inputs. With the residuals, it is possible to calculate the difference between the measurements and the predictions for the electrical variables from the single-diode model, and use them as fault detectors. Real-acquired data show the ability of the proposed method to detect partial shadowing, open-circuit, short-circuit, and degradation, but the authors only present seven case studies, which does not demonstrate the generalization of the model to other fault scenarios. Additionally, because the model is based on a decision-tree classifier, it is restricted to the presented 9.6 kWp PV plant.
In terms of dynamic modeling, most of the models are dedicated to energy forecasting and do not present fault detection results. In [16], for instance, black-box modeling is used to obtain an empirical model for the system, using temperature and irradiance as input signals. However, that paper simplifies the model by excluding possible system nonlinearities, which makes it difficult to use in the context of fault detection. In [36], on the other hand, the Hammerstein-Wiener model is used to emulate the system including nonlinearities. The irradiance and DC power were used as input and output signals, respectively. The chosen sampling time was 15 min, compromising the detection of short-term events, such as partial shadowing. In [37], an ARMAX model is proposed to predict the generated power, one day ahead, for a PV system. The input signals of the ARMAX model are the daily average temperature, the precipitation, the insolation duration, and the humidity. However, the authors did not discuss fault detection with the proposed model.
It is also noteworthy that none of the discussed detection methods presented the results of the solution running on dedicated hardware or integrated into a monitoring system.

Fault Classification
One way to perform fault classification, which has been receiving increasing attention and popularity in recent literature [9], is the use of artificial intelligence models, especially machine learning classifiers, which is also the main approach that is proposed in this work. In [10], for instance, the use of artificial neural networks to classify the operation of a photovoltaic system in four possible states (normal, degradation, short-circuit, and shadowing) is presented. This method was trained and tested in a simulated environment and obtained an accuracy of approximately 88.89% when considering the nine evaluated test samples. Real fault cases were not reported in that work.
In [38], a two-stage system is discussed, with the first stage for fault detection and the second for classification. The authors consider the following cases: open-circuit, degradation, short-circuit, and shadowing, with or without the bypass diode. For fault detection, the proposed method is based on the comparison of the power of the PV plant with its corresponding mathematical model. When a difference above a given threshold is verified, the system reports a fault detection. For fault classification, a multilayer perceptron artificial neural network is used, reaching an overall accuracy of 90.3% (detection and classification). Furthermore, this system uses only simulated data for training the network and it is tested with a real plant based on the system's VxI curve. With that, the real-time classification of faults is unfeasible, since the generation of the VxI curve requires the disconnection of the plant to connect the proposed equipment and perform the detection and classification. Following a similar idea, a detection and classification system is presented in [20], obtaining an accuracy of 94.1%. However, the authors also presented tests only in a simulation environment.
In a more recent approach from [39,40], another two-stage architecture was used, but this time non-linear autoregressive models (NARX) were developed to estimate the generated power under different environmental conditions. Next, fuzzy inference models compared the estimated value to the sensed power in order to classify the system into one of a given set of fault configurations, which includes combinations of shadowing, short circuit, and open circuit, yielding 98.2% accuracy over 16 use cases.
Still in the context of two-stage intelligent methods, systems for detecting normal, open-circuit, and different types of short-circuit conditions were proposed in [41,42]. The two-stage approach of the aforementioned works uses probabilistic neural networks for the detection and classification of the referred faults. In [41], two simulated tests were carried out to validate the proposed system, achieving detection and classification accuracies of 82.34% and 98.19%, respectively, while [42] achieves 100% accuracy using real data for training and validation.
More specialized approaches use methods to detect line-to-line faults in several situations. For example, [43] uses a support vector machine trained with simulated data and tested in a real PV plant, which achieves up to 94.74% accuracy in detecting short-circuit conditions, while [44] uses a Radial Basis Function Neural Network with irradiance and power as its inputs to detect the disconnection of one or more modules from the photovoltaic system. The system attained 98.1%, 97.9%, and 96.5% accuracy when tested in two plants, one with 2.2 kW and the other with 4.16 kW, when subject to normal operation, shadowing, and overcast conditions. Another Radial Basis Function Network was used in [45] to classify a 1 kW photovoltaic plant into one of 14 cases, including normal, short circuit, cell bypass, shading, ground fault, and nine converter/inverter component faults. This system was only tested in simulations and achieved 97% test accuracy.
In [11], on the other hand, a method using a Kernel Extreme Learning Machine and data from VxI curves is presented in order to classify PV faults into the following cases: open-circuit, shadowing, short-circuit, and degradation. To evaluate the performance of the system, three case studies were carried out. The first uses only simulation data for training and testing, whilst the second employs only real data acquired in a 1.8 kW peak PV plant. The third approach uses simulated data for training and real data for testing, due to the limited amount of data collected in the real plant. In the case that uses only simulated data, the proposed system reached an accuracy of 100.0%. In the case with real data, the accuracy varied between 97.9% and 99.0% and, in the third case, with mixed data, the final accuracy was 98.9%. Despite the relatively high performance (>95.0%), because the method is based on VxI curves, the PV system must be disconnected to perform the proposed fault classification procedure, since it uses an external device that must be connected to the plant to obtain the VxI curves. Additionally, the authors did not present the results of the solution running on dedicated hardware.

Contributions of This Work
Based on the discussion so far, it can be observed that fault detection and classification is an active research topic with many relevant contributions. This work complements the current literature with contributions regarding the monitoring of PV systems and the detection and classification of faults. In terms of monitoring systems, the related works show that, when the monitoring of plant variables is more comprehensive, there is usually no fault analysis. On the other hand, when fault detection is included (embedded) in the monitoring system, the detected faults are limited to certain types. When both are present, the work is only evaluated using simulations or with lower-powered PV deployments. With that, the first contribution of this work is the integration of a fault detection and classification approach with an industrial-grade embedded photovoltaic plant monitoring system, allowing for the online identification and classification of different PV faults, besides providing an MS integrated into the plant.
Regarding fault detection, the use and detailed comparison of dynamic models for several fault scenarios are still limited in the literature and can be highlighted as an additional contribution of this work, particularly for real-acquired data. Finally, from the classification point of view, the use of simulated (and validated) data to train machine learning models with different fault conditions, besides testing with real-acquired data in real fault scenarios, has also been only briefly evaluated in the literature and can be highlighted as a relevant contribution. Besides, we also present a detailed comparison among some of the most common machine learning methods, pointing to the most suitable model for online fault classification in PV systems.
Another relevant contribution of our work is the fact that the simulator and the training dataset are publicly available to enable straightforward comparison with newly proposed techniques.
Finally, it is worth mentioning that this work is an extension and combination of previous works [8,17,21] of the authors of this paper, which presented initial and individual results of monitoring system, fault detection, and classification, respectively.

Proposed Monitoring System
Suppose a photovoltaic system composed of a group $S$ of strings, each composed of $N$ serially connected photovoltaic modules $PV_{\{1,\ldots,N\}}$. When an arbitrary string $s \in S$ is subject to an irradiance $G$, it generates a voltage $V_{dc,s}$ and a current $I_{dc,s}$ (and, consequently, an output power $P_{dc,s} = V_{dc,s} I_{dc,s}$), which are dependent on the ambient temperature $T$, considered constant for $PV_j$, $\forall j$. An inverter is deployed to convert the energy output of the mentioned string group $S$ into a two-phase AC waveform whose voltage is $V_{ac}$ and current is $I_{ac}$. The converted power is injected into the utility grid. A data acquisition system is able to collect all of the aforementioned variables with a minimum sampling frequency of 1 Hz. The power output may be influenced by the following system faults: (i) short circuit between an arbitrary number of adjacent modules $PV_{j,\ldots,k}$, (ii) open circuit of any module string in $S$, (iii) high-resistance connection between any adjacent module pair $PV_{j,k}$, and (iv) module output mismatch due to partial shadowing. We propose a Fault Detection system which uses the acquired data to detect whether the considered system is operating under one of these faults. When a fault is detected, the fault classification is performed by the appropriate algorithm and the result is reported to the user. Figure 1 depicts the proposed system, in which the acquisition system is implemented by a National Instruments CompactRIO (cRIO) controller equipped with signal acquisition modules, as presented in Figure 2. The cRIO controller runs a Linux Real-Time Operating System and features an FPGA and modular I/Os, programmed in the LabVIEW environment, for industrial-grade embedded high-speed control and signal processing systems.
[Figure 1: block diagram showing the two-string PV system (5 kW), the power inverter, and the utility grid.]

Between the acquired signals and the analysis, there is a software stack that comprises the execution environment, where the developed software coordinates the sampling at 25 kHz of $V_{dc,s}$, $I_{dc,s}$, $V_{ac}$, and $I_{ac}$, with $s \in S$. Next, the root mean square (RMS) values of the monitored variables are calculated every second, generating the signals $v_{dc,s}(k)$, $i_{dc,s}(k)$, $v_{ac}(k)$, and $i_{ac}(k)$ (with $s \in S$), which are stored in the local database. Additionally, derived signals are calculated from the monitored signals: the string power output $p_{dc,s}(k) = v_{dc,s}(k)\, i_{dc,s}(k)$ and the inverter power output $p_{ac}(k) = v_{ac}(k)\, i_{ac}(k)$. At the top of the stack, the fault detection and classification algorithms are implemented. In the following subsections, the blocks from Figure 1 are presented.
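As an illustration of the aggregation step just described, the following minimal Python sketch reduces a signal sampled at 25 kHz to one RMS value per second and then forms the power signal as the product of the per-second voltage and current. The function names and the constant test signals are assumptions made for this example; the actual system performs this step in LabVIEW on the cRIO.

```python
import math

FS_HZ = 25_000  # sampling rate stated in the text


def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def one_second_rms(signal, fs_hz=FS_HZ):
    """Split a sampled signal into 1 s blocks and return one RMS value per block."""
    n_blocks = len(signal) // fs_hz
    return [rms(signal[k * fs_hz:(k + 1) * fs_hz]) for k in range(n_blocks)]


def power_series(v_rms, i_rms):
    """Per-second power p(k) = v(k) * i(k), following the definition in the text."""
    return [v * i for v, i in zip(v_rms, i_rms)]


# Hypothetical test signals: 2 s of constant DC string voltage and current.
v_dc = [300.0] * (2 * FS_HZ)
i_dc = [8.0] * (2 * FS_HZ)
p_dc = power_series(one_second_rms(v_dc), one_second_rms(i_dc))
```

With these constant inputs, `p_dc` holds two values, one per second of data, each equal to the product of the string voltage and current.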

PV Power Plant
In this work, $PV_{\{1,\ldots,N\}}$ are implemented using $N = 16$ Canadian Solar 330 W poly-crystalline modules (CS6U-330P), forming a group $S$ with $|S| = 2$ strings of 8 modules each (where $|\cdot|$ represents group cardinality). Table 1 presents the main electrical data of the module. In Figure 3, the deployment site of the proposed system (modules and sensors) is presented. In total, the system may yield 5 kW of peak installed capacity.
Connected to the grid, the power inverter produced by NHS (depicted in Figure 2) is responsible for converting the DC input energy coming from strings 1 and 2 to the single-phase AC output. Besides running the Maximum Power Point Tracking (MPPT) algorithm for maximum energy conversion, this inverter can also measure $V_{dc,s}$, $I_{dc,s}$, $V_{ac}$, and $I_{ac}$, with $s \in S$, and report their RMS values via an RS-485 interface to the acquisition system. This report may be used to detect external sensor faults.

Weather Station
Because photovoltaic energy production is strongly dependent on the instantaneous environmental conditions where the solar panels operate, it is important for the acquisition system to correlate the instantaneous power output with meteorological variables. For this reason, a weather station was connected to the cRIO to monitor the variables that have the major influence on photovoltaic production: (i) irradiance $G$ [W/m²]; and (ii) module temperature $T$ [°C]. Additionally, our system measures secondary variables that may influence the main variables: (iii) ambient temperature $T_a$ [...]. In Figure 3, the weather station may be observed adjacent to the PV modules and sensors.
For the irradiance ($G$) measurement, an EKO MS-40 class B pyranometer was installed adjacent to $PV_1$, following the same inclination as the panel installation [19]. It is capable of measuring global irradiance (285 to 3000 nm spectral sensitivity) with a 180° angle. The panel temperature of $PV_{\{1,\ldots,N\}}$ is assessed by four PT100 contact sensors, Kimo Instruments SFCSD-51-A-3-PVC-25, installed on the back side of $PV_{1,6,11,16}$; $T$ is taken as the arithmetic average of the obtained measurements. The output accuracy of these sensors for temperatures between 0 and 100 °C is ±0.15 °C. The environmental signals are obtained from a Novus FieldLogger datalogger (eight analog input channels, with 24-bit A/D resolution and 1 kHz maximum sampling rate). The connection to the cRIO is implemented through Modbus TCP/IP over Ethernet. The datalogger is factory calibrated, so no further processing is necessary on the main monitoring system besides synchronization and logging. The datalogger reports the 1 s averages of $G$ and $T$ to the acquisition system, which are respectively named $g(k)$ and $t(k)$ in the following sections. Figure 4 shows a detailed view of the electrical variables acquisition scheme. There, the exact points of acquisition of the monitored signals are indicated for the DC voltage and current output of both PV strings ($V_{dc,s}$, $I_{dc,s}$ with $s \in S$), as well as for the AC voltage and current at the output of the power inverter ($V_{ac}$, $I_{ac}$), which are obtained by measuring both wires of the inverter with respect to the utility grid neutral wire (not shown for simplicity) ($V_{ac,1}$, $I_{ac,1}$, $V_{ac,2}$, and $I_{ac,2}$).

Electrical Variables
One NI 9215 module is used to acquire the DC voltage signals of both strings. It reads two simultaneously sampled analog input channels. Besides the surge protection already installed in the power plant, a signal conditioning and protection circuit was installed in order to adapt the range and protect the acquisition cards for the DC voltage measurements of the PV strings. The current transducer used is the LEM model HASS-50S. It is based on the Hall effect measuring principle, with a frequency bandwidth of up to 50 kHz and galvanic isolation between the primary and secondary circuits. All of the current signals, DC and AC, are acquired by a second NI 9215 card.
Figure 4. Diagram showing the electrical variables monitoring system using the cRIO RS485 interface to the power inverter, two NI 9215 and one NI 9242 acquisition cards.
For AC voltage signals, an NI 9242 module is used. It offers three channels for measurements between the signal and the neutral channel, and the neutral channel provides measurements between its terminal and the chassis ground.
Electrical power, in both the DC and AC domains, is calculated in real time from the sampled voltage and current, as advised by the EN 61724:1998 standard [25]. Additionally, phase-dependent quantities in a two-phase unbalanced power system (which is our scenario), such as power factor (PF), total and apparent power flow, and total harmonic distortion (THD), were implemented in the frequency domain and validated in accordance with IEEE 1459 [46].
The measurements of DC and AC voltage and current, as well as aggregated values such as active and reactive power, including the signal conditioning, were calibrated against local instruments, namely a Tektronix MDO3014 oscilloscope and an RMS MarH21 Power Quality Analyser (presented in Figure 2). The resulting error was below 1% for all signals, as recommended in the EN 61724:1998 standard [25].
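To make the relation between sampled signals, active power, and power factor concrete, the sketch below computes them in the time domain for a single phase. This is a simplification chosen for brevity: it is not the frequency-domain implementation mentioned above, and the 127 V / 10 A sinusoids, the 60 degree lag, and the function names are hypothetical values for this example only.

```python
import math


def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(a * a for a in x) / len(x))


def active_power(v, i):
    """Active power as the mean of the instantaneous product v(n) * i(n)."""
    return sum(a * b for a, b in zip(v, i)) / len(v)


def power_factor(v, i):
    """PF = P / S, with apparent power S = V_rms * I_rms."""
    return active_power(v, i) / (rms(v) * rms(i))


# Hypothetical single cycle: current lagging the voltage by 60 degrees.
n = 1200                      # samples in one cycle (illustrative)
phi = math.pi / 3
v = [math.sqrt(2) * 127 * math.sin(2 * math.pi * k / n) for k in range(n)]
i = [math.sqrt(2) * 10 * math.sin(2 * math.pi * k / n - phi) for k in range(n)]

p = active_power(v, i)        # close to 127 * 10 * cos(phi)
pf = power_factor(v, i)       # close to cos(phi)
```

For purely sinusoidal signals the result agrees with the textbook relation PF = cos(phi); with distorted waveforms the time-domain ratio P/S still applies, but harmonic-specific quantities such as THD require a spectral analysis.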

Execution Environment and Developed Software
In general, the MS reads environmental variables, acquires, calibrates, and aggregates electrical signals, time stamps the data, and provides a database-driven execution environment where Fault Detection and Fault Classification can be implemented. Figure 5 shows the software stack, including the bottom-level hardware modules, which implement the interfaces to the real world already described in the previous section. The Acquisition Control module coordinates the cyclical process from reading to storing. The LabVIEW-synthesized programmable logic running in the FPGA guarantees the real-time execution and synchronization of the sampled signals at 25 kHz. The RS485 Modbus communication with the power inverter and the Ethernet connection to the weather station are implemented as sub-Virtual Instruments (functions) in the LabVIEW environment and run with a 1 s period.
The Data Analysis module is responsible for calibration and for calculating aggregated values prior to their storage in the database, with timestamps, also via the Ethernet connection. Another functionality implemented in the Data Analysis module is the interpolation and decimation of timestamped signals and aggregated values, and their comparison in order to update the Human Machine Interface (HMI), generating events for user-configurable alarms. It runs as Python software over a TCP/IP remote procedure call. This scheme allows the higher-level applications to run locally in the cRIO, as well as in another network node of a distributed system.
The logical connection to the database server running Ubuntu is made through a MariaDB client installed directly in the Linux environment of the cRIO and activated through shell scripts from LabVIEW and Python. The database server in the local network can be expanded to the cloud as a scalable solution for the management of multiple power plants.
At the top of the software stack, the higher-level applications are the Fault Detection module, running in LabVIEW, and the Fault Classification module, running in Python. The classification module uses the scikit-learn machine learning library as middleware.
Finally, the HMI Handling module, implemented in LabVIEW, coordinates the system graphical user interface, presented on a touch screen display directly connected to the cRIO USB and DisplayPort connections. Using this HMI, it is possible to visualize real-time data and generate a graphical analysis for configurable time windows, as well as to set up alarm conditions to be indicated by the system.

Datasets
The proposed fault detection and classification systems are based on system identification and machine learning techniques, which may require a large dataset of past operational data for training purposes, particularly for machine learning models. The accuracy of these algorithms in the different situations relies on the diversity of this training dataset, which must contain operational data under all of the considered faults for the whole range of environmental conditions. We created a PV plant simulator that can generate the required dataset in a short period of time, since waiting for the natural occurrence of all these environmental combinations to generate the required faults is impractical for most PV installations. On the other hand, the generated model must accurately describe the behavior of a real system. For this reason, we use the real photovoltaic installation (Section 3.1) to validate our simulator setup. This hybrid approach used to generate our training and validation dataset is described in the following sections. First, the methodology to artificially introduce operational faults in the real installation is described. Next, the proposed electrical simulator that matches the real installation behavior is presented. Finally, shadowing may be generated by physically blocking solar radiation using diverse opaque objects.

Real System Installation
Figure 6. Installed Photovoltaic Plant Electrical Diagram.
We start from completely clear days to generate the real dataset: cloudy and rainy days are not considered, since there is no way to guarantee that natural shadowing is not influencing the collected parameters. Next, the fault schedule presented in Table 2 is applied to the system. Partial shadowing occurs naturally due to the characteristics of the deployment site: sunlight is blocked by nearby buildings at different moments of the morning and afternoon (around two hours in each period). The process was repeated for 16 days, with data collected and properly labeled from around 07:30 to 17:00, including faults and normal conditions, at a sampling rate of 1 Hz, generating 10,371 points with degradation, 5999 points with short-circuit, 6024 points with open-circuit, 184,311 points with partial shadowing, and 309,253 points in which no faults were introduced. This real dataset is also publicly available (https://github.com/clayton-h-costa/pv_fault_dataset) in order to facilitate other experiments regarding fault detection and classification methods.
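The class balance of the real dataset follows directly from the counts reported above. The short Python sketch below sums them and converts the total to hours, using the fact that 1 Hz sampling yields one labeled point per second; the dictionary keys are labels chosen for this example, not identifiers from the published dataset.

```python
# Labeled sample counts reported in the text for the 16-day real dataset.
counts = {
    "degradation": 10_371,
    "short-circuit": 5_999,
    "open-circuit": 6_024,
    "partial shadowing": 184_311,
    "normal": 309_253,
}

total = sum(counts.values())

# Share of each operating condition in the dataset.
shares = {label: n / total for label, n in counts.items()}

# At 1 Hz, one sample corresponds to one second of operation.
total_hours = total / 3600
```

The total duration obtained this way is consistent with the roughly 143 hours quoted in the abstract, and the shares make the strong imbalance between the electrical faults and the shadowing and normal classes explicit.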

System Simulation
One major concern taken into consideration, when our PV simulator was developed, is the ability to represent different commercially available components used in PV plants. For validation purposes, the parameters that are chosen for Dataset generation match the ones from the available system described in Section 3.
The electrical circuits of the simulator run on PSIM (Power System Simulation software), while the environmental variables are controlled by a Simulink/MATLAB script. The implemented simulation blocks can be observed in Figure 7, where $S_{\{1,2\}}$ models the PV strings, each configured with eight PV modules of 330 W, which are individually simulated using the single-diode model [33], taking the cell's irradiance and temperature as inputs. The parameters of the model were chosen to match the PV panel specification from Table 1, as detailed in [21]. Next, $G$ represents the simulated system irradiance, while $T$ represents the simulated module temperature.
Each $S_s$ block outputs a simulated voltage $V_{dc,s}$ and a simulated current $I_{dc,s}$, which are inputs for a fixed-voltage-output boost converter ($B_s$), which implements a perturb and observe MPPT algorithm [47] (with $s \in \{1, 2\}$). Finally, the regulated output is fed into a full-bridge inverter ($J_1$) that converts the DC bus to a 127 V, 60 Hz, single-phase output connected to a simulated utility grid.
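For readers unfamiliar with the single-diode model, the sketch below solves its implicit current equation, I = I_ph - I_0 (exp((V + I R_s) / (n N_s V_t)) - 1) - (V + I R_s) / R_sh, for one module by bisection. It is a generic textbook form written in Python, not the PSIM block used by the simulator. All default parameter values are illustrative placeholders rather than the fitted values derived from Table 1, and the temperature dependence of the saturation current is deliberately ignored.

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def module_current(v, g, t_cell_c, i_ph_ref=9.45, i_0=5e-8, r_s=0.3,
                   r_sh=300.0, n_ideality=1.3, n_cells=72):
    """Current of one module at voltage v, irradiance g [W/m^2], cell temperature [C].

    Defaults are hypothetical. Valid for 0 <= v <= open-circuit voltage,
    where the bisection bracket below is guaranteed to contain the root.
    """
    i_ph = i_ph_ref * g / 1000.0                           # photocurrent scales with irradiance
    v_t = K_BOLTZMANN * (t_cell_c + 273.15) / Q_ELECTRON   # thermal voltage
    a = n_ideality * n_cells * v_t

    def residual(i):
        v_d = v + i * r_s                                  # voltage across the diode branch
        return i_ph - i_0 * (math.exp(v_d / a) - 1.0) - v_d / r_sh - i

    lo, hi = -i_ph_ref, i_ph          # residual decreases monotonically in i
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


i_sc_full = module_current(0.0, 1000.0, 25.0)   # short-circuit current at 1000 W/m^2
i_sc_half = module_current(0.0, 500.0, 25.0)    # roughly half at 500 W/m^2
i_loaded = module_current(30.0, 1000.0, 25.0)   # lower current at a higher voltage
```

Since the eight modules of a string are connected in series, a string-level curve under uniform conditions can be approximated by keeping the current and multiplying the module voltage by eight.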
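The perturb and observe strategy can be summarised in a few lines. The Python sketch below is a generic version of the algorithm and makes two assumptions: the operating voltage is perturbed directly (a real boost converter would adjust its duty cycle), and the power curve `toy_power` is a hypothetical parabola with a single maximum, standing in for the string's actual power-voltage characteristic.

```python
def perturb_and_observe(power_at, v_start, step=0.5, n_iter=200):
    """Climb toward the maximum power point by perturbing the operating voltage."""
    v = v_start
    p_prev = power_at(v)
    direction = 1.0
    for _ in range(n_iter):
        v += direction * step          # perturb
        p = power_at(v)                # observe
        if p < p_prev:                 # power dropped: reverse the perturbation
            direction = -direction
        p_prev = p
    return v


def toy_power(v):
    """Hypothetical power curve with a single maximum at 290 V."""
    return 2500.0 - 0.5 * (v - 290.0) ** 2


v_mpp = perturb_and_observe(toy_power, v_start=250.0)
```

Once the maximum is reached, the operating point keeps oscillating within one step of it, which is the well-known steady-state behavior of this method; a smaller step reduces the oscillation at the cost of slower tracking.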
Additionally, some auxiliary elements were introduced to simulate the considered system faults: switches that model open circuit faults, resistors that model string degradation, a variable that models partial shadowing, and switches that model short-circuit faults. Details of these elements can also be observed in [21].

The training dataset was generated using the simulator by covering each of the five operational conditions for the whole range of temperature ($T$) and irradiance ($G$). The lower temperature bound was set to −5 °C, which is compatible with the historical minimum for the installation site (Curitiba, Brazil), while the upper bound was set to +85 °C, which is the maximum operational temperature of the PV module. The irradiance range was set from 100 W/m² to 1000 W/m², which spans from the point at which the real inverter starts to operate to the peak generated power. The temperature range was simulated in 19 steps of 5 °C each, while the irradiance was simulated in 19 steps of 50 W/m², for each of the four considered faults. For the shadowing fault, four different cases were simulated, each with a set amount of shade varying from 5% to 15% of a string, which is the typical shadowing observed in the PV plant. This setup resulted in 361 sample cases per string, with 5054 samples in total.
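The 361 cases per string correspond to the Cartesian product of the temperature and irradiance steps. A minimal Python sketch of this grid, with variable names chosen for this example, is:

```python
from itertools import product

# Temperature from -5 to 85 degrees C in 5 degree steps (19 values).
temperatures_c = list(range(-5, 90, 5))

# Irradiance from 100 to 1000 W/m^2 in 50 W/m^2 steps (19 values).
irradiances_w_m2 = list(range(100, 1050, 50))

# Every (temperature, irradiance) operating point simulated for one condition.
grid = list(product(temperatures_c, irradiances_w_m2))
```

Each pair in `grid` is one simulated operating point for a given fault condition and string.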

Fault Detection
The fault detection technique described in this section is based on modelling the photovoltaic system dynamics, and uses the match between the real system and the model as a metric of proper operation of the system. In this context, it is important to clarify two concepts: (i) system, defined as a confined arrangement of mutually affected entities [48]; and (ii) model, defined as a mathematical representation of such a system [49]. The next section gives more details about the PV modeling process.

System Identification Model
In this work, the methodology used to obtain proper models for real systems was based on system identification. In this context, one can define system identification as a method of determining the mathematical description of a system by processing its observed inputs and outputs [50]. Generally, a model obtained by this technique describes a system more accurately than models based only on physical laws [48]. In the scenario of this work, such models can be applied to detect malfunctions of a specific system [51].
It is important to mention that only the real dataset (detailed in Section 4.1) was employed in order to estimate the detection model parameters.

Proposed Fault Detection Method
The idea behind using dynamic models to detect faults is based on the mismatch between the output of the real system, the instantaneous power $p_{dc,s}(k)$, $s \in S$, and its estimate given by the model's output, $\hat{p}_{dc,s}(k)$. More specifically, in a photovoltaic array, the model represents the system's output only if the system is operating without malfunctions. If there is a fault, the model and the PV plant operate with different dynamics. Consequently, the residue $e_s(k)$, given by $e_s(k) = p_{dc,s}(k) - \hat{p}_{dc,s}(k)$, increases as time goes by. If the signal $e_s(k)$ reaches an amplitude greater than the limit established by the Adaptive Threshold block, one can conclude that there is a fault, $f(k)$, in the power plant. Figure 8 shows the general idea of this process.

[Figure: block diagram of the fault detection process, with the signals g(k) and p_dc,s(k) and the Recursive Estimation and Adaptive Threshold blocks.]

In this work, the dynamics of the underlying system were approximated by a linear Auto-Regressive with eXogenous input (ARX) model, expressed by

$$\hat{p}_{dc,s}(k) = a_1 \hat{p}_{dc,s}(k-1) + a_2 \hat{p}_{dc,s}(k-2) + b_0 g(k) + b_1 g(k-1), \quad (1)$$

or, applying the Z transform, by the system's transfer function

$$\frac{P(z)}{G(z)} = \frac{b_0 + b_1 z^{-1}}{1 - a_1 z^{-1} - a_2 z^{-2}}, \quad (2)$$

in which $G(z) \in \mathbb{C}$ is the Z transform of the irradiance, $g(k)$, and $P(z) \in \mathbb{C}$ is the Z transform of the output power, $\hat{p}_{dc,s}(k)$, both in the frequency domain. The parameter vector $\theta = [a_1(k) \; a_2(k) \; b_0(k) \; b_1(k)]^T \in \mathbb{R}^4$ was estimated using a Recursive Least Squares (RLS) algorithm [52]. As the system dynamics can vary over time (after a fault or a degradation process), the evolution of these parameters can be seen in Figure 9.

It is worth mentioning that there are two strings in the underlying photovoltaic system. Thus, in order to detect faults in each string, there are two models as shown in Figure 8 (one for each string). Additionally, in each model, the parameters $\theta = [a_1(k) \; a_2(k) \; b_0(k) \; b_1(k)]^T$ are approximately constant during normal system operation. The discontinuities shown in Figure 9 (dashed red lines) represent possible faults in the photovoltaic system.

Table 3 shows the correlations calculated between several possible input variables and the power output of each string of the underlying system, using real data collected from the PV system and the meteorological station. Taking that into account, one can see that both input variables, irradiance (G) and PV temperature (T), present a high correlation with the output power in String 1 ($P_{dc,1}$), and the same holds for the output power in String 2 ($P_{dc,2}$).
Therefore, the natural choice for input variables would be: (i) irradiance (G); and (ii) PV temperature (T). However, for simplicity, only the irradiance G was chosen as the input variable, because of its higher correlation: 0.96. Furthermore, the linear ARX model was chosen for the following reasons:
• ARX linear models are among the simplest dynamic models. Thus, their computational implementation is less time consuming, which is ideal for the implementation of monitoring systems in the photovoltaic area;
• The ARX model represents a close approximation to the real system, as detailed in Section 7.
It is important to mention that the proposed model is not the only one. In a previous work, the authors have already investigated Diode Models and Hammerstein-Wiener models, applied to the same problem [17]. The comparisons will be presented in Section 7.
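To make the recursive estimation of the four ARX parameters more concrete, a minimal RLS sketch with a forgetting factor is given below. It is only an illustration under stated assumptions: the regressor uses the measured past outputs (a common choice for ARX identification), the initial covariance `delta` and the synthetic data are hypothetical, and none of the names come from the authors' implementation.

```python
import random

def rls_arx(g, p, lam=0.99, delta=1000.0):
    """Recursively estimate theta = [a1, a2, b0, b1] from irradiance g and power p.

    Returns the final parameter vector and the one-step-ahead predictions.
    """
    n = 4
    theta = [0.0] * n
    # Covariance matrix initialised as delta * identity (assumed initial condition).
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    predictions = []
    for k in range(2, len(p)):
        # Regressor built from measured past outputs and current/past input (assumption).
        phi = [p[k - 1], p[k - 2], g[k], g[k - 1]]
        y_hat = sum(t * x for t, x in zip(theta, phi))
        predictions.append(y_hat)
        err = p[k] - y_hat
        P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
        gain = [v / denom for v in P_phi]
        theta = [theta[i] + gain[i] * err for i in range(n)]
        # P is symmetric, so phi^T P equals (P phi)^T.
        P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)] for i in range(n)]
    return theta, predictions

def simulate_arx(theta, g):
    """Generate a noise-free output sequence from known ARX parameters."""
    a1, a2, b0, b1 = theta
    p = [0.0, 0.0]
    for k in range(2, len(g)):
        p.append(a1 * p[k - 1] + a2 * p[k - 2] + b0 * g[k] + b1 * g[k - 1])
    return p

# Synthetic check: the estimator should recover hypothetical, known parameters.
rng = random.Random(0)
g_demo = [rng.random() for _ in range(300)]
true_theta = [0.5, -0.2, 1.0, 0.3]
p_demo = simulate_arx(true_theta, g_demo)
theta_hat, _ = rls_arx(g_demo, p_demo, lam=1.0)
```

With `lam` below 1 the estimator discounts old samples, which is what allows the parameters to follow a system whose dynamics change after a fault.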
In order to detect faults using the mismatch between the system and the model, an adaptive threshold was employed, as can be seen in Figure 8. This block is composed of a recursive mean, expressed in (3), and a recursive variance, expressed in (4), both modulated by a forgetting factor λ [51], in which $\bar{e}(k) \in \mathbb{R}$ is the mean of the residues at instant $k \in \mathbb{Z}$, $\lambda \in [0, 1]$, and $\sigma^2(k) \in \mathbb{R}$ is the variance of the residues. An event is considered a fault if the absolute value of the residue, $|e(k)|$, exceeds the threshold $\bar{e}(k) \pm \sigma^2(k)$. The value of the forgetting factor was selected empirically, aiming at minimizing the Mean Square Error (MSE) between $f(k)$ and a benchmark signal, $b_f(k) \in B = \{0, 1\}$, which expresses the evolution of the manipulated faults over time. In other words, $b_f(k) = 0$ means that there is no fault in the system, while $b_f(k) = 1$ represents a manipulated fault occurring.
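The adaptive threshold can be sketched as follows. Since the exact recursions (3) and (4) are not reproduced in this text, the sketch assumes a common exponentially weighted form for the recursive mean and variance, and it compares each residue against the band built from past samples only; both choices, as well as the initial values, are assumptions.

```python
def adaptive_threshold(residues, lam=0.95, mean0=0.0, var0=1.0):
    """Return a list of 0/1 fault flags, one per residue sample."""
    mean, var = mean0, var0
    flags = []
    for e in residues:
        # Flag the sample if it leaves the band mean +/- variance (assumed test).
        flags.append(1 if abs(e - mean) > var else 0)
        # Exponentially weighted recursive mean and variance (assumed form).
        mean = lam * mean + (1.0 - lam) * e
        var = lam * var + (1.0 - lam) * (e - mean) ** 2
    return flags

# Small residues around zero followed by one large deviation.
demo_residues = [0.1, -0.1] * 10 + [5.0]
demo_flags = adaptive_threshold(demo_residues)
```

A forgetting factor close to 1 makes the band adapt slowly, so a sudden change in the residue stands out before the statistics absorb it.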

Detection Metrics
In order to quantify the results obtained by the fault detection process, some metrics were employed in this work. However, before defining performance metrics for this process, it is important to define the following concepts:
• TP (True Positive): occurs when the detection process points out a real fault in the photovoltaic system;
• TN (True Negative): occurs when there is no fault in the photovoltaic system, and the fault detection system confirms that;
• FP (False Positive): occurs when the photovoltaic system presents no fault, and the fault detection system points out a fault; and,
• FN (False Negative): occurs when the photovoltaic system presents a fault and the detection system does not signal it.
Based on this, one can define the following performance metrics for the proposed fault detection system:
• Accuracy (A): corresponds to the overall detection efficiency;
• Precision (P): stands for the rate between positive indicators;
• Sensitivity (S): evaluates the efficiency of classifying correct detections;
• Specificity (E): evaluates how efficiently the classifier identifies incorrect detections.
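For reference, the sketch below computes the four metrics using the standard confusion-matrix definitions, which are assumed here because the formulas are not reproduced in this text. The counts are the ones quoted for the ARX model in the fault detection results; the function name is hypothetical.

```python
def detection_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics (assumed definitions)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Counts quoted in the text for the ARX-based detector.
arx = detection_metrics(tp=181658, tn=298641, fp=10612, fn=25047)
arx_accuracy_percent = round(100 * arx["accuracy"], 2)
```

With these counts the accuracy evaluates to about 93.09%, in agreement with the reported value. The remaining three values depend on which errors are counted as FP and which as FN, so they are not asserted here.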

Fault Classification
Whenever a fault is detected at a given time ($f(k) = 1$), the fault classification block is responsible for indicating to the user the most probable cause of the abnormal operation. For this task, we evaluated the accuracy of four of the most common supervised machine learning methods. The variables chosen as inputs for these algorithms are the ones that describe the behavior of the DC side of the PV plant, where the faults occur, forming a feature vector FV(k). The classification system may be represented as a mapping function $h : FV(k) \to \Psi$, where $\Psi = \{L_{short}, L_{open}, L_{degradation}, L_{shadowing}\}$ represents the set of the four considered faults. Every considered method uses a training procedure to construct $h$. A brief description of each method and of the training procedure is given in the next paragraphs.
The first method tested was k-Nearest Neighbors [53], in which the current feature vector of the PV plant is associated with the one in the training set that presents the most similar characteristics (closest in terms of Euclidean distance). This method is very simple to implement, but has the disadvantage of requiring the complete training set to perform the classification.
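A minimal sketch of this nearest-neighbor rule is shown below, using a majority vote among the k closest training vectors. The two-dimensional feature vectors and the labels are hypothetical toy data, not the six-variable feature vector used in the plant.

```python
import math
from collections import Counter

def knn_classify(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy training set with two well-separated clusters (hypothetical values).
toy_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
toy_y = ["open", "open", "open", "short", "short", "short"]

label_a = knn_classify(toy_x, toy_y, (0.2, 0.1))
label_b = knn_classify(toy_x, toy_y, (5.5, 5.2))
```

The whole training set is scanned for every query, which is the memory and runtime cost mentioned above.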
For this reason, a still simple but less memory-intensive method was considered: Decision Trees [53]. In this method, a tree structure is formed in which each class is contained in a leaf node. For classification, this tree is traversed from the root to the leaves, with each step being guided by binary decisions based on different input features. As an advantage, the classification procedure is very simple, but building an efficient, unbiased decision tree is not always feasible. Furthermore, a slight change in the classes may require a complete tree rebuild, which motivates the search for more efficient classification algorithms.
Another widely used classification method is Support Vector Machines [54], which operates by creating a multidimensional vector space in which each feature vector is represented as a point. The training procedure consists in determining a hyperplane such that the examples of the separate classes are divided by a margin that is as wide as possible. The solution to this problem can be found with convex optimization techniques.
The last compared method was artificial neural networks, which consist of a parallel distributed signal processor that has the capacity to store knowledge using a learning algorithm. In this work, we used the Multi-Layer Perceptron (MLP) network [53], which is a feedforward architecture with an input layer, a single hidden layer, and an output layer that corresponds to the assigned class (label). The learning process is based on the backpropagation algorithm [53], which basically consists in a method to estimate the gradient of the training error cost function along the layers of the network, allowing the use of a gradient descent-based method to optimize and estimate the parameters. All of the mentioned algorithms were trained using the procedure described in the next section.

Training Procedure
The training and test procedures for fault classification are depicted in Figure 10. First, the training data, composed only of simulated data generated with the parameters presented in Section 4.2, are used to train the supervised machine learning methods, generating the mapping function h. Subsequently, h is applied to labeled points of the real dataset, as presented in Section 4.1. The partitioning generated by the mapping function is then compared to the actual label partitioning, so that the system accuracy can be determined.
In the training procedure, a technique called cross-validation [53] was used with the training data to set the parameters of the classification methods. In this technique, the training dataset is partitioned into two groups: the first one is used for training, while the second is used for classification accuracy evaluation (Section 6.2). The process can be repeated several times, using different data to compose each partition; attention must be paid to include a fair proportion of each classification label in the training and validation groups. The final result is the average of all repetitions.
In our case, we partitioned the simulated dataset using 80% of the data for training and the remaining 20% for validation, repeating the process 10 times with different data composing the partitions. This cross-validation process was run with varying values for each method's parameters, and we chose the values that yielded the best average accuracy among the tests.
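The repeated 80/20 partitioning with a fair proportion of each label can be sketched as below. This is a minimal illustration: the function names, the seeding scheme, and the `evaluate` callback (which would train a classifier on the training indices and return its validation accuracy) are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices so that each class keeps the same train/validation ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        val_idx.extend(indices[cut:])
    return train_idx, val_idx

def repeated_holdout(labels, evaluate, repetitions=10):
    """Average the score returned by evaluate(train_idx, val_idx) over several splits."""
    scores = []
    for rep in range(repetitions):
        train_idx, val_idx = stratified_split(labels, seed=rep)
        scores.append(evaluate(train_idx, val_idx))
    return sum(scores) / len(scores)

# Toy labels: two balanced classes of 50 samples each.
toy_labels = ["a"] * 50 + ["b"] * 50
toy_train, toy_val = stratified_split(toy_labels)
toy_score = repeated_holdout(toy_labels, lambda tr, va: len(va) / len(toy_labels))
```

Running `repeated_holdout` once per candidate parameter value and keeping the value with the highest average score reproduces the selection logic described above.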
For k-Nearest Neighbors, the parameter k, which is the amount of nearest neighbors considered, was varied in the set {2, 3, . . . , 99, 100} for the cross-validation application. Regarding Decision Tree, two parameters were evaluated: Maximum Leaf Nodes and Maximum Depth. The boundaries of the variation were {2, 3, . . . , 99, 100} for both parameters. For Support Vector Machine, the best results were achieved using the Gaussian kernel [54], which contains two parameters to be varied in the cross-validation process: the regularization parameter C (searched in {0.5, 1, 2, 5, 10, 15, 20, 25}), which is responsible for the balance between model complexity and goodness of fit to the training data and the kernel parameter γ (searched in {0.1, 0.5, 1, 2, 5, 10, 15, 20, 25}). Finally, the ANN was built with six input units, four output units and a single hidden layer with its size as the sole hyperparameter used. The hidden layer uses the Rectified Linear Unit as activation function, while the output layer uses the softmax (normalized exponential function). Additionally, the optimization solver used was the Adaptive Moment Estimation, or ADAM, as presented in [55]. The search for the hidden layer size, via cross-validation, was made in the interval {5, 6, . . . , 29, 30} neurons.
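The ANN structure described above (six inputs, one ReLU hidden layer, four softmax outputs) can be illustrated with the forward pass below. The weights are random and untrained, the hidden layer size of 10 is just one value inside the searched interval, and the ADAM training step is not shown; all names are hypothetical.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def dense(weights, bias, x):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def mlp_forward(x, w1, b1, w2, b2):
    hidden = relu(dense(w1, b1, x))
    return softmax(dense(w2, b2, hidden))

LABELS = ["short", "open", "degradation", "shadowing"]
rng = random.Random(0)
n_in, n_hidden, n_out = 6, 10, 4
w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

probs = mlp_forward([0.1] * n_in, w1, b1, w2, b2)
predicted = LABELS[probs.index(max(probs))]
```

The softmax output can be read as one score per fault class, and the class with the largest score is taken as the assigned label.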

Classification Metric
For the overall performance assessment of the classification stage, one can construct a confusion matrix with J classes, where J equals the number of faults. Using this confusion matrix, the classification accuracy over all classes can be calculated from the individual performances per class. The individual performance is computed as the number of correctly classified examples divided by the total number of examples of that class. Subsequently, the average of the individual performances is used to reduce the impact of class imbalance on the final result, allowing for a more adequate comparison among all methods.
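The per-class averaging can be sketched as follows, assuming that each row of the confusion matrix holds the examples of one true class (the row/column convention is an assumption). The toy matrix is hypothetical and only shows why the average of class accuracies differs from the plain accuracy under class imbalance.

```python
def average_class_accuracy(confusion):
    """Return per-class accuracies and their unweighted mean."""
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return per_class, sum(per_class) / len(per_class)

# Toy imbalanced case: 100 examples of the first class, 10 of the second.
toy_confusion = [[90, 10],
                 [5, 5]]
toy_per_class, toy_average = average_class_accuracy(toy_confusion)
toy_plain_accuracy = (90 + 5) / 110
```

Here the plain accuracy is about 0.86, dominated by the larger class, while the average of class accuracies is 0.70.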

Results and Discussion
In this section, all of the results are presented and discussed. First, the results of individual fault detection are presented, comparing recursive approaches and indicating the most appropriate model proposed in this work. Next, the validation of the simulated data is presented, in order to support the use of simulated data in the context of classification. Then, the results of individual fault classification are presented for different machine learning models. Subsequently, a combination of the best fault detection and classification models is presented, followed by a comparison with state-of-the-art models. Finally, the results of the MS with integrated fault detection and classification are shown.

Model Results
This section presents the results of the fault detection process based on photovoltaic models and adaptive thresholds. Figure 11 shows, graphically, the performance of the Single-Diode Model (SDM), Double-Diode Model (DDM), ARX model, and Hammerstein-Wiener Model (HWM) compared with the real system output. Table 4 presents a quantitative comparison between the different possible models. This analysis is based on the Normalized Root Mean Square Error (NRMSE). Based on Table 4 and on Figure 12, it is possible to conclude that the ARX model outperformed the diode models, reaching an NRMSE of 0.82 (for more details about the DDM, refer to [56]). However, the Hammerstein-Wiener model (for more information, see [17]) reached an NRMSE of 0.84. It is worth mentioning that, in this part of the analysis, only data representing the system without faults were employed in the parameter estimation of the models. The DDM was expected to surpass the SDM in terms of NRMSE. However, in these experiments using real data, the SDM achieved a higher NRMSE than the DDM, 0.77 against 0.76. The authors consider that this happened because of the sub-optimal parameter estimation employed in the DDM.
The performance of the ARX model may be explained by two reasons: (i) the ARX model is dynamic by definition, while the diode models are static and have difficulty representing transient events in the underlying system; and (ii) the ARX parameter estimation is based on the RLS algorithm, which is better suited to modeling a time-varying system, or a system operating under a fault.

Fault Detection Results
This section describes the results of applying the models, as mentioned in Section 7.1.1, in order to detect faults in photovoltaic systems. Figure 12 shows an overlay of the fault benchmark function $b_f(k)$ and the detected fault, $f(k)$, based on: (a) ARX model; (b) DDM; (c) SDM; and (d) Hammerstein-Wiener model, all of them followed by an adaptive threshold. Tables 5-8 show the raw data used to calculate the overall efficiency of the proposed fault detection process (Table 9). However, some inferences can be drawn even from the raw data. Among the tested cases, the ARX model (Table 7) presents: (i) the best ability to point out correctly when the system is operating normally (298,641 detections); (ii) the best capacity to discern when the system is operating under a fault (181,658 detections); (iii) the lowest number of false positives (10,612 detections); and (iv) the lowest number of false negatives (25,047 detections). These characteristics and the performance comparison among the models are summarized in Table 9. Based on Table 9, one can conclude that the ARX model approach outperforms the three other models in all analyzed statistical properties. It is important to point out that the overall accuracy offered by the ARX model is 19.64% greater than that of the second best model (DDM).
It is also worth mentioning that the fault detection (based on the ARX model) is robust in signaling that the photovoltaic system is not in a fault state, presenting a specificity of 92.26%. On the other hand, its precision is under 90%, meaning that the detection system can indicate faults erroneously in 12.12% of the cases (considering a precision of 87.88%).
It is important to mention that the HWM was tested in fault detection, in spite of its time invariant nature, which is not advisable for detecting a fault, and its higher computational cost (if compared with the others). The SDM and DDM were tested, because they are benchmarks largely employed in the literature (as presented in Sections 2.3 and 4.2). Furthermore, all of the mentioned models were coupled with an adaptive threshold, in order to carry out the fault detection process.
The better performance achieved by the ARX model in fault detection may be due to the following: (i) it is a dynamic model, which is an advantage over static diode models; (ii) being dynamic, ARX models can describe the system behavior in both transient and steady state; and (iii) the ARX parameter estimation is based on the RLS algorithm, which is better suited to modeling a time-varying system (surpassing time-invariant models, such as Hammerstein-Wiener, in fault detection).

Validation of Simulated Data
In Section 4, we proposed a hybrid approach for generating the training dataset. Some preliminary tests were made in order to guarantee that the simulated system accurately emulates the real system behavior. For these tests, all of the collected irradiance ($g(k)$) and temperature ($t(k)$) pairs from the real dataset (Section 4.1) were used to generate simulated outputs. Table 10 shows the Mean Absolute Percentage Error (MAPE) between real and simulated data for normal operation and for each of the proposed faults except open-circuit, whose output is zero. It can be seen from the results that a maximum MAPE of 2.59% is observed for normal operation, which is sufficiently small not to affect classification accuracy when the system is trained using simulated data only. In addition, an excerpt of the generated data was plotted in Figure 13, which shows a visual agreement between real and simulated data. Because the validity of the proposed simulator is considered to be sufficient, we can proceed to the fault classification tests, which are described in the next section.
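The MAPE used in this validation can be computed as in the sketch below. Samples whose real value is zero are skipped to avoid a division by zero, which is an assumption of this sketch (and is consistent with leaving out the open-circuit case, whose output is zero). The numbers are hypothetical.

```python
def mape(real, simulated):
    """Mean Absolute Percentage Error, in percent, over samples with nonzero real value."""
    pairs = [(r, s) for r, s in zip(real, simulated) if r != 0]
    return 100.0 * sum(abs((r - s) / r) for r, s in pairs) / len(pairs)

# Hypothetical power samples (W): relative errors of 2%, 2%, and 0%.
toy_real = [100.0, 200.0, 400.0]
toy_simulated = [102.0, 196.0, 400.0]
toy_mape = mape(toy_real, toy_simulated)
```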

Fault Classification
We started the assessment of the classification methods by determining which algorithm is the best suited in this stage. Table 11 shows the obtained results, as well as the optimal parameters that are found with the cross-validation method, from Section 6.1. The Table also shows two accuracies: the Train Accuracy was calculated by the classification of the 20% data reserved for validation on the training dataset, whereas the Test Accuracy was calculated using the classification of the real dataset.
It can be observed that the Decision Tree is the least accurate of the classifiers for our use case and the one that presents the most severe performance degradation when applied to real data. Surprisingly, the SVM and k-NN classifiers performed better with real data than with the validation data. Such a result may be explained by the fact that the training set contains extreme temperature values that are not observed in the real data. Those conditions resulted in poor classification results in the training set, but are not included in the test set, since the temperature in the real data is never below 5 °C or above 32 °C. Additionally, the SVM and k-NN classifiers do not outperform the artificial neural network, which presents a very similar performance for the Train and Test datasets.
For this reason, the ANN was the method selected for our system, and its confusion matrix is presented in Table 12. The class accuracy is also shown in Table 12, indicating that the most challenging fault for classification purposes is shadowing. This behavior can be explained by the great variability of shadowing events, since each event may affect the monitored variables in a different way. The overall accuracy is 90.05%.

Combination of Fault Detection and Classification
By combining the proposed fault detection method with the best classifier (ANN), as presented in the previous section, the final confusion matrix shown in Table 13 is obtained. To obtain this result, instantaneous data are first applied to the fault detection model and, for each detected fault, the classifier is used to identify the corresponding fault. As a consequence, the confusion matrix includes the normal class. The average class accuracy for Table 13 is 92.64%. Compared with the individual accuracies of the detection (93.09%) and classification (95.44%) methods, the result of the combined approach is lower. However, it is similar to the result obtained for the detection stage, which demonstrates that most of the detected faults are classified with high accuracy. The class with the most misclassifications is shadowing, with an individual accuracy of 77%, around 15% lower than the class with the second most misclassifications (degradation). This is mainly due to the fact that shadowing is the fault with the most parameters that can vary, since it can occur in different ways, with different intensities, and in different amounts for each cell. Additionally, most of the errors observed in Table 13 are related to the normal class, which indicates the influence of the detection stage. The best class accuracy, on the other hand, is obtained for open-circuit, since it is the fault that causes the largest voltage and current drop and it cancels the generation contribution of an entire string.

Comparison with State-of-the-Art Methods
There are several related works that focus on the detection and classification of faults in PV systems, as presented in Section 2. The lack of a standard protocol to generate and analyze faults, besides the absence of a public dataset, hampers a proper comparison between the proposed method and related works.
However, by selecting works that detect and classify similar faults, we can present a comparison with state-of-the-art models.
The recent works from [39,40] present a two-stage architecture similar to that of this work. Additionally, they use autoregressive models to estimate the expected power output as a function of the current environmental conditions. The difference lies in the fault detection methods, which makes the works complementary. While [39,40] use fuzzy inference models yielding 98.2% accuracy with 16 combinations of shadowing, short circuit and open circuit, they cannot operate without disturbing the normal operation of the system, since the whole system is disconnected to evaluate VxI curves or run the tree search algorithm.
The proposal from [42,45], on the other hand, works when the PV modules are normally operating. The first achieved 100% accuracy when detecting short circuit and open circuit faults while the second yielded overall accuracy of 97.52% for the same faults. This way, we observe a similar performance for the mentioned faults when compared to our proposed methods, which yields 97.22% accuracy for short-circuit and 98.78% for open circuit faults.
Moreover, we argue that our work is complementary to the mentioned works, since our proposed system is installed in a plant that generates more power (5 kW vs 1.8 kW), is installed in a region with more panel temperature variation (40 °C vs 7 °C), and can detect shadowing and degradation faults besides different short/open circuit conditions. In addition, we achieve 92.64% of overall accuracy (detection and classification) while presenting some advantages that can be summarized as follows:
• it allows real-time fault detection and classification (detection and classification are performed every second), keeping the PV plant in operation (without disconnection);
• shadowing events are caused by real shadowing, which is difficult to characterize, unlike the controlled shadowing presented in [11,38], which normally increases the performance for that class; and,
• it presents a performance comparable to that of other works, notwithstanding that those works use different databases and classification procedures.

Monitoring System
Finally, this section describes the implemented system, which collects all of the data and makes them available in an execution environment that runs the fault detection and fault classification algorithms. The calculated results are indicated as the current status of the PV plant. Figure 14 shows one of the HMI screens, namely the real-time tab, in which all instantaneous and integrated monitored signals, both electrical and environmental, can be viewed together with the inferred system status. The layout shows the generation of the PV strings on the left, the inverter and its parameters in the center, and the output of the two phases on the right. The upper and right-hand strips show the environmental conditions and performance. In this tab, pre-built and user-configurable alarms can also be set. The HMI also implements a performance evaluation tab and a history tab, in which a configurable graph of the chosen signals can be plotted in a chosen time window. Because the system is a test bench for monitoring strategies, further details are reported in other work [8]. Regarding the execution time of the detection and classification methods in the embedded system, the ARX model takes 0.25 ms to run and provide the detection result, whilst the ANN classifier takes 13.60 ms to provide the final classification, resulting in 13.85 ms for the whole process. This is well below the 1 s time resolution used in the detection and classification modules, which allows the proposed online identification. Additionally, since the detection stage is less time consuming than the classification, it can be executed inside the main loop (1 s), triggering the classification model only in the case of possible faults and releasing the embedded system to perform the other monitoring-related tasks.
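The scheduling idea described above (run the detector on every 1 s cycle and invoke the classifier only when a fault is flagged) can be sketched as follows. The timing constants are the values reported in the text; the function and callback names are hypothetical and stand in for the embedded implementation.

```python
DETECTION_MS = 0.25        # reported ARX detection time
CLASSIFICATION_MS = 13.60  # reported ANN classification time
LOOP_PERIOD_MS = 1000.0    # 1 s main loop

def monitoring_step(sample, detect, classify):
    """Run detection on every cycle; classify only when a fault is flagged."""
    if detect(sample):
        return classify(sample)
    return "normal"

# Worst case: both stages run within the same cycle.
worst_case_ms = DETECTION_MS + CLASSIFICATION_MS
budget_used = worst_case_ms / LOOP_PERIOD_MS
```

Even in the worst case, the two stages take 13.85 ms, which is under 1.4% of the loop period, leaving the rest of the cycle for the other monitoring tasks.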

Conclusions
Maintaining continuous energy production in PV systems is a recurring subject in power utilities. It has attracted attention from the academic community, particularly in the context of proposing mitigation techniques and automatic analysis of possible production deviations in PV plants.
In this work, we presented an MS that provides measurements of electrical and environmental variables, allowing the recording of instantaneous and historical data and the estimation of parameters that are related to the plant performance. Integrated into this MS, we proposed an online detection and classification of faults, namely short-circuit, open-circuit, partial shadowing, and degradation. The complete system is installed in a 5 kW PV plant and was validated considering 16 days with faults in different conditions.
Regarding fault detection, we proposed a recursive linear model to detect faults in the system, using the irradiance on the PV panel as the input signal and the power as the output. The accuracy of the fault detection was 93.09%, which is considerably higher than that of other models normally used in the literature, such as the Single and Double-Diode models. In terms of classification, different machine-learning-based methods were compared, and the best accuracy was observed for an ANN model, with 95.44%.
Additionally, simulated (and validated) data were used to train the machine learning models, which made it possible to generate different fault conditions and increased the generalization of the models.
By combining detection and classification, the overall accuracy was 92.64%. Such a result can be highlighted as relevant, since other state-of-the-art methods present comparable performance and do not present the integration of a fault detection and classification approach with an embedded PV plant monitoring system, allowing the online identification and classification of different PV faults, real-time and historical monitoring of electrical and environmental parameters of the plant. In addition, by making our real dataset publicly available, we intend to contribute to standardizing methods for comparing fault detection and classification methods.
As future work, the authors intend to include the analysis of multiple simultaneous faults, besides adding more typical faults that can occur in the system, such as MPPT faults and other types of short-circuits. As a final validation of the proposed models, the system will also be integrated into different PV plants, some with higher rated power and some installed in different geographical regions that are subject to different environmental conditions. In this way, the generalization capabilities of our system can be further reinforced.