Complex Network Analysis of Photovoltaic Plant Operations and Failure Modes

Abstract: This paper presents a novel data-driven approach, based on sensor network analysis in photovoltaic (PV) power plants, to unveil hidden precursors of failure modes. The method is based on the analysis of signals from PV plant monitoring, and advocates the use of graph modeling techniques to reconstruct and investigate the connectivity among PV field sensors, as is customary for Complex Network Analysis (CNA) approaches. Five months of operation data are used in the present study. The results show that the proposed methodology is able to discover specific hidden dynamics, also referred to as emerging properties in a Complexity Science perspective, which are not visible in the observation of individual sensor signals but are closely linked to the relationships occurring at the system level. The application of exploratory data analysis techniques to those properties demonstrated, for the specific plant under scrutiny, potential for early fault detection.


Introduction
The cumulative global photovoltaic (PV) capacity has been growing exponentially over recent years. In the decade 2005-2015, the solar PV generation capacity in the EU increased from 1.9 GW to 95.4 GW [1]. Notwithstanding this, European PV market conditions are still substantially dependent on regional energy policies and public subsidies for renewable energies. As for the Italian market, in June 2013 the Italian public company GSE (Gestore dei Servizi Energetici) officially announced the discontinuation of the last Feed-in-Tariff incentive after its cap of 6700 million euro was reached [2]. The end of such subsidies has focused new attention on PV plant performance management, lifetime and availability, with a view to reducing operating and maintenance costs [3].
Faults in PV plant components (i.e., modules, converters, connection lines), in addition to downtime-related penalties, could accelerate system aging and, as a consequence, reduce power plant reliability [4]. Typically, fault severity depends on various factors, including the time to detect, the time to repair or substitute, COE, and occurrence over time, all of which can have a significant impact on profitability [5]. For these reasons, over the last decade Fault Detection and Diagnosis (FDD) in PV systems has been established as a critical field of research [6]. To mention but a few areas, research has addressed key topics like real-time monitoring, partial shading effects analysis, and estimation of the natural degradation rate over time and of the residual life of solar panels [7]. In the FDD arena, a number of studies have proposed data-driven fault detection algorithms based on statistical processing of performance parameters (e.g., power loss factor analysis studies, I-V output characteristics [8][9][10][11]) or on an exponentially weighted moving average control chart [12]. In addition, model-based techniques, implementing dynamic fault trees [13] or

Methodology
The field sensors are modeled using graph theory as a complex network, where the nodes represent the signals from the sensors and the edges are evaluated with non-linear statistical correlation functions applied to the time series pairs. Specifically, the data flow is structured according to the following steps:
a. data collection and pre-processing,
b. connectivity analysis and graph modeling,
c. pattern recognition.
At the initial stage (a), heterogeneous data acquired by the sensors are synchronized and cleaned by removing outliers. Then, for stage (b), a fully-connected graph is created in which each monitored variable represents a node and all the nodes are connected to each other with directed edges. The output of this phase is a complete weighted graph model (also called functional graph) of the sensor network. Afterwards, some typical complex network measurements are applied to extract synthetic properties from the functional graphs. Finally (c), these measurements are used as input for exploratory network analysis, with the aim of grouping the data as a function of multiple topological graph metrics.
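Stage (a) can be illustrated with a minimal sketch. The paper does not specify how synchronization and outlier removal were performed, so the choices below are assumptions made only for illustration: signals are aligned on the timestamps common to all sensors, and a sample is treated as an outlier when it lies more than `threshold` median absolute deviations from the median of its signal. The function names (`synchronize`, `outlier_mask`, `clean`) are hypothetical, and a real PV data set would likely need a rule that accounts for the day/night cycle.

```python
from statistics import median


def synchronize(series_by_sensor):
    """Keep only the timestamps that every sensor has recorded.

    series_by_sensor maps a sensor name to a {timestamp: value} dict.
    Returns the sorted common timestamps and one aligned list per sensor.
    """
    common = set.intersection(*(set(s) for s in series_by_sensor.values()))
    timestamps = sorted(common)
    aligned = {name: [s[t] for t in timestamps]
               for name, s in series_by_sensor.items()}
    return timestamps, aligned


def outlier_mask(values, threshold=5.0):
    """Flag samples far from the median (assumed rule, not from the paper).

    A sample is flagged when its distance from the median exceeds
    `threshold` times the median absolute deviation (MAD).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case (e.g., a constant signal): flag nothing.
        return [False] * len(values)
    return [abs(v - med) > threshold * mad for v in values]


def clean(timestamps, aligned, threshold=5.0):
    """Drop every time step at which any sensor shows an outlier."""
    masks = [outlier_mask(v, threshold) for v in aligned.values()]
    keep = [i for i in range(len(timestamps))
            if not any(mask[i] for mask in masks)]
    kept_timestamps = [timestamps[i] for i in keep]
    kept = {name: [v[i] for i in keep] for name, v in aligned.items()}
    return kept_timestamps, kept
```

Dropping the whole time step, rather than only the offending value, keeps all signals at the same length, which the pairwise analysis of stage (b) requires.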
The proposed methodology has been implemented in Python version 3.5 [37], using the Scikit-learn and NetworkX packages [38,39].

Complex Network Analysis
CNA has its origins in graph theory and is used to describe the properties of complex systems through the mathematical study of networks. The key ingredient in CNA is the study of the correlation among univariate signals recorded from different sources, as is customary for biological network analysis. For this purpose, the mutual information (MI) has been widely used [40][41][42].
In this paper we propose a novel approach based on the study of correlations between the signals of m heterogeneous sensors within a fixed window of n samples. In particular, given two signals $y_{n,i}$ and $y_{n,j}$, taken from the i-th and j-th components of the feature matrix $Y_{n,m}$, where n spans the time steps, the mutual information MI quantifies the amount of uncertainty in $y_{n,i}$ removed by the knowledge of $y_{n,j}$. This measure essentially tells us how much extra information one gets about one signal by knowing the outcomes of the other one [43][44][45]. At each sample step, a correlation matrix is obtained by computing the MI in the m-dimensional feature space, while sliding the window from the beginning of the data set across the entire monitoring interval. As a consequence, the evolution of the correlation matrix is represented in the form of a functional graph consisting of a set of m nodes linked by k weighted edges representing the connection strength between pairs of nodes.
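The sliding-window construction can be sketched as follows. This is an illustrative sketch, not the authors' code: the paper reports using Scikit-learn but does not state which MI estimator was applied, so a simple equal-width histogram estimator, written with the standard library only, is assumed here. The number of bins and all function names are hypothetical choices.

```python
from collections import Counter
from math import log


def discretize(values, bins=8):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Histogram estimate of MI (in nats) between two equal-length signals.

    Implements sum over bins of p(a, b) * log(p(a, b) / (p(a) * p(b))).
    """
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mi_matrix(window, bins=8):
    """Symmetric m x m matrix of pairwise MI for one window.

    `window` is a list of m signals, each a list of n samples.
    """
    m = len(window)
    matrix = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            matrix[i][j] = matrix[j][i] = mutual_information(
                window[i], window[j], bins)
    return matrix


def sliding_mi(signals, n, bins=8):
    """Yield one MI matrix per window position, moving one sample at a time."""
    total = len(signals[0])
    for start in range(total - n + 1):
        yield mi_matrix([s[start:start + n] for s in signals], bins)
```

Each yielded matrix can be read as the weighted adjacency matrix of the functional graph at that time step. Histogram estimators are sensitive to the bin count and to short windows, so the values should be taken as indicative only.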
The result of this phase is therefore a dynamic graph that contains all the information about the evolution of spatial and temporal relationships between all the entities monitored and allows the definition of parameters quantifying the characteristics of the systems [46].
In the context of CNA, network measurements are the most widely used tools for extracting synthetic information through the analysis of network topology. These measurements can be evaluated over the entire network (e.g., Shortest Path Length, Diameter, Global Efficiency, Modularity, etc.) or they can refer locally to nodes (e.g., Centrality Measures). This type of metric has been applied in several social networking studies to address the problem of identifying and ranking those people who exert an unequal amount of influence on the decision-making of others (also called influencers or opinion leaders) and to study the diffusion of information within the network [47][48][49].
In the present approach, both the network measurements relating to the whole network (global scale) and those specific to individual nodes (local scale) were considered. Specifically, among the global-scale measurements the study uses the Shortest Path Length [50] and the Diameter and Radius of the graph [51]; among the local-scale measurements, it uses the Eccentricity and the Weighted Degree Centrality of the nodes [39][52][53][54].
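The measurements listed above can be illustrated on a small weighted graph. The sketch below is written with the standard library only, whereas the paper reports using NetworkX. It also rests on an assumption the paper does not state: the length of an edge is taken as the reciprocal of its weight, so that strongly correlated sensors are close to each other. The function names are hypothetical.

```python
from math import inf


def weighted_degree(weights):
    """Sum of the edge weights attached to each node (node strength)."""
    return [sum(w for j, w in enumerate(row) if j != i)
            for i, row in enumerate(weights)]


def shortest_paths(weights):
    """All-pairs shortest path lengths via the Floyd-Warshall algorithm.

    Edge length is 1 / weight (an assumption of this sketch); a weight of
    zero is treated as a missing edge.
    """
    m = len(weights)
    dist = [[inf] * m for _ in range(m)]
    for i in range(m):
        dist[i][i] = 0.0
        for j in range(m):
            if i != j and weights[i][j] > 0:
                dist[i][j] = 1.0 / weights[i][j]
    for k in range(m):
        for i in range(m):
            for j in range(m):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def graph_measures(weights):
    """Global and local measures of a weighted graph with at least 2 nodes."""
    dist = shortest_paths(weights)
    m = len(weights)
    # Eccentricity: distance from a node to the node farthest from it.
    eccentricity = [max(d for j, d in enumerate(row) if j != i)
                    for i, row in enumerate(dist)]
    return {
        "weighted_degree": weighted_degree(weights),
        "eccentricity": eccentricity,
        "diameter": max(eccentricity),
        "radius": min(eccentricity),
        "average_shortest_path_length": sum(map(sum, dist)) / (m * (m - 1)),
    }
```

For a three-node graph with weights 2.0, 0.5 and 1.0 on edges (0,1), (0,2) and (1,2), the direct path between nodes 0 and 2 has length 2.0, but the route through node 1 has length 0.5 + 1.0 = 1.5 and is therefore the shortest path.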

Exploratory Data Analysis
Promoted by Tukey [55], exploratory data analysis (EDA) represents a powerful tool to maximize insight into the underlying structure of a complex dataset. It facilitates the understanding of the distribution of samples and simplifies the data analysis, pointing out special observations (outliers), clusters of similar observations, groups of related variables, and crossed relationships between specific observations and variables [56,57]. All this information, in turn, can be very useful for further data modeling and is of paramount importance for improving data knowledge. For these purposes, EDA encompasses a wide range of statistical and graphical tools (e.g., histogram, box plot, Pareto chart, principal component analysis, etc.) which have been commonly used for decades in various research fields such as archeology, biology, anthropology, medicine, chemometrics, etc. [58][59][60].
In this paper, EDA is used to investigate the results of the complex network analysis and compare them with the sensor signals $Y_{n,m}$. In particular, the 3D scatterplot has been used for visual pattern recognition. This tool is based on a feature representation in a multivariable space, allowing us to identify the dependencies between different network measurements in order to discriminate specific operational PV plant clusters.

Case Study
The data were collected from a PV plant with a power of 1 MWp. The solar field is connected to two inverters, each with three conversion blocks. Both inverters are grid-tied, feeding a medium voltage power distribution network. They are equipped with fully independent monitoring systems and incorporate a solar power controller to regulate the Maximum Power Point Tracking (MPPT).
Measurements were recorded every 5 min and included AC and DC electrical parameters, AC power output, ambient temperature and solar irradiance. The monitoring system is shown in Figure 1. In particular, solar radiation on the plane of the modules and ambient temperature have been measured by a pyranometer and a PT-100 RTD sensor respectively, both installed on a sensor box near the panels. Both inverters are equipped with resistive potential divider voltage sensors for DC and AC parameter measurement for each conversion block. Shunt resistors have been used to measure 12 string currents as a representative sample of the plant. The measurements were collected in the period from May 20th to September 15th 2017. The 47 monitored variables which define the m-dimensional feature matrix $Y_{n,m}$ are listed in Table 1. Table 2 provides the specification of the sensors used in the monitoring system. As far as the dataset is concerned, the 47 monitored variables $Y_{n,m}$ are used for the connectivity analysis with a sliding window set to 216 samples (about 18 h) per variable, along the available five months of records.
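The window settings quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the constants come from the text, while the helper `window_count` is hypothetical and assumes that the window advances by one sample at a time.

```python
from datetime import timedelta

SAMPLING_INTERVAL = timedelta(minutes=5)  # one record every 5 min
WINDOW_SAMPLES = 216                      # sliding window length per variable

# Time span covered by one window: 216 samples x 5 min = 18 h.
window_span = WINDOW_SAMPLES * SAMPLING_INTERVAL

# Number of records per variable in one day of monitoring.
samples_per_day = timedelta(days=1) // SAMPLING_INTERVAL


def window_count(total_samples, window=WINDOW_SAMPLES):
    """Number of window positions when sliding one sample at a time."""
    return max(total_samples - window + 1, 0)
```

With these settings each functional graph summarizes 18 h of plant operation, that is, three quarters of the 288 samples recorded per day.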
The interest in inverter 1 is motivated by the observation of the signals revealing the occurrence of a fault in the period between 31st August and 15th September. In particular, a breakdown of switching devices occurred, which caused the failure of the inductor in block A of inverter 1. In this case the internal protection of the inverter automatically reacted by turning off the block involved in the fault. It is worth noting that the protection reacts within the sampling interval.

Results
Focusing on a three-day period at the end of August, the trends of the active power on the 3 blocks (from A to C) of the two inverters, together with the irradiance, are shown in Figure 3 (inverter 2) and Figure 4 (inverter 1) respectively. The plots refer to 30th August (Figures 3a and 4a), 31st August (Figures 3b and 4b) and 1st September (Figures 3c and 4c). In Figures 3a and 4a, the plots for 30th August reveal the typical operation of the PV plant in sunny conditions, where the solar irradiance reaches a peak value of about 950 W/m² at mid-day and the active powers of all the inverter blocks follow its trend.
Looking at the monitored data on 31st August (Figures 3b and 4b), it is possible to infer that the PV plant operates under cloudy weather conditions with variations of solar irradiance. In particular, while inverter 2 (Figure 3b) appears to work in standard conditions, following the solar radiation trend, inverter 1 (Figure 4b) shows, starting from 13:30, the collapse of active power on block A, which then extends to 1st September (see Figure 4c) and gives evidence of fault occurrence.
To give more hints on this fault event, Figure 5 compares the signals $Y_{n,7\text{-}8}$ and $Y_{n,36}$ gathered from inverter 1 block A (Figure 5a) and the corresponding CNA metrics. Specifically, Figure 5b illustrates the behavior of the degree centralities of solar irradiance, DC voltage, AC voltage and string currents.
As confirmed by sensor signals (Figure 5a), at 13:30 the fault has an immediate effect on the string currents, while the voltages drop to zero around 16:30. When looking at complex network topology measurements (Figure 5b), the fault occurrence correlates with the departure of the degree centrality of solar irradiance from those of the voltage-current signals, which remain correlated.
Figure 5. Analysis of (a) sensor signals from block A of inverter 1 compared with (b) sensor network CNA metrics on 31st August.
In order to understand the dynamics of the PV field operation, EDA is applied to raw data from the plant monitoring system as well as to the modeled sensor network topology variables. Figure 6, first, shows the results of the three-dimensional scatterplot of sensor signals as a function of time and solar irradiance, focusing on: the active power (Figure 6a In terms of raw data clustering, it is possible to resolve only the two behaviors which determine the operations before and after the fault event.
Figure 7 illustrates the results of EDA applied to network topology measurements. The scatter plots are created to investigate the correlation in time between a global graph measure (e.g., the graph diameter) and a local node-related variable (e.g., sensor signal degree centrality). In particular, the sensor network graph diameter is plotted against active power degree centrality (Figure 7a), AC voltage degree centrality (Figure 7b), and DC voltage degree centrality (Figure 7c).
As a first general comment, Figure 7 gives evidence of a wealth of information emerging from the application of CNA methods. All the scatter plots demonstrate how the combination of global and local network topology measures determines the emergence of the dynamics in the PV plant operations, opening the possibility for a different performance indexing. In detail, all the plots distinguish three patterns possibly linked to the operations of the inverter block:
• A to B (from 00:00 of 20 May to 06:30 of 31 August), typical of the standard operation, features a degree centrality in the range 3 to 2, which is nearly proportional to the diameter of the network, which experiences a variation in the range 3 to 1.5.
In this case, the results of the agglomerative clustering coincide with those emerging from the scatterplots and confirm the existence of a pre-fault behavior following a nearly proportional decrease in global/local graph topology measures.

Conclusions
This paper proposes a sensor fusion approach based on the use of graph modeling techniques to investigate the connectivity among several key parameters of the PV plant.
The results show that the study of the properties of the graphs through the application of Complex Network Analysis techniques is able to reveal a wealth of hidden information.
As reported in Korn et al. [61], the global behavior of the system is more than the sum of its parts, so these properties can be thought of as behavior deriving from the interaction between the components that cannot be identified through their simple functional decomposition.
In particular, the visual analysis of the single topological metrics of the functional graph, focused on the inverter block where the fault occurs (see Figure 4), reveals an evident variation in the correlation between the monitored variables, mainly associated with an anomalous deviation of the degree centrality of the solar irradiance with respect to the centrality of the key block parameters in the fault conditions. Moreover, by combining the information deriving from the different network measurements (i.e., Degree Centrality and Network Diameter) with EDA techniques, it is possible not only to clearly distinguish the standard operation from the fault conditions, but also to isolate specific pre-fault conditions about 7 h in advance of the event. It is important to note that these conditions only emerge after applying CNA and are not observable with EDA techniques based on simple sensor signals. This latter type of analysis, in fact, is only able to discriminate standard operating conditions from fault conditions. Thus, the results of this study show interesting potential for the evaluation of useful Key Performance Indicators and Control Charts based on topological metrics of graphs for early fault detection in the PV plant.