Towards a Holistic Microgrid Performance Framework and a Data-Driven Assessment Analysis

On becoming a commodity, Microgrids (MGs) have started gaining ground in various sizes (e.g., nanogrids, homegrids, etc.) and forms (e.g., local energy communities), leading to exponential growth in the respective sector. From demanding deployments such as military bases and hospitals, to tertiary and residential buildings and neighborhoods, MG systems exploit renewable and conventional generation assets, combined with various storage capabilities, to deliver a completely new set of business opportunities and services in the context of the Smart Grid. As such systems involve economic, environmental and technical aspects, their performance is quite difficult to evaluate, since no standards cover all of these aspects, especially during operational stages. To allow a holistic definition of MG performance for both design and operational stages, this paper first introduces a complete set of Key Performance Indicators to holistically measure the performance of a MG's life cycle. Then, focusing on the MG's day-to-day operation, a data-driven assessment based on dynamic metrics, custom-made reference models, and smart meter data is proposed in order to extract its operational performance. Two different algorithmic implementations (i.e., Dynamic Time Warping and t-distributed Stochastic Neighbor Embedding) are used to support the proposed methodology, while real-life data from a small-scale MG are used to provide the desired proof-of-concept. Both algorithms seem to correctly identify days and periods of non-optimal operation, hence presenting promising results for MG performance assessment that could lead to a MG Performance Classification scheme.


Introduction
An increasing number of key industrial and commercial stakeholders have shifted their attention to the Microgrid (MG) concept over the past few years [1]. Various viable MG-related solutions, both hardware and software products, have recently been introduced to the market. Combined with the exponential increase in renewable energy resources (RES) penetration and the highly diverse energy storage technologies promoted at the European Union (EU) level (e.g., Reference [2]), MGs have proven to be a flexible and active building block for the Smart Grid. The overall benefits of incorporating MGs into existing distribution grids range from improved efficiency to increased reliability and resiliency, even in extreme cases such as black start and grid restoration after natural disasters.
According to IEEE, a MG is "a group of interconnected loads and distributed energy resources (DERs) with clearly defined electrical boundaries that acts as a single controllable entity with respect to the grid and can connect and disconnect from the grid to enable it to operate in both grid-connected or island modes" [3,4]. In that context, the MG concept aims to optimize the balance between energy demand and supply for a group of consumers and producers, ensuring reliability in both grid-connected and islanded situations. Achieving this at sensible energy costs, and in an effective way that optimally combines various RES, Energy Storage Systems (ESS), and conventional generators, requires meeting certain financial and technical criteria.
Nevertheless, although MGs are gaining more and more attention, the tools and procedures that define and assess their overall performance still lack proper standardization and, in some cases, profound understanding. As presented in Section 2, there have been several endeavours to draft a framework that addresses all aspects of this rather complex and challenging problem, whose complexity stems mainly from the extremely vast product range (e.g., inverters, inverters/chargers, converters, etc.) and, even more, from the vast number of possible configurations (e.g., AC vs. DC, centralised vs. decentralised, etc.). Despite the invested effort, in most cases these frameworks focus on reliability and financial aspects and, more specifically, address the design phase rather than actual MG operation.
Furthermore, assessing the performance of such complex and highly heterogeneous systems quite often requires detailed information on every deployed asset. This significantly increases the amount of information required and, even more, the effort needed to adequately understand all the interrelations that lead to an accurate performance assessment. Such endeavours should not only provide evaluation insight to a MG operator, but also enable comparison between different MGs. Through such assessment frameworks, the sharing of best practices and of solutions to common challenges can be facilitated, potentially leading to increased MG performance worldwide.
In an effort to holistically cover all aspects that define a MG's performance during real-life operation, this work first presents a thorough framework consisting of numerous multi-dimensional Key Performance Indicators (KPIs) that cover the most commonly found MG applications. Next, to evaluate operational performance, a data-driven performance assessment methodology is presented with two different algorithmic implementations, aiming to understand and document the MG's performance from the data perspective without requiring detailed information about each and every deployed asset. Following known best practices in performance assessment from other applications, a reference model has been created for each KPI. Comparing any MG's data with this reference model enables an objective assessment of its overall performance. The paper is structured in five sections. Section 2 presents a brief overview of related research results over the last decade. Section 3 introduces the holistic performance framework in detail, followed by the data-driven performance assessment approaches and the MG testbed used as the experimental setup. Results of the introduced framework are presented in Section 4, and, finally, Section 5 is devoted to conclusions and future work.

Related Work
Over the last decade there has been significant research on MGs in general, and in particular on understanding the aspects that define MG performance, evaluating a variety of KPIs on multiple levels and within different categories. It should be highlighted that the term "performance" in MGs is still under discussion and its interpretation varies in more than a few cases, with the majority of works examining only stability under various use cases (e.g., transitions, high penetration of renewable energy, and more). In addition, there are many instances where different metrics are presented while the context remains the same, albeit, as highlighted below, with certain ambiguities involved.

Performance Indicators for MG Assessment
As the majority of research endeavours are performed at simulation level, their results can be applied mainly for (re-)design purposes. Almost ten years ago, Bollen et al. [5] presented very basic requirements in terms of MG operation, such as voltage and frequency characteristics (magnitude, fluctuations, dips, etc.), as well as other reliability metrics already defined for the distribution network, such as the Customer Average Interruption Duration and Frequency Indices (CAIDI and CAIFI, respectively) and the System Average Interruption Duration and Frequency Indices (SAIDI and SAIFI, respectively). A few years later, Rahim and Hussain [6] evaluated the performance of their MG model by assessing the overall stability through detailed simulations, highlighting that the energy mix from PV and wind generation should be carefully controlled to maintain overall system stability. In terms of metrics from the MG perspective, the focus is solely on voltage and power measurements, as well as the eigenvalues from the small-signal analysis.
Taking it a step further, the same year, Wang et al. [7] introduced specific indices not only for reliability but also for financial assessment. Using as a starting point the IEEE Standard 1366-2001 as well as previous work in the field [8], they introduced 19 metrics to assess both the reliability and the economic performance of MGs. Results from a two-step Monte Carlo simulation present interesting outcomes as a comparative foundation; however, this work reflects only steady-state performance and neglects dynamic aspects of MG stability evaluation.
A more complex approach was followed by the United States Department of Energy, in an effort to meet its already defined national objectives for MG deployment [9]. Exploring revenue and environmental aspects during grid-connected operation, critical load reliability and longevity, priority load service for non-critical yet important loads, as well as other environmental and budgetary constraints, a design tool has been introduced for planning MG deployment in real-life scenarios, while a performance/reliability model assesses reliability and cost parameters on the simulated design results. Again, the simulation results are very promising; nevertheless, the applicability of such a tool to evaluating the performance of real-life MG operation remains to be seen. Although it is a more complete evaluation toolset, the complete list of metrics could not be retrieved in order to replicate the indices presented.
Extending the evaluation from design to operation, Gabbar et al. [10] introduced not only reliability, economic and environmental indicators, but also power-quality-related ones, presenting a more complete range of metrics for the MG lifecycle. In total, 25 KPIs are demonstrated to fully cover the mentioned aspects, and a multi-objective function has been defined (Figure 1) to study their combined impact on MG performance, based on the interrelation between KPIs and MG assets/components (i.e., diesel generators, PV, wind turbines, battery, AC load, etc.). By solving an objective function that takes this interrelation into account, a first method is introduced for creating high-level metrics that evaluate a MG's performance. Nevertheless, the case studies examined in that work use only 11 KPIs and assume that their effect is uniform.
In a similar manner, and taking into account that MGs are considered active building blocks of Smart Grids (or that Smart Grid principles also apply to MGs), the work of Personal et al. [11] is also evaluated, as it presents an extended list of indicators, grouped per objective and macro-objective, for assessing Smart Grid goals. It covers 21 indicators (Figure 2) which, at a high level, address not only the four main categories introduced above but also several innovative aspects, such as vehicle-to-grid (V2G), mini-generation, and more. The main difference, besides the higher-level approach, is that each KPI has a different weight in the objective function defined to explore their combination. However, these weights are said to have been found empirically and are not included in the work presented; hence, once again, the paradigm cannot be replicated. Even though the latter KPIs can be used for assessing Smart Grid goals, they are not necessary for evaluating MG performance and, moreover, cannot be considered essential for it. Nevertheless, it is important to consider that KPIs do have different impacts on overall MG performance. Next, the work of Rangel et al. [12] is based on fewer metrics and on multi-objective functions per case for grid-connected MGs. Through hardware-in-the-loop (HIL) validation, that work focuses only on very specific metrics, such as the levelized cost of energy (LCoE), the total demand curtailed, the total purchased energy, the amount of fuel consumed, and power violations. Finally, a recent work by Pinceti et al. [13] introduces a series of technical indicators for MGs, divided into power quality, energy efficiency, and conventional generator stress indicators. Once again, the categories and the names of the KPIs are not consistent, even though the technical aspects that are essential when assessing a MG are well known.
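The difference between uniform weights (as assumed in the case studies of [10]) and per-KPI weights (as in [11]) can be illustrated with a minimal Python sketch. It assumes that each KPI is first min-max normalised to [0, 1] over a chosen range, with the score inverted for KPIs where lower values are better, and then combined as a weighted average. The function names, ranges and weights are hypothetical and are not taken from either cited work.

```python
def normalise(value, lo, hi, higher_is_better=True):
    """Min-max normalise a KPI value to [0, 1]; invert if lower is better."""
    score = (value - lo) / (hi - lo)
    score = min(max(score, 0.0), 1.0)  # clip values outside the chosen range
    return score if higher_is_better else 1.0 - score


def aggregate_score(kpis, weights=None):
    """Weighted average of normalised KPIs.

    kpis maps a KPI name to (value, lo, hi, higher_is_better).
    weights maps a KPI name to a non-negative weight; None means uniform.
    """
    if weights is None:
        weights = {name: 1.0 for name in kpis}  # uniform effect for every KPI
    total = sum(weights[name] for name in kpis)
    return sum(weights[name] * normalise(*kpis[name]) for name in kpis) / total


kpis = {
    "RESPEN": (0.6, 0.0, 1.0, True),    # hypothetical RES penetration share
    "VLTSTD": (2.0, 0.0, 10.0, False),  # hypothetical voltage std. dev. in V
}
print(round(aggregate_score(kpis), 3))                              # 0.7
print(round(aggregate_score(kpis, {"RESPEN": 3, "VLTSTD": 1}), 3))  # 0.65
```

Even with two KPIs, changing the weights visibly shifts the aggregate, which is why unreported weights make such results hard to replicate.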

Data-Driven Approaches for Performance Assessment
Quite a few examples in the literature present new research approaches to MG challenges, where different metrics are used to highlight the results. Nevertheless, even in new standards, no procedures are defined to assess a MG's performance through these KPIs. Just as buildings have Energy Performance Certificates, a performance assessment framework for MGs will most likely be required in the years to come. However, simply scaling up the notion from buildings to MGs would not be a sustainable solution, as the amount of information required would be cumbersome to analyse and a considerable human and computational workload would be needed. Hence, to exploit the context-rich data offered by the KPIs, a data-driven approach is required to identify how they can actually provide an overview of the entire MG's performance. Such an approach would form the foundation for assessing and comparing MGs overall, regardless of deployment-specific barriers. Although much less detailed, the KPIs presented still pose a considerable multi-dimensional challenge.
Dimensionality reduction methods extract low-dimensional representations of high-dimensional data, which is crucial for data-driven performance analysis in the absence of well-defined ground-truth target labels for classification or regression. The low-dimensional representations allow the analyst to explore the available information and discover patterns that can then be linked to expert knowledge. The dimensionality reduction literature is rich in linear and non-linear methods, such as PCA [14], MDS [15], the general framework of graph embedding [16], and so forth, which try to unfold the low-dimensional manifold on which the data lie within the high-dimensional space. Methods such as t-SNE [17] have shown high performance in distinguishing both the local and global structures of the high-dimensional input space. Combinations of dimensionality reduction techniques are also used in Business Intelligence systems. By applying Dynamic Time Warping and Granger Causality iteratively to industrial time series, leading indices have been extracted, while adaptive agglomerative clustering has ultimately enabled the determination of the main indicators [18].
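Since Dynamic Time Warping (DTW) is also one of the two algorithms used later in this work, a minimal Python sketch of its classic dynamic-programming formulation may help. It assumes one-dimensional series (e.g., one day of a single KPI sampled hourly), an absolute-difference local cost and no warping-window constraint; it illustrates the technique and is not the implementation used by the authors.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b[j-1] is repeated
                cost[i][j - 1],      # b advances while a[i-1] is repeated
                cost[i - 1][j - 1],  # both advance: one-to-one match
            )
    return cost[n][m]


# A time-shifted copy of a profile stays at distance 0 under DTW.
reference = [0, 1, 2, 3]
observed = [0, 0, 1, 2, 3]
print(dtw_distance(reference, observed))  # 0.0
```

Unlike a point-wise Euclidean distance, DTW can compare series of different lengths and tolerates small temporal shifts, which suits daily operational profiles that drift by an hour or so.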
Applications of performance measurement and analysis have benefited from dimensionality reduction methods as a means to explore the measurement space and discover areas of high or low performance. In mobile network applications, graph-based dimensionality reduction techniques have been used to map cell phone users onto the two-dimensional plane based on their behavioural similarities [19]. The resulting mapping allowed the visual distinction of separate groups of users with similar behaviour and their characterization as normal or abnormal, based on the amount of call activity. Clustering methods have also been used in the cellular network literature to discover associations between network performance indicators [20]. User experience estimation also takes advantage of unsupervised learning, as in the case of video viewing assessment in Content Delivery Networks, where the K-means algorithm was fitted on a dataset of logs and requests, forming meaningful clusters of user sessions and extracting clear KPI patterns [21].

Microgrid Key Performance Indicators
Following the state-of-the-art analysis in Section 2, a wide range of KPIs is introduced for holistically assessing MG performance. In total, six categories have been identified, albeit with a certain degree of overlap among them. In each category, we define a set of KPIs with specific equations that allow precise quantification, provided that the required data can be gathered. Depending on data availability, further performance indicators can be defined to provide even more insight into the quality of technical and economic MG design decisions. Thus, measuring and assessing MG performance involves various metrics, which can be aggregated into the following main categories:

•
Economy means that certain levels of reliability and efficiency can only be met in an economic context that takes into account both the use of resources during operation and the expenditures for equipment such as supply infrastructure and generation and storage facilities. It includes important features such as the financial energy exchange with the grid and the overall cost minimization for supplying the required demand.

•
Environmental criteria introduce the optimal use of renewable resources in the overall MG operation, thus reducing energy generated from fossil fuels from the utility grid or even from assets within the MG (e.g., diesel generators) and, as such, minimizing emissions from the MG infrastructure.

•
Reliability of a system has to do with its ability to provide electricity to the MG consumers at the time and in the amount requested.

•
Resiliency has to do with the capability of the system to respond to various failures, such as asset outages and control and/or communication equipment malfunctions.

•
Power Quality describes parameters such as deviations in voltage magnitude and frequency from desired values, distortion of voltage and current waveforms, phase imbalance, and occurrence of various types of short-term voltage variations. These parameters must be within the suitable tolerance ranges for operation of given consumer equipment.

•
Efficiency refers to the use of resources needed to fulfil consumer demand. Since MGs by definition include electricity generation resources, maximizing the use of renewable and emission-free resources is the key aspect here, while average and peak demand reduction is also explored, along with accuracy, flexibility, and so forth. This category can also include other metrics that are not normally assessed in the literature, such as the accuracy of various components, the deviation from an operational schedule, and so forth.
Another classification separates the aforementioned KPIs into static and dynamic ones. The first category refers to KPIs that are defined and calculated during the design or redesign phase of a MG and cannot be considered time series, whereas the second includes KPIs that change over time and are highly dependent on the temporal domain. To avoid confusion, the KPIs are presented at this point regardless of whether they are static or dynamic. However, for the data-driven approach followed in this work, only dynamic KPIs are evaluated, as clarified in Section 3.2. This distinction is noted in the following sections where needed, and it is also taken into account during implementation and visualisation of the results, whether the MG is evaluated in simulation or in real-time application.
In each of these categories, a set of performance indicators has been identified or defined. All KPIs, along with their descriptions, mathematical representations, and the key aspects that make each of them important, are provided in detail in the appendices of this manuscript. An overview of the KPIs in each category is presented in the following paragraphs, highlighting some key points per KPI.

Economy
When considering economics, we need to look not only at operation strategies and costs, but also at the required equipment. This means that both operational expenditure (OPEX) and capital expenditure (CAPEX) should be taken into account. Therefore, the question of economic viability is always already part of the MG design phase. In essence, the idea behind the economic evaluation is to define a matrix of possible asset configurations and operation strategies, set assumptions about demand, resource availability and asset prices, then simulate operation across a certain time period (e.g., several years) and calculate the resulting total expenditure (TOTEX) [22].
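The final step, aggregating CAPEX and the simulated OPEX into TOTEX, can be sketched in Python as follows. The yearly discounting with an end-of-year convention and the function name are assumptions made for illustration; with a zero discount rate the result reduces to the plain sum of CAPEX and OPEX.

```python
def totex(capex, annual_opex, discount_rate=0.0):
    """Total expenditure over the simulated horizon.

    capex: up-front investment, counted in year 0.
    annual_opex: OPEX of each simulated year, year 1 first.
    discount_rate: assumed yearly rate; 0.0 gives an undiscounted sum.
    """
    discounted_opex = sum(
        opex / (1 + discount_rate) ** year
        for year, opex in enumerate(annual_opex, start=1)
    )
    return capex + discounted_opex


# Hypothetical configuration: 10,000 EUR of assets and 500 EUR/year OPEX for 3 years.
print(totex(10_000, [500, 500, 500]))  # 11500.0
```

Evaluating this function for every cell of the configuration/strategy matrix would then rank the options by TOTEX under the stated assumptions.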
Specifically, OPEX can be further divided into sub-metrics, owing to the different aspects that can be identified within a MG. Starting from the maintenance cost (MAINTEX) and the fuel cost (mainly for diesel generators, FCEGEX), OPEX is broken down into clearer indicators for the MG's needs. These are significant KPIs, especially when exploring operational MG sustainability as well as return on investment (ROI) parameters. Beyond commonly found metrics, this work also covers the energy transactions that occur with the grid. Either as a cost (AEPEX) or as an income (AESINC), the MG is able to trade electricity as an active stakeholder in energy markets. From simple net-metering schemes to complex dynamic-pricing Demand Response programs, these KPIs elaborate on the added value that the MG offers through ancillary services to the Smart Grid.
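To illustrate how the grid-exchange KPIs could be computed from metered data, the Python sketch below splits a signed grid-exchange series into purchased and sold energy and prices each time step. The sign convention (positive for import), the per-step tariffs and the function name are assumptions; summing over a year would correspond to AEPEX and AESINC, and over a single day to a daily variant.

```python
def grid_exchange_cost_income(grid_kwh, buy_price, sell_price):
    """Return (energy purchase expenditure, energy sale income).

    grid_kwh[t] > 0: energy imported from the grid in step t.
    grid_kwh[t] < 0: energy exported to the grid in step t.
    buy_price[t] / sell_price[t]: tariffs per kWh in step t.
    """
    expenditure = sum(max(e, 0.0) * p for e, p in zip(grid_kwh, buy_price))
    income = sum(max(-e, 0.0) * p for e, p in zip(grid_kwh, sell_price))
    return expenditure, income


# Hypothetical three-step example (kWh and EUR/kWh).
cost, income = grid_exchange_cost_income(
    [2.0, -1.0, 0.5], [0.20, 0.20, 0.30], [0.05, 0.10, 0.05]
)
```

Here the MG pays for 2.5 kWh of imports and earns income on 1 kWh of exports; with dynamic pricing the same code applies, only the tariff series change.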
Another set of financial metrics identified as necessary for the economy category is the levelized cost of energy (LCoE) for the three different generation technologies (including energy provided by energy storage systems) explored in the context of a MG infrastructure. These LCoE metrics provide an effective tool for comparing, from an economic perspective, energy resource technologies with different lifetimes, cost structures, and capacity factors [23]. Generally, the LCoE metric expresses the cost to build and operate a power generation facility divided by the total energy output over the lifetime of the facility. In the present work, the most common deployment (i.e., PVs plus ESSs) is considered and explored.
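A common textbook form of the LCoE discounts both lifetime costs and energy output, LCoE = Σ_t (I_t + M_t + F_t)/(1+r)^t divided by Σ_t E_t/(1+r)^t, where I_t, M_t and F_t are investment, operation-and-maintenance and fuel costs in year t, E_t the energy delivered and r the discount rate. A minimal Python sketch of that formula follows; the function and argument names are illustrative, the fuel term is zero for PV and ESS, and for an ESS the energy term would be the energy discharged.

```python
def lcoe(investment, om_cost, fuel_cost, energy_kwh, discount_rate):
    """Levelized cost of energy in currency per kWh.

    All arguments except discount_rate are per-year lists, year 0 first.
    """
    years = range(len(energy_kwh))
    costs = sum(
        (investment[t] + om_cost[t] + fuel_cost[t]) / (1 + discount_rate) ** t
        for t in years
    )
    energy = sum(energy_kwh[t] / (1 + discount_rate) ** t for t in years)
    return costs / energy


# Hypothetical PV example: no fuel cost, undiscounted for readability.
print(lcoe([1000, 0, 0], [0, 50, 50], [0, 0, 0], [0, 1000, 1000], 0.0))  # 0.55
```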
All economic KPIs that have been identified are documented in detail in Appendix A.

Environmental
These performance indicators aim to reveal how environmentally friendly the MG is. Thus, the optimal use of Renewable Energy Resources (RES) is explored in order to depict the energy savings in general for both grid-connected and islanded operation. These savings can be directly translated into greenhouse gas (GHG) emissions, or more simply into carbon dioxide (CO2) emissions, through specific coefficients [24] based on the energy mix used to produce the energy consumed under no-RES operation. Moreover, since the EU already has a framework for reducing greenhouse gas emissions, accompanied by a penalty system for excess CO2 emissions under the EU Emissions Trading System Directive [25], a financial approach is also investigated to weigh the significance of such penalties in MG operation.
These metrics are important when assessing the environmental footprint of the MG. Although not directly critical for actual operation, they provide meaningful indirect information on both efficiency and economic variables. The RES penetration (RESPEN) can provide good insight into the actual coverage at any given moment, whereas the reduction of GHG emissions (GHGRED), together with the Reduction of Carbon Penalties (PNLRED), offers an easy-to-understand way to quantify environmentally friendly (or unfriendly) performance. More information on these KPIs is provided in Appendix B.
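The exact formulas are given in Appendix B; purely as an illustration, the Python sketch below assumes that RESPEN is the share of each step's load covered by local RES, that GHGRED counts only locally consumed RES energy multiplied by a grid emission factor (kg CO2/kWh), and that PNLRED prices the avoided tonnes at a carbon price. These definitions, the function name and all parameter values are assumptions.

```python
def environmental_kpis(res_kwh, load_kwh, ef_kg_per_kwh, carbon_price_per_t):
    """Illustrative RESPEN series, GHGRED (kg CO2) and PNLRED (currency)."""
    used = [min(r, l) for r, l in zip(res_kwh, load_kwh)]  # RES consumed locally
    respen = [u / l if l > 0 else 0.0 for u, l in zip(used, load_kwh)]
    ghgred_kg = sum(used) * ef_kg_per_kwh                  # avoided emissions
    pnlred = ghgred_kg / 1000.0 * carbon_price_per_t       # avoided carbon cost
    return respen, ghgred_kg, pnlred


respen, ghgred, pnlred = environmental_kpis(
    res_kwh=[0.0, 3.0, 5.0], load_kwh=[2.0, 4.0, 4.0],
    ef_kg_per_kwh=0.5, carbon_price_per_t=80.0,
)
print(respen, ghgred)  # [0.0, 0.75, 1.0] 3.5
```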

Reliability
When designing MGs, reliability represents one of the most important (if not the most important) aspects. The reliability study of MGs, however, is a very complex subject because there are several issues to be taken into account, each with its own peculiarity. Under normal operation, utilities measure reliability by counting supply interruption duration, interruption frequencies, and calculating certain distinct metrics.
The most commonly used indices for measuring system and customer reliability are those defined in IEEE Standard 1366-2003 [26], in particular SAIDI, SAIFI, CAIDI, and CAIFI. To address MG performance in terms of reliability, the metrics taken into consideration cover both static and dynamic indices. The latter actually depict the operational condition of the designed MG, whereas the former are estimations based on certain assumptions.
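For reference, the four IEEE 1366 indices can be computed from an interruption log as in the minimal Python sketch below. It assumes that each sustained interruption is recorded as the set of affected customer IDs and its duration in minutes; momentary events and the standard's major-event-day handling are ignored, and the function name is hypothetical. Note that CAIDI equals SAIDI divided by SAIFI.

```python
def ieee1366_indices(events, customers_served):
    """SAIFI, SAIDI (min), CAIDI (min) and CAIFI from sustained interruptions.

    events: iterable of (affected_customer_ids, duration_minutes) tuples.
    customers_served: total number of customers in the system.
    """
    events = list(events)
    interruptions = sum(len(ids) for ids, _ in events)             # customer interruptions
    customer_minutes = sum(len(ids) * dur for ids, dur in events)  # customer minutes lost
    affected = set().union(*(ids for ids, _ in events))            # distinct customers hit
    return {
        "SAIFI": interruptions / customers_served,
        "SAIDI": customer_minutes / customers_served,
        "CAIDI": customer_minutes / interruptions if interruptions else 0.0,
        "CAIFI": interruptions / len(affected) if affected else 0.0,
    }


log = [({1, 2, 3}, 30), ({3, 4}, 60)]  # hypothetical outages
print(ieee1366_indices(log, customers_served=100))
# {'SAIFI': 0.05, 'SAIDI': 2.1, 'CAIDI': 42.0, 'CAIFI': 1.25}
```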
Supply interruptions during normal operation can be caused by different events, short circuits being the most common. Any electrical equipment where a short circuit occurs must be disconnected as soon as possible to prevent further damage. This is done automatically by protection relays and fuses and can result in consumer outages. Short circuits can be caused by deterioration of insulation material, dirt accumulation, human error (e.g., switching mistakes), lightning strikes, animals, earthworks, and so forth. Malfunctioning protection relays also often result in interruptions.
As it is difficult to actually measure such metrics from the MG perspective, and because of the large amount of statistical information and additional assumptions required for a meaningful analysis of such reliability indices, assessing the original metrics is cumbersome or even infeasible. Therefore, the authors have redefined the aforementioned KPIs so that they better correspond to a MG's operation (i.e., LAIDI, LAIFI, MGAIDI, MGAIFI).
Following the same principles as above, where the loads are of utmost importance for the MG in general, additional reliability KPIs need to be presented. As with any system, assessing a MG first requires being able to monitor it at any given moment. To that end, the Load Monitoring Coverage (LDCOV) has been introduced. However, load coverage does not stop there. To support ancillary services and optimal operation, it is important to know how many flexible loads are present, and thus how much demand can be curtailed or shifted to serve multiple business or technical requirements (e.g., Demand Response requests or maintenance needs). Hence, a KPI for flexible load integration (LDFLEX) needs to be provided. Beyond that, supporting a reliable MG requires knowledge of the balance between supply and demand, which different KPIs can quantify. Coverage metrics have therefore been introduced for RES (RESCOV), diesel generators (DGCOV), and ESS (ESSA, ESSANoRES, SCC).
These KPIs are quite important for the general performance of the MG, but they are especially essential for anticipating and planning potential islanded operation, as they define the timeframe during which the MG will remain operational. For a clearer understanding of the new metrics, as well as other reliability KPIs, the reader is referred to Appendix C.
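As an illustration of how an ESS-related coverage metric can translate into an islanded operating timeframe, the Python sketch below estimates autonomy in hours as usable stored energy divided by average load. This reading of ESSA, the state-of-charge floor and the parameter names are assumptions; the paper's exact definitions are those in Appendix C.

```python
def ess_autonomy_hours(capacity_kwh, soc, soc_min, avg_load_kw):
    """Hours the ESS alone could supply the average load (no RES, no grid).

    soc and soc_min are fractions of capacity; energy below soc_min is
    treated as unusable to protect the battery.
    """
    if avg_load_kw <= 0:
        raise ValueError("average load must be positive")
    usable_kwh = max(soc - soc_min, 0.0) * capacity_kwh
    return usable_kwh / avg_load_kw


# Hypothetical 10 kWh battery at 80 % SoC, 20 % floor, 1.5 kW average load.
print(round(ess_autonomy_hours(10.0, 0.8, 0.2, 1.5), 2))  # 4.0
```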

Resiliency
As previously explained, system resiliency is the capability to continue operating within acceptable limits after failures occur in the system for various reasons. In general, the resilience of a MG can be estimated during the design phase by calculating two metrics, the Systems Resilience Factor (SRF) and the Units Resilience Factor (URF). However, even though these are useful for anticipating failures and estimating how resilient the system is, quantified resiliency metrics are also needed while the system is operational. Thus, beyond the first two KPIs, which are static, the authors present indicators that evolve dynamically and provide a more accurate description of how resilient the MG is in terms of overall duration and reaction times (i.e., SPFSR and DUROSPF), as well as voltage (VLTDEVSPF), frequency (FQDEVSPF), active power (PDEVSPF) and reactive power (QDEVSPF) deviations during a single-point failure.
Given the rarity of such events, these KPIs are seldom considered beyond the MG design phase, let alone after such events have occurred, and most systems do not even report such metrics. Nevertheless, even though the events are rare, these metrics could enable early detection of critical issues and provide the knowledge needed to prevent future ones. A detailed description of these KPIs can be found in Appendix D.
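The dynamic single-point-failure indicators can be illustrated with a hedged Python sketch that, for one monitored quantity (e.g., frequency), returns the maximum deviation from nominal after a failure and the time until the quantity re-enters and stays within a tolerance band. Both quantities, the tolerance and the function name are assumptions in the spirit of FQDEVSPF/VLTDEVSPF and DUROSPF, not the definitions in Appendix D.

```python
def failure_response(samples, nominal, tolerance):
    """Return (max |deviation|, recovery time) for one quantity after a failure.

    samples: list of (t_seconds, value) pairs, starting at the failure instant.
    Recovery time runs from the first sample to the start of the final stretch
    within nominal +/- tolerance; None if the band is never regained.
    """
    max_dev = max(abs(v - nominal) for _, v in samples)
    t0 = samples[0][0]
    recovery = None
    for t, v in samples:
        if abs(v - nominal) > tolerance:
            recovery = None          # outside the band: not recovered (yet)
        elif recovery is None:
            recovery = t - t0        # (re-)entered the band at this sample
    return max_dev, recovery


# Hypothetical frequency trace after an inverter trip (s, Hz).
trace = [(0, 50.0), (1, 49.2), (2, 49.6), (3, 49.9), (4, 50.0)]
dev, rec = failure_response(trace, nominal=50.0, tolerance=0.2)
print(round(dev, 2), rec)  # 0.8 3
```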

Power Quality
Power Quality can be analysed using standard EN 50160 [27], which defines limits for all voltage-related quantities (amplitude, frequency, Total Harmonic Distortion, flicker, unbalance). To fulfil the requirements of the standard, the examined quantities have to stay within the defined limits for a certain share of time (e.g., 95%). Usually, one week is analysed when checking whether power quality requirements are fulfilled.
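The pass/fail nature of such a check can be seen in a short Python sketch that tests whether a given share of the 10-minute mean RMS voltages over one week lies within a band around the nominal voltage. The ±10% band around 230 V and the 95% share are the commonly cited low-voltage supply-voltage limits, but here they are simply parameters and should be checked against the standard text; the function name is illustrative.

```python
def voltage_within_limits(ten_min_means_v, nominal_v=230.0, band=0.10, share=0.95):
    """True if at least `share` of the 10-minute mean voltages lie in nominal +/- band."""
    if not ten_min_means_v:
        raise ValueError("no measurements")
    inside = sum(1 for v in ten_min_means_v if abs(v - nominal_v) <= band * nominal_v)
    return inside / len(ten_min_means_v) >= share


# One week holds 7 * 24 * 6 = 1008 ten-minute values; here, 40 hypothetical sags.
week = [231.0] * 968 + [200.0] * 40
print(voltage_within_limits(week))  # True
```

The output is a single boolean, regardless of how close the week came to the limits.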
The drawback of a power quality analysis based on EN 50160 is that it provides no information beyond whether the requirements are fulfilled or not. Hence, if a deeper analysis of power quality parameters is to be conducted, new criteria have to be defined. In this manuscript, detailed criteria for analysing power quality are described. The focus is placed on voltage and frequency fluctuations (i.e., FQR, FQSTD, VLTR, and VLTSTD), because these two parameters are the most critical ones: generator protection is triggered by frequency and voltage deviations. Hence, it is essential to keep these parameters within a certain range to prevent blackouts. Detailed quantification of the real-time development of these metrics, along with other KPIs, can reveal issues (e.g., related to RES penetration) that are not easy to catch in daily or monthly measures and can be critical when they coincide with a transition to islanded operation.
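A deeper analysis can instead track fluctuation statistics over time. The Python sketch below assumes that the range KPIs (FQR, VLTR) are computed as maximum minus minimum and the standard-deviation KPIs (FQSTD, VLTSTD) as population standard deviations, both per day from timestamped samples; this interpretation and the function name are assumptions.

```python
from collections import defaultdict
from statistics import pstdev


def daily_fluctuation(samples):
    """Map each day to (range, population std) of a quantity's samples.

    samples: iterable of (day, value) pairs, e.g. ("2020-05-01", 50.02).
    """
    by_day = defaultdict(list)
    for day, value in samples:
        by_day[day].append(value)
    return {
        day: (max(values) - min(values), pstdev(values))
        for day, values in sorted(by_day.items())
    }


freq = [("d1", 49.9), ("d1", 50.0), ("d1", 50.1), ("d1", 50.0), ("d2", 50.0), ("d2", 50.0)]
print(daily_fluctuation(freq)["d2"])  # (0.0, 0.0)
```

The resulting daily series are the kind of dynamic KPI profiles that can later be compared against reference values.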
In addition, Total Harmonic Distortion is also included, for well-known reasons, whereas an additional indicator in terms of asymmetric load charge is presented to assess the distribution of the MG demand, also given the recent interest in phase balancing as an ancillary service to the grid. All Power Quality KPIs are available in detail in Appendix E.
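For completeness, THD is commonly computed as the RMS of the harmonic components relative to the fundamental. A minimal Python sketch follows, assuming the harmonic RMS magnitudes (e.g., from an FFT of the voltage waveform) are already available; the function name is illustrative.

```python
import math


def thd(harmonic_rms):
    """Total harmonic distortion as a fraction of the fundamental.

    harmonic_rms[0] is the fundamental RMS; harmonic_rms[h - 1] is the h-th harmonic.
    """
    fundamental = harmonic_rms[0]
    if fundamental <= 0:
        raise ValueError("fundamental must be positive")
    return math.sqrt(sum(v * v for v in harmonic_rms[1:])) / fundamental


print(thd([100.0, 3.0, 4.0]))  # 0.05
```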

Efficiency
MG efficiency includes mostly metrics that provide some sort of feedback as to how well the MG performs under normal operating conditions. This definition of efficiency has added value mostly in deployments where there is RES-based electricity generation within the premises of the MG. If the electricity demand is always higher than the solar and/or wind-based generation and there is no relevant minimum operating level of other generation, then the VRE (variable renewable energy, meaning wind turbines and PVs) generation does not need to be curtailed at any time, and operational strategies of demand response do not have an impact on efficiency (in this case storage is never really needed and hence should not be used, since it always involves losses).
During island operation, if there is more VRE-based power injection available at one point in time than there is electricity demand, then flexibility is needed to establish the power balance. This can be done by any combination of:

•
Reducing feed-in from VRE sources (curtailment)

•
Demand response (increasing the load to match the high available generation)

•
Making use of storage facilities for electricity (batteries)

Using demand response and storage is likely to increase efficiency in terms of maximising the use of RES, but considering curtailment is also useful from an economic perspective.
In grid-connected operation there is typically no need for any demand response or storage application, unless there are incentives for minimizing power exchange (or exchange of energy altogether) with the external system. If such incentives are considered, then the share of renewable generation that is stored or consumed locally without ever being exchanged with the grid is also an indicator of efficiency.
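As a rough illustration of the grid-connected efficiency indicator described above, the share of renewable generation kept within the MG can be computed from interval energy data. This is a minimal sketch, not the paper's definition. It assumes (hypothetically) that every kWh exported to the grid originates from RES. That holds only if there is no other generation and the storage is never charged from the grid. The function name and inputs are illustrative.

```python
def local_res_share(res_generation_kwh, exported_kwh):
    """Fraction of RES generation that was consumed or stored locally.

    Assumption (hypothetical): all exported energy comes from RES, i.e. no
    other generation and no grid-charged storage discharging to the grid.
    """
    total_generation = sum(res_generation_kwh)
    if total_generation <= 0:
        raise ValueError("no RES generation in the period")
    # Under the assumption above, exports cannot exceed what RES produced
    exported_res = min(sum(exported_kwh), total_generation)
    return (total_generation - exported_res) / total_generation


# Example with three 15-min intervals (illustrative numbers)
share = local_res_share([1.0, 2.0, 3.0], [0.0, 1.0, 1.0])
print(share)
```

Computing the share over a whole period, rather than per interval, matches the text's framing of energy "never being exchanged with the grid" over the assessment horizon.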
To provide an overview of MG efficiency, the KPIs presented in Appendix F have been defined based on previous work on Smart Grids [11]. Special attention is paid to the capability to reduce demand, both overall (ROED) and at peak (RRD), and to the algorithms deployed to optimize MG management (DOTO). As most of these rely on forecasting algorithms to optimally schedule MG operation, these should also be evaluated (i.e., LDFA, RESPFA, UPFL).
Up to this point, a holistic performance assessment framework has been presented to cover all MG performance aspects. To the best of the authors' knowledge, all of the above KPIs are required to gain an overall understanding of whether a MG will operate, or is currently operating, as well as it can. However, not all of them are needed for the real-time assessment of a MG's performance. Therefore, only a subset of the presented KPIs has been selected for the second part of the methodology presented in this manuscript.

Dynamic KPIs and Reference Values
This work introduces a data-driven approach for evaluating MG performance from a higher-level perspective, so some KPIs were not included in the methodology presented below. The authors selected the KPIs (Table 1) whose values change frequently and are most significant for the daily operation of the MG.

More specifically, static KPIs that do not affect day-to-day activity and are mainly related to MG establishment and maintenance, such as CAPEX, MAINTEX and REPLEX, have been excluded. Conversely, some annual KPIs such as AESINC and AEPEX were substituted by their daily equivalents, DESINC and DEPEX. Diesel-generator-related KPIs such as FCEGEX and LCoEG have been removed, as the studied MG has no such assets installed. Finally, the KPIs referring to interruptions and failures were not included in the analysis. Their values change only during transient phenomena of limited duration, whereas this work studies normal operation, which constitutes the largest part of the microgrid's operation.

Following other performance evaluation schemes (e.g., Building Performance Certificates), a reference value has been created for each KPI to explore the validity of the adopted methodology. Such reference models should ideally be generic rather than specific to a dedicated infrastructure. For the purpose of the proof of concept, however, the reference model for each KPI was calculated from the actual values of the examined MG. Nevertheless, some generic principles also apply to certain KPIs. For example, the MG's voltage and frequency should not deviate from the grid's nominal values, i.e., 230 V and 50 Hz respectively for Greece. For more dynamic measurements, such as generated and consumed energy, monthly reference values were calculated.
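The text does not detail how the monthly reference values were derived from the MG's own measurements. One plausible construction, shown here only as a hypothetical sketch, is a slot-wise average of the 15-min KPI values over all days of the same month. This gives one 96-point reference day per month. KPIs with generic nominal values (e.g., 230 V, 50 Hz) would instead use a constant reference. The function name and the input layout are assumptions.

```python
from collections import defaultdict
from datetime import date

import numpy as np


def monthly_reference_profiles(daily_series):
    """Hypothetical reference model: average each 15-min slot within a month.

    daily_series: dict mapping datetime.date -> sequence of 96 KPI values.
    Returns a dict month -> array of shape (96,), one reference day per month.
    """
    by_month = defaultdict(list)
    for day, values in daily_series.items():
        by_month[day.month].append(np.asarray(values, dtype=float))
    # Slot-wise mean over all days of the month
    return {month: np.stack(days).mean(axis=0) for month, days in by_month.items()}


# Two June days with constant profiles average to a constant reference of 2.0
refs = monthly_reference_profiles({
    date(2019, 6, 1): [1.0] * 96,
    date(2019, 6, 2): [3.0] * 96,
})
```

A median instead of a mean would make such a reference less sensitive to the occasional anomalous day, which matters because the same days are later scored against it.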

Data-Driven Performance Assessment Analysis
In this section, two data-driven approaches for the performance assessment methodology suggested are presented: the Dynamic Time Warping (DTW) and the t-distributed Stochastic Neighbor Embedding (t-SNE).

Dynamic Time Warping (DTW)
Dynamic Time Warping (DTW) is a well-known time series comparison technique, used extensively in many areas of time series analysis. It was first introduced in the 1960s [28] and has been used for speech and word recognition [29,30] since the 1970s, with sound waves as the source.
The objective of this method is to produce a distance measure between two input time series. Rather than comparing the series point by point at identical time indices, DTW computes local distances (e.g., Euclidean) between all pairs of points of the two series. It then finds the alignment that minimizes the accumulated distance. The resulting DTW distance serves as a robust similarity score: the lower the value, the more similar the two time series. Below, the DTW algorithm is presented as described in Reference [31].
Consider two time series $X = (x_1, x_2, \dots, x_N)$, $N \in \mathbb{N}$, and $Y = (y_1, y_2, \dots, y_M)$, $M \in \mathbb{N}$. They are represented by sequences of values (or curves represented by sequences of vertices). The DTW algorithm starts by building the distance matrix $C \in \mathbb{R}^{N \times M}$ of all pairwise distances between $X$ and $Y$. This matrix is called the local cost matrix for the alignment of $X$ and $Y$:
$$C_{ij} = c(x_i, y_j) = \lVert x_i - y_j \rVert, \quad i \in [1:N],\ j \in [1:M].$$
Once the local cost matrix is built, the algorithm finds the alignment path, called the warping path or warping function, $p = (p_1, \dots, p_L)$ with $p_l = (n_l, m_l)$. This path runs through the low-cost areas of the cost matrix and defines the correspondence of an element $x_i \in X$ to $y_j \in Y$. It follows the boundary condition $p_1 = (1, 1)$ and $p_L = (N, M)$, which assigns the first and last elements of $X$ and $Y$ to each other. The cost of a warping path with respect to the local cost matrix is given by:
$$c_p(X, Y) = \sum_{l=1}^{L} c(x_{n_l}, y_{m_l}).$$
The warping path with minimal cost is called the optimal warping path $p^*$. The DTW distance is
$$DTW(X, Y) = c_{p^*}(X, Y) = \min\{c_p(X, Y) : p \in P^{N \times M}\},$$
where $P^{N \times M}$ is the set of all possible warping paths. It is obtained by building the accumulated cost matrix $D$, defined as follows:
$$D(1, j) = \sum_{k=1}^{j} c(x_1, y_k), \quad D(i, 1) = \sum_{k=1}^{i} c(x_k, y_1),$$
$$D(i, j) = \min\{D(i-1, j-1),\ D(i-1, j),\ D(i, j-1)\} + c(x_i, y_j),$$
so that $DTW(X, Y) = D(N, M)$. The main limitation of the algorithm is that the time series should be sampled at equidistant points in time.
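The recursion for the accumulated cost matrix D translates directly into a short dynamic program. The following is a minimal sketch for one-dimensional series, such as a 96-sample daily KPI profile. It assumes the absolute difference as the local cost c(x_i, y_j), returns D(N, M), and is not the authors' implementation. Padding the accumulated matrix with an extra row and column of +inf enforces the boundary condition. As a result, the first row and column of D reduce to the cumulative sums given above.

```python
import numpy as np


def dtw_distance(x, y):
    """DTW distance D(N, M) between two 1-D series (O(N*M) time and memory)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Local cost matrix C with c(x_i, y_j) = |x_i - y_j|
    local_cost = np.abs(x[:, None] - y[None, :])
    # Accumulated cost matrix, padded with +inf so that every path starts
    # at (1, 1) and D(1, j), D(i, 1) become cumulative sums
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local_cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance in both series
                acc[i - 1, j],      # advance in x only
                acc[i, j - 1],      # advance in y only
            )
    return float(acc[n, m])


# A time-shifted copy aligns perfectly, while a constant offset does not
print(dtw_distance([0, 1, 2], [0, 0, 1, 2]))  # prints 0.0
print(dtw_distance([0, 0, 0], [1, 1, 1]))     # prints 3.0
```

In the setting of this work, `x` would be one day's KPI series and `y` the corresponding reference. For 96-point series the quadratic cost of the double loop is negligible.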
This technique can be used not only for speech recognition and pattern matching, but also for anomaly detection. Examples include overlaying time series from two disjoint time periods to see whether the shape has changed significantly, or examining outliers. In this work, DTW is used to calculate the similarity between the actual KPI time series and their reference values.

Low-Dimensional Embedding Using t-SNE
In an attempt to characterize instances of MGs with respect to their performance, we have also examined embedding the MGs in the two-dimensional plane, which allows the creation of visual maps of the MGs and the inspection of their structure.
We have used the t-SNE [17] method for the two-dimensional embedding. t-SNE (t-Distributed Stochastic Neighbor Embedding) is a popular dimensionality reduction method. It maps high-dimensional points onto a low-dimensional space so that both the local and the global structure of the dataset become apparent. It is based on the earlier SNE (Stochastic Neighbor Embedding) method [32].

In SNE, the similarity between two data points $x_i$ and $x_j$ is taken to be the conditional probability $p_{j|i}$ that $x_i$ would pick $x_j$ as its neighbor, assuming a Gaussian neighborhood distribution around each data point. The goal is to find low-dimensional points $y_i$ and $y_j$ whose corresponding Gaussian conditional distribution $q_{j|i}$ is as close to the original distribution as possible. The Kullback-Leibler divergence was used as the measure of closeness between the distributions [32], and the resulting cost was minimized using gradient descent:
$$C = \sum_i KL(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}.$$
t-SNE improves on SNE by providing a cost function that is easier to optimize, based on symmetrized probabilities between points, thus achieving faster optimization. It also overcomes the crowding problem of SNE, where many points end up crowded in the same region, by using a Student's t distribution for the low-dimensional points instead of a Gaussian one. With these improvements, t-SNE can produce revealing two-dimensional visualizations at a small computational cost.
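To make the SNE objective concrete, the sketch below computes the Gaussian conditional probabilities p_{j|i} for given per-point bandwidths σ_i, together with the summed Kullback-Leibler cost. It is a simplified illustration under stated assumptions. The σ_i are supplied directly rather than tuned to a target perplexity. The low-dimensional q_{j|i} are obtained by setting 2σ² = 1, i.e. q_{j|i} ∝ exp(−‖y_i − y_j‖²), as in the original SNE. Only the cost is shown, not the gradient-descent optimisation or the t-SNE modifications. The function names are illustrative.

```python
import numpy as np


def conditional_probabilities(points, sigmas):
    """p_{j|i} = exp(-||x_i - x_j||^2 / (2 sigma_i^2)) / sum_{k != i} exp(...)."""
    pts = np.asarray(points, dtype=float)
    sq_dist = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    sig = np.asarray(sigmas, dtype=float)[:, None]
    logits = -sq_dist / (2.0 * sig ** 2)
    np.fill_diagonal(logits, -np.inf)             # a point never picks itself
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def sne_cost(p, q, eps=1e-12):
    """Sum over i of KL(P_i || Q_i); entries with p = 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))


X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
P = conditional_probabilities(X, [1.0, 1.0, 1.0])
# Low-dimensional map with 2 * sigma^2 = 1, i.e. sigma = 1 / sqrt(2)
Y = np.array([[0.0, 0.0], [5.0, 0.0], [1.0, 0.0]])
Q = conditional_probabilities(Y, [1 / np.sqrt(2)] * 3)
```

Here `Y` swaps the neighbourhood structure of `X`, so the cost is large. Gradient descent on the positions in `Y` would reduce it.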
In our case, the high-dimensional MG instances are mapped to two-dimensional points in order to examine the structure of the resulting mapping. The MG instances considered are the characteristics of the same MG on different days of the year. Specifically, consider a single day of the dataset. For this day, and for each KPI type c, we collect a time series of KPI measurements at 15 min intervals: $x_c = (x_{c,1}, x_{c,2}, \dots, x_{c,96})$.
There is also a subset of KPIs for which only a single aggregate daily value is available. For these KPIs, each $x_c$ is a scalar.
As a pre-processing step, we aggregate each time series per day by computing its Root Mean Square (RMS) value:
$$\mathrm{RMS}(x_c) = \sqrt{\frac{1}{96} \sum_{t=1}^{96} x_{c,t}^2}.$$
Since the RMS value of a single non-negative value is the value itself, this notation extends to all KPIs, including those with only a single value per day. This yields a vector $x = (x_1, x_2, \dots, x_C)$ characterizing a single day, whose elements each correspond to one of the C types of KPIs considered.
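The per-day RMS aggregation can be sketched as follows. This assumes each day's data arrives as a mapping from KPI name to either a 96-sample series or a single daily value. The KPI names and numbers in the example are purely illustrative. Note that for a negative scalar the RMS equals its absolute value, so the "same value" property holds only for non-negative single-valued KPIs.

```python
import numpy as np


def rms(values):
    """Root mean square; a single scalar is treated as a one-sample series."""
    arr = np.atleast_1d(np.asarray(values, dtype=float))
    return float(np.sqrt(np.mean(arr ** 2)))


def daily_feature_vector(kpi_values, kpi_order):
    """Map one day's KPI data (series or scalars) to the vector (x_1, ..., x_C)."""
    return np.array([rms(kpi_values[name]) for name in kpi_order])


# Hypothetical day: a 96-sample KPI series and a KPI with one daily value
day = {"FQR": [0.1, -0.1] * 48, "DEPEX": 4.2}
x = daily_feature_vector(day, ["FQR", "DEPEX"])
```

Fixing `kpi_order` explicitly keeps the columns aligned across days, which the matrix construction in the next step relies on.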
Let us now consider that there are N days in the dataset. We can organize the above vectors in a matrix $X \in \mathbb{R}^{N \times C}$, where each element $X_{ic}$ is the RMS value of the time series of KPI type c for day i. Since different KPI types have different scales, we normalize the columns of X by dividing them by their RMS value across all days. If $X_c$ denotes the c-th column of X, the normalized column $\tilde{X}_c$ is computed as:
$$\tilde{X}_c = \frac{X_c}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} X_{ic}^2}}.$$
We then supply the normalized feature matrix $\tilde{X}$ as input to t-SNE. This results in a matrix $Y \in \mathbb{R}^{N \times 2}$, where each row is the two-dimensional embedding of the corresponding day in the original data:
$$Y = \text{t-SNE}(\tilde{X}).$$
We can then examine the structure of the resulting embedding to find areas of interest. Each area can be characterized by the deviations of its points from reference values provided by experts. Results of this procedure can be found in Section 4.3, below.
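The column normalization can be written in a few lines of NumPy. This is a sketch assuming the daily vectors have already been stacked into an N × C array. The guard for an all-zero column is an addition not discussed in the text.

```python
import numpy as np


def normalize_columns(X):
    """Divide each KPI column of X (N days x C KPIs) by its RMS over all days."""
    X = np.asarray(X, dtype=float)
    col_rms = np.sqrt(np.mean(X ** 2, axis=0))
    # Guard (added assumption): an all-zero KPI column is left unchanged
    col_rms = np.where(col_rms == 0.0, 1.0, col_rms)
    return X / col_rms


X = np.array([[3.0, 0.0, 10.0],
              [4.0, 0.0, 20.0]])
X_norm = normalize_columns(X)
```

The normalized matrix could then be embedded, for example, with scikit-learn's `sklearn.manifold.TSNE(n_components=2).fit_transform(X_norm)`. The paper does not state which t-SNE implementation or perplexity was used, and the perplexity should be smaller than the number of days.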

Microgrid Testbed
To evaluate and validate the proposed methodology, data from a real-life small-scale MG have been employed over a period of a few months. The MG discussed is a living lab at the Centre for Research and Technology Hellas (CERTH). It consists of a two-floor Smart House building (see Figure 3), equipped with a 9.57 kWp PV installation and a 5 kWh Li-ion battery. The CERTH Smart House is the first grid-connected MG in Greece and is presented in detail in Reference [33].

The electrical design of the deployed MG is depicted in Figure 4. Starting from the point of common coupling with the grid (left) and moving towards the loads, an interface protection relay has been installed to ensure compliance with the Greek regulations in terms of voltage, frequency, and time restrictions. Next, all loads are connected in line with a master-slave configuration of three 3 kW single-phase inverters, through which the power flow from and to the grid can be fully controlled. The PVs are connected to a common AC bus at the output of the ESS inverters, through a two-MPP-channel PV inverter of 10 kW rated power. The loads follow, with a nominal capacity far exceeding generation and storage. However, due to asynchronous operation, around 12-14 kW can be self-consumed in grid-connected mode. In islanded operation this is limited to a maximum of 9 kW, and only during sufficient PV production.

Finally, a day-ahead and real-time optimisation framework based on state-of-the-art energy forecasting algorithms is deployed. It automatically controls all involved assets with the aim of optimising the MG's financial and technical aspects. However, these are considered out of scope for the presented work and are not further analysed.

KPI Overview
Starting in mid-2019, the framework described in Section 3.1 was deployed on the MG testbed, and data collection began across various scenarios (e.g., days without any PV generation). All KPIs are calculated dynamically, at either the 15 min or the daily interval, and stored automatically in a local database. Due to various technical and non-technical reasons, not all days have valid data. Hence, a certain degree of pre-processing is required to extract the correct dataset for the data-driven performance assessment.
The following figures ( Figures 5-12) provide a graphical representation of an indicative day of the MG (i.e., 23/08/2019), as an example. Moreover, Table 2 depicts the aggregated results over the studied period for certain financial, environmental and reliability KPIs.

DTW
The DTW distance is calculated for a 7 month period in 2019 (June-December) for those days where there were measurements every 15 min for the entire time of day, a total of 96 measurements. The whole dataset consists of a total of 144 days, divided into months as follows: June: 19, July: 22, August: 26, September: 27, October: 15, November: 15, and December: 20.
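Selecting the days used for DTW amounts to keeping only the dates with a full set of 96 quarter-hourly samples. This is a minimal sketch, assuming the measurements carry naive `datetime` timestamps aligned to the 15-min grid, with duplicates counted once. The function name is hypothetical. Time-zone and daylight-saving handling, and validity checks beyond completeness (e.g., out-of-range values), are not covered.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Iterable, List

SAMPLES_PER_DAY = 96  # 24 h at 15-min resolution


def complete_days(timestamps: Iterable[datetime]) -> List[date]:
    """Return, sorted, the dates whose 96 quarter-hour slots all have a sample."""
    slots = defaultdict(set)
    for ts in timestamps:
        if ts.minute % 15 == 0:  # ignore samples not aligned to the 15-min grid
            slots[ts.date()].add((ts.hour, ts.minute))
    return sorted(day for day, seen in slots.items() if len(seen) == SAMPLES_PER_DAY)
```

Counting distinct slots, rather than raw rows, prevents a day with duplicated records from masking a missing interval.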
Figure 13a shows that the reduction of greenhouse gas (GHG) emissions (GHGRED) does not deviate much from its reference value throughout the study period (see red line). In contrast, Energy Storage System Autonomy (ESSA), Energy Storage System Autonomy to Hours without RES production ratio (ESSANoRES) and SCC show similar deviations from their reference values, the largest occurring in August. A similar phenomenon is observed in October, while in December only SCC shows a large deviation compared to ESSA and ESSANoRES.

Figure 13b shows that for Frequency Range (FQR), Voltage Range/Fluctuations (VLTR) and Excessive Asymmetric Load Charge (EALC) the deviations are nearly zero, except for a short period at the end of September (as for ESSA, ESSANoRES and SCC). For Upward Flexibility (UPFL), the deviations from the reference values are quite large. The largest again occur in August (as for ESSA, ESSANoRES and SCC), and a large deviation is again observed at the end of October. Figures 13a,b show that large deviations from the reference values tend to appear in several KPIs simultaneously.

t-SNE
Following the procedure described in Section 3.2.2, we compute the two-dimensional embeddings of the days in the dataset using t-SNE. The resulting embedding can be seen in Figure 14. Each point corresponds to a particular day in the dataset. Its position on the plane is determined by t-SNE so that nearby points correspond to days that are similar with respect to their KPI values. Some groups of days emerge in this mapping, forming clusters of points separated by relatively empty space. Each of these groups contains days that follow a similar pattern with regard to their KPI values. However, the embedding alone does not tell which areas correspond to days with good performance and which to days with poor performance.
Figure 14. Two-dimensional embedding of the days in the dataset, using t-distributed Stochastic Neighbor Embedding (t-SNE). Each point corresponds to a day. The x- and y-coordinates are computed by t-SNE, so that nearby points correspond to days that are similar with respect to their KPIs. We can see that groups of similar days emerge.
In order to characterize the areas computed by t-SNE in terms of their performance, we define a crude measure of performance cost for each day, by comparing the KPI values of each day to reference values provided by experts. The available reference values are in the form of time series (of 15-min time intervals, similar to the KPI measurements), or in the form of scalar values for those KPIs for which only aggregated daily measurements are available. The reference values characterize a single reference day in each month of the year.
Before the cost functions are defined, the reference values are aggregated using the RMS values of the time series. They are then normalized per KPI, dividing them by the RMS values computed from the data, so that the data and the reference values are on the same scale. Then, for each KPI, the cost of a measurement $x_{ic}$ is defined as follows:
$$\mathrm{cost}_c(x_{ic}, r_{ic}) = (x_{ic} - r_{ic})^2,$$
where i is the index of a particular day, c the index of a particular KPI, and $r_{ic}$ the reference value for day i (i.e., for the month of day i) and KPI c. In other words, the larger the squared difference between the measurement and the reference value, the larger the cost.

There are three exceptions to the above definition: the RESPEN, LDFA and RESPFA KPIs. For these KPIs, the nominal values are not particular values, but rather thresholds above or below which the performance is optimal. The definition is therefore slightly modified: when $x_{ic}$ falls within the optimal range, the cost is 0; otherwise, it is computed as above.
Using the above cost definitions, a cost matrix $B \in \mathbb{R}^{N \times C}$ is constructed, where each element is $B_{ic} = \mathrm{cost}_c(x_{ic}, r_{ic})$. Summing the costs across all KPIs for each day, we construct an aggregated cost per day:
$$b_i = \sum_{c=1}^{C} B_{ic}.$$
This gives a crude measure of the deviation of each day from the nominal values. This aggregate measure loses some of the information contained in the individual KPI values available per day, which was preserved in the t-SNE embedding. However, because it is a single scalar per day, it allows us to approximately characterize each day's performance. It can therefore be used to characterize the different areas appearing in the t-SNE embedding.

Figure 15a shows the same embedding as Figure 14, but with the points colored according to the total cost per day. Red corresponds to high cost (large deviation from the reference values), while white corresponds to low cost (small deviation from the reference values). The different areas of the t-SNE embedding are, broadly, colored differently according to the cost function. The left and bottom parts contain small groups of days with consistently high costs. Larger areas, mostly in the right part of the mapping, contain days with low costs, that is, close to the reference values on average. The costs per day can thus guide the characterization of the larger areas appearing in the t-SNE embedding.
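The per-KPI cost and the per-day total can be sketched as follows. The sketch assumes that measurements and references have already been RMS-aggregated and normalized as described, and that the cost is the squared deviation. The threshold KPIs (RESPEN, LDFA, RESPFA) are handled by passing an optimal range for their columns. The function names and the `optimal_ranges` argument are illustrative, not taken from the paper.

```python
import numpy as np


def kpi_cost(x, r, lower=None, upper=None):
    """Squared deviation from the reference; zero inside an optimal range if given."""
    if lower is not None or upper is not None:
        lo = -np.inf if lower is None else lower
        hi = np.inf if upper is None else upper
        if lo <= x <= hi:
            return 0.0
    return float((x - r) ** 2)


def total_cost_per_day(X, R, optimal_ranges=None):
    """X, R: (N days x C KPIs) normalized measurements and references.

    optimal_ranges: optional dict column index -> (lower, upper) for the
    threshold-type KPIs. Returns b_i = sum over c of B_ic for every day i.
    """
    X = np.asarray(X, dtype=float)
    R = np.asarray(R, dtype=float)
    optimal_ranges = optimal_ranges or {}
    B = np.empty_like(X)
    for i in range(X.shape[0]):
        for c in range(X.shape[1]):
            lower, upper = optimal_ranges.get(c, (None, None))
            B[i, c] = kpi_cost(X[i, c], R[i, c], lower, upper)
    return B.sum(axis=1)
```

Note that the row sum gives one value per day, i.e. the costs are summed across KPIs and not across days.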
Figure 15. Two-dimensional embedding of the days in the dataset, using t-SNE. (a) Each day is colored according to the total cost (deviation from reference values). We can see areas of high cost (red, left and bottom), as well as areas of low cost (white, right). (b) Each day is colored according to its assigned cluster, using k-means clustering. Each cluster can be characterized by the average cost of its days.
As an attempt at such a characterization, we grouped the two-dimensional points with k-means clustering according to their positions on the map. The number of clusters was set to 10, in order to capture the small groups of points appearing in the embedding. The resulting clustering is shown in Figure 15b, where each color corresponds to a cluster. Computing the average total cost per cluster indicates the overall performance of the days in each group, as shown in Table 3, where the clusters are sorted by decreasing average cost. The clusters with the highest costs are 8 and 6, which correspond to the left and bottom parts of the mapping. Such a ranking can be used to grade each group of days, or each group of microgrids in general, according to its performance relative to the others.
Table 3. Average cost for each cluster of Figure 15b, sorted in decreasing cost order. The clusters with higher costs are 8 and 6, which correspond to the left and bottom areas, respectively.
The cost function and cluster assignments can also be plotted against time (see Figure 16). This shows which days have the highest and lowest costs, and which days belong to each cluster. The largest costs appear in August, corresponding to the high-cost clusters, owing to the large deviation from the reference values in that month. A two-day spike is also visible on September 25 and 26, when production was significantly low. The shape of the curve in Figure 16 is similar to that of the curves in Figure 13a, showing that the cost measure characterizes the different days in a similar manner to the DTW method. This characteristic shape arises from the deviation of each day from the corresponding reference values. It suggests that the two methods of quantifying the difference with respect to the reference values converge.
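The clustering-and-ranking step can be reproduced with a plain implementation of Lloyd's k-means on the two-dimensional t-SNE coordinates, followed by averaging the per-day cost within each cluster. The sketch below is an assumption-laden stand-in. The paper does not state which k-means implementation, initialisation, or random seed was used; a library routine such as scikit-learn's `sklearn.cluster.KMeans(n_clusters=10)` would serve equally. Cluster indices from any run are arbitrary labels and will not necessarily match the numbering of Figure 15b. The function names are illustrative.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise with k distinct data points chosen at random
    centroids = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster becomes empty
        new_centroids = np.array([
            pts[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def rank_clusters_by_cost(labels, day_costs):
    """Return (cluster, mean daily cost) pairs sorted by decreasing mean cost."""
    labels = np.asarray(labels)
    costs = np.asarray(day_costs, dtype=float)
    means = {int(j): float(costs[labels == j].mean()) for j in np.unique(labels)}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```

Because k-means results depend on initialisation, running several seeds and keeping the lowest within-cluster distance is a common safeguard when the ranking is used for grading.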
Merging the two methods to construct a unified measure of performance that can be validated against expert ground truth is left for future work.

Conclusions & Future Work
This paper presented two main contributions. The first is a holistic performance assessment framework of 46 KPIs covering the most commonly assessed aspects of MG design and operational performance. The second is a data-driven approach for evaluating the operational phase of a MG through a high-level analysis of the dynamic KPIs (18 in total) of the holistic framework. For the latter, two approaches were employed, namely DTW and t-SNE. Both were evaluated against specific reference models built with the MG's optimal performance in mind (e.g., no frequency and voltage deviations, high RES coverage, etc.). As tested with real-life data from a small-scale MG in northern Greece, both approaches seem to correctly identify and classify daily MG performance, providing a promising tool for high-level performance assessment.
Starting from the KPI framework, it is evident that many aspects require attention when evaluating the performance of a MG. Multiple research results have highlighted this, and this manuscript presents these aspects collectively. This also translates into a large amount of data and information spanning all seven metric categories, which in turn requires a variety of metering points and equipment to collect all the necessary input. Most current systems have not considered all of these KPIs during the MG design phase. Moreover, in real-time operation, several challenges, such as interoperability issues, need to be addressed before the overall performance can truly be evaluated. This work aimed to organize all this information into a uniform framework that can potentially aid in better designing and operating future microgrids.
The algorithms used for characterizing days and grouping KPIs, that is, DTW and t-SNE, are effective in discovering groups of similar days and in assessing the difference of each day from the supplied reference values, as demonstrated in the results. So far, they mostly operate in an unsupervised manner. The supplied expert knowledge (the reference values) is used only in a post-processing step, to construct crude measures for assessing the algorithm results. Based on the promising results of this work, future extensions could use the reference values explicitly to train dimensionality reduction and prediction models.
Furthermore, the data-driven methodology can lead to a major breakthrough in MG design, assessment and comparison. As already presented, both algorithms reach similar conclusions in terms of daily performance, correctly identifying days where the overall MG performance dropped. Such results could bring multiple benefits to the MG research and industrial community. The obvious benefit is being able to assess MG performance in real time from aggregated information (i.e., data originating not per inverter but from all inverters together, regardless of their types, sizes, etc.). Beyond that, it is also easier to reveal system problems and shortcomings by examining days and moments of "poorer" performance, as shown in the presented results. In this way, the MG operator can identify performance issues early and resolve them in a preventive manner.
Moreover, even though still at a very early stage, this approach could introduce a new high-level metric expressing the MG Performance Class, similar to the ones used for buildings. Besides the practical benefit of easily understanding how "good" a MG is, such a metric could significantly facilitate the regulatory aspects that currently hinder MG integration into the grid in various countries. Just like Energy Performance Certificates, the MG Performance Class could unlock how MGs integrate, operate and support the Smart Grid from a legislative perspective.
As this is an ongoing investigation, various ideas and concepts emerged during the evaluation of the presented methodologies. These are expected to be analysed as future work in the coming months. The most interesting of these is the use of a Long Short-Term Memory (LSTM) auto-encoder to learn a data representation, that is, an efficient encoding that uses as few parameters and as little memory as possible, in order to detect hidden anomalies in the KPIs. Since the data are unlabeled, the whole network will be trained in an unsupervised manner.

Limitations
Although this study has presented a holistic performance framework and a promising methodology for real-time performance assessment, certain limitations require further investigation. One of these is the size of the MG testbed employed. Although it is sufficient for a proof of concept, larger MGs are needed to further validate the presented methodology and thoroughly examine the assessment results. Another limitation concerns the reference models: even though some of them are generic, a few are specific to the MG examined. As a result, this approach cannot easily be applied to other MGs. To that end, a set of reference models (besides the generic ones) must be created that can describe a variety of MG sizes and configurations. Such models would provide the ground truth needed for accurate and validated performance assessment results.
Another key aspect that requires future attention is an eighth category of KPIs related to security, both physical and cyber. Although not directly related to energy performance, security is expected to play a major role in future deployments. It affects both the hardware and software investments required and the issues emerging in information exchange, hence making interoperability even more critical.
Finally, in the Smart Grid context, MGs are expected to become active building blocks that can offer a multitude of ancillary services. Even though certain aspects are already covered (directly or indirectly), additional KPIs may be required to fully cover them.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Capital Expenditure
• Initial cost of the investment needed in order to make the system functional.
• Calculated once the project is finished and the purchases of the new equipment can be added.
• Considered to be static information since it is not affected by the MG operation.
• In case of infrastructure changes, this metric should also be recalculated using more complex calculations that include equipment age, return on investment, and so forth.
• N: total number of assets purchased and installed for creating a functional MG,
• $C_{cap}$: initial investment for each asset k,
• $Y(p, k)$: lifetime of each asset.

Maintenance Expenditure
• All costs that are needed to keep the system operational.
• Measured once the system is up and running.
• Annualized costs can be estimated as a constant value.

Reduction of GHG Emissions
• RES usage in MGs assists in reducing the Greenhouse Gas Emissions that would otherwise be produced by burning fossil fuels in one year.
• The metric can be measured in shorter intervals (per month, day, hour) if needed.
$GHGRED = \sum_{t=1}^{n} E_{t,RES} \cdot CF_{CO_2}$
• $E_{t,RES}$: energy produced (kWh) from RES in year t,
• $CF_{CO_2}$: coefficient that "translates" energy into kg (or tonnes) of CO2 emissions. This coefficient depends on the energy mix (including efficiencies) of the electricity generation replaced by RES.
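The GHGRED formula can be transcribed directly, given per-period RES energy and a single emission coefficient. The coefficient in the example is a placeholder, not an official factor for the Greek energy mix. A time-varying coefficient would require a per-period product instead of a single multiplication.

```python
def ghg_reduction_kg(res_energy_kwh, cf_co2_kg_per_kwh):
    """GHGRED = sum over periods t of E_{t,RES} * CF_CO2 (kg of CO2 avoided)."""
    return sum(res_energy_kwh) * cf_co2_kg_per_kwh


# Hypothetical yearly RES production (kWh) and a placeholder coefficient
avoided = ghg_reduction_kg([12000.0, 11500.0], 0.5)
```

Passing monthly or daily energy values instead of yearly ones gives the shorter-interval variant mentioned above, with the same coefficient.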

Reduction of Carbon Penalties
• Environmental taxation imposed on organisations/companies in order to raise awareness of their carbon footprints.
• Applicable in some countries only.
• EC directive: penalty for every ton of CO 2 emitted above a given threshold [34].
• The reduction in carbon penalties is calculated as the difference between the penalties before and after the deployment of the system.
• $Em_{CO_2,excess,t}$: the excess of CO2 (kg or tonnes) above the allowed threshold in year t,
• $PNL_c$: cost in euros per tonne of CO2,
• $PNLEX_C$: the current penalties that the operator of the MG has to pay for every tonne of CO2,
• $PNLEX_P$: the previous penalties.
Operation

Renewable Energy Penetration (RESPEN)
• RES penetration is defined as the percentage of renewable energy produced compared to the total consumed energy over a given period (e.g., a year).
• It describes how "environmentally friendly" a MG is.
• In grid connected operation it can exceed 100%.
• In islanded mode, it has a maximum of 100%.
$RESPEN = \frac{E_{t,RES,total}}{E_{c,total}} \cdot 100\%$
• $E_{t,RES,total}$: the total energy produced from RES,
• $E_{c,total}$: the total energy consumed by the MG over the same period.
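RESPEN as defined above is a simple ratio. The sketch below assumes both energies are given in kWh for the same period. The input check is an addition, not part of the definition.

```python
def res_penetration_pct(res_produced_kwh, consumed_kwh):
    """RESPEN: RES energy produced as a percentage of total energy consumed."""
    if consumed_kwh <= 0:
        raise ValueError("consumed energy must be positive")
    return 100.0 * res_produced_kwh / consumed_kwh


# Grid-connected example: exporting surplus PV pushes RESPEN above 100%
print(res_penetration_pct(1500.0, 1200.0))  # prints 125.0
```

In islanded mode the produced RES energy cannot exceed what is consumed (plus losses and storage), which is why the value is capped at 100% there.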

Appendix C. Reliability KPIs
$\frac{SmM_{inflex}}{Load_t} \cdot 100\%$
• $SmM_{inflex}$: number of integrated flexible load smart meters,
• $Load_t$: number of total load assets.
Design

Renewable Energy Sources Coverage
• The percentage of RES peak power to overall peak load: $\frac{E_{t,RES,peak,i}}{E_{c,peak,i}} \cdot 100\%$
• $E_{t,RES,peak}$: the maximum generated power from renewable energy resources,
• $E_{c,peak}$: the maximum consumed power from the MG loads at given time i.
• It gives the average outage duration that any given customer would experience.

Design
• It also expresses the average restoration time.
• Defined as the ratio of the total duration of customer interruptions to the total number of customer interruptions.
• For the presented scale-down to MG level, every MG load is considered as customer.

Frequency Range
• The range of frequency variation during daily operation gives a rough indication of the capacity of the MG generation pool needed to satisfy the load requirements and to cope with the random RES variations.
• This metric is most important during islanded operation (as are the other frequency indicators), given that in grid-connected mode the frequency is imposed by the grid.
$FQR = \frac{f - f_{ref}}{f_{ref}} \cdot 100\%$
• f: current frequency value measured,
• $f_{ref}$: nominal frequency value required.
Operation
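The per-sample deviation in the FQR formula can be computed as below. The table does not specify how the per-sample values are summarised into a daily range. Taking the spread between the largest and smallest deviation of the day is an assumption made here for illustration, as is the 50 Hz default.

```python
import numpy as np


def frequency_deviation_pct(f, f_ref=50.0):
    """Per-sample (f - f_ref) / f_ref * 100, in percent."""
    return (np.asarray(f, dtype=float) - f_ref) / f_ref * 100.0


def daily_frequency_range_pct(f, f_ref=50.0):
    """Assumed daily summary: spread between largest and smallest deviation."""
    dev = frequency_deviation_pct(f, f_ref)
    return float(dev.max() - dev.min())
```

For a 60 Hz system only `f_ref` changes; the percentage form keeps the values comparable across systems.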

Frequency Standard Deviation
• Standard deviation quantifies the dispersion of frequency around the target value of 50/60 Hz.
• An important indication on the capacity of the system to balance the active power flow.
• A lower value of the standard deviation of the frequency indicates a stable and reliable MG.
• It can also be used to assess voltage at the Point of Common Coupling (PCC) level.
• $V_{avg}$: the current average voltage value measured,
• $V_{ref}$: the nominal voltage amplitude value required.

Voltage Standard Deviation
• The standard deviation quantifies the dispersion of voltage around the rated value.
• It gives an important indication of the system's capacity to balance the reactive power flow.
• A lower standard deviation of the voltage indicates a reliable MG.
Stage: Operation

Table A5. Cont.

Total Harmonic Distortion
• Ideally, THD would be zero. According to EN 50160, the THD of the supply voltage (including all harmonics up to order 40) shall be less than or equal to 8% for 95% of the time.
• When assessing MGs, both voltage and current THD should additionally be evaluated over the considered period (e.g., one week).
Stage: Operation

Excessive Asymmetric Load Charge (EALC)
• In any three-phase system it is important to quantify how effectively the loads are balanced. In most applications the three phases are not loaded evenly, which displaces the neutral point and in turn causes a series of protection and power quality problems.
• To assess the asymmetric load charge, a new KPI is defined based on how often the neutral displacement exceeds the limits set by international standards.
$\frac{|V_{avg} - V_{phA}| + |V_{avg} - V_{phB}| + |V_{avg} - V_{phC}|}{2 \cdot V_{avg}} \cdot 100\%$
• $V_{phA}$, $V_{phB}$, $V_{phC}$: the voltages of each phase of the three-phase system,
• $V_{avg}$: the average voltage of the three phases.

Appendix F. Efficiency KPIs
• $E_{c,baseline}$: the energy consumed prior to the application of the new control scheme,
• $E_c$: the current MG energy demand,
• $CF$: a correction factor for different environmental and operating conditions.

Reduction in Peak Demand
• Peak demand can be compared with that of the previous year, or with the peak reductions achieved within the same year.
• Reduction in peak demand can be demonstrated through simulation by enabling the regulated power exchange mode so as to minimize the peak demand.
• KPI first defined in Reference [35].
• $D_{h,max}$: the maximum hourly demand,
• $D_{h,avg}$: the average hourly demand over a 24 h timeframe.
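Both quantities this KPI relies on, $D_{h,max}$ and $D_{h,avg}$, can be derived from 24 hourly demand readings. The sketch below computes them, then, purely as an illustration, combines them into a peak-to-average ratio $D_{h,max}/D_{h,avg}$ and a percentage peak reduction between two days (e.g., before and after enabling a control scheme). These two combinations are assumptions, not the formula of Reference [35], and all names are hypothetical.

```python
from typing import Sequence, Tuple


def hourly_peak_and_average(hourly_demand_kw: Sequence[float]) -> Tuple[float, float]:
    """Return (D_h_max, D_h_avg) for one day of hourly demand readings."""
    if len(hourly_demand_kw) != 24:
        raise ValueError("expected 24 hourly values")
    d_max = max(hourly_demand_kw)
    d_avg = sum(hourly_demand_kw) / len(hourly_demand_kw)
    return d_max, d_avg


def peak_to_average_ratio(hourly_demand_kw: Sequence[float]) -> float:
    """Hypothetical combination D_h_max / D_h_avg (1.0 means a perfectly flat profile)."""
    d_max, d_avg = hourly_peak_and_average(hourly_demand_kw)
    if d_avg <= 0:
        raise ValueError("average demand must be positive")
    return d_max / d_avg


def peak_reduction_percent(before_kw: Sequence[float], after_kw: Sequence[float]) -> float:
    """Percentage drop in D_h_max between two days (hypothetical comparison)."""
    peak_before, _ = hourly_peak_and_average(before_kw)
    peak_after, _ = hourly_peak_and_average(after_kw)
    return (peak_before - peak_after) / peak_before * 100.0
```

A flat profile at 11 kW has a ratio of 1.0, whereas 23 hours at 10 kW plus one 34 kW hour has the same 11 kW average but a ratio of about 3.1.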

Divergence from Optimal Tertiary Operation
• The number of times during a year that the MG did not operate optimally, meaning that extra effort from the ESS and/or diesel generators had to be invoked to satisfy the load demand (or extra energy had to be imported from the grid when prices were high or the grid was unreliable).
• This KPI highlights how many times the MG operation diverged from the day-ahead schedule.
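One way to count such divergences is a per-interval comparison of the day-ahead schedule with the measured dispatch of a single quantity (e.g., ESS discharge, diesel output, or grid import). The sketch below assumes both series are aligned interval by interval; `tolerance_kw` and the function names are hypothetical, and this simple threshold test is not the Dynamic Time Warping or t-SNE approach used in the data-driven assessment.

```python
from typing import Dict, List, Sequence


def count_divergences(scheduled_kw: Sequence[float], actual_kw: Sequence[float],
                      tolerance_kw: float) -> int:
    """Number of intervals where |actual - scheduled| exceeds tolerance_kw."""
    if len(scheduled_kw) != len(actual_kw):
        raise ValueError("schedule and measurements must cover the same intervals")
    if tolerance_kw < 0:
        raise ValueError("tolerance must be non-negative")
    return sum(1 for s, a in zip(scheduled_kw, actual_kw) if abs(a - s) > tolerance_kw)


def divergent_days(schedules: Dict[str, Sequence[float]],
                   actuals: Dict[str, Sequence[float]],
                   tolerance_kw: float) -> List[str]:
    """Days (dictionary keys) with at least one interval off the day-ahead schedule."""
    return sorted(
        day for day, sched in schedules.items()
        if count_divergences(sched, actuals[day], tolerance_kw) > 0
    )
```

Summing `count_divergences` over a year gives an interval-based count, while `len(divergent_days(...))` gives a day-based count; which granularity fits the KPI is a design choice.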