Resilience Quantification of Smart Distribution Networks—A Bird’s Eye View Perspective

Abstract: The introduction of pervasive telecommunication devices, in the scope of smart grids (SGs), has accentuated interest in the distribution network, which integrates a large share of new grid applications. High impact low probability (HILP) events, such as natural hazards, man-made errors, and cyber-attacks, as well as the inherent fragility of the distribution grid, have propelled the development of effective resilience tools and methods for the power distribution network (PDN) to avoid catastrophic infrastructural and economic losses. Multiple resilience evaluation frameworks have been proposed in the literature to assist distribution system operators (DSOs) in managing their networks when faced with exogenous threats. We conduct a detailed analysis of existing quantitative resilience studies in both the electric and telecommunication domains of a PDN, focusing on event type, metrics, temporal phases, uncertainty, and critical load. Our work adopts the standpoint of a DSO, whose target is to identify feasible resilience assessment frameworks that apply to pre-defined requirements in terms of resilience evaluation objectives (planning, reactive response, or simple assessment), time of evaluation, and available enhancement strategies. Finally, results and observations on selected works are presented, followed by a discussion of identified challenges and opportunities.


Introduction
Current information and communication technologies (ICTs) have achieved a high degree of penetration in all critical infrastructure (CI) systems, owing to the ever-increasing capabilities of their services in terms of coverage, throughput capacity, latency, scalability, and privacy [1][2][3][4]. In power systems, the massive introduction of telecommunication devices accelerated the shift toward smart grids (SGs) [5], which come with a whole new package of functionalities such as automated control, smart sensing and metering, high-power converters, and modern energy management techniques based on the optimization of demand, energy, and network availability [6]. The high-performance smart grid thereby allows for the insertion of new applications in the network, such as distributed generation, the Industrial Internet of Things (IIoT), and electric vehicles [7]. This comes, however, at the expense of increased complexity, which brings new vulnerabilities and broadens the attack surface [8]. Recent extreme events (natural disasters, cyber-attacks, and man-made errors), which we refer to as HILP events, have shown that SGs are susceptible to strong disruptions given the large-scale networks they represent and the attendant interdependencies [9]. Some recent examples are the power disruptions in the US in 2017, caused by hurricanes and wildfires [10], which caused cumulative damage of $306.2 billion, affecting a total of [...] discussed to show their high importance, and available tools to date for their involvement in the study.
With this work, we extend the wide spectrum of subjects associated with resilience quantification in power networks (modeling and simulation, enhancement strategies, metrics, and extreme events) covered in recent reviews [28,36,44,45,63,64,65]. The main contributions and novelty of this paper can be summarized as follows: (a) a focus on resilience assessment of both the electric and telecommunications domains of smart power distribution networks; (b) a detailed analysis and classification of performance calculation techniques; (c) a fine-grained categorization of quantitative resilience works based on time of evaluation and target objective.
Finally, despite the considerable number of works analyzed and relatively deep examination of reviewed methods for resilience quantification in smart PDNs, this paper does not claim to be comprehensive in the issues addressed (and related references), but remains complete enough to give a good overall perspective of the research trends and understanding of challenges and opportunities.
This paper is organized into six sections. Section 1 is the introduction. Section 2 introduces the link between resilience and both reliability and Quality of Service (QoS). Section 3 expands on the taxonomy of resilience evaluation methods and proposes a classification of associated models. Section 4 treats the relationship between the objective of a resilience study and the time of evaluation. Section 5 presents the reviewed papers with all pertaining characteristics, observations, and discussions. Concluding notes are given in Section 6.

Resilience in Smart Grids
Among the desired functionalities of smart grids lies the need for capabilities such as self-healing, high reliability, power quality, and resistance against various disasters and attacks [50]. Resilience represents a promising approach to meeting such requirements, because it can address network circumstances not handled by the widely adopted principles of reliability and quality of service.

From Reliability to Resilience
Reliability is the ability of an item (component or system) to operate under designated operating conditions for a designated period of time or number of cycles, where this ability can be formulated through a probability [66]. In electrical networks, this is equivalent to maintaining the delivery of electric services to customers in the face of routine uncertainty under operating conditions [67]. Metrics like Energy Not Supplied (ENS), Average Customer Curtailment Index (ACCI), System Average Interruption Duration Index (SAIDI), System Average Interruption Frequency Index (SAIFI), and Customer Average Interruption Duration Index (CAIDI) are widely used to describe PDN reliability [68,69]. System operators use such indicators to track and enhance the performance of their networks. These indices are further used by system regulators and system operators in service level agreements (SLAs), in order to define penalty thresholds and ensure that the right compensation is paid based on the experienced outages. Reliability metrics are relevant for assessing the impact of recurrent events with available historical records, for which maintenance actions are applicable; they exclude major hazards such as severe weather events [70]. Some of these metrics were extended to capture more severe events, leading to proposals such as the STorm Average Interruption Frequency Index (STAIFI) and STorm Average Interruption Duration Index (STAIDI) [71]. However, it was demonstrated that these two metrics are not relevant for resilience evaluation because, when used during a storm, they show a large deviation that can be even greater than the values of STAIDI and STAIFI themselves [62].
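As an illustration of how such indices are obtained, the following minimal sketch computes SAIFI, SAIDI, and CAIDI from a list of outage records, using the customer-weighted definitions commonly associated with IEEE Std 1366. The `Outage` record, the function name, and the sample data are hypothetical and serve only to make the arithmetic concrete.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    customers_interrupted: int  # customers affected by this outage
    duration_hours: float       # time until service was restored


def reliability_indices(outages, customers_served):
    """Return (SAIFI, SAIDI, CAIDI) for one reporting period."""
    interruptions = sum(o.customers_interrupted for o in outages)
    customer_hours = sum(
        o.customers_interrupted * o.duration_hours for o in outages
    )
    saifi = interruptions / customers_served    # interruptions per customer
    saidi = customer_hours / customers_served   # hours per customer
    caidi = saidi / saifi if saifi > 0 else 0.0  # hours per interruption
    return saifi, saidi, caidi


# Invented example: three outages on a feeder serving 5000 customers.
outages = [Outage(200, 1.5), Outage(50, 4.0), Outage(1000, 0.5)]
saifi, saidi, caidi = reliability_indices(outages, customers_served=5000)
print(f"SAIFI={saifi:.3f}  SAIDI={saidi:.3f} h  CAIDI={caidi:.3f} h")
```

Because every outage is averaged over the whole customer base and the whole reporting period, a single extreme event is diluted in these indices, which is consistent with the limitation noted above.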
Resilience is "the ability to prepare and plan for, absorb, recover from, or more successfully adapt to actual or potential adverse events" [72]. Unlike reliability, which focuses on the frequency and duration of failures "event-agnostically," resilience seeks to further track the dynamics and resources of response, adaptability, and ability to restore.
Energies 2021, 14, 2888
This is relevant to HILP hazards where consequences in the system need to be studied with respect to specific events, as each disruption has its distinguishing characteristics [73]. Thus, the fundamental difference resides in the scale, scope, and duration of events handled: resilience targets events with strong impact in a wide geographical area with long duration of outages, while reliability handles local impact in short duration outages [74].
Despite this difference between the two concepts, which is mainly due to the events each of them tackles, they remain closely related because enhancing resilience or reliability may require the same strategies. Resilience is the more general of the two: being resilient typically encompasses being reliable, but not vice versa [75].

Resilience and QoS in ICT Networks
ICT networks traditionally rely on QoS metrics to define SLAs [76]. These metrics, consisting of delay, jitter, bandwidth, packet loss, bit error rate, and traffic load, are performance measures that do not give a comprehensive view of the network state. Therefore, complementary metrics, namely availability metrics, are adopted in SLAs in order to better quantify the system state.
In the initial introduction of Quality of Resilience (QoR) in [77], QoS is divided into short-term quality parameters referred to as availability, and long-term quality parameters grouped under QoR. In other words, resilience is considered an aspect of QoS, like latency or packet loss. However, QoR is presented in [78] as a concept treating quality at different levels of the Open Systems Interconnection model (OSI model), including the network level, which corresponds to traditional QoS. Figure 1 shows how QoR extends QoS to include other types of quality: Quality of Experience (QoE), Quality of Delivery (QoD), and Quality of Protection (QoP). This is done by considering the additional metrics from each level. QoR is used as a transverse evaluation for all aforementioned qualities. This is done by considering the metrics that describe different resilience stages. From a high-level perspective, we can say that QoR is a shift from client-centric evaluation, conducted using QoS, toward a more general framework that includes the system potential in terms of resources, organizational processes, and humans.
Once again, in both cases above, the need for resilience stems from harsh large-scale events imposing consideration of stress in the system and recovery strategies. Then, despite the slight divergence in terminology that is still the case today, the two concepts go beyond the traditional QoS evaluation, to capture both requirements of customers and enhancement strategies of operators.
Nevertheless, the idea that resilience encompasses QoS is gaining more attention [75], suggesting that a system cannot be resilient if it does not offer acceptable QoS, but that providing acceptable QoS is not the only requirement for a network to be resilient.

Taxonomy of Resilience Evaluation Methods
The panoply of methods proposed for qualitative evaluation of resilience in electric power networks [20][21][22][23][24][25] is not enough to convince critical infrastructure operators in general, and utilities in particular, to adopt resilience-based design. These methods are unable to systematically discover hidden vulnerabilities and critical elements [79]. To overcome this, stakeholders need a closer, more tangible grasp of resilience, using quantitative analyses, which have gained huge momentum in recent years. Most of these analyses are performance-based, where performance is defined in various ways in order to fit different participants and study objectives [80]. The fact that almost all works selected in this paper belong to this high-level method of quantification underlines the growing consensus toward its adoption as a tool for resilience quantification.
In Figure 2, we propose four aspects by which state-of-the-art papers on resilience metrics for smart power distribution networks can be classified, evaluated, and compared. Some of these aspects are further elaborated in later sections. For instance, in Section 5 we classify the papers based on the extreme event handled, the performance calculation method, and both the type and computational method of resilience metrics. Each of these four aspects is explained in detail below.

Extreme Event
Given that resilience takes on its full meaning when a high-impact hazard occurs [72], it is paramount to classify the works on resilience based on the extreme event(s) they target.

Single Event
Generally, resilience evaluation frameworks are by definition designed to cope with a single (type of) event, such as a natural hazard, a cyber-attack, or a physical man-made attack [67]. The disruptions studied are so strong, geographically extensive, and high-impact that no sequence of events is considered. However, a single event is considered capable of striking at different points in the network simultaneously.

Wide-Range of Events
There are attempts to address multiple events, in order to make the developed methods more attractive to network operators, as they cover a wide spectrum of failure scenarios [81][82][83]. However, addressing multiple types of hazards is challenging, partly because of the differing nature and properties of the hazards. It is often very hard to use a single modeling framework for different hazards (e.g., natural hazards vs. cyber-attacks). In addition, the inherent trade-offs between resilience strategies make multi-event studies more challenging, as some enhancement operations can be profitable for one set of events but not for others [84]. Therefore, the set of contingencies to be handled jointly must be chosen with careful attention.

Generic Event
Some studies focus solely on metric design; their authors therefore keep the failure that hits the network generic and directly observe its impact [85,86]. In that case, the system model, when considered, no longer needs to cover contingency and component fragility. This is a straightforward way to skip the difficulties inherent to disaster impact modeling, but it leaves the designer with a large set of possible scenarios from which the most relevant ones are not easily selected. A well-defined event helps to narrow down the number of possible system failure modes.

Performance Calculation
Performance, or Figure of Merit (FoM) [87], is a quantity that describes how well the system provides services, how cost-effective its operation is, and how it behaves when confronted with internal or external stress. These issues are addressed with different indicators, each of which is relevant to system operator objectives and can be adopted as a performance measure [35,41,78,88].
Evaluating performance is a key step toward the end goal of resilience quantification. The performance information necessary for computing resilience metrics is not readily available, so designers resort to modeling in order to calculate performance measures. We classify works based on the modeling method used to obtain performance indicators. As noted earlier, performance-based studies dominate the field of resilience quantification. Even the rare works that consider other aspects of quantification as main enablers [89] resort to the use of performance within their frameworks.
Modeling methods adopted by the scientific community to evaluate performance are described below.

System Model Method
The study of power distribution or telecommunication networks requires, as with other critical infrastructure, modeling the system with all its internal and external characteristics [79,90]. Two broad families of modeling are usually embraced for performance evaluation: analytical models and simulation-based models [61]. Analytical models rely on mathematical concepts like graph theory, percolation theory, worst-case analysis, Markov chains (or processes), and statistics [91][92][93][94] to represent the structure and behavior of a network and the interactions therein. Theoretical analyses can also be used for the threat, fragility, and recovery characterization process. A rigorous formulation is then conducted using multiple mathematical tools.
Simulation-based models basically have the same objective of system representation as analytical models, but with the intent of less abstraction and more fidelity to real networks. To do so, simulators are developed [95][96][97] based on the analytical approaches, but with many practical considerations that are usually too complex to be tractable by mathematical formulation. Thus, it is quite common to use simulation-based models as a validation method for the solution obtained by analytical analysis [79].
A deeper look at the modeling techniques explained above shows that both comprise four distinguishable sub-models [35]. Note that this finer granularity allows, in some cases, hybrid analytical-simulation models, as each sub-model is constructed analytically or by simulation, independently of the others. These sub-models are:

• Contingency model: describes the hazard profile, which is expressed in terms of characterizing parameters. An example would be a statistical profile that gives the probability distribution of wind intensities [41], or meteorological data used to calculate the amount of ice accreted on conductors and overhead lines during an ice disaster [91]. Another widely considered example is cyber (or cyber/physical) attack scenarios [98,99]. In some cases, there is deep uncertainty about the threat; worst-case analysis [100,101] and less conservative approaches like robust optimization [43] are then the most suitable to model such events.
• Component fragility model: represents the sensitivity of system components to a threat. This goes hand in hand with the contingency models, as fragility curves or other representations are developed with respect to event profiles [41,91].
• Restoration model: complements the previous contingency and fragility models in order to yield a quantification of the threat impact [102]. The focus is on recovery times, which can be estimated using mathematical programming, fuzzy logic, statistical methods, specialist expertise, random distributions, or even heuristic approaches in some cases [28,103].
• Network functional model: functional models in use range in complexity from pure topological approaches to physics-based models of AC power flows [104]. They describe system infrastructure, topology, services, and all related dynamic interactions. This model is present in all system models and constitutes their core element, because it replicates, as much as possible, the structure and all functions found in real networks.
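The way the four sub-models chain together can be sketched with a small Monte Carlo style example. Everything in it is an illustrative assumption rather than a method from the reviewed works: the Weibull wind profile, the lognormal fragility parameters, the exponential repair times with parallel crews, and the purely topological radial feeder are all hypothetical choices.

```python
import math
import random


def sample_wind_speed(rng):
    """Contingency model: peak gust in m/s (hypothetical Weibull profile)."""
    return rng.weibullvariate(35.0, 4.0)  # scale 35 m/s, shape 4 (assumed)


def failure_probability(wind_speed, median=45.0, beta=0.25):
    """Component fragility model: lognormal fragility curve (assumed values)."""
    if wind_speed <= 0.0:
        return 0.0
    z = (math.log(wind_speed) - math.log(median)) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_repair_time(rng):
    """Restoration model: repair time in hours (assumed exponential, mean 12 h)."""
    return rng.expovariate(1.0 / 12.0)


def served_fraction(line_up):
    """Network functional model: radial feeder, purely topological.

    Load i is served only if lines 0..i are all in service.
    """
    served = 0
    for up in line_up:
        if not up:
            break
        served += 1
    return served / len(line_up)


def simulate_event(n_lines=10, seed=0):
    """Run one hazard scenario through the four sub-models."""
    rng = random.Random(seed)
    wind = sample_wind_speed(rng)
    p_fail = failure_probability(wind)
    failed = [rng.random() < p_fail for _ in range(n_lines)]
    # Assumption: one crew per failed line, so repairs proceed in parallel.
    repair = [sample_repair_time(rng) if f else 0.0 for f in failed]
    return {
        "wind": wind,
        "p_fail": p_fail,
        "performance_after_impact": served_fraction([not f for f in failed]),
        "time_to_full_repair": max(repair),
    }


result = simulate_event(seed=1)
print(result)
```

Since each sub-model is an independent function, any one of them could be replaced by a simulator call or an empirical lookup without touching the others, which mirrors the hybrid analytical-simulation arrangement described above.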

Empirical Model Method
Post-recovery surveys are conducted by network operators, government agencies, and market regulators to assess the impact of extreme events on the system and the efficiency of implemented enhancement strategies, with results saved as historical records [14]. The collected field data are informative enough to be used to construct models by which performance is calculated [31,107,108]. Other sources of information for such models are network management systems, such as the outage management system (OMS) and distribution management system (DMS) in electric networks and the core network in telecommunications, as well as expert judgements [94]. This kind of model serves as a baseline for the analytical and simulation-based representations described previously [61].

Surrogate Model Method
A relatively new approach to performance evaluation in smart grids is the introduction of surrogate models, borrowed from the evolutionary computation community [109]. Surrogate models aim to reduce the runtime and complexity of analytical and simulation-based models while maintaining a high degree of fidelity. The idea is to bypass conventional system modeling (hence the name "surrogate" or "meta-model") using techniques such as neural networks [110,111] and kriging methods [112]. A simple example is a machine learning (ML) agent that takes as input system topology parameters, hazard characteristics, area climate, and topography, and outputs performance measures. The system model is replaced by an implicit non-linear multi-variate function implemented by the ML agent. The biggest challenge is to choose the right inputs (predictors). A Polynomial Chaos Expansion-based method is proposed in [113] to conduct risk analysis for rare events, which the authors expect can be extended to resilience assessment.
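The surrogate idea can be illustrated with a deliberately simple sketch. A piecewise-linear interpolant is used here in place of the neural networks or kriging methods cited above, purely to keep the example short; `expensive_simulation` is a hypothetical stand-in for a slow system model, and its logistic shape and parameters are assumptions.

```python
import bisect
import math


def expensive_simulation(intensity):
    """Stand-in for a slow system model: fraction of load served (hypothetical)."""
    return 1.0 / (1.0 + math.exp(0.3 * (intensity - 40.0)))


class InterpolationSurrogate:
    """Piecewise-linear surrogate fitted on a few simulator runs."""

    def __init__(self, xs, ys):
        pairs = sorted(zip(xs, ys))
        self.xs = [p[0] for p in pairs]
        self.ys = [p[1] for p in pairs]

    def predict(self, x):
        # Clamp outside the sampled range instead of extrapolating.
        if x <= self.xs[0]:
            return self.ys[0]
        if x >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, x)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Fit on 17 simulator runs, then compare against the simulator on a finer grid.
train_x = [float(v) for v in range(0, 81, 5)]
train_y = [expensive_simulation(v) for v in train_x]
surrogate = InterpolationSurrogate(train_x, train_y)
max_error = max(
    abs(surrogate.predict(x / 2) - expensive_simulation(x / 2))
    for x in range(0, 161)
)
print(f"max abs error on test grid: {max_error:.4f}")
```

Once fitted, the surrogate answers queries without calling the simulator again; the price is an approximation error that depends on where the training points (the predictors) are placed.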

Resilience Metric Computational Method
Once the performance is calculated using one of the methods described before, it will be used to compute resilience metrics. The goal is to provide the decision-maker with resilience information in the most instructive way.

Service and Assets Performance Only
Resilience computation is solely based on performance measures obtained from the operational services and infrastructural assets of the network. Metrics can be calculated from a curve describing the evolution of performance with time [41], using a justified empirical formula [114], following an analytical derivation, or taken directly as the consequences observed from the event [80].
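As one example of a curve-based metric, the sketch below integrates a sampled performance curve with the trapezoidal rule and normalizes it by the area under nominal performance. This is only one of many possible curve-derived metrics and is not attributed to any specific reviewed work; the function name and the sample curve are hypothetical.

```python
def resilience_ratio(times, performance, nominal=1.0):
    """Area under P(t) divided by the area under nominal performance."""
    if len(times) != len(performance) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += 0.5 * (performance[i] + performance[i - 1]) * dt
    return area / (nominal * (times[-1] - times[0]))


# Invented curve: nominal until t=2 h, drop to 40 %, recovery from t=6 h to t=10 h.
times = [0, 2, 3, 6, 10]
perf = [1.0, 1.0, 0.4, 0.4, 1.0]
ratio = resilience_ratio(times, perf)
print(f"resilience ratio: {ratio:.2f}")
```

A value of 1 means no performance was lost over the observation window; the complement, 1 minus the ratio, is the normalized area lost to the event.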

Multi-Criteria
This method combines various parameters (such as service performance, topology, topography, and event characteristics) to output resilience metrics. Different analytical tools are used to aggregate all these parameters into final metrics [67,92,115].

Graph Theory Algorithms
Resilience computation is based solely on performance measures obtained from network topology and calculated using graph theory algorithms [116].

Resilience Metric Type
There are many possible angles from which to categorize and classify metrics based on their types [45,63]. Our classification uses simple categories that link intuitively to the metric computational methods presented above.

Operational Metrics
Metrics that use performance as described in terms of functional service (electric, telecoms) and associated monetary costs. Expected lost load [24], supplied energy [117], and recovery duration [14] are examples of performance measures used by this type of resilience metrics.

Infrastructural Metrics
Metrics that use performance as described in terms of network infrastructure (electric, telecoms) and associated monetary costs. The number of affected components [41,101] (and associated costs) is an example of a performance measure used by this type of resilience metrics.

Topological Metrics
Metrics that use performance as described in terms of network topology and static connections between different elements, such as measures of connectivity, betweenness, and redundancy [116].
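A connectivity-type topological measure can be sketched as follows: the fraction of nodes that remain in the largest connected component after some nodes are removed, computed with a breadth-first search. The six-node network and the function name are hypothetical, and betweenness and redundancy measures are not shown.

```python
from collections import deque


def largest_component_fraction(adjacency, removed=frozenset()):
    """Share of all nodes in the largest connected component after removals."""
    seen = set()
    best = 0
    for start in adjacency:
        if start in removed or start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in removed and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best / len(adjacency)


# Invented network: a path A-B-C-D-E with a lateral branch C-F.
network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "F"],
    "D": ["C", "E"],
    "E": ["D"],
    "F": ["C"],
}
intact = largest_component_fraction(network)
without_c = largest_component_fraction(network, removed={"C"})
without_a = largest_component_fraction(network, removed={"A"})
print(intact, without_c, without_a)
```

In this toy network, losing the junction node C fragments the grid far more than losing the end node A, which is the kind of static criticality information topological metrics are meant to expose.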

Resilience Quantification Objectives
Four broad classes of resilience metrics are generally adopted: (i) average performance metrics, (ii) integrated multi-phase metrics, (iii) time-dependent metrics, and (iv) probability-based metrics [118]. In the case of a HILP event, probability distributions are often not available, whereas the other three classes depend on the measure of performance in the network. Thus, a reasonable statement is that an ideal evaluation of resilience may consist of a complete tracking of the time-dependent performance function P(t). This way, network operators can have the value of performance at any instant for the complete event duration. However, despite the apparent dependence of P(t) on time, the performance function would not necessarily change with time were it not for the extreme event that hits the system. In other words, the performance function depends on many parameters, including hazard intensity, system preparedness, the resilience strategies in hand, and the priority decisions made, all of which cause the network state to change. This brings back the problem of resilience multi-dimensionality, which makes developing a closed-form derivation of the resilience function challenging and hitherto out of reach. Performance-based methods try to include all previously mentioned parameters, and additional ones, in a temporal curve describing the performance evolution of the network. It can be said that many resilience features are embedded in a performance curve as shown in Figure 3, because the construction of such a graph takes into consideration all factors intervening during a catastrophic contingency.
It can be said that many resilience features are embedded in a performance curve as shown in Figure 3, because the construction of such a graph takes into consideration all factors intervening during a catastrophic contingency. A salient advantage of such an approach is to have the temporal follow-up of network state which allows decision-makers to be in a best-informed posture. Four main phases can be distinguished, among which some can be further detailed into sub-phases: • Anticipation phase (phase I): Represents the time period before the event occurrence, when performance is at its nominal level. Monitoring information, impact projections, and historical data when available are used for prediction studies, and all possible defensive measures are implemented. This serves particularly in the case of multi-hazard management where risks and vulnerabilities to each event are investigated. For single hazard resilience analysis which is the most relevant in the case of HILP event, this phase is not considered and a post-event resilience study is adopted. However, this also refers to the period of normal operation where reliability and risk management for recurrent failures can be conducted, which participates in system resilience, because a resilient system needs to be first as reliable and low-risk as possible. In addition, security measures for protecting the system and preparing it to withstand malicious behaviors are implemented at this stage [96].

•
Mitigation phase (phase II): Once an extreme event hits the network, reliance is on system robustness, reactivity, and absorption to minimize the effect on services and infrastructure. Adding to some preparation policies that could be anticipated, many dynamic actions can be implemented to reduce the aftermath, like distribution automation actions, load shedding, and monitoring actions in power distribution networks or customer prioritization in telecom networks. These actions can withstand performance degradation that is in place, or serve to coordinate between entities in order to achieve an accurate assessment of consequences and prepare next crisis management steps.

•
Recovery phase (phase III): Unlike short, low-impact incidents where maintenance actions are completed relatively fast, recovery from major events can take anywhere from several weeks to months [119]. The main reason is that, given the safety of emergency crews and logistic constraints, restoration is conducted carefully and waits for a reduction in hazard intensity or, more generally, the identification of restoration windows. Priority is first given to service restoration, where all alternative (even temporary) ways to provide services are explored and deployed, allowing an intermediate level of performance to be regained. Complete recovery takes more time and effort, as it mostly involves infrastructure repair, which turns out to be very challenging.

•
Learning phase (phase IV): This phase is less considered than the two previous phases in quantitative resilience frameworks, generally with the argument that resilience is best examined in the face of exogenous threats [120]. The post-recovery phase should still be examined closely in order to draw conclusions about the damage experienced by the network and how the various implemented policies helped to alleviate consequences. Data collection through field surveys and supervisory management tools enables improvements in system performance and better preparation for upcoming extreme events, supporting the vision of a sustainable network.
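The phase description above can be made concrete with a small numerical sketch. The function below is a minimal illustration, not a metric taken from any reviewed paper: it assumes the performance curve is available as sampled (time, performance) pairs, and the three indicator names (`resilience_index`, `robustness`, `recovery_time`) as well as the sample curve are hypothetical.

```python
from typing import Optional, Sequence


def curve_indicators(times: Sequence[float], perf: Sequence[float],
                     nominal: float = 1.0) -> dict:
    """Summarize a sampled performance curve P(t) with three indicators.

    The indicator names and definitions are illustrative assumptions,
    not metrics taken from a specific reviewed paper.
    """
    if len(times) != len(perf) or len(times) < 2:
        raise ValueError("need at least two (time, performance) samples")
    if times[-1] <= times[0] or nominal <= 0:
        raise ValueError("need an increasing time axis and a positive nominal level")

    # Trapezoidal area under P(t), normalized by the area under the
    # nominal level over the same horizon (1.0 means no degradation).
    area = sum((times[i + 1] - times[i]) * (perf[i] + perf[i + 1]) / 2.0
               for i in range(len(times) - 1))
    resilience_index = area / (nominal * (times[-1] - times[0]))

    # Robustness: lowest performance reached, relative to nominal.
    i_min = min(range(len(perf)), key=lambda i: perf[i])
    robustness = perf[i_min] / nominal

    # Recovery time: from the instant the minimum is first reached until
    # the first later sample that is back at the nominal level.
    recovery_time: Optional[float] = None
    for t, p in zip(times[i_min:], perf[i_min:]):
        if p >= nominal:
            recovery_time = t - times[i_min]
            break

    return {
        "resilience_index": resilience_index,
        "robustness": robustness,
        "recovery_time": recovery_time,
    }


# Hypothetical curve: the event hits after t = 1 h and service is
# fully restored at t = 10 h (performance in per-unit of nominal).
example = curve_indicators(
    times=[0, 1, 2, 4, 8, 10],
    perf=[1.0, 1.0, 0.4, 0.4, 0.8, 1.0],
)
```

For this sample curve the sketch yields a normalized area of 0.67, a robustness of 0.4, and a recovery time of 8 h. Splitting the time axis at the phase boundaries would give phase-specific versions of the same indicators.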
Many works [13,27,45,63] explore each of these phases under slightly different names. Here a general description is adopted in which the four above-mentioned phases are considered, with mitigation and recovery each divided into two sub-phases in order to better explain all the mechanisms involved. Quantitative resilience frameworks can be assessed based on the phases they handle [64]. The more phases taken into consideration, the better the insight into system operation during extreme events. Furthermore, the layout can be used to answer the following question: when is resilience evaluation conducted, and for what reason? Figure 4 distinguishes the time instants at which resilience quantification can be conducted and the objectives of this evaluation. The objective steers the timing: for example, an operator who aims to plan investments for their network will most likely opt for pre-event evaluation, while another who only wants to see the impact induced by a contingency may adopt post-recovery damage evaluation. Knowing "why" resilience is to be evaluated thus serves as a guideline for choosing "when" it should be done. Without loss of generality, resilience evaluation can be derived from the performance curve in Figure 3, so it is important to know when system operators can obtain such a representation. Three options are available:

•
Proactive evaluation: The procedure in this case is to conduct pre-event studies with the goal of obtaining resilience indicators before a contingency happens. It builds on prediction data, expert recommendations, supervision alerts, and historical records. However, little information is available for HILP anomalies, so designing preventive measures calls for simulation tools, emulation, and analytical models that help project the impact the network will bear in the face of uncertain events.
Once metrics are computed, they can be used to make informed decisions about which resilience strategies to implement in order to minimize the impact and speed up recovery. In other words, the output of this phase is a set of planning schemes that enhance the robustness, survivability, restoration, and recovery of the system, which together are summarized in the concept of resilience. The prominent advantage of proactive evaluation is the ability to look forward and foresee what is coming. On the other hand, the large number of possible contingency scenarios and the scarcity of relevant data lead to low-confidence results.

•
Reactive evaluation: Quantification is carried out as the event happens, meaning that resilience metrics are computed on the fly, and the policies adopted to cope with severe hazards draw on the inherent reaction capacity of the system without support from pre-event recommendations. Metrics are calculated as the event unfolds for the two broad phases of robustness and recovery. In such a real-time setup, the information that can be gathered is realistic and narrows down the space of failure modes. However, the flexibility margin can be very tight because the HILP event hits the network by surprise while no anticipative actions are in place. Neither proactive nor reactive evaluation is inherently the better choice; both are suitable for resilience analysis and can be complementary. The goal is to find a balanced fit for a given use case [121].

•
Deductive evaluation: When resilience metrics are computed at the end of a HILP disturbance, they mainly serve to draw conclusions about how the system handled an external event [81,107,108]. The results are intended to point out areas of improvement for future reference in similar extreme situations, and can also serve as a baseline for tracking how performance evolves. Further, the output of such post-recovery evaluation can be fed into the pre-event phase for future hazards, closing a cycle with the evaluations presented above.
Proactive approaches are dominant in resilience engineering, especially when considering the fact that in some cases the reactive approach is subsumed therein. The combination of the two is simply referred to as proactive approaches.

Literature Review
The present work, on state-of-the-art resilience quantification of smart grids at the distribution level, is conducted with three main objectives:
• Understanding architectures and models involved in resilience quantification methodologies;
• Identifying all considered objectives behind resilience quantification;
• Explaining implementation specifics that directly relate to the practical application of the proposed methods.
The selection process of reviewed papers is briefly introduced in the following section, then a detailed discussion and results are presented.

Paper Selection Process
With the aim of being as comprehensive as possible, a wide sweep of various digital libraries was carried out: IEEE Xplore, ScienceDirect, Scopus, Elsevier, Google Scholar. The review is limited to the last six years (2015 to 2020 inclusive), and search expressions comprised various combinations of specific words: resilience, quantification, evaluation, assessment, metrics, indicators, measures, smart grid, distribution network, ICT network, (tele)communication network.
A first selection step consisted of reviewing the abstracts of all papers found (on the order of several hundred) and shortlisting works which:
• Analyze the power network at the distribution level, or the ICT network of the power network; and
• Present a quantitative analysis of resilience, with proposed metrics.
This resulted in a total of 34 pre-selected papers, 10 of which were excluded from this survey because, after deeper analysis, they were found not to entirely satisfy the two criteria above.


Power Distribution Network
The set of 18 papers that analyze resilience quantification from the perspective of PDN electrical service is summarized in Table 1. In addition to the provided implementation details, the references are assessed based on extreme events and the methods adopted for performance calculation. Metrics type and computational method are not shown in Table 1 for brevity.

Performance Calculation
Performance evaluation under disruptions is the cornerstone of resilience assessment, and approaches based on system modeling prevail. Still, in [130] and [134], field data are used to calculate resilience during recent natural disasters: the 2010 earthquake and tsunami in Chile, the 2011 earthquake and tsunami in Japan, the 2011 earthquake in New Zealand, and Hurricanes Isaac (2012), Sandy (2012), and Ike (2008). This fits post-recovery evaluation, given that the information is available a posteriori [130]. It is also useful experience for upcoming events when included in a proactive analysis for response and restoration [134]. An alternative to physical and operational system modeling is presented in [129], where a machine-learning-based agent is leveraged to compute the number of outages, the outage duration, and the number of unserved customers from clusters of focal variables used to estimate a multivariate resilience manifold.
Other than these options, the reviewed literature favors system modeling because of the lack of data on HILP extreme events. Recall the four aspects of the model: contingency, fragility, restoration, and functionality, which are realized in different ways. Works in [127,128,136,137,139] suggest using simulation-based frameworks to implement the quantification procedure, while [123,124,131,135,138] opt for a complete analytical formulation. A good compromise is found in [122,125,126,132,133] with hybrid analytical-simulation modeling; in [133], for example, the functional model is experimental, and the remaining contingency, fragility, and restoration models are posed as optimization problems.
In the case of generic events, the model omits contingency and fragility handling, because direct impact scenarios are applied in the study. The exception is [137], which needs MATLAB graph analysis libraries to compute quantities that contribute to the selection of failure scenarios.

Extreme Event and Time of Evaluation
A closer inspection of Table 1 shows that in some cases the restoration model is not specified; the explanation is given in the electric service portion of Table 2 (orange background). These works do not target the recovery and restoration capabilities of the distribution network, as they proactively plan for survivability [123,132], react to an event solely through resilience assessment [139] or a damage minimization response [138], or conduct a post-recovery study as in [127]. This illustrates, as discussed in Section 4, how the objective of resilience quantification informs the choice of system model. It goes without saying that the planning and response cells in Table 2 include resilience assessment as a first step and enrich it through further use of the obtained metrics. Turning to when resilience metrics are obtained, resilience quantification is concentrated in the pre-event phase, which corroborates the preventive nature of such studies and their contribution to planning for unseen events. However, real-time evaluation has gained some interest [133,136–139] and offers valuable information that is used on the fly to monitor and enhance distribution network resilience. Finally, after recovery from a HILP hazard, the works in [127] and [130] survey the network for lessons, with [127] offering more learning opportunities, as advanced empirical experiments are carried out for moderate and heavy damage scenarios.
Natural hazards attract most of the attention in present PDN resilience research, owing to various recent catastrophic events which raised awareness among government agencies, regulators, and network operators of the damage that a distribution system may incur. Generally, a resilience study handles a single event, which makes the setting dependent on the specific characteristics considered. Table 1 shows that some resilience frameworks are designed with a wider scope so as to tackle a set of these natural events [130–132,136]. This makes anomaly modeling challenging, albeit feasible through a knapsack problem [131] or an extended N-k network interdiction model [132]. Even so, the model should be readjusted whenever it is applied to a specific contingency. To handle multiple events simultaneously, [136] derives a code-based metric by computing network resilience several times for all possible natural hazards. Even though the approach is based on an empirical formula and more work should be done to justify the choice, it is an easy-to-understand measure and introduces the interesting concept of the "service potential" of the network.
With the exception of [133], cyber or cyber-physical (CP) attacks are put aside in this portion of the literature despite increasing damage induced even in physical electrical infrastructure, but this apparent neglect remains understandable due to the focus of this section on electric service.

Uncertainty
Uncertainties in HILP events, intermittent power generation (with DER), load, and energy markets are a major concern for resilience assessment [52–54]. In [123], the spatiotemporal uncertainty of a harsh weather event and of wind turbine generation is managed through a probabilistic approach. The authors of [134] assume a probability distribution for the uncertain parameters in their resource allocation optimization problem (event parameters and resource allocation effectiveness parameters), and change the objective to the expected value of resilience. Likewise, in [135] a stochastic scenario-based optimization is adopted to cope with event uncertainties. However, for deeply uncertain events, little to no data are available, turning interest toward robust optimization in both [132], for multi-stage and multi-zone natural hazards, and [138], for load and renewable generation. Also, the simulation tools in [128,139] take into consideration the uncertainties in HILP events and intermittent power sources, respectively. Uncertainty is sometimes handled implicitly, without a clear and well-defined formulation, as it is inherent to HILP events, like in [125].
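The difference between an expected-value objective and a robust (worst-case) objective can be illustrated with a toy comparison. This is only a sketch under stated assumptions: the two candidate plans, their scenario probabilities, and their resilience values are invented for illustration, and the reviewed works solve full optimization problems rather than comparing two fixed plans.

```python
def expected_resilience(scenarios):
    """Probability-weighted mean over (probability, resilience) pairs."""
    total_p = sum(p for p, _ in scenarios)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * r for p, r in scenarios)


def worst_case_resilience(scenarios):
    """Lowest resilience over all scenarios, ignoring their probabilities."""
    return min(r for _, r in scenarios)


# Hypothetical plans, each evaluated on three event scenarios given as
# (probability, resilience in [0, 1]). All numbers are made up.
plans = {
    "harden_lines": [(0.6, 0.90), (0.3, 0.70), (0.1, 0.20)],
    "add_storage": [(0.6, 0.80), (0.3, 0.75), (0.1, 0.50)],
}

# Stochastic view: pick the plan with the best expected resilience.
best_expected = max(plans, key=lambda name: expected_resilience(plans[name]))

# Robust view: pick the plan with the best worst-case resilience.
best_robust = max(plans, key=lambda name: worst_case_resilience(plans[name]))
```

With these numbers the expected-value criterion selects `harden_lines` (0.77 against 0.755), while the worst-case criterion selects `add_storage` (0.50 against 0.20), which shows why the choice of uncertainty model can change the recommended strategy.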

Critical Load
An essential distinguishing feature of resilience is the ability to differentiate between loads. For instance, in electrical networks, groups of customers are prioritized during emergencies and are spared from load shedding because of their importance relative to other loads. Analyses in [128,135] assign weights to loads based on their priority during the load-shedding procedure, or during the restoration phase in case of a strong event which affects even critical nodes. Resilience evaluation is, however, done on the impact over the entire network. Works such as [124,127,131] take it a step further by evaluating the resilience metrics for the whole system on the one hand and only for critical loads on the other, giving deeper insight into the network dynamics during the event. Finally, frameworks in [126,136–138] focus mainly on the critical load, as priority rankings are considered during the curtailment and recovery stages, and resilience metrics quantify the impact in critical units.
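A minimal sketch of how priority weights and a critical-only view change a load-based indicator is given below. The `Load` record, the weight scale, and the three sample loads are assumptions made for illustration; they do not reproduce the formulation of any cited work.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Load:
    name: str
    demand_kw: float
    served_kw: float
    weight: float = 1.0     # priority weight (assumed scale)
    critical: bool = False  # flagged as critical by the operator


def weighted_served_fraction(loads: Iterable[Load], critical_only: bool = False) -> float:
    """Weighted share of demand that is served, optionally for critical loads only."""
    selected = [l for l in loads if l.critical or not critical_only]
    denominator = sum(l.weight * l.demand_kw for l in selected)
    if denominator == 0:
        raise ValueError("no demand in the selected set of loads")
    return sum(l.weight * l.served_kw for l in selected) / denominator


# Hypothetical snapshot taken during an event.
snapshot = [
    Load("hospital", demand_kw=200, served_kw=200, weight=5, critical=True),
    Load("water_pump", demand_kw=100, served_kw=50, weight=3, critical=True),
    Load("residential", demand_kw=700, served_kw=210, weight=1),
]

whole_system = weighted_served_fraction(snapshot)
critical_view = weighted_served_fraction(snapshot, critical_only=True)
```

Here the whole-system value is 0.68 while the critical-only value is about 0.88, so reporting both, as some of the works above do, separates how the network as a whole fared from how well priority customers were protected.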

Metrics Computation
As noted above, performance assessment is an enabler for resilience quantification. Performance can include network topological characteristics and human factors, but it is mostly associated with service operational aspects defined in various ways: number of disconnected users [122,127,129,134], probability of line failure [123], power from the main grid [123], power from distributed generation [123], supplied/connected load [124,126,128,130,131,135,136] (or equivalently load shedding [122,123,126,127,132,133]), critical supplied load [124,131], total customer-hours of outages [127], total customer energy not served [122,125,127], outage duration [129,130,134], number of outages [129], loss of voltage and frequency regulation [133], load control and islanding [133], probability of source availability and penalty [137], total forecasted load [138], and current flow [139].
A straightforward approach is to take the displayed performance indicators directly as resilience metrics [122,123,127–129,132], or to propose a justified empirical formula [136] that converts performance into resilience. The dominant technique is to build a representation of performance (e.g., a time curve) and use it to extract indicators, as in [122], where a resilience index is proposed by tracking the number of LV customers not served. This results in a time-dependent index which can be used in the different phases illustrated in Figure 3. In the same spirit, [138] introduces an index calculated periodically as the ratio between the level of priority (or critical) load and total load. Moreover, the authors of [124] propose computing multiple phase-specific indices for vulnerability, degradation, and restoration efficiency, all from a time curve of supplied load. This is then supplemented with a resilience index that covers the whole event horizon. The same tendency is observed in [128], where the expected maximum load loss, interruption rate, restoration rate, and recovery rate are evaluated. In relation to this, the works in [131,139] present fewer details on phases, but still make it possible to distinguish, in a broad sense, between survivability and restoration. A novel approach is highlighted in [135], where the percentage of lost load is proposed as a resilience metric whose terms explicitly distinguish the loss of load in each resilience phase.
However, unlike the fine-grained, phase-by-phase analyses above, the studies in [126,130,133,134] opt for embedding the entire resilience information in a single metric, based on the inverse of power loss during an extreme typhoon event in [126], the ratio between up-time and event time in [130], loss percentage in [133], and a combination of average loss and recovery time in [134]. This is more attractive for DSOs, as the framework is simple and less cumbersome, but it should be handled carefully so as not to miss tradeoffs that exist in resilience assessment. A good example is [130], where resilience is calculated as the ratio between up-time and total event time: the authors emphasize that this measure is defined for a single node, embodying a kind of granularity different from the one offered by multiple metrics for different phases.
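The single-node up-time ratio described for [130] can be written compactly. The notation below ($R_i$, $T_i^{\mathrm{up}}$, $T^{\mathrm{event}}$) is assumed here for illustration and is not taken from [130]; the numbers in the worked example are hypothetical.

```latex
% Up-time ratio of node i over one event (assumed notation)
\[
  R_i = \frac{T_i^{\mathrm{up}}}{T^{\mathrm{event}}}, \qquad 0 \le R_i \le 1 .
\]
% Hypothetical example: a 72 h event during which node i is down for 18 h
\[
  R_i = \frac{72 - 18}{72} = \frac{54}{72} = 0.75 .
\]
```

Because the ratio only records how long the node was up, two nodes with the same value can have very different degradation and restoration profiles, which is the tradeoff a single metric can hide.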
Poudel et al. [125] extend a risk-based metric, value-at-risk (VaR), which gives the maximum loss expected over a given time period at a specified degree of confidence. Their proposal is the conditional VaR (CVaR), defined to calculate the expected resilience loss due to probabilistic threat events, conditioned on the events being HILP. This bridges traditional risk management and all-phase resilience studies.
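To make the VaR/CVaR distinction concrete, the sketch below computes both from a list of sampled losses. It uses a simple empirical estimator on equally likely samples, which is an assumption of this example and not necessarily the formulation used in [125]; the loss values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def var_cvar(losses: Sequence[float], alpha: float = 0.95) -> Tuple[float, float]:
    """Empirical VaR and CVaR of equally likely loss samples.

    VaR is the smallest sampled loss such that at least a fraction
    `alpha` of the samples lie at or below it. CVaR is estimated as the
    mean of the samples from that position upward (the tail).
    """
    if not losses:
        raise ValueError("need at least one loss sample")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    ordered = sorted(losses)
    # Small tolerance guards against floating-point noise in alpha * n.
    k = max(math.ceil(alpha * len(ordered) - 1e-12) - 1, 0)
    var = ordered[k]
    tail = ordered[k:]
    cvar = sum(tail) / len(tail)
    return var, cvar


# Hypothetical losses (e.g., MWh not served) over ten simulated events:
# most events are harmless, a few are severe.
sample_losses = [0, 0, 0, 0, 1, 2, 3, 5, 20, 80]
var_80, cvar_80 = var_cvar(sample_losses, alpha=0.8)
```

On this sample the 80% VaR is 5 while the CVaR is 35, because the tail average is dominated by the two rare, severe events. This is the property that makes a tail-conditioned measure a natural fit for HILP analysis.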
Topological characteristics are considered in [131] in the form of node degree. Bajpai et al. [137] make advanced use of the modeling graph by proposing a multi-criteria decision-making (MCDA) approach which takes a set of inputs, including performance and topology parameters, and aggregates them into a single resilience metric using the Choquet integral.
Table 3 summarizes the different measures implemented to enhance PDN resilience. Infrastructure hardening, energy storage, and distributed generation resources are intensively explored owing to their wide deployment and availability. In addition, both distribution automation and network reconfiguration (which can be manual or automatic) contribute to enhancing the robustness and adaptability of the network, and enable very efficient recovery. Accordingly, all works from Table 2 that handle recovery, either pre-event or in real time during the event, implement one or both of these two strategies. The contribution in [128] develops a set of probabilistic metrics that capture the features and detailed process of automatically locating and isolating faults and restoring service to customers in distribution systems. More precisely, the proposed algorithm devises a switching sequence and calculates load interruption when dealing with a large number of switches in large-scale distribution networks. Despite promising results for boosting resilience, attention should be paid to the level of automation introduced in the network, because it can produce the inverse effect in rare events [146].
Various smart grid functions, such as improved safety, self-healing, high DER penetration, and active load control, can be enhanced using microgrids [147].
Microgrids (MGs) are in some cases operated in parallel with the main distribution grid, which makes it possible to analyze their resilience separately [30,148,149]; MGs can thus be taken as a testbed to illustrate the applicability of a proposed resilience quantification [124,136,137]. In another approach, MGs are adopted as a resilience strategy that can be enabled in case of a disaster through islanding [150,151], hence the need to schedule the formation of MGs together with the associated DER dispatch and remote switch operation [133,138]. A further resilience benefit is achieved when multiple MGs are interconnected, given the better situational awareness shared between networked grids and, eventually, the sharing of distributed resources [30,127]. The contributions in [124,127,133,136–138] are only a small part of the growing interest in MGs for distribution grid resilience enhancement [65]. More generally, resilience strategies are in some cases suited only to certain disruptions, and can even become a shortcoming under different circumstances [30]; thus, the network planner needs to conduct a general study which includes all possible anomalies and tries to manage all the tradeoffs involved in implementing resilience enhancement strategies.

Grid ICT Network
The resilience of PDN communication service is analyzed in [140–145] and a summary is given in Table 4. Again, classifications are used as in Sections 3 and 4 to review these works.
Table 4 (excerpt):
[143]: Hurricane Sandy; spatio-temporal non-stationary random process; real data from 4 DSOs.
[144]: Generic failure; DayLight SDN controller interfaced with a Mininet-based testing framework integrated with the ns-3 network simulator.
[145]: Natural disasters; real data from various scenarios.

Performance Calculation, Resilience Metrics, and Extreme Event
Figures of performance (FoP) defined for the ICT system in the distribution grid differ from the ones presented above for electric service. Both [142,144] adopt simulation-based modeling to set the ground for resilience quantification. The former builds upon the ad hoc nature of wireless sensor network (WSN) technology, which can support the metering infrastructure through redundancy and replication; hence the use of a WSN simulator to evaluate various routing protocols (assuming 300 nodes) based on five performance measures: average delivery ratio, energy efficiency, delivery fairness, average throughput, and delay efficiency. These are then all normalized and plotted as an equiangular polygon in which each performance metric is represented by an axis. The resilience metric is taken as the area of that polygon, so the larger the area, the more resilient the routing protocol is against selective forwarding attacks. The authors of [144] consider a simpler configuration with one software-defined networking (SDN) controller and three substations, each with a connected field device. Their goal is to show that SDN is a viable technology, with negligible delay when switching to backup wireless communication and minimal packet loss, both of which are taken as resilience metrics.
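The polygon-area idea can be sketched in a few lines. The function below assumes each measure has already been normalized to [0, 1] and is plotted on one of n equally spaced axes, so that the polygon splits into n triangles with apex angle 2π/n. The function name and the sample scores are hypothetical, and the exact normalization used in [142] is not reproduced here.

```python
import math
from typing import Sequence


def radar_area(scores: Sequence[float]) -> float:
    """Area of the polygon spanned by normalized scores on equally spaced axes.

    Adjacent axes are separated by 2*pi/n, so the triangle between axes
    k and k+1 has area 0.5 * r_k * r_{k+1} * sin(2*pi/n).
    """
    n = len(scores)
    if n < 3:
        raise ValueError("a polygon needs at least three axes")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must be normalized to [0, 1]")
    wedge = 0.5 * math.sin(2.0 * math.pi / n)
    return wedge * sum(scores[k] * scores[(k + 1) % n] for k in range(n))


full = radar_area([1.0] * 5)  # best case on all five axes
half = radar_area([0.5] * 5)  # every measure at half its best value
```

Two properties of this construction are worth keeping in mind: the area scales with products of neighboring scores (halving every score quarters the area), and for unequal scores the result depends on the order in which the axes are arranged.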
A graph-based analytical model is adopted in [140] to determine the transmission power and number of gateways required for wireless mesh architectures in the context of smart metering. The proposed methodology uses clustering to assign each smart meter to a gateway; the average number of hops and the number of independent paths to reach the gateway are then calculated as intra-cluster resilience metrics, while inter-cluster resilience captures a node's capability to connect to other gateways if its primary gateway fails. A different graph approach is used in [141] to consider dependencies between the ICT and measurement layers which, seen from a higher perspective, together form the entire communication infrastructure used in a smart grid. Degree centrality is used to find the importance of each communication link and measurement unit, and the resilience metric is then defined as the deviation from ideal importance values, the main goal being to reduce the importance of critical nodes, which increases the robustness of the network.
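As an illustration of one intra-cluster quantity mentioned above, the sketch below computes the average hop count from a set of smart meters to their gateway with a breadth-first search. The adjacency dictionary, node names, and function names are hypothetical, links are assumed to be bidirectional, and counting independent paths (which would need a max-flow style computation) is left out.

```python
from collections import deque
from typing import Dict, Iterable, List


def hop_counts(adjacency: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Breadth-first search returning the hop distance from `source` to each reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def average_hops_to_gateway(adjacency: Dict[str, List[str]], gateway: str,
                            meters: Iterable[str]) -> float:
    """Mean hop count between the gateway and the given meters (links assumed bidirectional)."""
    meters = list(meters)
    if not meters:
        raise ValueError("need at least one meter")
    distance = hop_counts(adjacency, gateway)
    unreachable = [m for m in meters if m not in distance]
    if unreachable:
        raise ValueError(f"meters cannot reach the gateway: {unreachable}")
    return sum(distance[m] for m in meters) / len(meters)


# Hypothetical cluster: gateway G with four meters in a small mesh.
mesh = {
    "G": ["m1", "m2"],
    "m1": ["G", "m3"],
    "m2": ["G", "m3"],
    "m3": ["m1", "m2", "m4"],
    "m4": ["m3"],
}
avg_hops = average_hops_to_gateway(mesh, "G", ["m1", "m2", "m3", "m4"])
```

In this toy cluster the hop counts are 1, 1, 2, and 3, giving an average of 1.75. Re-running the same function after removing a node or link gives a quick view of how a failure lengthens paths or disconnects meters.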
At this point, one can notice that the notion of resilience phases is absent from the works presented so far, which is a major drawback. This is also reflected in the relatively low importance given to disruption modeling and characterization, which are considered very important in resilience studies. In contrast, [143,145] introduce temporal phases, with fewer details than in the electric service cases but enough to convey all relevant information about resilience. Both works rely on empirical data from post-recovery assessments by DSOs. In [143], the proposed resilience metric is calculated for the infrastructure and the service using the expected cost from the customer and system sides (4 DSOs considered) during Hurricane Sandy (2012). The resulting curves show the effectiveness of modeling failures, recoveries, and costs to customers as coupled non-stationary random processes.
Following the definition suggested in [130], the same author defines the power supply resilience of an ICT site in [145] as the ratio between up-time and event duration, and uses real field data from different natural disasters to calculate this quantity. This illustrates how the same metric can be applied to quantify the resilience of both electric and telecommunication services in a smart grid.

Time of Evaluation
Attention was drawn above to the absence of temporal analysis in most of the reviewed ICT network works and, where it is present, to the use of empirical models for the resilience frameworks. As a result, the questions of when performance measures should be calculated, and for which objective, are left without detailed exploration. In other words, analysis is still at the initial level of merely obtaining the metrics and, except for [141], no planning or response is based on them. Specifically, [141] proposes to optimize the wide area monitoring system (WAMS) design through the optimal resilient deployment of phasor measurement units (PMUs) and new optical ground wires, formulated as an optimization problem based on the performance measures used to calculate the resilience metric. Thus, almost all evaluations are conducted after the event, as illustrated in the second part of Table 2 (green background), which entails no further use in planning or response.

Resilience Strategies
Proposing resilience enhancement is tightly connected to the type of evaluation conducted. Because evaluation here is limited to post-recovery metric calculation, improvement strategies are shown to have a positive impact on the network, but only one optimized implementation [141] exploits the full potential of these measures (Table 3).
Data-related strategies of replication and redundancy are well suited to multi-hop routing mechanisms in WSNs, and need to be explored with the associated cost in mind, for both the initial investment and subsequent maintenance [142]. SDN and virtualization technologies represent an attractive option for SG resilience under different architectures (e.g., substation automation, utility Machine-to-Machine (M2M) applications, cloud and IoT applications, ...), which can address SG-related issues of security, privacy, granularity, vendor-specific components, and network management [152]. This wide penetration of SDN opens the opportunity to also leverage it to improve the resilience of the network.
Furthermore, measures seen for electric service [130] are suggested for the ICT case [145], highlighting the interdependence between the two networks and the possibility of developing joint evaluation frameworks, which are treated in the recent literature [61] but are out of the scope of the present work.

Results and Insights
This section builds on presented observations and analysis of the reviewed literature to explain challenges and priority perspectives for resilience quantification in modern distribution grids.

Moving from Qualitative to Quantitative Resilience Assessments of the ICT Domain
From a qualitative perspective, resilience studies are very well established and succeed in demonstrating the paradigm shift they embody in terms of protecting a given infrastructure from catastrophic failures and orchestrating the restoration of nominal services. However, when it comes to quantitative assessments, the general tendency is to restrict resilience to one of its components, such as robustness, survivability, adaptability, restoration, or recovery [153].
Power network resilience analyses in general, and those of PDNs in particular, have managed in recent years to develop quantitative frameworks that describe and harness all capabilities of resilience. This is not limited to proposing metrics for all temporal phases, but also includes using the developed indicators to optimize enhancement strategies, as done in [125,131,133–135,137]. Certainly, more work should be carried out in this direction, and even more to align visions through standardization in order to reach consensus on evaluation methodologies and metrics; but the right research direction is indeed being explored in electric distribution networks. In parallel, the same dynamic should also be adopted for the telecommunication services involved in smart grids which so far, as shown through this review, stick to partial definitions of resilience, the same ones adopted in studies targeting communication networks outside the scope of smart grids. Put differently, ICT resilience studies are a step behind what is done in power networks in terms of adopted definitions and proposed frameworks. Comprehensive smart grid resilience analysis requires both electric and telecommunication services to be evaluated at comparable levels of advancement, meaning that the ICT layer in the distribution grid has a considerable margin for improvement, which can mimic electric service analyses and be guided by recent works on general-purpose resilient communication networks [75,78].
One can argue that tracking electric service performance subsumes the telecommunication aspect, because the latter contributes to the degradation of the power supply to customers, which is after all the main concern. This is the client-centric approach that resilience includes, but resilience also complements it with an operator-centric (or network-centric) view, where a fine-grained analysis of all system mechanisms is needed, involving among others a separate and deep look at ICT functions.

Need to Specify Time of Evaluation
Previous sections have emphasized the importance of "when" resilience evaluation is conducted (Table 2), which is not to be confused with the time of the event occurrence [36]. The difference is easily seen in proactive approaches, where the entire event time horizon is studied during the pre-event phase of the real-time scale. Event time is thus a virtual quantity, which can be observed before the event happens when data are available or modeling is used, while the real-time scale describes the moment of resilience quantification. Therefore, event time is concerned with whether the resilience framework treats all phases (the more phases the better), whereas time of evaluation is concerned with when resilience assessment metrics become available, possibly for use in the optimization of enhancement strategies.
DSOs are naturally more interested in the forward-looking approach, which allows them to anticipate major disruptions and prepare the network. However, HILP events are so unpredictable that the fidelity of assumed models and projections is reduced. This supports the need for real-time resilience analyses, which have more knowledge of the impact of an event and could complement the initial proactive measures. Effort should therefore be devoted to exploring a framework with both proactive and reactive resilience quantification, in order to seize the advantages of the two approaches. Finally, post-recovery evaluation can support both previous alternatives by collecting valuable field data after hazards.

Topology and Service Performance Metrics
Only a few reviewed papers consider topological parameters in metrics computation [127,131,140,141], due to the high level of abstraction in graph-based methods and their static features. Still, it is important to include them in resilience studies because they capture the network architecture and the internal dependencies between different elements, which complement service performance measures. As discussed in Section 5, a noteworthy multi-criteria approach was suggested in [137] to combine topological and operational characteristics in the same metric. Although more inclined toward topological features, this proposal illustrates how multiple weighted parameters can be aggregated into a single representative indicator. In addition, interdependence modeling widely adopts a topological approach [154,155], so embracing it in power systems is unavoidable: in the long run, smart grid resilience must be analyzed taking into account the interactions between the electric and ICT layers, and with other infrastructure networks (gas, heat, cooling, transport, etc.).
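As a rough illustration of aggregating weighted parameters into one indicator, the sketch below computes a weighted average of indicators that are assumed to be already normalized to [0, 1]. It is a generic weighted sum and not the metric of [137]; the function name, indicator names, and all numerical values are hypothetical.

```python
def composite_index(indicators, weights):
    """Weighted average of indicators assumed to be normalized to [0, 1].

    Generic illustration only; indicator names and weights are hypothetical.
    """
    if set(indicators) != set(weights):
        raise ValueError("indicators and weights must share the same keys")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for name, value in indicators.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    # Dividing by the total weight keeps the result in [0, 1].
    weighted_sum = sum(weights[name] * indicators[name] for name in indicators)
    return weighted_sum / total_weight


# Hypothetical example: two topological indicators and one operational one,
# with the first topological indicator weighted twice as much as the others.
example_indicators = {"connectivity": 0.8, "path_redundancy": 0.5, "load_served": 0.9}
example_weights = {"connectivity": 2.0, "path_redundancy": 1.0, "load_served": 1.0}
example_index = composite_index(example_indicators, example_weights)
```

The choice of weights drives the result, so in practice they would have to be justified by the operator's priorities rather than fixed arbitrarily as here.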
As in transmission power networks [29], multiple metrics are proposed for resilience quantification in [124,128,131,135,142]. A single resilience metric, even one that embeds a maximum number of resilience features, can be a drawback if it offers less information for implementing enhancement strategies. The reason is that some strategies target only one facet of resilience, for example robustness; when the metric combines many features, the information about robustness is diluted in the general index. This is why multiple metrics, each handling one aspect or phase of resilience, can help to build better knowledge and guide more specific actions.
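The sketch below illustrates the idea of keeping separate indicators instead of one aggregate. From a single sampled performance curve it derives three quantities: the worst normalized performance (taken here as a proxy for robustness), the duration of degraded operation, and the normalized area under the curve. These three definitions, the function name, and the sample data are assumptions made for illustration and are not taken from the reviewed works.

```python
def phase_metrics(times, performance, nominal):
    """Return separate indicators computed from one sampled performance curve."""
    if len(times) != len(performance) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    ratios = [p / nominal for p in performance]

    # Robustness proxy: worst normalized performance reached.
    robustness = min(ratios)

    # Degraded duration: from the first sample below nominal to the first
    # later sample back at nominal (or to the end of the horizon if the
    # system never recovers within the samples).
    degraded = [i for i, r in enumerate(ratios) if r < 1.0]
    if not degraded:
        degraded_duration = 0.0
    else:
        start = degraded[0]
        recovered = [i for i in range(start, len(ratios)) if ratios[i] >= 1.0]
        end = recovered[0] if recovered else len(ratios) - 1
        degraded_duration = times[end] - times[start]

    # Normalized area under the curve (trapezoidal rule); 1.0 means no loss.
    area = sum(
        (times[i + 1] - times[i]) * (ratios[i] + ratios[i + 1]) / 2.0
        for i in range(len(times) - 1)
    )
    area_index = area / (times[-1] - times[0])

    return {
        "robustness": robustness,
        "degraded_duration": degraded_duration,
        "area_index": area_index,
    }


# Hypothetical curve: supplied load in kW sampled every hour.
example_times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
example_performance = [100.0, 100.0, 40.0, 60.0, 80.0, 100.0]
example_metrics = phase_metrics(example_times, example_performance, nominal=100.0)
```

A hardening measure would mainly move the robustness value, while a faster repair process would mainly shorten the degraded duration; the area index alone would not reveal which of the two changed.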

Spatial Scale
Resilience frameworks need to combine qualitative and quantitative analyses at various temporal and spatial scales [156]. The temporal aspect is widely treated by monitoring how performance evolves with time. However, more effort should be put into considering the time horizons of different events, which directly relate to the system resilience and the efficiency of quantification methods [136]. For a small service area, the same failure probability is assumed for each component when the distribution system suffers from natural disasters [131]. For larger areas, it becomes very important to consider the spatial distribution of an event in order to better estimate the hazard impact and recovery duration [105]. This can be achieved by defining multiple impact zones and using failure probabilities or N-k contingency constraints [123,132]. Other methods use a model of the event path [128], the spatial distribution of the number of outages [129], or spatio-temporal random processes [143]. Since post-disruption electric grid performance is highly sensitive to the spatial characteristics of the event [105], the spatial dimension should be explicitly incorporated into the performance function, which most related works do not do.
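To make the impact-zone idea concrete, the following sketch assigns each component to a zone with its own failure probability and computes the expected number of failed components and the expected lost load. It assumes, for simplicity, that each component failure only interrupts the load attached to that component, which ignores network topology; zone names, probabilities, and loads are hypothetical.

```python
def expected_impact(components, zone_failure_prob):
    """Expected failures and expected lost load under zone-dependent probabilities.

    components: list of (name, zone, load_kw) tuples.
    zone_failure_prob: dict mapping zone name to a failure probability in [0, 1].
    Simplifying assumption: a failed component interrupts only its own load.
    """
    expected_failures = 0.0
    expected_lost_load = 0.0
    for _name, zone, load_kw in components:
        p = zone_failure_prob[zone]
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid probability for zone {zone!r}")
        expected_failures += p
        expected_lost_load += p * load_kw
    return expected_failures, expected_lost_load


# Hypothetical feeder sections spread over three impact zones.
example_components = [
    ("line_a", "core", 200.0),
    ("line_b", "core", 100.0),
    ("line_c", "edge", 300.0),
    ("line_d", "outside", 400.0),
]
example_zone_prob = {"core": 0.5, "edge": 0.25, "outside": 0.0}
example_failures, example_lost_load = expected_impact(example_components, example_zone_prob)

# Small-area simplification: the same probability for every component.
uniform_prob = {zone: 0.25 for zone in example_zone_prob}
uniform_failures, uniform_lost_load = expected_impact(example_components, uniform_prob)
```

Comparing the zoned and uniform results shows how the small-area simplification can shift the estimated impact when the hazard is concentrated in one part of the service area.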

Critical Load
Different levels of prioritization exist between loads in an electric distribution network. Resilience involves tolerating the curtailment of less important customers while keeping supply to more critical ones (hospitals, emergency services, banking, government facilities, etc.). When the outage is general, critical loads are to be restored first. This behavior needs to be captured by resilience metrics in which the difference between normal and critical loads is explicitly visible.
In the telecommunication layer of distribution networks, the concept of critical elements is less applied (it is not found at all in the reviewed articles), because communicating devices are mostly used in protection, monitoring, management, and control functions, which are all very important to the operation of the whole network. However, hierarchies exist within the telecommunication architecture used by grid functions, and entities can be prioritized. For example, a regional control center can have the highest criticality in a given region, compared to remote terminal units (RTUs) at substations or to field devices. With the advent of smart grids, there is an ever-increasing number of distribution-connected items that can be seen as loads rather than controlling devices, such as smart meters, Industry 4.0 robots, and industrial IoT. Even more hierarchy can therefore be put in place based on which elements are most important, and cross-importance rankings with the electric infrastructure and loads in the system could even be established.
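One simple way to make the distinction between normal and critical loads visible in a metric is to report the served fraction per priority class together with a priority-weighted served fraction. The sketch below is a minimal illustration under assumed class names, weights, and demand values; it does not reproduce a metric from the reviewed works.

```python
def served_fractions(loads, weights):
    """Per-class and priority-weighted fractions of demand that is served.

    loads: list of (priority_class, demand_kw, served_kw) tuples.
    weights: dict mapping priority_class to a positive weight (hypothetical).
    """
    demand_by_class = {}
    served_by_class = {}
    for cls, demand_kw, served_kw in loads:
        if served_kw > demand_kw:
            raise ValueError("served load cannot exceed demand")
        demand_by_class[cls] = demand_by_class.get(cls, 0.0) + demand_kw
        served_by_class[cls] = served_by_class.get(cls, 0.0) + served_kw

    # Served fraction per class keeps critical and normal loads separate.
    per_class = {
        cls: served_by_class[cls] / demand_by_class[cls] for cls in demand_by_class
    }
    # The weighted fraction gives critical classes more influence on the result.
    weighted = sum(weights[cls] * served_by_class[cls] for cls in demand_by_class) / sum(
        weights[cls] * demand_by_class[cls] for cls in demand_by_class
    )
    # The unweighted fraction is shown for comparison.
    unweighted = sum(served_by_class.values()) / sum(demand_by_class.values())
    return per_class, weighted, unweighted


# Hypothetical situation: the hospital is fully supplied, residential load is halved.
example_loads = [("critical", 100.0, 100.0), ("normal", 300.0, 150.0)]
example_weights = {"critical": 3.0, "normal": 1.0}
example_per_class, example_weighted, example_unweighted = served_fractions(
    example_loads, example_weights
)
```

In this example the weighted fraction is higher than the unweighted one because the curtailment falls entirely on the normal class, which is the behavior a critical-load-aware metric is expected to reward.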

Uncertainty Quantification
The main sources of uncertainty in smart grids are HILP events, load demand, distributed generation, and market prices. Among these, HILP hazards can severely damage the network; thus, as seen in [123,132,134,135], different methods are proposed to cope with their uncertainties. This topic also needs to be investigated for the grid telecommunication layer, which is equally vulnerable to the uncertainties of extreme events, especially as it is in the front line against cyber-attacks.

Economical Cost
DSOs not only scrutinize the costs caused by major disasters and attacks, but also audit their investment strategies to find the best balance between resilience and minimal spending. Cost is introduced in resilience studies at different levels: most of the time directly in the metric [123,127,128,133,143], but it can also be incorporated in the objective functions of cost-benefit analyses [131,134,138,141] that search for the optimal tradeoff between resilience and associated investment costs.
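The tradeoff between resilience and investment can be illustrated with a toy selection problem: choose the set of enhancement measures that maximizes the resilience gain without exceeding a budget. The sketch below uses exhaustive search and assumes that the gains of individual measures simply add up, which is a strong simplification; the measure names, costs, and gains are hypothetical and the formulation is not that of any cited work.

```python
from itertools import combinations


def best_portfolio(measures, budget):
    """Pick the subset of measures with the highest total gain within the budget.

    measures: dict mapping a measure name to a (cost, gain) tuple.
    Assumes additive gains; exhaustive search is only practical for a
    handful of candidate measures.
    """
    best_subset, best_cost, best_gain = (), 0.0, 0.0
    names = sorted(measures)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(measures[name][0] for name in subset)
            gain = sum(measures[name][1] for name in subset)
            if cost <= budget and gain > best_gain:
                best_subset, best_cost, best_gain = subset, cost, gain
    return best_subset, best_cost, best_gain


# Hypothetical candidates: cost in k$, gain as an increase of a resilience index.
example_measures = {
    "line_hardening": (60.0, 0.30),
    "backup_generator": (40.0, 0.25),
    "tie_switch": (30.0, 0.20),
}
example_subset, example_cost, example_gain = best_portfolio(example_measures, budget=80.0)
```

Here the single most effective measure is not part of the selected portfolio, because two cheaper measures together yield a larger gain within the same budget.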

Resilience Potential
Performance-based evaluation of resilience is widely adopted to conduct an assessment from the onset of the event until the final recovery. It is always expressed relative to the nominal performance of the system before a contingency. The authors in [136] introduced "service potential", which describes how able the network is to deliver its service under given unfavorable conditions. This allows two grid systems or architectures to be compared under different orders of event durations. We can extend this into resilience potential, a quantity that gives the resilience of a network considering all possible redundancies and resources, very similar to a risk assessment augmented with consideration of enhancement strategies. In short, this expands on the idea that the same nominal level of performance does not mean the same level of resilience.

Interdependencies
The separate analyses of electric and ICT services in distribution grids are expected to converge into a joint layout because of the multiple existing interdependencies; in the continuation of this work, the study should be steered by a resilience perspective [90]. The contribution in [61] summarizes research on interdependent power-ICT systems in terms of system modeling, failure, and resilience enhancement strategies. Interdependency makes the coupled network more vulnerable to disruptions through cascading and escalating effects [155], and joint resilience evaluation is the best approach to deal with it. Many recent works therefore conduct resilience studies jointly for both the communication (or cyber) and electric domains of the grid [56,157–159].
Dependencies of the electric network on other infrastructures are also handled jointly, in the case of gas networks [102], buildings [160], urban transportation [161], integrated energy systems [88], and water networks [162], which makes it possible to adapt some prominent ideas and principles to the specific case of smart grids. Further discussion of interdependencies is outside the scope of this article, but it should be emphasized that this topic is the natural follow-up of the work presented here.

Conclusions
In this paper, state-of-the-art studies on resilience quantification of smart distribution grids are summarized with the aim of analyzing all involved tools and pointing out assessment objectives. Performance calculation is identified as the main enabler of resilience evaluation, as almost all reviewed metrics rely either exclusively on operational performance measures or on a mix of operational and topological parameters. Many models are proposed in the literature to compute performance, among which system modeling is the most dominant, with a focus on four main aspects: contingency, fragility, restoration, and functional dynamics. Empirical models serve as a baseline and data feeder for system models, whereas surrogate models try to bypass network modeling by harnessing advanced machine learning techniques to directly infer performance measures from various topological, topographic, and operational parameters.
In the reviewed research, distribution grid resilience is defined with respect to HILP events, which need to be foreseen using forecast data, historical records, estimation tools, and contingency models. Emphasis is placed on the difficulty of designing resilience for multiple events, especially because enhancement strategies can be very specific: they are advantageous in some cases and not in others. In addition, we propose a classification based on the time of resilience evaluation, which makes it possible to gauge the real-case applicability of the presented assessment frameworks. The resilience phases-based approach was linked with different objectives of the assessment, from simple metrics evaluation to either planning or response for survivability and recovery. These objectives are achieved through a variety of improvement strategies whose allocation is optimized under the constraint of a limited budget. This bridges resilience studies and economic considerations in order to help stakeholders in drawing up investment plans and in crisis management decision-making.
Aspects of critical load, microgrids, and uncertainty of hazards, load, and distributed generation are discussed to show their high importance, and to explain the tools available so far for including them in the study. Finally, we showed that resilience studies in the electric domain are several steps ahead of those in the telecommunication domain, and that there is an urgent need to bring the two to the same level for a complete joint resilience analysis of smart grids, unlike current separate works that neglect several pertinent interdependencies. Therefore, future works need to focus on coupled electric-ICT networks with joint quantification frameworks, which not only consider the resilience of the coupled system, but also seek further granularity by investigating constituent applications and functions such as distribution automation, automatic metering, and grid management.