Power Systems Resilience Metrics: A Comprehensive Review of Challenges and Outlook

Recently, there has been a focus on natural and man-made disasters with a high-impact low-frequency (HILF) property in electric power systems. A power system must be built with “resilience”, i.e., the ability to withstand, adapt to and recover from disasters. Resilience metrics (RMs) are tools to measure the resilience level of a power system, normally employed for resilience cost–benefit analysis in planning and operation. While numerous RMs have been presented in the power system literature, there is still no comprehensive framework covering the different types of RMs in the electric power system, and existing frameworks have essential shortcomings. In this paper, after an extensive overview of the literature, a conceptual framework is suggested to identify the key variables, factors and ideas of RMs in power systems and define their relationships. The proposed framework is compared with the existing ones, and existing power system RMs are also allocated to the framework’s groups to validate the inclusivity and usefulness of the proposed framework as a tool for academic and industrial researchers to choose the most appropriate RM for different power system problems and to pinpoint the potential need for future metrics.


Disasters
Over the last several years, the term "resilience" [1][2][3][4][5][6], or so-called "resiliency" [7][8][9], has appeared as a rising concept in power systems literature. This is due to the increasing rate of high-impact low-frequency (HILF) events and their great impact on power systems. HILF events may have catastrophic impacts on power systems, but rarely or never occur [10][11][12]. However, the occurrence of an HILF event in a centrally controlled power system may turn minor damage in some important sections into extensive outages [13], and this is the reason for the increasing importance of resilience analysis in modern power systems.
(3) It is essential to determine whether it encompasses planning domain (before the disaster) or operation domain (during and after the disaster) or both. In References [8,15,24,41,44,50,54], resilience is related to the operation domain, whereas power system behavior before disasters is attributed to the quantities such as robustness [15,41,50], hardening [8,44] or service quality [24]. In other references, such as [1,28,55,56], resilience is related to both planning and operation domains. (4) The difference between resilience and other concepts, especially reliability, must be defined.
In Ref. [24], resilience is part of reliability. By contrast, in [25,45,46], reliability is a component of resilience. In other references [15,21,28,33,36,37,41,50,57,58,59], resilience and reliability are two completely distinct concepts, where the difference mostly relates to the nature of the corresponding event (high-impact low-frequency for resilience and low-impact high-frequency for reliability). (5) The state of the power system after final restoration may differ from the initial state. For example, three years after Hurricane Katrina, the number of electric customers was half of the pre-storm conditions [60]. Thus, the resilience definition must account for this difference.

Paper Contributions
This paper studies the literature of extensive efforts (both academic and industrial) regarding the measurement of resilience in power systems. Power system resilience is measured by resilience metrics (RMs), which will be introduced in detail in the subsequent sections. Several RMs have been proposed and implemented in the power system literature, and some review papers related to these metrics will be reviewed at the end of Section 2. However, the main problem is how these RMs should be classified to provide an in-depth view for academic and industrial researchers. This can be done by a so-called "conceptual framework", which must be comprehensive and detailed and must present an in-depth view of the different types of RMs. Some researchers have tried to fulfill this need by proposing conceptual frameworks for RMs, such as [5,37,61,62], but these efforts have essential shortcomings (see Section 5.3).
In this paper, a new and detailed conceptual framework for the classification of RMs in power systems is proposed. Then, two important points must be proved:
Point 1: The proposed conceptual framework is comprehensive and useful, i.e., it can cover all types of RMs in the power system literature.
Point 2: The proposed conceptual framework is better than the existing frameworks in the power system literature.
To prove Point 1, after an extensive literature survey, the state-of-the-art power system RMs are categorized based on the framework's groups, which shows that this framework can cover all types of existing RMs (see Section 4). To prove Point 2, some existing conceptual frameworks presented in the literature are critically analyzed. These conceptual frameworks are presented in technical reports provided by distinguished research institutions (such as "RAND corporation" and "Sandia National Laboratories"), or in papers presented in distinguished IEEE journals, and these references are extensively cited in the literature. Therefore, these conceptual frameworks are very important for academic and industrial researchers interested in the power system resilience domain. We compare our proposed conceptual framework with the existing ones, and show some essential shortcomings in the existing frameworks that do not exist in the proposed framework (see Section 5.3).
Therefore, the usefulness, accuracy and comprehensiveness of the proposed conceptual framework compared with the existing frameworks, and its ability to classify all types of RMs in the power system literature, are justified. The proposed framework can be useful for both academic and industrial researchers interested in the power system resilience domain. Academic researchers can concentrate on the groups that need further investigation, in order to propose new RMs. Industrial researchers can choose the appropriate group according to their needs and select the desirable metric, which will be used for planning/operation of the power system with a focus on the disaster effect. To the best of the authors' knowledge, this paper is one of the first comprehensive analytic studies regarding different types of power system RMs.

Paper Structure
The rest of this paper is organized as follows. In Section 2, the RMs are introduced, their main attributes are presented and some existing reviews regarding RMs in the power systems are introduced. In Section 3, a new conceptual framework for power system RMs is proposed and explained in detail. In Section 4, the state-of-the-art of power system resilience metrics are investigated and the identified metrics are organized based on the proposed framework, including the non-performance-based RMs, the performance RMs and the consequence (outcome) RMs. In Section 5, the application of reliability metrics and metrics based on disaster probability for resilience evaluation is analyzed. Then, the proposed framework is compared with the existing ones in power systems literature. Finally, concluding remarks and references are presented.

Resilience Metrics
Evaluating the resilience of power systems is, due to several complexities, a very controversial matter. However, this evaluation is required for the assessment of resilience enhancement strategies [28]. By quantitatively measuring resilience, it is possible to identify the important needs, monitor and determine the resilience improvement level relative to a base system, and perform a cost-benefit analysis [6]. The necessity of a cost-benefit analysis to improve power system resilience is frequently mentioned in the literature [2,6,7,8,13,17,18,23,30,37,41,63].
A resilience metric (RM) [1,4,28,36,61], also called a Resilience Index (RI) [21,47,49] or resilience indicator [4,59], is a tool for measuring the resilience level of power systems. The appropriate evaluation of the resilience level of the power system leads to effective and rational resilience enhancement strategies [33], e.g., advanced control techniques to improve the resilience level of microgrids [64][65][66][67][68]. By using quantitative resilience evaluation techniques, weak and strong areas of the power system can be identified, and resilience enhancement strategies can be proposed [37]. Appropriate measurement of the resilience level and of its improvement necessitates developing associated RMs for use in power system planning, operation and policy-making [36,37]. RMs can help public utility commissions and other regulators in decision-making on resilience investment and rate recovery [69]. Since a number of factors may affect resilience, it is difficult to quantify it accurately. One simple way is to integrate the factors into a combined RM, although the details of the integration (factor selection, factor measurement, mathematical integration method, proper weighting, etc.) are controversial [6]. Quantifying resilience is hard and requires a widely accepted metric and an associated computation algorithm [6,13,43,70].
In the power system literature, RMs are not standardized, and there is no general agreement about the necessary capabilities, their measurement method and their relation to the desired outcomes [5,69]. RMs that are distinct from reliability metrics are lacking [33].
Some of the main attributes that power system RMs should or might have are as follows: (1) Measure the resilience components in compliance with the resilience definition [36].
(2) Consider the spatial-temporal features of a disaster on the power system [28,37].
(4) The capability of comparison between different systems and between a system with and without resilience enhancement, i.e., proving that resilience enhancement strategies are effective [36,37]. (5) One metric is required for the simultaneous use in planning and operation [36]. (6) Being extensible; i.e., comply with the technology advancement and new computational techniques [36]. (7) Quantitative or ordinal representations (the latter means using qualitative levels such as "unsatisfactory", "marginal", and "satisfactory" or A, B, C, D levels) [6,36,69]. (8) Representation of uncertainty which may be computed as a deterministic metric or a probabilistic (e.g., expected value, probability distribution) metric, based on the number of intended fault scenarios [36,37]. However, as mentioned in Section 1.1, the concept of "probability" cannot be defined for most of the HILF events, and using probabilistic RMs is still controversial (see Section 5.2). (9) Being risk-based, i.e., considering threats, system vulnerability, and associated consequences [36,37,69]. (10) Make a distinction between operational resilience (whether operational strength is maintained) and infrastructure resilience (whether physical strength is maintained) [4,28,71,72]. (11) In a performance curve, the initial state, performance level at each state and duration of each state must be considered [28]. (12) The assumption that VOLL (Value Of Lost Load) is not constant, but is time-dependent [24]. (13) Considering different scales, i.e., global, area-specific or component-specific [28]. (14) Being open, transparent, replicable, well documented and simple, to be used or checked by others (with identical data) [6]. The RM computation may need extensive data that do not exist. However, based on the frequency, severity and cost of weather-related natural disasters, all related institutes must make decisions and actions for gathering and recording the required data. 
(15) Considering operator control variables and their time dependency [36].
In Ref. [73], RMs in four domains (organizational, social, economic and engineering) are reviewed, where resilience assessment approaches are classified into qualitative and quantitative categories (with further sub-categories) and the main focus is on quantitative approaches in the engineering domain. However, the power system RM literature is not well studied in [73], and the reader cannot find an in-depth view of power system RMs. In Ref. [5], some resilience metrics in the power system literature are reviewed. However, this reference only shows which resilience metric is used in each reference, and although some related papers are investigated, there is no conceptual framework to categorize these metrics and the existing frameworks are not analyzed. In this reference, it is stated that "a standardized framework would facilitate researchers in developing resiliency metrics to assess and evaluate resilience of power systems", which confirms the need for a framework for power system RMs. Other reviews of RMs in power systems are presented in [61,62], which have essential shortcomings that will be presented in Section 5.3. Thus, we believe that there is still a need for a dedicated study for RM review and classification in power systems that considers the effect of disasters on power systems. In this regard, this paper proposes a conceptual framework for power system RMs and classifies the existing RMs based on this framework. To the best of the authors' knowledge, this paper is one of the first comprehensive analytic studies regarding different types of power system RMs.

The Proposed Conceptual Framework for Resilience Metrics in Power Systems
After an extensive and in-depth literature survey regarding the application of RMs in power systems (directly or indirectly), a conceptual framework for the classification of RMs in power systems is proposed (see Figure 2). In our opinion, this framework is comprehensive and useful and covers all different types of RMs. In the future, the framework can be adjusted if necessary with the introduction of new and different RMs. It should be noted that this framework is completely general and is not restricted to any specific sector of power systems, such as generation, transmission or distribution. In the proposed framework, at the first level, RMs are divided into performance-based and non-performance-based metrics. The performance of the system, also called functionality [28,52] or quality [51,74], is the direct output quantity of a power system. The system performance can be obtained (or assumed) by analyzing the historical behavior of the system over a period, or by forecasting the system behavior over a period (e.g., using Sequential Monte Carlo simulation [29,58]). Performance-based metrics depend on system performance. When the system performance is determined, performance-based metrics can be computed. For example, system performance may be the supplied load, and the performance-based metric may be the normalized area under the supplied load curve. The concept becomes clearer when the reader studies the rest of this section and also the dedicated Sections 4.2 and 4.3.
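The example in the paragraph above (supplied load as the performance, normalized area under its curve as the metric) can be sketched as follows. This is a minimal illustration, not a method taken from any of the cited references: the function names, the trapezoidal integration, the choice of the demand curve as the normalizing reference, and all numbers are assumptions made for illustration.

```python
def trapezoid_area(times, values):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching (time, value) samples")
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def normalized_supplied_load_area(times, supplied, demand):
    """Ratio of supplied energy to demanded energy over the study period.

    Assumption: the demand curve is used as the normalizing reference.
    """
    return trapezoid_area(times, supplied) / trapezoid_area(times, demand)


# Hypothetical 24-hour event: load drops to 40 MW and is then restored.
times_h = [0.0, 2.0, 6.0, 12.0, 24.0]
supplied_mw = [100.0, 40.0, 40.0, 80.0, 100.0]
demand_mw = [100.0, 100.0, 100.0, 100.0, 100.0]

metric = normalized_supplied_load_area(times_h, supplied_mw, demand_mw)
print(f"normalized area under supplied-load curve: {metric:.3f}")
```

A value of 1 would mean that all demanded energy was supplied during the period; smaller values indicate a larger loss of performance.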
Non-performance-based metrics do not depend on system performance. Instead, some factors that can affect a power system before, during and after a disaster are identified. Using methods such as site visits and questionnaires, quantities related to these factors are then computed. Finally, by combining these quantities, the metric is calculated. These metrics indicate the status of the relevant system factors in the face of disasters. They are usually used for the resilience evaluation of a single infrastructure, or of a community (including local neighborhoods, cities, regions, etc.) [6] in which the power infrastructure is only one of several infrastructures. These community RMs are not designed for power systems, but those parts of the metrics that are related to power infrastructure can be used separately. In addition, system attributes that affect resilience (e.g., the number of critical spare parts in inventory [36]) are considered non-performance-based metrics.
In short, "performance-based" metrics depend on the system performance (system level quantities such as system load, system generation, system configuration, etc.) and "non-performance-based" metrics are independent from the system performance.
In the proposed framework, there is no further division for non-performance-based metrics. Performance-based metrics have divisions in more levels and can be divided into two groups: performance metrics and consequence (outcome) metrics. In performance metrics, the system performance (the direct output of the power system) is presented as RM, whereas, in consequence metrics, the effect of the power system on the diverse features of the society is presented as RM.
Both performance and consequence metrics are divided into two groups: specific and general metrics. Specific metrics are defined in an obvious manner, without ambiguity and based on specific quantities. In contrast, general metrics are defined with ambiguity and using general concepts such as performance, functionality, impact and effect. General metrics exist because the current maturity of RMs is low. By removing ambiguities, a general metric is converted to a specific metric. In future, it is expected that this maturity grows and, finally, the division into specific and general will be removed. In other words, all performance-based metrics will be specific metrics and there will be no general metrics.
Specific metrics are divided into some more levels, whereas general metrics have no division. Specific performance metrics are divided into five groups as follows: (1) Power: These metrics show how the load is or is not supplied, or how generation capacity is available. These metrics can be in the form of power or energy, and because energy is calculated based on power, the name "power" is used for this group. Cost metrics related to load or generation do not belong to this group; they belong to the consequence metrics (economic group). Here, the number is only related to system equipment or customers. Metrics that are related to the number of people, households, etc. do not belong to this group; they belong to the consequence metrics (social group). (4) Probability: These metrics show the probability of different aspects of disaster effect on power systems. These metrics may depend on power, duration and frequency metrics. (5) Curve: These metrics are computed based on the performance curve (system performance versus time) or resilience curve (another RM versus time). The main characteristic of these metrics is that both system performance (or the other RM) and time (period of the variation) are considered simultaneously.
It can be seen that these groups are a combination of division based on quantity and division based on mathematical concepts.
For specific consequence (outcome) metrics, a completely different division is used, based on the diverse effects of power systems on the society. Specific consequence metrics are divided into four groups as follows: (1) Economic: These metrics show costs and economic impacts of power systems on the society.
All cost metrics belong to this group. A consequence metric can be computed using performance metrics such as power, duration and frequency metrics. In addition, a consequence metric may be presented as a probability or using a consequence curve (consequence versus time). Thus, although the two sets of groups are divided differently, the consequence metric groups are not the opposite of the performance metric groups.
It should be noted that power system RMs may be defined based on system level or asset level approaches. As "performance-based metrics" are computed based on the system performance, they are usually defined based on a system level approach, and asset level metrics are usually considered as "non-performance-based" metrics. However, it is possible that asset level resilience metrics are computed based on performance-based resilience metrics to evaluate the relative importance of each asset from the whole system resilience viewpoint. In this condition, the asset level metric is a performance-based metric (e.g., see RAW metric in Section 4.2.1).

State-of-the-Art of the Resilience Metrics Based on the Proposed Conceptual Framework
In this section, based on our comprehensive and in-depth literature survey, the identified RMs are allocated to each group in our proposed conceptual framework, to justify the comprehensiveness and correctness of the proposed framework. In the literature, RMs are either presented directly (plain RMs) or not presented directly. In the latter case, the reference still aims to improve resilience, and the RMs that are used indirectly to capture the disaster effect are identified. It should be noted that our literature survey is completely general and is not restricted to references related to any specific sector of power systems, such as generation, transmission or distribution.

Non-Performance-Based Resilience Metrics
In the RI metric [47], resilience is divided into three components: robustness, resourcefulness and recovery. Each component is divided into some other components and this division continues up to level five. At the lowest level, some questions in the form of "yes" and "no" exist and, based on answers, the related indices are computed. Then, at each level, the related index is computed as the weighted linear combination of related components at the lower level. Finally, the resilience index is computed as the weighted linear combination of robustness, resourcefulness and recovery indices. It seems that this index is appropriate for a single infrastructure (such as a power plant) and is not appropriate for a networked infrastructure (such as power system). In the RMI metric [49], a similar method is also used by extending the method and resilience definition in [47], where resilience measures are categorized as preparedness, mitigation measures, response capabilities and recovery mechanisms, and the division continues up to level six. There are other RMs which are generally similar to RI and RMI, including Coastal Resilience Index [6,48], BRIC metric [6], RCI metric [6], CDRI metric [6], CREATE-ERI metric [6], DRI metric [6], DDI metric [6] and World Risk Index [6].
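The bottom-up aggregation described above (yes/no answers at the lowest level, then a weighted linear combination at each higher level) can be sketched as a small recursive function. This is only an illustration of the aggregation pattern: the tree, the weights, the scoring of "yes" as 1 and "no" as 0, and the normalization by total weight are assumptions, not the actual questions, weights or scale of the RI in [47].

```python
def index_value(node):
    """Evaluate a hierarchical index.

    A node is either a bool (the answer to a yes/no question) or a list of
    (weight, child) pairs whose values are combined linearly.
    """
    if isinstance(node, bool):
        return 1.0 if node else 0.0  # assumed scoring: yes = 1, no = 0
    total_weight = sum(weight for weight, _ in node)
    if total_weight <= 0:
        raise ValueError("weights at each level must sum to a positive value")
    # Normalizing by the total weight is an assumption of this sketch.
    return sum(weight * index_value(child) for weight, child in node) / total_weight


# Hypothetical three-component tree with made-up answers and weights.
robustness = [(0.5, True), (0.5, False)]
resourcefulness = [(1.0, True)]
recovery = [(0.25, True), (0.75, False)]

resilience_index = index_value(
    [(0.4, robustness), (0.3, resourcefulness), (0.3, recovery)]
)
print(f"illustrative resilience index: {resilience_index:.3f}")
```

Deeper hierarchies are handled by nesting more lists; the recursion does not depend on the number of levels.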
The G index (Grid Recovery Index) [21] is the only non-performance-based RM dedicated to power systems. This index is related to a power system recovery period after a disaster and is defined as the weighted linear combination of five quantities related to recovery capabilities, including the severity of extreme events, the severity of power infrastructure damage, the severity of transportation infrastructure damage, the severity of cyberinfrastructure damage, and the unavailability level of human and material resources.
Finally, any metric as a system attribute (related to disaster), or exclusively computed based on the system attributes, is regarded as non-performance-based RM, disregarding the metric usefulness. There are examples of these non-performance-based RMs including the number of critical spare parts in inventory (such as the number of spare transformers) [36], the dollar amount (per capita) of federal assistance spent annually for disasters [6], energy storage capacity and/or stocks by fuel and market, largest single source of supply in the energy market and redundancy in the network's architecture [59]. More examples can be found in [61] in columns of the detailed tables entitled "inputs", "capacities" and "capabilities".

Performance-Based Resilience Metrics-Performance Metrics
As stated in Section 3, performance metrics are divided into two groups: specific metrics (power, duration, frequency, probability and curve) and general metrics. In this section, the identified performance metrics are allocated to the related groups in the proposed conceptual framework. Figure 3 briefly shows different types of performance-based performance RMs in the power system literature (except for general metrics). It should be noted that, in detailed tables in [61], some of the quantities in the column entitled "performance" are performance metrics.
(10) RAW (Resilience Achievement Worth) [71,72]: The percentage improvement in the EENS (Expected Energy Not Supplied) when each transmission corridor is considered 100% resilient to the weather event. The RAW metrics of the transmission corridors are ranked to determine the most critical corridors for suitable adaptation strategies. (11) The difference between power plant generation before and after disaster [101].
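One possible reading of the RAW description above is sketched below: for each corridor, the EENS is recomputed with that corridor assumed fully resilient, and the percentage improvement relative to the base case is ranked. The formula, the function name and the EENS values are assumptions for illustration; the exact definition should be taken from [71,72].

```python
def raw_percent(eens_base, eens_corridor_resilient):
    """Percentage improvement in EENS when one corridor is made 100% resilient."""
    if eens_base <= 0:
        raise ValueError("base-case EENS must be positive")
    return 100.0 * (eens_base - eens_corridor_resilient) / eens_base


# Hypothetical EENS results (MWh), e.g. from separate simulation runs.
eens_base_mwh = 500.0
eens_if_resilient_mwh = {
    "corridor_A": 350.0,
    "corridor_B": 450.0,
    "corridor_C": 490.0,
}

raw = {
    name: raw_percent(eens_base_mwh, eens)
    for name, eens in eens_if_resilient_mwh.items()
}
# Rank corridors from most to least critical.
ranking = sorted(raw, key=raw.get, reverse=True)

for name in ranking:
    print(f"{name}: RAW = {raw[name]:.1f}%")
```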

Duration Metrics
The most notable duration metrics are as follows: (1) Load or energy curtailment duration (deterministic [88,102], expected [59,88]), including customer outage duration [28,102], CAIDI (Customer Average Interruption Duration Index) [24], LOLE (Loss Of Load Expectation) [29,94], SAIDI (System Average Interruption Duration Index) [24] and STAIDI (STorm Average Interruption Duration Index) [102]. (2) Recovery duration (deterministic [10,28,37,87], expected [102]), including time to full infrastructure recovery [28]. It should be noted that the STAIDI metric is defined as the total customer storm interruption minutes divided by the total number of customers served, i.e., SAIDI is extended to severe weather events. However, it is shown that STAIDI exhibits too high a degree of uncertainty to be valid for characterizing resilience. Therefore, extending the reliability metrics to RMs by brute force is not viable [102].
It should be noted that the LOLF metric is used for multiple disasters (independent or dependent) in a specified period (e.g., a week), not for a single disaster. STAIFI is also defined as the total number of customers interrupted divided by the total number of customers served, i.e., SAIFI (System Average Interruption Frequency Index) is extended to severe weather events. However, it is shown that STAIFI exhibits too high a degree of uncertainty to be valid for characterizing resilience. Therefore, extending the reliability metrics to RMs by brute force is not viable [102].
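The STAIDI and STAIFI definitions quoted above reduce to two simple ratios, sketched here for a single storm. The data layout (a list of interruption records) and all numbers are hypothetical; only the two ratios follow the definitions in the text.

```python
def staidi(interruptions, customers_served):
    """Total customer storm interruption minutes / total customers served."""
    customer_minutes = sum(customers * minutes for customers, minutes in interruptions)
    return customer_minutes / customers_served


def staifi(interruptions, customers_served):
    """Total customers interrupted during the storm / total customers served."""
    customers_interrupted = sum(customers for customers, _ in interruptions)
    return customers_interrupted / customers_served


# Hypothetical storm: (customers interrupted, interruption duration in minutes).
storm_interruptions = [(2000, 300), (500, 1200)]
customers_served = 10000

storm_staidi = staidi(storm_interruptions, customers_served)
storm_staifi = staifi(storm_interruptions, customers_served)
print(f"STAIDI = {storm_staidi:.1f} min per customer served")
print(f"STAIFI = {storm_staifi:.2f} interruptions per customer served")
```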

Probability Metrics
The most notable probability metrics are as follows: (1) The probability that the amount of supplied load is greater than a certain value (after a disaster [58,77] or a specific period [58]). (2) The probability that unsupplied load is greater than or equal to a certain value [54].
However, as mentioned before in Section 1.1, the concept of "probability" cannot be defined for most of the HILF events. Therefore, using probabilistic RMs is still controversial and is not recommended (see Section 5.2).
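Notwithstanding the caveat above, the two probability metrics listed in this section can be estimated empirically when a set of simulated outcomes is available. The sketch below assumes that such samples exist (e.g., from scenario simulation); the sample values, the thresholds and the function name are hypothetical.

```python
def exceedance_probability(samples, threshold, inclusive=False):
    """Fraction of samples above the threshold (or at/above it if inclusive)."""
    if not samples:
        raise ValueError("at least one sample is required")
    if inclusive:
        hits = sum(1 for value in samples if value >= threshold)
    else:
        hits = sum(1 for value in samples if value > threshold)
    return hits / len(samples)


# Hypothetical supplied load (MW) after a disaster in eight simulated scenarios.
demand_mw = 100.0
supplied_mw = [95.0, 60.0, 80.0, 40.0, 100.0, 75.0, 85.0, 55.0]
unsupplied_mw = [demand_mw - value for value in supplied_mw]

# Metric (1): probability that the supplied load is greater than 70 MW.
p_supplied = exceedance_probability(supplied_mw, 70.0)
# Metric (2): probability that the unsupplied load is at least 40 MW.
p_unsupplied = exceedance_probability(unsupplied_mw, 40.0, inclusive=True)

print(f"P(supplied > 70 MW) = {p_supplied:.3f}")
print(f"P(unsupplied >= 40 MW) = {p_unsupplied:.3f}")
```

The estimates are only as meaningful as the scenario set behind them, which is exactly the difficulty raised for HILF events.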

Curve Metrics
The most notable curve metrics are as follows: (1) The area under the real performance curve [75,95], which can be normalized with the area under the ideal performance curve [1,51,103]. (2) The area between real and ideal performance curves (area of the resilience triangle or resilience trapezoid) [4,72], which can be used in normalized form during the performance degradation period [103]. (3) The area between real and worst-case performance curves, which can be used in normalized form during the performance recovery period [103]. (4) The slope of performance curve during the performance degradation period [4,72] or the recovery period [4,58,72], normalized performance degradation during the performance degradation period [103]. (5) The exponent of the exponential performance curve [51].
Here, the performance is usually the supplied load, but other performances can be defined. In addition, as stated before (Section 1.2), the performance after the final restoration may differ from the initial performance (before the disaster). As a result, the ideal performance curve cannot be defined for all parts of the performance curve and should be defined for a certain part of this curve (usually for the recovery period). Therefore, when the ideal performance curve is needed, the curve metrics should be used for a certain part of the performance curve.
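Several of the curve metrics listed above can be computed directly from a sampled performance curve. The sketch below is an illustration under stated assumptions: the curves are piecewise linear and sampled at the same times, the ideal curve is taken as a constant pre-disaster level over the whole window (which, as noted above, may only be appropriate for part of the curve), and all names and numbers are hypothetical.

```python
def trapezoid_area(times, values):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def area_between(times, real, ideal):
    """Area between the ideal and real curves (resilience triangle/trapezoid)."""
    gaps = [ideal_value - real_value for real_value, ideal_value in zip(real, ideal)]
    return trapezoid_area(times, gaps)


def slope(times, values, start, end):
    """Average slope of the curve between two sample indices."""
    return (values[end] - values[start]) / (times[end] - times[start])


# Hypothetical trapezoid-shaped event: degradation, degraded state, recovery.
times_h = [0.0, 2.0, 6.0, 12.0]
real_mw = [100.0, 40.0, 40.0, 100.0]
ideal_mw = [100.0, 100.0, 100.0, 100.0]

loss_area = area_between(times_h, real_mw, ideal_mw)             # MWh not supplied
normalized_loss = loss_area / trapezoid_area(times_h, ideal_mw)  # dimensionless
degradation_slope = slope(times_h, real_mw, 0, 1)                # MW per hour
recovery_slope = slope(times_h, real_mw, 2, 3)                   # MW per hour

print(loss_area, normalized_loss, degradation_slope, recovery_slope)
```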

General Metrics
Some general RMs only have a few ambiguities about the definition of system performance (functionality, quality, service). These metrics are general formulas, and when the system performance is defined, they convert easily to one of the five aforementioned performance groups. These metrics include the area under the performance curve [37], which can be in the normalized form [74], the area between real and ideal performance curves [4], the weighted linear combination of resilience triangles related to important system factors [106], the ratio of the real performance curve to the targeted performance curve [37], a reciprocal of the system performance loss (which is defined using the largest relative performance deviation from the normal level and can be computed in accumulated and normalized forms) [33], a composite RM (which is computed based on the original, final and worst system performances) [37], the ratio of final to initial quality [74], the expected impact of disasters [32,72,83], the initial drop in functionality, the time required to restore the desired functionality (mean restoration time), and the changes of the latter two RMs with resource variations [52]. Another metric is defined for a period as the probability of system failure at the beginning with recovery at the end, divided by the probability of system failure at the beginning [74].
Other general RMs have more ambiguities, including potential for repeat incidents [10], variable avoided lost load value outage metrics [24], degree of robustness to the initial shock, functionality achieved during the event [28], how fast the performance decreases, how slow the performance recovers, downtime (the duration of the reduced system performance) [31], loss of services, and loss of service duration [87].

Performance-Based Resilience Metrics-Consequence (Outcome) Metrics
As stated in Section 3, consequence (outcome) metrics are divided into two groups: specific metrics (economic, social, geographic, safety and health) and general metrics. In this section, the identified consequence metrics are allocated to the related groups in the proposed conceptual framework. Figure 4 briefly shows different types of performance-based consequence (outcome) RMs in the power system literature (except for general metrics). It should be noted that, in detailed tables in [61], some of the quantities in the column entitled "outcomes" are consequence metrics. In addition, in the studied references, except for the economic metrics, only a few consequence metrics are found and research is needed in this area.

Social Metrics
In Ref. [36], the decrease in labor hours by the public over the course of the recovery period and the number of citizens able to work are presented as RMs. In Ref. [10], the population affected (number of people with no power) is proposed for resilience evaluation. In Ref. [58], performance (households with power, in the form of a value or percentage) after the disaster, and the probability that performance after the disaster is greater than a certain value, are proposed as robustness metrics, robustness being a resilience component. In addition, the ratio of recovered performance to the recovery period, and the probability that, in a specified period, performance after the disaster is more than a certain value, are presented as recovery metrics, recovery being a resilience component. These probabilities are obtained using risk curves. Finally, employment loss is considered as an RM.

Geographic Metrics
In Ref. [10], the geographic area affected (a region with no electricity in terms of square miles) is proposed for resilience evaluation.

Safety and Health Metrics
In Ref. [36], loss of human life and hospital beds unavailable due to power loss are proposed for resilience evaluation.

General Metrics
In Ref. [36], two concepts are defined for power systems: performance (system output) and consequence (effect on the society), where the consequence is obtained using performance. Then, the probability distribution of the consequence and its related numerical quantities (mean, VaR, CVaR, etc.) are proposed as RMs. In Ref. [58], repair and restoration efficiency and economic regional output are considered as RMs. In Ref. [10], intangibles (loss of perception of secure image) are proposed for resilience evaluation. In Ref. [60], three phases are considered for the disaster effect on the power system: during the disaster, the immediate aftermath of the disaster and the long-term aftermath of the disaster. For each phase, the hazard impact and the system vulnerability are multiplied and compared with an upper limit. Then, the sum of these three quantities is multiplied by the probability of disaster occurrence and is defined as the RM.
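The numerical quantities mentioned above for a consequence distribution (mean, VaR, CVaR) can be estimated from a sample of consequence values. The sketch below uses one common empirical convention (VaR as the smallest sample value whose empirical cumulative probability reaches alpha, CVaR as the mean of the samples at or above VaR); other conventions exist, and neither this convention nor the sample values are taken from [36].

```python
import math


def value_at_risk(losses, alpha):
    """Empirical alpha-quantile of the loss sample (one of several conventions)."""
    if not losses:
        raise ValueError("at least one loss sample is required")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses)
    index = math.ceil(alpha * len(ordered)) - 1
    return ordered[index]


def conditional_value_at_risk(losses, alpha):
    """Mean of the losses that are at or above the empirical VaR."""
    var = value_at_risk(losses, alpha)
    tail = [loss for loss in losses if loss >= var]
    return sum(tail) / len(tail)


# Hypothetical consequence samples, e.g. economic loss in million dollars.
losses = [2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 40.0, 70.0]
alpha = 0.75

mean_loss = sum(losses) / len(losses)
var_75 = value_at_risk(losses, alpha)
cvar_75 = conditional_value_at_risk(losses, alpha)
print(f"mean = {mean_loss:.2f}, VaR = {var_75:.2f}, CVaR = {cvar_75:.2f}")
```

The gap between the mean and the CVaR in this made-up sample shows why tail measures are of interest for high-impact events: the average hides the few very costly outcomes.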

Resilience Metrics vs. Reliability Metrics
Currently, most of the performance metrics and the consequence metrics in power system literature are reliability metrics. This situation, because of the distinction between resilience and reliability (see Section 1.2), causes some ambiguities. According to [31], if reliability metrics are computed within a resilience framework, which models threats, system vulnerability, system response and system restoration, it is possible to analyze the loss of performance and performance restoration. In these conditions, reliability metrics can be used as proper RMs, although they may not be able to capture all potentially important resilience features.
However, according to [24], reliability metrics such as SAIDI, SAIFI and CAIDI may have shortcomings in resilience analysis, since they often ignore disasters and assume that the value of lost load (VOLL) is constant. Disasters are nonetheless essential for resilience analysis, and the VOLL for a customer one hour after an event cannot be equal to the VOLL one week after the event. Thus, for resilience analysis, VOLL must be time-dependent. According to [37], the most widely agreed-upon reliability metrics, including SAIDI, SAIFI and CAIDI, often ignore outages caused by disasters. In 2012, more than half of US utilities did not consider disasters in calculating reliability metrics such as SAIFI and SAIDI, since disasters, due to higher restoration costs, complicate the mathematical calculations and can also cause longer outages [24]. Therefore, optimal investments against disasters cannot be derived from these metrics. Nevertheless, the other US utilities did consider disasters in their calculations.
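The effect of excluding disaster-related outages from the reliability indices can be sketched in Python. The sketch uses the standard definitions (SAIFI as customer interruptions per customer served, SAIDI as customer interruption hours per customer served, CAIDI as SAIDI divided by SAIFI). The `Outage` record, the `disaster_related` flag and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    customers_interrupted: int
    duration_hours: float
    disaster_related: bool  # hypothetical flag set by the utility


def reliability_indices(outages, customers_served):
    """Return (SAIFI, SAIDI, CAIDI) for the given outage records."""
    if customers_served <= 0:
        raise ValueError("customers_served must be positive")
    interruptions = sum(o.customers_interrupted for o in outages)
    customer_hours = sum(o.customers_interrupted * o.duration_hours
                         for o in outages)
    saifi = interruptions / customers_served
    saidi = customer_hours / customers_served
    caidi = saidi / saifi if saifi > 0 else 0.0
    return saifi, saidi, caidi


# Hypothetical annual outage log for a utility with 10,000 customers.
outages = [
    Outage(500, 2.0, False),
    Outage(200, 1.0, False),
    Outage(8000, 72.0, True),  # one disaster-related outage
]
customers = 10_000

all_events = reliability_indices(outages, customers)
routine_only = reliability_indices(
    [o for o in outages if not o.disaster_related], customers)
print(all_events)
print(routine_only)
```

In this invented log, dropping the single disaster-related outage lowers SAIDI from 57.72 to 0.12 hours per customer, which illustrates why indices computed without disasters say little about resilience.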
In addition, it is shown that generalizing some reliability metrics (SAIDI, SAIFI) to equivalent resilience metrics (STAIDI, STAIFI) exhibits a high degree of uncertainty and is not viable [102]. Therefore, using reliability metrics to measure power system resilience is not recommended, although many well-established references in power system literature have used reliability metrics for resilience analysis (see Sections 4.2 and 4.3).

Resilience Metrics Based on Disaster Probability
As mentioned in Section 1.1, the concept of "probability" cannot be defined for most disasters (HILF events), and defining any resilience metric based on disaster probability is controversial. However, these types of metrics are extensively used in power system literature, and they can belong to any performance or consequence group, such as power metrics (expected energy not supplied), duration metrics (loss of load expectation), economic metrics (expected load shedding cost) and especially the probability metrics group. Thus, since the "disaster probability" cannot be defined accurately, using any metric that is based on disaster probability is not recommended, although many well-established references in power system literature have used these metrics (see Sections 4.2 and 4.3).
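The sensitivity of a probability-based metric to the assumed disaster probability can be illustrated with a short Python sketch for expected energy not supplied (EENS). All probabilities and energy values are hypothetical and chosen only to show the effect.

```python
def expected_energy_not_supplied(scenarios):
    """Sum of probability * energy not supplied (MWh) over all scenarios."""
    return sum(probability * ens_mwh for probability, ens_mwh in scenarios)


# Hypothetical routine contingencies: (annual probability, ENS in MWh).
routine = [(0.10, 50.0), (0.02, 400.0)]

# Hypothetical disaster with a large, fixed energy not supplied.
disaster_ens_mwh = 200_000.0

# The same disaster evaluated under three assumed annual probabilities.
eens_by_probability = {
    p: expected_energy_not_supplied(routine + [(p, disaster_ens_mwh)])
    for p in (1e-4, 1e-3, 1e-2)
}

for p, eens in eens_by_probability.items():
    print(f"assumed disaster probability {p:g}: EENS = {eens:.1f} MWh")
```

Here the metric ranges from about 33 to about 2013 MWh depending solely on the assumed disaster probability, so any uncertainty in that probability carries over directly into the metric.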

Comparing the Proposed Framework with the Existing Ones
In Ref. [61], provided by RAND Corporation, RMs in power systems are reviewed. One metric cannot be used for all aspects of decision-making. Rather, different metrics must be used for each aspect. The metrics are structured within a coherent framework, called the "logic model", with five components: inputs (what is available to support resilience, such as budgets, equipment, spare parts and personnel), capacities (how inputs are organized to support resilience, such as response teams, recovery plans and advanced technologies), capabilities (how well capacities can serve a system when they are needed, such as the ability to detect leaks or outages, to repair damaged power lines, and to restore power outages), performance (what is produced by an engineered system, such as the amount of energy delivered or operating characteristics of a system) and outcomes (how a system generates the outcomes that society is seeking to achieve, e.g., reduced damage from disasters, increased economic activity, and reduced deaths and injuries from disasters). In Ref. [61], RMs in power systems are classified by three measures: resolution or scale (facility/system level, system/region level, region/nation level), type (the five components of the logic model, as stated before), and maturity (low, medium, high).
The following remarks are based on an in-depth analysis of the RM tables in [61]:

Remark-I:
It seems that the logic model is a general model for the power system, and the disaster does not appear in this model directly. It can be assumed that the inputs, capacities and capabilities are tools that a utility has before a disaster, and that performance and outcomes are results that the utility is confronted with after a disaster. The analysis of the RM tables shows, however, that this restricting viewpoint is not considered in metric selection.

Remark-II:
Some performance metrics, especially at the facility/system level, are related to power quality (such as harmonic distortions, flicker, voltage unbalance, voltage sags/swells, etc.), which has no importance after a disaster.

Remark-III:
Some outcomes metrics are not related to resilience and have no post-disaster importance (such as CO2 emissions, allocation of losses in deregulated electricity markets, etc.).

Remark-IV:
Some inputs, capacities and capabilities metrics are not "metrics" or "indices" at all, such as hierarchical levels (HL-I, HL-II, HL-III), transformers, energy storage, communication/control systems/control centers, ancillary service, electrical protection and metering, etc.

Remark-V:
Some performance metrics, especially at the system/region level, are reliability metrics. Some well-established references used reliability metrics for measuring the resilience under specific conditions, although reliability and resilience are considered distinct concepts (see Section 5.1). However, some reliability metrics in this reference are not related to resilience, such as derated power, unscheduled generator outages, and protective and switching device attributes (probability of failure, protection reliability, and reclose reliability). In addition, most of the metrics at the system/region level are collected from references dedicated to reliability, not references related to resilience/disasters.
In summary, we believe that dividing resilience into a few components and allocating existing metrics to those components is an excellent idea. However, it seems that the study in [61] is carried out without considering the disaster effect on power systems. Several collected metrics are not related to resilience/disasters or are not metrics at all. Furthermore, some of the references used are not related to resilience/disasters. Therefore, employing the metrics presented in this reference must be done with caution and with referral to the original references. Setting aside the aforementioned ambiguities, the inputs, capacities and capabilities metrics in [61] are among the system attributes and can be considered in the proposed framework as non-performance-based metrics. Nevertheless, these metrics can be used for the computation of performance-based metrics (through the effect on system performance) or other non-performance-based metrics (as effective system factors). In addition, performance and outcomes metrics in [61] are considered in our proposed framework as performance-based performance metrics and performance-based consequence (outcome) metrics, respectively.
The proposed conceptual framework has none of the five essential shortcomings we mentioned before about the RAND logic model.
In References [62,110,111], another framework is presented for resilience metrics. The resilience metrics are divided into attribute-based and performance-based metrics, and the latter are divided into direct and indirect metrics. The direct metrics consist of four groups: electrical service, critical electrical service, restoration and monetary. The indirect metrics consist of three groups: community function, monetary and other critical assets. Based on the definition of attribute-based metrics, they are equivalent to non-performance-based metrics in our framework, and the performance-based metrics in the two frameworks are the same. In addition, based on the definition of direct and indirect metrics, they are equivalent to performance-based-performance RMs and performance-based-consequence (outcome) RMs in our proposed framework, respectively. Thus, these frameworks are generally similar, but their details are significantly different, as can be seen in the following remarks:

Remark-VI:
In both direct and indirect metrics, a "monetary" group exists, which complicates the metrics categorization, whereas those metrics are concentrated in one group (economic) in our framework. All costs are incurred by the society, and separating costs into direct and indirect ones is a hard and complicated problem. Thus, placing all monetary and cost metrics in the consequence (or indirect) metrics is more reasonable.

Remark-VII:
In the direct metrics, both "monetary" and "restoration" groups include an identical metric: "cost of recovery". This shows that the framework has essential shortcomings, since one metric should not belong to two groups.

Remark-VIII:
The metrics in groups "electrical service" and "critical electrical service" are essentially the same and only differ in the term "critical". For example, "cumulative customer-hours of outages" belongs to the "electrical service" group, whereas "cumulative critical customer-hours of outages" belongs to the "critical electrical service" group. Since it is obvious that critical facilities have more importance in resilience analysis against disasters, it seems that this distinction is not a good idea.

Remark-IX:
In this framework, a distinct group is allocated to "recovery", which does not seem to be independent of the "electrical service" and "critical electrical service" groups.

Remark-X:
The number of references studied for proposing the framework is insufficient, and some important references are not reviewed.
This framework is built on an excellent idea and is generally similar to our proposed framework; it has, however, essential shortcomings in the group titles, metric selection and references. These essential shortcomings do not exist in our proposed conceptual framework.
Another RM classification is presented in [37], where resilience evaluation methods (in fact, RMs) in power systems are divided into quantitative and qualitative groups (similar to [73]). Quantitative RMs are computed by evaluating the system performance and measuring the deviation (magnitude or duration) from the desirable performance. The quantitative RMs are divided into three groups: simulation-based methods, analytical methods (using the probability of system failure in a specific condition) and statistical analysis of historical outage data. In this reference, a strict definition of qualitative RMs is not presented. It is stated, however, that different aspects and capabilities can be considered in the evaluation of the resilience. The aspects include the power system and other interdependent systems (information system, energy infrastructure, fuel supply chain, business structure, etc.). The capabilities include preparedness, mitigation, response, and recovery (for example, emergency plan, personnel training and repair crew availability). The following remarks can be made regarding this framework:

Remark-XI:
Based on the definition of quantitative metrics, quantitative and qualitative RMs are equivalent to performance-based and non-performance-based RMs in our proposed framework, respectively. The group titles are, however, inaccurate and misleading. The definition of quantitative methods shows that the difference between the two groups is related to system performance, not to the evaluation method. The title "qualitative" is also not accurate, as some methods considered qualitative (for instance, the methods in [47,49]) are not based on system performance, yet they are compute-intensive rather than "qualitative".

Remark-XII:
The four components of the "capabilities" (preparedness, mitigation, response and recovery) are adopted from [49] (see Section 4.1). These components are not independent of the so-called "aspects" (information system, energy infrastructure, fuel supply chain, business structure, etc.). For example, in [49], "communications" are considered in "mitigation measures", "response measures" and "recovery measures". In addition, "information technology and sharing" are considered in "preparedness measures", "mitigation measures", "response measures" and "recovery measures". Finally, dependencies on other infrastructures (information technology, natural gas, communications, water, wastewater, transportation, and critical products) are also considered in "preparedness measures", "mitigation measures", "response measures" and "recovery measures". Thus, "capabilities" and "aspects" are not independent factors.

Remark-XIII:
The simulation-based methods and statistical analysis of historical outage data are neither methods for measuring resilience nor RMs themselves. These methods can be instead used for assuming or predicting system performance. Considering a performance curve (system performance versus time), we can obtain this curve using historical data or Monte Carlo simulation. However, this curve is neither a resilience metric nor a resilience evaluation method.
Nonetheless, based on this curve, we can compute several resilience metrics. In contrast, the analytical methods, as mentioned in this reference, are equivalent to the performance-based probability metrics.
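The distinction drawn in Remark-XIII between the performance curve and the metrics computed from it can be sketched in Python. The curve below is a hypothetical sample of normalized performance over time, however it was obtained (historical data or simulation). The three derived quantities are illustrative choices, not metrics prescribed by [37].

```python
def curve_metrics(times, performance, target=1.0):
    """Derive example resilience metrics from a sampled performance curve."""
    if len(times) != len(performance) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    # Area between the target level and the curve (trapezoidal rule).
    lost_area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        gap_prev = max(target - performance[i - 1], 0.0)
        gap_curr = max(target - performance[i], 0.0)
        lost_area += 0.5 * (gap_prev + gap_curr) * dt
    # Time from the first degraded sample to the next sample at target.
    start = next((t for t, p in zip(times, performance) if p < target), None)
    if start is None:
        duration = 0.0
    else:
        end = next((t for t, p in zip(times, performance)
                    if t > start and p >= target), times[-1])
        duration = end - start
    return {
        "lost_area": lost_area,
        "minimum_performance": min(performance),
        "degraded_duration": duration,
    }


# Hypothetical curve: hours since the event vs. fraction of load supplied.
times = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0]
performance = [1.0, 0.4, 0.4, 0.6, 0.9, 1.0]
metrics = curve_metrics(times, performance)
print(metrics)
```

The curve itself is only the input. The lost area, the minimum performance and the degraded duration are three different metrics obtained from the same curve.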

Remark-XIV:
The number of references studied for proposing the framework is insufficient, and some important references are not reviewed.
Therefore, compared with the proposed framework, the RM framework presented in [37] has essential shortcomings in titles, group definitions, and references for resilience metrics classification. These essential shortcomings do not exist in the proposed conceptual framework.
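The equivalences stated in the comparisons above (for [61], [62,110,111] and [37]) can be collected in a small lookup table. The following Python sketch is only one possible encoding: the dictionary layout, the key strings and the helper `translate` are hypothetical, while the mappings themselves follow the text.

```python
NON_PERFORMANCE = "non-performance-based"
PERFORMANCE = "performance-based performance"
CONSEQUENCE = "performance-based consequence"

# Group of an existing framework -> group(s) of the proposed framework.
FRAMEWORK_MAPPING = {
    "RAND logic model [61]": {
        "inputs": (NON_PERFORMANCE,),
        "capacities": (NON_PERFORMANCE,),
        "capabilities": (NON_PERFORMANCE,),
        "performance": (PERFORMANCE,),
        "outcomes": (CONSEQUENCE,),
    },
    "attribute/performance framework [62,110,111]": {
        "attribute-based": (NON_PERFORMANCE,),
        "direct": (PERFORMANCE,),
        "indirect": (CONSEQUENCE,),
    },
    "quantitative/qualitative framework [37]": {
        # "Quantitative" covers both performance-based branches.
        "quantitative": (PERFORMANCE, CONSEQUENCE),
        "qualitative": (NON_PERFORMANCE,),
    },
}


def translate(framework, group):
    """Return the proposed-framework group(s) for an existing group."""
    try:
        return FRAMEWORK_MAPPING[framework][group]
    except KeyError:
        raise KeyError(f"unknown framework or group: {framework!r}, {group!r}")


print(translate("RAND logic model [61]", "outcomes"))
print(translate("quantitative/qualitative framework [37]", "quantitative"))
```

Such a table makes it easy to check that every group of an existing framework lands in at least one group of the proposed framework.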
In References [4,71,72], RMs are divided into operational metrics and infrastructure metrics. This classification is based on infrastructure and operational aspects, whereas our classification is based on performance. A performance-based metric can be computed based on the supplied load (operational metric) or the number of online network equipment (infrastructure metric). Even though the classification in [4,71,72] is based on a solid idea, our proposed framework presents a more inclusive classification because: (1) It has more well-defined groups and details. (2) The existing metrics are allocated to the framework's groups to justify its inclusiveness. (3) It presents an in-depth insight into the existing RMs in power system literature. (4) It clarifies which groups need more research.
As stated before (Section 2), some power system resilience metrics are reviewed in [5]. However, no framework is presented regarding the classification of resilience metrics, and the existing frameworks are not analyzed. This review is restricted to determining the resilience metric that is used in each reference. In addition, many important references in this area are not reviewed, since this reference is not dedicated solely to the "resilience metrics" area. Thus, this reference cannot present an in-depth view regarding different types of resilience metrics in power systems.
In Ref. [36], non-performance-based metrics are not considered and using system attributes as RMs is rejected. The RMs in [36] are primarily related to consequences and treated as performance-based consequence metrics. However, the consequence may be a system performance and, under this condition, these metrics can be considered as performance-based performance metrics.
In Ref. [112], power system RMs are divided into "attributes-based" and "performance-based" metrics, although this division is based on [62] that was analyzed before. Then, the existing power system RMs are reviewed in four subsections (metrics based on resilience features, code-based metrics, reliability-based metrics and other resilience metrics), although this review is not done based on the above-mentioned division. In addition, several significant considerations regarding the attributes of current power system RMs are provided.
Finally, it should be noted that power systems are a special type of "critical infrastructure". Critical infrastructure resilience studies started long before power system resilience studies, and there is an extensive and very useful body of literature regarding critical infrastructure resilience quantification. Our future work is to study the most important references in this area rigorously, which can be used for presenting new resilience metrics for power systems.

Conclusions
In this paper, the concepts of resilience and resilience metrics (RMs) in power systems are studied. First, the concept of HILF events (disasters, extreme events) and our considerations regarding the most widely used resilience definitions are presented. The concepts of RM and its properties are then analyzed. A conceptual framework is proposed for categorizing RMs in power systems since the existing frameworks regarding RMs have serious shortcomings in classifications. In the proposed framework, power system RMs are divided into non-performance-based and performance-based metrics. Performance-based metrics are divided into performance metrics (including power, duration, frequency, probability and curve metrics) and consequence or outcome metrics (including economic, social, geographic, safety and health metrics). In both performance and consequence metrics, a general metrics group also exists. The framework is discussed in detail and, to validate the framework, the identified power system RMs are allocated to related groups after an extensive literature survey. We show that the proposed framework can cover and classify all types of RMs in the power system literature. In addition, the proposed framework is compared with the existing frameworks, which shows that the proposed framework has none of the essential shortcomings of existing frameworks. Consequently, the framework is inclusive and useful for both academic and industrial researchers. Academic researchers can concentrate on the groups that need further investigation, in order to propose new RMs. Industrial researchers can choose the appropriate group according to their needs and select the desirable metric for the planning/operation of power systems with a focus on the disaster effect.
Based on this research, the following recommendations can be made for academic and industrial researchers: (1) As the resilience of the power system is related to critical services of the society, there is a need for research regarding consequence metrics, which show the effect of the power system on the society. (2) Since resilience and reliability are two distinct concepts, using reliability metrics for power system resilience evaluation is not recommended. (3) Since the probability cannot be defined for most disasters, using resilience metrics based on disaster probability is not recommended. (4) Since a disaster has a spatial-temporal effect on the power system, it is recommended to use resilience metrics that consider the system performance variations with time (e.g., curve metrics), instead of using static metrics (e.g., power metrics). (5) The power system is one of the most important critical infrastructures of the society, and critical infrastructure resilience measurement literature has a very long history. Thus, this literature is full of interesting ideas which can be used by academic researchers to propose new power system resilience metrics.
This paper is one of the first comprehensive analytic studies regarding different types of power system RMs. The subject of RMs in power systems is still in its infancy and further research is needed in this area. The suggested conceptual framework can accommodate new and different power system RMs, which can be allocated to the groups of the proposed framework with minimal changes. For future work, the critical infrastructure resilience quantification literature will be analyzed rigorously, in order to propose some new resilience metrics for power systems based on the identified ideas.

Conflicts of Interest:
The authors declare no conflict of interest.