Resilience in an Evolving Electrical Grid

Abstract: Fundamental shifts in the structure and generation profile of electrical grids are occurring amidst increased demand for resilience. These two simultaneous trends create the need for new planning and operational practices for modern grids that account for the compounding uncertainties inherent in both resilience assessment and the increasing contribution of variable inverter-based renewable energy sources. This work reviews the research addressing the changing generation profile, state-of-the-art practices to address resilience, and research at the intersection of these two topics with regard to electrical grids. The contribution of this work is to highlight the ongoing research in power system resilience and integration of variable inverter-based renewable energy sources in electrical grids, and to identify areas of current and further study at this intersection. Areas of research identified at this intersection include cyber-physical analysis of solar, wind, and distributed energy resources, microgrids, network evolution and observability, substation automation and self-healing, and probabilistic planning and operation methods.


Introduction
Electrical grids are adapting to become more resilient and renewable. As the pressing need to decrease carbon emissions motivates incentives for and development of renewable energy, so too grows the need for electric grids to withstand and recover from more frequent and severe natural disasters. In addition, adversarial cyber and physical human threats are becoming more sophisticated and devastating. Two major themes, renewable energy integration and resiliency of power systems, are driving great changes in the operation and planning of electrical grids. These two themes are intertwined for several reasons. First, both are currently being addressed by electric utilities. Second, uncertainty analysis is necessary both for generator scheduling with renewable generation and for assessing the risk of natural disasters and cybersecurity threats in resiliency analysis. Last, distributed renewable generation has in part driven the increase in the digitalization of the grid, which in turn poses cybersecurity risks and monitoring requirements. The digitalization of the electrical grid has proven to be a great asset in developing the advanced monitoring and forecasting abilities necessary for high contributions of non-dispatchable and variable inverter-based renewable energy sources (VIBRES); however, the advent of massive amounts of critical infrastructure data and new communication frameworks also poses significant cybersecurity risks.
The research works in renewable energy integration and power system resilience tend to address these opportunities and challenges separately, with few assessing the evolution of electrical grids while accounting for resilience in proposed methods and practices. There are significant challenges to addressing both renewable energy integration and power system resilience in research studies. These are multi-temporal and spatial topics ranging from sub-second transient stability analysis that incorporates power electronic switching models to long-term energy planning and forecasting, all while considering cyber-physical interfaces from new distributed generation and the rapid deployment of phasor measurement units (PMUs). Larger questions arise at the intersection of these topics, such as how researchers can optimize or simulate all of these elements, interfaces, and scenarios to develop real-time, user-friendly tools that can be implemented, validated, and made accessible to industry.
Several topics of research have incorporated resilience assessment and consideration while addressing the changes involved with the evolving grid. Existing research at this overlap has been motivated by the implications of these shifts occurring concurrently. The multiplication of intelligent electronic devices (IEDs) to enable wind, solar, and distributed energy resources (DERs) has created cyber and physical vulnerabilities in the system, and numerous research efforts have been launched to address this. Microgrid studies have reported resilience benefits of increased renewable penetration in the face of extreme weather and for intentional islanding during disturbance events. The increase in automation has also spurred investigation into self-healing. Inherent system resilience is becoming an important factor in longer term system topology evolution studies, as well as probabilistic planning and operations studies. At the shorter time scale of power system events, machine learning has been applied to the growing power system data sets to evaluate resilience more acutely. These topics include cyber-physical analysis of solar, wind, and DERs, microgrids, network evolution and observability, substation automation and self-healing, and probabilistic planning and operation methods. However, in comparison to the research works solely addressing resilience in power systems or the evolving electrical grid, there are far fewer research works at the intersection of these topics. There is a research gap and a clear need for research that addresses these two topics together and holistically. This paper performs a comprehensive survey of the research work addressing resilience in power systems and its intersection with the integration of VIBRES to encompass the holistic evolution of electrical grids.
The contribution of this work is to highlight the current research status in power system resilience and integration of VIBRES and to identify areas of current and further study at this intersection.
The remainder of this paper is organized as follows: Section 2 discusses the change in the generation profile due to the increased contribution of VIBRES in electrical grids and its implications for reliability and resilience assessments and valuation. In Section 3, the history of power system resilience and current works on resilient methods, technologies, systems, and valuation are reviewed. Section 4 features research at the intersection of resilience and the evolving generation profile. Section 5 concludes with summarized key takeaways.

The Evolving Grid
Since the incorporation of the modern interconnections, operation and planning standards have been built around assumptions of dispatchable, high-inertia, and centralized resources. As the power and energy industry continues to provide electricity for modern customers, several trends for grid infrastructure change have emerged. Renewable generation sources have become more common as the cost of establishing them has fallen due to technology breakthroughs, policy changes, and financial incentives. The past two decades have brought rapid growth of renewable, variable, non-rotating mass, and decentralized generation in electrical grids. Not only does the aggregation of more renewable generation affect certainty and forecasting, but different renewable energy plant types also come with distinctions that require unique reliability considerations at their point of connection. One example is the ability to provide frequency support and fault ride-through capabilities through inverter controls, as standardized by the Institute of Electrical and Electronics Engineers (IEEE) standard 1547-2018 [1]. Simultaneously, the increased adoption of more advanced, microprocessor-based devices has increased the amount of data, the communication capability, and the responsiveness of control for the grid. While there are varying definitions for smart grid, it generally encompasses these trends of increased sensing, communication, and control capabilities [2]. The ongoing implementation of the smart grid concept is in part a response to the changes in generation, yet is also observable at the transmission and distribution substation, and consumer levels. The shift in generation profile is characterized here by changes in dispatchability, inertia, and centrality of placement, which are illustrated in Figure 1.
The smart grid, or digitalization, evolution is characterized by the installation of increasingly complex and networked devices for monitoring, communication, and control, exemplified by synchrophasor-based wide-area measurement systems and smart meter-based advanced metering infrastructure (AMI).

Changing Generation Profile
Both solar and wind energy in the United States have experienced rapid growth, increasing in nominal capacity from 25 GW in 2008 to 125 GW in 2018 [4]. This growth is in large part due to state and federal policies incentivizing renewable energy research and deployment to reduce carbon emissions mitigating climate change, as well as to increase energy independence for national security. Traditional grids' operating and planning needs and practices have been built based upon the certainty and dispatchability of hydro, coal, and nuclear generators. The reliable dispatchability of these sources has greatly influenced these practices. Dispatchers can account for necessary startup or shut down times of these generation sources, determining the ideal generation levels to meet demand in real time. Solar and wind are VIBRES. The intermittency of VIBRES and the lack of control over their startup and shut down times requires re-thinking of practices as the share of energy supplied by these sources increases. The increase in VIBRES has corresponded to an increased need for peaking power plants, which has recently been addressed with an increase in natural gas generation [4]. While these plants can dispatch quickly, high penetration of this resource creates a lack of fuel diversity, making the grid sensitive to fluctuations in natural gas price. Large scale energy storage is widely considered the panacea for renewable generation's intermittency. While installation of battery storage has increased dramatically in the past decade, utility scale storage is still relatively rare and is dominated by pumped hydro storage (PHS). PHS is highly location dependent, with potential sites exhausted in many parts of the world [5]. PHS is also subject to increasing environmental regulation to protect site-specific wildlife and landscapes, as well as potentially being subject to agricultural regulation. 
While addressing the intermittency of renewables, the presence of large scale storage represents its own transformation and set of challenges to the traditional generation profile.
In addition to dispatchability, traditional rotating mass generation sources provide inertia, a critical grid resource which acts as a stabilizing force during disturbances on the grid. This inertia acts as natural damping control to grid frequency. Solar and wind generators, as well as battery storage, connect to the electrical grid through inverters (with the exception of doubly-fed induction generator-based wind turbines). Inverter-based generation does not supply inertia to the electrical grid by means of a rotating mass. Increasing the penetration of these generation sources can therefore drive the dominant frequency far from the fundamental system frequency [6].
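The stabilizing role of inertia can be illustrated with the aggregated swing equation, in which the initial rate of change of frequency (RoCoF) after a power imbalance is inversely proportional to the system inertia constant. The following is a minimal sketch; the function name, the 10 GVA system size, the 500 MW contingency, and the two inertia constants are hypothetical values chosen for illustration and are not taken from the reviewed works.

```python
def initial_rocof(f_nominal_hz, power_imbalance_mw, inertia_constant_s, system_base_mva):
    """Initial rate of change of frequency (Hz/s) from the aggregated swing equation.

    df/dt = f0 * dP / (2 * H * S), where dP is negative for a loss of generation,
    H is the aggregate inertia constant in seconds, and S is the online capacity.
    """
    return f_nominal_hz * power_imbalance_mw / (2.0 * inertia_constant_s * system_base_mva)


# Hypothetical 60 Hz system with 10 GVA online and a sudden loss of 500 MW.
high_inertia = initial_rocof(60.0, -500.0, inertia_constant_s=5.0, system_base_mva=10_000.0)
# Same contingency after inverter-based sources displace rotating machines (lower H).
low_inertia = initial_rocof(60.0, -500.0, inertia_constant_s=2.0, system_base_mva=10_000.0)

print(f"H = 5 s: {high_inertia:.2f} Hz/s")
print(f"H = 2 s: {low_inertia:.2f} Hz/s")
```

Under these assumptions, the same contingency produces a steeper initial frequency decline when less rotating mass is online, which leaves less time for primary frequency response to act.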
Choice of placement is another characteristic of generation sources that informs traditional grids' structure, particularly in the case of nuclear and coal generation. These generation plants have their fuel resources transported to the generation site, allowing freedom in where they can be located. Therefore, these generators can be placed near population centers to reduce transmission losses between generation and consumption. The capacity of transmission networks and the design of distribution networks have been built based on the assumption of this centrally located generation. Solar and wind are typically decentralized, located where the resource is available, often placing large scale generation in remote locations distant from population centers, or in the case of rooftop solar, placing household scale generation at the very edge of the distribution network. This shift towards more decentralized generation calls into question some assumptions distribution and transmission networks have been built on [7], requiring more flexibility of system operation. This flexibility is facilitated by advancement in smart grid enabling technologies [8].
The shifts in dispatchability, inertia, and centrality of placement described above create numerous research and practical challenges and opportunities. Holttinen et al. [9] provide a detailed review of the impacts of this changing generation profile on planning, operation, and system stability as VIBRES contributions near 100%. The challenges and knowledge gaps are broken into three categories: (1) planning for adequacy, (2) operations: flexibility and balancing, and (3) stability, performance, and technology. Topics include controls, protection, black start, tools, markets, forecasting, and capacity expansion models. This is an active area of research in which significant progress has been made, but gaps remain to be addressed.

Digitalization
The shift to serving loads with decentralized and more distributed generation has driven the need to rely more on automation rather than central planning. In addition to the shift from hierarchical power plant delivery to consumers, a broad departure from one-way communication, limited monitoring, and passive control is taking place. The traditional supervisory control and data acquisition (SCADA)-based control and monitoring of power systems is reliant on sensors, monitored by intelligent electronic devices (IEDs). To support the changing generation profile, Wide Area Monitoring Systems (WAMS) are being deployed, wherein more advanced PMUs are aggregated by phasor data concentrators (PDCs) and Super PDCs. These changes enable two-way communication, more data acquisition for monitoring, more pervasive control, and decentralized decision making. This supports the ultimate goal of intelligent automation at the substation level. Meanwhile, installation of smart meters at customer sites provides localized usage information and makes possible key demand-side management applications. The increasing capability and availability of digital devices at every level of the existing grid illustrates a trend towards a next generation grid with pervasive communication, distributed generation, bidirectional power flow, detailed monitoring, and automated control [10].
IEC 61850 defines the communication protocols for IEDs and substation automation [11]. Conventional grid monitoring and control is widely implemented using SCADA systems [2]. At substations, sensors monitor circuit breaker (CB) status, as well as current and voltage measurements from instrument transformers. These measurements are collected at remote terminal units (RTUs) and aggregated at a centralized SCADA master station. IEDs at the substation are capable of performing control functions for protective relaying or other devices. At the SCADA master station, field device data is collected and stored, and a workstation with a human machine interface (HMI) may allow operator interaction with the system. Use of PMUs for measurement and protection in power systems is described in IEEE standard C37.118. While IEDs are microprocessor-based and can improve monitoring, control, and data recording, the sampling rate of SCADA systems is one sample every 2-4 s, and they may only provide RMS values. PMUs can sample roughly two orders of magnitude faster, up to 60 samples/s. PMUs also sample with more accuracy, employing a GPS synchronization protocol and collecting more detailed current and voltage phasor information. While cost is still a barrier to widespread deployment of PMUs in distribution systems, they are more widely deployed in the transmission system, providing more awareness of frequency and voltage stability, as well as fault detection and location [12]. The aggregation of PMU data by PDCs to create WAMS is expected to provide visibility and control over wider geographic areas of the power system compared to a SCADA-only system [13]. This data can be leveraged to meet operational objectives using machine learning techniques. Increasing penetration of digital devices supports the goal of substation autonomy for the transmission and distribution system [10]. The growth of PMUs in the United States is seen in Figure 2.
At the edge of distribution systems, this trend of increasingly sophisticated and networked devices is evidenced by the rapid deployment of smart meters and associated AMI. Smart meters, that is, networked solid state electronic meters, collect site-specific customer data and also potentially enable two-way communication between utilities and individual customers. Smart meters are the core component of broader AMI, which also includes the communication networks supporting the transmission of data, and means for data management and aggregation [16]. AMI has been deployed rapidly in areas across the US over the last decade [17]. This fast expansion is attributed to incentives provided in the American Recovery and Reinvestment Act (ARRA) of 2009 [7]. Prospective applications of AMI include dynamic pricing, higher frequency consumption data, support for distributed energy resources (DERs), outage tracking, demand response, and power quality monitoring [16]. In particular, AMI data is potentially provided as often as every 15 min, facilitating load profiling, monitoring, and forecasting. These applications can address reliability concerns created by changes to the generation profile [17]. AMI is also identified as a key enabling technology for demand side management (DSM) solutions, such as demand response, which is the changing of customer behavior in response to electricity price or reliability concerns [8]. This can be achieved as more residential customers adopt appliances which communicate with utilities via the Internet of Things (IoT).
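The gap between SCADA and PMU sampling can be made concrete with the figures quoted above (one SCADA sample every 2-4 s versus up to 60 PMU samples/s). The short calculation below is only an arithmetic illustration; the function and constant names are hypothetical.

```python
import math

PMU_RATE = 60.0  # samples per second, the upper figure quoted in the text


def speedup(scada_period_s, pmu_rate=PMU_RATE):
    """Ratio of the PMU sampling rate to a SCADA rate of one sample per period."""
    scada_rate = 1.0 / scada_period_s  # samples per second
    return pmu_rate / scada_rate


for period in (2.0, 4.0):
    ratio = speedup(period)
    print(f"SCADA every {period:.0f} s: PMU samples {ratio:.0f}x faster "
          f"(about 10^{math.log10(ratio):.1f})")
```

With these figures, the ratio falls between 120 and 240, that is, a little over two orders of magnitude.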
While climate and national security goals have been pursued with consideration to reliability, the question now is, can the integration of VIBRES and energy storage also improve electrical grid resilience in response to increasing natural disasters and human threats? Existing research has begun to address stable and reliable integration of VIBRES. Studies have investigated updating unit commitment and models to address the risks associated with VIBRES which are not captured in existing market behavior models for non-VIBRES markets [18]. Numerous methods for modeling systems with high penetration of VIBRES have been explored, and they are varied in their level of operational detail [19].
Increasing integration of VIBRES has challenged assumptions of dispatchability, inertia, and centralized generation in the power system. The imperative to install energy storage to support VIBRES further challenges existing practices. Rapid digitalization driven by this change in generation profile is exemplified by more advanced and networked AMI and PMUs being deployed to create WAMS. The reliability effects of these changes in generation profile and digitalization are well explored in literature surveys; however, knowledge of the resilience impacts has not caught up with the pace of change. Limited studies investigate how these changes impact resilience. Section 3 dives into the existing works of power system resilience to provide background and context for Section 4 discussing the existing research at this intersection of resilience and grid evolution, as well as areas requiring further study.

Defining Power System Resilience
Due to the increased impacts of extreme weather events and human-made threats, as well as the dependence of other critical infrastructure sectors on reliable power, the resilience of the electric grid has become an important topic of discussion. Defining resilience has sparked a complex ongoing debate as it involves many different facets: the relationship between resilience and reliability [20], event-specific versus agnostic, and qualitative versus quantitative indices [21]. The International Council on Large Electric Systems (CIGRE) defines resilience as "the ability to limit the extent, severity, and duration of system degradation following an extreme event" [22]. The IEEE Task Force on Definition and Quantification of Resilience defines resilience as "the ability to withstand and reduce the magnitude and/or duration of disruptive events, which includes the capability to anticipate, absorb, adapt to, and/or rapidly recover from such an event" [23]. The U.S. President's National Infrastructure Advisory Council (NIAC) defines resilience as "the ability to reduce the magnitude and/or duration of disruptive events; the effectiveness of a resilient infrastructure or enterprise depends upon its ability to anticipate, absorb, adapt to, and/or rapidly recover from a potentially disruptive event" [24], which also served as the foundation for the Federal Energy Regulatory Commission (FERC) definition of resilience [25]. Various other definitions exist, e.g., References [26-31]; however, all of them incorporate three main characteristics: (1) preventing and (2) mitigating potential harm from an adverse event, in addition to (3) quickly recovering from any inflicted damage. These characteristics refer to the periods of time before, during, and after an event.
Highlighting the differences between reliability and resilience provides context for understanding why there is a growing demand to make modern grids more resilient. Common differentiators include: probability and impact of disturbance, static versus adaptive and ongoing, evaluating power system states versus also including transition times between states, and concern with customer interruption time versus also including infrastructure recovery time [32].
The nature of the disturbing event is an especially important differentiator [33]. Reliability is evaluated by assuming and applying low impact high frequency (LIHF) events. In these cases, deterministic methods (e.g., N-1 contingency evaluations) can provide reasonably accurate results to quantify the preparedness of the system. Resilience, in contrast, is centered around high impact low frequency (HILF) events, and in these cases, probabilistic methods are required because the probability of the event must be taken into account. A rationale for this differentiation is that rare events do not have enough data available for accurate assessment by reliability metrics, such as mean time between failure and mean time to recovery. Nevertheless, no clear criterion exists that separates a rare event from a frequent one, or even reliability from resilience; the line between reliability and resilience becomes blurred as the frequency of events previously considered rare, like a 100-year flood, increases.
HILF events, including major natural disasters (e.g., earthquakes, tsunamis, hurricanes, pandemics, geomagnetic disturbances) and acts of human volition (e.g., coordinated cyber, physical, and blended attacks, high-altitude detonation of nuclear weapons), transcend other risks to the electrical sector due to their magnitude of impact and the relatively limited operational experience in addressing them [34]. Historically they have rarely occurred, but they have the potential to cause catastrophic consequences: unpredictable system-wide disruptions and severe long-term damage in generation, transmission, and distribution networks, endangering the continuous and reliable operation of the entire electrical grid [35,36]. Climate change has made extreme weather events more common, leading to longer durations of power outages in the United States between 2002 and 2012 [37]. Research has revealed previously unknown environmental threats, such as major seismic events (e.g., Cascadia Subduction Zone event [38]) and the dangers of solar weather [39], which result in complex HILF disasters affecting millions. Additionally, in spite of infrastructure hardening and protection efforts, human threats from cyber attacks are also emerging and have exposed a source of vulnerability in the grid [14,40,41]; for example, the 2015 cyber attack on Ukraine's electrical grid caused power outages for approximately 225,000 customers [42]. With the increasing frequency and consequences of severe events that are hard to characterize using existing reliability metric methodologies, the need for greater resilience in electrical grids comes to the forefront. Public outcry resulting from extended outages, such as after the 2019 California wildfires [44], confirms the increasing value people place on resiliency.
At the same time, the Pacific Gas & Electric company has acknowledged liability for grid failures involving inadequate tree trimming that caused wildfires in 2017 and 2018 and resulted in the loss of life, highlighting not only the value but also the importance of resilience and preparedness [45].
All damaging events fit on a sliding scale of severity and frequency, and accounting for the variability and uncertainty in both severity and frequency is the driver of the need to create new planning and operational methods to account for the changing landscape of these events. With these nuances explained, and for simplicity moving forward, resilience type events are referred to as HILF events in the rest of this paper.

Modern Resilience Valuation Methods
Bhusal et al. [43] suggest that resilience valuation methods can be classified into two categories: operation-based and planning-based. Operation-based methods are control and optimization strategies that use available assets to implement protection schemes against failures and keep the system operational during and after a disturbance. Planning-based methods, on the other hand, are improvement strategies that focus on targeted electrical grid expansion and hardening to withstand predicted disturbances. Regardless of category, all methods must first determine the location and severity of the damage to understand the scale and consequences of the HILF event. Modeling the damage mechanism and modeling the response of the power system should be decoupled [46]. Figure 4 presents the disturbance and impact resilience (DIRE) curve for non-resilient and resilient power systems. It describes the performance of the systems before, during, and after an extreme event and also displays the "5 Rs" of resilience as defined by [47]: recon, resist, respond, recover, and restore. Planning-based methods focus on two of the "5 Rs": recon and restore. The initial planning for hazards is carried out in the recon phase: the system is strategically positioned so as to be able to absorb the majority of the damage caused by any resilience-level event. Planning-based evaluation is also crucial in the restore phase, which is highly dependent on connected infrastructure, system spares, and the lead time on replacing critical assets. Operation-based methods, on the other hand, focus on the other three of the "5 Rs": resist, respond, and recover. Strategically optimizing operation can significantly increase the system's capability to resist and lead to reduced degradation in performance (as seen in Figure 4). Optimized operation also leads to quicker response to cascading failures and rapid recovery of performance after such a disturbance has occurred.
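One common way to compare two performance curves of the kind described above is to integrate the gap between nominal and actual performance over the event. The sketch below uses that area-based measure as an assumption; it is not a metric defined by [43] or [47], and the time points and performance values are hypothetical rather than read from Figure 4.

```python
def performance_loss(times_h, performance, nominal=1.0):
    """Trapezoidal area between nominal and actual performance.

    times_h: sample times in hours (increasing); performance: per-unit values.
    Returns the loss in per-unit-hours; a smaller value indicates a more resilient response.
    """
    if len(times_h) != len(performance):
        raise ValueError("times_h and performance must have the same length")
    area = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        gap_prev = nominal - performance[i - 1]
        gap_curr = nominal - performance[i]
        area += 0.5 * (gap_prev + gap_curr) * dt
    return area


# Hypothetical per-unit performance sampled before, during, and after an event.
times = [0.0, 2.0, 4.0, 10.0, 24.0]
resilient = [1.0, 0.8, 0.8, 1.0, 1.0]      # shallow dip, fast recovery
non_resilient = [1.0, 0.4, 0.4, 0.7, 1.0]  # deep dip, slow recovery

print(f"resilient loss:     {performance_loss(times, resilient):.1f} p.u.-h")
print(f"non-resilient loss: {performance_loss(times, non_resilient):.1f} p.u.-h")
```

In this simplified view, operation-based measures mainly reduce the depth and duration of the dip, while planning-based measures shape the starting position and the tail of the recovery.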
Multiple system assets can provide a variety of qualities that may contribute towards overall system resilience, e.g., Reference [48] discusses hydropower assets and their contribution towards resilience at an asset and system level.

Operation-Based Methods
Damaged assets can be modeled by physics-based methods or estimated by statistical methods. Examples of physics-based methods are implemented in the General Fragility Model [49], which uses calculations (e.g., bending moment of wood poles during sustained gusts) to predict the likelihood of collapse. In the study carried out by Chalishazar et al. [46] the likelihood of each substation asset-failures (e.g., circuit breakers, transformers) is determined by using resonance frequency calculations based on peak ground acceleration values. Examples of statistical modeling are described in Reference [50,51], where predictor variables (e.g., soil moisture, population density, maximum sustained gust speeds) are used to predict the number of outages or collapsed poles within cells of a spatial grid. These damage prediction methods serve as input to contingency analysis tools that allow for projection of the impact of disturbances on the operation of the grid and its ability to supply load.
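A damage model of this kind is often summarized as a fragility curve that maps hazard intensity to a probability of asset failure. The sketch below assumes a lognormal fragility curve, a widely used functional form, and is not the specific model of [46] or [49]; the median capacity and dispersion values are hypothetical.

```python
import math


def lognormal_fragility(intensity, median, beta):
    """Probability of failure given a hazard intensity (e.g., peak ground acceleration in g).

    median: intensity at which the failure probability is 50%.
    beta: lognormal standard deviation (dispersion) of the asset capacity.
    """
    if intensity <= 0.0:
        return 0.0
    z = math.log(intensity / median) / beta
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical transformer with a median capacity of 0.4 g and dispersion of 0.5.
for pga in (0.1, 0.2, 0.4, 0.8):
    p_fail = lognormal_fragility(pga, median=0.4, beta=0.5)
    print(f"PGA = {pga:.1f} g -> P(failure) = {p_fail:.3f}")
```

Sampling each asset against its failure probability yields one damage scenario, which can then be passed to a contingency analysis tool as described above.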
Given a set of inoperable assets, methods exist, such as the demand not served (DNS) problem formulation [52,53] that uses Monte Carlo simulations to carve out the distribution for not served demand and the minimum-load-shed (MLS) problem formulation [54] that can predict the loss of power in an electrical grid. Both DNS and MLS are steady-state formulations that describe the state of a transmission system after the event has passed but before repair has been performed. The MLS formulation has some advantageous mathematical properties: it can be relaxed into a convex problem with guaranteed optimality, meaning that it provides a lower bound on the total amount of load lost. However, it does not provide insight into the peak amount of load lost.
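The Monte Carlo idea behind a demand-not-served estimate can be sketched on a deliberately simplified system. The example below assumes a single-bus (copper-plate) system in which each generator fails independently with a given probability, so network constraints are ignored; it is not the formulation of [52,53], and all capacities, probabilities, and names are hypothetical.

```python
import random


def simulate_dns(capacities_mw, failure_probs, demand_mw, n_samples=10_000, seed=0):
    """Sample the demand not served (MW) under independent generator failures."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = []
    for _ in range(n_samples):
        # A unit survives when its random draw is at or above its failure probability.
        available = sum(
            cap for cap, p_fail in zip(capacities_mw, failure_probs)
            if rng.random() >= p_fail
        )
        samples.append(max(0.0, demand_mw - available))
    return samples


# Hypothetical four-unit system facing a 700 MW demand after a severe event.
dns = simulate_dns([400.0, 300.0, 200.0, 100.0], [0.1, 0.2, 0.3, 0.5], demand_mw=700.0)
mean_dns = sum(dns) / len(dns)
prob_shortfall = sum(1 for x in dns if x > 0.0) / len(dns)
print(f"mean DNS: {mean_dns:.1f} MW, P(DNS > 0): {prob_shortfall:.2f}")
```

The list of samples approximates the full distribution of demand not served, so percentiles or tail measures can be computed from it in addition to the mean.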
Modeling the peak amount of load lost is challenging as that at minimum requires the modeling of protection equipment or relaying at some level-either by using a nodebreaker version of the system case-file or by augmenting the traditional bus-branch version of case-file to model protection equipment [55,56]-and potentially the dynamic behavior of both load and generation [57,58]. These challenges also arise in cascading failure modeling and are described in relevant literature [57,[59][60][61][62][63][64][65]. In the case of cascading failure modeling, the key difficulty is the large number of potential interactions between assets that propagate tripping of protective equipment. Modeling methods range from dc power flow-based methods with simplified protective relaying modeling to phasor-based dynamic models or full transient models [64,66].
Both MLS and cascading failure modeling can give insight into what electrical properties of a grid improve resilience. The work in Reference [67,68] investigates which properties of transmission networks impact the mean and variance of load shed given a collection of damaged assets, where properties include network properties (e.g., graph associativity, clustering coefficient, edge density) or electrical properties (e.g., ratio of generation to load).
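Two of the network properties named above, edge density and clustering coefficient, can be computed directly from a list of branches. The sketch below uses the standard graph-theoretic definitions on a hypothetical four-bus network; it does not reproduce the analysis of the cited works, and buses without any branch are simply not represented.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (bus, bus) branches."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_density(adj):
    """Fraction of possible bus pairs that are directly connected."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)


# Hypothetical network: a triangle of buses 1-2-3 with a radial feeder to bus 4.
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
print(f"edge density: {edge_density(adj):.3f}")
print(f"average clustering: {average_clustering(adj):.3f}")
```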
Restoration time can be estimated by both optimization-based and statistical approaches. Two relevant optimization-based problem formulations are the Minimum Restoration Set Problem (MRSP) and the Restoration Ordering Problem (ROP), described in Reference [69]. MRSP is a steady-state formulation (similar to MLS) that determines the minimum number of assets which need to be repaired to be able to restore all loads in the system; this number places a lower bound on the total restoration time. However, MRSP does not give indication of the rate of restoration during this time period, nor does it account for loads with higher priorities. ROP overcomes these shortcomings: it divides the problem into timesteps and places a repair budget on each time period. To shorten the total restoration time, Reference [70,71] proposes a generation prioritization method that leverages available renewable generation.
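The effect of a per-period repair budget on the restoration curve can be illustrated with a simple greedy ordering. This is a stand-in heuristic, not the optimization formulation of [69]: it assumes that repairing each asset restores a fixed, independent amount of load, ignores power flow, and ranks assets by priority-weighted load. All asset names and values are hypothetical.

```python
def greedy_restoration_order(damaged, budget_per_step):
    """Schedule repairs under a per-timestep budget, highest weighted load first.

    damaged: dict mapping asset name -> (load_restored_mw, priority_weight).
    budget_per_step: number of assets that crews can repair in one time period.
    Returns (schedule, curve): the assets repaired in each period and the
    cumulative load restored (MW) at the end of each period.
    """
    if budget_per_step < 1:
        raise ValueError("budget_per_step must be at least 1")
    remaining = sorted(
        damaged, key=lambda a: damaged[a][0] * damaged[a][1], reverse=True
    )
    schedule, curve, restored = [], [], 0.0
    while remaining:
        batch, remaining = remaining[:budget_per_step], remaining[budget_per_step:]
        restored += sum(damaged[a][0] for a in batch)
        schedule.append(batch)
        curve.append(restored)
    return schedule, curve


# Hypothetical damaged assets; line_b feeds a critical load, hence the higher weight.
damaged_assets = {
    "line_a": (50.0, 1.0),
    "line_b": (20.0, 3.0),
    "xfmr_c": (40.0, 1.0),
    "line_d": (10.0, 1.0),
}
schedule, curve = greedy_restoration_order(damaged_assets, budget_per_step=2)
print(schedule)
print(curve)
```

Unlike a single minimum-repair count, the returned curve shows how quickly load comes back period by period and how priority weights shift the order of repairs.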
Coordinated physical-and blended-attacks are special cases of HILF events: these targeted disturbances are executed by a well-informed attacker who disables and damages key assets. The concept of identifying sets of assets in which failure lead to significant impacts, however, is also applicable to increase system-wide resilience. In the literature, this is referred to as the N − k failure-identification problem [72]. The N − k problem focuses on identifying a set of k critical assets of the transmission network in which simultaneous or near-simultaneous failure would maximize the disruption, measured in terms of the amount of load shedding in the system. Ref. [73] considers a probabilistic generalization of the problem: the probability of failure of each asset is known a priori and the problem is formulated as a two-level optimization problem. The outer objective function is to maximize the amount of load lost by selecting a set of at most k assets to disable. The inner objective function is to minimize the amount of load lost by adjusting generation setpoints appropriately. Heuristic methods exist, such as the random chemistry method [74] or linearized approximations of the change in line flow from disabling assets [75], to find the optimal solution.
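The two-level structure of the N − k problem can be illustrated by brute-force enumeration on a toy network. The sketch below makes strong simplifying assumptions that are not part of [72] or [73]: only lines can be disabled, line flow limits are ignored, and within each electrical island generation can be redispatched freely, so the inner minimum load shed is the island's demand in excess of its generation. The deterministic case is shown rather than the probabilistic generalization, and all bus names and values are hypothetical.

```python
from itertools import combinations


def islands(buses, lines):
    """Group buses into connected islands using a simple union-find."""
    parent = {b: b for b in buses}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path compression
            b = parent[b]
        return b

    for u, v in lines:
        parent[find(u)] = find(v)
    groups = {}
    for b in buses:
        groups.setdefault(find(b), []).append(b)
    return list(groups.values())


def load_shed(gen, load, lines):
    """Inner problem: minimum load shed when each island redispatches freely."""
    shed = 0.0
    for island in islands(list(gen), lines):
        supply = sum(gen[b] for b in island)
        demand = sum(load[b] for b in island)
        shed += max(0.0, demand - supply)
    return shed


def worst_case_attack(gen, load, lines, k):
    """Outer problem: enumerate every set of at most k lines and keep the worst one."""
    best_set, best_shed = (), load_shed(gen, load, lines)
    for size in range(1, k + 1):
        for removed in combinations(lines, size):
            kept = [line for line in lines if line not in removed]
            shed = load_shed(gen, load, kept)
            if shed > best_shed:
                best_set, best_shed = removed, shed
    return best_set, best_shed


# Hypothetical four-bus system: capacities and loads in MW.
gen = {"A": 100.0, "B": 0.0, "C": 0.0, "D": 30.0}
load = {"A": 0.0, "B": 40.0, "C": 50.0, "D": 20.0}
lines = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]

attack, shed = worst_case_attack(gen, load, lines, k=2)
print(f"worst-case lines: {attack}, load shed: {shed:.0f} MW")
```

The number of candidate sets grows combinatorially with k, which is why enumeration of this kind is only practical for very small systems.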
Geomagnetic disturbances (GMDs) and E3 high-altitude electromagnetic pulse (HEMP) events are other special cases of HILF events. These hazards do not necessarily disable system assets, but have the potential to significantly impact the operation of the electrical grid. GMDs are caused by severe space weather: charged and magnetized particles are blown away from the Sun and then interact with and disrupt the Earth's magnetic field. These events are mainly driven by large solar flares and associated coronal mass ejections during solar maximums and by co-rotating interaction regions (high-speed solar winds) during solar minimums [36,76]. HEMP events are caused by nuclear explosions detonated high in the atmosphere. A detonation generates a series of electromagnetic waveforms that propagate to the Earth's surface; of the three main waveforms, the E3 late-time waveform produces electric fields with time scales and area coverage comparable to those of a geomagnetic storm. However, these events are likely to have a higher peak field level, resulting in greater impact and more severe damage than naturally occurring solar flares [36,77].
Both GMD and E3 HEMP events pose a risk by generating low-frequency geomagnetically induced currents (GIC) that appear in the conductive infrastructure and flow into the high-voltage network through the neutrals of transformers [77,78]. GICs may adversely impact transmission systems and equipment as they have the potential to induce harmonics by causing half-cycle saturation in transformers. Harmonics may lead to the misoperation of protection devices causing tripping of over-current relays. Premature aging, lasting damage or complete failure of large high-voltage transformers due to overheating and thermal degradation is also a great threat. The increased reactive power consumption caused by the circulating GICs in the network may lead to the loss of reactive power support and to voltage collapses. In the worst case, widespread infrastructure damage and tripping of transmission lines may lead to cascading failures and extended power disruptions [79][80][81][82].
Simulation studies indicate that the majority of transformers do not experience high neutral currents and are not at risk for half-cycle saturation; the transformers at risk tend to be those towards the edge of a system, such as an organizational boundary, or those near natural geographic boundaries [83]. Currently, the main approach to mitigate the effects of GICs is through optimization-based methods that focus on system topology control and aim to improve system resilience by corrective (optimal) transmission line switching [84][85][86][87][88][89]. Another approach would be to limit transformer heating caused by GICs; however, research in this area is limited [90][91][92].

Planning-Based Methods
On the one hand, optimization-based problem formulations, similar to those formed for evaluating resilience, can be formed to identify resilience-improving strategies. This area of study is valuable to cost-effectively increase resilience with novel technologies (e.g., networked microgrids, intelligent fault detection and isolation devices), proven techniques (e.g., vegetation management, converting overhead circuits to underground), and the combinations of the two. On the other hand, forward-looking planning (e.g., optimal topology design and construction) and targeted development of infrastructure (e.g., transmission and distribution network hardening) directly increase resilience against major disturbances.
Considering that the majority of outages occur at the distribution level, there is ample opportunity for improving the resilience of distribution networks. As an example, optimal deployment of intelligent fault detection and isolation devices enables the grid to promptly respond to and recover from contingencies in a self-healing manner [93]. Distribution network planning introduces a number of difficulties that are not present at the transmission level, such as the need to model unbalanced line configurations (which increases the number of variables in the problem by an order of magnitude) [94][95][96] and the requirement for radial operation [97,98]. Existing methods for network hardening rely on linearized approximations of the unbalanced power flow formulations [96]. Although Reference [95] demonstrated that convex relaxations for the unbalanced optimal power flow problem exist, applying these relaxations to practical systems with a variety of transformer winding and load configurations is still an active area of research [99].
To harden distribution networks against severe weather, a scenario-based approach has shown promise [100]. This approach constructs a set of damage-scenarios in which each scenario consists of a set of disabled components. It starts hardening the system by selecting from a portfolio of hardening options, then tests for feasibility on the remaining scenarios. At each iteration, a new scenario is added to the design problem until the hardened system is feasible across all scenarios. This approach demonstrates the value in hardening critical sections of the distribution network, typically three-phase trunks, particularly those that interconnect circuits supplied by different substations.
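The iterative structure of this scenario-based approach can be sketched as follows. This is a minimal illustration, not the method of Reference [100]: it assumes a toy feasibility test (surviving capacity must cover a critical demand), treats hardened components as immune to damage, and solves the design problem by exhaustive search. Component names, capacities, and costs are hypothetical.

```python
from itertools import combinations


def feasible(hardened, scenario, caps, critical_demand):
    """A scenario is feasible if surviving capacity covers critical demand."""
    failed = scenario - hardened  # hardened components survive (assumption)
    surviving = sum(c for name, c in caps.items() if name not in failed)
    return surviving >= critical_demand


def cheapest_design(active, caps, cost, critical_demand):
    """Design problem: least-cost hardening plan feasible for active scenarios."""
    names = sorted(caps)
    best = None
    for r in range(len(names) + 1):
        for plan in combinations(names, r):
            chosen = set(plan)
            if all(feasible(chosen, s, caps, critical_demand) for s in active):
                plan_cost = sum(cost[n] for n in plan)
                if best is None or plan_cost < best[1]:
                    best = (chosen, plan_cost)
    if best is None:
        raise ValueError("no hardening plan satisfies the active scenarios")
    return best


def scenario_based_hardening(scenarios, caps, cost, critical_demand):
    """Add one violated scenario per iteration until all scenarios pass."""
    active, hardened, total = [], set(), 0
    while True:
        violated = [s for s in scenarios
                    if not feasible(hardened, s, caps, critical_demand)]
        if not violated:
            return hardened, total, len(active)
        active.append(violated[0])
        hardened, total = cheapest_design(active, caps, cost, critical_demand)


caps = {"A": 50, "B": 30, "C": 30}          # hypothetical capacities
cost = {"A": 3, "B": 1, "C": 1}             # hypothetical hardening costs
scenarios = [{"A"}, {"B", "C"}, {"A", "C"}]  # sets of disabled components
plan, plan_cost, n_active = scenario_based_hardening(scenarios, caps, cost, 60)
print(plan, plan_cost, n_active)
```

In this toy case, the first design (hardening B) satisfies only the first active scenario; adding a second violated scenario shifts the plan to a component that covers both, which mirrors how the scenario set steers investment toward shared critical sections.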
In the case of coordinated physical and blended attacks, a notable resilience-improving method is the defender-attacker-defender formulation [101,102]. It is formed as a tri-level problem, similar to the N − k problem described in the previous section, except that another problem level is added in which the defender selects a set of components to harden in order to minimize load lost.
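A brute-force sketch of the three levels is given below under strong simplifying assumptions: a single load bus fed by parallel lines, hardened lines that cannot be disabled, and exhaustive enumeration at the defender and attacker levels. It is intended only to show how the levels nest, not to reproduce the formulations in References [101,102]; all names and values are hypothetical.

```python
from itertools import combinations


def shed(demand, caps, disabled):
    """Level 3 (operator): shed only what surviving capacity cannot serve."""
    surviving = sum(c for name, c in caps.items() if name not in disabled)
    return max(0.0, demand - surviving)


def worst_attack(demand, caps, hardened, k):
    """Level 2 (attacker): disable up to k unhardened lines to maximize shed."""
    targets = [n for n in sorted(caps) if n not in hardened]
    worst = 0.0
    for r in range(min(k, len(targets)) + 1):
        for attack in combinations(targets, r):
            worst = max(worst, shed(demand, caps, set(attack)))
    return worst


def best_defense(demand, caps, m, k):
    """Level 1 (defender): harden m lines to minimize the worst-case shed."""
    best_plan, best_value = (), worst_attack(demand, caps, set(), k)
    for plan in combinations(sorted(caps), m):
        value = worst_attack(demand, caps, set(plan), k)
        if value < best_value:
            best_plan, best_value = plan, value
    return best_plan, best_value


lines = {"L1": 60.0, "L2": 50.0, "L3": 40.0}  # hypothetical capacities (MW)
unprotected = worst_attack(100.0, lines, set(), k=2)
defense = best_defense(100.0, lines, m=1, k=2)
print(unprotected, defense)
```

Here hardening the largest line lowers the worst-case loss, whereas hardening the smallest line would leave it unchanged, which illustrates why the defender's choice must anticipate the attacker's response.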
In the case of GMDs and E3 HEMP events, a widely used mitigation method is reducing transformer reactive power consumption with GIC blocking devices [103][104][105]. These devices are switched capacitors that are placed between the transformer neutral point and the substation grounding grid. As the installation of such devices is expensive, it is not feasible to place them throughout the entire system; however, their selective and strategic placement can greatly improve resilience.
Existing placement-optimization algorithms [106][107][108] exploit the property that only a small number of transformers are susceptible to GICs; this depends on transformer core type, location within the transmission network, and the geometry of the network. Studies of the placement of these devices observe some key characteristics of their effect. First, the impact of blocking devices is largely local, that is, the change in GIC from adding a blocker is large for assets that are geographically near the blocker but diminishes rapidly as distance increases. Second, installing a blocker at one transformer can increase the GIC in nearby transformers, a behavior that is conceptually similar to Braess' phenomenon [109]. Consequently, blockers must be installed for all active transformers in a substation, and possibly in nearby substations as well [108].
Planning- and operations-based resilience valuation methods are used to prioritize power system resilience in the grid's conceptualization and design. Operations-based methods employ physics-based methods (e.g., General Fragility Model) and statistical methods to estimate certain parameters (e.g., minimum-load-shed and demand not served), simulate consequences, and suggest optimal mitigation strategies for HILF events. Planning-based methods employ scenario-based approaches (e.g., constructing a set of damage scenarios or forming a tri-level defender-attacker-defender formulation) to select system components for hardening to optimally improve resilience. The prioritization of resilience, in the planning or operations arena, requires the development of metrics for measuring improvements to the state of resilience.

Modern Resilience Valuation Metrics
Resilience valuation metrics are relatively new to power systems engineering compared to other scientific fields [21]. Kwasinski [110] provides an extensive review of resilience metrics, proposes a resilience metric framework, and notes a number of metrics for power systems from a utility-centric point of view. The proposed framework bases its metrics on the U.S. Presidential Policy Directive 21 [111], which defines resilience in terms of four major components: withstanding capability, recovery speed, preparation/planning capacity, and adaptation capability. These metrics quantify the variables necessary to describe the four components; however, a metric to describe the overall resilience of the system is missing.
Further review of metrics is performed in Reference [112], which makes an important differentiation between reliability and resilience. This work notes that reliability metrics are inadequate to quantify resilience due to their inability to address topological flexibility and identify critical infrastructure, cooperation with customers, and potential preventative measure evaluation. The valuable contributions of Reference [113] provide insight into and a classification of resilience in power systems and highlight the need for cost-benefit studies of proposed resiliency improvements (e.g., those carried out in Reference [46]). Bhusal et al. [43] identify key attributes of power system resilience and provide a survey of resilience metrics (also known as performance-based metrics) based on the "5 Rs" of the DIRE curve seen in Figure 4. Performance-based metrics, based on a chosen performance indicator, estimate (1) how the system would perform during a HILF event, (2) how severe the impacts of the event would be, and (3) how quickly the system would recover and be restored to the pre-event state. Examples of such metrics are time and cost of recovery [46], number of customers offline [114], and DNS [52].
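To make the three questions above concrete, the following sketch computes simple performance-based quantities from a sampled performance curve, taking served load as the performance indicator. The function name, the trapezoidal integration, and the sample values are assumptions made for illustration; they are not a metric definition from the cited works.

```python
def performance_metrics(times_h, served_mw, nominal_mw):
    """Summarize a sampled performance curve around a disruptive event.

    times_h   : sample times in hours (increasing)
    served_mw : served load at each sample
    nominal_mw: pre-event served load used as the reference level
    """
    deficit = [max(0.0, nominal_mw - p) for p in served_mw]
    # Demand not served: trapezoidal area between nominal and served load.
    dns = sum(
        0.5 * (deficit[i] + deficit[i + 1]) * (times_h[i + 1] - times_h[i])
        for i in range(len(times_h) - 1)
    )
    degraded = [i for i, d in enumerate(deficit) if d > 0]
    if not degraded:
        return {"max_drop_mw": 0.0, "demand_not_served_mwh": 0.0,
                "hours_to_recover": 0.0}
    first = degraded[0]
    # First sample at or after the onset where the deficit is back to zero.
    recovered = next(
        (times_h[i] for i in range(first, len(times_h)) if deficit[i] == 0),
        None,
    )
    hours = None if recovered is None else recovered - times_h[first]
    return {"max_drop_mw": max(deficit), "demand_not_served_mwh": dns,
            "hours_to_recover": hours}


# Hypothetical event: load drops to 40 MW, then recovers in stages.
metrics = performance_metrics(
    times_h=[0.0, 1.0, 2.0, 4.0, 6.0],
    served_mw=[100.0, 40.0, 40.0, 70.0, 100.0],
    nominal_mw=100.0,
)
print(metrics)
```

The same function can be applied to a recorded event (retrospective use) or to a simulated curve for a hypothesized disruption (forward-looking use).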

SIMPLE VS. MORE COMPLEX
Simpler metrics require less data that are usually easy to obtain. The process for integrating the data into metrics is fairly straightforward (e.g., simple arithmetic).
More complex metrics may require larger amounts of data that may be challenging to obtain. The process for integrating the data may require technical expertise (e.g., numerical modeling).

RETROSPECTIVE VS. FORWARD-LOOKING
Retrospective metrics typically measure the resilience of the system to previous events. They may be used to determine if previous performance was (un)satisfactory.
Forward-looking metrics typically measure the resilience of the system to future or hypothesized disruptions. They are commonly used to inform planning and investment activities.

TARGETED VS. BROADLY INFORMATIVE
Targeted metrics may provide limited information on a single or limited number of analysis topics (e.g., single threat).
Broadly informative metrics may be able to provide information that is useful across a variety of topics (e.g., investment, planning, operational response).

LESS CONSISTENT VS. MORE CONSISTENT
Repeated application of metrics with little consistency can be a challenge. If the metric results tend to change from analyst to analyst or do not enable comparative analysis, stakeholders may lose confidence in the metrics.
Consistent metrics enable reproducibility and comparison. Consistency builds confidence and leads to widespread usage of the metrics.
Recommendations for Resilience Valuation Metrics:
• Address and capture impacts from only HILF events [43].
• Use performance-based metrics instead of attribute-based metrics [43].
• Include inherent uncertainties in both event characteristics (e.g., disruptive conditions, response time) and system characteristics (e.g., load and generation availability) [43,128].
• Consider or include connected critical infrastructure (e.g., natural gas and water networks) [127].
• Include both global and component-specific resilience [43,126].
• Allow for both consistent retrospective and prospective analysis [43].
• Capture spatiotemporal correlations and topological flexibility [43,112].
• Use realistic and non-flat lost load cost structures: the price of lost load during disturbances fluctuates and can compound for long duration events, so a flat price scheme is unrealistic [43].
Synergies between these trade-offs and recommendations may further refine the most impactful boundaries for future development of resilience metrics.
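The recommendation on non-flat lost load cost structures can be illustrated with a small comparison. The price levels and the daily escalation rule below are purely hypothetical placeholders chosen to show the effect; they are not values from Reference [43].

```python
def lost_load_cost(unserved_mw_by_hour, price_for_hour):
    """Sum hourly unserved energy (MWh) times the price in that hour."""
    return sum(mw * price_for_hour(h) for h, mw in enumerate(unserved_mw_by_hour))


def flat_price(hour):
    """Flat scheme: the same price (currency per MWh) in every hour."""
    return 5000.0


def escalating_price(hour):
    """Hypothetical non-flat scheme: price rises 50% per full day of outage."""
    return 5000.0 * (1.0 + 0.5 * (hour // 24))


outage = [10.0] * 48  # 10 MW unserved for 48 consecutive hours
flat_cost = lost_load_cost(outage, flat_price)
escalating_cost = lost_load_cost(outage, escalating_price)
print(flat_cost, escalating_cost)
```

Under these assumed numbers, the second day of the outage costs more than the first, so the flat scheme understates the cost of a long-duration event relative to the escalating one.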

Resilience in the Evolving Grid
The fundamental shifts of the evolving grid are characterized by changes in generation dispatchability, inertia, and placement, pervasive system digitalization, and an increased need for resilience. Traditionally, these shifts have been studied and incorporated into industry practices in relative isolation due to the tendency of research to become specialized and operate in silos, the different time frames in which each of these shifts came to prominence and the associated technologies matured, and the exponential increase in complexity encountered when working across the necessary timescales and interfaces. Yet, some existing research has addressed the intersections of these shifts, which are grouped in this paper into the themes of cyber-physical analysis of solar, wind, and DERs, microgrids, substation automation and self-healing, network evolution, and probabilistic planning and operation methods. This section will outline the research and implementations of these themes.

Cyber-Physical Analysis of Solar, Wind, and DERs
There is emerging recognition and study of the cyber-physical vulnerabilities to electrical grids associated with solar, wind, and DERs. This is due to the growing number of IEDs deployed to facilitate wind and solar DER penetration, which has expanded the cyber-attack surface, and to a lack of standards for security requirements.
As stated in Reference [129], the United States Department of Energy has been investing in cybersecurity for energy delivery systems (CEDS), such as solar energy, in areas including secure communications, intrusion detection and response, and resilient design. This funding has resulted in numerous developments in communications-power systems co-simulation and big-data analytical platforms at the national laboratories. Best practices for wind cybersecurity have been compiled in Reference [130], covering topics such as cyber hygiene and supply chain security. Both of these roadmaps, focusing on solar and wind cybersecurity, respectively, note the need for stakeholder engagement and standards development.
The communication vulnerabilities associated with DERs have been outlined in Reference [131], which identifies vulnerabilities at the physical, data link, network, and transport layers, along with potential attacks and existing solutions. These existing solutions include basic security controls, such as role-based access controls and intrusion detection, and advanced security controls, such as transport layer security and session renegotiation. Of particular interest, controls enabling ancillary services of DERs, such as volt-var, watt-var, volt-watt, and frequency droop, have been mapped to potential physical impacts due to cyberattacks [132]. This work highlights the vulnerabilities within specific IEEE communication protocols commonly used in DER communications and pinpoints opportunities for cyber hardening.
From a more physical perspective, one study has shown that battery systems with compromised communications could be made to explode or catch fire [133]. The ramifications shown in these works demonstrate the clear physical and cyber vulnerabilities and the relationship between the two interfaces. All of these works note the need for research, development, and standards in cyber-physical and cybersecurity analysis of solar, wind, and DERs in order to securely increase the penetration of these energy resources.

Microgrids
Isolated or islanded microgrids have operated on (geographical) islands and in the Arctic over the past century [134]. From Greece to Australia to Alaska, these microgrids have been transitioning their generation profiles and withstanding extreme weather and events for decades, and several have transitioned to or near 100% of their average annual generation coming from renewable sources (note this does include hydro generation) [135,136]. There are numerous lessons to be learned from these communities. Such lessons include successful implementation of secondary load control for wind generation smoothing through electric boilers, which provides essential heating services in several Alaskan communities (e.g., Unalakleet and Kotzebue, operated by the Alaska Village Electric Co-op) [137,138]. The installation of renewable energy, such as solar and wind, in the remote Alaskan microgrids and in the island microgrids of Greece has been driven by high electricity costs due to the high cost of oil and diesel fuel and of its transportation [139,140]. Additional benefits, such as increased grid resilience, have been gained from the installation of advanced wind, solar, and storage configurations in these microgrids [141]. The direct resilience benefit of solar generation on the island of Puerto Rico was demonstrated by continuity of electricity at a solar and battery powered dwelling after Hurricane Maria in 2017 [142].
There has been growing interest in creating interconnected microgrids that are able to island and retain operation during disturbances. Methods are being researched and developed to seamlessly island towns as their own microgrids to operate autonomously [143,144]. This has been demonstrated in the field in Borrego Springs, California, USA [145], after wind storms in 2013, and in Roppongi Hills in central Tokyo and on the Tohoku Fukushi University campus, Japan [146], after the 2011 Great East Japan Earthquake and tsunami. These implementations support a flexible grid and have raised interest around optimal and dynamic network configurations.
The proper protection of microgrids against all types of faults and disturbances remains a vitally important task. McDermott et al. [147] recently evaluated issues with protecting microgrids with inverter-based generation and highlighted a number of underlying difficulties: the lack of fault current from inverter-interfaced generation [148], the varying fault current between grid-connected and islanded modes [148], and the potential for normally-meshed operation [149] and unbalanced operation due to single-phase loads [149]. A handful of solutions have been proposed for these challenges over the years. Refs. [149,150] investigated admittance protection for load protection, while Reference [151] investigated admittance relaying as a solution for the protection of microgrids. Ref. [152] investigated differential protection, based on the discrete S-transform, for line protection. Ref. [153] investigated dynamic state estimation for the protection of radial portions of microgrids. A suitable tool for performing short-circuit studies in microgrids and accurately modeling the contribution from distributed generation, particularly from inverter-based generation, is currently unavailable; existing software solutions are unsuitable to handle the inherent computational costs and data requirements of microgrid simulation. Due to the current-limiting behavior of inverters, modeling inverter-based generation in a conventional phasor-based short-circuit analysis presents difficulties and requires the addition of outer iteration-loops to the short-circuit solver [154,155]; adequate solutions have yet to be developed.
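The need for an outer iteration loop can be seen in a single-bus phasor sketch. It assumes, as one hypothetical control behavior, that a current-limited inverter injects a fixed current magnitude at a fixed angle relative to its own terminal voltage; because the terminal voltage in turn depends on the injected current, the linear network solution has to be repeated until the two agree. This is a simplified illustration in per-unit quantities, not the procedure of References [154,155].

```python
import cmath
import math


def fault_with_limited_inverter(v_s, z_s, z_f, i_max, phi, tol=1e-9, max_iter=50):
    """Fixed-point outer loop around a linear single-bus fault calculation.

    v_s, z_s : grid Thevenin voltage and impedance (per unit)
    z_f      : fault impedance from the bus to ground (per unit)
    i_max    : inverter current limit (per unit), assumed active during the fault
    phi      : injected current angle relative to the terminal voltage (rad)
    """
    y_total = 1 / z_s + 1 / z_f
    v = (v_s / z_s) / y_total  # initial guess: no inverter contribution
    for iteration in range(1, max_iter + 1):
        # Inverter behaves as a current source referenced to the bus voltage angle.
        i_inv = cmath.rect(i_max, cmath.phase(v) + phi)
        # Re-solve the linear network (KCL at the bus) with that injection.
        v_new = (v_s / z_s + i_inv) / y_total
        if abs(v_new - v) < tol:
            return v_new, i_inv, v_new / z_f, iteration
        v = v_new
    raise RuntimeError("outer loop did not converge")


# Hypothetical per-unit data: reactive current injection (phi = -90 degrees).
v_bus, i_inv, i_fault, n_iter = fault_with_limited_inverter(
    v_s=1.0, z_s=0.1j, z_f=0.05j, i_max=1.2, phi=-math.pi / 2
)
print(abs(v_bus), abs(i_inv), abs(i_fault), n_iter)
```

In contrast to a synchronous machine, the inverter contribution here is capped at its limit regardless of how severe the fault is, which is why conventional voltage-behind-impedance source models overestimate the fault current from inverter-based generation.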

Network Evolution and Observability
The evolution of the network connectivity of the bulk power system is traditionally considered in long term power system planning studies. The process of commissioning (or decommissioning) power system elements, like a new transmission line project or new generation projects, normally undergoes several layers of permitting by the utilities, regulators, and other stakeholders, spanning several years. For relatively smaller projects, like a substation capacity upgrade, a utility, possibly aided by external engineering consulting companies, develops a plan that follows more closely immediate technical needs, as well as other forecasted elements, like local population changes and natural hazard risks. At this level of planning timescales, power system analysis can benefit from additional input from complex system studies that, for example, ascertain the inherent resilience of networks according to their topology [156], improve the observability of sensors connected throughout the network [157,158], or allow for the detection and correction of cyber attacks [159].
Given recent forensic evidence from very large cascading outage events in which distribution network disturbances propagated upward in voltage and disproportionately impacted the transmission system [160], it becomes increasingly important to improve the observability of equivalent models that aggregate the lower voltage network in order to perform cascading outage or other resilience studies. This can be done on the analysis side with network evolution strategies, for example, automated discovery of topologies during degraded states with limited connectivity [161]. On the physical and control side, this could be done with substation automation improvements and smarter connectivity devices, like networked reclosers and sectionalizers. Increased observability has enabled numerous developments in islanding detection techniques, which assist with the stability of electrical grids with high contributions of DERs [162].

Substation Automation and Self-Healing
Self-healing substations are typically characterized by their ability to automatically identify faults, isolate them, and restore power to unaffected areas. This significantly reduces the time it would normally take to restore power to healthy areas from hours to minutes [163]. These techniques often leverage fault detection devices and IEDs populated throughout the power grid to control switch or breaker operations [164].
In Reference [163], state estimation is used to inform a centralized genetic algorithm controller, which searches for the simplest reconfiguration scheme to isolate the fault and energize as many loads as possible. The restoration times from a simulation of a substation in Brazil using this self-healing control algorithm were compared to the actual restoration times for real power outages that occurred. On average, a reduction in restoration time of 14.6% was achieved. Drayer et al. successfully leveraged graph theory and Dijkstra's shortest path algorithm to find the optimal reconfiguration for faults introduced on several different power grid test cases [165]. This approach was relatively computationally inexpensive despite having the best solutions double-checked with a full power flow solver. The complications that DERs introduce to system protection coordination in self-healing power grids were addressed in Reference [166] using a multi-agent approach. This multi-agent approach allowed for peer-to-peer communication between devices to ensure that protection coordination settings between them are properly adjusted under a new configuration.
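A shortest-path search for an alternative supply route can be sketched as below. This is a generic Dijkstra implementation on a small hypothetical feeder graph, in which edge weights stand in for an assumed switching or impedance cost and a faulted section is excluded from the search; it does not reproduce the specific graph model or validation step of Reference [165].

```python
import heapq


def shortest_path(graph, source, target, faulted=frozenset()):
    """Dijkstra's algorithm that skips edges listed in `faulted`.

    graph  : dict mapping node -> {neighbor: weight}, stored symmetrically
    faulted: set of frozenset({u, v}) pairs that must not be used
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            if frozenset((u, v)) in faulted:
                continue  # isolated section cannot carry power
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Hypothetical feeder: substation S, buses A, B, C; C-B is a tie switch.
feeder = {
    "S": {"A": 1.0, "C": 1.0},
    "A": {"S": 1.0, "B": 1.0},
    "C": {"S": 1.0, "B": 3.0},
    "B": {"A": 1.0, "C": 3.0},
}
normal = shortest_path(feeder, "S", "B")
after_fault = shortest_path(feeder, "S", "B", faulted={frozenset(("A", "B"))})
print(normal, after_fault)
```

A candidate route found this way would still need to be checked against voltage and loading limits, which is consistent with the power flow verification mentioned above.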

Probabilistic Planning and Operation Methods
A common theme in many of the modeling challenges described in previous sections is the reconciliation of the timescales at which one analyzes the system vulnerabilities, disturbances, the mechanisms of protection, and the mechanisms of restoration. These factor differently into even the most fundamental definitions of risk, impact multiplied by frequency [167], and in turn we obtain heterogeneous ways to define and measure resilience.
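The fundamental definition of risk mentioned above can be written compactly. The symbols below are notation assumed here for illustration: $\mathcal{E}$ is a set of event types, $f_e$ is the frequency of event type $e$ (events per year), and $I_e$ is its impact per occurrence (e.g., unserved energy or cost).

```latex
R_e = f_e \, I_e ,
\qquad
R_{\text{total}} = \sum_{e \in \mathcal{E}} f_e \, I_e
```

Under this definition, a HILF event with small $f_e$ and large $I_e$ can carry the same $R_e$ as a frequent low-impact event, even though the two call for different protection and restoration mechanisms acting on different timescales. This is one reason why a single risk number does not by itself define resilience.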
In terms of the power systems horizon, there is a need to integrate the factors mentioned above for both planning models and operations models. This is often primarily driven by reliability standards. For example, in the U.S., the North American Electric Reliability Corporation (NERC) discusses cascading outages as part of TPL-001 (planning performance standards); however, this is not clearly linked to some operations standards that define basic agreement aspects, like the N − 1 contingency analysis.
One way to ameliorate these challenges and advance the analysis of the system that is undergoing fundamental changes is to harness the increased sensing and data that is available to power system researchers. Recent works have used large datasets and statistical or probabilistic approaches for the analysis of power system events in terms of quantifying risk [168]. With enough data available, one can also use these datasets as sources of features and train modern machine learning approaches for predicting and quantifying risk [169,170]. Machine learning and artificial intelligence approaches also can provide timely recommendations to the operator in charge of remedial actions [171,172].
Naturally, with better access to data, both archival and real-time samples, there is better alignment with data-driven or risk-based methodologies. This allows a more extensive analysis across failure dimensions, exploring faults that happen at different timescales, and possibly aggregating these failures depending on their (spatial or temporal) proximity. This granularity helps with classifying and labeling events as independent failures or dependent failures, where the latter could be common mode or cascading outages [173,174].
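Aggregation by temporal proximity can be sketched with a simple gap-based grouping of outage records. The record format, the gap threshold, and the labels are assumptions for illustration; the cited works may use different criteria, and spatial proximity is not considered here.

```python
def group_by_time(events, max_gap_min):
    """Group outage records whose start times are within `max_gap_min`
    minutes of the previous record in the same group.

    events: iterable of (start_time_min, element_name) tuples
    """
    groups = []
    for t, name in sorted(events, key=lambda e: e[0]):
        if groups and t - groups[-1][-1][0] <= max_gap_min:
            groups[-1].append((t, name))
        else:
            groups.append([(t, name)])
    return groups


def label(group):
    """Singletons are candidates for independent failures; larger groups
    are candidates for dependent (common mode or cascading) failures."""
    return "dependent candidate" if len(group) > 1 else "independent candidate"


# Hypothetical outage log (minutes since the start of the record).
log = [(500, "line_7"), (0, "line_12"), (3, "xfmr_3"), (2, "line_14")]
groups = group_by_time(log, max_gap_min=5)
labels = [label(g) for g in groups]
print(groups, labels)
```

Distinguishing common mode from cascading outages within a dependent group would require additional information, such as the triggering hazard or the sequence of protection operations.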
In summary, a key path forward lies in the evolution of the data available for probabilistic planning and operation methods. Clustering and model reduction techniques will continue to allow the application of these advanced analyses with computationally tractable grid equivalents. At the same time, the breadth of objectives on planning, operational, and environmental procedures will be thoughtfully applied to a unique system with the use of multi-scale, multi-objective methodologies [175][176][177].

Areas of Opportunity
The reviewed works show the progression of research on incorporating the change in generation profile, digitization, and resilience into power system analysis and methods and highlight some of the key intersections of these trajectories. However, all of the shifts described in the evolving grid are happening concurrently. Holistic methods that incorporate all of these changes in an efficient and validated way remain an open area of research.
Listed below are some key areas of research that will help to address these changes in an interconnected and holistic manner:
• Standardized and computationally efficient probabilistic methods incorporating uncertainty from disturbances, and resource forecasts and availability into operations and planning methods.
• Power system simulation platforms to capture cyber-physical dependencies, particularly with utility control of DERs.
• Assessment of the valuation and economics of resilience, including equity in design, expansion, and reliability and resilience improvement projects.
• Processes for rapid development and validation of models of new and older technology that span spatial and temporal scales, such as DERs and demand response.
The summary of these intersecting themes is illustrated in Figure 5.

Figure 5. The evolving electrical grid goals and themes [14]. Source: Idaho National Laboratory.

Conclusions
This work reviews the fundamental shifts occurring in electrical grids, the state-of-the-art practices to address resilience, and the intersection of these two themes. The reviewed research works are categorized in this paper by:

1. Works solely addressing the evolving grid, including the increase of VIBRES in the generation profile, fundamentally altering the expectations of dispatchable, inertial, and centralized generation. The group of works also included the increase in digitalization of electrical grids, such as the installation of more networked, smart devices, driving an increase in the monitoring, communication, and control capabilities of the system.

2. Works solely addressing the resilience of power systems. These works included topics covering resilience frameworks, planning and operational methods to improve resilience, and resilience metrics.

3. Works that consider both resilience of power systems and the shifts occurring with the evolving grid. These research works are grouped into the topics: cyber-physical analysis of solar, wind, and DERs, microgrids, network evolution and observability, substation automation and self-healing, and probabilistic planning and operation methods.
The aim of this paper was to highlight the current status of research and identify areas of current and needed research at this intersection. The key takeaway of this paper is the need for efficient, probabilistic, validated, and holistic research and studies that address all of the concurrently occurring shifts and changes of the electrical grid in order to ensure sustainable, affordable, reliable, and resilient electrical grids in the future.