Probabilistic Resilience Analysis of the Icelandic Power System under Extreme Weather

This paper presents a probabilistic methodology for assessing power system resilience, motivated by the extreme storm experienced in Iceland in December 2019. The methodology builds on the models and data available to the Icelandic transmission system operator in anticipation of that storm. We study resilience in terms of the ability of the system to contain further service disruption while potentially operating with reduced component availability due to the storm impact. To do so, we develop a Monte Carlo assessment framework combining weather-dependent component failure probabilities, computed from historical failure rate data and forecasted wind-speed data, with a bi-level attacker-defender optimization model for vulnerability identification. Our findings suggest that the ability of the Icelandic power system to contain service disruption reduces only moderately with the storm-induced potential reduction of its available components. In other words, and as also validated in practice, the system is indeed resilient.


Introduction
The continuous supply of electricity is an essential facility for modern life. This essential facility is guaranteed by designing and operating electrical power systems, so that they are able to function within a certain range of likely conditions, formally expressed by reliability criteria, such as the widely adopted N-1 criterion. In recent times, the increasing occurrence of extreme weather events, the aging of the system infrastructure, and the unclear valuation mechanisms for grid security in the market-based regime are bringing power systems outside the 'comfort zone' created by reliability criteria. The modern power system is therefore additionally required to withstand a much wider range of atypical, yet not entirely unlikely, operating conditions characterized by the simultaneous failure of several system components, or at least to deteriorate moderately and promptly recover. This additional system property of interest is commonly termed electrical power system resilience [1,2]. The Icelandic power system provides a clear example of the stakes for power system resilience. Electricity is indispensable to the local community, and even more so at times of extreme weather conditions characterized by strong winds, sub-zero temperatures and lack of natural light. However, at such times, the infrastructure of the islanded power system is much more likely to fail, as the intensity of weather phenomena may simply outgrow the component design specifications. The system should therefore be able to fail in a contained manner whatever the initiating event, in order to avoid extremely impactful service interruptions. The recent storm in December of 2019 represented one of the most extreme weather events experienced by the Icelandic power system. Despite suffering numerous simultaneous outages, the consequences were largely contained to radially connected communities, practically validating the resilience of the Icelandic power system.
Motivated by the real-life data and observations collected from the operation of the Icelandic power system during the extreme storm of December 2019, this paper introduces a probabilistic resilience assessment methodology. The methodology integrates weather-dependent component failure rates, which express failure probabilities as a function of forecasted wind speed [3], with a bi-level attacker-defender model, which quantifies the extent of the worst-case outage in terms of the consequences potentially realized by the system end-users. We combine these two building blocks in a Monte Carlo framework to assess in advance the expected impact of the worst-case outage at several characteristic phases of the wind storm, subject to uncertainty on the set of transmission assets that remain online as the storm develops further.

Literature Review & Contributions
Relying on Monte Carlo simulations to evaluate the effect of extreme weather events on the electrical power system is a standard approach in the literature. Most notably, in a series of publications, Panteli et al. use the so-called fragility curves to model the relationship between wind speed and line failure probability [4][5][6]. Further, these authors rely on AC optimal power flow computations to sequentially assess the viability of the power system states simulated by randomly drawing component outages and repairs. In a similar style, Trakas et al. also use the fragility curve concept and focus on exploiting Latin Hypercube Sampling and scenario reduction techniques while expressing viability through DC optimal power flow computations [7].
Our work benefits from the availability of historical records of sustained component failures at different wind speeds for the Icelandic power system. We rely on such data and the seminal work of Billinton [8] to compute transmission line failure probabilities as a function of the forecasted wind speed throughout the storm event, as developed in [3]. Moreover, instead of optimal power flow computations, we evaluate the resulting randomly generated power system states by means of a bi-level attacker-defender model. The bi-level attacker-defender model has been used in several applications to expose worst-case vulnerabilities of the power system infrastructure [9][10][11][12][13]. We argue that such a model is suitable for probabilistic power system resilience assessment studies, since it provides a measure of the ability of the system to contain the impact of any additional component failure. We study this measure over characteristic phases in the anticipated evolution of the storm, differentiated in terms of the survival probability of the system components. In this way, we verify the resilience of the Icelandic power system, whose ability to mitigate the impact of any additional failure does not change considerably across the different phases of the storm.

Paper Organization
The remainder of this paper is organized as follows. Section 2 presents a brief introduction to the Icelandic power system and a description of the severe storm event of December 2019. The proposed probabilistic resilience assessment methodology is presented in Section 3, and the results from its application to the Icelandic case study are presented in Section 4. Section 5 concludes the paper and discusses further research directions.

The Icelandic Power System & the Storm Events
This section provides an overview of the Icelandic power system in 2019, in order to provide context for understanding the impact of the extreme weather. This is followed by a timeline of major events during the storm, including an overview of the key impacts.

Icelandic Power System
The Icelandic power system is an islanded network. The peak load of the system in 2019 was 2345 MW [14], the majority of which is consumed by heavy industry and generated almost exclusively by hydro and geothermal power plants. Generation and load are primarily located in the south-western and eastern 220 kV regions, shown in Figure 1, which are connected by a 132 kV ring of transmission lines around the perimeter of the country. Significant operational challenges are caused by the ring connection, primarily through congestion that restricts the flow of energy from one 220 kV region to the other, and through dynamic instabilities that arise due to the relatively large electrical distance separating the two 220 kV regions [15]. Consequently, operational security is dominated by the management of inter-region congestion and is reliant upon synchrophasor-measurement unit based wide-area monitoring systems and fast acting system integrity protection schemes. These protection schemes induce topological islanding between regions, or fast ramping of generation or load, in response to the regional rate of change of frequency measurements. Fast responding protection allows disturbances to be quickly contained, so that they do not propagate across the grid.

Extreme Weather Event of December 2019
There are numerous exogenous threats to the Icelandic power system, most of which are weather phenomena [16]. Those threats that are most relevant to the December 2019 storm are high intensity winds, ice loading, and salt pollution. Analysis by meteorologists in the days prior to the storm anticipated it to be the worst northerly storm within the last 50 years, based on pressure differentials at various altitudes [17].
One of Landsnet's annual targets is to achieve an Average Outage Duration Index (SMS), expressed in system minutes, below 50 min.; the index is formally defined in [18,19]. Landsnet's annual performance report for 2019 contains a dedicated subsection describing the storm events [14]. The December storm event was responsible for 81.7 of the 91.2 system minutes recorded for 2019, which greatly exceeds the annual target. Notably, 80.5 system minutes were attributed to the loss of radially connected loads. In total, 103 transmission poles/towers were damaged in the storm, with additional damage in two substations. The damage to transmission infrastructure was mostly located in the north-west of Iceland, where the wind and icing loads were the greatest. Damage to substations occurred in the north-west and south-east, due to icing and salt pollution. A more detailed analysis of overhead line disturbances during the storm can be found in [3]. Ref. [14] reports the most significant outages experienced during the storm.

Overview
Through the development of the extreme weather event, the progression of the strong wind front poses the unique challenge of simultaneously making electricity indispensable to the end-users and causing the outage of several transmission components, while also limiting the capacity to dispatch maintenance crews and perform restorative operations. Our methodology seeks to evaluate the system resilience in anticipation of such a challenge.
We use historical weather-dependent failure rates along with forecasted wind speed data to model the anticipated threat. We evaluate, through time, the probability that any transmission component would survive up to t hours from the start of the storm, as well as the probability that it may fail within the next interval [t, t + dt]. The detailed derivation of these weather-dependent survival and failure probabilities is presented in Section 3.2. Studying the evolution of these probabilities over time, we identify the moment of peak threat as the moment at which the probability that the system is still intact is extremely low, while the probability of realizing a new outage is maximized. Note that we do not model repairs. In the absence of credible repair rates under the extreme anticipated conditions, we make the conservative assumption that repairs are completed after the storm. Performing repairs is extremely difficult during a severe storm, when several components may have failed, locations may be harder to reach, and on-site crew health and safety may be jeopardized.
A resilient power system should be able to contain further degradation of electricity supply, irrespective of the component failures it has already sustained. In order to assess this property, we compare the ability of the system to withstand the worst possible additional outage at three characteristic phases, specifically: (i) a moment prior to the start of the storm, (ii) the aforementioned moment of peak threat, and (iii) a moment at the end of the storm. For each of these phases, we perform a Monte Carlo simulation, wherein the network state is randomly drawn to simulate the uncertainty in the survival of transmission components. For each sampled state, we assess the amount of load that would have to be shed upon occurrence of an additional worst-case outage by solving a bi-level attacker-defender model, presented in detail in Section 3.3. The resulting expected worst-case load shed (i.e., expected ability to withstand any additional failure) through the three phases of the storm is our probabilistic metric for assessing the system resilience.
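The Monte Carlo loop described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes branch states are sampled independently from their survival probabilities, and `toy_worst_case_shed` is a hypothetical stand-in for the bi-level attacker-defender model, with made-up numbers throughout.

```python
import random
import statistics


def sample_state(survival_prob, rng):
    """Draw one network state; True means the branch is still in service."""
    return {branch: rng.random() < p for branch, p in survival_prob.items()}


def expected_worst_case_shed(survival_prob, worst_case_shed, n_samples, seed=0):
    """Estimate the mean and standard deviation of the worst-case load shed."""
    rng = random.Random(seed)
    sheds = [worst_case_shed(sample_state(survival_prob, rng))
             for _ in range(n_samples)]
    return statistics.mean(sheds), statistics.stdev(sheds)


def toy_worst_case_shed(state):
    """Hypothetical placeholder evaluator, NOT the bi-level model of the paper.

    Returns a made-up shed fraction that grows with the number of failed branches.
    """
    n_failed = sum(1 for in_service in state.values() if not in_service)
    return 0.02 + 0.01 * n_failed


# Made-up survival probabilities for three hypothetical branches.
survival_prob = {"A-B": 0.9, "B-C": 0.5, "C-A": 0.99}
mean_shed, std_shed = expected_worst_case_shed(
    survival_prob, toy_worst_case_shed, n_samples=1000)
print(f"expected worst-case shed: {mean_shed:.4f} (std {std_shed:.4f})")
```

In an actual study, the placeholder evaluator would be replaced by a call that solves the attacker-defender problem for the sampled state, and the loop would be repeated once per storm phase with the corresponding survival probabilities.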

Weather-Dependent Failure Probabilities
We begin by defining weather-dependent failure rates for each time step in the analysis horizon and each component of the transmission system. To do so, we rely on the methodology introduced in detail in [3,20]. Briefly, the methodology involves categorizing the observed failures of each individual component according to the identified primary cause and mapping this data to the time series describing the intensity of the identified primary cause. The end results are (i) a set of time-invariant (average) failure rates per component and identified primary cause and (ii) a set of correction factors to account for the momentary intensity of each identified primary cause.
In the present application, we focus on the impact of an extreme wind-storm on the Icelandic power system. To do so, we model two distinct primary causes of failure, namely wind (w) and other (o). Further, we use the location-specific forecasted wind speed as the descriptor of the anticipated intensity of the wind primary cause. Accordingly, we express the time-dependent failure rate per transmission component (i) and time moment (t) as:

$$\lambda_{i,t} = c_i(v_{i,t}) \, \lambda^{w}_{i} + \lambda^{o}_{i},$$

where $\lambda^{w}_{i}$ is the time-invariant (average) failure rate of component $i$ due to wind, $\lambda^{o}_{i}$ is its time-invariant failure rate for all other causes, $v_{i,t}$ is the forecasted wind speed at component $i$ and time $t$, and $c_i(v_{i,t})$ is the correction factor accounting for the momentary wind intensity. Typical values for $c_i(v_{i,t})$ range from 0.1 at low wind speeds up to $10^4$ at high wind speeds. For wind speeds greater than those experienced historically by a line, the value of $c_i(v_{i,t})$ becomes significantly higher. Note that $\lambda^{o}_{i}$ represents all other possible causes of failure on the Icelandic power system, as summarized by Elíasson [16]. Some of these causes are time-varying phenomena that could be modeled separately; doing so, however, requires additional data that was not available at the time of the study.
Given the time-dependent failure rates, we use the classical models of Billinton [8] to compute weather-dependent survival and failure probabilities per component and time instant. Precisely, we define the former as the "probability that component i has not experienced any failure during the first t hours" and the latter as the "probability that component i may sustain a failure within [t, t + dt] hours". We denote these probabilities $\pi^{s}_{i,t}$ and $\pi^{f}_{i,t}$, respectively, and compute them from the time-dependent failure rates following [8].
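To make the two definitions concrete, the sketch below computes survival and failure probabilities for a single component. It is an illustration under stated assumptions rather than the exact expressions of [3,8]: the failure rate is taken as the wind correction factor times the wind rate plus the rate for other causes, rates are piecewise constant over each time step, failures follow an exponential model, and the failure probability is interpreted as conditional on the component having survived up to the start of the step. All function names and numerical values are hypothetical.

```python
import math


def failure_rate(lam_wind, lam_other, correction):
    """Time-dependent failure rate (failures per hour), assumed form."""
    return correction * lam_wind + lam_other


def survival_and_failure(rates, dt):
    """Per-step survival and failure probabilities for one component.

    rates: failure rates (per hour), one per time step, assumed constant
           within the step.
    dt:    step length in hours.
    Returns (surv, fail), where surv[k] is the probability of no failure
    before step k starts, and fail[k] is the probability of a failure
    during step k given survival up to its start (an assumed interpretation).
    """
    surv, fail = [], []
    cumulative_hazard = 0.0
    for lam in rates:
        surv.append(math.exp(-cumulative_hazard))
        fail.append(1.0 - math.exp(-lam * dt))
        cumulative_hazard += lam * dt
    return surv, fail


# Made-up example: correction factors rising as the wind picks up.
corrections = [0.1, 1.0, 100.0, 1000.0]
rates = [failure_rate(lam_wind=1e-4, lam_other=1e-5, correction=c)
         for c in corrections]
surv, fail = survival_and_failure(rates, dt=1.0)
for k, (s, f) in enumerate(zip(surv, fail)):
    print(f"step {k}: survival={s:.4f}, failure={f:.4f}")
```

With a 5-minute resolution, `dt` would be 5/60 hours; the aggregate probability that the whole system is intact is then the product of the per-component survival probabilities.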

Bi-Level Attacker-Defender Model
The scope of the bi-level attacker-defender model is to quantify the vulnerability of any random system state generated by simulating the uncertain impact of extreme weather on the survival of transmission system components. More specifically, we seek to quantify how well the system can cope with any additional failure, while taking into account the fact that the control room operators would use any available action to avoid the involuntary loss of load.
To do so, the upper level of the problem models an intelligent attacker agent, seeking to maximize the damage that it may cause by removing an additional transmission component from service. We express the damage caused by such an intelligent attacker in terms of the amount of load demand that the control room operators would inevitably have to shed upon realization of its chosen additional outage. Accordingly, the lower level models an intelligent defender agent, seeking to minimize involuntary load shedding given the realization of the outage chosen by the attacker. The lower-level part of the problem includes the DC power flow equality constraints, upper and lower bounds on active power generation, as well as bounds on the power flowing through all surviving system branches. Mathematically, the problem is stated as:

$$\max \; sev \qquad (5)$$

subject to the upper-level constraints (6)-(10), with

$$s_n \in \arg\min_{f,p,s,\theta} \sum_{n \in N} s_n \cdot d_n \qquad (11)$$

subject to the lower-level constraints (12)-(17), where:

$sev$: upper-level continuous variable, modeling the severity of the attacker's choice;
$u_\ell$: upper-level binary variable, modeling the choice of the attacker to remove branch $\ell$ from service;
$a_\ell$: binary parameter, modeling the availability status of branch $\ell$;
$\beta_{\ell,n}$: binary parameter, modeling the connectivity of branch $\ell$ with node $n$ and the assumed flow direction;
$M$: a large constant parameter;
$s_n$: lower-level continuous variable, modeling the ratio of involuntary load shedding applied at node $n$;
$d_n$: parameter, modeling the active power demand at node $n$;
$\theta_n$: lower-level continuous variable, modeling the voltage angle at node $n$ (index $n_0$ is used to denote the network reference node);
$\gamma_{g,n}$: binary parameter, modeling the connectivity of generator $g$ with node $n$;
$p_g$: lower-level continuous variable, modeling the active power output of generator $g$;
$f_\ell$: lower-level continuous variable, modeling the active power flow through branch $\ell$;
$\overline{p}_g$: parameter, modeling the capacity of generator $g$;
$X_\ell$: parameter, modeling the reactance of branch $\ell$;
$\overline{f}_\ell$: parameter, modeling the capacity of branch $\ell$;
$G$: set of generating units;
$L$: set of transmission branches; and,
$N$: set of nodes.
Constraints (6)-(8) impose that the upper-level agent may only remove from service a single component that has survived thus far, while (9) can be used to avoid trivial solutions associated with structural vulnerabilities of the intact system, and more specifically the choice of an attack isolating a node that is connected to the network by a single branch. Constraint (10) imposes that the severity caused by the attacker is capped by the optimal reaction of the lower-level agent, as per (11)-(17). For any value of the upper-level variables $u_\ell$, expressions (11)-(17) state a standard minimum load-shedding DC optimal power flow problem, enforcing the balance of power at all network nodes (14) as well as the capacity ratings of generators (15) and all transmission branches (17). Problem (5)-(17) is a bi-level optimization problem with a linear and, thus, convex lower level (11)-(17). Exploiting this property, we derive the single-level Mixed-Integer Linear Programming (MILP) equivalent of (5)-(17) by: 1. replacing the minimization operator in (11) with the Karush-Kuhn-Tucker optimality conditions of problem (11)-(17); 2. introducing auxiliary binary variables to reformulate the complementary slackness conditions of (11)-(17) using the large constant M; and, 3. linearizing products of upper-level binary variables and lower-level continuous variables through disjunctive cuts.
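As a generic illustration of steps 2 and 3, consider a lower-level inequality constraint written as $g_j(x) \le 0$ with dual multiplier $\mu_j \ge 0$. The symbols $g_j$, $\mu_j$, $z_j$, $w$ and $\lambda$ below are illustrative and are not the notation of (5)-(17); the linearization of the binary-continuous product is shown in its standard big-M form, which is an assumption about how the disjunctive constraints may be written.

```latex
% Step 2: complementary slackness  \mu_j \, g_j(x) = 0  rewritten with a binary z_j
\begin{align}
  0 \le \mu_j &\le M z_j, \\
  0 \le -g_j(x) &\le M (1 - z_j), \\
  z_j &\in \{0, 1\}.
\end{align}
% z_j = 0 forces \mu_j = 0 (constraint may be slack);
% z_j = 1 forces g_j(x) = 0 (constraint is binding).

% Step 3: product w = u \lambda of a binary u and a continuous \lambda
% with -M \le \lambda \le M
\begin{align}
  -M u \le w &\le M u, \\
  \lambda - M (1 - u) \le w &\le \lambda + M (1 - u).
\end{align}
% u = 0 gives w = 0; u = 1 gives w = \lambda.
```

Both reformulations are exact only if $M$ is a valid bound on the quantities it multiplies; in practice, an overly large $M$ tends to weaken the MILP relaxation.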

Data and Implementation
The implementation and data requirements of the weather-dependent failure rate models described in Section 3.2 are outlined in [20]. The data used to produce the failure rate models are restricted, because they represent sensitive information related to infrastructure of national importance. Computing the look-ahead component survival and failure probabilities required forecast wind speed data. These data were sourced from the Icelandic Bureau of Meteorology's online repository of forecast weather maps [21]. A tool was created in Python to convert the weather map images from the 10 m elevation HARMONIE model into geospatial wind speed data at an hourly resolution. This tool was used to generate the wind speed time series for a given overhead line by taking the maximum forecast wind speed across all towers of the line at each hour of the storm.
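The last step, reducing per-tower wind speeds to a single time series per line, can be sketched as below. The function name, the tower identifiers and the wind speeds are hypothetical; the extraction of wind speeds from the weather map images is not shown.

```python
def line_wind_series(tower_speeds):
    """Hourly wind speed for a line: the maximum across its towers.

    tower_speeds: dict mapping a tower identifier to its list of hourly
                  forecast wind speeds (m/s); all lists must share a length.
    """
    series = list(tower_speeds.values())
    if not series:
        raise ValueError("at least one tower is required")
    n_hours = len(series[0])
    if any(len(s) != n_hours for s in series):
        raise ValueError("all towers must cover the same hours")
    # zip(*series) groups the towers' values hour by hour.
    return [max(hour_values) for hour_values in zip(*series)]


# Made-up forecast for a line with two towers over three hours.
forecast = {"tower_1": [10.0, 25.0, 18.0], "tower_2": [12.0, 20.0, 30.0]}
print(line_wind_series(forecast))  # [12.0, 25.0, 30.0]
```

Taking the maximum is conservative: the line is treated as exposed to the strongest wind forecast anywhere along its route.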
The weather-dependent failure rate models and wind-speed forecasts were then used to compute look-ahead component survival and failure probabilities at each time step in Python. The resulting probabilities were used to compute contingency probabilities, which represented one of the main inputs to the bi-level attacker-defender model. In addition, system snapshots of the Icelandic power system were provided at 5' intervals in a bus-branch model, which consisted of electrical parameters, component states, and bus injections (load and generation). The bi-level attacker-defender model defined in (5)-(17) was implemented in Julia and solved using the CPLEX solver.

Results & Discussion
This section presents the application of the developed probabilistic resilience assessment methodology on the Icelandic transmission system, facing the storm event of December 2019. More specifically, our analysis covers the period from 00:00, 09/12/2019 to 23:59, 12/12/2019. We used a 5' temporal resolution along with the forecasts for wind speed and load demand available to the system operator on 08/12/2019.

Probabilistic Modeling of the Storm Threat
To set the context for the resilience analysis, we begin by representing the anticipated impact of the storm event on the availability of the Icelandic transmission system infrastructure. Figure 2 plots the temporal evolution of the aggregate system survival probability (i.e., the probability that no unplanned component outage has occurred and the system remains intact, computed as $\prod_i \pi^{s}_{i,t}$) in blue, as well as the probability of realizing a new outage during the upcoming 5' interval in red. It can be seen that the system survival probability collapses with the arrival of the storm. Approximately 38 h into the assessment horizon (14:00, 10/12/2019), the system is most likely not intact, while 4 h later the system is almost certainly not intact.
The spikes in the probability of realizing a new outage occur as new overhead lines are exposed to high wind speeds while the storm moves across the power system. The probability of realizing a new outage notably increases around hour 38, reaching its maximum value 43 h into the assessment horizon (19:00, 10/12/2019). Therefore, we designate 19:00, 10/12/2019 as the moment of peak threat at which to analyze the system resilience. We further use the 5' interval 24 h earlier (19:00, 09/12/2019) as a representative moment prior to the storm. Approximately 55 h into the assessment horizon (07:00, 11/12/2019), the probability of realizing a new outage diminishes. This is not only because the storm intensity gradually reduces, but also because, by this point, any vulnerable components of the transmission system have most probably failed already. Accordingly, we use the 5' interval 24 h after the moment of peak threat (19:00, 11/12/2019) as a representative moment at the end of the storm.
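The mapping between offsets into the assessment horizon and calendar timestamps, and the selection of the peak-threat instant as the interval with the highest new-outage probability, can be sketched as follows. The function names and the probability series are hypothetical; only the horizon start and the 5' resolution are taken from the text.

```python
from datetime import datetime, timedelta

HORIZON_START = datetime(2019, 12, 9, 0, 0)  # 00:00, 09/12/2019


def horizon_time(hours):
    """Calendar time reached a given number of hours into the horizon."""
    return HORIZON_START + timedelta(hours=hours)


def peak_threat_time(new_outage_prob, step_minutes=5):
    """Start time of the interval with the highest new-outage probability.

    new_outage_prob: one probability per interval, ordered in time.
    """
    peak_index = max(range(len(new_outage_prob)),
                     key=new_outage_prob.__getitem__)
    return HORIZON_START + timedelta(minutes=peak_index * step_minutes)


print(horizon_time(38))  # 2019-12-10 14:00:00
print(horizon_time(43))  # 2019-12-10 19:00:00

# Made-up probability series with its maximum at interval 516 (43 h).
probs = [0.01] * 516 + [0.9] + [0.05] * 100
print(peak_threat_time(probs))  # 2019-12-10 19:00:00
```

The representative pre-storm and end-of-storm instants then follow by subtracting or adding `timedelta(hours=24)` to the peak-threat time.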

System Resilience
We proceed by comparing the ability of the system to sustain the worst possible outage at the three identified characteristic instances by means of Monte Carlo simulation. Recalling that the worst possible outage depends on the prior sustained failures, Table 1 summarizes the results in terms of expected worst-case load shed. All the results in this table have been obtained using 5500 samples of the respective probability distribution, with the third column showing the standard deviation of the estimated quantity. Table 1 showcases the resilience of the Icelandic system. Indeed, at any phase of the storm, and despite having already sustained different failures, the system could still cope reasonably well with the additional worst-case outage. In the most extreme case, having realized the full effect of the storm, a new outage could lead to not serving at most 4.42% of the end-user demand. Further, given the cumulative effects of the storm, we argue that the progressive increase of the expected worst-case severity between the three phases of the storm is moderate. Here we recall that the results shown in Table 1 are computed under the conservative assumption that no maintenance activity is performed throughout the storm. The small difference in the magnitude of the worst-case impact between the start, peak and end of the storm should thus, in practice, turn out to be even smaller.
The coefficient of variation (i.e., the ratio of the standard deviation to the expected value) of the worst-case outage severity is an additional indicator of the system resilience. To establish this point, Figure 3 presents the frequency of the different sampled system configurations at the three storm phases. As this figure shows, the three sampled instances correspond to a very wide diversity of system configurations. Notably, at the end of the storm, the system is at least in an N-9 configuration, which of course implies an atypically large number of component outage combinations. The system is clearly capable of absorbing the diversity in the possible component failures, as the coefficient of variation of the worst-case outage severity decreases with the progression of the storm (Table 2). This implies that the system can respond to the storm progression in a rather stable manner. Keeping in mind Figure 3, additional evidence of the system resilience can also be found by looking into the identified worst-case outages at the different storm phases. Over this very diverse range of configurations, very few 'critical' branches stand out as the additional outage that would cause the worst-case severity. At the instance prior to the storm, when the system would most likely still be intact, the outage of the branch HRU-GLE (as shown in Figure 1) was identified as the next worst-case outage in approximately 97% of the studied samples.
At the moment of peak threat, we found two dominant worst-case outages, specifically the outages of branches RAN-VAR and HVO-RIM, with respective frequencies of 26.3% and 70.4%. These two outages are also the dominant worst cases identified at the end of the storm, with frequencies of 73% and 25.8%, respectively. The maximum severity of these outages at the moment of peak threat was estimated to be 1.65% and 0.63%, respectively, in terms of percent of load disconnected due to involuntary shedding. Note that more severe worst-case outages were experienced in only 0.3% of simulations, up to a maximum severity of 5.7%. In addition to validating the present system resilience, this information can also be of use for identifying strategic measures to further enhance resilience.
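The two summary statistics used in this section, the coefficient of variation of the worst-case severity and the frequency with which each branch is identified as the worst-case outage, can be computed from Monte Carlo output as sketched below. The use of the sample (rather than population) standard deviation is an assumption, and the sample values and branch labels are made up.

```python
import statistics
from collections import Counter


def coefficient_of_variation(samples):
    """Ratio of the sample standard deviation to the sample mean."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return statistics.stdev(samples) / mean


def worst_case_frequencies(worst_branches):
    """Share of samples in which each branch is the worst-case outage."""
    total = len(worst_branches)
    return {branch: count / total
            for branch, count in Counter(worst_branches).items()}


# Made-up Monte Carlo output: shed severity (%) and worst-case branch label.
severities = [1.0, 2.0, 3.0]
worst = ["X-Y", "X-Y", "Y-Z", "X-Y"]
print(coefficient_of_variation(severities))  # 0.5
print(worst_case_frequencies(worst))  # {'X-Y': 0.75, 'Y-Z': 0.25}
```

A coefficient of variation that falls from one storm phase to the next indicates that the spread of the worst-case severity grows more slowly than its mean.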

Conclusions
This paper developed a probabilistic resilience assessment methodology, explicitly taking into account the impact of weather conditions on the availability of the system infrastructure. Focusing on a storm event, we relied on real-life data as inputs to an already established model of the component failure probability as a function of the anticipated wind speed. We combined this model with a worst-case assessment framework, relying on bi-level optimization to quantify the worst possible loss of load from any new component outage, given the (randomly sampled) prior failures the system may have already sustained. We analyzed the resulting metrics at different characteristic phases throughout the storm, to capture resilience in terms of the relative ability of the system to absorb further degradation.
The presented case study concerns the islanded power system of Iceland and is based on data that were prepared in anticipation of one of the most extreme weather events it has faced, which occurred in December 2019. Our results validate the resilience of the Icelandic system in the face of this storm event. We have shown that at the peak threat instance, when the system infrastructure may already be partially unavailable and a new outage is most probable, the worst-case new outage is expected to cause involuntary shedding of only 2.35% of the consumer demand. Further, at the end-of-storm instance, when a large part of the infrastructure should have failed and before any restorative maintenance, a new outage is expected to cause involuntary shedding of only 4.42% of the consumer demand in the worst case. This implies that the ability of the system to withstand failures degrades much more slowly than its available infrastructure.
Landsnet records contain 17 distinct events of shedding of consumer demand during the studied period. Of these, 11 represent disconnections of 1% or less of consumer demand, and five others represent 1 to 2.4% of consumer demand. The remaining, largest disconnection event represented 5.9% of consumer demand; however, in this particular case, 40% of the shedding was voluntary, as a result of pre-existing wide-area control systems. Therefore, the proportion of involuntary load shed was 3.5% (i.e., 60% of 5.9%), which is within the bounds determined by the bi-level attacker-defender model. Hence, the system is indeed resilient.
Future work concerns extending the framework introduced in this paper towards evaluating the impact of several measures for enhancing the system resilience. As a first step, we plan to develop and integrate a credible restorative maintenance model, explicitly recognizing the particular difficulties present during the storm event. This will allow for quantifying the value of restorative maintenance for power system resilience and analyzing which parameters of the restorative maintenance infrastructure (e.g., capacity of maintenance crews, geographical dispersion, availability of spare parts) are of critical value. In a similar manner, the framework will also be extended by modeling bespoke resilience-enhancing resources, such as back-up generation.