Global Methodology for Electrical Utilities Maintenance Assessment Based on Risk-Informed Decision Making

Modern electrical power utilities must deal with the replacement of large portions of their assets as they reach the end of their useful life. Assets may also become obsolete due to technological change or reach their capacity limits. Major upgrades are also often necessary to grow capacity or to transition to more efficient and carbon-free power alternatives. Consequently, electrical power utilities are exposed to significant risks and uncertainties, mostly of external origin. In this context, an effective framework should be developed and implemented to maximize value from assets, ensure sustainable operations and deliver adequate customer service. Recent developments show that combining the concepts of asset management and resilience offers strong potential for such a framework, not only for electrical utilities but for industry as well. Given that the quality and continuity of service are critical factors, the Value of Lost Load (VoLL) is an important indicator for assessing the value of electrical energy left undelivered because of planned or unplanned outages. This paper presents a novel approach for integrating a power grid reliability simulator into a holistic framework for asset management and electrical power utility resilience. The proposed approach provides a sound foundation for Risk-Informed Decision Making in asset management. Among other things, it considers asset performance as well as the impact of both the current grid topology and customer profiles on grid reliability and VoLL. A case study on a major North American electrical power utility demonstrates the applicability of the proposed methodology to assessing maintenance strategies.


Introduction
Electrical power transmission networks are the backbone of power utilities because they transmit energy from the generation location to the electrical substations. Their operating and business environments are complex and involve significant uncertainties (market changes, changing regulatory framework, new technologies, malicious human actions, climate change, extreme weather events, etc.).
Additionally, electrical utilities must manage the replacement of large portions of their assets as they reach the end of their useful life, become obsolete due to technological change or reach their capacity limits. Major upgrades are also often necessary due to growth in demand or the transition to more efficient and carbon-free power alternatives. Thus, electrical power utilities are exposed to significant risks and uncertainties, mostly of external origin. They must evolve and adapt. In this context, it is vital to develop and implement an effective framework to maximize value from assets, ensure sustainable operations and deliver adequate customer service. Recent developments show that combining the concepts of asset management and resilience holds strong potential for such a framework, not only for electrical utilities but for industry as well. Current research only partially addresses these challenges.
Maletic et al. [1] present the results of a survey of managers in 138 international and local organizations, analyzing the organizations' commitment to physical asset management (PAM). The results show that core PAM practices directly influence operational performance. Gavrikova et al. [2] propose a framework for asset data management in power utilities. The authors consider asset data management from a strategic perspective by linking operational data with corporate strategy. They also take into consideration the organizational context and stakeholder expectations.
Amadi-Echendu [3] highlights the role of physical asset management in sustainably increasing value throughout an asset's life. The value system in asset management is also discussed by Wijnia [4]. Park et al. [5] introduce a methodology to integrate asset management concepts and systems into domestic infrastructure. Their study also reviews the current status of asset management in domestic and overseas infrastructure and discusses the justification for asset management legislation in developed nations. The authors propose sustainable policy directions for establishing and encouraging infrastructure asset management.
A Work Group mandated by CIGRE [6] highlights the importance and key concepts of asset management for the electrical utilities sector. According to the document, the adoption of modern asset management techniques is a long-term effort. It requires significant cultural adjustments and thus leadership involvement.
The field of asset management has reached worldwide maturity, to the point that ISO standards have now been established in this area [7], drawing on ISO-based risk management methods [8]. Komljenovic et al. [9] propose a high-level approach to asset management for electrical utilities in their operational and business contexts. A similar topic is also studied by Bertling et al. [10]. Petchrompo and Parlikad [11] provide an overview of the literature on asset management for multi-unit systems, with an emphasis on two multi-asset categories: fleets (systems of homogeneous assets) and portfolios (systems of heterogeneous assets). Their review offers insights into current trends in multi-asset maintenance.
For electrical utilities, the quality and continuity of service are critical factors. The Value of Lost Load (VoLL) is an important indicator for assessing the value of electrical energy left undelivered because of planned or unplanned outages. It enables electrical utilities to better manage the overall complexity of their operational and business environments and to optimize sustainable development and operations. VoLL has been broadly studied [12–20], but our literature review shows that an integrated approach to asset management for electrical utilities is missing: VoLL still requires further enhancements before it can be integrated into the asset management and resilience framework used for complex power grids. A more detailed review is presented in Section 3.4.
The goal of this paper is to present a novel methodology integrating an asset behavior model, an original transmission power grid reliability simulator and an enhanced method for determining VoLL into a holistic framework. The proposed framework aims to support asset management and resilience for electrical power utilities. The novelty of our approach is that it links all elements of holistic asset management for electrical power grids, from the asset level to the power grid level, as part of overall resilience management. First, we develop a model to quantify the impact of changes in maintenance strategies at the asset level. Second, we develop a power grid reliability simulator that links power flow calculations with real-time changes in grid configuration caused by planned or unplanned asset unavailability. The amount of undelivered energy calculated with the reliability simulator is combined with other categories of (monetized) risk to obtain a VoLL that represents a holistic picture of the system-level risk profile. The benefits of this methodology are demonstrated by assessing the maintenance strategy of an electrical substation. The proposed approach is a critical foundation for sound Risk-Informed Decision Making (RIDM) in asset management.
The paper is structured as follows: a comprehensive RIDM approach in asset management and resilience is presented in Section 2. Section 3 describes the asset behavior methodology as a function of maintenance strategies, the basic elements of the power grid reliability simulator, the risk model and, finally, an enhanced approach to calculating the VoLL. The application of the model in a case study is presented in Section 4. The discussion is given in Section 5, and the paper is concluded in Section 6.

The Risk-Informed Decision-Making (RIDM) Conceptual Model in Asset Management
Effective and efficient organizations use a structured approach to asset management to balance competing priorities (performance, risks and costs), manage various external and internal influence factors and ensure an equilibrium between long-term benefits and immediate needs. An asset management system based on the ISO 55000 family of standards helps organizations establish a coherent approach and coordinate the allocation of resources and activities. A well-designed asset management system is also able to account for the risks of extreme and large-scale disruptive events in strategic and asset management decision making. A disruptive (adverse) event is typically capable of causing system-level failures (sometimes cascading ones) in an organization [21].
To adequately cope with such uncertainties and complexity, we believe that it is important to combine the concepts of resilience and asset management. By doing so, organizations can achieve sustainable development, a satisfactory level of service and economic viability.
The concept of resilience has developed over the last decades. It covers four phases related to the occurrence of a disruptive (adverse) event: (1) planning, i.e., preparation and anticipation prior to the event; (2) absorption, i.e., the loss of performance while the event occurs; (3) recovery, during which performance is restored after the event; and (4) adaptation, i.e., lessons learned and continuous improvement after the event. The concept includes economic, financial, operational, technological, organizational, reputational, societal and business-model resilience [22,23]. There is no unique definition of resilience; many definitions can be found in the literature [8,24,25]. Figure 1 presents the proposed high-level holistic resilience-based asset management model, which incorporates all relevant functions and activities of an organization and shows their relationship to the four main phases of the resilience concept.
Figure 1. Resilience-based engineering asset management (modified from [9]).
The proposed methodology integrates and harmonizes the following functions and activities of an organization [21] (Figure 1): operation management; maintenance, monitoring and inspection management; asset management; general management; risk and emergency management; feedback and lessons learned; establishment of adequate equipment and system design criteria; R&D and innovation; and the legal and regulatory framework.
This approach has strong potential for supporting organizations in achieving sustainable development, satisfactory service and performance levels as well as economic viability. Normally, it is implemented through a combination of preventive and mitigating measures at the organizational, system and asset levels.
Asset management, as part of the proposed methodology, is defined by ISO 55000 as a set of coordinated activities of an organization to realize value from assets [7]. These activities have their own specificities and particular models, which interact to convert raw data into knowledge and then into appropriate actions. Their interactions and interdependencies are complex and characterized by significant intrinsic uncertainties [9]. Following best practices, asset management should be closely linked to the enterprise's strategic planning through the so-called "line of sight", which translates organizational objectives into asset management policy, strategy and objectives [7,21,26–28].
In the decision-making process, it is essential to strike the right balance between numerous competing interests and factors such as performance, risks, benefits, costs, opportunities, short-term goals vs. long-term sustainability, etc. Opportunities are sometimes considered positive risks in a broader risk-management context.
Modern electrical utilities attempt to address these challenges using various models and tools that help reduce uncertainties and better quantify risks within their asset management decision-making processes. These models and tools are typically based on traditional methods that have shown limited efficacy in dealing with complexity and uncertainty. Very little work has been published on how to link the information and insights obtained from quantitative models to the decision maker's needs. Furthermore, barely quantifiable or intangible factors (e.g., public perception, political influence, company reputation) may occasionally dominate final decisions, yet they are difficult to account for adequately. Future challenges therefore require new ways of thinking about a complex, interconnected and rapidly changing world [21,26,27,29–32].
The proposed RIDM methodology for asset management is a three-step process: (1) establish the decision-making framework, (2) perform detailed analyses and (3) deliberate and make final decisions (Figure 2) [21,26,27]. RIDM involves considering, appropriately weighting and integrating a range of complex inputs and insights into decision making. These inputs and insights may come from "traditional" engineering analyses, reliability analyses, occupational and health requirements, deterministic and probabilistic risk analyses, operational experience, cost-benefit considerations, regulatory requirements, climate change adaptation, allowable "time at risk" and any other relevant quantitative, qualitative and/or intangible influence factors and considerations. The process is deliberative and iterative [21]. The Canadian nuclear power industry defines RIDM as a process that provides a formalized, rational and systematic methodology for identifying, assessing and communicating the various factors that support a risk-informed decision [33]. It should be emphasized that this approach is more suitable for large projects addressing strategic concerns, such as long- and mid-term performance, major upgrades, replacements and sustainability issues; it is somewhat impractical for daily decision making. Risk assessment is only one part of the RIDM process, which must include a broader range of analyses (engineering, risk, economic, societal, environmental, regulatory, etc.) to be effective. Figure 2 also presents the participants involved, namely decision makers, management, stakeholders and subject matter experts (SMEs), along with their functional roles at each step.
It is important to note that Step 3 of RIDM in asset management is mainly performed by the decision maker, supported by SMEs, analysts and stakeholders (Figure 2). The aim of this qualitative and quantitative step is to gather all relevant insights for satisfactory decision making. Through the RIDM process, an organization gives the decision maker the authority and responsibility to make critical decisions. While ultimate responsibility for selecting from possible alternatives belongs to the decision maker, the evaluation can be performed in deliberative forums that may be held before the final decision is made. The final asset management decision should be made only after the deliberations have taken place. This is one of the differences between a risk-informed and a risk-based process.
Deliberations are necessary because some aspects of a particular decision cannot be considered formally or through models. It is important to understand that deliberations do not delegitimize the use of either scientific understanding or detailed quantitative analyses. The insights gained from Step 2 (detailed analyses) may lead to the formulation of additional decision alternatives, in which case it is necessary to return to Step 1, as indicated by the feedback loop (the decision diamond in Figure 2). Without sound analyses from Step 2, deliberative processes can lead to agreements that are unwise, misleading or unfeasible. Deliberation should therefore be understood as the final "sense-making" exercise in the decision-making process.
Once deliberations are complete, the final decision is made and documented. The organization has to provide the necessary resources to implement it. This may be done as part of the organization's regular activities or as a distinct project, using internal or external resources. The approach will depend on the scale and scope of the activities to be carried out as well as on the organization's internal rules of governance. Key stakeholders must be informed [21,26,27].
The proposed generic RIDM asset management approach that also integrates the concept of resilience (Figures 1 and 2) establishes a framework for a specific asset management methodology for electrical utilities, which is presented in the following sections.

Asset Management Approach for Electrical Utilities
In this section, we describe an enhanced methodology for RIDM in asset management for electrical utilities that takes their particularities into account. The methodology is based on the findings of research involving Hydro-Québec, a major North American electrical utility, and particularly its transmission division, known as HQT. It also takes into account other research in the field [1–4,6,7,10,11,21,26–28,32,34–36].
The comprehensive asset management model is presented in Figure 3. It was first developed at HQT and is now being implemented in close collaboration between HQT, Hydro-Québec's Research Institute (IREQ) and several universities as part of an R&D project called PRIAD (a French acronym for a robust decision-making support tool in asset management).
Figure 3. Functional relationships between the modules of the global asset management model (adapted from [9,26]).
The model consists of five main modules supported by a data analysis platform as well as an analytics module, which allows for the acquisition, exploration and analysis of historical data. The modules are designed to maximize the impact of any knowledge regarding asset behavior through a set of reliability-focused computing tools, which include: (a) an asset behavior model, (b) predictive models and a reliability database, (c) a simulator of events, (d) a transmission grid reliability simulator coupled with power flow software, (e) a risk model and (f) a planning and optimization module (Figure 3).
More details are provided on the asset behavior model, the grid reliability simulator (PRISME) and the risk model incorporating the VoLL methodology. All these modules belong to Step 2 (detailed analyses) in Figure 2, providing inputs and insights for the decision makers in Step 3. Each module has its own modeling approach and simulation capabilities. The modules communicate with and inform one another about the impact of given asset management strategies, from changes in asset behavior up to and including the impact on the power grid's financial performance.

Asset Behavior Models and Reliability Database
To reproduce the realistic behavior of the electrical power grid under various asset maintenance strategy scenarios, a robust asset behavior model is required. Such models aim to predict the effects of time-based preventive maintenance and replacement strategies on the reliability performance of various asset classes. They represent the physical degradation of each asset class by combining expert knowledge with advanced statistical analyses of failure data. Compared with purely statistical approaches, physics-based models are more robust when dealing with limited or poor-quality data, and they are easier for experts and decision makers to interpret. The hybrid approach is inspired by recent studies in the field [9,10,34–36] and is suited to the Hydro-Québec operating context.
The proposed approach is an extension of Failure Mode and Effects Analysis (FMEA) [37]. For each asset class, the approach aims to identify all failure modes and their degradation processes, including propagation times, without considering the effects of maintenance actions. Subsequently, the effects of the various maintenance actions on each degradation process are estimated. The model relies on the strong hypothesis that a conditional maintenance action will be carried out within a reasonable time after an inspection detects degradation beyond a predefined threshold. A deterministic algorithm based on reliability approaches and a physical model [38,39] predicts the equipment's failure rate as a function of the various systematic maintenance scenarios and the equipment's age. This model has been discussed in detail in recent publications [9,36] and is illustrated in Figure 4. Figure 4 shows the structure of the degradation model described above, as well as the model's predictions for two different scenarios after training on historical failure data. For example, in scenario A, which we call the pessimistic scenario, historical data show that when the maintenance strategy is unchanged (a factor of 1 on the inspection intervals), this asset class's failure rate is 0.015. If all inspection intervals are increased by 50% (factor of 1.5), the predicted failure rate is 0.033, a 120% increase. Conversely, if all inspection intervals are reduced by 50% (factor of 0.5), the predicted failure rate is 0.006, a 60% decrease. Because the model accounts for the granular impact of each maintenance type on each degradation process, the resulting behavior (i.e., the change in failure rate) is not linear.
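The model of [9,36] is not reproduced here. As a hedged illustration of why scaling all inspection intervals by a common factor changes the failure rate nonlinearly, the sketch below swaps in a simple delay-time model: each degradation process initiates defects at a constant rate, a defect turns into a failure after a fixed propagation time, and any defect present at an inspection is repaired. The class `DegradationProcess`, the functions and all numerical values are hypothetical; `relative_change` merely reproduces the percentages quoted above for scenario A.

```python
from dataclasses import dataclass


@dataclass
class DegradationProcess:
    """One hypothetical degradation process of an asset class."""
    name: str
    defect_rate: float          # defects initiated per year (assumed value)
    propagation_time: float     # years from detectable defect to failure (assumed value)
    inspection_interval: float  # current inspection interval in years (assumed value)


def undetected_fraction(propagation_time: float, interval: float) -> float:
    """Share of defects that fail before an inspection detects them, assuming defects
    start uniformly between inspections and always take `propagation_time` to fail."""
    if propagation_time >= interval:
        return 0.0  # every defect is caught by an inspection before it fails
    return (interval - propagation_time) / interval


def failure_rate(processes: list[DegradationProcess], interval_factor: float) -> float:
    """Failure rate (per year) when every inspection interval is scaled by interval_factor."""
    return sum(
        p.defect_rate
        * undetected_fraction(p.propagation_time, p.inspection_interval * interval_factor)
        for p in processes
    )


def relative_change(baseline: float, new: float) -> float:
    """Relative change, e.g. 1.2 for +120% and -0.6 for -60%."""
    return new / baseline - 1.0


if __name__ == "__main__":
    processes = [
        DegradationProcess("process A", defect_rate=0.02, propagation_time=2.0, inspection_interval=4.0),
        DegradationProcess("process B", defect_rate=0.01, propagation_time=5.0, inspection_interval=6.0),
    ]
    for factor in (0.5, 1.0, 1.5):
        print(f"factor {factor}: failure rate {failure_rate(processes, factor):.4f}")
    # Percentages quoted for scenario A: 0.015 -> 0.033 and 0.015 -> 0.006
    print(f"{relative_change(0.015, 0.033):+.0%}, {relative_change(0.015, 0.006):+.0%}")
```

With these assumed values, halving the intervals eliminates failures altogether (both propagation times exceed the shortened intervals), whereas lengthening them by 50% raises the rate by roughly 52%. The asymmetry comes from the threshold in `undetected_fraction`, the same kind of nonlinearity the text attributes to the granular model.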

Grid Reliability Simulator (PRISME)
PRISME, a French acronym for "PRogramme Intelligent de Simulation de la Maintenance et ses Effets" ("Intelligent program that simulates maintenance and its effects"), is a reliability simulator of transmission power grid behavior. It is a Monte Carlo simulator that runs a large number of simulations to assess the reliability of the power transmission network. As inputs, PRISME uses statistical models that characterize outage frequency and duration as well as maintenance events for various grid assets. The outage models represent the physical degradation process of each asset family by combining expert knowledge and statistical analyses of failure and maintenance data, such as those presented in the previous section [9].
Once these grid events are generated and applied to the simulated network, all their effects and impacts are recorded, analyzed and summarized. Because thousands of simulations are run, the archived results form a statistical sample that provides an understanding of the grid's overall reliability and resilience. PRISME is composed of two autonomous but complementary simulators that work in tandem. The first is a stochastic discrete-event simulator that simulates outage and maintenance events; its main objective is to evaluate the effects of concurrent unavailability. The second simulator relies on a commercial power flow engine, PSS/e, and employs a Contingency Analysis (CA) algorithm to identify electrical violations or interrupted loads [38] following unavailability events. The stability (transmission) and thermal limits of the main transmission grid are also analyzed within a reasonable computation time. In the next sections, we describe each simulator in detail and present a simple but comprehensive example of how the two simulators cooperate to deliver more realistic simulation results.

Power Flow Simulator
To evaluate the impact of equipment unavailability, every contingency scenario is simulated using a node-breaker network model in the PSS/e environment. Each piece of equipment, whether a transformer or a line, can carry a maximum amount of power determined by its thermal limits. Stability limits, in turn, depend on the network topology, so the unavailability of equipment can change them. We also ensure that reserves are always transportable. Two levels of analysis are used: in level 1, we perform a pre-contingency analysis, while in level 2, we identify the worst contingency case using a post-contingency analysis. The algorithm uses the following inputs: the detailed node-breaker network model, the period covered by the analysis, the list of unavailable equipment and the analysis level. Both levels of analysis rely on PSS/e. A detailed simulation was performed on the HQ system to determine the impact of equipment unavailability on the power system; the assessment does not distinguish between causes of unavailability (maintenance or outage). In addition to the HQ system, a fictional test network (Figure 5) is used as a baseline model that makes it possible to study various limit-violation scenarios. Using this test model, we provide a simplified demonstration of how the impact of the unavailability of transmission equipment is assessed. In this scenario, line L24-1 is isolated using the line switch breaker. As a result of the line tripping, the power transfer on line L24-2 exceeds its nominal level (thermal limit: 125 MVA), causing a violation on line L24-2. In addition, the interface stability limit (170 MW) is exceeded. In the CA algorithm, we prioritized re-dispatching generators over load shedding for economic and customer satisfaction reasons. We defined zone 1 and zone 2 as the dispatchable zones, while bus 5 is excluded from them.
Therefore, to eliminate the violations, only the generators at bus 1 and bus 3 are re-dispatched; if this is insufficient, load shedding is employed. In this case, the active power of these generators is decreased to their minimum power (75 MW), and the algorithm then applies a load adjustment of 52 MW.
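The priority order described above (re-dispatch generators in the dispatchable zones first, shed load only for what remains) can be sketched as a greedy procedure. This is not the CA algorithm implemented with PSS/e: for simplicity it assumes that each megawatt of generation reduction relieves exactly one megawatt on the violated element, whereas a real contingency analysis would use network sensitivities from a power flow. The `Generator` class, the function name and the generator outputs in the example are hypothetical; only the bus numbers and the 170 MW interface limit echo the test network.

```python
from dataclasses import dataclass


@dataclass
class Generator:
    bus: int
    p_mw: float          # current active power output
    p_min_mw: float      # minimum active power output
    dispatchable: bool   # True if the bus lies in a dispatchable zone


def relieve_violation(flow_mw, limit_mw, generators):
    """Greedy remedial action: reduce dispatchable generation first, then shed load.

    Assumes a unit sensitivity (1 MW of generation reduction relieves 1 MW of flow).
    Returns ({bus: MW reduced}, MW of load to shed).
    """
    excess = max(0.0, flow_mw - limit_mw)
    redispatch = {}
    for gen in generators:
        if excess <= 0.0:
            break
        if not gen.dispatchable:
            continue  # e.g. bus 5, which lies outside the dispatchable zones
        reduction = min(gen.p_mw - gen.p_min_mw, excess)
        if reduction > 0.0:
            redispatch[gen.bus] = reduction
            excess -= reduction
    return redispatch, excess  # any remaining excess must be covered by load shedding


if __name__ == "__main__":
    gens = [
        Generator(bus=1, p_mw=90.0, p_min_mw=75.0, dispatchable=True),
        Generator(bus=3, p_mw=80.0, p_min_mw=75.0, dispatchable=True),
        Generator(bus=5, p_mw=120.0, p_min_mw=50.0, dispatchable=False),
    ]
    print(relieve_violation(flow_mw=200.0, limit_mw=170.0, generators=gens))
```

With the hypothetical 200 MW interface flow, the 30 MW excess is covered by 15 MW at bus 1 and 5 MW at bus 3, leaving 10 MW to shed; bus 5 is never touched, mirroring its exclusion from the dispatchable zones.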

Stochastic Simulator
The stochastic simulator simulates outage and maintenance events using a Monte Carlo approach. It has the complex task of combining the effects of the unavailability of one or several pieces of equipment and computing useful indicators such as outage duration (i.e., downtime) and frequency for selected equipment or loads. Currently, two types of events can be simulated: unplanned events (i.e., outages) and planned events (i.e., maintenance). Fundamentally, maintenance events differ from outage events in that they can be rescheduled under certain conditions, whereas outages are invariant and cannot be postponed. Since the frequency and duration of events are described by statistical distributions, we use the inverse cumulative distribution function to determine when each event occurs and how long it lasts. A uniformly distributed random number U ∈ [0,1] is drawn, and the event timestamp is calculated according to Equation (1) in the case of an exponential distribution. The probability density function of the exponential distribution, where λ > 0 is the rate parameter and x is the random variable, is:
$$f(x;\lambda) = \lambda e^{-\lambda x}, \quad x \ge 0$$
The Cumulative Distribution Function (CDF) of the exponential distribution is:
$$F(x;\lambda) = 1 - e^{-\lambda x}, \quad x \ge 0$$
The Inverse Cumulative Distribution Function (ICDF) of the exponential distribution, where U is a uniformly distributed random number on [0,1], is:
$$x = F^{-1}(U;\lambda) = -\frac{\ln(1-U)}{\lambda} \tag{1}$$
A general view of how the events are analyzed and processed is presented in Figure 6. For example, suppose that we have two overlapping outages starting at times T1 and T2. The first event, from T1 to T4, is an outage of equipment E1. The second event is an outage of equipment E2 from T2 to T3 (represented by a green rectangle in the figure). Note that the second event occurs during the first. To adequately simulate the impact of each equipment failure, our stochastic simulator slices the original two events into three events.
The first event, from T1 to T2, simulates the unavailability of equipment E1 alone; the second, from T2 to T3, simulates the concurrent unavailability of E1 and E2; and the third, from T3 to T4, again simulates E1 alone. By splitting the original events into three events, the simulator can adequately simulate competing events, knowing exactly when they occur (e.g., from T2 to T3) and which equipment is involved (e.g., E1 and E2). The stochastic simulator relies on the power flow simulator to assess the electrical impact of having E1 and E2 unavailable at the same time. For the two remaining events, there is no need to electrically simulate the impact of losing E1 alone (between T1 and T2 and between T3 and T4), because the Hydro-Québec transmission grid is designed to tolerate at least one outage, meaning that no customers will be interrupted.
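The two mechanisms described in this section, inverse-CDF sampling of event times with Equation (1) and slicing of overlapping unavailability intervals, can be sketched as follows. This is a minimal illustration, not PRISME's code: the function names are hypothetical, events are represented as plain `(start, end, equipment)` tuples, and the numeric times in the example are assumed stand-ins for T1 to T4.

```python
import math
import random


def exponential_icdf(u: float, lam: float) -> float:
    """Inverse CDF of the exponential distribution: x = -ln(1 - u) / lam."""
    if lam <= 0.0:
        raise ValueError("the rate parameter must be positive")
    return -math.log(1.0 - u) / lam


def sample_time_to_event(lam: float, rng: random.Random) -> float:
    """Draw U uniformly on [0, 1) and map it through the inverse CDF."""
    return exponential_icdf(rng.random(), lam)


def slice_events(events):
    """Split overlapping unavailability intervals at every start/end time.

    `events` is a list of (start, end, equipment) tuples; each returned slice is
    (start, end, frozenset of unavailable equipment) with a constant equipment set.
    """
    times = sorted({t for start, end, _ in events for t in (start, end)})
    slices = []
    for left, right in zip(times, times[1:]):
        active = frozenset(eq for start, end, eq in events if start <= left and right <= end)
        if active:  # skip gaps where nothing is unavailable
            slices.append((left, right, active))
    return slices


if __name__ == "__main__":
    rng = random.Random(2024)
    print("time to next outage (years):", sample_time_to_event(lam=0.25, rng=rng))

    # E1 unavailable from T1 to T4, E2 from T2 to T3 (T1..T4 = 0, 2, 5, 8, hypothetical)
    events = [(0.0, 8.0, "E1"), (2.0, 5.0, "E2")]
    for start, end, equipment in slice_events(events):
        # Only slices with two or more unavailable items need a power flow (CA) run,
        # since the grid is designed to tolerate a single outage.
        needs_ca = len(equipment) >= 2
        print(start, end, sorted(equipment), "CA" if needs_ca else "no CA")
```

Applied to the example, `slice_events` returns the three slices [T1, T2) with {E1}, [T2, T3) with {E1, E2} and [T3, T4) with {E1}; only the middle slice would be handed to the power flow simulator. Writing `1 - u` inside the logarithm keeps its argument strictly positive, because `random.random()` returns values in [0, 1).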
Let us analyze another simple scenario, this time an interaction between planned and unplanned events. Figure 7 presents two events occurring one after the other. The blue rectangle is an unplanned outage of equipment E1 from time T1 to T3. The white rectangle is a planned maintenance event for equipment E2. The concurrent unavailability of E1 and E2 would cause power interruptions for customers. The stochastic simulator identifies this unwanted situation by requesting a CA from the power flow simulator, which returns the amount of undelivered energy. Consequently, the stochastic simulator automatically postpones the planned maintenance of E2, avoiding power interruptions to customers. As a result, the two original events are converted into one main event, as presented in Figure 7.
Once the stochastic simulator has made a high-level analysis, a detailed analysis is conducted to simulate the transmission power grid's protection mechanism. When outage or maintenance events occur, operators open or close a series of breakers and switches to isolate the outages from the rest of the grid. This is a more complex simulation involving a sequence of actions that depends on the nature and location of the outages, the interrupted customers, the degree of emergency, etc. For that reason, we developed a rule-based engine that simulates two kinds of rules, general and specific, to mimic the way human operators deal with planned and unplanned events. In contrast to specific rules, general rules do not target particular equipment. They are global and are applied before specific rules. General rules can be seen as always true, i.e., describing something that should always happen. For example, the closest breakers must open to isolate and de-energize the equipment affected by an outage event.
Another example of a general rule is that the closest switches open to isolate the equipment after it has been successfully de-energized by the breakers (Figure 8). Specific rules, on the other hand, state when a particular switch has to open or close. An example of a specific rule is "WHEN Switch(SW1,SW2) = open THEN switch(SW3) = close", which means that when switches SW1 and SW2 are open, switch SW3 has to close. To illustrate how specific rules work, consider the example of a substation. Energy is supplied by a line to power transformers T1 and T2. All the switches are closed except switch L1B3, which remains Normally Open (NO) for reliability reasons. It closes when breaker 120-1 is unavailable, bypassing the isolated section and transmitting the energy to the loads. In this case, the specific rule is: "WHEN Switch (1L1,1B1) = open THEN switch(L1B3) = close".
In rare cases, however, a breaker or switch does not behave as expected or designed and fails to operate when maneuvered by a human operator. This is known as a latent or dormant failure and is only detected when the equipment is maneuvered [39]. This kind of failure occurs when the equipment is partially damaged but still functions within specifications. Because the degradation does not affect the equipment's isolation or transit functions, it is hard to detect. However, when the equipment is operated or put under stress, the failure manifests itself, making the equipment inoperative. When switches or breakers refuse to open, the isolated subsection of the network becomes larger than expected, creating a risky configuration that leads to a higher number of interrupted customers and a drastic decrease in system reliability. As it is important to simulate this phenomenon adequately, we enhanced our rule engine by assigning a confidence level to each general and specific rule, so that every rule is triggered with a certain probability. In this way, PRISME simulates latent failures.
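A specific rule with a confidence level can be sketched as follows. The sketch is hypothetical and covers only specific rules of the form used above (all listed switches open, then close one switch); the `SpecificRule` class, `apply_rules` and the 0.95 confidence are illustrative assumptions, not PRISME's rule syntax or parameters. A failed random draw leaves the switch in its current position, which is how a latent failure shows up in a single Monte Carlo trial.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecificRule:
    """WHEN all switches in `when_open` are open THEN close `then_close`."""
    when_open: frozenset  # switch names that must all be open
    then_close: str       # switch commanded to close
    confidence: float     # probability that the command succeeds (1 - latent failure probability)


def apply_rules(state: dict, rules: list, rng: random.Random) -> dict:
    """Return a new switch state ("open"/"closed") after trying each rule once."""
    new_state = dict(state)
    for rule in rules:
        if all(new_state.get(sw) == "open" for sw in rule.when_open):
            if rng.random() < rule.confidence:
                new_state[rule.then_close] = "closed"
            # else: latent failure, the switch refuses to operate and stays as it was
    return new_state


if __name__ == "__main__":
    # Rule from the substation example: WHEN Switch (1L1,1B1) = open THEN switch(L1B3) = close
    rule = SpecificRule(frozenset({"1L1", "1B1"}), "L1B3", confidence=0.95)
    state = {"1L1": "open", "1B1": "open", "L1B3": "open"}  # breaker 120-1 isolated
    rng = random.Random(7)
    trials = 10_000
    closed = sum(apply_rules(state, [rule], rng)["L1B3"] == "closed" for _ in range(trials))
    print(f"L1B3 closed in {closed / trials:.1%} of trials")
```

Over many trials, the share of runs in which L1B3 closes approaches the rule's confidence level, so the simulated reliability indicators reflect the occasional refusal of the switch to operate.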

Example of an Output from PRISME
The PRISME simulator can process complex substations as well as a subset of interconnected substations. Currently, we are able to assess the reliability of a substation in less than two hours by running 100,000 simulations. In addition, we used PRISME to simulate a Hydro-Québec transmission sub-network composed of 5 interconnected substations and a total of 350 assets. It took between 9 and 11 h of processing time to obtain convergent reliability indicators. These processing times were made possible by the parallelization and optimization modules, which together reduced processing time by a factor of 10 to 100 compared with our previous results [9]. These two modules are beyond the scope of this paper.
Recently, we conducted a sensitivity study on the above-mentioned substation to quantitatively assess the benefits of having switch L1B3 in parallel with breaker 120-1 (Figure 8). Intuitively, one would conclude that redundant equipment improves the reliability of a system. However, the analysis is more complex because switch L1B3 also incurs maintenance expenses. Table 1 summarizes the simulation results for the analyzed substation. We simulated a 40-year period 100,000 times. Only unplanned events for the breakers, power transformers, switches and lines were simulated in this study. As Table 1 shows, breaker 120-1 averages less than one outage over a simulated 40-year period, while the number of power interruptions is 1.9; in other words, customers will be interrupted almost twice over the next 40 years. The cumulative interruption duration is 7.2 h with switch L1B3 in place, compared with 275 h without L1B3. The latter configuration would produce 2755 MWh of undelivered energy over the simulated 40 years. According to these results, switch L1B3 reduces lost energy by a factor of 100. In addition, our simulation results showed that the impact of switch L1B3 is greater toward the end of the lifespan of breaker 120-1: the energy saved by L1B3 is constant for the first 20 years of simulation and then increases significantly from 30 years onward. One possible recommendation would therefore be to install redundant equipment when the system becomes older and less reliable.
Finally, we confirmed a strong correlation between the switch operating delay (the delay between the occurrence of the event and the successful operation of the switch), the average interruption duration and the undelivered energy (MWh): the sooner the switch operates, the lower the interruption duration and the undelivered energy. For the analyzed substation, the undelivered energy projected over 40 years of simulation is expressed by the following formula:
$$E_{\mathrm{undelivered}}\ (\mathrm{MWh}) = 37\ \mathrm{MWh} + 0.9 \times h' \times 10\ \mathrm{MW}, \qquad 0 \le h' \le 4\ \mathrm{h}$$
where h' is the switch operating time in hours (maximum value of 4 h), 37 MWh is the undelivered energy caused by the transmission line and 10 MW is the average power consumption of the substation. This formula shows directly how reducing the operating time of L1B3 improves the substation's overall reliability. The resulting gain can be quantified, which can help justify improvement investments such as motorization and remote control.
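The formula can be evaluated directly to compare operating-time scenarios. The sketch below hard-codes the coefficients quoted above for this particular substation; the function names and the 0.5 h remote-control scenario are hypothetical illustrations, and the 0.9 coefficient is used as given since the text does not define it further.

```python
LINE_ENERGY_MWH = 37.0  # undelivered energy caused by the transmission line (40 years)
COEFFICIENT = 0.9       # coefficient as given in the formula (not further defined in the text)
AVG_LOAD_MW = 10.0      # average power consumption of the substation
MAX_OPERATING_TIME_H = 4.0


def undelivered_energy_mwh(h_prime):
    """Projected 40-year undelivered energy (MWh) for a switch operating time h' in hours."""
    if not 0.0 <= h_prime <= MAX_OPERATING_TIME_H:
        raise ValueError("h' must lie between 0 and 4 h")
    return LINE_ENERGY_MWH + COEFFICIENT * h_prime * AVG_LOAD_MW


def energy_saved_mwh(h_before, h_after):
    """Reduction in undelivered energy when the operating time drops from h_before to h_after."""
    return undelivered_energy_mwh(h_before) - undelivered_energy_mwh(h_after)


# Hypothetical scenario: manual operation at the 4 h maximum vs. 0.5 h after remote control.
print(energy_saved_mwh(4.0, 0.5))
```

In this hypothetical scenario the saving is about 31.5 MWh over 40 years, which could then be valued with VoLL to compare against the cost of motorization.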

Risk Model
The risk model uses relevant inputs from previous models. It aims to quantify and translate the inputs into monetary values whenever possible. Risks are expressed through relevant costs and potential losses (monetized risk). All categories of consequences are calculated in monetary terms. Multiplying the consequences' monetary value by the frequency of various events obtained from upstream modules produces the expected value of those consequences or monetized risks. Since both the monetary values and frequencies are stochastic variables, the risk categories and their total values are expressed through various statistical distributions (Figure 9). The following categories of consequences and risks are considered: operability, reliability, availability and maintainability (RAM), environmental, financial, regulatory, occupational safety and health (OSH), company reputation and any other risks that may be considered relevant. The interdependence between such risks is also analyzed to optimize risk management strategies. Figure 9 shows the workflow for the risk calculation using the Monte Carlo simulation. It is part of the comprehensive process depicted in Figure 3.
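The product "frequency × consequence" can be turned into a distribution by Monte Carlo sampling. The sketch below is a minimal illustration under assumptions of our own: events arrive as a Poisson process with a constant annual rate, and each event's cost follows a normal distribution truncated at zero. The function name and all input values are hypothetical; in the framework described here, rates and costs would come from the upstream asset behavior models and PRISME. The sample mean estimates the monetized risk (here about rate × horizon × mean cost), and a percentile summarizes its tail.

```python
import random
import statistics


def simulate_monetized_risk(rate_per_year, horizon_years, cost_mean, cost_sd,
                            n_runs=20_000, seed=1):
    """Return one total consequence cost per Monte Carlo run.

    Events arrive as a Poisson process (exponential inter-arrival times) and each
    event's monetary consequence is drawn from a normal distribution truncated at 0.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        t, total = 0.0, 0.0
        while True:
            t += rng.expovariate(rate_per_year)
            if t > horizon_years:
                break
            total += max(0.0, rng.gauss(cost_mean, cost_sd))
        totals.append(total)
    return totals


# Hypothetical inputs: 0.1 events/year over 40 years, 1000 A.U. (sd 200) per event.
costs = simulate_monetized_risk(0.1, 40.0, 1000.0, 200.0)
expected = statistics.fmean(costs)           # Monte Carlo estimate of the monetized risk
p95 = statistics.quantiles(costs, n=20)[-1]  # 95th percentile of the cost distribution
print(round(expected), round(p95))
```

Running the same function once per risk category and summing the per-run totals would give the aggregated distribution, provided the categories are sampled from the same simulated events.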
The costs of preventive and planned systematic maintenance activities, including asset replacement, are calculated simultaneously with the costs of reparable and lethal outage events. The PRISME simulator determines the potential energy delivery interruptions at the system level, which are then translated into monetary losses using VoLL.
The Monte Carlo simulation produces the expected value and the aggregated distribution of the individual costs and losses, as well as of the total cost. Each maintenance strategy changes the failure rates calculated by the asset behavior models, with different impacts at the system level, which translates into complex cost distributions (Figure 9). These results are exported to the planning and optimization module to improve the maintenance strategy and minimize total cost.

The Value of Lost Load Calculation
The VoLL is an indicator reflecting the economic and inconvenience impact of the loss of power supply on users. It is now recognized as a critical parameter for determining the optimum level of reliability.
VoLL assigns an economic value to the energy not delivered as a result of planned or unplanned events. It is generally expressed in $/kWh or $/MWh. From a socioeconomic perspective, VoLL is an important indicator of the economic consequences of power blackouts and of the monetary value of power supply continuity [15].
This indicator is used in many applications, such as distribution and transmission system quality incentive programs, energy legislation and reliability standards [18], load reduction contracts [14], system investment decisions [16], cost-benefit analyses [17] and evaluations of customer interruption costs. However, most of these applications reduce VoLL to a single constant value [12], often using time to failure and/or recurrence frequency. This considerably reduces the "explanatory power" of this indicator, which depends on a large number of variables [13,20]. A thorough knowledge of VoLL is required to make sound reliability decisions.
This section describes an effective model that makes more dynamic use of VoLL in the overall resilience and RIDM asset management model presented in Figures 1 and 2.
VoLL is determined by relating the monetary loss arising from a power outage (as a result of inconvenience and lost economic activity) to the quantity of energy not supplied during the interruption. This is sometimes referred to as the "willingness to pay to avoid the consequences". VoLL measurement techniques fall into two types: direct (survey) methods, which obtain information on power interruption costs directly from end users, and indirect methods, which rely on other sources of information such as statistical data. Each type encompasses a wide variety of techniques that have a fundamental influence on the calculated VoLL [12][13][14][15][16][17][18][19][20].
In this article, we focus on the macroeconomic approach (an indirect method). This type of analysis is widely used by regulators worldwide to quantify power grid reliability simply and quickly using available macro indicators. Its main advantage is that only a small amount of data is required. The approach is presented below.

Approach to Determining VoLL
For industrial and commercial customers, an interruption results in a lost opportunity to produce and sell goods and services. For these customers, the macroeconomic approach uses the production function method to calculate VoLL. The consequences of an interruption are calculated in terms of unsupplied kWh, assuming that industrial customers are entirely dependent on electricity and that power outages bring production lines to an abrupt halt. VoLL is then calculated as the annual Gross Domestic Product (GDP) divided by the annual industrial and commercial electricity consumption, and represents the potential reduction in GDP during the interruption. More precisely, the consequence of a production line shutdown is measured using the Gross Value Added (GVA), as follows:
$$\mathrm{VoLL}_{\mathrm{ind}} = \frac{\mathrm{GVA}}{\mathrm{ELC}}$$
where ELC (MWh/year) is the annual industrial electricity consumption. The value of GVA is determined as follows:
$$\mathrm{GVA} = \mathrm{GDP} - \mathrm{TP} + \mathrm{SP}$$
where SP ($) denotes subsidies on products and TP ($) taxes on products.
For residential customers, a power outage limits their freedom to manage their time as they see fit, which theoretically warrants compensation. The indicator therefore considers the value of leisure time lost due to a power outage. Following published research, two assumptions are made when determining VoLL. First, 100% of the monetary value of leisure is assumed to depend on electricity. Second, the same hourly value is used for employed and unemployed people. This method is called "Household Income." VoLL is calculated by dividing the total value of annual leisure time by the total annual household electricity consumption:
$$\mathrm{VoLL}_{\mathrm{res}} = \frac{\mathrm{LV}}{\mathrm{ELCh}}$$
where LV ($) is the household leisure value and ELCh (MWh/year) is the annual household electricity consumption.
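The two ratios above reduce to simple divisions once the macro indicators are known. The sketch below uses invented figures purely for illustration; the function names are hypothetical, and the household leisure value LV is taken as an input rather than computed from the substitutability and unemployment adjustments described next.

```python
def gross_value_added(gdp, taxes_on_products, subsidies_on_products):
    """GVA = GDP - TP + SP (all in $)."""
    return gdp - taxes_on_products + subsidies_on_products


def voll_industrial(gva, annual_consumption_mwh):
    """Industrial/commercial VoLL in $/MWh: value added per MWh consumed (GVA / ELC)."""
    return gva / annual_consumption_mwh


def voll_residential(leisure_value, household_consumption_mwh):
    """Residential VoLL in $/MWh: leisure value per MWh consumed by households (LV / ELCh)."""
    return leisure_value / household_consumption_mwh


# Invented figures, for illustration only (not Québec statistics).
gva = gross_value_added(gdp=100e9, taxes_on_products=5e9, subsidies_on_products=1e9)
print(voll_industrial(gva, annual_consumption_mwh=40e6))                       # $/MWh
print(voll_residential(leisure_value=30e9, household_consumption_mwh=60e6))   # $/MWh
```

Dividing the $/MWh results by 1000 gives $/kWh, the other unit commonly used for VoLL.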
The total value of leisure that is dependent on electricity is calculated as follows: where, the substitutability factor ( ) reflects the proportion of leisure time that depends on electricity and the unemployed coefficient ( ) reflects the lower value given to an hour of leisure by those who do not work [20].
Applied across a variety of sectors of activity, this method can underestimate VoLL, and a multiplicative factor is used to reduce this bias. For large companies, the macroeconomic analysis excludes indirect costs (physical deterioration of equipment, loss of semi-finished products and raw materials; restart costs such as lost productivity and personnel costs) and assumes that the production line restarts immediately after an outage. However, internal and external experience feedback from large industrial enterprises shows that a short power outage (one hour or less) can shut down a production line for many hours or even days. A multiplicative factor (MF) must therefore be applied to account for the lost productivity and the restart of the production assets. For commercial enterprises and smaller industries, the macroeconomic analysis excludes raw materials loss (e.g., food for grocery stores) and assumes that these businesses sell products continuously (24/7, 365 days a year). This assumption overestimates the hourly electricity consumption required to generate GDP. Since small commercial and industrial enterprises generate income during 70 h per week, an MF must also be applied to them [17].
While a multiplicative factor must be applied to improve VoLL estimates, it is not sufficient in itself: VoLL is still treated as a single constant value for industrial and residential customers. The macroeconomic approach can be applied at the level of a specific region or district, a group of industries or even a large company, but this flexibility has not been fully exploited. The idea developed in this paper is to apply the macroeconomic approach to microeconomic data. The proposed approach considers several parameters that affect the cost of an outage, such as the activity, industry, season, region and duration of the interruption. In addition to its frequent use in long-term reliability planning, VoLL estimated in this way can also be applied in short-term operational planning.
To take advantage of this method, VoLL is estimated by sector of activity. The data is classified according to the North American Industry Classification System (NAICS), a classification of business establishments by economic activity (production process) developed by the national statistical agencies of Canada, Mexico and the United States. NAICS uses a six-digit code at the most detailed industry level; only the first two digits, which designate the broadest business sector, are considered in this analysis (Table 2). When estimating VoLL, it is also important to consider the duration of the interruption, so the approach developed in this paper introduces an outage duration factor.
Note: If i corresponds to the residential class, no NAICS code is assigned; in this case, the code SC00 (non-industrial) is assigned to the variable "sec" in Equation (9). Likewise, when the activity sector is not specified, the code SC99 (all sectors) is assigned to "sec."
The parameters in Equation (9) are determined as follows:
Multiplicative factor: in accordance with [17,18], MF = 8 ± 2 is used for medium and large enterprises and MF = 3 ± 1 for small enterprises. No multiplicative factor is applied to the residential class, i.e., MF = 1.
Variation rate: the variation rate is calculated using the interruption cost data (Table 3) published in the report "Updated Value of Service Reliability for Electric Utility Customers in the United States," Lawrence Berkeley National Laboratory [19].
Table 3. Estimated interruption cost per unserved kWh (USD 2013) by duration and customer class [19].
The variation rate derived from the data in Table 3 is shown in Table 4. The 30-min interruption is used as the benchmark (100%) (for example: , , × 100 = 510). The variation rate, expressed as a percentage of the cost ($/kWh), is shown in Figure 10.
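The sector-code assignment and the multiplicative factors described above can be expressed as small lookup functions. This is a hypothetical sketch: the class labels ("residential", "small", "medium_large") and the override table are our assumptions. The override mapping NAICS 48 and 49 to SC4B mirrors the code used for transportation and warehousing in the case study below; how other multi-code NAICS sectors are labeled is not specified in the text. The example codes 622110 (general medical and surgical hospitals) and 485110 (urban transit systems) are standard six-digit NAICS codes used only as inputs.

```python
# Nominal multiplicative factor (MF) and its stated uncertainty for each customer class.
MULTIPLICATIVE_FACTOR = {
    "residential": (1.0, 0.0),   # no multiplicative factor applied
    "small": (3.0, 1.0),         # MF = 3 ± 1
    "medium_large": (8.0, 2.0),  # MF = 8 ± 2
}

# Hypothetical override table: the case study labels transportation and
# warehousing (NAICS 48-49) as SC4B; other multi-code sectors are not specified.
SECTOR_OVERRIDES = {"48": "4B", "49": "4B"}


def sector_code(customer_class, naics=None):
    """Return the 'sec' code used to index VoLL by class and sector."""
    if customer_class == "residential":
        return "SC00"                  # residential: no NAICS code
    if not naics:
        return "SC99"                  # sector unspecified: all sectors
    two_digit = str(naics)[:2]         # keep only the broadest NAICS level
    return "SC" + SECTOR_OVERRIDES.get(two_digit, two_digit)


def mf_range(customer_class):
    """Return (low, high) bounds of the multiplicative factor for a class."""
    nominal, spread = MULTIPLICATIVE_FACTOR[customer_class]
    return nominal - spread, nominal + spread


print(sector_code("medium_large", "622110"))  # SC62
print(sector_code("medium_large", "485110"))  # SC4B
print(mf_range("small"))                      # (2.0, 4.0)
```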
Based on the data in Table 4, the variation rate is modeled using the following formula:

Interruption Duration
The model given by Equation (11) behaves as a constant for long interruption durations and varies as 1/t for short durations. Using the data in Table 4 and the least-squares method, the parameters m and b of Equation (11) were determined for the three customer classes; they are presented in Table 5.
Table 5. Parameters of Equation (11).
The cost of an interruption for each customer class and sector is obtained from the corresponding VoLL and the Expected Energy Not Served (EENS), expressed in kWh. The total customer interruption cost caused by the outage is equal to the sum of these costs over all affected customers, where the outage duration (h) is specific to each class i and sector sec.
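Equation (11) did not survive extraction, but its description (constant for long durations, proportional to 1/t for short ones, with parameters m and b) is consistent with the form r(t) = m/t + b, where r is the variation rate and t the outage duration in hours. Assuming that form, substituting x = 1/t makes the model linear, so ordinary least squares reduces to the closed-form simple-regression formulas. The data points below are placeholders generated from known parameters, not the values of Table 4, and the function names are hypothetical.

```python
def fit_variation_rate(durations_h, rates_pct):
    """Ordinary least-squares fit of the assumed form r(t) = m / t + b.

    Substituting x = 1 / t makes the model linear in x, so the usual
    closed-form simple-regression formulas give the slope m and intercept b.
    """
    if len(durations_h) != len(rates_pct) or len(durations_h) < 2:
        raise ValueError("need at least two (duration, rate) pairs of equal length")
    xs = [1.0 / t for t in durations_h]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rates_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates_pct))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def variation_rate(t_h, m, b):
    """Variation rate (% of the 30-min benchmark cost) for an outage lasting t_h hours."""
    return m / t_h + b


# Placeholder data (NOT Table 4): generated from m = 40, b = 20, so r(0.5 h) = 100 %.
durations = [0.5, 1.0, 4.0, 8.0, 16.0]
rates = [variation_rate(t, 40.0, 20.0) for t in durations]
m, b = fit_variation_rate(durations, rates)
print(round(m, 6), round(b, 6))
```

Fitting in the transformed variable keeps the problem linear; fitting r(t) directly with a nonlinear solver would give the same answer on noise-free data but weight residuals differently on real data.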

Validation of the VoLL Methodology
The case study shows how VoLL is applied for a major power outage that occurred at a strategic Hydro-Québec transmission substation. During the event, the first contingency involved approximately 700 MW. A total of 275,000 customers were affected, including two major manufacturers, two hospitals, numerous small enterprises and part of a subway line. Traffic lights were also out of service.
Service to the two hospitals, subway line and industrial sector (priority P1 and P2 customers) was completely restored after an outage duration of 1.82 h. The outage duration for the residential sector was 3.49 h.
The estimated VoLL for the residential sector (Equation (9)) is:
The VoLL for the industrial sector takes several factors into account. GDP: under this approach, the GDP generated by each industry during the power outage has to be known. In the absence of this information, this study uses the GDP for the quarter of the year in which the interruption occurred. The following industries are considered: transportation and warehousing (SC4B), health care and social assistance (SC62) and industrial (all sectors combined, SC99), which represent the subway line, the two hospitals and the two manufacturers, respectively. According to the Institut de la statistique du Québec (Québec Statistical Institute) [40,41], these industries respectively contribute up to
Hydro-Québec (HQ) is owned by the Government of Québec [42]. It manages 62 hydroelectric power plants with approximately 38 GW of installed generation capacity (mostly in northern Québec and Labrador), as well as 353 hydroelectric generators. It operates the most complex and extensive transmission system in North America, with over 34,000 km of high-voltage lines (49 kV to 735/765 kV), including more than 11,000 km of 735-kV lines, and 533 transmission substations; this system transports electrical energy to customers under very harsh climatic and natural conditions. Approximately 85% of the load is concentrated in southern Québec, in the Montréal metropolitan loop and the Québec City area. HQ has 15 strategic interconnections with neighboring grids for exports and imports (Ontario, New Brunswick and the northeastern United States). Its distribution grid comprises more than 118,000 km of medium-voltage and more than 107,000 km of low-voltage lines, over 680,000 pole-mounted transformers and more than 3000 distribution substations.
The substation under study has a peak load of 500 MW and an average load of 325 MW. Two 315-kV power lines feed the substation, which supplies power to customers via ten 120-kV power lines. It is equipped with 4 power transformers, a static compensator, 24 circuit breakers, 2 capacitor banks and 88 switches.
For this case study, the substation load is estimated to comprise 65,000 residential customers, 5000 small enterprise customers and 3 large enterprises. Energy consumption is distributed approximately as follows: 41% for residential customers, 27% for small enterprises and 32% for large enterprises. For confidentiality reasons, the facility system diagram is not presented in this paper.
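To show how the total interruption cost adds up over customer classes, the sketch below computes EENS as load × share × duration for each class and multiplies it by a per-class VoLL. This is a simplified illustration under stated assumptions: the load is held constant at the 325 MW average quoted above purely as an illustrative input, the energy shares are those just given, the 3.49 h and 1.82 h durations come from the event description (residential and industrial customers), the small-enterprise duration is assumed equal to the residential one, and all VoLL values are placeholders rather than results of Equation (9). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AffectedClass:
    name: str
    energy_share: float   # fraction of the interrupted load
    duration_h: float     # outage duration for this class
    voll_per_kwh: float   # VoLL applied to this class, $/kWh (placeholder values below)


def eens_kwh(load_mw, c):
    """Expected energy not served for one class, assuming a constant load during the outage."""
    return load_mw * 1000.0 * c.energy_share * c.duration_h


def total_interruption_cost(load_mw, classes):
    """Sum of VoLL x EENS over all affected customer classes."""
    return sum(c.voll_per_kwh * eens_kwh(load_mw, c) for c in classes)


classes = [
    AffectedClass("residential", 0.41, 3.49, voll_per_kwh=5.0),
    AffectedClass("small enterprises", 0.27, 3.49, voll_per_kwh=30.0),  # duration assumed
    AffectedClass("large enterprises", 0.32, 1.82, voll_per_kwh=20.0),
]
print(round(total_interruption_cost(325.0, classes)))
```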
The reliability of the substation assets is then modeled using the model described in Section 3.1. In particular, the model predicts changes in the equipment failure rate resulting from changes in the maintenance inspection intervals, as illustrated in Figure 4.
These failure rates, modeled as statistical distributions, are then fed into the PRISME simulator to determine the lost load (energy not delivered to customers) due to equipment failures resulting in unavailability (Figure 9). In the current study, only reparable and lethal failures are considered. Other events, including planned outages for inspections, foreign interference, human error, weather events and switching failures, will be included in an upgraded version of PRISME. For this study, 100,000 simulations were carried out for a 40-year analysis period, using 7 different inspection intervals ranging from 0.5 to 2.0 times the current periodicity, to which a relative value of 1 is assigned (Figure 11). Using the lost load obtained from PRISME, we then estimate the value of the potential undelivered energy with the methodology described in Section 3.4. The results are shown in Figure 11 for VoLL, the maintenance inspection costs and their sum, expressed in arbitrary monetary units (A.U.).
As depicted in Figure 11, the current inspection intervals for the assets under study (relative value of 1) yield the minimum overall cost. Increasing the maintenance inspection interval decreases preventive maintenance costs (orange line in Figure 11). However, it also significantly increases the cost of undelivered energy (blue line in Figure 11), and therefore increases total costs by an amount that cannot be offset by the reduction in preventive maintenance costs.
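The trade-off in Figure 11 can be reproduced qualitatively with two toy cost curves evaluated on 7 relative intervals from 0.5 to 2.0. The curve shapes and parameters below are hypothetical choices of ours, not the models behind Figure 11: preventive cost is taken as proportional to inspection frequency (1/interval) and the undelivered-energy cost as growing quadratically with the interval. With these arbitrary parameters the minimum happens to fall at the current interval.

```python
def relative_intervals(n=7, low=0.5, high=2.0):
    """n inspection intervals evenly spaced from `low` to `high` (current interval = 1)."""
    step = (high - low) / (n - 1)
    return [low + k * step for k in range(n)]


def preventive_cost(x, a=2.0):
    """Hypothetical: inspection cost proportional to inspection frequency (1 / interval)."""
    return a / x


def undelivered_energy_cost(x, c=1.0):
    """Hypothetical: VoLL cost growing convexly as inspections are spaced out."""
    return c * x ** 2


def best_interval(intervals):
    """Return the interval with the lowest total cost, plus the full cost table."""
    totals = {x: preventive_cost(x) + undelivered_energy_cost(x) for x in intervals}
    return min(totals, key=totals.get), totals


best, table = best_interval(relative_intervals())
for x, total in table.items():
    print(f"{x:.2f}  {total:.3f}")
print("minimum at", best)
```

Because the two curves move in opposite directions, the total has an interior minimum; in the real study the curves come from PRISME and the VoLL model rather than closed-form expressions.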
Thus, any change in maintenance periodicity would increase total costs and is therefore not recommended in the current situation. Such a recommendation is covered by Step 2 of the RIDM approach (Figure 2) and provides input for the decision makers who make the final decisions regarding potential changes to maintenance strategies (Step 3 in Figure 2). Decision makers should also consider other factors, such as the annual maintenance window and the grouping of equipment maintenance needs.
It is important to stress that the inputs considered in this case study are only partial, since the maintenance strategy is designed at the system (power grid) level. Many event types and cost categories are not considered at this preliminary stage, such as corrective maintenance, replacement, environmental impact, occupational health and safety, legal requirements, public perception and any other cost category deemed relevant. They will be considered as part of the continuing development of the global model. This more holistic approach leads to system-level optimization, as shown in Figure 3.

Discussion
The methodology and case study presented above illustrate the relevance of considering the concepts of asset management and resilience holistically for electrical utilities. The applicability of the asset behavior model, the power grid stochastic simulator PRISME and the monetized risk model, including the enhanced approach to calculating VoLL, has been demonstrated for a major North American electrical utility, Hydro-Québec. In future analyses, the PRISME simulator will provide the interruptions (in MWh) affecting customers across the power grid (system-level analysis). The cost of those interruptions will be calculated using the proposed VoLL methodology and will be a major input for calculating monetized risks, as presented in Figure 9. Together with other inputs, these results will support sound risk-informed decision making in asset management, as presented in Figure 2. Future work will include further improvements to the methodology through real-life applications, as well as the development of new knowledge, where deemed necessary, through R&D and innovation. Moreover, the relationship between resilience and asset management strategies (Figure 1) should be investigated further in order to provide a more effective decision-making framework in the context of major uncertainties and a complex operational and business environment.

Conclusions
This paper presents an integrated framework for a comprehensive decision-making process in asset management for electrical power utilities, considered as complex adaptive systems, based on RIDM and the concept of resilience. It aims to maximize value from assets, optimize resources and ensure the utility's overall sustainability and resilience in a complex operational and business environment marked by significant uncertainties. The approach developed here is part of the PRIAD research project, a collaboration between Hydro-Québec TransÉnergie (HQT) and Hydro-Québec's Research Institute (IREQ). This paper describes in greater detail the key modules of the proposed methodology: the asset behavior model, the power grid stochastic simulator (PRISME), the risk module for monetizing risks and the enhanced approach to calculating VoLL and incorporating it into the asset management process as a whole.
A case study related to a major outage at a strategic Hydro-Québec TransÉnergie substation demonstrated the applicability of the proposed methodology. The study showed, among other things, that the methodology has strong potential for supporting the strategic decision-making process in asset management at HQT. Other electrical utilities may tailor the approach to their specific context.