A Failure Modes and Effects Analysis Framework for Assessing Geotechnical Risks of Tailings Dam Closure

Abstract: Tailings dams remain on site following mine closures and must be designed and reclaimed to meet long-term goals, which may include walk-away closure or long-term care and maintenance. The underperformance of these structures can result in significant risks to public and environmental safety, as well as impacts on the future land use and economic activities near the structure. In Alberta, Canada, the expectation is for a tailings dam to be reclaimed and closed so that it can undergo deregistration. To aid in assessing the risks of underperformance during and after closure, a Generalized Failure Modes and Effects Analysis (G-FMEA) framework was developed to assess the long-term geotechnical risks for tailings dams in Alberta, with the goal of assessing the potential success of a tailings dam closure strategy. The G-FMEA is part of an initiative to enhance closure evaluations in Alberta in a collaborative effort between industry, the regulator, and academia. The G-FMEA incorporates the element of time to account for the evolution of the system, which should be applied at the planning stage and updated continually throughout the life of the facility. This paper presents the developed G-FMEA framework for tailings dams in Alberta, including the developed risk matrix framework.


Introduction
Historically, tailings dams were designed with the consideration of geotechnical failure modes that could occur during the active life of the mine. Little consideration was given to closure of the facility and the long period afterwards. This approach has resulted in many tailings dams that are in a poor position for closure or have been orphaned altogether. This is problematic as the failure of these structures can still occur after the closure of a mine site, such as at the Matachewan Consolidated Mine in Ontario [1]; or when the mine is in a period of care and maintenance, such as at Obed Mountain Coal Mine in Alberta [2]. Regardless of whether the mine is active or not, tailings dams can pose a significant risk to the public and the environment if not properly managed or decommissioned. In the medium-to-long term, after decommissioning, uncertainties associated with the performance of these structures increase, as there is limited information regarding how a tailings dam ages in perpetuity. This knowledge gap poses a significant environmental, public, and financial risk, especially when combined with the serious consequences associated with the failure of tailings dams. As a result, it is vital that risk management practices be developed to assess the risks associated with the long-term behavior. Mitigation strategies should be adopted early in the life of the structure, or as soon as they are practical for active or closed facilities. There is a transition towards integrating sustainable mining practices at the outset of mining. Of note, the Landform Design Institute (LDI) advocates for mining with an end in mind, noting that there will always be residual risks present following the closure of a mine [3]. Managing the long-term residual risks requires a transparent mechanism that includes collaborative design, construction, reclamation, and aftercare [3].
In Alberta, Canada, the mining industry consists primarily of coal and oil sands extraction. The external tailings facilities (ETFs) at these mines are at various points in their lifecycles. With some facilities reclaimed or in the process of being reclaimed, it is important that systems are in place to aid operators in understanding the regulator's requirements for the process of deregistration. Deregistration is the process where the governmental body in charge of regulating these structures (i.e., Alberta Energy Regulator, AER) assesses a dam to determine if it can be removed from the dam and pond registry and no longer be regulated as a dam; Alberta is making ongoing progress towards this goal. This is evident in the release of the Alberta Dam and Canal Safety Directive (the Directive) and Manual 019: Decommissioning, Closure, and Abandonment of Dams at Energy Projects (Manual 019) [4,5]. Manual 019 indicates that "in closure, a dam operator provides active care and may transition to passive care, depending on site-specific circumstances. As a result, the Directive adopts a number of formal risk-management principles to ensure the long-term care and maintenance of tailings dams after the construction and operation phase has ended" [5].
With the impending closure of a number of external tailings facilities in Alberta, a Generalized Failure Modes and Effects Analysis (G-FMEA) framework was developed to assess the landform design for closure. This assessment should ideally be conducted before mining begins (during permitting) and updated as the project progresses. Ultimately, the goal of the G-FMEA framework is to aid in managing the residual risks associated with ETFs in both a practical and economical way. The residual risks must be acceptable to stakeholders.
In order to complete a comprehensive risk assessment, it may be necessary to conduct various levels of modelling to evaluate the behaviour and/or assess consequences. For example, in operational risk assessments, runout modelling and inundation mapping are important elements of the risk assessment, as noted by Ghahramani et al. [6]. The utility of such assessments for conducting a long-term closure risk assessment for the purposes of deregistration may need to be evaluated in the context of the regulator. For example, the AER (Manual 019) requires flowable tailings to be removed or mitigated for a tailings facility to be deregistered, such that they do not pose an unacceptable risk to dam safety [5]. Similar criteria regarding flowable tailings are outlined by Al-Mamun and Small [7]. While further advancements and an increased confidence in the results of the runout modelling and inundation mapping are useful for understanding the risks associated with flowable tailings, they may not be useful tools within the current deregistration framework. However, they remain useful tools for assessing the long-term risks of tailings facilities where it is not possible to remove the flowable tailings or for orphaned facilities.
The G-FMEA will fit most structures and failure modes but may highlight failure modes that require additional analyses, such as quantitative risk assessments. As such, the G-FMEA is intended to be used as a screening tool for the closure phase of the life cycle of an external tailings facility, where the risks assessed as acceptable require no further analyses and higher risks (or multiplicity of relatively low risk) can trigger more detailed and/or quantitative approaches. The G-FMEA is, therefore, meant to investigate the potential of the facility to be deregistered as a dam by investigating the risk of geotechnical failure. This paper presents the developed G-FMEA framework, including a developed risk matrix framework. The G-FMEA charts developed as part of the framework present the failure modes that are applicable to closure and should not be used to assess dam safety during construction or operation.

FMEA Background
FMEA is a top-down risk assessment tool that allows for the systematic identification and analysis of the different failure modes and their associated consequences [8,9]. FMEA aims to identify all potential failure modes, the consequences of these failure modes, the mechanisms of failure, and how the risks associated with the failures can be avoided or controlled [10]. The entire process is based on determining what happens if a specific component or element of the system fails [11]. When extended and applied to a dam, FMEA focuses on developing a clear picture of the dam, including the various components and how they interact in a systematic way [11]. Using this basis, FMEA can be used to evaluate how component failures can lead to overall system failures, the consequences of component and system failures, and the criticality of various components for risk control [11].
FMEA is generally site-specific and considers the site data, construction method, construction materials, and past behaviour of the structure [12]. As a result, it can be a time-consuming, complex process (occasionally even requiring field or laboratory investigations). Despite this, it has proven to be an extremely useful tool as it allows risks to be assessed and managed. As shown by Santos et al., FMEA is effective in demonstrating the potential failure modes of all components of a tailings dam system, which then allows for the development of a comprehensive dam monitoring and surveillance system [13]. To conduct an FMEA, the following structure can be used [11]:
1. Define the system, including all components.
2. Based on component interaction, de-aggregate the system into functional sub-systems.
3. Break the sub-systems down into key elements and functions.
4. Analyze the failure modes of the different elements.
5. Assess the failure effects and consequences of the various elements.
6. Repeat as necessary.
Steps 4 and 5 involve analyzing the failure mode, effects, and consequences of the different elements. This is an important step as it allows the effect of component failure modes on other components of the sub-system and the overall system to be evaluated.
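The decomposition in the steps above can be sketched as a small data structure. This is a minimal illustration only: the class names, the example sub-system, element, failure mode, and effect are all hypothetical and are not taken from the G-FMEA charts.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    name: str
    effects: list[str] = field(default_factory=list)  # Step 5: effects and consequences


@dataclass
class Element:
    name: str
    function: str  # Step 3: key elements and their functions
    failure_modes: list[FailureMode] = field(default_factory=list)  # Step 4


@dataclass
class SubSystem:
    name: str  # Step 2: functional sub-system
    elements: list[Element] = field(default_factory=list)


@dataclass
class System:
    name: str  # Step 1: the overall system
    sub_systems: list[SubSystem] = field(default_factory=list)


def worksheet_rows(system: System) -> list[tuple[str, str, str, str]]:
    """Flatten the hierarchy into (sub-system, element, failure mode, effects) rows."""
    rows = []
    for sub in system.sub_systems:
        for element in sub.elements:
            for mode in element.failure_modes:
                rows.append((sub.name, element.name, mode.name, "; ".join(mode.effects)))
    return rows


# Hypothetical example content, for illustration only.
dam = System(
    name="Example tailings dam",
    sub_systems=[
        SubSystem(
            name="Drainage system",
            elements=[
                Element(
                    name="Toe drain",
                    function="Lower the phreatic surface",
                    failure_modes=[FailureMode("Clogging", ["Rising pore pressure"])],
                )
            ],
        )
    ],
)
rows = worksheet_rows(dam)
```

Flattening the hierarchy into rows keeps each failure mode tied to the element and sub-system it belongs to, which is what allows the effect of one component's failure on the rest of the system to be traced.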
Applying an FMEA to a tailings dam is not a new or revolutionary task; however, the application of an FMEA to assess the risk of a tailings dam over post-closure time frames is challenging, as a major limitation of an FMEA is the difficulty of including time dependence and the depreciation of performance of system components [13]. It is expected that the dam will evolve significantly over time (e.g., clogging of drains, ageing of sands) [14,15]. Ultimately, the likelihood and consequences of failure are not static and may increase or decrease over time [14–16]. Robertson and Shaw note that some risks have a different likelihood or consequences if they occur during operations or post closure [9]. Risk matrices aim to include time through defining the time frame in which the probability of failure (PoF) is valid. This is typically set as an annual PoF, for the duration of the life of the structure. This concept is complex for evaluating closure as PoF is expected to change over long periods of time.
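To illustrate why an annual PoF is awkward over closure time frames, the sketch below accumulates annual values into the probability of at least one failure over a period. It assumes that failures in different years are independent, which is a simplification; the function name and the numerical values are illustrative assumptions, not values from the G-FMEA.

```python
def cumulative_pof(annual_pofs):
    """Probability of at least one failure over the years in annual_pofs.

    Assumes independence between years (a simplifying assumption).
    """
    survival = 1.0
    for p in annual_pofs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each annual PoF must lie in [0, 1]")
        survival *= 1.0 - p
    return 1.0 - survival


# Constant annual PoF of 1e-4 held for 1000 years (illustrative values).
constant = cumulative_pof([1e-4] * 1000)

# Annual PoF rising linearly from 1e-4 to 1e-3 over the same 1000 years,
# standing in for a deteriorating system.
increasing = cumulative_pof([1e-4 + (1e-3 - 1e-4) * t / 999 for t in range(1000)])
```

Under these assumptions, a constant annual PoF of 1e-4 accumulates to roughly 9.5% over 1000 years, and any upward drift in the annual value raises that figure further, so a single annual PoF cannot describe a system whose condition changes over time.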
Incorporating time into an FMEA requires an evaluation and consideration of how risk profiles may change over time due to system changes, as shown in Figure 1. Point A represents the state of a system (dam structure) in terms of its PoF. Point A has an associated probability of failure that is above the acceptable limit at time zero. To move from point A to A' so that the risk meets the acceptable limit would require risk control measures to be implemented. While this satisfies the conditions at time zero, the question remains: What will happen as time progresses and the dam undergoes evolutionary processes? As shown in Figure 1, the risk profile may follow several different trajectories that range from decreasing over time to increasing over time. The changing risk profiles over time are influenced by the closure plan and considerations of the custodial transfer scenario of the local jurisdiction. In cases where the closure risks are intended to be managed with long-term maintenance, consideration may be required for scenarios where the maintenance may be disrupted for a period of time (for example, by war or another pandemic). The developed G-FMEA considers different temporal scales as a means of accounting for the time dependence, depreciation of system elements, and the associated changing risk profiles over time, as shown in Figure 1.

Risk Matrix Background
FMEAs may be qualitative or quantitative in nature and are commonly combined with qualitative risk matrices to estimate the likelihood and consequences of different failure modes. Risk matrices combine the estimates of the likelihood of a negative outcome with estimates of the magnitude of consequences to determine a risk level [9,16,17]. The risk level then determines the level and timing of the required mitigative measures and critical controls to be implemented to reduce the risk level [18]. Risk matrices are often perceived as being simple to interpret and use. However, according to Porter et al. [16], "guidance notes that explain the details of how the matrices were developed and how they were meant to be used for risk assessment, risk communication and risk management are often brief and cryptic". Ultimately, this lack of guidance can lead to problems with the application of the risk matrix, which is already commonly criticized for being subjective. Risk matrices have been critiqued by a number of authors, as shown in Table 1.

Table 1. Risk matrix issues, references, and associated comments related to the G-FMEA.

| Issue | References | Comments related to the G-FMEA |
| --- | --- | --- |
| Subjective nature (not as simple and transparent as it would seem). | [8,19–23] | Quantitative data should be used as often as possible with supporting verbal descriptions and quantitative descriptors (ranges or anchor points to the definition of categories). The risk assessment should be accompanied by a description of the risk assessor, including their risk tolerances. |
| Can lead to incorrect risk prioritization. | [21,22,24,25] | The risk matrix does not focus on risk prioritization, but is intended as a screening tool to assess the closure design. |
| Can shut down conversations about risk instead of opening them up due to the pre-defined colour coding scheme determining risk prioritization. | [24] | The risk matrix is designed to determine if a closure design is adequate and is a screening tool. It involves input from stakeholders to determine the consequence ratings and is intended to open up dialogue. |
| Ranking reversal: where quantitatively smaller risks are assigned qualitatively higher rating levels than some quantitatively larger risks due to incorrect risk prioritization. | [19,20,26,27] | The risk matrix does not focus on risk prioritization, but is intended as a screening tool to assess the closure design. |
| Does not account for different risk tolerances of the individual conducting the risk assessment. | [22,28] | The risk assessor's background should be listed with the completed risk assessment. |
| Range compression where risks with very different likelihoods and consequences are grouped together. Risk matrices with too many categories may give false resolutions. | [19,26,28,29] | The number of risk categories should be developed with consideration of range compression. |
| Centering bias can be an issue where individuals have the tendency to avoid extreme values or statements when presented with a choice. This can exacerbate range compression. | [26,30] | An extra category can be added to both sides of the expected range for the consequences and likelihoods, as suggested by Duijm [31]. |
| Category definition bias where different definitions exist for a given likelihood or consequence descriptor. | [26,31] | Clear definitions must be provided for the consequences and likelihoods categories. |
| Risk matrix can be misleading as it implies that risk is categorical as opposed to a position on a risk continuum. | [20] | The risk matrix should be accompanied by a clear definition of risk and associated discussion on the risk continuum. |
| Ambiguity of the consequence definition. There are different definitions used in practice for the consequence which can lead to issues if not clearly defined (worst case, most likely, a number of alternate discrete outcomes). | [31] | The consequence category must be clearly defined. |
| Risk matrices cannot provide aggregate measures of risk (i.e., total risk). | [23,27,32] | The goal of this risk matrix is to serve as a screening tool and is not intended to provide an indication of total risk. |
| Risk matrices are unable to aggregate risk from multiple consequence dimensions. This means that different types of consequences should not be directly compared (i.e., impact on the environment, human life). In practice, a hazard is often assigned a risk level based on the most severe consequence. This is misleading. | [23,31,33] | A risk rating should be assigned for each consequence category for a hazard. |
| Corporate-wide risk matrices are intended to be used as a way of standardizing risk assessment and risk acceptance criteria across a company. This is problematic as risk tolerance may vary throughout a company. | [31] | Corporate-wide risk matrices should not be used with the Generalized FMEA. |

To aid in mitigating the known issues with risk matrices, a number of authors have suggested various improvements, including logarithmic scales [19,27]; probability-consequence diagrams with continuous scales [31]; and following the Cox axioms of weak consistency, betweenness, and consistent colouring [19]. Despite the issues associated with risk matrices, some authors suggest that an important element of using risk matrices is applying them in the appropriate scenarios. Bao et al. suggest that subjectivity is a vital characteristic of risk matrices as they are effective for assessments where data are insufficient and quantitative tools cannot be applied [32]. Baybutt notes that risk matrices should be used to provide "... initial decision guidance, which should be used with caution and the application of common sense. Risk matrices should not be used in isolation to make decisions" [27]. Considering all of this, the risk matrix developed as part of this research is intended to be used as a screening tool, where the risks assessed as acceptable require no further analyses and higher risks (or multiplicity of relatively low risk) could inform where/if quantitative risk assessment techniques need to be used in the further decision-making stages for the closure of a facility.
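The screening use of a risk matrix described above can be sketched as follows. This is an illustration under stated assumptions rather than the matrix developed in this work: the category labels, the additive ordinal score, the level names, and the multiplicity threshold are all hypothetical. The sketch keeps one rating per consequence dimension instead of collapsing a hazard to its most severe consequence, and flags a design for further analysis either when any rating is elevated or when several relatively low ratings accumulate.

```python
# Hypothetical ordinal categories; the actual G-FMEA categories are not reproduced here.
LIKELIHOODS = ["very low", "low", "moderate", "high", "very high"]
CONSEQUENCES = ["negligible", "minor", "moderate", "major", "extreme"]


def risk_level(likelihood: str, consequence: str) -> str:
    """Map one likelihood/consequence pair to an illustrative risk level."""
    # Additive ordinal score; the cut-offs below are assumptions for illustration.
    score = LIKELIHOODS.index(likelihood) + CONSEQUENCES.index(consequence)
    if score <= 2:
        return "acceptable"
    if score <= 5:
        return "tolerable"
    return "elevated"


def rate_hazard(likelihood: str, consequences: dict[str, str]) -> dict[str, str]:
    """Return one rating per consequence dimension, without aggregating them."""
    return {dim: risk_level(likelihood, c) for dim, c in consequences.items()}


def needs_further_analysis(hazard_ratings: list[dict[str, str]], multiplicity: int = 3) -> bool:
    """Flag any elevated rating, or a multiplicity of relatively low ratings."""
    levels = [lvl for ratings in hazard_ratings for lvl in ratings.values()]
    return "elevated" in levels or levels.count("tolerable") >= multiplicity


# Hypothetical hazard rated separately for two consequence dimensions.
example = rate_hazard("low", {"environment": "major", "human life": "minor"})
```

Because the result of the screen is only a yes/no flag for more detailed work, the sketch makes no attempt to rank hazards against each other or to compute a total risk.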

Generalized FMEA Framework
A Generalized FMEA framework was developed, based on the current practice with FMEA, for use by regulators and the industry to assess the long-term risk of the failure of a tailings dam following closure. In order to conduct an FMEA, the definition of tailings dam failure must be clearly laid out. For the purposes of this work, a dual definition will be used for failure, similar to the Ultimate Limit States and Serviceability Limit States used in foundation engineering, as defined in the Canadian Foundation Engineering Manual [34].
Here, failure is defined as:

• Ultimate failure: the collapse of a tailings dam leading to catastrophic failure as defined by the Global Tailings Standard [35]: "A tailings facility failure that results in material disruption to social, environmental and local economic systems. Such failures are a function of the interaction between hazard exposure, vulnerability, and the capacity of people and systems to respond. Catastrophic events typically involve numerous adverse impacts, at different scales and over different timeframes, including loss of life, damage to physical infrastructure or natural assets, and disruption to lives, livelihoods, and social order. Operators may be affected by damage to assets, disruption to operations, financial loss, or negative impact to reputation. Catastrophic failures exceed the capacity of affected people to cope using their own resources, triggering the need for outside assistance in emergency response, restoration and recovery efforts."
• Serviceability failure: failure to perform as intended.
The goal of performing the G-FMEA is to guide practitioners and regulators through the process of continuing to ask questions in a disciplined manner to prevent ultimate failure. The risk management tool is not intended to be a static tool and should be revisited at specific intervals and when triggered by changes in the facility or events of underperformance. A clear trigger protocol needs to be developed for revisiting the G-FMEA. The completed G-FMEA should undergo robust scrutiny, just like the designs, be accompanied by peer reviews, and be assessed by external reviewers or review boards, where applicable. The development of the G-FMEA is assisted by the following:
1. Literature reviews related to:
   a. Tailings dam characteristics in the oil sands and coal industries;
   b. Tailings dam failure modes;
   c. Risk assessment tools.
3. The input from the regulator and industry stakeholders.
Schafer et al. [14,15] and a detailed literature review were used to develop the overall system definition, which included dam characteristics that impact dam behaviour. The system definition serves as the foundation of the development of the G-FMEA and Ayyub notes that the first step to conducting a system definition is to establish the objectives [8]. This step is essential to providing context and ensuring that the users of the risk framework are asking the same questions when applying the G-FMEA to their sites. The objective question of the G-FMEA is: How should geotechnical risks associated with an external tailings facility in Alberta be managed in the long-term to achieve an acceptable closure plan, such that the facility is able to be deregistered as a dam? All users of the G-FMEA must have an understanding of the objective and intended use of the FMEA.
Following this, the system boundaries can be defined. The system definition is summarized in Figure 2 and shows the characteristics of the dam that should be carefully considered and defined prior to starting the FMEA. The general categories are summarized in the flow chart on the left with details on the general categories in the table on the right.
The system definition serves as the minimum of what should be defined for a tailings dam prior to conducting the FMEA. However, these parameters should be defined comprehensively and may take the form of a fact sheet. This step aids in evaluating and conducting the FMEA.
Following the development of the system definition, a hazard mapping exercise was conducted, which focused on identifying potential failure modes, mechanisms, triggers, and hazards. A failure mode is defined as the overarching cause of failure. The failure mechanism is the mechanism by which a hazard leads to a failure mode. This may consist of a series of steps. A trigger is defined as that which initially leads to the failure mechanism. A hazard is defined as a condition with the potential to cause an undesirable consequence. Combined, the system definition and hazard mapping contribute greatly to the development of the G-FMEA. The steps required to use the G-FMEA in practice are outlined in Figure 3. If risk mitigation is required, or the closure plan is altered, the process may need to be repeated. This may require the risk assessor to return to Step 1 to update the system definition if substantial changes are made.
The G-FMEA includes four individual charts for the drainage system, foundation, dam body, and landform, which are provided in Appendix A. An excerpt of the drainage system chart is provided in Table 2. The drainage system and landform charts consider failure modes for a variety of different elements that could be present within a dam. The foundation chart and dam body chart do not provide failure modes for individual elements, since most failure modes are applicable to multiple elements. Each identified foundation element should be evaluated for every failure mode on the foundation list. Similarly, each dam body element must be evaluated for every failure mode on the dam body list.
Failure modes that are not applicable should be screened out prior to completing the risk assessment, as discussed in Step 5. A justification must be recorded for screening out a failure mode and re-evaluated each time the FMEA is revisited. This may include a justification of whether a failure mode is credible or non-credible as defined by the Global Industry Standard on Tailings Management [35]. It is important to note that failure modes that may be credible in the short-term may not be credible in the long-term and vice versa.
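The way the foundation and dam body charts are applied, together with the screening rule, can be sketched as follows. The element names, failure mode names, and helper functions are hypothetical and serve only to illustrate pairing every element with every failure mode and refusing to screen out a pair without a recorded justification.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class WorksheetEntry:
    element: str
    failure_mode: str
    screened_out: bool = False
    justification: str = ""


def build_entries(elements, failure_modes):
    """Pair every element with every failure mode on the chart's list."""
    return [WorksheetEntry(e, m) for e, m in product(elements, failure_modes)]


def screen_out(entry, justification):
    """Screen out a non-applicable failure mode; a justification is mandatory."""
    if not justification.strip():
        raise ValueError("a justification must be recorded when screening out a failure mode")
    entry.screened_out = True
    entry.justification = justification


def to_reevaluate(entries):
    """Screened-out entries whose justification is re-checked at each revisit."""
    return [e for e in entries if e.screened_out]


# Hypothetical foundation elements and failure modes, for illustration only.
entries = build_entries(["Clay layer", "Sand layer"], ["Sliding", "Settlement"])
screen_out(entries[0], "Hypothetical justification recorded by the assessor")
active = [e for e in entries if not e.screened_out]
```

Keeping screened-out entries in the worksheet, rather than deleting them, preserves the justification so that it can be revisited when a failure mode that was non-credible in one time frame becomes credible in another.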
While the list of failure modes in Appendix A for the various elements is extensive and reasonably comprehensive, it is recommended as a pre-list of failure modes as no list can be completely comprehensive as there are always additional site-specific hazards.
The G-FMEA charts should be applied to a facility using the complete worksheet provided in Appendix B (Table A5).

Temporal Scales
It is essential that the FMEA for a closed tailings dam considers the element of time (i.e., how the dam evolves). Representing this is not straightforward and can be approached using different methods. The first method would be to complete a series of FMEAs for different points in time considering the different evolutionary processes that may occur. Ultimately, this adds a lot of complexity and may be prohibitively time-intensive. Another route, which is adopted here, is to complete immediate-term, short-term, medium-term, and long-term risk assessments using a risk matrix for each. The determination of the time frames requires consideration of what the overall time frame is for closure. An evaluation of this was carried out using the following points: geological context, nuclear repository industry, Uranium Mill Tailings Remedial Action (UMTRA) surface project, the current databases that are relied on in the industry (often for less than 100 years), our ability to forecast the future, and transitions in periods of management.
For the long-term time period, time frames in excess of 10,000 years were deemed to be inappropriate in the mining industry as it is expected that glacial periods in time frames in excess of 10,000 years would change the geology and effectively bury the tailings facilities. The use of these time frames is perhaps better suited for systems such as nuclear waste repositories in tectonically inactive areas. With the consideration of this and the expectation for a tailings dam closure lasting 1000 years [5,36], the maximum time frame selected for the G-FMEA is 1000 years. This is also reflected by the UMTRA project where the facilities are designed to be effective for 1000 years where reasonably achievable with a minimum required performance of 200 years [37–40]. All risk assessments conducted for the 1000-year time frame should include a careful consideration of the uncertainties associated with the current databases, climate change predictions, and our ability to forecast the future.
The selection of appropriate time frames for the immediate-term, short-term, and medium-term risk assessment was completed with consideration of the phases of the tailings facility following closure. Initially, consideration was given to aligning the different risk assessments with the 'active care' and 'passive care' phases commonly used in industry. This was ultimately discarded as different organizations describe these phases in different ways, which may ultimately lead to confusion [41–43]. To mitigate this, the performance monitoring assessment periods recommended by the International Network for Acid Prevention (INAP) were selected. INAP outlines three specific assessment periods, including the adaptive management period, proactive management period, and reactive management period, which are outlined in Table 3 [44]. The primary difference between the assessment periods is the degree of direct management employed [44]. The focus on management, as opposed to the actions taken during the assessment period, is attractive from a safety management system perspective.

Table 3. Assessment periods [44].

| Assessment Period | Definition |
| --- | --- |
| Adaptive Management | Occurs following closure activities when the closed facility may be at its greatest risk of failure prior to reaching equilibrium. The operator has the greatest capacity to respond. This period may be defined explicitly by the regulator, using accumulated knowledge, or using a site-specific scientific basis. |
| Proactive Management | Occurs as personnel and equipment are reduced. Involves a regular fixed frequency monitoring and maintenance schedule to confirm that the landform is trending along the designed trajectory. It is expected that the frequency will be less than during the adaptive management period. |
| Reactive Management | Issues are rectified strictly on a reaction basis, once a trigger event occurs. There should be a clear plan in place that outlines what the trigger events are and how they will be managed. Monitoring may occur in response to events such as fires, floods, earthquakes, and other extreme events. |

Based on the characteristics associated with the adaptive management period, the immediate-term risk assessment should occur during this phase. The short-term risk assessment may occur during the adaptive management period or the proactive management period, depending on site-specific characteristics.
The selection of a time frame for the medium-term risk assessment involved the consideration of the reactive management period and the time frames outlined by the UMTRA project. The medium-term risk assessment should fall within the reactive management period, where direct management efforts (and monitoring) decrease substantially and the facility largely shows that it meets closure objectives. The recommendations for the time frames for the various levels of risk assessment are provided in Table 4. The approximate assessment periods provided are intended as a guideline only and should be based on site-specific considerations. It is expected that the immediate-term and short-term risk assessments would have the most active forms of risk mitigation employed, with risk measures slowly transitioning to more passive methods as the risk assessment moves to medium- and long-term temporal scales. It is also expected that, for some failure modes, the level of confidence will decrease as the risk assessment moves from immediate-term to long-term scales.

Risk Matrix Framework
The development of the risk matrix for the G-FMEA involves a consideration of the known pitfalls with risk matrices, as outlined in Table 1; recommendations for improvement from Baybutt and Duijm [25,31]; and the evaluation and consideration of other published risk matrices, including those provided by Brown, Hadjigeorgiou, the Guidelines for Mine Waste Dump and Stockpile Design, Mine Environment Neutral Drainage (MEND), and the Oil Sands Tailings Dam Committee (OSTDC) [17,18,43,45,46]. A number of key observations can be taken from the reviewed published risk matrices, including:

• Clear descriptions of likelihood and consequence ratings are essential, as provided by Griffin and MEND [17,46]. It is common for different risk matrices to use the terms 'almost certain', 'likely', and 'possible', but it is equally common for these not to be accompanied by clear definitions, as shown in Brown and Hadjigeorgiou [18,45]. This is problematic as terms such as these are not universally understood to have the same meaning amongst practitioners [47]. This aligns with the critiques in Table 1.
• A level-of-confidence scale (or similar) is recommended to go along with the risk matrix and selected risk rating [17].
• MEND and OSTDC were both developed for closure scenarios and provide useful analogues for the development of the risk matrix [17,43].
• OSTDC determines the consequence rating based on the loss of function of the structure, the degree of human intervention required on the structure after an event occurs, population at risk, and environmental economics [43]. In contrast, a risk matrix designed to assess the risk for an operational facility may consider elements such as health and safety, environment, community, reputation, and legal aspects when assessing the consequences of failure.
While the risk matrix developed by OSTDC provides the best analogue and a useful basis for this work, it cannot be applied directly to the G-FMEA, for the following reasons [43]:

• The G-FMEA presented in this research assesses the individual elements of a dam, not the dam as a whole, which is what the OSTDC [43] risk matrix was developed for. The developed risk matrix must be fit for purpose.
• The OSTDC risk matrix was developed to assess if a facility could be deregistered as a dam [43]. The consequence and likelihood ratings reflect this. In contrast, the risk matrix is used here to assess the risk of failure over time and if a closure plan is adequate to prevent failure to support a facility being deregistered as a dam.
The risk matrix framework is intended as an example of how a qualitative risk assessment method could be combined with the G-FMEA charts to assess a closure plan and includes a likelihood rating table, consequence rating table, and a risk matrix (including details on how to colour code the risk matrix and an example of a colour-coded matrix). The risk matrix to be used for a site should be defined by the industry, regulators, and other stakeholders with considerations of the technical, social, and economic aspects.

Likelihood Rating
The developed likelihood rating shown in Table 5 has seven different categories for likelihood from 'Close to non-credible' to 'Almost certain', which are defined with considerations of annual probabilities. Qualitative descriptors were developed to provide further context on the annual probabilities. The inclusion of seven likelihood categories is more extensive than many risk matrices, which commonly have five likelihood ratings. A more extensive likelihood rating list was selected due to the time frames associated with the closure and to aid in mitigating centering bias (Table 1). It also serves to cover the full range of likelihoods that may be encountered, which is an essential step to developing a likelihood table [25]. The risk matrix is intended to be applicable from the time that closure works have been completed to 1000 years following the completion of closure works. This means that the risk matrix is not intended to capture failure modes that may occur during the construction of the closure landscape. The risk matrix is also intended to be applied to orphaned facilities, which are defined as facilities with "owners that cannot be found or for which the owner is financially unable or unwilling to carry out the clean up" [48]. These facilities should not be confused with those that undergo abandonment as per the Alberta Dam and Canal Safety Directive [4]. It could be argued that the categories of 'Almost certain' and 'Likely' should not be applied to a closure scenario as the assumption is that the facility is designed to a standard such that the failure of an element will not occur within a year. While this may be true for many facilities, it is unlikely to be true for all facilities (i.e., orphaned facilities). Furthermore, some failure modes may become more likely when conducting a long-term risk assessment as factors such as degradation and climate change occur. The likelihood of occurrence is often described in terms of the probability of failure. 
While this is an important way of evaluating the likelihood, it is valuable to describe the likelihood categories using qualitative descriptors to anchor the probabilities [25]. Further, humans are notoriously poor at estimating the probability of extreme events [49]. As such, providing qualitative guidance for assessing the likelihood of a failure mode would be useful. Table 5 describes the likelihood categories in terms of the annualized probability and provides a qualitative interpretation. The quantitative interpretation guidance for the example risk matrix considers that each rating is a range of probabilities. Anchor values can also be adopted, if preferred, as long as it is clear to the users that the anchor values are a point approximation of this range of probabilities.

| Likelihood category | Qualitative descriptor | Quantitative guidance | Annual probability, p |
| --- | --- | --- | --- |
| Almost certain | Almost certain that an incident will occur given the circumstances. Very high probability of one or more occurrences per year. | Higher than 10% probability in a year | p ≥ 0.1 |
| Likely | High likelihood. Commonly observed at similar facilities. | Higher than 10% probability in 10 years | p ≥ 0.01 |
| Possible | Has occurred a number of times within the industry and at least once at the site (or similar facilities in the region). | Higher than 1% probability in 10 years | p ≥ 0.001 |
| Unlikely | Has occurred before within the industry, but not at the site. | Less than a 1% probability in 10 years | p < 0.001 |
| Rare | Low likelihood of occurrence, but not impossible. Has not occurred at the site but has occurred in industry. | Less than a 1% probability in 100 years | p < 0.0001 |
| Very rare | Very low likelihood of occurrence, but not impossible. Occurrence cannot be deemed non-credible. | Less than a 1% probability in 1000 years | p < 0.00001 |
| Close to non-credible | Extremely remote likelihood of occurrence. Although the mechanisms are technically plausible for the occurrence, it is seen as near non-credible. | Less than a 1% probability in 10,000 years | p < 0.000001 |

Note: ¹ Industry encompasses the mining industry as a whole.
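
As an illustration of how the annual probability bands in Table 5 might be applied, the following minimal Python sketch assigns a likelihood category to an estimated annual probability and converts an annual probability into the probability of at least one occurrence over a longer horizon. The function names are hypothetical, the band edges are read from Table 5 on the assumption that each category spans from its own threshold up to the next one, and the horizon conversion assumes that years are statistically independent, which may not hold for a degrading structure.

```python
# Illustrative sketch only: names are hypothetical, band edges are read from Table 5.
LIKELIHOOD_BANDS = [
    # (lower bound on annual probability, category)
    (1e-1, "Almost certain"),
    (1e-2, "Likely"),
    (1e-3, "Possible"),
    (1e-4, "Unlikely"),
    (1e-5, "Rare"),
    (1e-6, "Very rare"),
    (0.0, "Close to non-credible"),
]


def likelihood_category(annual_p: float) -> str:
    """Return the category whose band contains the annual probability."""
    if not 0.0 <= annual_p <= 1.0:
        raise ValueError("annual_p must lie in [0, 1]")
    for lower, name in LIKELIHOOD_BANDS:
        if annual_p >= lower:
            return name
    raise AssertionError("unreachable: the last band starts at 0")


def probability_over_horizon(annual_p: float, years: int) -> float:
    """Probability of at least one occurrence in `years` (independent years assumed)."""
    return 1.0 - (1.0 - annual_p) ** years


if __name__ == "__main__":
    print(likelihood_category(5e-4))                 # Unlikely
    print(probability_over_horizon(0.01, 10))        # roughly 0.096
    print(probability_over_horizon(0.001, 1000))     # roughly 0.63
```

Under these assumptions, an annual probability of 0.01 corresponds to roughly a 10% chance in 10 years, which is consistent with the 'Likely' threshold, while an annual probability of 0.001 accumulates to roughly a 63% chance over the 1000-year time frame.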

Consequence Rating
The consequence rating shown in Table 6 has five categories that range from 'Slight' to 'Severe'. The consequence rating is selected based on:

• The degree of the consequence of failure of an element on the rest of the system.
  • This involves an assessment regarding if the failure will result in cascading consequences to other elements.
• The degree of human intervention (post failure) required.
  • It is important to note the degree of human intervention required in response to a failure as the ultimate goal is to reach a state of minimal or no human intervention. In cases where this is not possible, this may suggest a careful evaluation of the custodial transfer scenario for that facility.

• The impact on the community should be assessed using input from the relevant stakeholders. This is difficult to evaluate in light of the time frames associated with closure. Our ability to predict downstream and upstream populations in the future is limited by our short-term knowledge; however, a consideration can be made of the likelihood that the affected area could be inhabited in the future (i.e., is the mine located in inhospitable terrain or in an area that is desirable for development?). This would allow baseline assumptions to be made about the future community impact. A key step to defining the impact on a community is identifying the community. As defined by the Impact and Benefit Agreement (IBA) Community Toolkit [50], this involves answering the questions: "Who is the community? How is the geographic, ethnic, or scope of community defined? Who legitimately represents the community? Is it simply representatives from local community organizations, or is it necessary to reach out to more diverse groups to ensure all elements are consulted? The definition of "community" should be inclusive enough to promote equity and avoid future conflict resulting from lack of inclusion" [50]. The community consequence rating provided in Table 6 is provided as a guideline only and requires input from the relevant stakeholders to explicitly define the community impact. This should be accomplished through meaningful engagement as defined by the Global Industry Standard on Tailings Management [35]. Defining the consequence rating for the community may involve a consideration of health (including fatalities), loss of access/destruction of traditional lands, housing, destruction/damage of farmland, harm to livestock, damage to water or soil resources, impacts to trapping and fishing, the loss of animals, overall cultural impact, and employment. It is critical that the impact on all components valued by the relevant stakeholders (and their condition following a failure) is considered. A valued component is that which is considered important by the community [50]. It should be noted that Impact and Benefit Agreements developed between the mining companies and Aboriginal communities may include agreement provisions to account for catastrophic failures and losses and should be consulted when assessing the consequences of a dam failure to the community [50].
• The environmental impact must be assessed with a consideration of the impacted land (both surrounding the facility and in the facility itself, with respect to its post-mining land use), the impacted waterbodies, and the toxicity of the tailings. The Canadian Dam Association (CDA) developed a Working Group to revamp their environmental classification system, noting that the existing system lacked a clear scientific basis and used vague criteria that were open to interpretation based on personal beliefs and principles [51]. Nikl et al. provide a summary of the draft environmental consequence classification system [51]. The framework considers three variables (ecological impact, the intrinsic hazards of contents, and the duration of the impact) to determine the consequence category (low, significant, high, and very high) using a matrix and dial combination approach [51]. The consequence classification is intended to assess the environmental consequences from a global tailings dam failure (i.e., ultimate failure in this case). In contrast, the risk matrix and G-FMEA are intended to assess the failure modes of individual elements; however, the principles and concepts from Nikl et al. can still be used to support the environmental consequence category [51]. As such, the methodology used by Nikl et al. is adapted to fit within the consequence rating framework [51]. The goal of a closure plan is for the facility to remain safe and sustain a particular land use. Consequently, the environment category should be assessed with a consideration of the impact on the post-mining land use, in addition to the surrounding environment. It is recognized that post-mining land uses may change over time as the closure plan develops [52]. As the agreed post-mining land use changes, hazards should be re-evaluated. The most serious consequence associated with the post-mining land use arises when the promised land use is destroyed, leaving the land sterilized. In such a situation, the facility may be fenced off to prevent all access to the site. This situation may also lead to downstream effects where additional land is sterilized. Ultimately, this would yield a severe consequence rating.
Fatalities were included in the consequence rating table in the 'Severe' consequence rating in the community column. The inclusion of fatalities in risk decisions can be a contentious issue as various stakeholders may have different risk tolerances for fatalities, especially with regard to voluntary versus involuntary risks. It could be argued that fatalities should be included throughout the risk matrix. For example, one fatality could correspond to a moderate consequence rating, 10-100 fatalities to a major consequence rating, and greater than 100 fatalities to a severe consequence rating. Including fatalities only in the 'Severe' consequence rating is out of step with the risk tolerances for other industries and fails to reflect the reality that multiple-fatality events are more consequential than single-fatality events (e.g., road vs. air accidents). However, a graduated fatality scale goes against the Global Industry Standard on Tailings Management, which has a goal of zero harm to people and the environment with zero tolerance for human fatalities [35]. Considering this and the overall goals of the closure design, the current consequence rating table classifies one fatality as 'Severe'. This may be adapted through meaningful engagement with stakeholders, if appropriate.
The approximate time frames for the environment and community categories are provided: short-term (<5 years), medium-term (5-25 years), and long-term (>25 years). These time frames are intended as guidance and may be amended with the input of stakeholders. Depending on the risk tolerance of the stakeholders, it may be necessary to reduce these time frames.

| Consequence rating | Consequence of failure on the rest of the system | Degree of human intervention required | Environment |
| --- | --- | --- | --- |
| Moderate | | Intervention or maintenance required to limit impact of cascading consequences. | Released tailings are not toxic², and/or moderate loss of habitat (5-20%) of species of special interest³, and/or acceptable restoration of water bodies and environment feasible in a short time frame (<5 years). |
| Major | Global failure of tailings dam with minor release of tailings. | Intervention or maintenance required to maintain function of structure as a whole. | Released tailings are toxic², and/or significant loss of habitat (20-50%) of species of special interest³, and/or acceptable restoration of water bodies and environment feasible in a moderate time frame (5-25 years). |
| Severe | Global failure of tailings dam with catastrophic release of tailings. | Structural repair not possible. | Released tailings are toxic², and/or very significant loss of habitat (>50%) of species of special interest³, and/or acceptable restoration of water bodies and environment unlikely within an extended time frame (>25 years). |

Notes: ¹ Assigned consequence should reflect the most likely outcome. If assigning consequence with consideration of the worst case or a combination of discrete outcomes, this must be declared. ² Toxicity assessment of tailings should consider an assessment of the fluids and solids (leaching potential, acidity, radioactivity). ³ Species of special interest is defined as a species that lives in the inundation area that would be greatly impacted by habitat loss (preferable to select a species that is provincially or federally listed). ⁴ Community impacts must be determined through meaningful engagement with stakeholders and may include a consideration of health, loss of access/destruction of traditional lands, housing, destruction/damage of farmland, harm to livestock, damage to water or soil resources, impacts to trapping and fishing, loss of animals, overall cultural impact, and employment. Reputation, legal aspects, and economics are not considered in this consequence table as they are considered site- and corporation-specific. It may be necessary to assess these aspects on a site-specific basis.

Risk Matrix and Rating
The seven likelihood ratings and five consequence ratings can be combined to form a risk matrix, as shown in Figure 4. The challenging part of finalizing the risk matrix lies in determining the assigned risk category for a given likelihood and consequence rating. The risk categories for the G-FMEA are presented in Table 7. The risk categories were developed with the consideration of the suitability of the closure design to prevent failure such that a facility could be deregistered as a dam. This resulted in four risk categories from 'Low' to 'Extreme'. As the risk category increases, the level of the required risk mitigation increases as the closure plan is assessed as being inappropriate in preventing serviceability failure of a particular element.
It is common practice for risks to be managed using the ALARP principle: As Low As Reasonably Practicable (ALARP). In ALARP, all risk reduction measures should be employed as long as the cost of implementing them is reasonably practicable with a consideration of cost effectiveness [53]. In Table 7, the high-risk category is defined with consideration of the principle of ALARP. In the high-risk category, the risks are undesirable and must be reduced using ALARP. If the risk category cannot be reduced using ALARP, the closure plan should be altered to accommodate risk mitigation.

Once the risk categories are developed, they can be applied to the risk matrix to develop a colour-coded 'heat map' based on the combination of the likelihood rating and consequence rating. The risk matrix is not colour-coded or populated with risk ratings as this defines risk tolerance. This step should be completed with the input of all relevant stakeholders (i.e., industry, regulator, the public). However, an example of the colour-coded risk matrix is provided in Figure 5. The colour-coded matrix presented in Figure 5 demonstrates major hazard aversion and presents a threshold line that is suggested to be used to trigger a more detailed quantitative analysis. The framework for attaining the colour-coded matrix is provided in this section.
Minimal guidance is provided in the literature on how to colour code a risk matrix. Ayyub indicates that colour coding a risk matrix involves shading each box depending on a "subjectively assessed risk level" [8]. IEC provides more guidance on assigning risk categories to the risk matrix, indicating that assigning risk categories should depend on the definitions of the likelihood and consequence tables [23]. Using these definitions, the risk matrix can have risk categories assigned to provide extra weight to the consequences or likelihoods, or it may be symmetrical [23]. Oboni suggests that a risk matrix cannot be symmetrical as this typically indicates an incorrect prioritization of risks [54]. Risk matrices should be coloured with consideration of what the cells tell the user.
The developed risk matrix for this research helps to assess if the risks associated with decommissioning an ETF are acceptable or where/if a more detailed analysis is required, and could be applied throughout a tailings facility's life. In light of this, an alternative way to use the developed risk matrix is to work with a "threshold". Any hazard categorized with a risk higher than the defined threshold (orange in Figure 5) requires a more detailed quantitative risk assessment method, which may require a more in-depth investigation, design, or modelling to be completed. This is an important step to aid in mitigating issues associated with upper severity limits [25].
The framework for developing the example risk matrix was developed with a consideration of the tool being used as a screening method; guidance provided by Baybutt [25], Duijm [31], and Levine [29]; and a consideration of the known pitfalls that can be associated with risk matrices, as discussed in Table 1. The development of the example risk matrix in Figure 5 was accomplished using the following steps, and may be used as guidance for colour coding the risk matrix for a specific site (see Figure 4). Appendix C shows how Figure 5 was developed using the following steps.

1. Develop iso-contours of equal risk based on the estimated quantitative consequence measure and the provided likelihoods (Likelihood = Risk/Consequence). Assuming that the consequence quantitative measure is an order of magnitude between categories, the consequence and likelihood iso-contours should be plotted in log-log space.
2. Plot the iso-contours on the risk matrix. Assign the risk categories to the iso-contours of equal risk and use these to develop an initial colour-coded risk matrix. This step serves as a starting point for colour coding the risk matrix. Any available known acceptance criteria can be used as an initial starting point for this exercise. Decisions will need to be made about the cells that have an iso-contour cutting through them (i.e., do you assign the higher or lower risk category?). This exercise demonstrates a known issue with risk matrices where risk is represented as categorical, rather than on a continuum, and hazards are binned into these categories.
   a. A note from Baybutt [25]: When consequences and likelihood categories differ by orders of magnitude, they are represented in log-log space. Practically, this means that high consequence-low likelihood events (negatively correlated) have the most uncertain risks in these types of rating schemes. This is especially problematic as these events already have a lot of inherent uncertainty. Ultimately, it is not possible to increase the precision of these events on a risk matrix. This concept of uncertainty related to these types of events supports the idea of using a threshold value on the risk matrix to trigger more extensive risk analysis.
3. Assess the initial colour-coded risk matrix against the Cox axioms for a well-defined risk matrix [19]. This step is simply a logical 'check'. Duijm recommends that a key to using the risk matrix is to recognize that the colouring of the risk matrix is a risk definition in its own right (and cannot be separately and stringently defined as Risk = probability × consequence) as it expresses a subjective risk perception (i.e., major hazard aversion), which is an important element of risk decisions [31]. Duijm notes that, when no reference is made to an external risk definition, then "the colouring of the matrix itself is the only relevant risk definition, then the axioms of weak consistency and consistent colouring are trivial" [31].
4. Assess if the Levine lettering scheme is more appropriate for your risk analysis goals [29]. In the study by Levine, logarithmic axes are used for the consequence and likelihood axes [29]. Straight line iso-contours of equal risk are drawn. Instead of using colours, the different areas are labelled A, B, C, D, E, F, and G (each line represents a new letter). This method results in a matrix that is somewhat unintuitive, but this prevents risk matrix users from making assumptions about the risks based on the colouring scheme. When a risk matrix is constructed in this manner, the following conclusions can be drawn:
   a. Risks in one letter category can only be distinguished from risks in another category if they are more than one letter apart (i.e., C > A, D > B).
   b. Risks in categories that are zero or one letter apart are not able to be distinguished from one another (i.e., it is not known if C > B or B > C).
5. As noted by Duijm, another way to develop risk scores and colouring is by using basic arithmetic (multiplication and addition) based on ordinal numbers assigned to each consequence and probability category [31]. When the categories are essentially linearly spaced, then the multiplication of the ordinal numbers is an appropriate way of defining the risk score. When the categories are essentially logarithmically spaced, the addition of the ordinal numbers is desired. Apply the appropriate mathematical operations and compare to the risk matrix developed in Step 3.
6. Assess if major hazard aversion is required and apply as necessary. Hazard aversion is the concept of low-probability-high-consequence events being assigned a higher risk than a high-probability-low-consequence event, even if the expected loss is mathematically the same [31]. This concept is used in scenarios where low-probability-high-consequence events are of greater concern and may require different decisions [25].
7. Conduct "logic checks" by stress testing the risk matrix with different scenarios. Focus on what the risk matrix is telling the user. Adjust the colour scheme as needed and repeat. This is an important step as the risk matrix may be developed using one quantitative measure. As such, it must be assessed to determine if it is applicable across the different consequence categories.
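
The arithmetic behind several of these steps can be sketched in a few lines of Python. The sketch below is not the procedure used to produce Figure 5. It assumes that both axes are roughly logarithmically spaced, so that lines of constant ordinal sum act as iso-risk contours (in log space, Likelihood = Risk/Consequence becomes log Risk = log Likelihood + log Consequence). The names 'Slight', 'Major', and 'Severe' come from the text, while the 'Minor' and 'Moderate' labels, the colour thresholds, the aversion bonuses, and all function names are hypothetical placeholders that a site would replace through stakeholder engagement.

```python
from typing import Dict, Tuple

# Ordinals are list positions + 1, from lowest to highest.
LIKELIHOODS = ["Close to non-credible", "Very rare", "Rare", "Unlikely",
               "Possible", "Likely", "Almost certain"]
CONSEQUENCES = ["Slight", "Minor", "Moderate", "Major", "Severe"]

# Hypothetical settings, shown only to make the example concrete.
AVERSION_BONUS = {"Major": 1, "Severe": 2}
COLOUR_THRESHOLDS = [(10, "red"), (8, "orange"), (6, "yellow")]  # otherwise green
RANK = {"green": 0, "yellow": 1, "orange": 2, "red": 3}


def risk_score(likelihood: str, consequence: str, averse: bool = True) -> int:
    """Ordinal sum (log-spaced axes), optionally weighted towards high consequences."""
    score = (LIKELIHOODS.index(likelihood) + 1) + (CONSEQUENCES.index(consequence) + 1)
    if averse:
        score += AVERSION_BONUS.get(consequence, 0)
    return score


def colour(score: int) -> str:
    """Bin a score into a colour band using the hypothetical thresholds."""
    for minimum, name in COLOUR_THRESHOLDS:
        if score >= minimum:
            return name
    return "green"


def build_matrix(averse: bool = True) -> Dict[Tuple[str, str], str]:
    """Colour every (likelihood, consequence) cell of the 7 x 5 matrix."""
    return {(lik, con): colour(risk_score(lik, con, averse))
            for lik in LIKELIHOODS for con in CONSEQUENCES}


def is_monotonic(matrix: Dict[Tuple[str, str], str]) -> bool:
    """Simple logic check: colour never drops as likelihood or consequence rises."""
    for i, lik in enumerate(LIKELIHOODS):
        for j, con in enumerate(CONSEQUENCES):
            here = RANK[matrix[(lik, con)]]
            if i + 1 < len(LIKELIHOODS) and RANK[matrix[(LIKELIHOODS[i + 1], con)]] < here:
                return False
            if j + 1 < len(CONSEQUENCES) and RANK[matrix[(lik, CONSEQUENCES[j + 1])]] < here:
                return False
    return True
```

With these placeholder settings, a 'Rare'/'Severe' hazard and an 'Almost certain'/'Slight' hazard receive the same score without aversion, but the former moves into a higher band once the aversion bonus is applied, which is the behaviour described in Step 6. The monotonicity check is a basic sanity test in the spirit of Step 7 and is not a substitute for the Cox axioms.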

Limitations of the Risk Matrix Framework
As noted in Table 1, a common issue related to risk matrices is their inability to aggregate risks, both in general and from multiple consequence categories. With regard to aggregating risks, as noted by IEC, "one cannot define that a particular number of low risks or a low risk identified a particular number of times is equivalent to a medium risk" [23]. This is an important limitation of risk matrices, and one that risk matrix users must be cognizant of. Therefore, sound engineering judgement is required for evaluating the results of the matrix where multiple low-risk failure modes are present. Once the assessment is complete, a careful evaluation of the results must be completed. All hazards that receive a risk rating of red or orange need to undergo further analyses and may need additional controls. If all the hazards are green (or green and yellow), the organization should evaluate whether their aggregation represents an intolerable risk. In that case, the quantified descriptions in the likelihood rating table (Table 5) can help with the aggregation of likelihood.
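
One simple way to use the quantified likelihoods for aggregation is to estimate the annual probability that at least one of several failure modes occurs. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that the failure modes are statistically independent, which will not hold where one hazard can trigger another.

```python
from math import prod
from typing import Iterable


def probability_any(annual_probabilities: Iterable[float]) -> float:
    """Annual probability that at least one failure mode occurs.

    Assumes independent failure modes: P(any) = 1 - product of (1 - p_i).
    """
    return 1.0 - prod(1.0 - p for p in annual_probabilities)


if __name__ == "__main__":
    # Five failure modes, each with an annual probability of 5e-4
    # (inside the 'Unlikely' band of Table 5).
    combined = probability_any([5e-4] * 5)
    print(f"{combined:.5f}")  # about 0.0025, which falls in the 'Possible' band
```

In this example, five individually 'Unlikely' failure modes combine to an annual probability of about 0.0025, which sits in the 'Possible' band. This illustrates why a set of low-rated hazards may still warrant attention when considered together.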
Next, it is common practice for a hazard to be assigned a risk level based on the most severe consequence. Caution should be employed with this practice, and if practitioners choose to use this methodology, it must be clearly declared. Instead, it is recommended that risk ratings are assigned for each consequence category (i.e., consequence of failure of an element on the rest of the system, degree of human intervention required, environment, and community). This is largely due to the inability of risk matrices to aggregate the risk from multiple consequence dimensions, meaning that the consequences should not be directly compared (as noted in Table 1). Therefore, a careful consideration is needed when determining a final risk rating for the hazard.
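
A record that keeps one rating per consequence category, rather than a single collapsed value, could be structured along the following lines. This is a hypothetical sketch: the class, field names, and shortened category labels are illustrative, the colours stand in for the risk categories, and the hazard name in the usage example is invented for demonstration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

RANK = {"green": 0, "yellow": 1, "orange": 2, "red": 3}
# Shortened labels for the four consequence categories named in the text.
DIMENSIONS = ("system", "human intervention", "environment", "community")


@dataclass
class HazardRating:
    hazard: str
    ratings: Dict[str, str]  # one colour per consequence category

    def __post_init__(self) -> None:
        # Refuse records that silently omit a consequence category.
        missing = set(DIMENSIONS) - set(self.ratings)
        if missing:
            raise ValueError(f"no rating for: {sorted(missing)}")

    def most_severe(self) -> Tuple[str, str]:
        """Governing (category, colour); use only if this method is declared."""
        dim = max(DIMENSIONS, key=lambda d: RANK[self.ratings[d]])
        return dim, self.ratings[dim]


if __name__ == "__main__":
    rating = HazardRating(
        hazard="example hazard",  # hypothetical
        ratings={"system": "yellow", "human intervention": "green",
                 "environment": "orange", "community": "yellow"},
    )
    print(rating.most_severe())  # ('environment', 'orange')
```

Keeping all four ratings visible allows the reader to see which category governs, while the helper makes the most-severe convention an explicit, declared choice.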
One final issue that must be addressed is related to the correlated risk. The G-FMEA involves breaking the system down into its individual components. It is possible that one hazard may become a trigger for another hazard. Risk matrix users must be aware of the correlated risks and note them explicitly in the reporting of results, along with how they were considered in the study.
Given the time frames associated with closure, it is recommended that a level of confidence be provided for the risk estimates. Uncertainties in assessing the likelihood and consequence ratings may stem from a lack of data, a lack of system understanding, uncertain future operating conditions or maintenance, and regional development post-closure [9]. As the time frames associated with the G-FMEA may extend to 1000 years, the level of confidence can be expected to decrease as the temporal scale increases. A level of confidence framework is currently being developed at the University of Alberta to accompany the G-FMEA.

Summary and Conclusions
Tailings dams can pose a significant risk to the public and the environment following closure. To prevent the failure of these large structures after mining, their long-term behavior must be understood and incorporated into the closure design, ideally from the conception of the project and initial design. Challenges lie in the limited information available regarding how these structures age over time and in forecasting loading and environmental scenarios over long-term periods (weather events, seismicity, human activity, etc.). A G-FMEA framework was developed to aid in assessing the potential success of a tailings dam closure strategy for external tailings facilities in Alberta, Canada. The goal of the G-FMEA was to reduce the future risk that geotechnical failure poses to the environment and the public to a degree that is both practical and economical. If adopted early in the design process, the G-FMEA can be used to assess the closure risks at the onset of mining, which allows meaningful changes to be made to reduce the long-term risks from operational design decisions and ultimately supports the goal of deregistration. In many cases, this is not possible as the tailings facility may already be further into its life cycle (as is the case at many Alberta tailings facilities).
While this geotechnical risk management tool was developed with Alberta in mind, it does have global significance and could be easily adapted to other jurisdictions or organizations (i.e., to account for the different types of mines present in other parts of Canada or the world). Furthermore, the G-FMEA may be adapted for in-pit facilities with perimeter dykes (as desired) by updating the G-FMEA tables to account for failure modes applicable to these types of facilities. The current G-FMEA tables were developed with a consideration of external tailings facilities at oil sands and coal mines in Alberta. Differences in physiographic regions, seismic regions, expected climate change, etc., could lead to other failure modes that should be included in the G-FMEA tables.
The G-FMEA requires that the dam be broken down into its individual elements. Each element should be analyzed using the G-FMEA framework, which includes four individual charts for the drainage system, foundation, dam body, and landform. The G-FMEA is accompanied by a risk matrix framework, including a likelihood rating table, a consequence rating table, and a final risk matrix. The risk matrix framework is intended as an example of how a qualitative risk assessment method could be combined with the G-FMEA charts to assess a closure plan. It is recommended that the risk matrix used for a site be defined by industry and regulators, with consideration of technical, social, and economic aspects. Guidelines are provided on how to colour code the risk matrix. It is recommended that the risk matrix be accompanied by a level of confidence rating, which is currently being developed at the University of Alberta.
This risk management tool is timely and necessary in Alberta as many tailings facilities approach closure. It provides a systematic method for assessing the post-closure risks associated with tailings facilities so that they can be managed and, ultimately, aims to develop actionable tools consistent with the Global Industry Standard on Tailings Management. The G-FMEA framework aids in satisfying the risk management requirements set by the Alberta Energy Regulator for the closure of a tailings dam, supporting the deregistration of these facilities as dams.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Effects and consequences: Brittle rupture of geosynthetic, lack of control of phreatic surface (potential rise in phreatic surface), increase in seepage, pond on reclamation surface, internal erosion, global instability, release of water into downstream shell, erosion on downstream slope

Failure mode: Creep deformation
Trigger/cause: Slow and continuous deformation under long-lasting shear and pressure forces from consolidation of overlying material
Screening questions: How much deformation is expected over time as the materials above the geotextile consolidate and settle? Has the impact on the geosynthetic been considered?
Effects and consequences: Reduction in thickness leading to reduction in drain capacity or shear failure of drain if in-place deformation reaches a critical value, lack of control of phreatic surface (potential rise in phreatic surface), increase in seepage, pond on reclamation surface, internal erosion, global instability, release of water into downstream shell, erosion on downstream slope

Screening questions: Does the downstream or exit surface of the geosynthetic have the potential to be blocked and prevent drainage from sediment, vegetation, ice, snow, etc.? Could adjacent materials impede movement of water to the geosynthetic?
Effects and consequences: Lack of control of phreatic surface (potential rise in phreatic surface), increase in seepage, pond on reclamation surface, internal erosion, global instability, release of water into downstream shell, erosion on downstream slope

Failure mode: Blinding where fine-grained soils are prevented from entering the geotextile, which creates a filter cake
Trigger/cause: Formation of a filter cake at the interface of the geosynthetic from coarse particles being retained by the geotextile and intercepting fine particles migrating from the soil
Screening questions: Is the geosynthetic in intimate contact with the soil? Have all appropriate filter criteria been followed during design?
Effects and consequences: Lack of control of phreatic surface (potential rise in phreatic surface), increase in seepage, pond on reclamation surface, internal erosion, global instability, release of water into downstream shell, erosion on downstream slope

Element: Low permeability cores
Failure mode: Hydraulic fracture
Trigger/cause: Decrease in total stress (i.e., differential settlement, arching in narrow cores), increase in porewater pressure
Screening questions: Is there a narrow core? Is there the potential for excessive differential settlement that could lead to a decrease in total stress? Is there the potential for an increase in porewater pressure? Is there an effective downstream filter to prevent internal erosion of the core?
Effects and consequences: Cracking, internal erosion, global instability

Failure mode: Internal erosion in dam from suffusion
Trigger/cause: High hydraulic gradients, design/construction defect, presence of widely gap-graded or non-plastic gap-graded soils
Screening questions: Is the material widely gap-graded or gap-graded non-plastic? Is there an effective downstream filter to prevent internal erosion of the core?
Effects and consequences: Global instability, seepage on the downstream slope, settlement of the crest, permeability may increase as erosion progresses or decrease if clogging occurs

Failure mode: Internal erosion in dam from concentrated leak
Trigger/cause: Cracks from vertical deformation in foundation, starter dyke, or other tailings materials or differential settlement; tunnels created by burrowing animals; hydraulic fracture; high hydraulic gradient; design/construction defects
Screening questions: Is there a crack or gap that could allow for a concentrated leak to develop? Is there an effective downstream filter to prevent internal erosion of the core?
Effects and consequences: Global instability, development of a pipe

Failure mode: Internal erosion in dam from contact erosion
Trigger/cause: Parallel flow in coarser layer to the interface between the coarse-grained and fine-grained soil, high hydraulic gradients, design/construction defects
Screening questions: Is there a contact between a coarse and fine-grained soil? Is there a filter in place? Is there an effective downstream filter to prevent internal erosion of the core?
Effects and consequences: Global instability, static liquefaction, settlement of the crest, loss of stability or unravelling, eroded material can clog the permeable layer and increase the porewater pressure (could result in hydraulic fracture and uplift of the downstream toe or a rise in the phreatic surface), development of a pipe

Effects and consequences: Release of pore water and loss of height (potential for pond to develop on reclamation surface), development of cracks above starter dyke, internal erosion, overtopping

* Likelihood rating, consequence rating, risk rating, level of confidence, and controls must be determined for each failure mode for the short-term assessment, medium-term assessment, and long-term assessment.
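The note above implies that each failure mode carries a set of ratings for each of the three assessment periods. One way such an entry could be recorded, and checked for completeness, is sketched below in Python. The class and field names are hypothetical, as are the example rating labels. The sketch treats an empty list of controls as acceptable, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TIME_FRAMES = ("short-term", "medium-term", "long-term")


@dataclass
class Assessment:
    likelihood: Optional[str] = None
    consequence: Optional[str] = None
    risk: Optional[str] = None
    confidence: Optional[str] = None
    controls: List[str] = field(default_factory=list)  # may be empty (assumption)

    def is_complete(self) -> bool:
        """True when all four ratings have been assigned."""
        return None not in (self.likelihood, self.consequence, self.risk, self.confidence)


@dataclass
class FailureModeEntry:
    element: str
    failure_mode: str
    assessments: Dict[str, Assessment] = field(
        default_factory=lambda: {t: Assessment() for t in TIME_FRAMES}
    )

    def missing_time_frames(self) -> List[str]:
        """Assessment periods that still lack one or more ratings."""
        return [t for t in TIME_FRAMES if not self.assessments[t].is_complete()]


# Hypothetical entry with only the short-term assessment filled in
entry = FailureModeEntry("Low permeability core", "Hydraulic fracture")
entry.assessments["short-term"] = Assessment(
    likelihood="unlikely",
    consequence="Minor",
    risk="yellow",
    confidence="high",
    controls=["monitor differential settlement"],
)
print(entry.missing_time_frames())
```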

Appendix C. Risk Matrix Development
The steps to colour code the risk matrix outlined in Section 4.2.3 are described here, with the development of the example risk matrix in Figure 5 used for illustration.
1. The likelihood ratings (Table 5), consequence ratings (Table 6), and risk ratings (Table 7) were evaluated.
2. To develop the iso-contours of equal risk, quantitative values from 0.01 to 10,000 were assigned to the consequence categories, assuming an order of magnitude increase between the categories. For individual projects, site-specific consequences could be considered here where the magnitudes of the consequences are known (e.g., financial impacts of environmental consequences). Iso-contours of equal risk were developed from the estimated quantitative consequence measure and the provided likelihoods using the definition Likelihood = Risk/Consequence. The iso-contours are shown in Figure A1, which plots the annualized probability against the consequences. It is important to remember that this is an estimation technique only and serves as a first-order step for colour coding the matrix. It is desirable to use quantitative measures of the consequences that extend across the full range of categories. Risk categories were then assigned to the iso-contours of equal risk (Table A6). This was used to colour code the initial risk matrix shown in Figure A2. Cells that had an iso-contour cutting through them were assigned to the higher risk category.
Figure A1. Iso-contours of equal risk.
Table A6. Assigned risk categories to iso-contour risk levels.
Figure A2. Initial colour-coded risk matrix with consideration of iso-contours.
3. The initial colour-coded risk matrix was assessed to determine if it satisfied the Cox axioms (weak consistency, betweenness, and consistent colouring).
4. The risk matrix was assessed using the Levine lettering scheme [29]. For this, logarithmic axes were used for the consequence and likelihood axes and straight-line iso-contours of equal risk were drawn (similar to Figure A1). Following this, each area was labelled with a letter as opposed to a colour, as shown in Figure A3 (each line represents a new letter). Levine's method results in a matrix that is somewhat unintuitive, but this prevents risk matrix users from making assumptions about risks based on the colouring scheme. When a risk matrix is constructed in this manner, the following conclusions can be drawn:
a. Risks in one letter category can only be distinguished from risks in another category if they are more than one letter apart (e.g., C > A, D > B).
b. Risks in categories that are zero or one letter apart cannot be distinguished from one another (e.g., it is not known if C > B or B > C).
Figure A3. Risk matrix based on Levine [29].
5. As noted by Duijm, another way to develop risk scores and colouring is by using basic arithmetic (multiplication and addition) based on ordinal numbers assigned to each consequence and probability category [31]. As the categories were logarithmically spaced, the addition of the ordinal numbers was used, as shown in Figure A4. For this example, this results in a risk matrix that is colour-coded in the same way as Figure A2.

6. Major hazard aversion was applied to the risk matrix in Figure A2 to assign a higher risk rating to high-probability-low-consequence events, as these events were considered to be of greater concern, as shown in Figure A5.
Figure A5. Colour-coded risk matrix following application of hazard aversion.
7. Following the application of major hazard aversion, the risk matrix was stress tested by evaluating its performance in different scenarios and evaluating what the risk matrix told the user. The evaluation showed that the risk ratings yielded by Figure A5 for the 'Slight' and 'Minor' consequence rating columns were too high. Some amendments were made to the risk matrix and it was stress tested again. This resulted in the final example risk matrix in Figure A6.
Figure A6. Example colour-coded risk matrix.
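Steps 2, 3, and 5 above can be illustrated with a short Python sketch. It is not the authors' implementation, and every number in it is an assumption: the likelihood and consequence bands, the iso-contour levels, and the ordinal thresholds are placeholders rather than values from Tables 5, 6, or A6. Each cell is coloured from iso-contours of equal risk, with a cell cut by a contour assigned to the higher category. The sketch then shows that adding ordinal numbers gives the same colouring when the categories are logarithmically spaced, and checks only the weak consistency axiom.

```python
COLOURS = ["green", "yellow", "orange", "red"]

# Assumed category bounds as log10 exponents: (lower, upper)
LIKELIHOOD_EXP = [(-5, -4), (-4, -3), (-3, -2), (-2, -1), (-1, 0)]
CONSEQUENCE_EXP = [(-2, -1), (-1, 0), (0, 1), (1, 2), (2, 3), (3, 4)]
# Assumed iso-contours of equal risk, as log10(risk)
CONTOUR_EXP = [-3, -1, 1]


def colour_from_contours(l_band, c_band):
    """Colour a cell by the number of contours lying below its highest-risk corner."""
    # Risk = Likelihood x Consequence, so the exponents add
    upper = l_band[1] + c_band[1]
    # A contour strictly below the upper corner either cuts the cell or lies
    # under it; in both cases the cell takes the higher category
    return COLOURS[sum(upper > t for t in CONTOUR_EXP)]


def colour_from_ordinals(i, j, thresholds=(4, 6, 8)):
    """Colour a cell from the sum of 1-based ordinal category numbers."""
    return COLOURS[sum(i + j > t for t in thresholds)]


def weakly_consistent(matrix):
    """Every point in a red cell has a higher risk than every point in a green cell."""
    cells = [
        (l, c, matrix[r][k])
        for r, l in enumerate(LIKELIHOOD_EXP)
        for k, c in enumerate(CONSEQUENCE_EXP)
    ]
    green_max = max(l[1] + c[1] for l, c, colour in cells if colour == "green")
    red_min = min(l[0] + c[0] for l, c, colour in cells if colour == "red")
    return green_max < red_min


matrix = [[colour_from_contours(l, c) for c in CONSEQUENCE_EXP] for l in LIKELIHOOD_EXP]
ordinal_matrix = [
    [colour_from_ordinals(i, j) for j in range(1, len(CONSEQUENCE_EXP) + 1)]
    for i in range(1, len(LIKELIHOOD_EXP) + 1)
]

for row in reversed(matrix):  # print the highest likelihood row first
    print(" ".join(f"{colour:>6}" for colour in row))
print("Same colouring from ordinal sums:", matrix == ordinal_matrix)
print("Weak consistency holds:", weakly_consistent(matrix))
```

The two colourings agree in this sketch because, with order-of-magnitude bands, the logarithm of the highest risk in each cell is the ordinal sum shifted by a constant. Hazard aversion and stress testing (steps 6 and 7) would then be applied as manual adjustments to individual cells.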