Advancing Hazard Assessment of Energy Accidents in the Natural Gas Sector with Rough Set Theory and Decision Rules

The impacts of energy accidents are of primary interest for risk and resilience analysts, decision makers, and the general public. They can cause human health and environmental impacts, economic and societal losses, which justifies the interest in developing models to mitigate these adverse outcomes. We present a classification model for sorting energy accidents in the natural gas sector into hazard classes, according to their potential fatalities. The model is built on decision rules, which are knowledge blocks in the form of “if (condition), then (classification to hazard class x)”. They were extracted by the rough sets method using natural gas accident data from 1970–2016 of the Energy-related Severe Accident Database (ENSAD) of the Paul Scherrer Institut (PSI), the most authoritative information source for accidents in the energy sector. This was the first attempt to explore the relationships between the descriptors of energy accidents and the consequence (fatalities). The model was applied to a set of hypothetical accidents to show how the decision-making process could be supported when there is an interest in knowing which class (i.e., low, medium, high) of fatalities an energy accident could cause. The successful use of this approach in the natural gas sector proves that it can be also adapted for other energy chains, such as oil and coal.


Introduction
Societies worldwide rely on energy to satisfy many of their needs, among others, cooking, warming, cooling, transportation, and electricity generation [1]. Nonetheless, maintaining a constant provision of energy is a technological and political challenge, since there are continuous disruptions that can affect its reliable and efficient supply. They can be initiated by different causes, such as man-made (e.g., lack of maintenance), technological (e.g., collapse of an infrastructure), and natural (e.g., earthquake, flood), resulting in events such as, for example, explosions, fires, and release of toxic substances. These disruptions cause health, environmental, economic, and social impacts [2], such as fatalities, injuries, evacuees, ban on consumption of food, release of toxic substances, and economic losses [3,4]. the natural gas energy chain was adapted for the analysis with the rough sets approach (a type of Multiple Criteria Decision Analysis (MCDA) method) by defining a set of descriptors (i.e., attributes), which describe how each accident happened and also a class of fatality caused by each event. Second, an initial minimal set of relevant and non-redundant descriptors of the ENSAD accidents that can be used to distinguish accidents that cause different levels of fatalities for the natural gas energy chain is proposed. Third, an original sorting model, based on decision rules of multiple typologies, was proposed for classification of energy accidents in hazard classes. It must be pointed out that this model can also support the analysts in learning about the problem under consideration and allows them to justify their choices using a transparent and systematic process. This methodology can guide the analysts to explore, interpret, and even debate about the subject [28]. The simple syntax of the decision rules ("if . . . , then . . . ") can help the analysts discussing the classifications of the accidents with the decision maker (DM) and start a dialogue that can lead to a more informed perspective on the problem.
A main objective of our research is to contribute to the generation of "lessons learned from the accidents" to provide an indication of the possible impacts of new accidents in the natural gas sector. This strategy can start complementing, from an energy management perspective, the long-standing initiative of the European Commission for the mitigation of potential consequences from incidents, accidents, and near misses [29].
Another objective of the research is to complement the resilience literature by looking at how advanced hazard assessment can contribute to natural gas performance evaluation. In fact, the natural gas energy chain is a complex system, whose performance can decrease according to the severity of each disruption. This loss of performance and need of recovery of a key service links clearly to the concept of resilience, which can be interpreted as the capacity of a system to absorb and recover from adverse events [30][31][32]. Due to the importance of natural gas in the current society, several studies started looking at the resilience of natural gas from multiple perspectives [33][34][35]. A recent framework, based on three main sets of functions, has been proposed for infrastructure resilience assessment by the Future Resilient Systems (FRS) program at the Singapore-ETH Centre (http://www.frs.ethz.ch/) [36]. Such functions include the ability of a system to resist, restabilize, rebuild, and reconfigure its functionality (biophysical functions), the potential to implement emergency responses (enabling functions), and the capabilities to monitor the state of the system and identify weak signals to prevent detrimental effects of disruptions (cognitive functions). The research presented in this paper contributes to one of the latter functions, specifically to the development of the one called "remember". The information stored in ENSAD can, in fact, be used to tackle forthcoming accidents with more awareness of their possible impacts.
The remainder of this paper is organized as follows. Section 2 describes the dataset that was used in this research and the method that was employed to analyze it. Section 3 is devoted to the presentation of the results, while Section 4 discusses and concludes the paper.

Materials and Methods
This section describes the structure of the dataset based on ENSAD that was created for this research (Section 2.1) and the MCDA method that was employed to analyze it and develop the classification model (Section 2.2).

The Energy-Related Severe Accidents Database (ENSAD)
ENSAD is a long-term and active initiative that started in the early 1990s by PSI and it is currently one of the most respected databases on energy accidents in the world [9]. Some of the reasons for this achievement are the verified and traceable natures of each information entry, the coverage of complete energy chains, and the adoption of consistent severity thresholds on the outcomes [9].
The complete list of these accident descriptors is presented in Tables S1-S3 in the Supplementary Information (SI). The outcomes are the resulting fatalities, injuries, evacuees, release of toxic material, and economic loss. For this research, the natural gas chain was selected, a choice driven by the fact that this energy chain is a key asset for primary energy production worldwide and it is currently a major research focus within the ENSAD team [20,34,37]. ENSAD contains natural gas accidents data from 1970 to 2016, and an update and verification up to 2018 is under way as part of the FRS program and PSI efforts. Though, the last consolidated and validated database version for natural gas was used, including accidents from 1970 to 2016, and was employed to develop the classification model based on the decision rules. It included all the severe accidents (according to the definition adopted in ENSAD, severe accident causes, for example, at least 5 fatalities, or 10 injuries, or 200 evacuees, etc. [13]. For simplicity, the adjective "severe" will be omitted throughout the paper) that caused at least 5 fatalities, which were 250 in total. The format of the accidents stored in ENSAD resembles a standard information table for MCDA, with the rows as the alternatives and the columns as the features (attributes or criteria) describing them. There is normally also one column that characterizes the evaluation of each alternative, i.e., an aggregated measure of the individual features.
In the current case study, the alternatives are the energy accidents from ENSAD, and the attributes are eight, including country cluster (a 1 ), energy chain stage (a 2 ), infrastructure type (a 3 ), and event chain sequence from 1 up to 5 (a 4−8 ) (see Table 1 for a sample of the information table and Tables S1-S3 in SI for a full list of attributes). These were considered as main descriptors to discern accidents according to their entity of consequences, an assumption that was tested in the analysis phase. *: The numbers within brackets indicate the code used for the value of each attribute. DOM/COM: Domestic or commercial premises. NA: Not Available. For the infrastructure type, NA means that it is not known, while for the event chains, it indicates that the chain has stopped at the previous step, or the accident reports do not provide enough information, or there is some more information in the accident reports, but it is not clear enough to be used.
The outcome of ENSAD is currently the number of fatalities that each energy accident caused, which is a primary concern for each risk and resilience analyst, decision maker (DM), and also the general public. Three classes of the 1970-2016 knowledge base datasets were defined for the ranges of fatalities with the framing of low, medium, and high impact events. This translates in Cl 1 = 5 to 10 fatalities (161 accidents); Cl 2 = between 11 and up to 20 fatalities (53 accidents); Cl 3 = more than 20 fatalities (36 accidents). The lower boundary for low impact was selected according to the minimum severity threshold of 5 fatalities in ENSAD [14], while the boundary for high impact was chosen based on the minimum selection criterion of 20 fatalities for man-made disasters and natural catastrophes used by SwissRe, a renowned reinsurance company in this area [38]. The medium class with fatalities between 11 and 20 was defined to examine possible differences of these accidents with respect to those with higher or lower impacts.

MCDA Method: Rough Sets
The selection of the MCDA method needs to fit with the structure of the problem [39,40], which in this case included the following requirements:

1.
Assess the quality of information (i.e., attributes) stored in ENSAD to distinguish accidents in relation to the range of fatalities caused; 2.
Discover the patterns that explain the accidents by accounting for the interrelations and interdependencies of the attributes used in ENSAD; 3.
Provide a class of concern for potential forthcoming accidents not part of the knowledge base dataset, while transparently justifying the sorting.
According to these constraints, the MCDA literature [40][41][42][43][44] was searched to verify if a relevant method was available, which was eventually the case. The method that appeared as suitable was the rough sets approach as it can provide all the contributions required to start filling the research gaps [45,46]. Firstly, the approach provides information about the classification ability of the selected attributes and the minimal set indispensable for the consistent classification. Secondly, a classification model composed of decision rules expressed as "if (condition), then (classification to Cl x )" is provided. The rules are transparent and easily understandable by the DMs. They are also related to specific alternatives (i.e., energy accidents), which allows tracing and improving the decision process. Thirdly, the classification model can be used to classify accidents not being part of the knowledge base to assess the concern inherent in a new and unforeseen accident or a past one. The next sections provide a detailed description of the rough sets method (Section 2.2.1), the contributions of this method when applied to ENSAD (Section 2.2.2), and the strategies used to validate the proposed model for sorting energy accidents in fatality classes (Section 2.2.3).

Description of the Rough Sets Method
This section describes how the rough sets method operates on the ENSAD data. Table 2 shows that the ENSAD dataset is in MCDA terms a complete information table, with an objective measure (i.e., class of fatalities) as a characteristic for every accident. This typology of information tables is suitable for patterns and trends analysis, whose methods can discover the information/knowledge "hidden" in (in)complete datasets of attributes (condition attributes) and outcomes (decision attributes) [47,48].  Table 2 is an example of simplified information table from ENSAD composed of S = (U, A, P), where U = set of accidents; A = set of attributes (a i ); P ⊆ A.
The subsets of accidents that have the same values for the attributes represent alternatives that are indiscernible (similar) in light of the available information. They actually represent elementary granules of knowledge and are defined as elementary building blocks (atoms) of our dataset. Unions of elementary concepts are called crisp, whereas any other sets are called rough (vague, imprecise). Formally, an indifference class I P (x) of accident x ∈ U with respect to P ⊆ A is defined as a set of accidents which have exactly the same values as x on all condition attributes in P.
Every set X has two types of crisp sets, called the lower and upper approximation of X. The lower approximation of X (P(X)) is composed of all the elements (accidents) that surely belong to X, whereas the upper approximation of X P(X) is the set of elements (accidents) that possibly belong to X. Formally, these approximations are defined as follows: The above definitions admit that X is either a single class or a class union. Both these options are accounted for in the main paper.
The difference between lower and upper approximation is called the boundary region and a set is rough when it has a non-empty boundary region. The main characteristic of the boundary region is that its elements cannot be classified precisely using the available information. All the accidents in Table 2 have different values for all condition attributes (a 1−3 ) and outcome D. Consequently, all the accidents are discernible according to the information from the attributes. However, considering only condition attributes a 1 , a 2 , and a 3 , accidents 2 and 3 are indiscernible. Subsets of attributes enable the partition of the dataset into clusters of accidents having the same score on such attributes. As an example, attributes a 1 , a 2 , and a 3 provide this clustering of accidents {1}, {2, 3}, {4}, {5}, {6}.
An important issue for the management of energy accidents is to understand which are the conditions that lead to accidents that are less severe (class Cl 1 corresponding to accidents with 5 to 10 fatalities) and which are more severe (class Cl 2 corresponding to accidents with 11 to 20 fatalities). In other words, the objective is to characterize accidents in classes Cl 1 and Cl 2 in view of the attributes a 1−3 .
One first feature to note is that accidents 2 and 3 are characterized by the same values for a 1−3 but accident 2 causes 11 to 20 fatalities, whereas accident 3 causes 5 to 10 fatalities. This implies that the current information allows to state that accidents 1 and 6 cause 5 to 10 fatalities, accidents 4 and 5 cause 11 to 20 fatalities, and for accident 2 and 3 it is not possible to know whether they cause 5 to 10 fatalities or 11 to 20 fatalities.
According to attributes a 1−3 , it is possible to state that accidents 1 and 6 surely cause 5 to 10 fatalities, i.e., surely belong to class Cl 1 , while accidents 1, 2, 3, and 6 possibly cause 5 to 10 fatalities, i.e., possibly belong to class Cl 1 . The set {1, 6} represents the lower approximation P(Cl 1 ) , whereas the set {1, 2, 3, 6} constitutes the upper approximation P(Cl 1 ) of class Cl 1 . The difference between upper and lower approximations is the set {2, 3}, which constitutes the boundary/rough set region (Bn P (Cl 1 )) of class Cl 1 .

The Contributions of Rough Sets Analysis When Applied to ENSAD
The first useful measure that was obtained with rough sets analysis applied to the dataset from ENSAD is the quality of classification γ P (Cl), i.e., the ratio between the accidents that surely belong to a class (i.e., in the lower approximations) with respect to all the accidents in the dataset γ P (Cl) = card P(X) card (U) . γ P (Cl) expresses the ratio between the accidents P-correctly classified (P(X)) with respect to all the accidents in the dataset. In the previous example from Table 2, γ P (Cl) = card P(X) card (U) = 2+2 6 = 0.67, i.e., 67% of the accidents can be correctly/surely identified according to the available information as causing 5 to 10 fatalities or 11 to 20 fatalities. It indicates how well the selected attributes allow discerning the accidents in relation to the class of fatalities.
The second contribution of rough sets analysis is the evaluation of possible superfluous information in the dataset, with respect to the selected attributes. It is possible to state that an outcome D (e.g., fatality) totally depends on the set of attributes A if all the values of the outcome are uniquely determined by the values of the attributes. In case where there is a subset of A, named A , such that γ(A, D) = γ(A , D). A represents a reduct of A as it allows obtaining the same quality of classification of A. In other words, the reduct represents the minimal subset of attributes that is sufficient to guarantee that the highest degree of dependency between A and D is maintained, with the least amount of information (i.e., attributes) from the dataset. In the example from Table 2, {a 1 , a 2 } and {a 2 , a 3 } are the two reducts with respect to D, meaning that either country (a 1 ) and energy chain stage (a 2 ) or energy chain stage (a 2 ) and infrastructure type (a 3 ) can be used to distinguish the range of fatalities caused and obtain the same quality of classification (i.e., 67%) as the whole set of attributes A.
The dependencies that exist between attributes and outcome, expressed as A ⇒ r D , can be defined in the form of "if . . . , then . . . " decision rules (denoted by ρ i ), which is the third contribution of the rough sets method. They are logical formulas describing conditions and outcomes in the dataset. They are built up from elementary characteristics of the accidents (attribute, value) connected by means of the preposition "and".
Some examples of certain rules from Table 2 are: • "If the energy stage (a 2 ) is transport, then the fatalities caused by the accident are 5 to 10 (from accidents 1 and 6); • "If the country (a 1 ) is non-OECD and the energy stage (a 2 ) is extraction, then the fatalities caused by the accident are 11 to 20" (from accident 5); • "If the energy stage (a 2 ) is upstream, then the fatalities caused by the accident are 11 to 20" (from accident 4).
The rules induction algorithm used in this case study is the LEM2, as it is a good option for searching set of rules by selecting the minimum elementary conditions of the attributes [49]. In this paper, we add to the standard single class-based rules the rules for ordinal classification (i.e., union of classes-based rules), which look at the combined conditions of the accidents that lead to a union of classes and not a single one. A simple example of such a rule from Table 1 is "If the country (a 1 ) is OECD and event chain 1 (a 4 ) is fire, then the fatalities caused by the accident are either between 5 and 10 or 11 and 20". This implies either Cl 1 or Cl 2 , which means that the union of classes can be expressed as ≤ Cl 2 . In this case study, the rules of interest for ordinal classification are for at least Cl 2 , indicated as ≥ Cl 2 or for at most Cl 2 , indicated as ≤ Cl 2 (note that the rules with the decision part corresponding to the class unions "at least Cl 3 " and "at most Cl 1 " are equivalent to the standard rules with conclusions "Cl 3 " and "Cl 1 ", respectively).
Having rules of two typologies brings the added value of providing an additional layer of information extracted from ENSAD as well as further support in the analysis of new accidents. In fact, the decision rules can be employed to classify new accidents according to the values of their attributes in order to understand their concern level from a potential fatalities perspective. Two strategies have been employed in this research, the first called standard classification and the second referring to advanced classification (see details in Błaszczyński et al. [50]). The standard classification scheme assigns the class according to what rules cover the accident. The standard scheme is very transparent in the sense that its sorting can be visualized by means of the rules that match the new accidents and shown accordingly in a figure (for an example see Cinelli et al. [27]). However, there can be cases where rules for different classes (e.g., Cl 1 and Cl 2 , Cl 1 and ≥ Cl 2 ) match the accident and consequently a contradiction in terms of suggested classification takes place. A univocal allotment can still be reached by means of an advanced classification scheme that considers the set of decision rules covering a given accident m (referred to as set R) and provides a measure expressed as Score net R (Cl t , m), which is the result of the difference between two other scores [50]. The first one is Score + R (Cl t , m) that accounts for all the rules that support the assignment to class of interest, Cl t . The other is Score − R (Cl t , m), which conveys the rules suggesting a class other than Cl t . Score net R (Cl t , m) results from Score + R (Cl t , m) − Score − R (Cl t , m) and it provides an overall measure of strength for the assignment to a certain class. The class with the highest net score is then recommended by the advanced scheme.
Both classification schemes discussed above assume that for each accident there are one or more rules that match its values of the attributes. However, it might be the case that there is no single rule that exactly aligns with the description of a specific accident. Hence, the recommended class must be extrapolated from the rules that are "nearest" to this accident. The strategy adopted in this case is the valued closeness relation proposed in [51], which is based on the assumption that providing the DM with an indication of the classified accidents that do not excessively differ from the one under analysis can be the best compromise, instead of offering nothing. For example, an accident with the following description (country (a 1 ) = OECD, energy stage (a 2 ) = extraction, infrastructure type (a 3 ) = compression station, . . . ) matches 2 out of 3 conditions of the following rule, hence being relatively close to it: "If the country (a 1 ) is OECD and the energy stage (a 2 ) is extraction and the infrastructure type (a 3 ) is platform, then the fatalities caused by the accident are more than 20".

Validation of the Sorting Model
Two validations of the model were performed, one predictive based on cross-validation and one experimental using realistic future accidents [52,53]. The predictive validation did not provide satisfactory results (its detailed results are available in Section S.3 of the Electronic Supplementary  Information (ESI)), while the experimental one confirmed the usefulness of the rough sets model and it is discussed in Sections 3.2 and 4. Similarly to the experimental validation proposed by Augeri et al. [54], a realistic set of alternatives (in this case, energy accidents) was selected to test the applicability of the rules of the rough sets model and the capacity to better analyze the problem. In fact, the sorting of the accidents is driven by one or more rules that match fully or partially the conditions of the accidents and can lead the sorting challenge using objective energy accidents information stored in ENSAD.

Results
The results are presented in two sections, distinguishing relevance of attributes (Section 3.1), and the classification model based on the decision rules, including the classification of new energy accidents to hazard classes (Section 3.2).
The quality of the employed ENSAD for natural gas accidents is just under 60%, hence close to two thirds of the 250 accidents can be described by the selected attributes and assigned to the class of fatalities without ambiguity. The remaining accidents are part of the upper approximations, meaning that they have the same values for the attributes but caused different ranges of fatalities.

Relevance of Attributes in ENSAD for Natural Gas
There is only one reduct from this ENSAD dataset, which includes all the attributes except event chain 5 (a 8 ). This means that attributes a 1−7 represent the minimal set of relevant attributes from the original dataset that are sufficient to characterize the decision table with the same quality of classification as all the eight attributes. The presence of one reduct only covering all expect attribute a 8 can be interpreted as the confirmation of the relevancy of the data gathering strategy adopted by the ENSAD developers since the early 1990s. In fact, the categorization of the accidents according to country cluster, energy chain stage, infrastructure type, and types of event chains are confirmed as being necessary for distinguishing between events of different fatality entities.
The influence of the attributes on the quality of classification was studied to identify the attributes that mostly affect the classification. This was conducted through a trial-and-error procedure by eliminating one attribute at a time. The quality of classification is presented in Figure 1 by accounting for the effect of the reduction of one attribute at a time on the rough sets analysis.

386
The classification model developed in this research is built upon two types of decision rules 387 that were conveyed by rough sets analysis, single class and union of classes (rules for ≤ 1 and ≥ 388 3 are the same as those for = 1 and = 3 .). Table 3      It is evident from Figure 1 that the attribute event chain 1 (a 4 ) has a key role in the quality of classification, meaning that if deleted, a large number of inconsistent accidents emerge, i.e., the same values of the attributes for the accidents, but different class assignments. Lower but still notable relevance of infrastructure type (a 3 ), country cluster (a 1 ), as well as event chain 2 and 3 (a 5 and a 6 , respectively). Energy chain stage (a 2 ) and event chain 4 (a 7 ) have limited discernibility capacity, while event chain 5 (a 8 ) does not add any relevant type of information to the dataset.
Attributes energy chain stage (a 2 ) and event chain 4 (a 7 ) have thus limited discernibility capacity as they have limited potentials of distinguishing accidents assigned to different classes. Regarding energy chain stage (a 2 ), this is due to the fact that most of the accidents with different fatalities classes have the same values. Specifically, the most common energy chains are transport and domestic or commercial premises (DOM/COM), which represent 45.6% and 38.4% of the values for this attribute in this dataset, respectively.
This can be an indication of whether their coding (i.e., values) should be re-framed (possibly for a 2 ) or the attribute not considered at all and thus dropped from the analysis (possibly for a 7 ). It must be noted that the higher the number of attributes (including their values) the higher the time required to gather the information for ENSAD (and dataset generation in general). Therefore, a trade-off is in place between the search of information on accidents and the added value this research brings to the analysis of such data. Rough sets analysis can thus be seen as a useful approach to assess the worthiness of investing in data gathering strategies for datasets generation, in this particular case for energy related accidents.

Classification Model and Sorting of New Energy Accidents to Hazard Classes
The classification model developed in this research is built upon two types of decision rules that were conveyed by rough sets analysis, single class and union of classes (rules for ≤ Cl 1 and ≥ Cl 3 are the same as those for = Cl 1 and = Cl 3 ). Table 3 provides a summary of these rules. The complete model composed on all the rules is available in SI S.3 and S.4.
As far as fully certain rules are concerned, 61 for Cl 1 , 26 for Cl 2 , 16 for Cl 3 , 66 for ≤ Cl 1 , 60 for ≤ Cl 2 , 42 for ≥ Cl 2 , and 22 for ≥ Cl 3 were discovered (Table 3). This means 103 and 190 overall rules for the single class-based and union of classes-based variants, respectively.
The rules represent pieces of objective knowledge contained in ENSAD that are characteristics of the energy accidents that caused certain classes of fatalities. Standard rules contain the attributes' values (i.e., conditions) of the energy accidents that result in a certain outcome (i.e., decision), be it having caused between 5 and 10 fatalities (Cl 1 ), between 11 and 20 (Cl 2 ), and above 20 (Cl 3 ). In this manner, it is possible to perform a mapping of the patterns that characterize the accidents that surely represent each of the classes. It is important to note that the rules are an objective and succinct representation of the information "hidden" in the dataset and not an elaboration from the experts and/or analysts. In fact, they are supported by one (or more) accidents that uniquely showed certain characteristics for the attributes and resulted in a certain range of fatalities. Some illustrative single class rules include: • "If (country cluster, a 1 = non-OECD) and (infrastructure type, a 3 = pipeline) and (event chain 1, a 4 = release), then the class of fatalities is Cl 1 ". Rule triggered by accidents n. 99 in ENSAD dataset. • "If (country cluster, a 1 = OECD) and (infrastructure type, a 3 = pipeline) and (event chain 1, a 4 = explosion) and (event chain 2, a 5 = fire), then the class of fatalities is Cl 1 or Cl 2 ". Cl 1 is triggered by accidents n. 46, 50, and 63, while Cl 2 is activated by accident 135 in ENSAD dataset. • "If (country cluster, a 1 = OECD) and (infrastructure type, a 3 = compression station) and (event chain 1, a 4 = explosion), then the class of fatalities is Cl 1 ". Rule triggered by accidents n. 220 in ENSAD dataset.
An additional contribution of this research is the extraction of rules for the union of classes, which in this case are represented by the accidents assigned to at most or at least to Cl 2 . In the first case (i.e., rules for ≤ Cl 2 ), they refer to the accidents that contain characteristic values for the accidents classified to either Cl 1 or Cl 2 . In the other case (i.e., rules for ≥ Cl 2 ), they include accidents belonging to either class Cl 2 or Cl 3 . This can enrich the decision support potentials of the knowledge base as explained below.
From a risk and resilience management perspective, knowing the potential impact range of an accident is of pivotal importance to plan appropriate response strategies. Decision rules extracted from this ENSAD set can be used to estimate this potential impact in the natural gas sector. They can provide warning signs based on the country cluster, energy chain, infrastructure type, and event chain of a real or hypothetical energy accident that the expert/DM can use to make the most out of the information consistently reported in ENSAD. The decision rules can be of use to different experts and DMs who might be interested in (i) developing hazard and resilience assessments for energy technologies and scenarios by exploiting the capacity of dealing with accidents with more awareness of their possible impacts (directly linked with "remember" function in FRS resilience framework [36]), (ii) supporting the development of insurance packages according to the riskiness of the energy technology investment (e.g., insurance companies), and those (iii) looking for recommendations about energy policy development (e.g., policy makers).
Let us consider ten realistic accidents (denoted by n 1 -n 10 ) as shown in Table 4, including OECD and non-OECD country clusters, energy chain stages, infrastructure types, and event chains. For these accidents, the classification results are presented in the last two columns (for details see Section S.5 in SI). The classification provided by the single class-based rules indicates that there is correspondence between the standard and advanced schemes for seven out of ten accidents, while it shows ambiguity for n 6 and n 7 , and no solution for n 9 . The latter ones are thus the accidents that can be seen as more problematic from an assessment perspective, and specifically justify the use of the advanced scheme, which can assign a unique class to each of them. The added value of the advanced scheme is also emphasized in the union of classes-based rules, as most of the accidents have union of more than one type of at most classes, or at least classes or a combination of both (n 4, n 8 ), which renders the classification quite ambiguous.
In the case of accident 3 (n 3 ), there is only a single-class rule that leads the sorting, which states that "If country cluster = OECD, infrastructure type = building commercial, and event chain 3 = not available, then the class of fatalities is Cl 1 " (supporting accidents n. 3,25,56,67,81).
Another accident of particular interest is n 8 , whose single class rule includes only event chain information, stating that "If event chain 2 = rupture and event chain 3 = fire, then the class of fatalities is Cl 1 " (supporting accidents n. 44 and 75). From a complementary perspective, the union of classes-based rules enriches the sorting with the rule stating that "If country cluster = non-OECD, event chain 1 = human error, and event chain 3 = fire, then the class of fatalities is ≥ Cl 2 " (supporting accidents n. 137 and 142 for Cl 2 and 225 for Cl 3 ). The reason for the sorting disagreement between the two types of rules is apparent, with the single class-based rules using only event chains as conditions, while the union of classes ones include the country cluster too in the syntax of the rule.
These examples confirm how the information provided by the rules-based model can support a refined understanding of which accidents caused certain impacts and how closely they can be compared to the ones currently under scrutiny. The model can also handle sorting challenges when there is not an exact match between the rules and the conditions of the accidents to classify, as the ninth accident (n 9 ) shows. In this case, there is in fact no exact rule that matches the conditions of the accidents and no class can be provided with the standard scheme. However, the sorting model can still provide a classification with activation of the valued closed relation classification, meaning that there are rules that in this case match part of such conditions and are used to trigger the sorting. There is one rule of single class type, which states that "If country cluster = OECD, energy chain stage = transport, event chain 1 = release, event chain 2 = explosion, and event chain 3 = not available, then the class of fatalities is Cl 1 " (supporting accidents n. 51,78,96). In this case, 4 out of the 5 conditions are matched by the accident, since the first event chain is overpressure and not a release as indicated in the rule. There is also a union of classes rules that complements the sorting, whose structure is "If country cluster = OECD, infrastructure type = pipeline, event chain 1 = other, event chain 2 = explosion, and event chain 3 = not available, then the class of fatalities is ≤ Cl 2 " (supporting accident n. 201 for Cl 1 and 131 for Cl 2 , respectively). Moreover, in this case, 80% of the conditions are covered by the rule as the first event chain is overpressure and not the category "other" as indicated in the rule. This type of accidents should obviously be flagged clearly to the DM, making him/her aware that the sorting is based on partial correspondence between the rule in ENSAD and the accident under analysis. The advanced classification scheme offers a unique sorting based on the highest score net , which is a measure of the strength of the support for the most certain class allotment (for details of score calculation see Błaszczyński et al. [50]). The class assigned by this scheme is used to discuss the hazard of these four test accidents. In this set of accidents, the advanced scheme classification provides the same results for 8 out of 10 of the accidents, showing a high degree of consistency. The use of more than one sorting approach is a main advantage of this model, as it allows to highlight the accidents that require specific attention, in cases when there is disagreement between the assigned classes.

Discussion
Risk and resilience assessments are inherently dependent on reliable, credible, and consistent data, as well as on the effective use of such information. This research has demonstrated one possible use of the records on energy accidents in the natural gas sector, collected by the PSI since the 1990s, combined with an MCDA method to complement the capabilities of risk and resilience assessments in this area and beyond. The analysis of the ENSAD dataset with rough sets represents a first-stage confirmation of the objective integrated relevance of gathering energy accident data following a consistent and transparent procedure. For this purpose, the data was structured on information about where the event happened (i.e., country cluster), in which stage of the energy chain, which infrastructure was affected, the sequence of events (i.e., event chain), and the type of impact in terms of fatalities. Our analysis demonstrated that each accident attribute has a different capacity to distinguish the number of fatalities, with primary relevance for the first event chain step, followed by the type of infrastructure and then country cluster.
The main contribution of this research is the classification model for assigning energy accidents into preference-ordered classes, according to their concern level in terms of potential fatalities. The model is built upon decision rules, pieces of information that unveil the hidden objective relationships in ENSAD within the timeframe 1970-2016 that are unique for energy accidents causing a certain range of fatalities. These decision rules can be used to classify new accidents to assess their level of concern. We applied this model to a set of realistic (i.e., not part of the model development) accidents to show its decision support potentials. The structure of the rules, composed of information blocks in the form of "if (condition), then (classification to hazard class x)" makes it easy for the user to understand the reasons why a certain class is assigned, backed up by the objective accidents data consistently stored in ENSAD. Particular advantages of the proposed models are the combination of rules of different typologies (i.e., single class and union of classes) as well as classification schemes (i.e., standard and advanced). The classifications are supported by multiple facets as they analyze the ENSAD information with several algorithms, adding further credibility and stability assessment to the sorting.
The validation with the 10 energy accidents highlighted a key contribution of this rough sets model and the application of MCDA methods in general, which is to aid developing a deeper understanding of the multiple criteria-based problem, in this case by means of the activated rules for each energy accident. Rough sets models have already been tested with experimental validation in several other application areas, showing the benefit of using knowledge and information reported in the learning dataset and applied to new alternatives or to alternatives that were not part of the set. Some examples include the location selection of waste incinerator [55], budget allocation [54], location selection for waste management plants [56], and green synthesis of nanoparticles [57].
Even if the cross-validation does not prove satisfactory for this model, the experimental validation confirms the value of the model as one of its main strengths and added value is that it provides a classification of new accidents to hazard classes based on ENSAD accidents that have been consistently assigned to that fatality class. The classification scheme applied in this model computes scores of support for different hazard classes, and the one with the greatest support (i.e., number of accidents), is the one assigned by the model. It is then up to the analyst to evaluate each rule in support of a certain sorting by looking at the individual accidents that triggered it and see how applicable they are to the accident under assessment. This is the reason why rough sets is considered a decision support method that aids the analysis of a complex problem, but does not replace the analyst or the decision maker [48].
Two topics that could receive further exploration are the quality of classification and strength of the decision rules. Possible solutions include (i) increasing the discrimination potential of attributes by adding values (e.g., more refined countries grouping and not only OECD and non-OECD as proposed by [2]) and (ii) introducing additional attributes (e.g., distinction of accidents according to the typology and topography of the area the accident took place).
The main focus of our rough sets analysis has been on the explanation of the relationships hidden in the ENSAD dataset by developing a classification model that can be seen as a "glass box" in the spirit of transparency and intelligibility which are key advantages of MCDA methods [48]. This research has demonstrated that decision support systems can make efficient and effective use of information of different typology and provide risk and resilience analysts with transparent and justifiable evaluations of the alternatives under consideration. In order to advance pragmatic policy-making support, this type of integrative research is more needed than ever and interdisciplinary teams will be a necessity rather than simply an advantage in the near future.