An Overview of Indicator Choice and Normalization in Raw Material Supply Risk Assessments

Supply risk assessments are an integral part of raw material criticality assessments frequently used at the country or company level to identify raw materials of concern. However, the indicators used in supply risk assessments to estimate the likelihood of supply disruptions vary substantially. Here, we summarize and evaluate the use of supply risk indicators and their normalization to supply risk scores in 88 methods published until 2020. In total, we find 618 individual applications of supply risk criteria with 98 unique criteria belonging to one of ten indicator categories. The most often used categories of supply risk indicators are concentration, scarcity, and political instability. The most frequently used criteria are the country concentration of production, depletion time of reserves, and geopolitical risk. Indicator measurements and normalizations vary substantially between different methods for the same criterion. Our results can be used for future raw material criticality assessments to screen for suitable supply risk indicators and generally accepted indicator normalizations. We also find a further need for stronger empirical evidence of widely used indicators.


Introduction
Raw material criticality assessments are carried out to identify materials of concern [1]. Their goals range from risk mitigation to hotspot analysis. The actors can be governments and companies alike. The scope of risk consideration ranges from physical accessibility to reputation damage. Even the material scope can differ from chemical elements to whole supply chains [2]. It is good practice to follow four phases for the design and communication of a criticality assessment, consisting of (i) goal and scope definition, (ii) indicator selection and evaluation, (iii) aggregation, (iv) interpretation and communication [2]. Most criticality assessments consider indicators in the two dimensions "supply risk" [3] and "vulnerability" [4]. Several different indicator categories are used for both, as identified by Schrijvers et al. [1]. However, there is little evidence for the general significance of individual risk aspects for raw material criticality [5]. Commodity prices are linked to changes in supply risk aspects, but the scale and significance level of this empirical evidence depends strongly on the specific raw material [6].
The present article is an update to an earlier review by Achzet and Helbig [3]. When that review was published, only 15 criticality assessments were available for a systematic review. In the past eight years, raw material criticality assessments have increased substantially in quantity, impact, and scope [7]. The International Round Table on Materials Criticality (IRTC) held a series of expert workshops and conducted a broad review of various criticality assessments, focusing on risk types, geographical scope, time horizons, and objectives of the methods [1]. However, their study did not cover the details of each criterion and the normalization and interpretation of each of the supply risk and vulnerability indicators [1].
Nevertheless, such information is essential to guide future method developers and users in applying assessments. It helps in the second criticality assessment phase, indicator selection and evaluation [2]. Therefore, the present review focuses on indicator usage instead of the general goals of the methods or aggregation procedures. We provide an overview of supply risk indicator usage in all relevant criticality assessment schemes.
For this purpose, we distinguish indicator categories, criteria, measurements, and normalizations. Indicator categories are general supply risk aspects considered in assessments and may have multiple evaluation criteria. We identify frequently used indicator categories and, for each category, the most relevant supply risk criteria. The criteria need to be measured and subsequently normalized. We provide an overview of possible measurements and normalizations. Due to a lack of empirical evidence, we cannot recommend a best practice for each criterion. Normalization can be carried out with a continuous formula, stepwise normalization, or point-wise evaluation. For example, Graedel et al. [8] consider the country concentration of production as a criterion for concentration, measure it with the Herfindahl-Hirschman Index (HHI), and apply a logarithmic normalization formula to evaluate this criterion on a shared supply risk score. Using such a procedure transparently and in a reproducible manner helps improve criticality assessments and follows good practice [2]. Our review fosters this transparency and reproducibility.

Method
Our review includes 88 supply risk assessment methods published from 1977 to 2020. The methods are published in peer-reviewed literature, research reports, working papers, books, book sections, or corporate or institutional websites. The previous reviews by Achzet and Helbig [3] and Schrijvers et al. [1] contributed to this collection. The list of studies was extended with citation chaining, considering only publications in English or German. The complete list of studies is included in Appendix A.
Most of the 88 methods are full criticality or supply risk assessments that follow the four good practice steps in criticality assessment [2]. Others are either a collection of indicators, which do not aggregate the results, or methods consisting of only a single supply risk indicator. The Supplementary Material spreadsheets additionally list publications that we did not include in our review because they were reviews, obsolete publications (which have been updated by the same authors or institutions by now), or applications of supply risk assessment methods without any methodological change. All of these exclusions avoid double-counting.
All methods included in the review were reviewed concerning their supply risk indicators. If the method additionally had a vulnerability or economic importance dimension, those indicators were not considered. For each of the 618 indicators used in the various supply risk assessments, we identify the overarching indicator category, the measurement (with minimum and maximum values) and the normalization type (normalization formula, stepwise supply risk levels, point-wise evaluation, or no normalization). The list of all indicators, including the normalization formula, supply risk levels, or evaluation points, can be found in Supplementary Material spreadsheets. Table 1 shows a glossary for the relevant terms contained in this data sheet.
The review process also included an attempt to harmonize the terminology of supply risk assessments. For reasons of transparency, the spreadsheet in the Supplementary Material therefore also contains the original criterion name. However, in our review, harmonized criterion names are used. For example, one method may call its indicator "producer diversity", while another calls its indicator "company concentration", and both may be measured with the HHI. In this case, company concentration is used as the harmonized criterion name for both.
Category names are also harmonized so that they always indicate a risk or problem, as shown in Figure 1. For example, many supply risk assessments consider some form of recycling in their method, but recycling itself is not a problem for supply risk; the contrary is the case. The lack of secondary production increases the dependence on primary production to maintain global material flows and supply chains. Therefore, all categories have received a name indicating that "more" in this indicator equals higher supply risk and criticality, even for those studies that initially assessed supply security or supply chain resilience rather than supply risks.
Resources 2021, 10, x FOR PEER REVIEW 3 of 28

Results
The review of all 88 supply risk assessments results in a list of 618 individual indicators. These indicators can be grouped into ten indicator categories with a varying number of criteria each. Categories and criteria are labeled with a single Latin letter followed by a two-digit numerical code, e.g., A01 for country concentration production.
The categories are (A) concentration, (B) scarcity, (C) political instability, (D) regulations, (E) by-product dependence, (F) dependence on primary production, (G) demand growth, (H) lack of substitution options, (I) price volatility, and (J) import dependence. A total of 53 additional indicators did not fit these ten categories and have therefore been allocated to the group of other indicators (X). The review results for each of these categories are described one by one in the following subsections. Table 2 shows the indicator categories and their frequency. Figure 2 summarizes the use of all criteria used at least three times. It shows that almost all criteria are still in use, with the prominent exception of criterion B07, depletion time reserve base, which is no longer used because most data providers have discontinued reserve base data.
For each of the indicator categories, we show a graphical representation of relevant normalizations. To allow a better comparison, the original formulas are rescaled to a common "normalized supply risk score" between 0 and 100 for all categories.

Concentration (A)
The market concentration (A) is the most frequently used indicator category, making up 137 of the 618 indicators (22%). The associated indicators can be grouped into a total of 18 harmonized criteria, of which the four most frequently used criteria are country concentration production (A01), company concentration (A02), country concentration reserves (A03), and country concentration import (A04). In total, these four criteria are used in 118 of 137 indicators (86%) of the concentration category.
The first appearance of concentration as a supply risk indicator dates back to 1977, when Grebe et al. [9] considered the number of countries accounting for 40%, 60%, or 80% of the global production or global reserves as a measurement of A01 and A03. These measurements were converted for both indicators into a scale from 1 to 5, indicating the extent of supply risk, although the exact transformation routine is not given [9]. The so-called Herfindahl-Hirschman Index (HHI) [10,11] is a much more frequent concentration measurement for criteria A01 to A04. Some publications proposed a combined indicator composed of the HHI and an indicator from another category as a weighting factor, for example, the political instability category. Another frequently used measurement is the accumulated share gathered from top countries of production or reserves, as presented by Grebe et al. [9]. Normalization approaches using the HHI measurement for A01 to A04 are presented in Figure 3. Information about the remaining harmonized criteria can be found in the Supplementary Material spreadsheets.
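The two concentration measurements named above can be sketched in a few lines. This is a minimal illustration, assuming market shares are given in percent so that the HHI ranges from 0 to 10,000; the function names and the example shares are hypothetical and not taken from any of the reviewed methods.

```python
def herfindahl_hirschman_index(shares_percent):
    """Return the HHI as the sum of squared market shares (shares in percent)."""
    return sum(share ** 2 for share in shares_percent)


def top_n_share(shares_percent, n=3):
    """Return the accumulated share (in percent) of the n largest suppliers."""
    return sum(sorted(shares_percent, reverse=True)[:n])


# Hypothetical production shares of five countries, in percent.
example_shares = [50.0, 20.0, 15.0, 10.0, 5.0]
example_hhi = herfindahl_hirschman_index(example_shares)
example_top3 = top_n_share(example_shares, n=3)
```

A single supplier holding the whole market gives the upper bound of 10,000, which is the value the normalization schemes discussed in this section map to the highest supply risk score.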
In the case of country concentration production (A01), the logarithmic transformation of the HHI (ranging from 0 to 10,000) into a normalized supply risk scale (ranging from 0 to 100) was applied by Graedel et al. [8] and other methods (cf. Equation (1)).

$\mathrm{HHI}_{\mathrm{normalized}} = 17.5 \cdot \ln(\mathrm{HHI}) - 61.18$ (1)
The values 17.5 and 61.18 in Equation (1) have been set by Graedel et al. [8] to fit the normalization so that an HHI value of 1800 results in a normalized score of 70 and an HHI value of 10,000 marks a normalized score of 100. Helbig et al. [12] adopted this approach with other fitting parameters, resulting in a slightly different normalization applied by three other publications.
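The logarithmic normalization can be sketched as follows, assuming the two constants enter as the slope and offset of a natural logarithm, a reading that reproduces both stated key points (HHI 1800 gives about 70, HHI 10,000 gives about 100). The function name and the clipping to the range 0 to 100 are assumptions made for this illustration.

```python
import math


def normalize_hhi_logarithmic(hhi, slope=17.5, offset=61.18):
    """Map an HHI (0 < HHI <= 10,000) to a supply risk score via a natural log.

    Clipping to [0, 100] is an assumption of this sketch, not a documented
    part of the original method.
    """
    if hhi <= 0:
        raise ValueError("HHI must be positive for a logarithmic transformation")
    score = slope * math.log(hhi) - offset
    return max(0.0, min(100.0, score))


score_at_1800 = normalize_hhi_logarithmic(1800.0)
score_at_10000 = normalize_hhi_logarithmic(10000.0)
```

Because the curve is steep at low HHI values and flat at high ones, moderately concentrated markets already receive comparatively high scores under this scheme.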
Nassar et al. [13] and four other methods do not explicitly mention a normalization procedure for the HHI. We conclude that a simple linear transformation of the HHI into a score from 0 to 100 represents their interpretation of the country concentration best. Zhou et al. [14] also determine normalized scores by scaling the HHI values linearly, but they use the extreme values observed in their data set as thresholds.
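The difference between the two linear variants described above can be illustrated with a short sketch: one scales the HHI against the fixed theoretical bounds 0 and 10,000, the other against the extreme values observed in a data set. The function names and the three HHI values are hypothetical, and clipping outside the bounds is an assumption of this sketch.

```python
def normalize_linear(value, lower, upper):
    """Map value linearly from [lower, upper] to a 0-100 score, clipping outside."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    score = 100.0 * (value - lower) / (upper - lower)
    return max(0.0, min(100.0, score))


def normalize_min_max(values):
    """Scale each value linearly between the observed minimum and maximum."""
    lower, upper = min(values), max(values)
    return [normalize_linear(value, lower, upper) for value in values]


# Hypothetical HHI values for three raw materials.
hypothetical_hhi = [800.0, 2300.0, 5300.0]
fixed_bound_scores = [normalize_linear(h, 0.0, 10_000.0) for h in hypothetical_hhi]
observed_bound_scores = normalize_min_max(hypothetical_hhi)
```

With observed bounds, the scores depend on which materials are included in the assessment: the least concentrated material always scores 0 and the most concentrated always scores 100, regardless of their absolute HHI values.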

Schneider et al. [15] define a threshold of 1500 for the HHI. Below this threshold, the normalized supply risk score is 0. Above 1500, the HHI is normalized by the squared ratio of the HHI value to the threshold, which is also called a distance-to-target method [16]. Three other methods applied this parabolic approach. A similar approach is proposed by Pell et al. [17], but we could not fully reconstruct the normalization approach. We interpret that the HHI values were first scaled from 0 to 1 by the minimum and maximum values of the observed raw material and subsequently normalized by the distance-to-target method. Based on the results, the normalization formula of Equation (2) was applied.
The remaining normalization schemes for A01 use various stepwise functions with two [18] to seven levels [19]. Except for that of Habib et al. [20], each stepwise procedure appears only once.
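A rough sketch of the distance-to-target idea with the HHI threshold of 1500 is given below. It assumes that the ratio is taken between the HHI and the threshold and that a value exactly at the threshold already counts as "above"; both are readings adopted for illustration. The function name is hypothetical, and the result is not rescaled to a 0 to 100 score here.

```python
def distance_to_target(hhi, threshold=1500.0):
    """Return 0 below the threshold, otherwise the squared ratio hhi / threshold.

    Assumption of this sketch: the ratio is taken against the threshold itself,
    and no further rescaling to a common score range is applied.
    """
    if hhi < 0:
        raise ValueError("HHI must not be negative")
    if hhi < threshold:
        return 0.0
    return (hhi / threshold) ** 2
```

The squaring penalizes large exceedances disproportionately: an HHI at twice the threshold yields a value of 4, while one at three times the threshold yields 9.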
For A02, we identified less variety in terms of measurements and normalization schemes. The most frequently used is the method of Schneider et al. [15], which was already explained for A01. The threshold of 1500 is once again used, which results in an identical normalization curve. Three other publications applied this approach. Pell et al. [17] applied the same normalization scheme with extreme values of HHI observed for A02 and the distance-to-target approach. Helbig et al. [12] adopted their normalization formula from A01 for A02 with different key points, leading to a slightly different formula (cf. Equation (3)).
The same formula is also applied in two other publications. Kolotzek et al. [21] stuck to the key points used by Helbig et al. [12] for A01 and applied a logarithmic transformation for A02. The work from Rosenau-Tornow et al. [22] is the only study involving a level-based normalization on the HHI for A02.
A03 is also dominated by normalizations based on normalization formulas. Habib and Wentzel [23] and Nassar et al. [13] applied no transformation. Therefore, we assigned an HHI of 0 to the normalized supply risk score of 0 and an HHI of 10,000 to a score of 100. Helbig et al. [24] applied the same logarithmic transformation as Helbig et al. [12] for A01. Schneider et al. [15] also used the distance-to-target method with an HHI threshold of 1500, as for A01 and A02. Each of the three approaches is applied in one other publication. Pell et al. [17] proceeded as for A01 and A02, scaling the HHI values according to the observed minimum and maximum values for A03, followed by the distance-to-target method. Eggert et al. [19] also use the same levels as in A01 to assign HHI values to supply risk levels.
For A04, only two different normalization approaches for the HHI are identified. Zhou et al. [14] applied the same curve to A04 as for A01. Li et al. [25] decreased the number of levels from four for A01 to three for A04, using thresholds of HHI 1500 and 2500. The limits of the levels applied in this approach are identical to those from Rosenau-Tornow et al. [22] for A01 and A02.
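A level-based normalization of this kind can be sketched with a sorted list of thresholds. The thresholds 1500 and 2500 come from the text above; the function name, the numbering of levels from 1 (lowest risk) upward, and the choice to assign a value exactly on a threshold to the higher level are assumptions of this illustration.

```python
from bisect import bisect_right


def supply_risk_level(hhi, thresholds=(1500.0, 2500.0)):
    """Return the supply risk level for an HHI, from 1 up to len(thresholds) + 1.

    Assumption of this sketch: a value equal to a threshold falls into the
    higher level.
    """
    return bisect_right(thresholds, hhi) + 1
```

Passing a longer tuple of thresholds gives schemes with more levels, such as the two to seven levels reported for A01.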

Scarcity (B)
The second-most frequently occurring indicator category is scarcity (B), for which we identified 25 different harmonized criteria. The four most common criteria are the depletion time of reserves (B01) and resources (B02), the sufficiency of reserves (B03), and the crustal content (B04). Figure 4 visualizes the normalization approaches for B01, B02, and B04. Information about the remaining harmonized criteria can be found in the Supplementary Material spreadsheets. Both depletion times are quantified in years, crustal content in parts per million (ppm). The criterion B03, the sufficiency of reserves, is not shown due to a lack of evident normalization and measurement in the respective assessments.

Both depletion time of reserves (B01) and depletion time of resources (B02) first appeared in the work of Grebe et al. [9]. The ratio between the available deposits and the current (primary) production rate determines the depletion time. Some authors also use terms such as the static reach for these criteria. No matter the name, the ratio is typically expressed in years. For B01, the considered deposits are available reserves, meaning the deposits are identified, and extraction is techno-economically viable. Graedel et al. [8] presented the most frequently used normalization scheme adopted by 11 other methods (cf. Equation (4)).
A parabolic function is used to assign high depletion times (DT) to low supply risk scores. Three key points are used to determine the shape of the parabola in Equation (4): A DT of 0 years leads to a normalized supply risk score of 100, whereas a value of 50 years is assigned to a score of 70 and a DT of 100 years results in a score of 0. Depletion times above 100 years are interpreted as posing no supply risk by Graedel et al. [8].
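The three key points determine the parabola uniquely. Solving for a quadratic through (0, 100), (50, 70), and (100, 0) gives the coefficients used in the sketch below; the function name and the explicit form of the formula are a reconstruction for illustration, derived only from these key points.

```python
def normalize_depletion_time_reserves(dt_years):
    """Map a depletion time of reserves (years) to a 0-100 supply risk score.

    The parabola passes through (0, 100), (50, 70), and (100, 0);
    depletion times of 100 years or more are scored 0.
    """
    if dt_years < 0:
        raise ValueError("depletion time must not be negative")
    if dt_years >= 100:
        return 0.0
    return 100.0 - 0.2 * dt_years - 0.008 * dt_years ** 2
```

The downward-opening shape means the score falls slowly for short depletion times and quickly as the depletion time approaches 100 years.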
Pell et al. [17] applied their approach already presented for concentration criteria A01 to A03 by rescaling the inverted DT to a score from 0 to 1 with the observed minimum and maximum values and applying the distance-to-target method. The remaining publications presented for B01 in Figure 4 developed individual level-based normalization approaches. The number of levels varies from just two proposed by Behrendt et al. [26] to five in the work of Grebe et al. [9].
The depletion time of resources (B02) shows more consensus in the normalization approach. Resources, in contrast to reserves, also include inferred and sub-economic deposits; therefore, the depletion time of resources is larger than the depletion time of reserves. In most cases, the normalization of B02 is similar to that of B01. Helbig et al. [12] proposed a parabolic transformation comparable to the approach of Graedel et al. [8] for B01. For the DT of the resources, they suggest different key points by doubling the periods: A DT value of 200 years is considered as causing no supply risk at all, resulting in a supply risk score of 0, and a value of 100 years results in a score of 70. Four other publications followed this approach. The only different normalization approach found was the level-based normalization by Grebe et al. [9], consisting of five levels. Here, a DT exceeding 1000 years yields a supply risk score of 0. However, this method has never been applied by another study in our review.
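A parabola through three key points can also be built generically, which makes it easy to compare the key points for reserves and resources. The sketch below uses Lagrange interpolation; it assumes that a depletion time of 0 years is again assigned a score of 100 for resources, which the text implies through the analogy to B01 but does not state. All names are hypothetical.

```python
def parabola_through(points):
    """Return the quadratic function passing through three (x, y) key points."""
    (x0, y0), (x1, y1), (x2, y2) = points

    def curve(x):
        # Lagrange form of the interpolating polynomial of degree two.
        return (
            y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
            + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
            + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
        )

    return curve


# Key points for resources; the (0, 100) point is an assumption of this sketch.
RESOURCE_KEY_POINTS = [(0.0, 100.0), (100.0, 70.0), (200.0, 0.0)]
_resource_curve = parabola_through(RESOURCE_KEY_POINTS)


def normalize_depletion_time_resources(dt_years):
    """Map a depletion time of resources (years) to a 0-100 supply risk score."""
    if dt_years < 0:
        raise ValueError("depletion time must not be negative")
    if dt_years >= 200:
        return 0.0
    return _resource_curve(dt_years)
```

Under these assumptions, a depletion time of 50 years scores 90 for resources, whereas the same value scores 70 for reserves under the key points of Graedel et al. [8].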
For the crustal content (B04), the "abundance in earth's crust" was identified as the most commonly used measurement. Two different approaches were found for normalization. Ashby [27] considers a high supply risk for materials with rare abundance in the earth's crust but does not propose a specific transformation into a normalized supply risk score. Nevertheless, we want to display Ashby's intention of assigning a high supply risk score to a low abundance [27]. Therefore, we conducted a simple linear transformation considering a value of $10^6$ ppm as no supply risk and a value of 0 ppm as a normalized score of 100. An alternative method of normalization was applied by Duclos et al. [28] and one other method, subdividing abundance values into five levels of supply risk.

Political Instability (C)
The third most used supply risk category is political instability (C), which is dominated by two harmonized criteria, namely geopolitical risk (C01) and political instability (C02). These two criteria make up 67 out of 75 cases (89%) for this category. Each of the eight remaining criteria identified is applied only once.
C01 appeared for the first time in the work of Eggert et al. [19] in 2000. To evaluate the geopolitical risk, they used the political country risk evaluation of Hermes/BMWi classification on a scale from 1 to 7. They use this classification three times for indicators in this category, each with a different weighting: the production shares, the export shares, and the reserve shares of the countries, respectively.
In contrast, C02 appeared first in the method of Morley and Eatherley [29] in 2008. They classified the percentile rank of the Worldwide Governance Indicator "Political Stability and Absence of Violence/Terrorism" (WGI PR PV) [30] of the largest producer into a supply risk score using a three-level normalization function. In addition, the World Bank developed five other Worldwide Governance Indicators, which are updated yearly: Government Effectiveness (WGI GE), Voice and Accountability, Control of Corruption, Regulatory Quality, and Rule of Law. They are available as WGI score and WGI percentile rank [30].
Figure 5 displays the normalization schemes used for the criteria of political instability, classified according to their measurement: WGI score and WGI percentile rank. In most cases, the WGI scores or ranks are weighted by production share. Other weighting factors are import shares [31] or the consideration of the largest producers [32]. The composition of a WGI dimension in combination with the HHI is described in Section 3.1. Other schemes are given in the Supplementary Material.

The WGI indicators in the default unit usually range from −2.5 to 2.5 [30], where a low value indicates bad governance. Consequently, the most frequently used normalization approach for both C01 and C02 is the linear transformation of WGI scores based on a hypothetical lower bound of −2.5 and upper bound of 2.5, as presented in Equation (5). After the conversion, values of −2.5 in WGI units and lower yield the highest supply risk score. Twelve other methods have adopted this transformation.
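The linear transformation of WGI scores, combined with the production-share weighting mentioned earlier in this section, can be sketched as follows. The explicit formula is a reconstruction from the stated bounds (low WGI means high risk), not a copy of Equation (5); the function names and example values are hypothetical.

```python
def production_weighted_mean(values, production_shares):
    """Return the mean of country values weighted by production shares."""
    total = sum(production_shares)
    if total <= 0:
        raise ValueError("production shares must sum to a positive value")
    return sum(v * s for v, s in zip(values, production_shares)) / total


def normalize_wgi_score(wgi, lower=-2.5, upper=2.5):
    """Invert and rescale a WGI score so that `lower` maps to 100 and `upper` to 0.

    Values outside the bounds are clipped, so scores at or below `lower`
    receive the highest supply risk score.
    """
    clipped = max(lower, min(upper, wgi))
    return 100.0 * (upper - clipped) / (upper - lower)


# Hypothetical example: two producing countries with WGI scores -1.0 and 1.0
# and production shares of 75% and 25%.
weighted_wgi = production_weighted_mean([-1.0, 1.0], [75.0, 25.0])
example_score = normalize_wgi_score(weighted_wgi)
```

Changing the `lower` and `upper` arguments shows how sensitive the resulting score is to the assumed bounds of the WGI scale.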
Diverging from this are Nassar et al. [33], who instead assume a range from −3.5 to 3.5 (cf. Figure 5). Three methods use the observed minimum and maximum WGI scores for normalizing them to supply risk scores: Blagoeva et al. [34] and Zhou et al. [14] based on the arithmetic mean of all six WGI dimensions, and Nassar et al. [13] based on the geometric mean of all six WGI dimensions. DERA [35] and Jasinski et al. [36] presented a level-based normalization with three and four levels, respectively. In addition to the approach of Erdmann et al. [32], none of the above-presented methods has been taken up so far. Sun et al. [37] used the function presented in Equation (6) to normalize the weighted arithmetic mean of the WGI dimensions.
$\mathrm{WGI}_{\mathrm{normalized}} = -1.9841 \cdot \mathrm{WGI}_{\mathrm{arith.\,mean}} + 5.7001$ (6)
For the normalization of the percentile rank of the WGI dimensions, four different methods were identified. Graedel et al. [8] simply inverted the percentile rank (on a scale from 0 to 100) by assigning the lowest supply risk to the highest political stability (cf. Equation (7)). This approach was adopted by five other methods, whereas the remaining approaches for the normalization of WGI percentile ranks have not been taken up by other methods.
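Two of the formulas described above are simple enough to write down directly. The sketch below implements Equation (6) with the coefficients given in the text and the percentile-rank inversion as "100 minus rank", which is how the description of Equation (7) is read here; the function names are hypothetical.

```python
def normalize_wgi_mean_eq6(wgi_arith_mean):
    """Apply the linear function of Equation (6) to a weighted mean WGI score."""
    return -1.9841 * wgi_arith_mean + 5.7001


def normalize_wgi_percentile_rank(percentile_rank):
    """Invert a WGI percentile rank (0-100) so that high stability means low risk."""
    if not 0.0 <= percentile_rank <= 100.0:
        raise ValueError("percentile rank must lie between 0 and 100")
    return 100.0 - percentile_rank
```

Over the nominal WGI range from −2.5 to 2.5, the function of Equation (6) returns values between roughly 0.74 and 10.66, so its output is not directly comparable with the 0 to 100 scores of the other schemes without further rescaling.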
Eheliyagoda et al. [38] developed a procedure for both C01 and C02 using Equation (8) to invert and rescale the weighted WGI PR PV and WGI PR GE (Government Effectiveness), respectively.

Regulations (D)
The fourth most often mentioned category is regulations (D), which is used in 68 out of 618 indicators (11%). We identify policy perception (D01), human development (D02), trade barriers (D03), and environmental performance (D04) as the most prominent harmonized criteria. In contrast to most other categories, the risk from regulations has emerged more recently in the work of Thomason et al. [39] in 2010. They determined the percentage of produced goods expressed in U.S. market shares that a country intends to supply to the U.S. as a measurement of D03. In 2012, Graedel et al. [8] developed a measure to determine D01 and D02 for the first time.
It is worth mentioning that three dominant measurements of regulations have been developed within the criteria: The Policy Perception Index (PPI) in D01 is provided by the Fraser Institute and captures the influence of policies on mining activities in a country [40]. The Human Development Index (HDI) in D02 has been developed by the UNDP and evaluates the living conditions of a country [41]. The Environmental Performance Index (EPI) in D04 is provided by Yale University and rates the ability of a country to cope with environmental challenges [42]. All three measurements are updated annually. The associated normalization schemes are displayed in Figure 6.
Graedel et al. [8] weighted the PPI of mining regions by the respective production share. They normalized this measurement by a simple inversion subtracting the PPI from 100 according to Equation (9). The PPI ranges from 0, indicating low policy attractiveness, to 100, displaying high policy attractiveness for mining activities. Ten other methods adopted this approach in the same way.
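The production-weighted inversion of the PPI can be sketched as below. The weighting and the "100 minus PPI" step follow the description above; the function names, the dictionary-based data layout, and the two example regions are hypothetical.

```python
def production_weighted_ppi(ppi_by_region, share_by_region):
    """Return the PPI averaged over mining regions, weighted by production share."""
    total_share = sum(share_by_region.values())
    if total_share <= 0:
        raise ValueError("production shares must sum to a positive value")
    weighted_sum = sum(
        ppi_by_region[region] * share for region, share in share_by_region.items()
    )
    return weighted_sum / total_share


def normalize_ppi(weighted_ppi):
    """Invert a PPI (0-100) so that low policy attractiveness means high risk."""
    if not 0.0 <= weighted_ppi <= 100.0:
        raise ValueError("PPI must lie between 0 and 100")
    return 100.0 - weighted_ppi


# Hypothetical PPI values and production shares for two mining regions.
hypothetical_ppi = {"Region A": 80.0, "Region B": 40.0}
hypothetical_shares = {"Region A": 0.75, "Region B": 0.25}
example_score = normalize_ppi(
    production_weighted_ppi(hypothetical_ppi, hypothetical_shares)
)
```

In this example the weighted PPI is 70, so the normalized supply risk score is 30; shifting production toward the less attractive region raises the score.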
Bach et al. [43] applied the distance-to-target method as described in Section 3.1 on the inverted and weighted PPI values using a threshold of 55. Eheliyagoda et al. [38] applied the same approach as for the WGI percentile ranks of C01 and C02 (cf. Equation (8)) on the weighted PPI values. Both Zhou et al. [14] and Pell et al. [17] adopted their previously presented approaches. Zhou et al. [14] applied the normalization based on the minimum and maximum observed values of the PPI, whereas Pell et al. [17] first applied a rescaling from 0 to 1 according to the minimum and maximum observed values followed by the distance-to-target approach.

For D02, the HDI weighted by production shares of mining countries was the most often used measurement. The HDI evaluates the three dimensions of life expectancy, educational standard, and standard of living on a scale from 0 to 1 [44]. Ciacci et al. [45] rescaled the weighted HDI values as presented in Equation (10) by a scaling factor of 100, which was followed by eight other methods.
Eheliyagoda et al. [38], Zhou et al. [14], and Pell et al. [17] applied their approaches to the HDI as for previous criteria: Eheliyagoda et al. [38] applied the normalization formula shown in Equation (8) to HDI values, Zhou et al. [14] normalized according to the observed minimum and maximum values, and Pell et al. [17] applied their combination of a normalization to a scale from 0 to 1 based on minimum and maximum values and the distance-to-target method. Schneider et al. [15] also stuck to their distance-to-target method already performed for A01-A03 with a threshold for the weighted HDI of 0.12. Helbig et al. [46] applied the formula presented in Equation (11) to normalize the weighted HDI values, resulting in scores from 0 to 100. All of the previously mentioned studies interpret high HDI values as high supply risk.
An opposite interpretation of the HDI was proposed by Jasinski et al. [36], resulting in an alternative normalization. They consider countries with low human development as critical because of the high probability of improving social conditions by introducing policies that disrupt mining activities. In other words, high weighted HDI values lead to a low supply risk in their study. Therefore, a stepwise normalization consisting of four levels is conducted.
The discrepancy in the interpretation of indicators continues for D04 with the EPI as a single measurement. Roelich et al. [47] used the production-weighted EPI as a measurement of D04 for the first time to describe the risk that a country has or introduces environmental policies that might restrict mining activities. Thus, an EPI value of 100 yields a high potential for restrictions in mining activities due to environmental policies. Since an EPI of 100 indicates high supply risk, no further normalization needs to be applied.
In contrast, Zhou et al. [14] and Jasinski et al. [36] interpreted the EPI in the opposite way. According to their normalization approaches, a higher EPI value leads to lower supply risk scores, because countries with high environmental standards are considered less vulnerable to incidents and related supply failures [36]. While Zhou et al. [14] applied the same method as previously used for A by scaling the weighted EPI according to the minimum and maximum values observed, Jasinski et al. [36] used the four-level normalization applied for D02 with slightly different limits.
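The two opposing interpretations of a production-weighted index can be illustrated with a minimal sketch. The country shares, index values, and observed bounds below are hypothetical, and the inverted min-max formula is only an assumed form of a "higher index, lower risk" normalization, not the exact formula of any reviewed method.

```python
def production_weighted_index(shares, index_values):
    """Weight a country-level index (e.g., HDI or EPI) by production shares."""
    total = sum(shares.values())
    return sum(shares[c] / total * index_values[c] for c in shares)


def inverted_min_max(value, observed_min, observed_max):
    """Assumed form: the highest observed value scores 0, the lowest scores 100."""
    return 100.0 * (observed_max - value) / (observed_max - observed_min)


# Hypothetical production shares and index values for three mining countries.
shares = {"A": 0.5, "B": 0.3, "C": 0.2}
hdi = {"A": 0.9, "B": 0.7, "C": 0.5}   # HDI on a scale from 0 to 1
epi = {"A": 80.0, "B": 60.0, "C": 40.0}  # EPI on a scale from 0 to 100

# Interpretation 1: a high index value means high supply risk.
weighted_hdi = production_weighted_index(shares, hdi)
score_direct = 100.0 * weighted_hdi  # scaling factor of 100

# Interpretation 2: a high index value means low supply risk.
weighted_epi = production_weighted_index(shares, epi)
# The bounds 50 and 90 stand in for the minimum and maximum observed
# across all assessed materials (hypothetical values).
score_inverted = inverted_min_max(weighted_epi, 50.0, 90.0)
```

The same weighted measurement therefore yields scores on opposite ends of the scale depending solely on the interpretation chosen.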

By-Product Dependence (E)
The fifth most frequent category is by-product dependence (E), which is used in 44 out of 618 indicators (7%). We have found one dominant criterion, which bears the same name as the category, by-product dependence (E01), and which first appeared in the work of Grebe et al. [9]. They used a qualitative approach to assign the observed raw materials to a supply risk score ranging from 1 to 5. More information about the classification can be found in the Supplementary Material spreadsheets. However, the most commonly used measurements for E01 are companionality, developed by Nassar et al. [13], and the companion metal fraction (CMF), which is the percentage share of a raw material produced as a by-product. Companionality (CP) evaluates the contribution of a raw material to the profitability of a mine in contrast to other raw materials sourced from the same mine, for all sourcing locations. The CP values are usually rescaled by a factor of 100, as shown in Equation (12), to result in a supply risk score from 0 for no risk posed by independent raw materials to 100 for high supply risk posed by full dependence of raw materials on other mined materials.
The normalization schemes of CMF are displayed in Figure 7. As for companionality, the most common approach is a multiplication by 100 to create a supply risk score ranging from 0, indicating that a raw material is not produced as a by-product, to 100, meaning that a raw material is entirely produced as a by-product. This method is used by Graedel et al. [8], followed by six other methods. Schneider et al. [15] applied the same distance-to-target approach as for the categories above, using a threshold of 0.2. BGS [48] and Jasinski et al. [36] proposed a stepwise three-level and four-level normalization, respectively. Other methods have not adopted either approach.
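As a minimal sketch of the most common CMF normalization, the following example computes the share of a material produced as a by-product and multiplies it by 100. The production records are hypothetical, and the function names are chosen for illustration only.

```python
def companion_metal_fraction(production_records):
    """Share of total output obtained as a by-product.

    production_records: list of (tonnes, is_by_product) tuples.
    """
    total = sum(tonnes for tonnes, _ in production_records)
    by_product = sum(tonnes for tonnes, is_bp in production_records if is_bp)
    return by_product / total


def cmf_score(cmf):
    """Linear normalization: 0 = never a by-product, 100 = always a by-product."""
    return 100.0 * cmf


# Hypothetical output of three mines producing the same material.
records = [(600.0, False), (300.0, True), (100.0, True)]
cmf = companion_metal_fraction(records)
score = cmf_score(cmf)
```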

Dependence on Primary Production (F)
The sixth supply risk indicator category is the dependence on primary production. This terminology inverts the typically used original category name of recycling or recyclability. The inversion reflects that it is precisely the lack of recycling that increases the supply risk. The two most often used criteria in this category are the end-of-life recycling rate (F01) and the recycled content ratio (F02). The UNEP report on the recycling of metals, published a decade ago, gave a good overview of the terminology of metal cycles, including the differentiation between old scrap and new scrap, and of the importance of collection rates, remelting yields, and growing material demands for the measurement of recycling in global cycles [49]. The report is still often used as the data source for various supply risk assessments.

The argument for why dependence on primary production, and thus a lack of secondary production, i.e., recycling, causes higher supply risk is the following: secondary raw materials are a raw material source independent of the primary production route, in particular mining; they are available without geological exploration, their availability depends predominantly on past material use, and they are known locally in the countries of utilization. Therefore, the availability of secondary raw materials makes shortages of primary raw materials and high market concentrations less likely.
In general, there are two schools of thought on how recycling should be measured as a criterion for dependence on primary production: methods use either the end-of-life recycling rate (EoLRR) or the recycled content ratio (RCR) as the measurement. Figure 8 shows the normalization schemes applied to these two measurements.
The EoLRR measures the share of end-of-life wastes collected and recycled so that the material can enter a new fabrication or manufacturing stage. Since there will always be waste flows that are not collected, and since thermodynamic limits constrain remelting yields, the EoLRR will always be smaller than 100%. The predominant normalization formula applied to the EoLRR is the naïve approach of linearly rescaling values from 0% to 100% to scores from 100 to 0 points.
In contrast, the RCR measures the share of recycled content in fabricated or manufactured goods. Since these goods can only consist of primary or secondary materials, this ratio will also be between 0% and 100%. However, because raw material markets have been growing for most materials over the past decades, the RCR will often be lower than the EoLRR. Therefore, methods already attribute no supply risk to any RCR over 50%.
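The two recycling normalizations can be sketched as follows. The inverse linear EoLRR rescaling follows the description above. For the RCR, only the zero-risk cut-off at 50% is taken from the text, and the linear decline below that cut-off is an assumption made for illustration.

```python
def eolrr_score(eolrr_percent):
    """Inverse linear rescaling: 0% recycling -> 100 points, 100% -> 0 points."""
    return 100.0 - eolrr_percent


def rcr_score(rcr_percent, zero_risk_at=50.0):
    """No supply risk at or above the cut-off.

    The linear decline from 100 points at 0% RCR to 0 points at the
    cut-off is an assumed shape, not a documented formula.
    """
    return max(0.0, 100.0 * (1.0 - rcr_percent / zero_risk_at))
```

Under these assumptions, a material with an EoLRR of 30% scores 70 points, whereas an RCR of 25% scores 50 points and any RCR of 50% or more scores 0 points.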
Both EoLRR and RCR have their flaws as measurements for primary production dependence. For the EoLRR, there may be high recycling rates at end-of-life; however, these are irrelevant if the supply risks emerge from rapidly growing future technology demand. As the EoLRR considers only recycling from old scrap, which is only formed after the use phase, there is a natural time lag between demand growth and the growth of end-of-life wastes. For the RCR, the ratio of recycled content in a fabricated product may be high, but if all of it comes from new scrap, this recycling before the use phase does not alter the primary material demand. One should be cautious that high prompt scrap formation rates with high recycling rates for prompt scrap might artificially increase the RCR without providing any risk-reducing alternative raw material source.

Demand Growth (G)
The seventh supply risk category is that of expected demand growth, in particular from future technologies. The most common approach is to relate the expected additional future demand to current production volumes. Angerer et al. [50] were the first to use this approach; their study has been excluded from our dataset because it has been updated by Marscheider-Weidemann et al. [51].
This approach of calculating the future technology demand as a ratio between additional demand growth and current production typically needs a base year (for current production) and a reference year (for future technology demand). For example, Angerer et al. originally calculated the raw material demand for various future technologies for 2030 and used 2006 as the base year. Using the ratio rather than, e.g., the quantity or value of future technology demand also allows comparing raw materials produced at different orders of magnitude. It is not the absolute amount of material production that is problematic, but rather the required relative demand growth. Since base years and reference years naturally differ between supply risk assessments, depending on their publication date and on the goal and scope of the evaluations, normalizations can only be compared based on the annualized additional demand growth, given in percentages (cf. Figure 9).
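To make ratios with different base and reference years comparable, the additional demand can be annualized. The sketch below assumes compound growth; the reviewed methods may annualize differently, and the demand and production figures are hypothetical.

```python
def annualized_growth_percent(additional_demand, current_production,
                              base_year, reference_year):
    """Annualized additional demand growth in percent (compound growth assumed)."""
    ratio = additional_demand / current_production
    years = reference_year - base_year
    return ((1.0 + ratio) ** (1.0 / years) - 1.0) * 100.0


# Hypothetical material: future technologies require three times the
# current production, with base year 2006 and reference year 2030.
growth = annualized_growth_percent(300.0, 100.0, 2006, 2030)  # about 5.95 % per year
```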

Lack of Substitution Options (H)
The eighth indicator category for supply risk is that of lack of substitution options. A lack of viable substitutes for a material or product creates a dependency in the supply chains, reducing the system's resilience. In 22 out of 26 cases (84%), the substitutability of the raw material is used as a criterion in this category. Substitution can take place at the material, component, assembly, or conceptual level, as described by Habib and Wenzel [23] using the example of wind turbines. In particular, for high-tech applications, the substitution of materials is often limited, as shown by Nassar [52] for platinum-group metals. The most prominent evaluation of substitutability for a large set of raw materials has been published by Graedel et al. [53], who set out to identify the main applications of each element, identify possible substitute materials in each of these applications, and then evaluate the performance of that substitute. Because of the heterogeneity of applications, this performance is difficult to quantify. Therefore, expert judgment on a multi-point scale is used to derive the substitutability score. Application shares are afterwards used to calculate a weighted average of the scores [53]. Graedel et al. were neither the first nor the only ones to use such an approach for evaluating the lack of substitution options. The first to use the lack of substitution options as an indicator for supply risk was again Grebe et al. [9], albeit only with the straightforward classification of raw materials into "no substitution", "hardly any substitution", and "substitution" and without differentiation of application shares. The European Commission [54] and Erdmann et al. [32] also applied this concept of a weighted average of expert assessments for the main applications of the raw material. The later updates of the EU Critical Raw Materials list shifted from substitutability to substitution [55].
While this may seem to be quibbling, the difference is substantial, as substitution only considers proven and readily available substitutes. Consequently, the supply risk scores of many raw materials in the EU criticality study increased due to this change [56,57].
The normalization scheme for lack of substitution options is trivial: typically, a linear scale is used, with no further rescaling. Therefore, no figure is shown for this indicator category.
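The application-weighted average of expert scores described above can be sketched as follows. The application shares and scores are hypothetical, and the 0 to 100 scale (100 meaning no substitute available) is an assumption, since the reviewed methods use different multi-point scales.

```python
def weighted_substitutability(applications):
    """Weighted average of expert scores over a material's main applications.

    applications: list of (application_share, expert_score) tuples.
    Shares are renormalized in case they do not sum to exactly 1.
    """
    total_share = sum(share for share, _ in applications)
    return sum(share * score for share, score in applications) / total_share


# Hypothetical material with three main applications.
applications = [
    (0.5, 100.0),  # no substitute available
    (0.3, 50.0),   # substitute with reduced performance
    (0.2, 0.0),    # fully adequate substitute
]
score = weighted_substitutability(applications)
```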

Price Volatility (I)
The ninth indicator category for supply risk is price volatility. It was impossible to identify different criteria for this category, so all 17 cases are assigned to the same criterion, I01. In detail, the measurements vary between the price volatility, the variation coefficient, and the relative price change within a specific period. All methods use a stepwise normalization of price volatility measurements to supply risk scores. Most studies use four-level to five-level normalization functions in which higher price volatility leads to higher supply risk. The only exception is the method by Eggert et al. [19], who use a seven-level decreasing normalization function. Their study was among the early supply risk assessments, and the authors did not explain why they associated low price volatility with high supply risk.
According to economic theory, the interpretation of the criterion price volatility is ambiguous, because a price increase should be the result of a supply-demand gap, not its cause. It is also questionable whether one can anticipate future supply risks by analyzing historical price developments. Therefore, it is not surprising that this indicator category has been used only in selected supply risk assessments and has not been adopted consecutively by a series of methods.
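One of the measurements named above, the variation coefficient, combined with a stepwise normalization can be sketched as follows. The price series, the thresholds, and the level scores are hypothetical and do not reproduce any specific reviewed method.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def coefficient_of_variation(prices):
    """Population standard deviation of prices divided by their mean."""
    return pstdev(prices) / mean(prices)


def stepwise_score(value, thresholds, level_scores):
    """Map a measurement to one of len(thresholds) + 1 discrete levels.

    A value equal to a threshold falls into the higher level (assumption).
    """
    return level_scores[bisect_right(thresholds, value)]


# Hypothetical annual average prices and a four-level increasing scheme.
prices = [90.0, 100.0, 110.0, 100.0]
cv = coefficient_of_variation(prices)
score = stepwise_score(cv, thresholds=[0.05, 0.10, 0.20],
                       level_scores=[25, 50, 75, 100])
```

A decreasing scheme such as the one by Eggert et al. would correspond to reversing the order of the level scores.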

Import Dependency (J)
The tenth indicator category for supply risks is import dependency. This indicator category is specifically designed for a national perspective. This category is measured with the net import reliance criterion in eight out of 16 cases (50%) (J01). The net import reliance (NIR) is calculated as the ratio between net imports and apparent consumption, as shown in Equation (13).

$$\mathrm{NIR} = \frac{\text{Net imports}}{\text{Apparent consumption}} = \frac{\text{Imports} - \text{Exports}}{\text{Domestic production} + \text{Imports} - \text{Exports}} \qquad (13)$$
The rationale behind using this indicator for supply risk assessments is to identify materials for which the upstream supply chain is out of the hands of domestic policy and trade. If a country has to rely on foreign exploration, extraction, or processing, it can consider their continuous operation, or the access to the materials, as less reliable. Therefore, higher net import reliance is associated with higher supply risk.
Most methods, starting with Goe and Gaustad [58], use the simple linear normalization approach where no net imports result in no supply risk, and 100% NIR results in a supply risk score of 100. Only Li et al. [59] define the steps at 40% and 70% NIR as thresholds for their three-level normalization function. Figure 11 shows the normalization functions for J01. Other criteria are shown in the Supplementary Material spreadsheets.
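Equation (13) and the two normalization types for J01 can be sketched as follows. The 40% and 70% thresholds are those reported for Li et al. [59]; the scores assigned to the three levels, the treatment of values exactly at a threshold, and the clamping of net exporters to a score of 0 are assumptions made for illustration.

```python
def net_import_reliance(imports, exports, domestic_production):
    """Net import reliance in percent, following Equation (13)."""
    apparent_consumption = domestic_production + imports - exports
    return 100.0 * (imports - exports) / apparent_consumption


def linear_nir_score(nir_percent):
    """Linear normalization: 0% NIR -> 0 points, 100% NIR -> 100 points.

    Net exporters (negative NIR) are clamped to 0 (assumption).
    """
    return min(100.0, max(0.0, nir_percent))


def three_level_nir_score(nir_percent, level_scores=(0, 50, 100)):
    """Stepwise normalization with thresholds at 40% and 70% NIR.

    The level scores are hypothetical placeholders.
    """
    if nir_percent < 40.0:
        return level_scores[0]
    if nir_percent < 70.0:
        return level_scores[1]
    return level_scores[2]


# Hypothetical trade data in tonnes.
nir = net_import_reliance(imports=80.0, exports=20.0, domestic_production=40.0)
```

With these hypothetical trade data, the NIR is 60%, which yields 60 points under the linear scheme and the middle level under the three-level scheme. The sketch does not handle an apparent consumption of zero.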
Figure 11. Normalization schemes for the net import reliance criterion (J01), which is measured in percent.

Other Indicators
Our review found an additional 67 cases of indicator uses that could not be grouped into indicator categories. Therefore, these "other" indicators are a collection of 53 widely differing criteria, none of them used more than four times in total. Those used at least twice are the current market balance (X01), stock keeping (X02), purchasing potential (X03), supply adequacy (X04), natural disasters (X05), economic importance (X06), the Sector Competition Index (X07), the economy of storage and transport (X08), storage complexity (X09), investment potential (X10), material cost impact (X11), and material dependency (X12).
If one wants to find patterns in this loose collection of indicators, three areas of interest emerge: stocks and storage patterns, price and cost aspects, and total demand and market size. However, due to the high variation in indicator application and measurements and the lack of repeated implementation in supply risk assessments, we refrain from discussing these indicators in detail. All individual indicators are listed with their respective measurement and normalization scheme in the Supplementary Material spreadsheets.

Discussion and Conclusions
The variety in supply risk indicator usage is impressive. It is understandable because of the different goals and scopes of studies in our review. For example, omitting physical scarcity as a risk factor makes sense when the assessment is focused on short-term risks. Likewise, companies will be much less concerned about import dependence than nations. Therefore, even after another "five years of criticality assessments", the harmonization that Graedel and Reck asked for has not taken place [7]. However, many of Graedel and Reck's other "desirable aspects" are covered nowadays by the methods in this review.
The material scope often includes various chemical elements as well as biotic raw materials and minerals [60,61]. The risk factors also include geology (scarcity, by-product dependence), regulations, and geopolitics (political instability, import dependence). Even cultural aspects are used, for example the "conformity of ideological values" by Nassar et al. [62]. The substitutability or the lack thereof, the dependence on primary production, and the by-product dependence are three of the ten indicator categories. However, these categories often rely on previous assessments such as Graedel et al. [49,53] or Nassar et al. [13]. In contrast to the studies reviewed by Achzet and Helbig in 2013 [3], most of the methods in this review are now published in peer-reviewed journals, not as technical reports. As with the European Commission or the United States, some governmental reports take the dual path of publishing a technical report and a peer-reviewed methodological paper in parallel [62,63].
However, the periodical update that Graedel and Reck [7] also asked for is a rare feature. Many studies are carried out by researchers at universities or other academic institutes without permanent funding for such updates. The EU and US criticality lists, updated every three to four years, are exceptions [60,64].
The transparency of some methods is hampered by the non-disclosure of data [15,65-68]. While we understand the importance of confidentiality, particularly for company reports, from a scientific perspective this reduces the transparency and accessibility of corporate supply risk assessment reports [28,69]. Some other methods used sophisticated intermediate scores, thresholds, and renormalizations up to the point that results turned out to be irreproducible or that the quantitative results simply contradicted the textual explanations [17,34,62,70].
Some methods, such as Zhou et al. [14], Pell et al. [17], and Bach et al. [71], use normalizations based on the specific material scope, for example by setting the bounds at the minimum and maximum observed measurements. This distorts the supply risk score, because the score of a given raw material then depends on which other raw materials are selected for the assessment. As a consequence, the same indicator value can lead to different supply risk scores, results are not comparable between studies, and the overall results are weakened. Adjusting indicator calculation or normalization schemes may be viable for the specific purposes of custom methods. However, in these cases, full transparency and reproducibility are even more important.
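The scope dependence of min-max normalization can be demonstrated with a minimal sketch. The materials and indicator values are hypothetical; the point is only that an identical indicator value receives different scores in two assessments with different material scopes.

```python
def min_max_score(value, observed_values):
    """Min-max normalization with bounds taken from the assessed materials."""
    lowest, highest = min(observed_values), max(observed_values)
    return 100.0 * (value - lowest) / (highest - lowest)


# Hypothetical indicator values; material "Y" has the value 0.5 in both scopes.
scope_a = {"X": 0.2, "Y": 0.5, "Z": 0.8}
scope_b = {"W": 0.1, "Y": 0.5, "V": 0.6}

score_in_a = min_max_score(scope_a["Y"], scope_a.values())  # about 50
score_in_b = min_max_score(scope_b["Y"], scope_b.values())  # about 80
```

A normalization with fixed bounds, for example the theoretical minimum and maximum of the indicator, would avoid this dependence on the selected materials.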
The supply risk assessments in the review still often do not adequately report data uncertainties and sensitivity to methodological choices. Only very few authors undertake the effort of doing Monte Carlo simulation or other error propagation methods [8,12,31,72]. Variations of indicator choices and normalizations, which have been discussed by Erdmann and Graedel [73], are also rare.
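A Monte Carlo propagation of measurement uncertainty into a supply risk score can be sketched as follows. The Gaussian input distribution, the 10% relative uncertainty, and the EoLRR value are assumptions chosen for illustration and are not taken from any reviewed method.

```python
import random
from statistics import mean, stdev


def monte_carlo_score(score_fn, central_value, relative_uncertainty,
                      n_samples=10_000, seed=0):
    """Propagate input uncertainty through a normalization function.

    The input is sampled from a normal distribution around the central
    value (assumption). Returns the mean and standard deviation of the score.
    """
    rng = random.Random(seed)
    sigma = relative_uncertainty * central_value
    samples = [score_fn(rng.gauss(central_value, sigma))
               for _ in range(n_samples)]
    return mean(samples), stdev(samples)


def eolrr_score(eolrr_percent):
    """Inverse linear EoLRR normalization, clamped to the 0 to 100 range."""
    return min(100.0, max(0.0, 100.0 - eolrr_percent))


# Hypothetical EoLRR of 30% with a relative uncertainty of 10%.
score_mean, score_sd = monte_carlo_score(eolrr_score, 30.0, 0.10)
```

Reporting the spread alongside the mean score indicates how strongly the result depends on the quality of the underlying data.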
In conclusion, we want to highlight two efforts by individual researchers and one general recommendation which, in our view, would improve future supply risk assessments. Firstly, the effort by Mayer and Gleich [6] (and in many other publications from this working group) to link commonly used supply risk indicators with price variations, the theoretical result of supply shortages, through multiple regression analysis is a practical approach. One of their results was that the impacts of supply risk aspects vary between chemical elements and that, therefore, no universal indicator set will be found. Secondly, Hatayama and Tahara [5] established a list of supply disruption events, which, if continued, extended to global coverage, and further evaluated, could be an excellent basis for event studies. Such event studies could be used to statistically assess the likelihood of supply disruptions at various levels of supply risk indicators. For example, this would eventually allow identifying a non-linear normalization formula instead of naïve approaches or single threshold values. However, normalization formulas and thresholds are still better than the semi-quantitative approach of point-wise or step-wise normalizations used in many assessments. Supply risk indicators can be measured and should be interpreted quantitatively.
We strongly recommend updating some of the data sources commonly used in supply risk assessments. While the USGS provides annual updates to production and reserves data, and while the various political and regulatory indices are also updated annually, the data sources for by-product dependence, dependence on primary production, and lack of substitution options by now are up to a decade old. Given increasing efforts to implement a circular economy, ongoing technological development, and rapid material extraction growth, the values of these data sources are at risk of becoming outdated.