Integrative Scenario Assessment as a Tool to Support Decisions in Energy Transition

Abstract: Energy scenarios represent a prominent tool to support energy system transitions towards sustainability. In order to better fulfil this role, two elements are widely missing in previous work on designing, analyzing, and using scenarios: first, a more systematic integration of social and socio-technical characteristics of energy systems in scenario design, and, second, a method to apply an accordingly enhanced set of indicators in scenario assessment. In this article, an integrative scenario assessment methodology is introduced that combines these two requirements. It consists of: (i) a model-based scenario analysis using techno-economic and ecological indicators; (ii) a non-model-based analysis using socio-technical indicators; (iii) an assessment of scenario performances with respect to pre-determined indicator targets; (iv) a normalization method to make the two types of results (model-based and non-model-based) comparable; (v) an approach to classify results to facilitate structured interpretation. The combination of these elements represents the added value of this methodology. It is illustrated for selected indicators, and exemplary results are presented. Methodological challenges and remaining questions, e.g., regarding the analysis of non-model-based indicators, resource requirements, or the robustness of the methodology, are pointed out and discussed. We consider this integrative methodology to be a substantial improvement over previous scenario assessment methodologies.

lenges suitably are missing. One of the very few framework approaches is suggested in [22], and it offers guidelines primarily on how to carry out a qualitative assessment.
In this article, we introduce an integrative scenario assessment approach that provides a comprehensive framework by combining socio-technical scenarios with a comprehensive sustainability assessment tool. The particular combination of two methodological assessment elements, that are described in the following, addresses shortcomings mentioned above and builds the new and value-adding essence of this approach: first, enhancing model-based quantitative analyses of common technological and energy-economic indicators by non-model-based analyses of indicators to better cover social and socio-technical aspects of the energy system and its transition. Second, an assessment approach of future indicator values including targets determined in different ways for the two indicator types, a normalization method to make the two types of results comparable, and a proposal to classify assessment results in order to allow for their structured interpretation. The ultimate goal of this approach is to draw differentiated pictures of scenario performances by revealing and highlighting strengths and weaknesses with respect to sustainability across model-based and non-model-based analyses. This allows for a comprehensive comparison of different scenarios and for providing improved support of system transition processes by better addressing key energy system characteristics. First exemplary results of an application of this approach are presented in this article.
Section 2 describes the design and selection of socio-technical scenarios for assessment, the selected sustainability indicators (analyzed either model-based or non-model-based), the determination of target values for these indicators, and the way information on future indicator performances was obtained. Section 3 presents, by way of example, the indicator assessment process, including the normalization step that makes future indicator values comparable, the assessment results, and the classification of results. The complete approach and its application are discussed and critically reflected on in Section 4, followed in Section 5 by final conclusions and proposals for further methodological and application-related development and research.
The development of this integrative approach is mainly based on work carried out in the Helmholtz-Alliance ENERGY-TRANS, an interdisciplinary research project where different pathways of the German energy system transition were analyzed [34].

Materials and Methods
The starting point of the analysis, and a key element of our approach, are the so-called socio-technical scenarios. These scenarios describe possible future developments of the energy system not only through techno-economic characteristics, but also through social and political context factors. The socio-technical scenarios used in this study have been taken from [35]. They were developed using the Cross-Impact Balances (CIB) method [36]. However, the approach presented here can be used for any socio-technical scenario with a sufficiently high degree of detail, in particular with regard to socio-political aspects of the transformation. The scenarios taken from [35] were only used to illustrate our assessment methodology. The innovative core of this methodology, i.e., the integration of model-based and non-model-based elements in scenario assessment and the approach for comparing scenarios, can essentially be described and understood even without information from [35]. Both the context scenario approach used here and the CIB method are briefly described in Section 2.1.1. Figure 1 illustrates the workflow and the different working steps (1-9) of the integrative approach in this analysis, which are outlined below:
Step (1): As a first working step, six scenarios were selected from the large number of consistent context scenarios developed in [35] for a more in-depth analysis of the sustainability performance of possible energy system transformation pathways. This selection process is described in Section 2.1.2.
Step (2): A comprehensive assessment of the sustainability performance of transformation pathways requires considering a broad range of sustainability indicators, which, in turn, often need different assessment approaches, are measured on different (non-comparable) scales with different units, etc. The selection of indicators for this study was based on the indicator set published in [37], which builds on the Integrative Concept of Sustainability (ICoS) [38], and is described in Section 2.2. For the purposes of the present work, a distinction was necessary between model-based indicators (quantifiable with the help of an energy system model including an environmental impact assessment) and non-model-based indicators (quantified and assessed with a special approach presented in this paper). The two indicator categories are treated differently at the beginning of the assessment process. However, the subsequent normalization step (see below) allows for comparing the scenarios' performances with regard to both indicator categories.
Step (3): In order to quantify the model-based indicators, we proceeded as follows: The six selected context scenarios are applied as boundary conditions for a detailed energy system model in order to obtain detailed integrative socio-technical energy transformation scenarios up to 2050. Energy system-related indicators such as primary energy demand or the share of renewables were directly calculated in the model used here. Additionally, this energy system model includes a module allowing us to also calculate and assess energy-related emissions of greenhouse gases and pollutants.
Step (4): Quantitative results for the future performance of the selected model-based indicators were obtained from both the main energy systems model and the environmental impact module (for steps 3 and 4, see Section 2.3).
Step (5): Calculating future performances of the non-model-based indicators required the development of a new approach based on expert judgements regarding the impacts of descriptor variants in the context scenarios on non-model-based indicators. This approach will be illustrated in detail in Section 2.4.
Steps (6) and (7): The determination of targets differs between model-based (step 6) and non-model-based indicators (step 7): Whereas model-based indicator targets have been taken from the literature [39,40], the targets for the non-model-based indicators are derived from the frequency distribution of the indicators' values in a very large number of context scenarios (see Section 2.5).
Step (8): In order to be able to compare the scenario performances with regard to all indicators and to identify strengths and weaknesses of particular scenarios with respect to individual indicators, both model-based and non-model-based indicator values had to be normalized. In this paper, a distance-to-target approach has been chosen for the normalization (explained in Section 2.5).
Step (9): Normalized model-based and normalized non-model-based indicators are included in the further evaluation of the sustainability performance of the selected scenarios (the approach is described in Section 2.6).

Context Scenarios and CIB
The aim of an integrative sustainability assessment of energy scenarios requires that the scenarios provide not only information about technological and energy-economic developments, but also the societal context driving the energy-related developments or being influenced by them. To give an example, the expansion of renewable energy production as part of an energy scenario, effecting a decrease of greenhouse gas emissions, should be assessed differently from an integrative sustainability perspective, depending on the particular context: It may take place, e.g., in a society consensually supporting the energy transition project and participating in it, for instance by investment cooperatives, or in a world where the change is pushed through as an 'elite project' against an adverse population, perceiving little more than unwelcome infrastructure projects in their neighborhood and rising energy prices. Hence, integrative sustainability assessments of energy scenarios depend on holistic descriptions of energy pathways and their societal embedding, as provided, for instance, by the story-and-simulation approach [41,42] and the context scenario approach [9]. While the classical story-and-simulation approach relies on intuitive group discussion techniques for constructing societal "stories", the context scenario approach uses Cross-Impact Balances (CIB) [36], a formal scenario method for constructing discrete-state scenarios.
CIB uses a set of scenario factors ("descriptors") describing the most important scenario topics in a qualitative or quantitative way. When used within a context scenario analysis, the descriptor set usually consists of a mixture of societal, economic, and political key factors, combined with the most important model input parameters [9]. For each descriptor, a set of usually 2-4 alternative futures ("variants") is defined to capture the uncertainty regarding the future descriptor development. Interdependencies between the descriptors are assessed by experts and a consistency algorithm is applied for identifying a set of self-reinforcing configurations of the descriptor futures ("consistent scenarios"). A state-of-the-art description of the practice of combining societal storylines and energy modeling is given in [10].
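The consistency check mentioned above can be illustrated with a minimal sketch. The text does not spell out the algorithm, so the rule used here is an assumption based on the commonly described CIB principle: a scenario counts as consistent if, for every descriptor, the chosen variant has the highest impact balance given the variants chosen for all other descriptors. The descriptor names, variants, and cross-impact judgements are hypothetical toy data, and the tolerance for slightly inconsistent scenarios that some studies allow is not modeled.

```python
from itertools import product

# Hypothetical toy data: two descriptors with two variants each.
VARIANTS = {
    "policy": ["stable", "unstable"],
    "attitude": ["positive", "negative"],
}

# Hypothetical expert judgements. IMPACTS[(source descriptor, source variant)]
# maps (target descriptor, target variant) to a promoting (+) or hindering (-) score.
IMPACTS = {
    ("policy", "stable"): {("attitude", "positive"): 2, ("attitude", "negative"): -2},
    ("policy", "unstable"): {("attitude", "positive"): -1, ("attitude", "negative"): 1},
    ("attitude", "positive"): {("policy", "stable"): 1, ("policy", "unstable"): -1},
    ("attitude", "negative"): {("policy", "stable"): -2, ("policy", "unstable"): 2},
}


def impact_balance(scenario, descriptor, variant):
    """Sum the impacts that all other chosen variants exert on one variant."""
    return sum(
        IMPACTS.get((source, source_variant), {}).get((descriptor, variant), 0)
        for source, source_variant in scenario.items()
        if source != descriptor
    )


def is_consistent(scenario):
    """Assumed rule: every chosen variant must reach the maximum impact balance."""
    for descriptor, chosen in scenario.items():
        chosen_balance = impact_balance(scenario, descriptor, chosen)
        best_balance = max(
            impact_balance(scenario, descriptor, v) for v in VARIANTS[descriptor]
        )
        if chosen_balance < best_balance:
            return False
    return True


def consistent_scenarios():
    """Enumerate all variant combinations and keep the consistent ones."""
    names = list(VARIANTS)
    found = []
    for combination in product(*(VARIANTS[name] for name in names)):
        scenario = dict(zip(names, combination))
        if is_consistent(scenario):
            found.append(scenario)
    return found
```

In this toy matrix, only the two self-reinforcing combinations (stable policy with positive attitude, unstable policy with negative attitude) survive the check. Full enumeration is only feasible for small descriptor sets and is used here purely for illustration.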
We opted to use the context scenario approach because, being CIB-based, it supplies the sustainability assessment not only with additional socio-technical context information about energy pathways, but also with a comprehensive database of qualitative explanations about societal and techno-economic interdependencies. These interdependencies also affect sustainability indicators and, thus, the sustainability assessment. Recent examples of context scenario studies in energy research are [35,43], or [44].

Scenario Selection ("Step 1")
For the demonstration of our sustainability assessment approach, we used results of the context scenario analysis of [35]. In this study, 4869 consistent context scenarios were constructed to represent a broad variety of societal and energy-related developments in Germany until 2050, spanning three poles of idealized societal futures ("Inertia", "Market", and "Value Shift") and also covering gradations between these idealized poles (see Figure S2 in the supplement). In [35], four context scenarios were selected for the in-depth energy systems analysis, representing the three poles plus one scenario compatible with the energy policy goals of the German government [45] ("Target" scenario), and we used these four scenarios for our analysis. Additionally, we decided to select two further context scenarios from [35] for our demonstration, based on two criteria:

1. Scenario descriptors with particularly high influence on the sustainability assessment, like demographic and GDP development, may dominate indicator performances and, thus, the assessment. In order to create a subset of scenarios suitable for the analysis of weaker impacts of other descriptors on the sustainability performance, we required that both additional scenarios include the same assumptions about demographic and GDP developments as the "Target" scenario. This ensures that differing sustainability performances within this subset are caused by other scenario characteristics besides these two major factors.

2. Within the remaining degrees of freedom, the "social sustainability" of the two scenarios, i.e., the performance of the non-model-based indicators published in [37], should be as different as possible, in order to allow for a sufficiently contrasting sustainability comparison, especially regarding these indicators.
To apply the second criterion, we carried out an a priori social sustainability estimation for these indicators for all descriptor variants used in [35] and calculated a corresponding sustainability index for each context scenario by adding the descriptor ratings. Finally, we chose the scenarios with the maximum and minimum index to complete our scenario set.
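The index-based selection for the second criterion can be sketched as follows. The ratings, descriptor names, and scenario labels are hypothetical placeholders, not values from [35] or [37]; the pre-filtering by demographic and GDP assumptions required by the first criterion is assumed to have happened already.

```python
# Hypothetical a priori social sustainability ratings per (descriptor, variant).
RATINGS = {
    ("attitude", "positive"): 2,
    ("attitude", "negative"): -2,
    ("policy", "stable"): 1,
    ("policy", "unstable"): -1,
}

# Hypothetical candidate scenarios that already satisfy the first criterion.
SCENARIOS = {
    "A": {"attitude": "positive", "policy": "stable"},
    "B": {"attitude": "negative", "policy": "stable"},
    "C": {"attitude": "negative", "policy": "unstable"},
}


def sustainability_index(scenario):
    """Add the ratings of all descriptor variants that make up a scenario."""
    return sum(RATINGS[(descriptor, variant)] for descriptor, variant in scenario.items())


def select_extremes(scenarios):
    """Return the names of the scenarios with the minimum and maximum index."""
    ranked = sorted(scenarios, key=lambda name: sustainability_index(scenarios[name]))
    return ranked[0], ranked[-1]


lowest, highest = select_extremes(SCENARIOS)
print(lowest, highest)
```

With these toy ratings, scenario "C" has the lowest index (−3) and scenario "A" the highest (+3), so these two would be added to the scenario set.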
Our analysis was performed in parallel with the analysis of [35], and our scenario selection was based on a preliminary version of the scenario set in [35]. Judged against the final cross-impact database of [35], one of the selected scenarios ("Coping with Pressure") shows a consistency score (CIB's key quality measure for scenarios) slightly below the threshold applied by [35] for their own study. Since the consistency of "Coping with Pressure" was still at an acceptable level, we decided to keep this scenario in our sample. Table 1 shows the six selected scenarios and the key elements of their storylines. A comprehensive list of the descriptor variant combinations for the six scenarios is provided in the supplementary document (Table S1).

Market
The global market paradigm drives liberalization in Germany. Materialism fosters an indifferent attitude toward the energy transition. Nevertheless, strong economic development affords the government leeway to pursue the energy transition with a steady hand and renders the materialistic population willing to accept certain burdens of the transition without considerable resistance. Renewable energy (RE) deployment achieves limited success with a main focus on the power sector. Therefore, transition targets are not met, though CO2 emission reduction outcomes are still considerable. Primary energy consumption levels are rather high due to strong economic development (higher GDP), which results in higher levels of final energy consumption in the industrial and service/commerce sectors, and in higher levels of non-energetic consumption by fossil energy carriers [35].

Target
The "Target" scenario is a variation of the "Market" scenario and involves a downward feedback loop between economic growth, birth rates, and migration that results in medium economic success and a lowered population. Nevertheless, supported by reinvigorated EU integration and harmonized energy policies, prospects are sufficient for the implementation of ambitious efficiency measures and high RE shares across all sectors, resulting in the lowest level of primary energy consumption and second lowest level of GHG emissions across all six scenarios. Intensified levels of electrification and hydrogen use in the transport and heating fields increase gross electricity consumption [35].

Target-Centralized
This scenario is closely related to the "Target" scenario. However, this scenario pursues the transition largely within the traditional centralized energy system, leading to different technology choices, whereas the "Target" scenario develops a mixed system architecture. This difference is connected to a more negative public attitude towards the energy transition project than in "Target", making a more decentralized system architecture challenging to achieve. Conversely, the centralized approach, offering fewer opportunities for individual engagement, does little to improve the public view of the transition project. A further consequence of the negative public attitude is a rather sluggish development of new mobility structures in this scenario.

Value Shift
A society at ease with its European and global environment pursues the energy transition with belief and involvement. However, its post-materialistic values develop only on foundations of advanced wealth after a phase of economic prosperity. High GDP and population values burden the emissions balance in spite of successful structural and mental changes. Compared to the "Target" scenario, the "Value Shift" scenario results in only slightly higher emissions, due to the highest share of RE and the highest primary energy use of all variants. The latter results from high GDP and population growth levels increasing final energy consumption despite similar energy intensities involved. In addition, the higher share of heat pumps used in households and higher levels of hydrogen demand for industry and transport cause this scenario to involve the highest levels of power demand. As the "Value Shift" scenario uses more geothermal power (with rather low efficiency), this also increases primary energy demand. The strong deployment of renewable energies is made possible by a strong acceptance of new technologies, a positive public attitude, high levels of political stability, and coordinated and multi-scale governance, especially towards the use of wind power, photovoltaic installations, and grid expansion. The "Value Shift" scenario benefits as well from successful global policy reforms and from intensified European integration and cooperation and therefore from similar developments regarding RE deployment and infrastructure expansion [35].

Inertia
The "Inertia" scenario is characterized by a quite different societal storyline. Severe international conflicts, global fragmentation, and high fossil fuel prices should motivate society to transform. However, these developments also narrow the government's political margins, and political, regulatory, and societal uncertainties, together with an unsettled public opinion averse to experimentation, render it impossible to take up the challenge. Weak economic development and limited innovative ability limit RE expansion and improvements in industrial efficiency. However, weak economic development also results in relatively low levels of energy demand. Nonetheless, the "Inertia" scenario shows the highest CO2 emissions for 2050 relative to the other scenarios as a result of the relatively weak RE expansion observed in the power and heat sectors [35].

Coping with Pressure
Again, this scenario describes a society under pressure from political and economic developments (EU disintegration and high fuel prices). However, pressure is reduced compared to "Inertia" (fewer global crises), and Germany manages to sustain reasonable economic growth. This prepares the ground for the population to meet the challenges of the energy transition with a positive attitude, allowing more transformative measures than possible in the "Inertia" scenario. This scenario tends towards economic and societal liberalisation. However, the urgency of rapid transformation also requires regulative approaches in some cases. The CO2 reduction in this scenario is strong, though not target-compliant.

Indicator Selection ("Step 2")
For the sustainability assessment of the scenarios, we used as a basis an indicator system of 45 indicators that was developed for the assessment of the German energy system [37,39], applying the Integrative Concept of Sustainable Development (ICoS) as a conceptual framework [38].
For the test application of the assessment method introduced here, 22 indicators from this set have been selected, structured in two types (model-based and non-model-based). Including the whole indicator set in this demonstration would have required considerably more time and resources, whereas the focus of this article is on presenting a methodologically sound approach to combining these two indicator types.
For a suitable assessment of current and future indicator performances, indicator target values are essential. In the case of the approach presented here, a mixture of sources to determine such targets has been used, based on work published in [40]: (i) Existing policy-based official and binding targets in Germany or other countries, as far as existing; (ii) proposals of policy consulting or other expert institutions; (iii) observable trends in national and international scientific or public debates about targets; (iv) in cases where none of these sources were available, targets were determined using plausibility considerations or conclusions by analogy, i.e., they were adopted from or defined in line with targets for other thematically linked indicators within the set applied.
For the model-based indicators, targets determined in this way were applied ("step 6"). For the assessment of non-model-based indicators ("step 7"), this has been replaced by another approach, for particular methodological reasons (see Section 2.5).

Scenario Modelling and Quantification of Model-Based Indicators ("Steps 3 and 4")
The selected context scenarios described in Section 2.1.2 have been used as boundary conditions for a detailed scenario modelling with the scenario development tool MESAP/PlaNet (Modular Energy System Analysis and Planning Environment) [46]. The principal approach is identical to the approach used in [35]; details on the method and on the challenges of coupling an energy system model with CIB-based context scenarios can be found there. MESAP/PlaNet is a technology-rich bottom-up accounting framework of the German energy system that has mainly been used for the development of normative energy scenarios [47-50]. It takes into account all end-use sectors (residential, industry, service, commerce and trade, transport) and seven types of end-use applications (space heat, hot water, process heat, space cooling, process cooling, mechanical energy, and IIC, i.e., illumination, information, and communication), and includes numerous technological options in the conversion sector: power and district heat generation in power plants, combined heat and power (CHP) plants, and heating plants, as well as the generation of "new" (biogenic and synthetic) fuels and gases. A further focus of the model is the coupling of the power, heat, and transport sectors through CHP, through the direct electrification of heat (electric heat pumps, electric resistance heaters) and mobility (battery electric vehicles, BEV, and plug-in hybrid electric vehicles, PHEV), or through indirect electrification via synthetic gases (H2, CH4) and fuels.
MESAP/PlaNet allows for the consistent integration of a wide range of aspects and expert knowledge into the scenarios, e.g., knowledge on efficiency potentials in the demand sectors, potentials, performances, and costs of future technologies, feedbacks, interactions, and interdependencies in a highly coupled future energy system, regulative interventions and their consequences on market development, etc. However, experience and knowledge of the scenario developers is essential in order to develop a plausible energy scenario. In this respect, the scenario development approach with MESAP/PlaNet differs fundamentally from optimization models that, e.g., focus on minimizing total system costs.
The model basically calculates full energy balances, from useful energy for different end-use applications and freight and passenger transport services to final and primary energy for all relevant energy carriers and technologies, for the period 2015-2050 (in steps of five years). Relevant standard outputs for the sustainability assessment here include the annual technology-, sector-, and application-specific energy demand, installed capacities for power and heat generation, as well as energy-related CO2 emissions. For the purpose of this study, and in order to include as many indicators as possible from the set based on ICoS (see Section 2.2), the model has been further developed to quantitatively assess pollutant emissions, the number of electric vehicles, and the area under cultivation of energy crops.
The augmented model is then capable of calculating numerous indicators used to assess the sustainability performance of the analyzed scenarios. This is summarized in Table S4 in the Supplement. The emissions of particulate matter (TSP), cadmium, energy-related CO2, and acid-forming gases have been calculated from energy scenario results and technology/fuel-specific emission factors (assessment method "EF" for "emission factors" in Table S2 in the Supplement). For the CO2 emissions, we use fuel-specific emission factors from [51]. The other emission factors are based on German pollutant emission data from [52] for power and heat generation, and from [53] for tailpipe emissions from transport. However, as the technological granularity in emission modeling does not match the categories represented in the scenario model, some adjustments were necessary (see also [54]): since an emission calculation could only be made on an aggregated level of categories and fuels, we applied a top-down calibration using the results of official bottom-up calculations for emission reporting [55]. The calibration was done for the years 2009, 2011, 2012, and 2014. All emission factors are assumed to stay constant over the entire modelling period, as no estimates of the tightening of emission standards until 2050 are available.
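The emission-factor calculation can be illustrated with a short sketch. It assumes that emissions are the sum over fuels of energy use times a constant fuel-specific emission factor. The calibration step is not specified in detail in the text, so the scaling by the mean ratio of reported to calculated emissions over the calibration years is purely an assumption for illustration. All fuel names, units, and numbers are hypothetical.

```python
def emissions(energy_use, emission_factors):
    """Sum energy use (e.g., PJ) times emission factor (e.g., kt/PJ) over fuels."""
    return sum(energy_use[fuel] * emission_factors[fuel] for fuel in energy_use)


def calibration_factor(reported, calculated):
    """Assumed calibration: mean ratio of reported to calculated emissions
    over the calibration years (keys of both dictionaries)."""
    ratios = [reported[year] / calculated[year] for year in reported]
    return sum(ratios) / len(ratios)


# Hypothetical scenario year: fuel use in PJ and constant emission factors in kt/PJ.
energy_use = {"coal": 100.0, "gas": 50.0}
emission_factors = {"coal": 0.5, "gas": 0.25}

# Hypothetical reported vs. uncalibrated model emissions for the calibration years.
reported = {2009: 120.0, 2011: 110.0, 2012: 90.0, 2014: 80.0}
calculated = {2009: 100.0, 2011: 100.0, 2012: 100.0, 2014: 100.0}

raw = emissions(energy_use, emission_factors)
calibrated = raw * calibration_factor(reported, calculated)
print(raw, calibrated)
```

Because the emission factors are held constant over the modelling period, changes in calculated emissions in such a sketch stem only from changes in the fuel mix and in energy demand.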

Quantification of Non-Model-Based Indicators ("Step 5")
In order to evaluate scenarios, it is necessary to determine the future development of individual indicators and to compare the resulting values with targets determined. If the development of an indicator can be quantitatively determined with the help of model calculations, this is possible without any problems. However, for many indicators, especially those addressing socio-technical aspects, model calculations do not provide quantitative values, as is the case for the MESAP/PlaNet model. In this section, a methodological approach will be described to determine future values for such non-model-based indicators as a basis for scenario assessment.
As already mentioned above, the future development of the non-model-based indicators is assumed to depend on various political, societal, economic, cultural, and technological trends provided by the context scenarios and the descriptors, respectively. In order to determine future indicator values, the impacts of the descriptors on the 6 selected indicators listed in Section 2.2 were discussed and documented in a number of expert panels consisting of DLR, ZIRIUS, and ITAS researchers from economics, engineering, geography, and systems sciences. The expert panels rated descriptor impacts on indicators by assigning integer numbers on a scale from "−3" to "+3". A positive value means that a descriptor influence favors an increasing indicator value. Vice versa, a negative value means that the descriptor favors a decreasing indicator value. A value of "0" means that either there is no influence, the influence is very small (in the range from "−0.5" to "+0.5"), or the expert panels could not provide a distinct decision due to various influences in different directions (so-called "uncertainty zero").
The following points should be noted regarding the assignment of values for descriptor/indicator impacts:

• In the approach presented here, only direct impacts were considered, and no impacts conveyed indirectly via another descriptor, in order to avoid "double counting" and a corresponding overvaluation of certain influences;
• every descriptor/indicator impact on the −3/+3 scale refers to changes against the present state of a descriptor or an indicator. For example, if the current (status-quo) value of the descriptor "Attitudes of the population towards the transformation of the energy system" is positive and the future value of this descriptor in a given scenario is also "positive" (i.e., no change occurs), then a "0" should be assigned to the indicator development.
Using the indicator "Acceptance of Renewable Energies in the neighborhood" as an example, the following Table 2 shows the impacts of the descriptors "Policy stability in the energy sector" and "Attitude of the population towards the transformation of the energy system" on this indicator, as identified by the expert panels.

Table 2. Example of the line of argumentation for descriptor impacts on indicators estimated in the expert panels for the indicator "Acceptance of Renewable Energies in the neighborhood".

Impacts on indicator "Acceptance of Renewable Energies in the neighbourhood"

Descriptor "Policy stability in the energy sector" and scoring by experts
Variant 1: Reduced policy stability (scoring: "−1")
Variant 2: Constant policy stability (scoring: "0")
Variant 3: Higher policy stability (scoring: "1")
Arguments in the expert discussions: Reduced policy stability (Var. 1) has a negative impact on acceptance: Declining confidence in the competence of policymakers, for example in their risk/benefit assessments of renewable energies, and thus in the technological solutions they propose. However, the impact is unlikely to be too pronounced, so a "−1" is assigned here. Correspondingly, a "1" is assigned for higher policy stability (Var. 3). If policy stability remains constant (Var. 2), nothing will change compared to the status-quo, and a "0" must therefore be assigned here.

Descriptor "Attitude of the population towards the transformation of the energy system" and scoring by experts
Variant 1: Trend towards a positive attitude (scoring: "3")
Variant 2: No trend recognizable (scoring: "0")
Variant 3: Trend towards a negative attitude (scoring: "−2")
Arguments in the expert discussions:
Variant 1: Strong "Not in my backyard" (NIMBY) behaviour will not occur, as it is considered socially unacceptable with regard to the transformation of the energy system. Therefore, acceptance increases. However, since there are currently even greater reservations in parts of the population, a positive attitude gives a major boost to acceptance. Therefore, a "3" is assigned here.
Variant 2: The population's attitude towards the transformation of the energy system is positive, but there is no clear commitment to sharing its negative implications. Positive and negative narratives on the transformation of the energy system compete with each other, and no single direction is prevailing as a general picture. Therefore, a "0" is assigned.
Variant 3: Although the energy system transformation is still advocated in principle, local or regional implications are largely rejected. Negative narratives on the transformation of the energy system, together with experiences and fears associated with the transformation that are perceived as negative, push the population's attitude towards the transformation in a generally skeptical direction. People react to infrastructural measures with strong NIMBY tendencies. This is why acceptance is diminishing. Because the energy system transformation is still supported in principle, "only" a "−2" is assigned for this case.

Table S5 in the supplement gives an overview of the impacts of all descriptors and all variants on this indicator.
The total indicator performance in a given scenario is calculated by summing all impact values that correspond to the particular combination of descriptor variants in this scenario (see supplement Table S1). For instance, for a scenario combining higher policy stability (+1) and a trend towards a negative attitude (−2), the net impact of both descriptors on acceptance is assumed to be +1 − 2 = −1. The total performance of the indicator "acceptance" results from adding all scenario-specific descriptor impacts on this indicator to a total net impact value (see Section 3.1.2).
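The summation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the dictionary layout, the variant keys, and the function name `impact_sum` are assumptions, and only the two descriptors discussed in the text are included with their expert scores.

```python
# Expert scores for the impact of each descriptor variant on the
# indicator "acceptance" (values taken from the two descriptors above).
# The dictionary layout and the key names are illustrative assumptions.
IMPACTS = {
    "policy_stability": {"reduced": -1, "constant": 0, "higher": 1},
    "attitude": {"positive": 3, "no_trend": 0, "negative": -2},
}


def impact_sum(scenario, impacts=IMPACTS):
    """Sum the impact scores of the descriptor variants chosen in a scenario."""
    return sum(impacts[descriptor][variant]
               for descriptor, variant in scenario.items())


# Scenario combining higher policy stability and a negative attitude trend.
example = {"policy_stability": "higher", "attitude": "negative"}
print(impact_sum(example))  # -1
```

In a full application, the dictionary would contain all descriptors of the context scenario, and the scenario mapping would be read from the descriptor-variant combinations listed in the supplement.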

Normalisation of the Indicator Results ("Step 7" and "Step 8")
The quantitative values for model-based and non-model-based indicators differ in terms of units, ranges of values, and approaches to estimating the values. Thus, in order to allow for a quantitative comparison of the future development of all indicators between 2015 and the target year 2050, these indicator developments have to be normalized. For this purpose, start values I_2015 and target values I_target for the year 2050 have to be defined for all model-based and non-model-based indicators. The normalized indicator development I_norm is then calculated from the quantitative indicator result I_calc as follows:

$$I_{\mathrm{norm}} = \frac{I_{\mathrm{calc}} - I_{2015}}{I_{\mathrm{target}} - I_{2015}}$$

I_norm thus describes the change of an indicator between 2015 and 2050 in the scenario in relation to the targeted development. If I_norm is greater than one, the scenario overperforms with respect to the targeted change of the indicator. A normalized indicator I_norm smaller than one indicates that the development lags behind the target. The sign of I_norm is independent of whether the intended change in I is positive (target: increase in I, as, e.g., for energy productivity) or negative (target: decrease in I, as, e.g., for CO2 or pollutant emissions). A negative sign, however, indicates that even the direction of the change in the scenario differs from the intended direction (as would be the case, e.g., if CO2 emissions in the scenario were increasing or energy productivity decreasing).
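As a small numerical illustration, the normalization can be sketched as follows, assuming that the normalized development is the achieved change divided by the targeted change, (I_calc − I_2015) / (I_target − I_2015), which is consistent with the interpretation given above. The function name and the index values are hypothetical and serve only to show the three cases (lagging, overperforming, wrong direction).

```python
def normalize(i_calc, i_2015, i_target):
    """Achieved change 2015-2050 relative to the targeted change.

    Assumed form: (i_calc - i_2015) / (i_target - i_2015).
    """
    if i_target == i_2015:
        raise ValueError("target equals start value; normalization undefined")
    return (i_calc - i_2015) / (i_target - i_2015)


# Hypothetical index values (2015 = 100), not taken from the model results.
lagging = normalize(i_calc=81, i_2015=100, i_target=27)       # decrease targeted
overperforming = normalize(i_calc=25, i_2015=10, i_target=20)  # increase targeted
wrong_direction = normalize(i_calc=140, i_2015=100, i_target=73)

print(round(lagging, 2))          # 0.26 -> lags behind the target
print(overperforming)             # 1.5  -> exceeds the target
print(round(wrong_direction, 2))  # -1.48 -> moves away from the target
```

Because numerator and denominator carry the same sign whenever the scenario moves in the intended direction, the result is positive for both "increase" and "decrease" targets.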
For the model-based indicators, values for I_target are taken from [39]. I_calc is the model output for the respective indicator for the year 2050, obtained as described in Section 2.3. As the model is calibrated with statistical data for the year 2015, I_2015 generally matches very well with the official statistical values for those indicators compiled in [39].
In [39,40], targets for the non-model-based indicators are also provided. However, the approach to quantify these indicators presented in Section 2.4 yields only (unitless) values which cannot be translated into the physical units in [39,40]. As a consequence, the targets defined by [39,40], with physical units, cannot be used to normalize the (unitless) non-model-based indicators here.
For these indicators, it is assumed that the point of departure in 2015 (I_2015) is described by a value of 0 for each indicator. This determination of I_2015 is motivated by the fact that within the CIB approach (see Section 2.1.1), a value of 0 for the impact of a descriptor variant A on a descriptor variant B means "no impact of A on B". Thus, a value of 0 for any non-model-based indicator in a given scenario implies that the combination of descriptor variants in this scenario has no effect on the indicator: the indicator remains unchanged between 2015 and 2050 in this scenario.
In order to obtain a target value I_target for the non-model-based indicators ("step 7"), the space of possibility for each indicator was estimated by evaluating the indicator performance in 2050 for each of the 4869 context scenarios. For those non-model-based indicators for which an increase of the indicator value is preferable ("federal expenditures on energy research", "acceptance of renewable energies in the neighborhood", "degree of internalization of energy-related external costs", "share of households producing renewable electricity", "number of energy cooperatives engaged in renewable energy plants"), I_target was set to the 75th percentile of the frequency distribution of this indicator. For the indicator "Monthly energy expenditures of households with a monthly net income less than 1300 Euros", the target value was set to the 25th percentile, because in this case a decrease of the indicator value is desired. The rationale behind the choice of this percentile for I_target is that the target should be ambitious (i.e., significantly better than the average development), but not over-ambitious (i.e., outside the range of possible future performances, or achievable only under extremely favorable circumstances). However, the choice of the 75th percentile (or the 25th percentile, respectively) is somewhat arbitrary, as other percentiles could have been chosen while still following the same rationale. The target values for the non-model-based indicators are shown in Table S3 in the supplement.

The frequency distribution for all non-model-based indicators is shown in Figure S1 in the supplement.
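The percentile-based target setting can be sketched as follows. This is a minimal example under stated assumptions: the impact sums are invented, the function name `percentile_targets` is hypothetical, and the interpolation rule ("inclusive" quartiles from Python's `statistics` module) is an assumption, since the text does not specify how percentiles were computed.

```python
import statistics


def percentile_targets(impact_sums):
    """Return (25th percentile, 75th percentile) of an indicator's impact sums.

    The 75th percentile serves as target when an increase is desired,
    the 25th percentile when a decrease is desired.
    """
    q1, _median, q3 = statistics.quantiles(impact_sums, n=4, method="inclusive")
    return q1, q3


# Hypothetical impact sums of one indicator over a small scenario set
# (the study evaluates all context scenarios instead).
sums = [-4, -2, 0, 1, 3, 5, 6, 8, 10]
target_if_decrease_desired, target_if_increase_desired = percentile_targets(sums)
print(target_if_increase_desired)  # 6.0
print(target_if_decrease_desired)  # 0.0
```

With a start value of 0 for the non-model-based indicators, the resulting target can be used directly as the denominator of the normalization.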
Please note that approaches to normalizing the indicator values other than the one applied here are also possible, e.g., normalization to the target, to the average value of all scenarios, to the minimum or maximum of all scenarios, etc. The choice of the normalization method depends on the particular question or objective of the analysis. In [56], an overview of possible normalization approaches is provided.
2.6. Methodical Basics of the Scenario Assessment Approach ("Step 9")
In view of the high complexity of the energy system, as well as the methodological challenges, uncertainties, and limitations of the quantitative and qualitative analyses to be carried out, the scenario assessment approach introduced here focuses on differentiated strengths-weaknesses considerations. Exemplary results are shown in Section 3.1. Single indicator performances resulting from both model-based and non-model-based analyses are assessed for each indicator instead of being aggregated into a total performance index, although building such indices is quite common. Indices are usually justified by the advantage of reduced complexity and, thus, easier communication of results to addressees, not least by allowing for catchy statements.
However, building indices is associated with several methodological drawbacks (see, e.g., [57]). Aggregation requires standardization of units and weighting, a step that is often taken implicitly, is controversial, and frequently lacks transparency. In fact, this step can hardly be taken by science alone because it depends to a great extent on ethical values and societal preferences. Moreover, aggregation leads to a substantial loss of the information contained in the single indicators (at least if only the index is communicated). Thus, the support for decision makers would be strongly limited, since particular problem areas are no longer visible and measures cannot be designed in a targeted way. Going back to the performance data of single indicators is then necessary anyway in order to identify priorities for action. Finally, interdependencies between single indicators cannot be considered, although they are highly relevant for designing suitable solution strategies, for instance by allowing us to foresee, avoid, or handle unintended side effects and trade-offs resulting from particular measures. Therefore, aggregated scenario assessment numbers are not the preferred methodological approach in the present case, although they can be used as a complement to a differentiated assessment picture. Such a combined approach allows for a better balance between the legitimate demands for simplicity or communicability and the representation of complexity.
Differentiated scenario pictures can then be taken as a basis for scenario comparisons. Exemplary results are shown in Section 3.2. A comparative perspective is essential to provide orientation for decisions within the broad range of possible futures. The corresponding results should provide information such as "Scenario A performs better than scenarios B, C, D . . . with respect to indicators 1, 2, 3, . . . and worse with respect to indicators 4, 5, 6, . . . because of the facts x, y, and z". Comparative information of this type allows for suitable decision support, in particular by revealing problem hot spots and key triggering factors, and by increasing awareness of such issues. This in turn allows for the necessarily differentiated and targeted design of measures. Comparative scenario assessments can thus be used as an early warning tool. Nevertheless, the ultimate decision as to which scenario performance pattern should be preferred depends on the weightings applied by decision makers. These weightings depend on values, preferences, and the degree to which indicator performances can in fact be influenced by political or societal decisions, or are assumed to be influenceable.

Strength-Weaknesses-Analysis for an Exemplary Scenario
In this section, we discuss selected indicator results for the INERTIA scenario (and for the year 2050). INERTIA was chosen because it shows interesting results (in terms of individual strengths and weaknesses) in both indicator groups.

Exemplary Results for Model-Based Indicators for the INERTIA Scenario
In the following paragraphs, selected results for model-based indicators for the INERTIA scenario and the target year 2050 are presented and explained with reference to the driving factors and relevant variants from the context scenario. Furthermore, strengths and weaknesses of this individual scenario with respect to the indicator targets are identified. A cross-scenario comparison of model-based indicators follows in Section 3.2. Although indicator estimates are available on an annual basis for the entire simulation period, those data are not discussed here. The reason for this static analysis is that the non-model-based indicator values presented below can only be defined for a single target year, since the CIB approach applied here informs the model only for the target year. Thus, a dynamic analysis of scenario performance on an annual basis for both indicator types is not possible. Figure 2 shows the results for the model-based and the non-model-based normalized indicators for the scenario INERTIA. An overview of the absolute scenario results for the model-based indicators for all scenarios can be found in the supplement (Table S4).

Figure 2 illustrates that the scenario INERTIA has its individual strengths in particular on the efficiency side: INERTIA results exceed the defined targets for the indicator "final energy demand in the residential sector per capita". These results are closely linked to context descriptors: The INERTIA scenario results in an ambitious renovation rate and depth of private buildings, which the model translates into low final energy consumption for space heat and, as space heat makes up a large portion of the energy demand in the residential sector, a low per capita residential energy demand.
Moreover, the high efficiency improvements in the service sector seen in the INERTIA context directly result in a high added value in this sector in relation to its energy demand, exceeding the targets defined by [39,40].
Retrieving all descriptor impacts that promote the efficiency developments in the INERTIA scenario reveals that the unprecedented increase of fuel prices in this scenario as well as the preferred use of regulatory measures by the government force actors in the household and the commercial sector to implement serious efficiency measures. The impulse of high fuel prices is particularly strong in the INERTIA scenario, where only limited fuel substitutions are implemented. Policy makers, on the other hand, tend towards regulatory measures in this scenario because the urgency of reducing fuel consumption (caused by high fuel prices, climate protection goals, and a series of global conflicts) heavily contrasts in this scenario with the actors' reluctance to take proactive action on their own accord.
In addition to the three indicators discussed above, the reduction of total suspended particle (TSP) emissions also exceeds the targeted emission reduction (−26% to 45.6 kt) by a few percent. In the model, the main sources of TSP emissions are diesel engines for passenger and freight transport. The main drivers of the reduction of TSP emissions are decreasing road passenger transport (due to a strongly decreasing population), only moderately increasing freight transport (due to low GDP growth), a moderate increase in engine efficiency, and a moderately increasing share of new vehicle concepts (BEVs, PHEVs, FCEVs) with no or only low TSP emissions. However, although INERTIA exceeds the target with respect to TSP emissions, its emission reduction is the lowest among all scenarios that were analyzed.
Low population in INERTIA is mainly caused by disintegration tendencies in the European Union and a global tendency to establish regional "fortresses", both developments cutting off Germany from migration. In addition, birth rates are low in this scenario as a consequence of a growing low-income class, job insecurity, and materialistic attitudes in the society. This leads, together with the unfavorable international context, to a relatively low GDP, and relatively low passenger and freight transport volumes.
On the other hand, the particular weaknesses of the INERTIA scenario are connected with the development of the indicators "area under cultivation of energy crops" and "import share". They do not even take the desired direction with respect to the targets: In INERTIA, the area for energy crops increases by almost 40% between 2015 and 2050 (to 3.1 million ha), whereas a reduction of 27% is targeted (to 1.6 million ha). The target for the reduction of the import share is 27 percentage points between 2015 and 2050 (from 70% to 43%), whereas the import share in INERTIA even increases (to 79%) until 2050. The reason behind the increase in energy crops area is the increasing share of biofuels, which, in turn, is the consequence of a very high oil price in combination with low ecological awareness in the INERTIA scenario. The high and increasing import share in INERTIA is mainly a consequence of a low deployment of renewable energies and innovative drive technologies in the transport sector (and the resulting relatively high demand for imported oil, gas, and hard coal). Note, however, that the absolute imports (in PJ/a) decrease in INERTIA due to a combination of efficiency efforts, population decline, weak economic development, and the (albeit small) deployment of renewable energies as described by the context factors in INERTIA.
All other indicators in INERTIA at least develop in the desired direction, but they fail to meet the targets to varying degrees. In this group of indicators, "Cadmium Emissions" performs best and reaches 90% of the targeted reduction. The reduction of these emissions is mainly driven by a reduced primary energy demand for fuels potentially containing cadmium (e.g., coal), which, in turn, is driven by developments similar to those behind the (absolute) energy imports (see above). Note that although the cadmium emission reduction in the INERTIA scenario almost reaches the target, the reduction is significantly higher in all five other selected scenarios.
Energy-related CO2 emissions are reduced by only 19% (2015-2050) in INERTIA (to 582 Mt CO2 in 2050). The scenario thus significantly fails to meet the GHG emission reduction target of −73% (for the same period). This is due to the lack of expansion of renewable energies in both the heat sector (930 PJ) and the electricity sector (287 TWh), as described by the INERTIA context. Furthermore, innovative drive technologies only play a niche role in the transport sector (19% of the transport service in individual passenger traffic is provided by BEVs, PHEVs, and FCEVs). The counteracting effects of relatively high efficiency improvements in the residential and service sectors, a population decline, and a weak economy are not strong enough to reduce GHG emissions to the targeted level.
In addition, the indicator "renewable share in gross final energy consumption" ("RES share in GFEC") significantly fails to meet the 2050 target in this scenario (33% instead of 60%). Although gross final energy demand decreases (for reasons similar to the reduction of imports), the weak expansion of renewable energies in both the heat and power sectors is also the main reason for this failure. The weak expansion of renewables in the power sector, as described by the INERTIA context, can also be seen in the indicator "RES capacity" (118 GW instead of the targeted 169 GW), and the low deployment of new propulsion technologies in the indicator "No. of electric vehicles" (11.7 million vehicles instead of the targeted 22 million vehicles). Both developments are almost directly determined by the context factors.
The deeper explanation for the poor achievements in terms of RES expansion and structural change in mobility lies in the overall spirit of the INERTIA scenario. It describes a society absorbed by severe international developments (EU disintegration and global fragmentation) that undermine Germany's previous economic and political "business model". This may evoke vigorous adaptation actions and, in the end, self-assertion in a changed world. However, this kind of stress may also end in adaptation failure and partly paralyzed societies, and this is the case INERTIA explores. Economic downturn and social resistance against giving up familiar structures in times of uncertainty and against taking on burdens for collective long-term precautionary measures limit the ability of the government to pursue ambitious RES expansion goals and finally lead, through stress and weakness, to a disheartened and structurally conservative society. What remains are actors' responses to the economic pressure caused by high fuel prices, which lead to efficiency measures and, to a limited extent, RES expansion.
It must be emphasized here that only the comparison of individual strengths and weaknesses of different scenarios (as in Section 3.2) allows for seeing the whole picture. For example, although INERTIA seems to have individual strengths in the indicator TSP emissions, it is still the scenario with the highest TSP emissions among the scenarios analyzed here. As a consequence, one might assume that the determined target is not very ambitious. Thus, the good performance of INERTIA with respect to TSP (and also Cd) emissions must be put into perspective. In the context of the other scenarios, individual strengths (or weaknesses) of one particular scenario may have to be seen in a different light.

Exemplary Results for Non-Model-Based Indicators for the INERTIA Scenario
This section aims at demonstrating how the approach introduced here can be used to assess the sustainability performance of scenarios with respect to indicators which cannot be quantified by models and, therefore, require semi-quantitative analysis.
Based on the impact assessment tables (shown by way of example in Table 3 for the indicator "Acceptance"), we can build aggregated impact scores for each indicator that characterize whether the promoting impacts exerted by a scenario outweigh the hindering impacts or whether the opposite is the case.

Table 3. List of descriptor impacts on the indicator "Acceptance" for the scenario INERTIA.

Adding the set of impacts on an indicator to a single impact sum does not do complete justice to the wealth of qualitative aspects of the impacts, some of which are not additive in nature. Thus, the impact sum should rather be seen as a rough measure of the balance between promoting and hindering impacts. However, it is useful for drawing attention to critical aspects of a scenario, and it opens the opportunity to perform an impact analysis for a large number of indicators and scenarios. Calculating the impact sums for the complete set of six test indicators and the complete set of context scenarios leads to the results shown in Figure 3. For the whole picture, together with the model-based indicators, see Figure 2.

A first issue in interpreting the results is the algebraic sign of the impact sum. It indicates whether the weighted majority of impacts pushes the indicator towards the target or away from it. Three indicators show positive impact sums in Figure 3 (Energy expenditures, Energy Research, and Share of HH producing RES). However, the indicator "Energy expenditures" has an inverse target direction, so that a positive impact sum means an undesired net tendency towards increasing energy expenditures. Considering this, we can conclude that four indicators (Energy expenditures, Acceptance, Internalization, and Cooperatives) receive a net push in the undesired direction, and improvements with respect to these indicators cannot be expected in the INERTIA scenario.
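The sign interpretation, including the inverse target direction of "Energy expenditures", can be sketched as follows. This is an illustration only: the function name is hypothetical, and the impact sums are invented placeholders chosen to reproduce the signs described in the text, except for +9 (Share of HH producing RES) and −5 (Cooperatives), which are reported for INERTIA.

```python
# +1: an increase of the indicator is desired; -1: a decrease is desired.
TARGET_DIRECTION = {
    "Energy expenditures": -1,
    "Energy Research": +1,
    "Acceptance": +1,
    "Internalization": +1,
    "Share of HH producing RES": +1,
    "Cooperatives": +1,
}

# Impact sums for one scenario. Only +9 and -5 are reported in the text;
# all other values are hypothetical placeholders with the reported sign.
IMPACT_SUMS = {
    "Energy expenditures": 4,
    "Energy Research": 2,
    "Acceptance": -3,
    "Internalization": -10,
    "Share of HH producing RES": 9,
    "Cooperatives": -5,
}


def pushed_in_undesired_direction(impact_sums, directions):
    """Indicators whose net impact points away from the target direction."""
    return sorted(name for name, value in impact_sums.items()
                  if value * directions[name] < 0)


print(pushed_in_undesired_direction(IMPACT_SUMS, TARGET_DIRECTION))
# ['Acceptance', 'Cooperatives', 'Energy expenditures', 'Internalization']
```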
INERTIA's most positive impact sum in the group of non-model-based indicators (+9) is achieved for the indicator "Share of HH producing RES". Yet, this cannot be seen as a specific quality of the INERTIA scenario: the value range for this indicator reveals a general tendency in favor of this indicator across the scenario analysis, and INERTIA simply shares this general tendency, and does so only to a relatively small extent, far below the median value of all scenarios.
All in all, the performance of INERTIA is poor for all non-model-based indicators, compared to the 75th-percentile values (goal values) and even to the median values. The scenario's weakest point in terms of absolute and relative performance is the indicator "Internalization", for which INERTIA marks the absolute minimum of all 4869 context scenarios. Its best performance in relative terms is provided by the indicator "Energy Research", where the impact sum at least comes about halfway to the median value.
For more detailed analyses, the impact scores can be disassembled, revealing the components of the sum. This is particularly useful for poorly performing indicators because the disassembly may give hints about problem causes and possible interventions. Figure 4 shows the disassembly of the indicator "Cooperatives" for the scenario INERTIA. It shows that the impact score of −5 results from the net balance of two promoting impacts with a total value of 3 and seven hindering impacts with a total value of 8. An example of an intervention to increase the impact sum and the prospects of cooperatives would be to avoid decreasing policy stability. However, an advanced assessment of interventions should go deeper and include an analysis of their systemic effects, e.g., their potential to modify the development of other descriptors, with additional consequences for the impact sum. This kind of systemic intervention assessment, however, requires going back to the CIB scenario construction process.

MINT education (education with a focus on mathematics, informatics, natural sciences, and technology) promotes a deeper understanding of technologies in general, including RE technologies. This creates motivation to participate in energy cooperatives. However, the effect is limited because access barriers mean that only part of the population benefits from this form of education. High fossil energy prices have a stronger promoting effect because they affect the whole population and make participation in energy cooperatives more profitable.
On the other hand, there is a substantial number of hindering impacts. A strongly decreasing population, for instance, means fewer people who could be motivated to create an energy cooperative. Decreasing policy stability increases uncertainties of legal conditions for energy cooperatives and particularly discourages small investors without resources to wait out uncertainties.
These and further justifications for the impact assessments were collected during the expert panels.
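The disassembly of an impact score into its promoting and hindering components can be sketched as follows. This is a minimal example: the function name is hypothetical, and the individual impact values and the generic descriptor labels are assumptions, chosen only so that they reproduce the reported totals for "Cooperatives" in INERTIA (two promoting impacts summing to 3, seven hindering impacts summing to 8, net score −5).

```python
def disassemble(impacts):
    """Split descriptor impacts into promoting and hindering parts."""
    promoting = {name: v for name, v in impacts.items() if v > 0}
    hindering = {name: v for name, v in impacts.items() if v < 0}
    return promoting, hindering, sum(impacts.values())


# Hypothetical individual scores; only the totals are given in the text.
cooperatives_inertia = {
    "MINT education": 1,
    "High fossil energy prices": 2,
    "Strongly decreasing population": -1,
    "Decreasing policy stability": -2,
    "Other hindering descriptor 1": -1,
    "Other hindering descriptor 2": -1,
    "Other hindering descriptor 3": -1,
    "Other hindering descriptor 4": -1,
    "Other hindering descriptor 5": -1,
}

promoting, hindering, net = disassemble(cooperatives_inertia)
print(len(promoting), sum(promoting.values()))  # 2 3
print(len(hindering), sum(hindering.values()))  # 7 -8
print(net)                                      # -5
```

Sorting the hindering part by magnitude would give a first, rough list of candidate intervention points, subject to the systemic caveats discussed above.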
In summary, the typical output of a single scenario analysis based on semi-quantitative data in our approach is a strength-weakness assessment as shown in Table 4.

Integrative Scenario Performance Analysis
Section 3.1.1 discussed the performance of scenario INERTIA with respect to the model-based indicators. Section 3.1.2 did the same for the non-model-based indicators. However, the core added value of our approach is to analyze the scenario performance not separately for both types, but in an integrated way. This means that our approach, and in particular Figure 3, provides the basis for extending the strength/weakness analysis of Table 4 to the combined set of model-based and non-model-based indicators, and no further distinction between the two indicator types is necessary in the analysis procedure. However, there is one difference to Table 4: For model-based indicators, we were not able to identify relative strengths and weaknesses in our demonstration analysis, because this would require a complete model analysis of all 4869 context scenarios as a basis for comparison, which was beyond our resource limits.

Strengths
Analyzing Figure 2  Though the majority of weak performances are connected to non-model-based indicators, it is the group of model-based indicators that accounts for the worst case: Area crops. Besides the broad picture of poor performance of the non-model-based indicators, however, the many model-based indicators with only mediocre performance must also be seen as a critical characteristic of INERTIA: their performance values are positive, but on a low level and far from the targets. In the end, it is also the large number of mediocre indicators that excludes INERTIA from being a credible sustainability vision.

Comparison of Scenario Performance Based on Selected Indicators
Whereas Section 3.1 focuses on assessing the strengths and weaknesses of a single scenario, this section discusses how a set of scenarios can be compared based on their performance profiles. This could be done using Multi-Criteria Analysis [57]. However, this usually requires assigning weights to the indicators. Instead, this section aims at demonstrating what can be done without presupposing a hierarchy between the indicators. The ultimate goal is to work out the "personality" of the scenarios by comparing their strengths and weaknesses with respect to the various sustainability criteria. Figure 5 shows the indicator performances for all 22 indicators and all scenarios, using our common normalization concept.

Identifying Pareto-Optimal Scenarios
A possible approach for distinguishing between poorly and well performing scenarios is to identify the Pareto-optimal scenarios within the scenario set. Pareto-optimality of scenario X means that there exists no other scenario in the set that is better than X for at least one indicator and at least equivalent to X for all other indicators [58]. All scenarios failing the Pareto criterion can be considered inferior and less desirable from an objective viewpoint, even if there is no consensus about the relative importance of the indicators.
However, our calculations show that all scenarios of our set are Pareto-optimal (each of them having at least one advantage when compared to another scenario of the set). We conjecture that this will frequently be the case in small and medium scenario sets, and, therefore, argue that additional analysis techniques are required for comparing the performance profiles of a scenario set. We describe a possible approach in the next section.
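The Pareto criterion defined above can be checked with a short routine. This is a sketch under stated assumptions: the function names are hypothetical, the performance values are invented, and all indicators are assumed to be normalized so that higher values are better (as with the normalization concept used here).

```python
def dominates(a, b):
    """True if profile a is at least as good as b everywhere and better once."""
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_optimal(profiles):
    """Names of scenarios that are not dominated by any other scenario."""
    return [name for name, profile in profiles.items()
            if not any(dominates(other_profile, profile)
                       for other, other_profile in profiles.items()
                       if other != name)]


# Hypothetical normalized performances for two indicators.
profiles = {
    "A": [1.2, 0.3],
    "B": [0.5, 0.9],
    "C": [0.4, 0.2],  # dominated by A (and by B)
}
print(pareto_optimal(profiles))  # ['A', 'B']
```

With many indicators and few scenarios, almost every scenario tends to have at least one indicator in which it is not beaten, which is in line with the finding that all scenarios of the demonstration set are Pareto-optimal.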

Indicator Profile Archetypes
The set of performance values of all scenarios with respect to a single indicator forms the indicator profile. The shape of the profile can be discussed in general terms: it may be rather even in performance, there may be positive or negative outliers, or there may be two clusters of good and poor performers. We tried to classify the profile patterns found in our example and identified four "archetypes", discussed in Figure 6. In our view, archetypes can be useful to compare indicators within and across assessment exercises, to develop analysis questions uncovering the specific messages behind each profile archetype, and thereby to contribute to a standardized procedure for discussing indicator profiles. Having worked with a limited demonstration exercise, we do not expect our list of archetypes to be complete. Rather, we regard our effort as a starting point.
Two indicators (Share of HH producing RES, RES share in GFEC) do not match well with any of these types and are classified as "inconclusive intermediate cases". They are, therefore, not included in the further discussion.

Proposing a "Typification Project"
The archetypes shown above were derived from the data of our demonstration case. Other cases may share some of our types but require the addition of further types. Each type should be defined by an archetype, a description, a conclusive name, and a set of analysis questions tailored to the peculiarities of the type. A comprehensive catalogue of archetypes cannot be developed from a single application case (let alone a demonstration case such as ours). However, it may mature if it is applied, improved step by step, and completed through a larger number of method applications.

Discussion of Type I indicator results
As mentioned above, no further analysis is needed for this type.

Discussion of Type II indicator results
The focus of this analysis is on identifying and analyzing the "laggards" for each indicator belonging to this type. As can be seen in Table 5, only the scenarios INERTIA and COPING WITH PRESSURE take the role of laggards, and this is a first item of "personality" for the two scenarios. In most cases, it is INERTIA that falls behind. This is easily understandable considering the storyline of this scenario (Table 1). The INERTIA society is unable to respond adequately to the transformation challenges. Among other things, this results in high PEC and GHG emissions, and in low RES investments, implying low fossil fuel substitution.
The reasons behind the extraordinarily poor indicator value for the final energy consumption per floor area (meaning low insulation quality) in COPING WITH PRESSURE are less obvious. The scenario's failure with respect to this indicator goes back directly to the context scenario data (supplement, Table S1). We can see that COPING WITH PRESSURE is the only scenario of the set that includes a poor building renovation rate (1%/y instead of 2%/y for all other scenarios). This particularity results from the combined influence of (a) the strength of the market-and-growth paradigm on the international stage in this scenario (implying aversion against international environmental protection actions which might have stimulated also local renovation progress), (b) the very strong expansion of renewable heat production (which acts as a strategic alternative to renovation), and (c) a strong trend towards tabloidization in the media discourse (amplifying concerns and narratives about negative side effects of high-level insulation). Together, these negative influences compensate for the renovation-promoting factors embedded in the COPING WITH PRESSURE scenario.

Discussion of Type III indicator results
The central topics of the analysis of Type III indicators are goal conflicts and trade-offs. Trade-offs express themselves when scenario A is a strong performer and scenario B is a poor performer for indicator X, whereas the roles reverse for another indicator Y. Such configurations add further to the 'personalities' of the scenarios because, through this, preferences for indicators translate into preferences for scenarios.
Role reversals were identified for many configurations. One example is given in Figure 7 for the indicators "Area crops" and "No. of electric vehicles" and the scenarios TARGET and TARGET-CENTRALIZED. Striving for the scenario TARGET instead of the scenario TARGET-CENTRALIZED implies a considerable improvement for the indicator "No. of electric vehicles", but at the cost of dramatic losses for the indicator "Area crops".
At first, these results contradict intuition: One would expect that higher shares of electric vehicles reduce the demand for fuels and thus for biofuels (assuming similar shares of biofuels), which finally reduces the area for energy crops. However, the situation is more complex: The lower share of electric vehicles in TARGET-CENTRALIZED is accompanied by higher efficiency gains of vehicles with internal combustion engines and lower shares of biofuels in gasoline (Otto) and diesel fuels. The latter is a consequence of society's general tendency to have an aversion to new technologies and fuels and a negative attitude towards the energy transition and climate protection in general.
These considerations illustrate that there is no goal conflict per se between the number of electric vehicles and the area for energy crops, but that it is the societal constellation in TARGET and TARGET-CENTRALIZED that causes this unexpected result. Interpreting trade-offs as a manifestation of goal conflicts between indicator targets would require a larger number of scenarios reproducing the same trade-off pattern. This presupposes a broader set of context scenarios than used in this methodological demonstration.
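The role-reversal criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scenario and indicator names, the performance values, and the `min_gap` threshold are hypothetical placeholders and not results or parameters of this study.

```python
from itertools import combinations

# Hypothetical normalized performances (percent of target achieved).
# Placeholder values for illustration only, not taken from the study.
performance = {
    "Scenario A": {"Indicator X": 90.0, "Indicator Y": 20.0},
    "Scenario B": {"Indicator X": 30.0, "Indicator Y": 85.0},
    "Scenario C": {"Indicator X": 60.0, "Indicator Y": 70.0},
}


def role_reversals(perf, min_gap=25.0):
    """Return (scenario_1, scenario_2, indicator_1, indicator_2) tuples for
    which the order of the two scenarios flips between the two indicators.

    min_gap is an assumed threshold: both performance gaps must be at least
    this large for the flip to count as a trade-off.
    """
    found = []
    indicators = sorted(next(iter(perf.values())))
    for s1, s2 in combinations(sorted(perf), 2):
        for i1, i2 in combinations(indicators, 2):
            gap1 = perf[s1][i1] - perf[s2][i1]
            gap2 = perf[s1][i2] - perf[s2][i2]
            # Opposite signs and sufficiently large gaps: the roles reverse.
            if gap1 * gap2 < 0 and min(abs(gap1), abs(gap2)) >= min_gap:
                found.append((s1, s2, i1, i2))
    return found


reversals = role_reversals(performance)
```

With these placeholder values, the pair B and C is not reported because one of the two gaps stays below the assumed threshold; how large a gap must be to count as a trade-off is a judgment that the analyst has to justify.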
A general tendency of synergies or conflicts between indicators can be uncovered by calculating the correlations between the indicator performances, using the data shown in Figure 8. Again, the results would gain reliability when calculated on the basis of a larger context scenario set. Most negative correlations in Figure 8 are associated with the indicator "Area crops" (ArC). We can conclude that this indicator is involved in a particularly large number of trade-offs between scenarios.
The reasons for this general trend are difficult to analyze in detail, as the relationships between indicators are complex (see the argumentation above). However, as long as biofuels are seen as an option to reduce fossil fuel-based CO2 emissions in the transport sector, a trend towards sustainability can be accompanied by a trend towards a larger cultivation area for energy crops (in particular if biofuel imports are limited, also due to sustainability concerns).
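A minimal sketch of such a correlation screening is given below. It assumes Pearson correlation (the correlation measure is not specified in this section) and uses hypothetical indicator names and placeholder performance values; it then counts, for each indicator, how many of its correlations with other indicators are negative.

```python
from itertools import combinations
from math import sqrt

# Hypothetical indicator performances across four scenarios (placeholders,
# not the data behind Figure 8).
performances = {
    "A": [10.0, 40.0, 70.0, 90.0],
    "B": [90.0, 70.0, 40.0, 10.0],
    "C": [80.0, 75.0, 35.0, 20.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Pairwise correlations and the number of negative ones per indicator.
correlations = {
    (a, b): pearson(performances[a], performances[b])
    for a, b in combinations(sorted(performances), 2)
}
negative_counts = {name: 0 for name in performances}
for (a, b), r in correlations.items():
    if r < 0:
        negative_counts[a] += 1
        negative_counts[b] += 1
```

An indicator with a high count in `negative_counts` is a candidate for closer inspection, in the same spirit as the observation about "Area crops". With only a handful of scenarios, such coefficients are of course weakly supported.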
All described conclusions about this type are provisional. They need confirmation by applying the analysis to a larger scenario set better representing the space of possibilities than our small set. However, our main goal is demonstrating a method rather than elaborating conclusive assessments.

Discussion of Type IV indicator results
This analysis focuses on searching for common causes within the high-performing and the low-performing subgroup of scenarios, respectively. Identifying such common causes might uncover key decisions controlling success or failure for a specific indicator.
As an example, for the indicator "Energy research expenditure", the scenarios VALUE CHANGE and COPING WITH PRESSURE form the high-performance subgroup, whereas TARGET, TARGET-CENTRALIZED, INERTIA, and MARKET belong to the low-performance subgroup. The reasons behind the performance gap between the subgroups can be found in the impact assessment table (shown by way of example for the indicator "acceptance" in Table 2), which contains the expert judgments on the influences on the indicator "Energy research". The table reveals that the scenarios VALUE CHANGE and COPING WITH PRESSURE share three common features (listed below), each of them scoring on the "Energy research" scale, while the scenarios of the low-performance subgroup do not share these features. This conveys a discriminating advantage for the high-performance subgroup. These key scorings are connected to:
1. strong expansion of renewables in the electricity sector (requiring a high level of research);
2. transition towards decentralized electricity generation and storage (requiring a crucial change in the system architecture, which has to be prepared by research); and
3. positive citizens' attitude towards the energy transition (lowering the political hurdles for spending high amounts of tax money on energy research).
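The search for such discriminating features can be sketched as a simple set operation. The sketch assumes one possible reading of the criterion: a feature counts as discriminating if all scenarios of the high-performance subgroup have it and the low-performance subgroup does not have it in common. Scenario names and feature lists are hypothetical placeholders.

```python
# Hypothetical positively scoring features per scenario (placeholders).
features = {
    "High 1": {"strong RES expansion", "decentralized generation",
               "positive attitude", "high growth"},
    "High 2": {"strong RES expansion", "decentralized generation",
               "positive attitude"},
    "Low 1": {"strong RES expansion", "high growth"},
    "Low 2": {"positive attitude"},
}
high_group = ["High 1", "High 2"]
low_group = ["Low 1", "Low 2"]


def discriminating_features(features, high, low):
    """Features shared by every high performer but not by every low performer."""
    shared_high = set.intersection(*(features[s] for s in high))
    shared_low = set.intersection(*(features[s] for s in low))
    return shared_high - shared_low


key_features = discriminating_features(features, high_group, low_group)
```

In this placeholder example, "high growth" is not reported because only one of the two high performers has it.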
The examples described above show the broad variety of insights about the strong and weak points of a single scenario and the comparative strengths and weaknesses of a set of scenarios that can be gained without leaning on indicator weights and despite the fact that sustainability assessments have to include model-based and non-model-based information. Although we found our approach well suited to meet these challenges, we nonetheless identified several limitations that are discussed in the next section.

Discussion-Lessons Learned
In the authors' view, the integrative approach to the sustainability assessment of scenarios presented here can contribute to improved energy policy decision-making compared to previous practice. There are three elements that make up its added value:

• an approach to analyze and assess not only commonly applied techno-economic, model-based indicators, but also non-model-based social and socio-technical indicators previously missing in scenario assessment studies,

• the use of social context scenarios as a tool to suitably address social and socio-technical aspects in scenario design, and as a source of information to explain and justify assessments,

• an approach to standardize and cluster evaluation results in a compatible way, allowing for the comparability of different types of indicators, indicator values, and assessment procedures, and, thus, for systematic comparison of scenario performances.

Without doubt, this approach is complex. It consists of several steps, each including more or less complex procedures, as well as data and result uncertainties. This complexity may somewhat contradict a key goal of scenario analysis, which is to reduce the broad range of complexities and uncertainties, or at least make them more manageable, in order to be useful in supporting decision-making. This points to the classical dilemma between the need for methodological complexity of analytical tools, which particularly arises in sustainability assessment contexts or in socio-technical fields such as the energy system, and the demand for practical operability, which is important for reasons of analytical feasibility and particularly in decision support contexts.
But how to deal with this dilemma? First of all, it should be noted that combining sustainability and simplicity is, to a certain extent, an oxymoron. Complex problems call for tools and solutions with a context-adapted "optimal" degree of complexity. "Simple" tools and solutions address such problems insufficiently and fail to fulfill expectations. Moreover, concerns about the reduced usefulness of overly complex tools can be alleviated by interpreting and communicating complex results, or results produced with complex tools, in a way that facilitates understandability and comprehensibility. This is a particular task of tool users that obviously needs improvement. Finally, in order to avoid a misleading sense of precision of results, since they in fact reflect various assumptions, simplifications, and uncertainties, transparency about how results have been produced is essential. Particularly in its non-model-based part, our approach allows for transparency, e.g., with respect to the broad range of possible factors influencing sustainability performances and the expert estimations of their impact direction and intensity.
Basically, the "optimal" degree of complexity of a methodology or tool strongly depends on the question to be addressed and the particular contexts to be considered, and should, therefore, be decided carefully and based on sound and transparent criteria. Methodologies such as the one presented here cannot and should not be applied to every case. For certain assessment tasks, simpler approaches may be more appropriate, whereas in cases requiring a more detailed breakdown of causes, impacts, and circumstances, we consider our approach to be superior. It may, thus, provide a complementary tool for the existing tool box.
With regard to the larger resource requirements for implementing such a complex tool, decision makers or project funding agencies have to decide to what extent and under which conditions they would provide these resources. However, such fundamental concerns definitely transcend the setting of this paper.
Beyond this general reflection on the approach, this section discusses the methodological challenges, deficits, and resulting requirements in its individual steps, as a basis for deriving proposals for the further development of the methodology in the concluding Section 5.
First of all, it should be noted that the CIB methodology has a very important function for the evaluation. On the one hand, it provides, together with the energy system model, socio-technical energy scenarios that can be used to also estimate the impact of societal developments on energy-economic dynamics and on sustainability indicators. This would not be possible with classical techno-economic scenarios because they do not contain the relevant societal "drivers" that influence model-based and non-model-based sustainability indicators. On the other hand, by generating consistent scenarios, the CIB provides, in addition to pure scenario descriptions, explanatory contexts that are important for the interpretation of determined future indicator values. Since the context scenarios themselves "only" represent an "input" for the description of the methodology, the strengths and weaknesses of the CIB methodology in the generation of scenarios will not be discussed in detail here. In this respect, reference can be made to [10], for example.
The following discussion is structured along the 9 steps of the methodology (see Figure 1).

Step 1 (Scenario Selection):
The focus of this article is on the description and discussion of an innovative methodology for the sustainability assessment of scenarios. The evaluation results presented have, therefore, only an illustrative function; they do not serve to derive "real" conclusions or recommendations for action. For this reason, the process of selecting the evaluated scenarios, which was also carried out for illustrative purposes only, is not explained or discussed in detail here either. In the case of an analysis aiming to use results, for example, for consulting purposes, this would, however, be indispensable.

Step 2 (Indicator Selection):
A selection of indicators is always necessary, regardless of the object of evaluation, in view of the immense quantity of conceivable possibilities. It is always normative, i.e., characterized by values and moral concepts with regard to the object of consideration as well as the underlying sustainability model, and is therefore always subject to criticism. Transparent disclosure of the selection process, including the involvement of stakeholders where appropriate, is therefore essential. Since the indicators used here also serve primarily to illustrate the methodology, their justification, analogous to step 1, is not necessary in this case (for more details, see [37]). However, this would be different in the case of an analysis with a goal of consulting. It is clear that the inclusion of indicators that better reflect the socio-technical character of the energy system, and that in many cases require analyses that are not model-based, increases the methodological challenge in estimating and evaluating future indicator values.

Steps 3 + 4 (Detailed Scenario Modelling and Quantification of Model-Based Indicators):
The assessment of model-based indicator values is prone to uncertainties stemming from different sources and affecting the different indicator types in different ways.
The context scenarios are the starting point of the analysis. The differences between the context scenarios represent uncertainties about the future development of societal contexts. However, this source of uncertainty is reflected by the differences between the scenarios (energy balances, impacts, etc.) and is not discussed here.
A second source of uncertainty stems from the fact that, within a plausible scope, different quantitative values for the scenario variants are conceivable (see Section 2.3), which immediately translate into uncertainties in the main scenario drivers and boundary conditions, and indirectly into uncertainties of calculating the model-based indicators. The resulting uncertainties in the indicators cannot be quantified as long as the CIB variants are not characterized by a distribution of possible values within one variant.
Some indicators (such as efficiency gains) depend more or less directly on the context scenario descriptors. This means that indicators, such as the final energy demand in the residential sector per capita or the final energy productivity of the industry, depend closely on context assumptions regarding efficiency developments in the industry sector or the renovation rate assumed in the model. For the quantification of other indicators, such as the primary energy demand, the total capacity for renewable power generation, and the final energy demand in the transport sector, more assumptions have to be made within the energy system model, and intermediate calculation steps are necessary. Each of these assumptions and calculation steps entails new uncertainty, which, again, is hard to quantify.
Finally, for environmental impact indicators (emissions of greenhouse gases (GHG), acid forming gases (AFG), cadmium (Cd), and total suspended particulate matter (TSP)), three main sources of uncertainty have to be addressed (additional to the sources of uncertainty discussed above): First, in the model, all fuels are represented only by a few fuel categories (such as "lignite" or "natural gas"). Similarly, technologies are characterized by broadly simplified reference technologies. Environmental impact indicators are calculated from fuel- and technology-specific emission factors for each reference. However, in reality, these emission factors might depend significantly on the specifics of each fuel (e.g., the specific type of lignite) as well as on the specific plant type. Thus, the coarse representation of fuels and technologies in the model might lead to systematic biases in the assessment of environmental impacts.
Second, the literature source for the estimates of the emission factors for TSP, Cd, and AFG used here is partly based on different definitions of sectors and fuels than the MESAP model, which introduces additional uncertainties when matching emission factors to the model technologies.
And third, the analysis here assumes constant emission factors until 2050. Except for the plant efficiency, it thus neglects the effects of future technology developments (e.g., more efficient filters, flue gas cleaning systems, ...), environmental regulations (such as emission limit values), etc. This might lead to an overestimation of those emissions in the target year 2050.
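The effect of the constant-factor assumption can be illustrated with a minimal sketch. It assumes that emissions are calculated as the sum of fuel use times an emission factor per fuel category; all numbers, units, and the `factor_scaling` parameter are hypothetical placeholders and are taken neither from the MESAP model nor from any emission inventory.

```python
# Placeholder fuel use in the target year and placeholder emission factors.
fuel_use_pj = {"lignite": 100.0, "natural gas": 200.0}
emission_factor_kt_per_pj = {"lignite": 0.5, "natural gas": 0.1}


def total_emissions(fuel_use, factors, factor_scaling=1.0):
    """Sum of fuel use times emission factor over all fuel categories.

    factor_scaling is an assumed uniform multiplier on all emission factors;
    1.0 corresponds to constant factors, values below 1.0 to cleaner plants.
    """
    return sum(fuel_use[fuel] * factors[fuel] * factor_scaling
               for fuel in fuel_use)


# Constant emission factors versus an assumed 20% improvement by the target year.
constant_case = total_emissions(fuel_use_pj, emission_factor_kt_per_pj)
improved_case = total_emissions(fuel_use_pj, emission_factor_kt_per_pj,
                                factor_scaling=0.8)
possible_overestimate = constant_case - improved_case
```

In reality, improvements would differ between fuels, technologies, and scenarios, which is why the size of this overestimation cannot simply be read off a single scaling factor.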
Possible errors due to the first and second source of uncertainty are more relevant if strengths and weaknesses of individual scenarios are identified, in distinction from comparative scenario analyses. In the first instance, it may thus be assumed that the differences between the scenarios are only slightly affected by those sources of error. However, as errors in emission factors might differ between fuels or technologies, systematic errors depend on the energy and technology mix. Thus, the actual error might change over time in the scenarios.
In contrast, errors in environmental impacts due to the third source may differ more between the scenarios. This is due to the fact that it is appropriate to assume that different societal contexts (e.g., different attitudes towards sustainability) might affect the development of environmental regulations differently in the different scenarios.
Moreover, the indicator "Area crops" (area for energy crops) can only be determined with some uncertainty, as, e.g., the yield per area depends strongly on the plant variety, the soil quality, the quantity of fertilizer used, etc.
As this discussion shows, the uncertainties in the model-based indicators cannot be addressed in a quantitative manner. However, it is clear that the uncertainty of some indicators (which are closely related to context descriptors) is comparably small, whereas the uncertainty in particular of the environmental impacts is comparably large. This has to be kept in mind when drawing conclusions from differences between the scenarios with respect to the sustainability indicators. Being transparent in this respect as much as possible would at least provide important information to users of assessment results and allow for debates about what would happen if assumptions change.

Step 5 (Determining Future Values for Non-Model-Based Indicators):
This innovative core element of the approach has four particular aspects to be discussed: First, the context scenarios (or the context descriptors on which they are based) and their characteristics play an important role in the justification of future indicator values.
Secondly, the method of expert- and/or stakeholder-based estimation of future values for the non-model-based indicators implies a collection of knowledge, which of course depends on the number and expertise of the participants and will ultimately always be selective and incomplete. Furthermore, the assessments of participants depend not only on the information provided to them, but also on their interpretation of descriptor characteristics or their impact on indicators, which has an impact on the assessment itself. However, this is not specific to the approach presented here, but a general phenomenon that any kind of expert/stakeholder involvement must deal with. It is, therefore, important to avoid arbitrariness and ensure traceability through the transparent disclosure of constellations and processes. Major deviations between assessments should be minimized or justified by using methods such as the group Delphi, which require an exchange between the participants with the highest disagreement.
Thirdly, it should be noted that, due to the lack of a benchmark for the evaluation of descriptor-indicator relationships on a +3/−3 scale, it cannot be guaranteed that the same values always represent an equally strong effect of a descriptor on the indicator. The fact that there are different numbers of impacts between the respective indicators and the descriptors, i.e., indicators are determined to different degrees by the context, can also influence the results. Therefore, the summation of the individual influence values to indicator sums should rather be interpreted as a rough measure. Even more important is the identification of supporting and hindering factors (Figure 4) in order to gain clues for measures to improve indicator values.
Fourthly, the procedure also includes judgment uncertainties to the extent that an insufficient distinction may be made between direct effects of descriptors on indicators, which are the only ones to be considered, and indirect effects, which are not to be considered.
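The summation and the identification of supporting and hindering factors can be sketched as follows. The data layout (one judgment per descriptor variant, one active variant per descriptor in a scenario) is an assumption made for this illustration, and all descriptor names and scores are hypothetical placeholders.

```python
# Hypothetical expert judgments on the -3..+3 scale: effect of each
# descriptor variant on one non-model-based indicator.
judgments = {
    "Descriptor A": {"variant 1": 3, "variant 2": -1},
    "Descriptor B": {"variant 1": -2, "variant 2": 2},
    "Descriptor C": {"variant 1": 0, "variant 2": 1},
}

# Hypothetical context scenario: one active variant per descriptor.
scenario = {
    "Descriptor A": "variant 1",
    "Descriptor B": "variant 1",
    "Descriptor C": "variant 2",
}


def assess(judgments, scenario):
    """Return the indicator sum and the supporting and hindering descriptors."""
    scores = {d: judgments[d][variant] for d, variant in scenario.items()}
    if any(abs(score) > 3 for score in scores.values()):
        raise ValueError("judgments must lie on the -3..+3 scale")
    indicator_sum = sum(scores.values())
    supporting = sorted(d for d, score in scores.items() if score > 0)
    hindering = sorted(d for d, score in scores.items() if score < 0)
    return indicator_sum, supporting, hindering


indicator_sum, supporting, hindering = assess(judgments, scenario)
```

Keeping the lists of supporting and hindering factors next to the sum reflects the caveat above: the sum is only a rough measure, whereas the individual factors point to possible measures.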

Step 6 (Identification of Goals for Model-Based Indicators):
Target values are necessary to benchmark indicator values. The determination of these targets is unproblematic for most of the model-based indicators considered here, since official, politically set, or at least generally accepted goals exist. Where this is not the case, targets should be formulated based on other sources, e.g., the procedure in comparable countries, the state of the scientific and societal debate on the corresponding topic, or conclusions by analogy to other thematically related indicators and their targets. Disclosure of the sources used and justification of the derivation steps are essential in any case.

Step 7 (Determining Goals for Non-Model-Based Indicators):
The target values for the non-model-based indicators that were determined in physical units, such as tons or Euros, in [39] were not used here, because it was considered too difficult to translate these targets into indicator values on the +3/−3 scale in order to determine the distances-to-target in these units. Instead, the potential range of future indicator values across all consistent scenarios was used as a target orientation, with the target expressed as a percentile of this range (see Section 2.5). Thus, the definition of the percentile is of crucial importance. This is a normative step, which can always be criticized as arbitrary; it therefore requires justification. The simple idea behind the 75% value used in the example here was to be ambitious, but not too strict. However, "traditional" critical values are also often based to a considerable extent on normative elements and negotiation processes.
Such an approach, which refers to the entire range of possible consistent scenarios, only makes sense if a larger number of explorative context scenarios is available and analyzed. This would not be possible with the common practice of analyzing a small number of scenarios. This underlines the relevance of the CIB method, or of any other method for developing context scenarios that allows us to create a large number of sufficiently different and consistent scenario variants. In this context, the percentile reference for the target definition also has an impact on the procedure for the scenario evaluation and the interpretation of results.
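A percentile-based target of this kind can be sketched as follows. The indicator sums are hypothetical placeholders, and the interpolation rule (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the exact percentile definition is not restated in this section.

```python
from statistics import quantiles

# Hypothetical indicator sums of all consistent context scenarios for one
# non-model-based indicator (placeholders, not results of the study).
indicator_sums = [-4, -2, -1, 0, 1, 2, 3, 5, 6]

# Quartile cut points; the third one is the 75th percentile used as target.
# method="inclusive" treats the data as the full population of scenarios.
target_75 = quantiles(indicator_sums, n=4, method="inclusive")[2]

# Number of scenarios that reach or exceed the percentile target.
scenarios_meeting_target = sum(value >= target_75 for value in indicator_sums)
```

By construction, only about a quarter of the scenarios can reach such a target, which is one reason why the choice of the percentile needs to be justified.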

Step 8 (Standardization in the Distance-to-Target Procedure):
In order to ensure the comparability of evaluation results of the two indicator types, normalization is necessary. There are different approaches for this, which generate different results that always have to be interpreted in the light of the chosen approach. In the procedure presented here, the distances to the defined goals (for model indicators to the absolute values, for non-model indicators to the percentile values) are normalized to percentage numbers by relating the achieved change of the indicator value to the desired change. Alternatively, the respective absolute values could also be put into relation, or other standardization methods could be used. Hence, the applicability of the approach presented here does not depend on the selected normalization method. However, the advantage of the method used here is that it provides additional information compared to the normalization with absolute values: It also shows whether the indicator is moving in the right direction or deteriorating compared to today.
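One way to write down this normalization is sketched below: the achieved change (future value minus today's value) is divided by the desired change (target minus today's value). The function name and the example numbers are hypothetical; the sketch follows the verbal description above and is not a reproduction of the formula given earlier in the article.

```python
def goal_achievement(current, future, target):
    """Achieved change relative to desired change, in percent.

    100 means the target is exactly met, values above 100 mean it is
    exceeded, and negative values mean the indicator deteriorates
    compared to today.
    """
    desired_change = target - current
    if desired_change == 0:
        raise ValueError("target equals today's value; ratio is undefined")
    return 100.0 * (future - current) / desired_change


# Indicator that should decrease (e.g., an emission): placeholder values.
partly_achieved = goal_achievement(current=100.0, future=55.0, target=40.0)
deteriorating = goal_achievement(current=100.0, future=110.0, target=40.0)

# Indicator that should increase: placeholder values.
overachieved = goal_achievement(current=10.0, future=35.0, target=30.0)
```

Because the ratio works for indicators that should decrease as well as for those that should increase, the sign of the result directly carries the direction information mentioned above.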

Step 9 (Evaluation):
The approach presented here provides an answer to the (research) question of how energy scenarios can be evaluated in a way that takes the socio-technical characteristics of the energy system into account more appropriately than the approaches used so far. Based on the development and application of a correspondingly extended set of criteria, scenarios are analyzed with regard to their strengths and weaknesses. Thus, in principle, targeted recommendations for action can be given, indicating at which points in the future "worlds" promoting or inhibiting factors for sustainable development may exist that would make interventions necessary.
With regard to the evaluation step and the interpretation of evaluation results, the following aspects should be noted:

1. With the focus on a differentiated strength-weakness analysis of scenarios, the approach differs from common MCDA methodologies, which aim at a ranking of alternatives. Such a ranking requires an explicit or implicit weighting of criteria or indicators, which, in the authors' view, should not be carried out by scientific experts, but rather by political decision-makers and social groups. Therefore, the method presented here does not initially aim at a scenario ranking, but at providing a differentiated basis for a possible subsequent ranking process.

2. There are several points to be noted regarding the possibility of a structured evaluation of indicator results, which goes beyond or complements a strength-weakness analysis based on individual indicators, as indicated in Section 3.2:

• An allocation of indicators to predefined archetypes, which refer to the structure of the indicator value distribution and the goal achievement of the indicators across the 6 selected scenarios, can first of all be meaningful in labor-economic terms. If complete typologies and valid analysis programs for each archetype are available, a larger number of indicators could be processed, compared, and discussed for more scenarios, and a state of the art for comparing the indicator performances of a scenario set can emerge.

• The consideration of such value and goal achievement profiles of indicators makes it possible to recognize structural similarities of results between indicators across all regarded scenarios, even if the different scenarios exhibit different distributions of indicator values in detail.

• The degree of goal achievement is the main reference point for the profile analysis of indicators. Thus, the profiles of indicators that have been worked out do not represent fundamental "characteristics" of indicators, but depend on the target values set for them. The type assignment of indicators can change if targets are defined differently. For example, in the case of more challenging target values, indicators that were assigned as "Nobrainer" in the described example could change into a different, worse performing archetype. Changes could also occur with a different scenario selection regarding type or number. In general, the more scenarios are considered, the more robust the typification approach becomes.

• The orientation of target values for non-model-based indicators on the performance of all scenarios and the normalization based on this have consequences for a plausible interpretation of the archetype evaluation. There is no information available about the good or bad performance of scenarios in absolute terms; only "better-worse statements" are possible. For example, in a scenario set in which all scenarios perform more or less well with respect to an indicator, some scenarios would still be evaluated as relatively bad compared to the better ones. This compulsion, implicit in the normalization process, to identify both relatively good and relatively bad scenarios regardless of their absolute performance also puts the interpretability of the archetypes into perspective: For example, no representative set of scenarios can then be composed only of scenarios that can be evaluated as very good in relation to an indicator, as would correspond to archetype I ("Nobrainer"). Thus, for methodological reasons, this type can only make meaningful statements for model-based indicators with their externally set target values, whereas its occurrence with non-model-based indicators would only be an indication of a non-representative scenario selection.
The identification and analysis of possible conflicts of goals between indicators, or between the achievement of corresponding goals, is essential to develop effective measures for improving indicator performances and to avoid improvements in one indicator leading to deterioration in others. However, this analytical procedure has rarely been carried out systematically in (scenario) studies so far. A profile analysis as outlined here can facilitate or support the identification and justification of such conflicts of goals.
A fundamental conflict between two indicator goals exists if there are no or very few scenarios in the entire scenario set (not only in a selection of scenarios) that serve both goals, whereas the vast majority of scenarios serve only one of the two goals well at best. It is desirable to base the analysis on the entire scenario set to ensure that such an indicator "behavior" is not randomly based on the selection of scenarios, but has a more profound reason. One example could be the fact that many (of today's) renewable energy technologies are more resource-intensive in their production than conventional ones and that a conversion to renewable energies inevitably leads to a higher consumption of certain resources. If such a behavior occurs only within a scenario selection, one could only speak of trade-offs between indicators or between target values within the limited decision space of the particular scenario selection.
One reason for such a conflict of goals can be that the individual sustainability evaluations are already conflictual, i.e., the performance of an indicator in practically all scenarios is systematically evaluated positively, whereas other indicators are evaluated negatively, or vice versa. In this case, there can be no scenarios that serve both goals. Such a "primary" conflict of goals can be distinguished from a "secondary" one, which would exist if there could be "favorable" scenarios for which both goals can be achieved, but which are rejected in the CIB as inconsistent due to internal contradictions.
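The criterion for a fundamental goal conflict described above can be sketched as a simple count over the entire scenario set. The threshold for a met goal (`met`), the maximum share of scenarios serving both goals (`max_share_both`), and all performance values are hypothetical assumptions; the text only speaks of "no or very few" scenarios and a "vast majority".

```python
def goal_conflict_profile(perf_x, perf_y, met=100.0):
    """Count scenarios that serve both goals, exactly one goal, or neither."""
    pairs = list(zip(perf_x, perf_y))
    both = sum(x >= met and y >= met for x, y in pairs)
    one = sum((x >= met) != (y >= met) for x, y in pairs)
    neither = len(pairs) - both - one
    return both, one, neither


def is_fundamental_conflict(both, one, neither, max_share_both=0.05):
    """Assumed reading: almost no scenario serves both goals, while more
    than half of the scenarios serve exactly one of them."""
    total = both + one + neither
    return both / total <= max_share_both and one / total > 0.5


# Hypothetical goal achievements (percent) for two indicators across the
# entire set of consistent scenarios (placeholders).
perf_x = [120, 110, 105, 40, 30, 50, 115, 20, 35, 45]
perf_y = [30, 45, 20, 110, 125, 105, 40, 130, 115, 60]

profile = goal_conflict_profile(perf_x, perf_y)
fundamental = is_fundamental_conflict(*profile)
```

Applied to a small scenario selection instead of the entire set, the same count would only indicate a trade-off within that selection, in line with the distinction drawn above.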

Conclusions
The critical reflection on the individual steps of the approach presented here gives reason to think about possibilities to improve and further develop it, in order to increase the added value of the methodology. Some points are listed below:

• A selection of scenarios to be evaluated out of the total number of consistent scenarios, developed by a CIB analysis or comparable approaches, is at least required from a labor-economic point of view if, as is usually the case, project resources are limited. Nevertheless, for the non-model-based indicators in the example outlined here, all consistent scenarios could be evaluated using suitable algorithms. Ideally, such an overall view should also be carried out for the model-based indicators. For this purpose, suitable procedures would have to be developed. Their concrete design, actual development, and application will depend, on the one hand, on the number of context scenarios, the models used and the associated computational effort, as well as the effort required to translate information from the context scenarios into model logics; on the other hand, it will depend on the willingness to provide resources for this purpose, weighing up expected benefits in terms of better (more robust) evaluation results against the required efforts.

• In addition, it should be explored for which models indicators that were previously non-model-based could be integrated into the model logics, and the cost-benefit relation of such model enhancements should be examined.

• Against the background of the necessary efforts outlined for both model-based and non-model-based indicators, for reasons of analytical practicability as well as communicability of results to addressees, a reduction of the set of 45 indicators used here as a starting point should also be considered. Here, it is important not to lose relevant topics and to involve experts and stakeholders in potential prioritizations of indicators. One option to solve this challenge could be to determine a system of core indicators mandatory for any application case, and supplementary indicators that can be added on a case-by-case basis.

• In cases where a determination of clear and comprehensible absolute targets for model-based indicators is difficult or impossible, the procedure applied to the non-model-based indicators, i.e., referring to "qualitative" or relative targets based on frequency distributions of indicator values in all scenarios analyzed (step 7), could be an option to be considered as well. The key assessment criterion would then be the progress towards what is achievable within the scope of "future worlds" described in the set of scenarios.

• As an alternative to the percentile approach used to determine targets for the non-model-based indicators, an attempt could be made to translate the future indicator values classified on the +3/−3 scale into a scale with physical units (tons, Euros, etc.). This would allow distances to targets to be measured directly in these units. It would require a well-founded decision on the detailed methodological procedure and, in any case, additional expert knowledge (and thus additional resources), and would at the same time raise new methodological questions.

• In addition, the choice of the normalization method used to make the values of model-based and non-model-based indicators comparable could be discussed, both with respect to alternative options to be tested and, more generally, with respect to how this choice depends on the particular question or objective of the analysis.

• Regarding the intention to classify scenario performances in a structured way, alternatives to the archetype-based typification applied here could be discussed and tested.

• A sustainability-related ranking of scenarios, which has not been carried out in the example described in this article for reasons mentioned above, could be envisaged as a complementary step to a strengths-weaknesses analysis. For this purpose, prioritizations of indicators would be necessary. This could be based on a CIB analysis, e.g., revealing indicators with most and strongest interdependencies to other indicators (and thus being most relevant), but also on stakeholder assessments or sensitivity analyses with different indicator prioritizations, in order to show the effects of varying indicator priorities on evaluation results.

• Another approach would be to conduct sensitivity analyses for different percentile variants, in order to reduce the need to justify the use of only one variant. In principle, such sensitivities would also have to be carried out for those model-based indicators for which no "official" target values exist and own proposals have been made.

• Beyond the results presented in this article, the robustness of analyses and results, being an important criterion for the quality of a methodology, could be analyzed in two respects: (i) The number of descriptors influencing an indicator and the strength of these influences are relevant for its influenceability and, thus, for recommendations to decision makers. With regard to the performance of an indicator, robustness increases with the number of influencing factors, because the weight of each individual factor, or of its non-consideration, decreases accordingly; (ii) from the perspective of an intervening actor, however, robustness in the sense of intervention effectiveness increases if the number of drivers influencing the indicator decreases.
A correspondingly differentiated handling of the concept of robustness would, therefore, be necessary, and could contribute to an improved estimation of results and of possibilities to intervene politically.
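The two kinds of sensitivity analysis suggested in the list above (varying the percentile used to derive relative targets, and varying indicator prioritizations for a ranking) could be sketched roughly as follows. This is a minimal illustration, not the procedure used in this article: the function names, the scenario labels "A" to "C", the indicator labels "x" and "y", the percentile variants, the weights, and all values are hypothetical, and a weighted mean of normalized scores is assumed as one possible aggregation rule.

```python
from statistics import quantiles


def percentile_target(values, pct):
    """Relative target: the pct-th percentile (1..99) of the indicator
    values over all consistent scenarios (assumed definition)."""
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[pct - 1]


def target_sensitivity(all_values, selected, pcts):
    """For each percentile variant, list the selected scenarios whose
    indicator value reaches the resulting target."""
    result = {}
    for pct in pcts:
        target = percentile_target(all_values, pct)
        result[pct] = sorted(
            name for name, value in selected.items() if value >= target
        )
    return result


def weighted_ranking(normalized, weights):
    """Rank scenarios by the weighted mean of their normalized indicator
    scores (assumed aggregation rule); best scenario first."""
    total = sum(weights.values())
    scores = {
        name: sum(weights[ind] * val for ind, val in inds.items()) / total
        for name, inds in normalized.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical values of one non-model-based indicator on the +3/-3 scale
# across all consistent scenarios, and for three selected scenarios.
all_values = [-3, -2, -2, -1, -1, 0, 0, 1, 1, 2, 3]
selected = {"A": 2, "B": 1, "C": -1}
print(target_sensitivity(all_values, selected, [50, 75, 90]))

# Hypothetical normalized scores for two indicators and two weightings.
normalized = {"A": {"x": 0.9, "y": 0.2}, "B": {"x": 0.5, "y": 0.7}}
print(weighted_ranking(normalized, {"x": 1, "y": 1}))
print(weighted_ranking(normalized, {"x": 3, "y": 1}))
```

With these made-up numbers, both the set of scenarios reaching the target and the order of the ranking change between variants, which is the kind of dependence such sensitivity analyses would make visible.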
Finally, it should be noted that efforts to address legitimate concerns about the limited usefulness of complex tools, such as the one presented here, should be intensified. Transparency with regard to assumptions, simplifications, and uncertainties, as well as efforts to interpret and communicate results as understandably and comprehensibly as possible, are essential for this. However, to date these are often only buzzwords; more reflection is needed on which kinds of procedures are suitable to put them into practice and on how to establish such procedures.
With these concluding considerations, important aspects for the further development of the presented scenario evaluation methodology and the interpretation of its results were pointed out. The authors would like to understand this as a starting point for further debates about possibilities to improve the approach presented here and its added value compared to previous approaches to support energy transition processes.
Supplementary Materials: The following are available online at https://www.mdpi.com/1996-1073/14/6/1580/s1, Table S1: Tableau of the six selected scenarios and their combinations of descriptor variants, Table S2: Details of assessment methods and target values for the 16 model-based indicators, Figure S1: Frequency distribution of the values of all six non-model-based indicators among all consistent scenarios, Table S3: Definition of target-values for non-model-based indicators, Table S4: Overview of model-based indicator results, Figure S2: Localizations of the selected scenarios on Pregger et al.'s "landscape of societies", Table S5: Descriptor impacts on the indicator "Acceptance of Renewable Energies in the neighborhood" for different variants of the descriptors used in the scenarios.