Evaluation of Sustainable Development in European Union Countries

Abstract: Despite the great importance of sustainable development for a country, a wide review of the literature suggests that this research is the first to use a Multi-Criteria Decision Method (MCDM) to analyze the sustainability of EU countries, considering different dimensions and weighting the criteria with the help of a group of experts. This paper therefore sets out a Multi-Criteria Model for analyzing the development of sustainability in EU countries (and Norway and Iceland). This required prior filtering and analysis of the data from the Eurostat database. The model was built with the multi-criteria Analytic Hierarchy Process (AHP) technique. Four experts in sustainability participated in the weighting process. The results of the AHP model are identified by areas of sustainability, with the highest value found in Norway; the remaining countries form rings of sustainability around it, where sustainability decreases the further a country is from Norway. This research could be used to identify the strengths and weaknesses of each country with regard to sustainable development and, by analyzing the measures taken by Norway and other countries with very high sustainability, to reach similar levels of sustainable development through continuous improvement processes.


Introduction
In recent decades, one of the greatest problems facing humanity has been the sustainability of the planet, which is compromised by such things as the indiscriminate use of resources, polluting emissions, etc. It is thus necessary to develop tools that enable critical avenues for achieving better progress towards sustainable development to be identified, as well as the use of techniques for reducing or eliminating the impact of these activities.
The definition of Sustainable Development (SD) has evolved over time. It was originally defined, in 1987, as "to seek equity, public awareness and cohesion and participation, to meet the needs and aspirations of the present without compromising the ability to meet those of the future" [1]. SD is defined by Nebel et al. [2] as a process, measurable by environmental, political and social indicators, which tends to improve people's quality of life and productivity, which is based on appropriate measures of preservation of ecological balance, environmental protection, and use of natural resources, so as not to compromise the meeting of the needs of future generations. Thus, SD should seek an equilibrium between preserving the ecosystem and meeting human needs in three basic areas: environmental, social and economic sustainability [3].
Sustainability assessment is used to assess the level of implementation of sustainability measures [4]. In this way, it is possible to determine the SD of a country by analyzing its sustainability indicators. These are commonly categorized as: economic, environmental, institutional and social [5]. These categories can also interact with each other, giving mixed sustainability indicators. The results

1. Development of a multi-criteria model for evaluating the SD of EU countries using AHP. This model assesses countries of the EU (and Norway and Iceland), using a total of 39 indicators or subcriteria organized into the following criteria: socioeconomic development, consumption and sustainable production, social inclusion, public health, demographic change, climate change and energy, sustainable transport, global alliance for development, and good policies. This evaluation allows a complete classification of the alternatives (countries) with a single value to be obtained, using the period 2005-2015.
2. Although the original premise was to use the greatest possible amount of data with respect to alternatives, subcriteria and the years assessed, the following limits were recognized: scarcity of information, imprecision of valuations, erroneous data, or the need for MCDM techniques to guarantee the independence of the subcriteria (indicators) included in the model. Thus, in order to get an accurate model, which is non-redundant and has as much information as possible, a three-stage filter was designed: visual inspection to identify the periods and countries where there is a greater concentration of data, a filter using the Pearson correlation coefficient to identify redundant indicators, and a final stage in which the final structure is reviewed and the least significant elements discarded.
3. The model, unlike most of the literature, weights the indicators, since, despite recognition of the difficulty of obtaining objective weightings, not all the indicators contribute equally to SD. This model uses judgements from four experts in different areas of SD to get weightings for the criteria and subcriteria.
4. The model considers the record of each country using a weighted moving average of the valuations of each country, since this technique allows the results of the full chronological evolution to be used, giving greater importance to those that belong to the most recent periods. The weightings used in each period were calculated as the first eleven terms of a smooth first-order exponential, and so the smoothing constant α was determined so as to find the weighting for each year.
5. The model described can be easily applied to other countries, as long as the necessary data are available.
6. The research could be used to find each country's strengths and weaknesses in the field of SD, and to analyze the strategies and policies produced by the countries at the top of the ranking, in order to use continuous improvement processes to reach similar levels of sustainability.
Appl. Sci. 2019, 9, 4880
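The weighted moving average described in the fourth contribution can be illustrated with a short sketch. It assumes that the weight of the k-th most recent year is proportional to α(1 − α)^k and that the eleven weights are rescaled to sum to one. The rescaling, the function names, the value of α, and the yearly valuations below are illustrative assumptions, not values taken from the paper.

```python
def exponential_weights(alpha, n_periods=11):
    """Return n_periods weights, most recent period first, summing to 1.

    Assumption: weights follow alpha * (1 - alpha)**k and are renormalized.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    raw = [alpha * (1 - alpha) ** k for k in range(n_periods)]
    total = sum(raw)  # equals 1 - (1 - alpha) ** n_periods
    return [w / total for w in raw]


def weighted_moving_average(valuations_oldest_first, alpha):
    """Aggregate a chronological series, favouring the most recent years."""
    weights = exponential_weights(alpha, len(valuations_oldest_first))
    newest_first = reversed(valuations_oldest_first)
    return sum(w * v for w, v in zip(weights, newest_first))


# Hypothetical yearly valuations of one country for 2005..2015 (0-10 scale).
scores = [5.0, 5.2, 5.1, 5.4, 5.6, 5.5, 5.8, 6.0, 6.1, 6.3, 6.5]
ALPHA = 0.3  # illustrative only; not the smoothing constant used in the paper

print(round(weighted_moving_average(scores, ALPHA), 3))
```

Because the weights decrease with the age of the observation, a country whose valuations improve over time obtains an aggregate value above its simple arithmetic mean.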
This paper is structured as follows. Section 2 contains a literature review of contributions in the field of SD assessment in countries. Section 3 describes the analysis and treatment of the data, including the visual filtering, the filtering by Pearson correlation coefficient, and the final filtering of the available data. Section 4 sets out the model for assessing sustainable development in EU countries, including the criteria and subcriteria used, the hierarchy, and the weighting process. Next, Section 5 gives the results, including a description of how the valuations of countries with incomplete data were carried out. Section 6 sets out the sensitivity analysis, and then the Conclusions and References come at the end.

Literature Review
There are a number of contributions in the literature that analyze SD, wholly or partly, in different fields, such as, for example, Higher Education Institutes in India [9], the liquefied natural gas industry [10], the concrete industry [11], the construction industry in China [12], the textile industry in Brazil [13], urban sustainability in the Yangtze River Delta [14], etc. There are also authors and organizations around the world who have developed different systems or methodologies for evaluating SD that include its three dimensions. In 2015, the United Nations established the 2030 Agenda for Sustainable Development with 17 Sustainable Development Goals (SDG) which were intended to be considered as policies in all countries [15]. The 17 goals, comprising 169 targets and 244 indicators, have been implemented from 2015 to 2018 [16], becoming an international agreement that allows SDGs and their targets to be transformed into a tool for management and assistance to countries for formulating and implementing strategies that facilitate the distribution of resources and provide a framework for evaluating the evolution of SD [17]. Based on the Human Development Index (HDI) created by the United Nations, which classifies countries by the quality of life of their citizens, the Human Sustainable Development Index (HSDI) designed by Togtokh and Gaffney [18] adds the per capita carbon emissions to gross national income, life expectancy at birth, and years of schooling. An application of HSDI to the Beijing-Tianjin-Hebei urban agglomeration region from 2000 to 2015 is given in Chen et al. [19].
Lim et al. [20], Sachs et al. [21][22][23][24], Clark and Kavanagh [25,26], Fullman et al. [27], Campagnolo et al. [28], and Huan et al. [29] produce different frameworks, with different numbers of indicators, to assess all or a part of the SDGs formulated by the United Nations. It should be pointed out that the European Union (EU) has produced its own indicator set to monitor progress towards the SDGs established by the United Nations in 2015. The set comprises 100 indicators, of which six are associated with each Sustainable Development Goal, with the exception of Oceans and Global partnership, which have five indicators [30]. The assessment method analyzes whether an indicator has moved towards or away from a sustainable development objective, as well as the speed of movement; the method, therefore, is a long-term trend analysis [31]. The results of the analysis by indicators for the 28 countries of the EU can be seen in [32]. However, there is no procedure for aggregating the information to obtain a sustainability value by country and SDG, or an overall value which integrates the SDGs by country.
The latest sustainability ranking update by countries provided by RobecoSAM [33], an investment specialist focused on sustainable investing, shows that, as of November 2018, Sweden is the most sustainable country in terms of investment, followed by Denmark and Switzerland. The dimensions of human, environmental and economic wellbeing are considered in the Sustainable Society Index [34] for the calculation of a country ranking. Public data are used; each of the 21 indicators considered is assigned the same weighting, and the totals are weighted by population size. Among the results of the most recent study, carried out in 2016 [35], is the fact that renewable energy and energy savings are, together with organic farming, the indicators with the highest scores. Nevertheless, from the results of alternative, taking into account environmental impacts, economic benefits, and decision-maker preferences. Thus, it is seen that the methods of analysis and applied techniques vary between studies, although in general complex decision-making techniques are not used, especially those that include uncertainty or incomplete information analysis. Such techniques would improve the reliability of the weightings assigned to the criteria, since, in general, the indicators or criteria used are not associated with a specific weighting determined by a group of expert decision makers.
No contributions have therefore been found in the literature analyzing SD in EU countries with objective analytic techniques, such as Multi-Criteria Decision Making (MCDM). This research uses AHP to build an assessment model and, unlike most of the literature, the criteria and subcriteria are weighted using judgements given by four experts in sustainability. Although these weightings might undergo some changes if other experts, or a greater number of experts, were used, the weighting process should be borne in mind as not all criteria and subcriteria make the same contribution to SD. This research also takes into account the SD record of each country, via the moving weighted average, which has not been seen before in the literature. The utility of AHP in real-world problems is widely recognized, and it also provides greater objectivity to the solutions, and so it is specifically applicable to the evaluation of country SD.

Analysis and Handling of Data
The Eurostat database catalogues sustainability indicators in ten categories (as set out in Figure 1), of which each comprises a number of subcriteria. As the modelling of the problem requires a division by nodes of criteria and subcriteria (where each node has a maximum of nine elements), the Eurostat classification will be used as a basis for structuring the model.
The sustainability indicators set out by Eurostat are as follows:
1. Socioeconomic development: the capacity of countries to generate wealth with the aim of maintaining or improving social and economic well-being in the community.
2. Sustainable consumption and production: the use of products and services which allow the basic needs of the population to be satisfied, improving their quality of life without compromising the needs of future generations. This is done by efficient use of resources, reduction of toxic emissions, improvement in access to health services, etc.
3. Social inclusion: a dynamic and multi-factorial process which guarantees the social integration of all members of a community, ensuring their social well-being. In this way, each citizen will be able to make good use of their individual abilities and benefit from the opportunities to be found in the environment.
4. Demographic change: this means transformations in the population make-up of a country, and is determined by studying factors such as aging, birth rates, death rates, and rates of immigration and emigration.
5. Public health: elements that determine the population level health of a country (life expectancy, unsatisfied medical needs, etc.). Countries promote public health by promoting healthy lifestyles, funding scientific research, and through education and awareness campaigns.
6. Climate change and energy: the study of the factors which contribute to global warming (emission of polluting agents) and analysis of energy production as a function of the existing technology.
7. Sustainable transport: actions that help to reduce pollution produced by private vehicles; this is achieved by promoting sustainable mobility, the use of electric and hybrid vehicles, social awareness, etc.
8. Natural resources: natural goods and services that have not been altered by humanity, catalogued as flora, fauna, and quality of land. Natural resources are important for the population as they contribute both to well-being and to development (food, raw materials, and minerals).
9. Global alliance for development: initiatives for progress and globalization of countries through subsidies, aid, and support in international exchange (imports and exports).
10. Good policies: measures taken by governments and public administrations (taxes) for sustainable development in countries.
The initial inspection of the database reveals a total of 161 subcriteria, which are numbered and coded so that the first three letters identify the higher-level criterion to which each belongs, and the last three are specific to each subcriterion. Table 1 shows the number of subcriteria that each criterion originally contained, with the identifying acronym for each one. The data are also swept to identify the countries (alternatives of the model) and the periods (years to be analyzed) for which the Eurostat database has valuations, totaling 35 countries over the period 1990-2016.
Nevertheless, the initial premise of covering as much data as possible with regard to alternatives, subcriteria, and development over time is limited by the following: scarcity of information, imprecise valuations, inaccurate data, and the characteristic of multi-variate analysis itself, which specifies that the criteria should be mutually independent. For this reason, to obtain a model that is accurate, non-redundant, and as informative as possible, it is necessary to apply a three-stage filter: first, a visual inspection to identify the periods and countries with the greatest concentration of data; second, a filter using the Pearson correlation coefficient to determine the redundant elements; and finally, a stage which revises the structure and discards the least significant elements.

Visual Filtering of the Data
The visual inspection allows periods, alternatives, or subcriteria to be detected that have a systematic lack of data. The most common reasons for discarding elements in the visual analysis of the data are:
• Subcriteria: lack of information for most countries and periods.
• Alternatives: difficulty in obtaining measurements, or incomplete information. In some cases, there are alternatives that have sufficient data in certain subcriteria, but in others the information is scarce, and thus is discarded.
• Periods: the starting date for gathering data on the subcriteria is very variable, and some elements are measured only in even or odd years. In addition, in some criteria, the information is not updated to the year 2016.
This leads to a total of 67 subcriteria being identified whose information is insufficient, that is, 41.61% of all the subcriteria from the database. Table 1 summarizes the number of subcriteria discarded for each higher-level criterion, highlighting natural resources, the global alliance for development, and good policies, since in each of these over 70% of the subcriteria were eliminated.
The visual inspection has identified four countries, Albania, Macedonia, Serbia, and Turkey, for which data are missing for most of the subcriteria, and they are therefore discarded, causing a reduction from 35 to 31 alternatives (11.43%). Furthermore, periods can be found in which sustainability can be evaluated, and a range was set from 2005 to 2015 (eleven years), and so this first filter has led to a reduction from 27 periods to 11 (a reduction of 59.26%).
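The percentages reported for the visual filter follow directly from the counts given in the text, as the short check below shows. The helper name `percent_reduction` is hypothetical; the counts are those stated above.

```python
def percent_reduction(before, after):
    """Percentage decrease from `before` to `after`, rounded to 2 decimals."""
    return round(100 * (before - after) / before, 2)


# Share of the 161 original subcriteria discarded in the visual filter.
print(round(100 * 67 / 161, 2))   # 41.61
# Alternatives: 35 countries reduced to 31.
print(percent_reduction(35, 31))  # 11.43
# Periods: 27 years (1990-2016) reduced to 11 (2005-2015).
print(percent_reduction(27, 11))  # 59.26
```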

Filtering with the Pearson Correlation Coefficient
Correlation is a statistical procedure used to determine whether there is a linear relation between a pair of elements. It is measured by the Pearson correlation coefficient r. To calculate this coefficient (between two sets x and y), the covariance is calculated and divided by the product of the standard deviations, as shown in Equation (1):
$$ r = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y} \qquad (1) $$
In order for the AHP model to be solid and consistent, the subcriteria to be compared must be independent, so that the information is not redundant.
The results given by this coefficient lie in the range [−1, 1] and represent the level of correlation between a pair of elements, showing the level of intensity (as shown in Table 2). Likewise, the correlation intensities are classified as positive when the elements are directly correlated, and negative when they are inversely correlated.

Table 2. Types of correlation as a function of r.
Intensity of Correlation Range of Values
Nevertheless, the fact that the Pearson coefficient finds a linear relation between the subcriteria does not imply a real relationship between them, as they may have different behaviors over time for different reasons. To reduce the importance of this possibility, the analysis of the subcriteria is performed on groups of elements that are conceptually interrelated. Figure 2 shows, as an example, the criterion Climate change and energy (CCE), in which the subcriteria are compared in three groups: climate change, energy, and another which brings together the subcriteria Greenhouse gas emissions, taking as a baseline the year 1990 (CCE-GGE), and Primary energy consumption (CCE-PEC). A similar study is performed for each group of subcriteria, and once the redundant ones are discarded, a second comparison is made between those that remain, in order to find the subcriteria that will ultimately be used in the model.
The Pearson correlation requires the prior preparation of the data because, in the data available, each alternative has a valuation per period, and in order to apply this technique, the subcriteria must be described as a single time evolution. This reduction is done by applying the arithmetic mean of all the valuations for each year (it is assumed that all the countries will have similar behavior), which gives a single time evolution for each subcriterion. Figure 3 shows this reduction for the subcriterion intensity of greenhouse gas emissions in electrical consumption (CCE-GEC), where the grey lines correspond to the individual evolution of each country, and the line marked with asterisks shows the arithmetic mean.
The Pearson correlation coefficient compares elements in pairs using, as a starting point, the matrix (m × n), where m represents the number of periods, and n the subcriteria to be confirmed, returning a symmetrical matrix of n × n with the different comparisons. Thus, the results in the comparison matrix allow correlated pairs of subcriteria, and therefore those that should be discarded, to be identified. For a subcriterion to be discarded, it needs to show a strong positive correlation (r > 0.7) with at least one other subcriterion, indicating that they have a similar time evolution.
The two sweeps that were carried out are shown as an example for the criterion Climate change and energy: firstly by groups of subcriteria, and then comparing all those that pass the first sweep. Initially, the criterion Climate change and energy has ten subcriteria. The results of the first sweep of this second filter are shown in Figure 4, after which six subcriteria remain. The second sweep is shown in Figure 5. The final result is that the criterion Climate change and energy is made up of four subcriteria, which is a reduction of 60% with respect to the initial data (see Figure 6). This same process of study was applied to all the criteria. It should be mentioned that, in some cases, there are high correlations between more than two subcriteria, which means that in the filtering, only one of them will be valid, and the rest will be discarded.
Table 3 shows the number of subcriteria left after application of the double filter via the Pearson correlation coefficient, leaving a total of 41 subcriteria. This means that the second reduction leads to a decrease of 56.38% in the subcriteria with respect to those obtained with the first filter. It should be noted that the criteria with the greatest reduction were those that were not reduced during the visual filtering, with decreases ranging from 57.14% to 70%.
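The data preparation and the correlation filter described above can be sketched as follows. The sketch first averages each subcriterion over countries, year by year, and then flags every pair of subcriteria whose Pearson coefficient exceeds 0.7. The function names, the labels X1 to X3, and the short five-year series are hypothetical; the choice of which member of a flagged pair to keep is left to the analyst and is not modelled here.

```python
from math import sqrt


def yearly_mean(valuations_by_country):
    """Collapse per-country series of one subcriterion into one series."""
    n_countries = len(valuations_by_country)
    return [sum(year) / n_countries for year in zip(*valuations_by_country)]


def pearson(x, y):
    """Pearson r: covariance over the product of standard deviations.

    The 1/n factors cancel, so plain sums are used. Assumes that neither
    series is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(series, threshold=0.7):
    """Return pairs of subcriteria with strong positive correlation."""
    names = list(series)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(series[a], series[b]) > threshold
    ]


# Hypothetical yearly means of three subcriteria (one value per period).
series = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 6, 8, 11],
    "X3": [5, 3, 4, 1, 2],
}
print(redundant_pairs(series))  # [('X1', 'X2')]
```

Only strong positive correlations are flagged, in line with the discard rule stated in the text; X1 and X3 are strongly but inversely correlated (r = −0.8) and are therefore both retained.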

Final Filtering
Finally, an exploration is made of the subcriteria to identify those whose information is insufficiently meaningful as compared to the rest. This is done via an analysis of importance among the 41 subcriteria left after the Pearson filtering, concluding that the two which belong to the criterion Natural resources are not sufficiently relevant and so should be discarded. Thus, the final number of subcriteria to be used in the model is 39, as shown in Table 3.

Model for Assessment of Sustainable Development in EU Countries
AHP [47] uses hierarchies to produce a structured model and to interrelate criteria, subcriteria, alternatives, and the goal. AHP is based on four fundamental axioms from graph theory, which provides a solid mathematical grounding [48].
AHP has the following advantages: it has a solid mathematical base, it analyzes the problem in parts, it allows the joint interaction of qualitative and quantitative criteria, using a common scale, it is possible to have a number of decision centers or one decision group, it allows the logical consistency of the judgements given by the decision centers to be checked, it is possible to perform a sensitivity analysis, and it is easy to implement and allows additional mathematical optimization methods to be applied [49].
According to Saaty [47], the method has three basic stages: modelling the problem, valuing the elements, and prioritizing and synthesis. Nevertheless, to guarantee robustness of the models, a sensitivity analysis must be added, to analyze the answers with respect to different scenarios by modifying the model parameters.
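As an illustration of the valuation and prioritization stages, the sketch below derives priorities from a pairwise comparison matrix as its principal right eigenvector, approximated by power iteration, and computes Saaty's consistency ratio. The comparison matrix is a hypothetical, perfectly consistent example and does not reproduce any expert judgement from this study; the function names are likewise illustrative. The random index values are the standard ones tabulated by Saaty.

```python
# Saaty's random consistency index for matrices of order 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def multiply(matrix, vector):
    """Matrix-vector product for a square matrix given as nested lists."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def principal_eigenvector(matrix, iterations=100):
    """Approximate the normalized principal eigenvector and lambda_max."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = multiply(matrix, weights)
        total = sum(product)
        weights = [p / total for p in product]
    product = multiply(matrix, weights)
    lambda_max = sum(p / w for p, w in zip(product, weights)) / n
    return weights, lambda_max


def consistency_ratio(lambda_max, n):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical judgements for three criteria: entry [i][j] states how much
# more important criterion i is than criterion j on Saaty's 1-9 scale.
judgements = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights, lambda_max = principal_eigenvector(judgements)
print([round(w, 3) for w in weights])
print(round(consistency_ratio(lambda_max, len(judgements)), 3))
```

For a perfectly consistent matrix such as this one, the priorities equal the underlying ratios (0.6, 0.3, 0.1) and the consistency ratio is zero; a ratio below 0.10 is the threshold commonly accepted in AHP practice.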
AHP is widely used in the literature and is therefore well known; readers are referred to [47,50,51] for a detailed description of the methodology. This paper sets out the detailed application of the model. It should be noted, however, that the priorities were calculated using the principal right eigenvector method [47], rather than the row geometric mean method [52] or goal programming [53]. Table 4 gives a brief description of each subcriterion used in the hierarchical model. In addition, as the application of multi-criteria models requires that all valuations are on the same scale (0-10), a conversion factor must be established for each subcriterion, depending on whether its relation to the goal of the model is direct or inverse.

Population at risk of poverty, or social inclusion: number of citizens at risk of poverty, or who live in households with low employment; that is, people whose disposable income is below the threshold for poverty risk (60% of the national median disposable income, after including social assistance).

SIN-IPW
Index of population in work at risk of poverty: percentage of over-18s in employment whose income is below the threshold for risk of poverty (60% of the national median disposable income, after including social assistance).

CCE-EDE
Energy dependence: energy dependence shows how much the economy of a country depends on importing to satisfy demand. This indicator is calculated as net imported energy divided by the gross consumption of the country.

CCE-PRE
Percentage of renewable energy use: proportional use of renewable energy, and thus the degree to which it is replacing energy generation by traditional technologies (fossil and nuclear fuels).

CCE-GCT
Generation with cogeneration technology: percentage of energy produced by cogeneration technology. Cogeneration is a technology which increases the energy efficiency of the power plant, using a gas turbine with heat recovery.

STR-ECT
Energy consumption in transport with respect to gross domestic product: energy cost of transport in relation to gross domestic product, including the energy consumed by all kinds of transport except for maritime and pipeline transport.

STR-PTU
Percentage of train use as a means of overland transport: proportional use of the train as a means of transport to total overland transport, where overland transport means trains, cars, buses and coaches.

STR-PCU
Percentage of car use as a means of overland transport: proportional use of the car as a means of transport to total overland transport, where overland transport means trains, cars, buses and coaches. Relation: inverse.

STR-PTR
Percentage of transport by road: proportion of road transport to total internal transport (by road, rail or navigable inland waterway). Relation: inverse.

STR-VPG
Volume of passengers with respect to gross domestic product: relationship between volume of passengers on internal transport (car, bus, coach or train) and gross domestic product. Relation: direct.

STR-ECA
Energy consumption in international aviation: energy costs for transport by international air travel. Relation: inverse.

STR-PFC
Mean annual price index applied to fuel for consumers: official inflation rate for fuels and lubricants for consumers in the Eurozone according to the Harmonized Index of Consumer Prices (HICP). Relation: inverse.

An inspection of the subcriteria is then carried out to identify those with incomplete information in any of the alternatives. This occurs a total of 27 times and therefore the number of measurements with no valuation is 2.31% of the total.
Once the exploration is complete, the conclusion is that no subcriterion should be discarded for this reason. Nevertheless, although the amount of missing information is not a large proportion of the total, it is interesting to perform the same inspection the other way around; that is, adding up the complete lack of data by country. Table 5 shows that Switzerland stands out above the other alternatives as, with 13 occurrences, it represents 48.4% of the occasions on which a country has no data for a particular subcriterion. Therefore, it was decided that there was insufficient information for Switzerland, and so it was not included in the model. On the other hand, it is felt that the lack of data in the other alternatives can be handled, and thus they are not discarded, and the information gap is only 1.12% of the total.

Table 5. Missing data for all periods by alternatives.
Alternative    Occurrences
Greece         1
Croatia        3
Cyprus         1
Luxembourg     1
Malta          3
Romania        1
Iceland        4
Switzerland    13
Total          27

Finally, the resulting hierarchical model comprises 30 alternatives to be compared by valuation of a total of 39 subcriteria, as shown in Figure 7. This model must be evaluated eleven times (once per period), to establish the evolution over time of sustainability in each country.

Weighting Process
The first step is to produce a questionnaire, easily filled out, to evaluate simply the relative importance of the criteria and subcriteria of the model. Each expert should provide judgements to fill out a total of ten comparison matrices (one for the criteria and nine for the groups of subcriteria).
The second stage uses the judgements of experts to establish the relative importance of each criterion with respect to the final goal. This is done by sending out the questionnaire to four specialists in sustainability from different fields, who set out the priorities by filling out the comparison matrices it contains. Once the analyst has received the surveys completed by the experts, the judgements are aggregated using the geometric mean. In this way, a measurement is obtained that includes the judgements of all the experts and still satisfies the axiom of reciprocal judgements ($a_{ij} = 1/a_{ji}$); with other aggregations, such as the arithmetic mean, this equality would not in general hold.
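The aggregation step can be illustrated with a minimal sketch. The two expert matrices below are hypothetical values chosen only for illustration (they are not the judgements collected in this study), and the function name `aggregate_judgements` is an assumption. The sketch compares the element-wise geometric mean with the arithmetic mean and checks reciprocity in both cases.

```python
import numpy as np

def aggregate_judgements(matrices):
    """Element-wise geometric mean of several pairwise comparison matrices."""
    stacked = np.stack([np.asarray(m, dtype=float) for m in matrices])
    # The geometric mean is computed in log space for numerical stability.
    return np.exp(np.log(stacked).mean(axis=0))

# Two hypothetical experts comparing three criteria (illustrative values only).
expert_a = [[1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]]
expert_b = [[1, 1, 3], [1, 1, 4], [1/3, 1/4, 1]]

geo = aggregate_judgements([expert_a, expert_b])
arith = (np.array(expert_a) + np.array(expert_b)) / 2

# Reciprocity holds when a_ij * a_ji equals 1 for every pair (i, j).
geo_is_reciprocal = np.allclose(geo * geo.T, 1.0)
arith_is_reciprocal = np.allclose(arith * arith.T, 1.0)
print(geo_is_reciprocal, arith_is_reciprocal)
```

In this example the geometric mean keeps the aggregated matrix reciprocal, whereas the arithmetic mean does not.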
Subsequently, the aggregated pairwise comparison matrices are solved using the principal eigenvector method, in order to calculate the relative weightings of the criteria and subcriteria. Table 6 shows the aggregated pairwise judgement matrix for the criteria and the resulting weightings. This process is carried out for all the groups of subcriteria: Tables 7-15 show the comparison matrices after applying the geometric mean, with the respective relative weightings (right) and the consistency ratio (RC) obtained, which can be seen to be well below the limit for inconsistency.

Table 7. Comparison matrix and weightings for Socio-economic development.
Table 8. Comparison matrix and weightings for Sustainable consumption and production.
Table 9. Comparison matrix and weightings for Social inclusion.
Table 13. Comparison matrix and weightings for Sustainable transport.

Finally, multiplying the weighting of each criterion by the individual weightings of its subcriteria gives the relative weight of each subcriterion with respect to the final goal, and these valuations are shown in Table 16.
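As a rough sketch of this calculation, the following code derives priorities from the principal eigenvector and computes a consistency ratio with Saaty's published random index values. The example matrix is hypothetical (built to be perfectly consistent), the criterion weight of 0.25 is an invented figure, and the function name `ahp_priorities` is an assumption; the study itself used the Expert Choice software.

```python
import numpy as np

# Saaty's random consistency index by matrix order (standard published values).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_priorities(matrix):
    """Return (weights, consistency_ratio) for a pairwise comparison matrix."""
    a = np.asarray(matrix, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = np.argmax(eigvals.real)          # index of the principal eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)       # principal right eigenvector
    w = w / w.sum()                      # normalize so the weights sum to one
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr

# Hypothetical, perfectly consistent matrix built from weights 0.6, 0.3, 0.1.
true_w = np.array([0.6, 0.3, 0.1])
matrix = true_w[:, None] / true_w[None, :]

local_w, cr = ahp_priorities(matrix)

# Global weight of each subcriterion = criterion weight x local weight.
criterion_weight = 0.25                  # hypothetical criterion weighting
global_w = criterion_weight * local_w
print(local_w, cr, global_w)
```

For a perfectly consistent matrix the consistency ratio is zero; real expert judgements give a positive value that is compared against the accepted limit.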

Valuations of Countries with Incomplete Information
Each subcriterion requires 330 valuations and, as previously mentioned, there are cases in which the database has no measurements for one or more of them. Thus, with the aim of mitigating this, the missing valuations must be identified, and estimates made for them.
The missing values are identified by an individual analysis of the subcriteria, and the lack of information is classified by whether the lack of data is partial or total in the affected alternatives. This distinction is made because the methodology used to obtain the estimates is different in each case (see Figure 8).
A sweep of the subcriteria was first performed, to determine which have partially incomplete values in the alternatives, and the missing data were counted for each subcriterion.
Eighteen of the 39 subcriteria are affected by partial absence of data, where Number of companies respectful of the environment (SCP-CER) has the greatest lack of information (9.09%). Therefore, although the partial absence of valuations exists in almost half the subcriteria, the number of missing data points is a fairly small percentage of the total. In addition, given that the alternatives have valuations for other periods, it is possible to do estimates for the unvalued years by using a least squares approximation.
Appl. Sci. 2019, 9, x FOR PEER REVIEW 19 of 40
As an example, Figure 9 shows the least squares approximation for the alternative Croatia in the subcriterion Unmet medical needs (PHE-UMN). It can be seen that the database only contains information for six periods (from 2010 to 2015), and so the values for the years 2005-2009 must be estimated. The black dots in the figure represent real data; a linear fit to them gives the regression line (dashed line). The affected periods are then substituted into the regression line, obtaining estimates that complete the evolution over time of the alternative (shown in Figure 9 with white dots).

However, there is a case where the alternative has a single valuation for its evolution over time. This is the case of Iceland for the subcriterion Generation with cogeneration technology (CCE-GCT), and so it was decided to use the same value in all the periods to be assessed.
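The estimation of partially missing valuations can be sketched as follows. The series below is invented for illustration (it is not the Eurostat data for Croatia), and the function name `fill_missing_linear` is an assumption. Missing years are marked with NaN, a straight line is fitted to the known years by least squares, and a series with a single known value is filled with that value, as described for Iceland.

```python
import numpy as np

def fill_missing_linear(years, values):
    """Fill NaN entries using a least squares line fitted to the known points."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    known = ~np.isnan(values)
    if known.sum() == 1:
        # Only one valuation available: reuse it for every period.
        return np.where(known, values, values[known][0])
    # Center the years to keep the least squares fit well conditioned.
    center = years[known].mean()
    slope, intercept = np.polyfit(years[known] - center, values[known], deg=1)
    estimates = slope * (years - center) + intercept
    return np.where(known, values, estimates)

years = np.arange(2005, 2016)
# Hypothetical series: no data for 2005-2009, known values for 2010-2015.
observed = np.array([np.nan] * 5 + [4.0, 3.8, 3.6, 3.4, 3.2, 3.0])
filled = fill_missing_linear(years, observed)

single = fill_missing_linear([2005, 2006, 2007], [np.nan, 2.0, np.nan])
print(filled, single)
```

Note that a linear extrapolation can produce negative estimates when the known values are close to zero, which is why the negative values are reviewed in a later step.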
The second step consists of estimating the valuations of the subcriteria when there is no information about them for some alternative. Given that the affected alternatives have no reference, the estimates are made by assigning the least favorable value corresponding to the other alternatives in the same period. This is because it is held that the lack of data is as bad as the worst valuation of the other countries in this subcriterion. The lack of information suggests that the item is not controlled, which means that improvement strategies cannot be implemented.
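A minimal sketch of this second step is given below. It assumes that, for a subcriterion with a direct relation to the goal, the least favorable value is the minimum of the other alternatives in that period, and for an inverse relation it is the maximum; the function name `fill_with_worst` and the sample values are hypothetical.

```python
import numpy as np

def fill_with_worst(period_values, direct=True):
    """Replace NaN with the least favorable value among the other alternatives."""
    values = np.asarray(period_values, dtype=float)
    # Direct relation: higher is better, so the worst value is the minimum.
    # Inverse relation: lower is better, so the worst value is the maximum.
    worst = np.nanmin(values) if direct else np.nanmax(values)
    return np.where(np.isnan(values), worst, values)

# Hypothetical valuations of three countries in one period; one has no data.
direct_filled = fill_with_worst([3.0, np.nan, 7.0], direct=True)
inverse_filled = fill_with_worst([3.0, np.nan, 7.0], direct=False)
print(direct_filled, inverse_filled)
```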
The last step in assessing the countries in the subcriteria is to identify and modify those subcriteria that return negative values, since the transformation of scales requires all the values to be positive.
The subcriteria that have returned a negative value are therefore identified and classified depending on whether a negative value makes sense for them. This is necessary because, in some cases, the estimates obtained by the regression line give negative values (when the real measurements are close to zero), causing inconsistencies with regard to the characteristics of some subcriteria. Table 17 lists the subcriteria that have negative values and states whether these values are congruent with their definition.

DCH-ANM 112
Makes sense. As the migration rate is described, the negative values indicate more emigration than immigration, and the positive values more immigration than emigration.

PHE-UMN 1
Does not make sense. It is a parameter that indicates the percentage of people with unmet medical needs, and as such its value cannot be less than zero.

CCE-EDE 19
Makes sense. Since the indicator is calculated as net imported energy divided by gross consumption, negative values of energy dependence suggest that the country exports more energy than it imports.

GAD-ODA 5
Does not make sense. This parameter expresses the aid and loans given by the EU with respect to the gross domestic product of the country.

GAD-EDF 13
Makes sense. Negative values of funding suggest that the country receives less credit than it has to pay back in that period.

Table 17 shows that three subcriteria have values that are not congruent with their definition (SCP-EMA, PHE-UMN and GAD-ODA), and so it was decided to change these values to zero. Furthermore, a transformation was applied to the remaining subcriteria (SED-SRH, DCH-ANM, CCE-EDE, and GAD-EDF), consisting of adding the absolute value of the most negative value to all the alternatives of the same period. In this way, none of the assessments of the period will be negative, and the alternative that had the most negative value will be valued at zero.
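The two treatments of negative values can be sketched as follows. The function names and sample values are hypothetical; the first function corresponds to the shift applied to subcriteria where negative values make sense, and the second to setting incongruent negative estimates to zero.

```python
import numpy as np

def shift_non_negative(period_values):
    """Add the absolute value of the most negative entry to every alternative."""
    values = np.asarray(period_values, dtype=float)
    lowest = values.min()
    if lowest < 0:
        values = values + abs(lowest)
    return values

def clip_at_zero(period_values):
    """Set negative values that are incongruent with the definition to zero."""
    return np.maximum(np.asarray(period_values, dtype=float), 0.0)

# Hypothetical valuations of three alternatives in one period.
shifted = shift_non_negative([-2.0, 1.0, 4.0])
clipped = clip_at_zero([-0.3, 2.0])
print(shifted, clipped)
```

The shift preserves the differences between alternatives within a period, which is why it suits subcriteria such as migration rate where the sign carries meaning.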

Results
Given the number of countries to be analyzed, it was decided to classify the alternatives of the model depending on their geographical situation. They were categorized into four regions: Northern, Western, Eastern, and Southern Europe (see Figure 10).

Figure 10. Differentiation of the alternatives by geographical region.
Once the scaled matrices have been found, an AHP analysis is carried out for each period, introducing the weightings and value matrices in the Expert Choice software. This tool gives its judgements by percentages through relative assessments, where the alternative of greatest sustainability is the one whose percentage is highest, and a ranking is created between them. However, this is a drawback for the analysis of the evolution over time: because the assessments are relative, it is not possible to tell whether all the alternatives increased or decreased their real sustainability in the same period.
Applying the AHP methodology to all the periods gives an evolution over time of the sustainability of each of the alternatives. Figure 11 shows the resulting assessments for the countries for each period, classifying the results by the geographical area they belong to.
Analyzing the results, it is possible to classify the alternatives into different sustainability rankings, and so they are then categorized by their geographical area:

1. Northern European area: this area shows a higher average than the others; it can be seen that Norway is the country with the highest SD, followed by an intermediate group with Sweden and Iceland. In addition, the evolution of Finland is very clear, starting at the lower level and finishing at the intermediate, as is that of Denmark, which starts at the higher level and ends at the intermediate. The variation of the Northern countries in terms of SD in 2015 with respect to 2005 shows a mean value of −0.02, essentially due to the decline of Denmark.

2. Western European area: three levels of sustainability can be discerned: a lower one that includes Ireland; a medium level comprising Belgium, Germany, France, Luxembourg, Austria, and the Czech Republic; and a higher level where the Netherlands stands out as the alternative with greatest sustainability, followed by the United Kingdom. Looking at the variation of the countries in SD in 2015 with respect to 2005, we find a mean of 0, since the great increase of the Czech Republic balances the small declines of Belgium, France, Luxembourg, the Netherlands, and Austria.

Analyzing the evolution over time of these countries, it can be seen that in general there is no sustained growth of SD that would suggest a trend in any country, as the small increases in one year are followed by similar decreases in the subsequent years and vice versa. 2007 and 2015 are the years when the greatest number of countries reduced their SD with respect to the previous year (9 and 12, respectively), and so it does not seem to be related to the economic recession which began in 2009. In fact, it was in 2008 and 2009 that the greatest number of countries increased their SD with respect to the previous year (10 and 8, respectively). It should be pointed out, therefore, that there is no greater observed concern for SD by countries in 2015 despite the increase in the importance that citizens attribute to this matter in recent years.
According to the results shown in Figure 11, it is not possible to conclude that there are great similarities in the evolution over time of these countries. However, the analysis by regions draws attention to the existence of different levels of sustainability, and so it is especially interesting to analyze all the alternatives together, to identify the situation of each with respect to the rest. It was decided to use a moving weighted mean of the evaluations for each country, as this technique allows the results of the full evolution over time to be used, giving greater importance to those belonging to more recent periods. The weighted mean is calculated as described in Equation (2), where the values $x_i$ are the assessments for each period, $n$ represents the number of periods, and $w_i$ their weightings:

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \qquad (2)$$

The weightings ($w_i$) with respect to each period were calculated as the first eleven terms of the smoothed exponential (see Table 18), so the smoothing constant $\alpha$ must be identified, in order to find the weighting for each year.
Therefore, considering the constraint that the sum of all the weightings must be unity ($\sum_{i=1}^{n} w_i = 1$), the equation is solved to find the value of the smoothing constant, $\alpha = 0.1822$. Table 19 brings together the resulting weightings for each period. Once the weightings are determined, the moving weighted average is calculated by taking the product of these and the chronological valuations of each alternative. In this way, a single value is obtained for each country, and a ranking can be established to define the levels of sustainability and compare graphically the relative sustainability of each country on the map (see Figure 12).

Figure 12 shows the different regions of sustainability clearly, with Norway as the epicenter of greatest value, and the other levels around it. It is thus possible to distribute countries by their level of sustainability:
• The results establish Norway as the reference to be followed in order for the other countries to become more sustainable.
The model could serve as a guide to any country that wants to increase its sustainability, identifying its strengths and weaknesses and pointing to the criteria that need to be improved.
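The idea of a weighted mean that favors recent periods can be sketched as follows. This is only an illustration under stated assumptions: the weights are taken as geometrically decaying terms of the form α(1 − α)^k and are then divided by their sum so that they add up to one, which is not necessarily how the weightings of Tables 18 and 19 were obtained. The function names and the sample series are hypothetical, and the value of α is used only as an input to the sketch.

```python
import numpy as np

def exponential_weights(n_periods, alpha):
    """Decaying weights, largest for the most recent period, summing to one."""
    # Index 0 is the oldest period, so its age (in periods) is the largest.
    ages = np.arange(n_periods - 1, -1, -1)
    raw = alpha * (1.0 - alpha) ** ages
    return raw / raw.sum()               # normalization is an assumption

def weighted_mean(values, weights):
    """Weighted mean of the period assessments, as in Equation (2)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float((weights * values).sum() / weights.sum())

weights = exponential_weights(11, alpha=0.1822)

# Hypothetical assessments of one country for the eleven periods 2005-2015.
assessments = np.linspace(3.0, 4.0, 11)
score = weighted_mean(assessments, weights)
print(weights, score)
```

Because recent periods carry more weight, a country whose assessments rise over time obtains a score above its plain average, and the opposite holds for a country in decline.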

Sensitivity Analysis
The sensitivity analysis allows the relative importance of the criteria to be identified, through the variation in the weightings. This was done by changing the weighting of each criterion, assigning values of ±5% and ±10% with respect to the base weighting, which means that, since the model comprises nine criteria, eighteen new studies must be performed for each period. Nevertheless, given that countries did not experience great changes in their evolution over time, the sensitivity analysis was only carried out for the most recent year (2015). The results of the sensitivity analysis for each criterion are shown in Figures 13-21 for variations of ±10%. The values for the countries are quite close, and so small differences in the weightings of the criteria can move a country to a higher or lower position in the classification; quantitatively, however, the variations in sustainable development would be ±0.1 with variations in the weightings of the criteria of ±10%.

For modifications in the weightings of the criteria of ±5%, we see that the three leading countries, Norway, the United Kingdom, and Sweden, do not change position by varying Socio-economic development, Social inclusion, Demographic change, Public health, Sustainable transport, and Good policies; in Sustainable consumption and production, Norway and the United Kingdom keep their positions, although Luxembourg passes Sweden with an increase of 5% in this criterion. With increases of 5% in the weighting of Global alliance for development, Norway, the United Kingdom, and Sweden tie for first place. Thus, Norway is still the country with the highest sustainability in all cases.
The three leading countries, Norway, the United Kingdom, and Sweden, do not change position with variations in the weightings of the criteria of ±10% in Socio-economic development, Demographic change, Public health, and Good policies. Norway, the United Kingdom, and Malta, with increases of 10% in Sustainable consumption and production, would have the same result, and only Luxembourg would surpass these countries. Norway is still the leader with a variation of −10% in this criterion. With a decrease of 10% in Climate change and energy, the United Kingdom surpasses Norway, while Sweden passes the United Kingdom for increases of 10%, with Norway still in first place. In Sustainable transport, the United Kingdom and Sweden would be equal in the classification for increases of 10%, with Norway still in first place. In Social inclusion, Iceland overtakes Sweden for third place and ties with the United Kingdom for increases of 10%, but Norway is still in first place. It should be remembered that a variation of ±10% in the weightings of the criteria is a large change and, even so, Norway, the United Kingdom, and Sweden remain in the leading places, with a few specific exceptions. Therefore, the model is robust in the face of variations in the criteria of ±5%, and with a few exceptions, also in the case of variations of ±10%.
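A sensitivity run of this kind can be sketched as follows. The sketch assumes that, when one criterion weighting is changed, the remaining weightings are rescaled proportionally so that the total stays at one; the text does not state how the other weightings were adjusted, so this is an assumption. The weights, scores, and function names are hypothetical.

```python
import numpy as np

def perturb_weight(weights, index, change):
    """Change one weighting by a relative amount and rescale the others."""
    weights = np.asarray(weights, dtype=float)
    new = weights.copy()
    new[index] = weights[index] * (1.0 + change)
    others = np.arange(len(weights)) != index
    # Proportional rescaling keeps the sum of the weightings equal to one.
    new[others] *= (1.0 - new[index]) / weights[others].sum()
    return new

def rank_alternatives(scores, weights):
    """Return (order, totals); order lists alternatives from best to worst."""
    totals = np.asarray(scores, dtype=float) @ np.asarray(weights, dtype=float)
    return np.argsort(-totals), totals

base_weights = np.array([0.5, 0.3, 0.2])      # hypothetical criterion weights
scores = np.array([[0.5, 0.2, 0.3],           # rows: alternatives
                   [0.3, 0.5, 0.2],           # columns: criteria
                   [0.2, 0.3, 0.5]])

base_order, _ = rank_alternatives(scores, base_weights)
plus10 = perturb_weight(base_weights, index=0, change=0.10)
new_order, _ = rank_alternatives(scores, plus10)
print(base_order, plus10, new_order)
```

Repeating the run for each criterion and each variation, and comparing the resulting orders with the base order, shows which criteria can alter the classification.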

Conclusions
Despite the large number of contributions in the literature that analyze SD in countries and cities, there are no contributions analyzing this question in EU countries with objective analytic techniques, such as MCDM techniques. This study has, therefore, developed a multi-criteria model to evaluate SD in EU countries using AHP. The model evaluates EU countries (and Norway and Iceland), using a total of 39 indicators or subcriteria organized into the following criteria: socio-economic development, sustainable consumption and production, social inclusion, public health, demographic change, climate change and energy, sustainable transport, global alliance for development, and good policies. The assessment provided a full classification of countries via a single value, over the period 2005-2015.
The model was designed by performing a visual filter of the data available from the Eurostat database, followed by a double filter using the Pearson correlation coefficient for groups of subcriteria, and a final filter depending on the significance of the available information. These filters look at: scarcity of information, imprecise valuations, inaccurate data, and the need for MCDM techniques to guarantee independence between the subcriteria.
The model uses the judgements of four experts to weight the criteria and subcriteria, since, despite the recognized difficulty of assigning a weighting to these criteria/subcriteria, not all contribute equally to SD. To guarantee the robustness of the model and to ensure that a reasonable variation in the weightings assigned to the criteria will not alter the classification of countries, a sensitivity analysis was carried out. For modifications in the weightings of the criteria of ±5%, we see that Norway is still the country with the highest sustainability in all cases, and the second placed country, the United Kingdom, does not change its position either, although it ties for first place with Norway and Sweden with increases of 5% in the criterion Global alliance for development. With respect to the variations in the weightings of the criteria of ±10%, the three leading countries, Norway, the United Kingdom, and Sweden, do not change position in most of the criteria, and there are only a few exceptional changes in a few positions. In any case, it can be seen that variations in SD would be at most ±0.1 with variations in the weightings of the criteria of ±10%. The model can therefore be considered sufficiently robust, which gives greater reliability to the results obtained, unlike most of the studies that do not include this sensitivity analysis. This could encourage further studies to assign weightings to the criteria, since they do not all contribute to the same degree to SD; moreover, continuing to assign equal weightings could lead to a greater error than assigning different weightings. At the same time, it is to be hoped that a group of experts from different fields, if possible under the umbrella of the UN, will issue an agreement on the importance of each criterion or indicator related to the SDGs.
Besides the evolution over time of SD in the 30 countries studied, the model returns a single value for SD, taking into account the past record of each country using a moving weighted average of the valuations for each country; this technique allows the results of the full history to be used, giving greater importance to those that come from more recent periods. The weightings used at each period were calculated as the first eleven terms of a first order exponential smoothing.
The AHP methodology, which gives relative results by percentages, has allowed the existence of different levels of sustainability to be detected across the continent. These levels are identified by areas of sustainability, where the highest value is found in Norway, and the rest form rings of sustainability around it, such that the further a country is from Norway, the more its sustainability decreases.
The model could serve to identify the strengths and weaknesses of each country in the field of sustainability, and, by analyzing the measures taken by Norway and other countries with very high sustainability, and by continuous improvement processes, they can achieve similar levels of sustainability.
In future work, the intention is to build another multi-criteria model, applying different filter techniques to those used in this study, and to use a different technique to obtain the weightings (for example the Delphi method), as well as using a greater number of experts. It is also hoped to apply other multi-criteria methodologies, such as the Measuring Attractiveness by a Categorical Based Evaluation Technique (MACBETH) approach, or the Preference Ranking Organization METHod for Enrichment of Evaluations (PROMETHEE), and fuzzy logic to compare the results with those found by this study. It would also be interesting to update the model with the new data available from the Eurostat database as new data are included.