A Multi-Criteria Reference Point Based Approach for Assessing Regional Innovation Performance in Spain

The evaluation of regional innovation performance through composite innovation indices can serve as a valuable tool for policy-making. While discussion on the best methodology to construct composite innovation indices continues, we are interested in deepening the use of reference levels and the aggregation issue. So far, additive aggregation methods are by far the most widespread aggregation rule, thus allowing for full compensability among single indicators. In this paper, we present an integrated assessment methodology to evaluate regional innovation performance using the Multi-Reference Point based Weak and Strong Composite Indicator (MRP-WSCI) approach, which allows defining reference levels and different degrees of compensability. As an example of application to the Regional Innovation Scoreboard, the proposed technique is used to measure the innovation performance of Spain's regions taking into account Spanish and European reference levels. The main features of the proposed approach are: (i) absolute or relative reference levels can be defined in advance by the decision maker; (ii) by establishing the reference levels, the resulting composite innovation index is an easy-to-interpret measure; (iii) the non-compensatory strong composite indicator provides an additional layer of information for policy-making; and (iv) a visualization tool called Light-Diagram is proposed to track the specific strengths and weaknesses of the regions' innovation performance.


Introduction
Innovation is considered one of the key factors behind regional growth and lies at the centre of a huge number of studies trying to identify the driving forces of innovation [1][2][3][4]. Thus, the importance of measuring innovation is widely acknowledged by policy-makers and researchers in order to identify general trends, determine performance targets and set policy priorities. In recent years, there has been an ongoing debate on how the innovation performance at different levels should be measured, and three main approaches prevail. A first group includes the use of single indicators such as the number of patents [4][5][6] or research and development expenditures [7]. The second approach proposes the use of extensive sets of indicators in order to cluster the countries or regions [8][9][10]. Finally, starting from the 1990s, some researchers introduced the use of composite innovation indices, arguing that they can provide a more comprehensive way to benchmark innovation performance for the purpose of policy-making.
A composite innovation index is a mathematical combination of a set of indicators representing the different dimensions of innovation. As with other composite indices, most criticisms relate to the fact that it provides a simple "big picture" that can lead policy-makers to draw simplistic conclusions [11]. Thus, we are interested in deepening the use of multi-criteria decision making (MCDM) approaches to construct composite innovation indices at the regional level, which would allow us to provide valuable additional information in terms of comparability and compensability. With this aim, we make use of the Regional Innovation Scoreboard (RIS) framework, which provides data on 17 single innovation indicators for all the European regions. For a complete description of the RIS framework, see Table A1 in Appendix A.
Therefore, this paper addresses two research questions that focus on the construction of a composite innovation index to measure regional innovation performance in Europe.

• The first question examines how decision makers can provide preferential information by using reference levels to define performance intervals from the beginning, depending upon the scope and aim of the composite innovation index. The point of establishing the reference levels is that the resulting composite innovation index is an easy-to-interpret measure in terms of a comparability analysis of regions at national or European level.

• The second question is: how does the compensability of individual indicators affect the overall ranking of regions when evaluating their innovation performance? The level of compensability defined when the individual indicators are aggregated has strong policy-making implications. From full compensation (weak) to no-compensation (strong) aggregation rules, we can obtain different composite innovation indices that help users identify the weak and strong points of each region and design policies accordingly.
To answer these research questions, we apply a novel MCDM aggregation methodology for building composite indicators, known as Multiple Reference Point based Weak and Strong Composite Indicators (MRP-WSCI), which is based on the use of reference levels and distance functions [12]. By using reference levels, it is possible to set a priori performance intervals so that the proposed composite indicator enables the decision maker to monitor the innovation performance of a region at the global or national level. By establishing different degrees of aggregation, we obtain two types of composite indicators: the Weak Composite Indicator (WCI), which allows compensability among indicators, and the Strong Composite Indicator (SCI), which reflects the worst values achieved by the region. Moreover, we pay special attention to the visualisation of the indicators: apart from tables and graphs with the corresponding rankings and scores, we introduce a new tool, which we call Light-Diagram, to track the specific strengths and weaknesses of regions' innovation performance at every aggregation stage.
The paper is organized as follows. Section 2 develops a literature review on composite innovation indices. In Section 3, the procedure for developing the composite innovation index based on the MRP-WSCI method is described. In Section 4, we illustrate the proposed methodology making use of the RIS framework and 2019 dataset to benchmark the innovation performance of Spanish regions with respect to EU reference levels. Section 5 presents the numerical results of the MRP-WSCI application to Spain, but in this case using Spanish reference levels. Finally, we conclude the paper with remarks in Section 6.

Composite Innovation Indices: A Literature Review
In the literature, the use of composite innovation indices was first introduced at the micro-level by the seminal work of [13], who developed a composite measure of firms' innovativeness by using Factor Analysis of a whole set of innovation variables. Another example at firm level is [14], where a composite innovation index was proposed to benchmark the most innovative firms in Australia. In [15,16], a new measure of innovation was proposed to find the link between innovation and productivity in Canadian manufacturing industries. The authors of [17] define an innovative propensity index (IPI) of firms using a multi-criteria methodology based on Fuzzy AHP. Furthermore, in [18], an integrated innovativeness index is developed for benchmarking firm innovative performance by applying a recent multi-criteria methodology called Intuitionistic fuzzy-TOPSIS. The issue of evaluating the expediency of technology commercialization in R&D organizations using an MCDM approach is presented in [19].
In recent years, the use of composite indices has grown significantly at the macro-level to make cross-national comparisons in several areas including innovation [20]. In the global context, the most comprehensive indicator for country innovation measurement is the Global Innovation Index (GII), which is jointly developed by INSEAD, the World Intellectual Property Organization (WIPO) and Cornell University. Since 2007, the GII has provided data and insights gathered from tracking innovation around the globe [21]. The GII is a composite index constructed in a multi-stage weighted average aggregation procedure starting from 80 single indicators, which are aggregated in seven pillars to provide two sub-indices: the Innovation Input Sub-Index and the Innovation Output Sub-Index. Finally, the overall GII score is the simple average of the Input and Output Sub-Indices. At the European scale, since the early 2000s, the European Commission has promoted the use of composite indicators arguing that "by aggregating a number of different variables, composite indicators are able to summarise the big picture in relation to a complex issue with many dimensions" [22]. In the field of the measurement of innovation, the annual European Innovation Scoreboard (EIS) assesses the strengths and weaknesses of EU Member States' innovation systems. Furthermore, in Europe, but at a regional level, the Regional Innovation Scoreboard (RIS), which is used here as an empirical case, covers 238 regions in 23 EU Member States and is calculated as the unweighted average of the normalized innovation indicators.
However, the application of such composite indices for measuring innovation performance and their utility in directing innovation policy has also been discussed in [1,23-28], mainly due to the problems related to the varying statistical and mathematical methods utilised for aggregating the indicators into a composite indicator [29]. Thus, the lack of a single best way to construct a composite innovation index has opened a methodological debate on the theoretical framework to develop the final composite index [30,31].
Over the last decades, the use of multi-criteria decision making (MCDM) techniques when constructing composite indicators has also increased notably, as they have been viewed as highly suitable for aggregating single indicators into a composite one [11,32-35]. In [36], a review of MCDM methods to construct composite indicators is presented. As stated by these authors, one group of these techniques consists of distance function based methods, in which reference levels are defined by the decision maker in order to minimize the distance between an alternative and a point or points enjoying good preferential properties [37]. Within this family, the reference point method [38] was adapted by [39] to develop synthetic sustainability indicators dealing with different compensation degrees between the criteria. Lately, this methodology has been used in a variety of domains such as sustainability, university performance, regional social progress and the ease of doing business [12,40-42]. Among the main advantages of this type of MCDM distance based methodology, we should mention its ease of implementation and ease of use, as the decision maker can establish a desired number of reference levels. These levels are used by the so-called achievement scalarizing function to bring all the indicators down to a common scale (therefore, the achievement function carries out a distance-based normalization of the original data). Thus, by providing the reference levels of innovation performance, the resulting composite innovation index is an easy-to-interpret measure, as it provides the position of the region with respect to those levels, thus becoming a more informative measure.
As a contribution for evaluating regional innovation performance for a European country, in our proposal, the decision maker may choose the following reference levels: (i) European reference levels for benchmarking the position of the regions with respect to the overall European patterns; (ii) National reference levels in order to obtain a regional comparative assessment of innovation performance to set national targets.
In addition, our study addresses in particular the "big debate" between linear and non-linear aggregation when constructing a composite innovation index, since it determines the level of compensability, that is, the possibility that a large improvement with regard to one or more criteria can offset a small worsening with regard to one or more other criteria, or vice versa [43]. According to [44], a compensatory approach implies the use of linear functions such as the arithmetic mean, which overlooks unbalances, namely, the disequilibrium among the sub-indicators/dimensions of the composite indicator. Moreover, as stated by [45,46], the use of linear aggregation rules requires that the preferential independence condition be met. In contrast, non-compensatory or partially compensatory approaches involve unbalance-adjusted functions, such as the geometric mean. Additive aggregation of single indicators is by far the most widespread aggregation rule when constructing composite innovation indicators at the micro- and macro-level. This implies complete substitutability among the various dimensions of innovation, and such complete compensability is often not desirable. Looking at the regional innovation patterns in EU regions, given that the Regional Innovation Scoreboard uses additive aggregation methods, a full compensability approach is assumed. We stress at this point that the level of compensability can have strong policy implications when measuring and benchmarking regional innovation performance through composite indicators.
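To make the notion of compensability concrete, the following minimal sketch compares three aggregation rules on two hypothetical score profiles with the same arithmetic mean. The profiles and function names are illustrative assumptions, not data or code from the RIS or from [12].

```python
import math


def arithmetic_mean(values):
    """Fully compensatory: a high score can offset a low one."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Partially compensatory; assumes strictly positive scores."""
    return math.prod(values) ** (1.0 / len(values))


# Two hypothetical regions with the same average performance.
balanced = [2.0, 2.0, 2.0]
unbalanced = [4.0, 1.0, 1.0]

for name, scores in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(
        name,
        "arithmetic:", arithmetic_mean(scores),
        "geometric:", round(geometric_mean(scores), 3),
        "minimum:", min(scores),  # no compensation at all
    )
```

The arithmetic mean cannot distinguish the two profiles, whereas the geometric mean and the minimum both penalise the unbalanced one.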

Research Design and Methodology for Reference Point Based Regional Innovation Composite Indicator
From the baseline framework of the Regional Innovation Scoreboard (RIS) indicators, we develop a step-wise methodological approach for constructing regional innovation composite indexes using reference levels and distance functions. Let us summarize in this section the Multiple Reference Point based Weak and Strong Composite Indicator (MRP-WSCI) methodology to build composite indicators. This description is a summary of the process described in [12] (where the reader is referred for further details), but we have decided to include it in this paper for completeness.
Let us assume that we are managing a set of $J$ regions, for which $I$ indicators are evaluated. Let us denote by $x_{ij}$ the value of indicator $i$ for region $j$. The MRP-WSCI process comprises the following steps:

Parameters: Weights and Reference Levels
It will be assumed that the decision center has assigned weights $\mu_1, \mu_2, \dots, \mu_I$ to the indicators, which reflect the contribution of each indicator to the final composite measure. In our case, given that the original regional innovation index uses equal weights for all the indicators, we will also do so in order to obtain comparable results.
On the other hand, the MRP-WSCI methodology assumes that the decision center is able to provide a set of $n$ reference levels for every indicator $i$: $q_i^1, q_i^2, \dots, q_i^n$. In general, these levels define performance thresholds for indicator $i$ (e.g., very poor, poor, fair, good, very good, ...). Let us denote by $q_i^0$ and $q_i^{n+1}$, respectively, the minimum and maximum values that indicator $i$ can take. As a result, we obtain the following $(n + 2)$-dimensional reference vector for indicator $i$:

$$q_i = \left(q_i^0, q_i^1, \dots, q_i^n, q_i^{n+1}\right) \qquad (1)$$

Although these reference levels can be defined in an absolute way by experts, in our case statistical values will be used. To be more precise, two options for these levels will be considered.

• On the one hand, we will adapt the benchmarks used by the RIS (Modest, Moderate, Strong, Leader) at a European level, as described in Section 4.1. This way, we establish the behaviour of the Spanish regions with respect to all the European ones, following these benchmarks.

• On the other hand, we will use percentiles 25, 50 and 75 across all the Spanish regions. This way, we can see the relative position of each region also with respect to the Spanish regions only. This will let us illustrate the effect of using different reference levels in the MRP-WSCI approach.
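As an illustration of the second option, the sketch below builds the reference vector (minimum, percentiles 25, 50 and 75, maximum) for one indicator. The paper does not state which percentile interpolation rule is used, so the `inclusive` method of Python's `statistics.quantiles` is an assumption here, as are the function name and the sample values.

```python
import statistics


def percentile_reference_vector(values):
    """Return [min, P25, P50, P75, max] for one indicator.

    Assumption: quartiles are computed with the 'inclusive' method,
    which treats the observed regions as the whole population.
    """
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return [min(values), p25, p50, p75, max(values)]


# Hypothetical indicator values for five regions.
sample = [1, 2, 3, 4, 5]
print(percentile_reference_vector(sample))  # [1, 2.0, 3.0, 4.0, 5]
```

A different interpolation rule would shift the three inner levels slightly, which in turn changes the normalised scores, so the rule should be fixed before comparing results.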

Normalization: The Achievement Scalarizing Function
A set of $n + 2$ real values $\alpha_0, \alpha_1, \dots, \alpha_n, \alpha_{n+1}$ will be used, which are the same for all the $I$ indicators. In our example, these values are set by default to 0, 1, 2, 3, and 4, and they define a common measurement scale. Therefore, as an example, if percentiles are used as reference levels, $\alpha_1 = 1$ is the value on the common scale that a given region obtains if it achieves percentile 25 in indicator $i$. A so-called achievement scalarizing function is used to map each indicator $i$ to the scale defined by the values $\alpha_t$ ($t = 0, \dots, n + 1$). Apart from normalising the indicators, this function also informs about the relative position of each region with respect to the corresponding reference levels. These functions were initially defined in [38] for general reference point procedures, and they were afterwards extended to double reference point (reservation-aspiration) methods [47] and adapted to the calculation of composite indicators [12,40,48]. For the case when $n$ reference levels are used (assuming, without loss of generality, that the indicator is of the-more-the-better type), the achievement scalarizing function defined in [12] takes the following form:

$$s_i(x_{ij}, q_i) = \alpha_{t-1} + \frac{\alpha_t - \alpha_{t-1}}{q_i^t - q_i^{t-1}} \left(x_{ij} - q_i^{t-1}\right), \quad \text{if } q_i^{t-1} \le x_{ij} \le q_i^t \quad (t = 1, \dots, n + 1) \qquad (2)$$

This way, we are using a piece-wise linear function. If the region achieves, for example, values between percentiles 50 and 75 for indicator $i$, the corresponding achievement scalarizing function $s_i$ of indicator $i$ takes values between $\alpha_2 = 2$ and $\alpha_3 = 3$.
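The piece-wise linear normalization described above can be sketched as follows. This is a minimal illustration rather than the implementation of [12]: the function name, the clamping of values to the range of the reference vector and the sample reference levels are assumptions, and only the-more-the-better indicators are handled.

```python
from bisect import bisect_right


def achievement(x, q, alpha=(0.0, 1.0, 2.0, 3.0, 4.0)):
    """Map a raw value x onto the common scale alpha.

    q is the strictly increasing reference vector of the indicator
    (minimum, reference levels, maximum); it has the same length as alpha.
    """
    if len(q) != len(alpha):
        raise ValueError("q and alpha must have the same length")
    # Assumption: values outside [q[0], q[-1]] are clamped to the bounds.
    x = min(max(x, q[0]), q[-1])
    # Index t of the interval [q[t-1], q[t]] that contains x.
    t = min(max(bisect_right(q, x), 1), len(q) - 1)
    # Linear interpolation between alpha[t-1] and alpha[t].
    return alpha[t - 1] + (alpha[t] - alpha[t - 1]) * (x - q[t - 1]) / (q[t] - q[t - 1])


# Hypothetical reference vector: min, P25, P50, P75, max.
q_example = [0.0, 10.0, 20.0, 40.0, 100.0]
print(achievement(30.0, q_example))   # 2.5, halfway between P50 and P75
print(achievement(10.0, q_example))   # 1.0, exactly at P25
print(achievement(100.0, q_example))  # 4.0, the maximum
```

A value lying exactly on a reference level maps to the corresponding value of the common scale, so the integer part of the score identifies the performance interval directly.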

Full Compensation: The Weak Composite Indicator
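The weak composite indicator (WCI) allows full compensation among the single indicators by using an additive aggregation. First, the weights $\mu_i$ are normalized to add up to 1:

$$\bar{\mu}_i = \frac{\mu_i}{\sum_{k=1}^{I} \mu_k}, \quad i = 1, \dots, I.$$

Second, the WCI of a given region $j$ is built in the following way:

$$WCI_j = \sum_{i=1}^{I} \bar{\mu}_i \, s_i(x_{ij}, q_i).$$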
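A minimal sketch of the weak composite indicator, taking as input the achievement function values of one region. The function name and the sample values are hypothetical; equal weights are used when none are given, as in this paper.

```python
def weak_composite(achievements, weights=None):
    """Weighted additive aggregation of achievement values.

    Dividing by the sum of the weights is equivalent to normalising
    the weights so that they add up to 1.
    """
    if weights is None:
        weights = [1.0] * len(achievements)  # equal weights by default
    if len(weights) != len(achievements):
        raise ValueError("one weight per indicator is required")
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, achievements)) / total


# Hypothetical achievement values of one region on the 0-4 scale.
print(weak_composite([4.0, 1.0, 1.0]))          # 2.0
print(weak_composite([4.0, 0.0], [3.0, 1.0]))   # 3.0
```

In the first call the top score in one indicator fully offsets the two poor ones, which is exactly the compensation effect the weak indicator allows.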

No Compensation: The Strong Composite Indicator
The strong composite indicator (SCI) does not allow any compensation. In fact, $SCI_j$ measures the worst performance of region $j$. In our case, given that equal weights are used for every indicator, $SCI_j$ is just the minimum value of all the achievement functions of region $j$:

$$SCI_j = \min_{i = 1, \dots, I} s_i(x_{ij}, q_i).$$

If different weights are assigned to the indicators and the decision makers wish to take into account the relative importance of each indicator in the strong measure, an alternative way of building $SCI_j$ can be found in [12].
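For the equal-weights case used here, the strong composite indicator reduces to a minimum, as the short sketch below shows. The helper that reports which indicator is responsible for the minimum is an illustrative addition, and the indicator names and values are hypothetical.

```python
def strong_composite(achievements):
    """Equal-weights SCI: the worst achievement value of the region."""
    return min(achievements)


def worst_indicator(achievements_by_name):
    """Name of the indicator that attains the minimum achievement value."""
    return min(achievements_by_name, key=achievements_by_name.get)


# Hypothetical achievement values of one region on the 0-4 scale.
region = {"I1": 3.2, "I2": 0.4, "I3": 2.7}
print(strong_composite(region.values()))  # 0.4
print(worst_indicator(region))            # I2
```

The weighted variant of the strong indicator described in [12] is not covered by this sketch.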

Successive Aggregations
As is often the case in most composite indicators, the RIS framework is based on a system of indicators which is structured in several levels. In this case, the 17 indicators are grouped into 9 innovation dimensions, which are afterwards grouped into 4 types, and finally the composite innovation indicator is obtained. Therefore, we need to carry out a three-stage aggregation process, as shown in Figure 1.

• First Aggregation. We obtain for each region the $WCI_j$ and $SCI_j$ of each innovation dimension. If the region is understood from the context, they will be simply denoted as W and S, and they measure:
  - W: the average (compensatory) measure of the dimension;
  - S: the worst indicator of the dimension.

• Second Aggregation. Given the way W and S have been constructed, these composite indicators take values on the same scale, 0-1-2-3-4, as the original achievement scalarizing functions, and therefore they can be used as achievement functions in the following aggregations. Thus, in the second aggregation, one of these composite indicators is used as the achievement function to build the composite indicators of each type. If the weak composite indicators are used as achievement functions, we get two composite indicators for each type:
  - W-W, the weak-weak composite indicator, measures the average (compensatory) performance of the type;
  - S-W, the strong-weak composite indicator, points out the worst dimension of the type.
  Alternatively, the strong composite indicators of each innovation dimension can be used as achievement functions in the second stage. In this case, it is interesting to consider:
  - S-S, the strong-strong indicator, which points out the worst single indicator of each type.

• Third Aggregation. Finally, the same considerations can be made for the last aggregation stage, where we obtain global composite indicators for each region. In this case, we will consider:
  - W-W-W, which gives an overall compensatory measure of each region;
  - S-W-W, which points out the worst type of each region;
  - S-S-S, which indicates the worst single indicator of each region.
Therefore, the methodology provides a set of composite indicators which allow decision makers to easily find possible improvement areas.

Figure 1. Hierarchical structure of the Regional Innovation Scoreboard (RIS) for the three-stage aggregation process. The system of indicators is formed by four types, which are in turn subdivided into 9 dimensions.
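The three-stage aggregation can be sketched for a single region as follows, assuming equal weights at every level. The nested dictionary layout, the type and dimension labels and the achievement values are hypothetical and much smaller than the actual RIS hierarchy.

```python
def weak(values):
    """Compensatory aggregation with equal weights (arithmetic mean)."""
    values = list(values)
    return sum(values) / len(values)


def strong(values):
    """Non-compensatory aggregation (worst value)."""
    return min(values)


def aggregate(hierarchy):
    """hierarchy: {type: {dimension: [achievement values]}} for one region."""
    # First aggregation: W and S of every dimension, grouped by type.
    w_dims = {t: [weak(v) for v in dims.values()] for t, dims in hierarchy.items()}
    s_dims = {t: [strong(v) for v in dims.values()] for t, dims in hierarchy.items()}
    # Second aggregation: W-W and S-S of every type.
    ww = [weak(v) for v in w_dims.values()]
    ss = [strong(v) for v in s_dims.values()]
    # Third aggregation: global composite indicators.
    return {"W-W-W": weak(ww), "S-W-W": strong(ww), "S-S-S": strong(ss)}


# Toy hierarchy with two types and four dimensions.
toy_region = {
    "T1": {"D11": [3.0, 1.0], "D12": [2.0]},
    "T2": {"D21": [4.0, 4.0], "D22": [0.0, 2.0]},
}
print(aggregate(toy_region))
# {'W-W-W': 2.25, 'S-W-W': 2.0, 'S-S-S': 0.0}
```

In this toy case the region looks moderate under full compensation (2.25), yet the S-S-S value of 0 reveals that one indicator sits at the minimum of its range.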

A New Visualization Tool: The Light-Diagram
The development of visualization tools is one of the ten steps proposed by [49] for the construction of composite indicators, since the way in which the composite indicators can be visualized or presented can influence their interpretation. The Regional Innovation Scoreboard (RIS) assessment is displayed through an interactive dashboard, with tables and maps to provide a visual representation of regions' performance by indicator and year (https://interactivetool.eu/RIS/index.html). The RIS interactive tool is intended to provide an overview of the region's innovation performance from the Regional Innovation Index (RII), which is calculated as the unweighted average of the normalised scores of the 17 indicators. While this is an effective way to communicate the overarching regional innovation performance to non-experts, the MRP-WSCI approach can enrich this analysis with the information derived from the weak and strong composite indicators.
To reinforce the usefulness of the MRP-WSCI approach, we have developed a new visualization tool that we call Light-Diagram. The aim of this tool is to have all the information regarding each region available at a glance. It is possible to see the values of the achievement functions and those of every composite indicator. At the same time, the colors used allow us to easily identify the performance interval of each element. It adds value by feeding the information provided by the MRP-WSCI composite indicators at the different aggregation levels into a broader monitoring exercise. The ultimate goal of this diagram is to provide a firmer foundation for innovation policy-making and to help ensure that the money devoted to innovation activities in the region delivers maximum returns. When applying the MRP-WSCI, after defining the reference levels and using the achievement function $s_i$ defined in Equation (2), we can identify, at the three stages of the aggregation, different innovation performance levels (e.g., high, medium or low), using colors (e.g., green, orange and red). Let us see an example of a simulated Light-Diagram for a composite index with a hierarchy of two types, five dimensions and ten indicators.
As Figure 2 shows, for the region considered it can be easily detected that the third indicator I3 (colored in red) is responsible for the low innovation performance in the first dimension at the first aggregation stage. In the second aggregation, the worst value of dimension 1.1 is also responsible for the bad performance of the S-S composite indicator in the first type. In the last aggregation stage, we obtain the S-S-S composite index, also colored in red. The Light-Diagram tool not only reveals the weaknesses of regional innovation performance but also highlights its strengths in green.

Benchmarking Spanish Regional Innovation Performance According to European Reference Levels
In this section, an application of the Multiple Reference Point based Weak and Strong Composite Indicator (MRP-WSCI) methodology to a set of indicators included in the Regional Innovation Scoreboard (RIS) framework for Spanish regions in 2019 is presented, using European reference levels. The RIS covers 238 regions in 23 EU Member States and is the regional extension of the European Innovation Scoreboard.
In the RIS, the regional innovation performance is measured using the Regional Innovation Index (RII) composite indicator, which summarises the region's innovative performance based on 17 indicators. These indicators are grouped into four main types: Framework conditions, Investments, Innovation activities, and Impacts. These four types are in turn sub-divided into 9 innovation dimensions (see Table A1 in Appendix A).
After normalizing the data, the Regional Innovation Index (RII) is calculated as the unweighted average of the normalised scores of the 17 indicators.

Defining EU Relative Reference Levels
As a first step towards the construction of the composite indicator, the reference levels for each indicator are defined according to the classification scheme used in the RIS framework, in which regions are classified into four innovation performance groups:

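• Regions performing between the minimum value of the indicator, $q^0$ (Min), and 50% of the EU average, $q^1$ (50), are regarded as Modest Innovators (M*).

• Regions performing between $q^1$ (50) and 90% of the EU average, $q^2$ (90), are regarded as Moderate Innovators.

• Regions performing between $q^2$ (90) and 120% of the EU average, $q^3$ (120), are regarded as Strong Innovators.

• Regions performing between $q^3$ (120) and the maximum value of the indicator, $q^4$ (Max), are regarded as Leaders (L).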
In Table 1 we display the values corresponding to the relative reference levels for year 2019. It must be noticed that, while the RII uses these benchmarks at the global level, that is, once the composite measure is built, we use them at the indicator level. This way, we can monitor the performance of the region with respect to the levels at each aggregation step.
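The relative reference levels can be sketched as a small helper that builds the reference vector of one indicator from the European values. The cut-offs of 50%, 90% and 120% of the EU average follow the RIS classification; the function name, the way the EU average is supplied and the sample values are assumptions for illustration only.

```python
def eu_reference_vector(eu_values, eu_average):
    """Return [Min, 50%, 90%, 120% of the EU average, Max] for one indicator.

    eu_values: values of the indicator across all European regions.
    eu_average: EU average of the indicator, passed in explicitly
    because it need not equal the plain mean of the regional values.
    """
    q = [
        min(eu_values),
        0.5 * eu_average,
        0.9 * eu_average,
        1.2 * eu_average,
        max(eu_values),
    ]
    # The piece-wise linear normalization needs strictly increasing levels.
    if any(low >= high for low, high in zip(q, q[1:])):
        raise ValueError("reference levels are not strictly increasing")
    return q


# Hypothetical normalised indicator values and EU average.
print(eu_reference_vector([0.0, 0.3, 0.5, 0.7, 1.0], eu_average=0.5))
# [0.0, 0.25, 0.45, 0.6, 1.0]
```

The monotonicity check matters in practice: for a very skewed indicator, 120% of the EU average could exceed the observed maximum, in which case the levels would have to be revised by the decision maker.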

Profiling Regional Innovation Performances: Results and Visualization Tools
In order to get the rankings of the Spanish regions with respect to the European reference levels, we computed the W-W-W (overall compensatory average), S-W-W (worst type) and S-S-S (worst indicator) composite indicators for 2019 (see Table 2). Some changes in the rankings with respect to the RIS can be observed, which are due to the normalization based on reference levels and the existence of several aggregation steps in the MRP-WSCI methodology. Let us see, for example, the cases of (ES52) Comunidad Valenciana, ranked 5 in the RIS and 7 in the W-W-W, and (ES24) Aragón, ranked 7 in the RIS and 5 in the W-W-W.
In Figure 3, we can see that the differences in the values of the RIS indicators (denoted as specified in Table A1) are higher in favour of Comunidad Valenciana, which is, therefore, better ranked. But in Figure 4 (left), we can see two effects.
First, due to the reference levels used, the differences in the achievement functions are not so high. For example, in the indicator Trademark applications ($I_{332}$), although the difference in absolute values is large in favour of Comunidad Valenciana, the value of Aragón is very close to the Leader threshold (see Table 1), and therefore the corresponding value of the achievement function is very close to 3. Second, the process consisting of several aggregations actually gives the same weight to each of the types (and to each of the dimensions), unlike the RIS scheme, which implicitly places a higher weight on the types containing more single indicators. As can be seen in Figure 4 (centre and right), this finally causes Aragón to get a better W-W-W value. Besides, depending on the level of compensability, some regions occupy different positions in the ranking, as shown in Figure 5. It can be seen that, for all regions, the weak composite indicator W-W-W (colored in green) gets the highest score, the S-W-W indicator (colored in yellow) scores lower, and the strong perspective given by the S-S-S index (colored in red) takes the smallest values.

Bringing a new perspective to the analysis, in Figure 6 we map the regional innovation performance assessment for the three levels of compensability. The use of maps can make the analysis more visually appealing and can highlight those regions where policy makers should pay attention, as the S-S-S perspective provides information that can be used to define improvement actions. To draw the maps, the EU reference levels previously stated were used to classify the regions, using two color scales per reference level. Looking at these maps from left to right, one can observe how, as the level of compensability decreases, the innovation performance logically gets worse.
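The weighting effect mentioned above, namely equal weight per type under successive averaging versus equal weight per single indicator under a flat average, can be made explicit by computing the weight that each indicator implicitly receives in W-W-W. The hierarchy below is a toy example with hypothetical labels, not the RIS structure.

```python
def implicit_weights(hierarchy):
    """Weight of each indicator in W-W-W under equal weights at every level.

    hierarchy: {type: {dimension: [indicator names]}}.
    """
    weights = {}
    n_types = len(hierarchy)
    for dims in hierarchy.values():
        for indicators in dims.values():
            for name in indicators:
                # Each level splits its weight equally among its children.
                weights[name] = 1.0 / (n_types * len(dims) * len(indicators))
    return weights


# Toy hierarchy: one type with three indicators, another with a single one.
toy = {"T1": {"D11": ["a", "b", "c"]}, "T2": {"D21": ["d"]}}
nested = implicit_weights(toy)
flat = {name: 1.0 / len(nested) for name in nested}  # flat unweighted average

print(nested["a"], flat["a"])
print(nested["d"], flat["d"])
```

Under the flat average every indicator weighs 1/4, whereas under successive aggregation the single indicator of the second type carries half of the total weight. This is why a region that performs well in sparsely populated types can climb in the W-W-W ranking.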
As noted, the worst performing regions from the S-S-S approach are (ES23) La Rioja, (ES70) Canarias, (ES53) Illes Balears and (ES43) Extremadura. It is worth pointing out the case of (ES23) La Rioja, ranked sixth by the RII indicator, and eighth and seventh by the W-W-W and S-W-W indexes, respectively, while it takes the worst position under the strong perspective (S-S-S index). In situations such as this, with a wide range among the composite indicators, further analysis is required through a strong multidimensional approach.
We have also developed a new visualization tool, which we call Light-Diagram, to provide warning signals that flag those innovation areas requiring policy intervention and, at the same time, to highlight the region's strengths. Let us see, for example, the case of La Rioja (ES23), as shown in Figure 7.
This Light-Diagram allows us to track the specific strengths and weaknesses of the La Rioja (ES23) region. First, the warning signals of the region are colored in red at all levels of the aggregation. In the last aggregation, when applying the MRP-WSCI methodology, the value of the S-S-S indicator is zero, which is responsible for the last position in the ranking. We can see that this low value of the S-S-S composite innovation index is due to the Innovation Activities ($I_3$) type, where the S-S value is zero. In turn, this value comes from Public and private co-publications ($I_{322}$), which takes the worst position in the Linkages ($I_{32}$) dimension, due to the zero value of the achievement function. This means that (ES23) La Rioja gets the worst value across all European regions in this indicator.
At the same time, it is possible to identify those criteria in which the region has strengths by looking at the green color. For (ES23) La Rioja, the good performance of the Sales impact ($I_{42}$) dimension is remarkable, as is that of the Population with tertiary education ($I_{111}$) indicator in the Human Resources ($I_{11}$) dimension, both performing in the corresponding Leader interval. Furthermore, the high performances in Trademark applications ($I_{332}$) (the best value among all European regions) and in Design applications ($I_{333}$) (in the Leader interval) are clearly compensated by the bad score in PCT patent applications ($I_{331}$) (in the Modest Innovator interval). Finally, the moderate innovation performance of the region is reflected in the overall W-W-W composite index and in the S-W-W index, which comes from the Investments ($I_2$) type; both are colored in orange.

Setting National Reference Levels: A Country-Level Analysis
A country-level analysis based on national reference levels may be appropriate for benchmarking the regions' innovation performance when innovation policies are defined by local or national institutions. In this case, we use the 25th, 50th and 75th percentiles (therefore, n = 3) to see the relative position of each region with respect to the Spanish regions only (see Table 3).
Table 4 shows the scores and rankings of the Spanish regions obtained by applying the MRP-WSCI approach. The regions (ES21) País Vasco, (ES22) Comunidad Foral de Navarra and (ES51) Cataluña are placed in first, second and third position, respectively, in the ranking. At the bottom are (ES53) Illes Balears, (ES70) Canarias and (ES42) Castilla-La Mancha. Depending on the compensation level considered, weak or strong, the regions get different positions in the ranking, as shown in Figure 8. We also depict with maps how the type of aggregation transforms the regional innovation pattern of Spain (see Figure 9). Whereas seven regions are colored in green when the weak perspective prevails, only (ES21) País Vasco is able to maintain a top position under the strong approach, in which red prevails in a large number of regions.
A deeper understanding of the main indicators that have contributed to a region's good or bad innovation performance can be gained through the "Light-Diagram" tool. In this context, we pay special attention to (ES51) Cataluña, whose position varies greatly depending on whether the weak or the strong perspective is taken. The picture provided in Figure 10 enables policy makers to track and target the performance of the Framework Conditions (I 1 ) type, mainly in the Human Resources (I 11 ) dimension and, in particular, to enhance the Lifelong learning (I 112 ) indicator, which takes the worst value among all the Spanish regions.
As expected, when using European reference levels, the regions get positions similar to those in the RII ranking. In contrast, if the reference levels are taken at the national level, we observe larger differences. As an example, consider (ES51) Cataluña and (ES22) Comunidad Foral de Navarra, which interchange their positions in the rankings when Spanish levels are used. As can be seen in Figure 11, when Spanish levels are used, (ES51) Cataluña gets worse values in the first two indicators: it sits just at the Strong Innovator threshold for Population with tertiary education (I 111 ) and, in particular, takes the worst value among all the Spanish regions for Lifelong learning (I 112 ), as mentioned before (see Table 3). This is because, in general, the Spanish regions perform better than many other European regions in these two indicators. Finally, we can conclude that the value of the scoreboard depends on the benchmark performance and is influenced by the choice of the reference levels. The variability of each region's ranking under the weak composite indicator with respect to the RII is shown in Table 5.
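The way national reference levels follow from percentiles can be illustrated with a short sketch. It assumes a piecewise-linear achievement scale that runs from 0 at the worst observed value to n + 1 at the best one, with one unit per performance interval. The function names, the scale and the data are hypothetical and are not taken from the RIS or from Table 3.

```python
import numpy as np

def reference_levels(values, percentiles=(25, 50, 75)):
    """Reference levels of one indicator, taken as percentiles of the regional values."""
    return np.percentile(values, percentiles)

def achievement(x, levels, worst, best):
    """Map a raw value onto a 0..n+1 scale, linearly within each performance interval.

    Assumption: worst < levels[0] < ... < levels[-1] < best (strictly increasing).
    """
    knots = np.concatenate(([worst], levels, [best]))
    scale = np.arange(len(knots), dtype=float)  # 0, 1, ..., n+1
    return np.interp(x, knots, scale)

# Hypothetical values of one indicator for nine regions (not RIS data).
values = np.array([0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90])

levels = reference_levels(values)  # n = 3 reference levels
scores = achievement(values, levels, values.min(), values.max())

# Index of the performance interval each region falls in (0 = below the first level).
intervals = np.searchsorted(levels, values, side="right")
```

Replacing `values` with the values of all European regions, instead of those of one country, would give reference levels of the European kind, so the same region can land in a different interval depending on the benchmark chosen.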

Conclusions
In this paper, we have proposed an alternative methodology to assess regional innovation performance, by building composite indicators from a multi-criteria decision making perspective. More precisely, making use of a predefined set of single indicators, we have built several composite measures by applying a reference point based scheme. This allows us to establish reference levels on the different indicators (which determine certain performance intervals), and to easily interpret the results in terms of the position of each region with respect to these levels.
Innovation is increasingly important, given that it is regarded as one of the key factors behind growth. Nevertheless, assessing innovation is far from trivial, given that it is a complex concept which comprises activities of a different nature, and which can be examined from different angles. Today's information society makes it easier for researchers to access data reflecting all these different activities and angles. At the same time, however, this large amount of information calls for the use of composite measures to obtain manageable and easy-to-interpret assessment indexes. The Regional Innovation Scoreboard (RIS) provides data, for all the European regions, on 17 single innovation indicators, which are grouped into nine innovation dimensions, which are in turn classified into four innovation types. Making use of these data, the RIS builds a composite innovation index for each region (known as the Regional Innovation Index, RII), using a direct simple additive aggregation of the 17 single indicators with equal weights. After building these composite indicators, it classifies the regions into four performance groups (modest innovators, moderate innovators, strong innovators and leaders), taking into account the relative position of each region with respect to the rest. Instead, we propose the use of the Multiple Reference Point Weak and Strong Composite Indicator methodology (MRP-WSCI) to build these composite measures from the same source data. As an example, we have shown the results obtained for the Spanish regions.
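The additive aggregation with equal weights described above can be sketched as follows. The example uses three indicators instead of 17 for brevity, and the region labels and normalised scores are hypothetical.

```python
import numpy as np

# Hypothetical normalised scores in [0, 1] (rows: regions, columns: indicators).
regions = ["A", "B", "C"]
scores = np.array([
    [0.8, 0.6, 0.7],
    [0.9, 0.1, 0.8],
    [0.5, 0.5, 0.2],
])

# Equal weights: every indicator contributes 1/m to the composite index.
n_indicators = scores.shape[1]
weights = np.full(n_indicators, 1.0 / n_indicators)

# Simple additive aggregation: weighted sum of the indicators of each region.
index = scores @ weights

# Ranking from best to worst composite value.
order = np.argsort(-index)
ranking = [regions[i] for i in order]
```

In this toy data set, the very low second indicator of region B is offset by its other two values, which is the full compensability that an additive rule allows.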
To the best of our knowledge, this is the first time that a reference point based methodology has been used to assess innovation performance, and our research has allowed us to answer the two research questions stated at the beginning of this paper:

1. How can decision makers provide preferential information making use of reference levels, depending upon the scope and aim of the composite innovation index?
2. How does the compensability of single indicators affect the overall ranking of regions when evaluating their innovation performance?
Regarding the first question, the results have shown that the use of reference levels on the single indicators allows decision makers to define performance intervals from the beginning. Given that the methodology allows several aggregation steps to be carried out, it is possible to monitor the position of each region at the indicator, dimension, type and global levels, and not only at the global level, as happens with the RII. Besides, the proposed graphical visualization tool makes it possible to get all this information for a given region at a glance. As an example, we have used the same relative criteria used by the RIS to define the performance intervals, but at the indicator level. In our opinion, this provides much richer information throughout the process and, besides, the results obtained at each step are easier to interpret. Another example has been provided where the reference levels are set taking into account only the values of the Spanish regions. With these two examples, we have shown that results may vary depending on the reference levels set. Therefore, potential users can decide which levels to use, based on their specific scopes and aims. As previously stated, the methodology makes it possible to clearly relate the results to the levels chosen, so users can easily interpret the meaning of the composite measures.
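The step-by-step monitoring at the indicator, dimension, type and global levels can be illustrated with a small recursive sketch. It assumes, for simplicity, an unweighted mean at every aggregation step. The labels follow the notation used in the paper, but the tree is only a fragment of the indicator system and the scores are hypothetical.

```python
from statistics import mean

# Hypothetical achievement scores of one region on a 0-4 scale,
# nested as type -> dimension -> single indicator.
region = {
    "I1": {"I11": {"I111": 2.0, "I112": 0.5}, "I12": {"I121": 3.0}},
    "I2": {"I21": {"I211": 3.5}, "I22": {"I221": 2.5}},
}

def aggregate(label, node, report):
    """Aggregate a node bottom-up and record the score obtained at every level."""
    if isinstance(node, dict):
        # Assumption: children are combined with an unweighted mean.
        score = mean(aggregate(key, child, report) for key, child in node.items())
    else:
        score = float(node)
    report[label] = score
    return score

report = {}
global_score = aggregate("global", region, report)

# Scores are available for every node, not only for the global index.
for label in ("I112", "I11", "I1", "global"):
    print(f"{label}: {report[label]:.4f}")
```

Keeping the intermediate scores in `report` is what makes it possible to trace a modest global value back to the dimension and indicator responsible for it.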
Regarding the second question, the results have shown the usefulness of obtaining composite measures with different degrees of compensability. While the fully compensatory scheme (weak) provides an overall performance assessment, the non-compensatory scheme (strong) provides users with warning signs that identify indicators, dimensions or types with a less satisfactory performance. The examples used in the paper have shown that this useful information may remain unnoticed when a compensatory scheme is used, as is the case in the RIS and in most of the existing methodologies. In our opinion, it is not a question of choosing between the weak and the strong composite indicators, but of considering them jointly, to make the most of the additional information provided.
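The contrast between the two schemes can be illustrated with a minimal sketch. It assumes, as a simplification, that the weak indicator is a weighted mean of the achievement scores and that the strong indicator is their minimum. The exact MRP-WSCI formulas are those defined earlier in the paper, and the regions and scores below are hypothetical.

```python
def weak_indicator(scores, weights=None):
    """Fully compensatory aggregation: weighted mean of the achievement scores."""
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)  # equal weights by default
    return sum(w * s for w, s in zip(weights, scores))

def strong_indicator(scores):
    """Non-compensatory aggregation (simplified): the worst achievement score."""
    return min(scores)

# Hypothetical achievement scores on a 0-4 scale for two regions.
region_x = [3.5, 3.0, 0.5, 3.2]  # strong overall, one very weak indicator
region_y = [2.4, 2.5, 2.6, 2.5]  # balanced profile

weak_x, weak_y = weak_indicator(region_x), weak_indicator(region_y)
strong_x, strong_y = strong_indicator(region_x), strong_indicator(region_y)

print(f"weak:   X={weak_x:.2f}  Y={weak_y:.2f}")
print(f"strong: X={strong_x:.2f}  Y={strong_y:.2f}")
```

Under the weak measure region X ranks above region Y, whereas under the strong measure the order is reversed, because the single poor score of region X can no longer be offset by its good ones.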
Because of these two features of the methodology proposed, we believe that it is a very useful tool for decision making processes. By setting the reference levels, decision makers can provide preferential information in an intuitive way, defining performance intervals for each indicator. By analyzing the weak composite measures, we can get an overall view of the performance of each region. Finally, by looking at the strong composite measures, policy makers can identify improvement actions for their regions.
The Spanish regions have been used for illustrative purposes, but this methodology is equally applicable to any other country, provided that the data are available. Besides, similar studies can be carried out at other scales, such as the organizational or sectoral level, possibly with a different system of indicators if this is deemed suitable. Finally, as a future research line, we would like to study the possibility of building web-based software that incorporates the methodological and visualization features, where potential users could define their reference levels and get the corresponding results.
Funding: This research has been partially supported by the Spanish Ministry of Economy and Competitiveness (Project ECO2016-76567-C4-4-R), by the Regional Government of Andalucía (research group SEJ-417), and by the ERDF funds (Project UMA18-FEDERJA-065).
Appendix A
Table A1. RIS description for the elementary criteria. Source: [50].

Indicator: R&D expenditures in the public sector as percentage of GDP (R&D expenditure public sector) (I 211 )
Numerator: All R&D expenditures in the government sector (GOVERD) and the higher education sector (HERD)
Denominator: Regional Gross Domestic Product

Indicator: R&D expenditure business sector (I 221 ), in the Firm investments (I 22 ) dimension
Numerator: All R&D expenditures in the business sector (BERD)
Denominator: Regional Gross Domestic Product

Indicator: Sales of new to market and new to firm innovations in SMEs as percentage of turnover (I 421 )
Numerator: Sum of total turnover of new or significantly improved products for SMEs
Denominator: Total turnover for SMEs