1. Introduction
The abundance of innovation metrics has had ambiguous effects. On the one hand, it has made it possible to measure different aspects of the innovation phenomenon (
Erdin and Çağlar 2023;
Taques et al. 2021), the innovation process (
Viso 2013), and the actors in this process (
Virkkala and Mariussen 2021). On the other hand, limitations in data comparability, coverage, and quality have restricted the extent to which this information could be used effectively in scientific research and government policy implementation (
Brenner and Broekel 2010;
Gericke 2013;
Reeb and Zhao 2020). Without high-quality information, studies of innovations and the innovation process risk biased results and inaccurate conclusions, limiting, complicating, and even hindering research in the area. The lack of comparable data prevents longitudinal or international comparisons, while differing concepts and methodologies hamper the pooling of data or the combination of different data sources. Insufficient and untimely data restrict policymakers from reacting adequately and promptly to problems arising in the development of innovations, can lead to wrong decisions and contradictory policies, and hamper innovation management at the national or transnational level. Policy interventions can be misdirected or miss important areas that need support; it becomes difficult to objectively assess the efficiency and efficacy of implemented measures; and the comprehensiveness and systematic character of the planning process suffer from limited data coverage or a lack of representativeness across economic activities or institutional units. Diverse, unstructured data can also harm business, making investment decisions costlier, preventing the proper diffusion of knowledge, and increasing the risk of missed opportunities. All this highlights the importance of high-quality data about innovations and their impact on economic and social development.
The increasing need for accurate and timely information about innovation characteristics, suitable for the needs of researchers, business and public administration, together with the limitations of the existing data sources, provokes strong and persistent interest in the methodological and practical issues of innovation measurement. In this regard, this article aims to help advance the research and methodology on the subject by reviewing the scientific development of innovation metrics and drawing conclusions regarding the systematization of the variety of metrics, their evaluation, and the possibilities for their complementation in terms of scope (both the statistical unit and the object/subject approach) and the quality and relevance of the data. The literature review focuses on studies that were at least partially methodological, suggesting new indicators and metrics, or modifying and upgrading existing ones.
The paper aims to make contributions in three directions:
First, to identify the main innovation metrics used in practice by official statistical institutions or researchers, systematizing similar ones and outlining their main attributes related to data quality and relevance.
Second, to develop a system of criteria that would allow evaluation of the innovation metrics in terms of their relevance to satisfy the needs of the users for high-quality, timely and adequate information for studying and managing the innovation processes.
Third, to assess the identified innovation metrics, present their limitations, and identify the gaps in the data coverage.
Section 2. of the paper reviews the existing research that presents methodology, broad areas for measurement, and concrete indicators of innovation.
Section 3. discusses the development of the instruments that facilitated the assessment process for the innovation metrics.
Section 4. presents the outcomes of implementing the proposed instrument for the identified metrics. The main implications and impact of the results are presented in
Section 5., while
Section 6. discusses the contributions, practical implications, limitations, and further research.
2. Literature Review
Recently, innovations have become a priority for corporate managers, national governments, and the EU. The use of different metrics not only contributed to the understanding of the innovation itself but was regarded as a management tool that supported R&D decisions, provided information on the strengths and weaknesses of the innovation activities, and assisted in the monitoring and evaluation of institutional policies.
To have a real impact on innovation policies both at the EU level and in individual Member States, an objective assessment of the state of innovation was needed, covering investments, results, and impact. The measurement of innovation was a subject of continuous interest among researchers. Scientific publications discussed the evaluation indicators, the application of different analytic techniques, and the data provided by the different surveys. The development of the indicators was closely linked with the proper conceptual frameworks and the identification of the key areas that describe not just the characteristics of the innovation object or the corresponding actors, but also the additional features concerning the efficiency, optimization and sustainability of the innovation process (
Banu 2018).
One of the first innovation measurement attempts was based on the number of patents (
Griliches 1990). The single indicator proved insufficient and inaccurate for the researchers (
Reeb and Zhao 2020). It addressed only one aspect of the outcomes of the innovation process, while there were other important areas linked with innovation inputs, actors, impact, and implementation or diffusion mechanisms. Its accuracy was also subject to debate, as not all patents were adopted in practice, and their economic benefits were not considered. In that regard,
McAleer and Slottje (
2005) developed an additional indicator—the patent success ratio (PSR), as a tool to track the success of the innovative activity, and
Ponta et al. (
2021) suggested the Innovation Patent Index (IPI) for quantitative insight into the various aspects of firms’ innovations.
The literature-based innovation output indicator (LBIOI) approach was related to the patent indicators. It used a literature review to assess innovation outcomes (
Coombs et al. 1996), and allowed the study of relationships specifically between innovation and performance, which was important for the promotion of innovation in the public sector (
Walker et al. 2002). This approach enabled the creation of longitudinal data sets, leading to improved communication in sharing good practice and the use of evidence in public policy, management and research (
Van der Panne 2007). It was also subject to various improvements, such as the Technology Impact Factor (TIF), which is modeled on the Journal Impact Factor (JIF) and used to evaluate journals in terms of practical innovation in connection with patents. The TIF examines the impact of journal articles on patents by dividing the number of patents citing a journal by the number of articles published in that particular journal. The aim is to measure the impact of academic publications on practical innovation (
Huang et al. 2014).
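The patents-per-article ratio described above is simple enough to express directly. The sketch below is only a minimal illustration of that calculation; the function name and the idea of a fixed counting window are assumptions, not part of the cited methodology.

```python
def technology_impact_factor(patents_citing_journal: int, articles_published: int) -> float:
    """Illustrative TIF-style ratio: patents citing a journal's articles,
    divided by the number of articles the journal published.
    The counting window and data source are left to the analyst."""
    if articles_published <= 0:
        raise ValueError("journal must have published at least one article")
    return patents_citing_journal / articles_published

# A journal with 120 published articles cited by 30 patents scores 30/120 = 0.25.
```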
To measure innovation performance, the World Intellectual Property Organization (WIPO) created the Global Innovation Index (GII). The index comprises around 80 indicators, including measures of the political environment, education, infrastructure and knowledge creation for each economy, and the results are published annually (
World Intellectual Property Organization 2024).
Erdin and Çağlar (
2023) used the input and output sub-indices of the GII in their input–output model. They constructed an innovation performance matrix that presented the relative positions of countries to support the exploration of countries’ strengths, weaknesses, and potential based simultaneously on innovation performance and productivity.
At the European level, the Summary Innovation Index (SII) has been developed and implemented (
European Commission: Directorate-General for Enterprise and Industry 2008). A tool for its visualization was the European Innovation Scoreboard (EIS), an initiative of the European Commission. The current version of the SII includes four sub-indicators, each with eight indicators, equally weighted, that characterize the dimensions of the innovation process—main drivers, investments, innovation activities and impacts (
European Commission: Directorate-General for Research and Innovation 2024). A variant of the EIS was the European Public Sector Innovation Scoreboard or EPSIS (
Sandor 2018). Its creation was aimed at improving the ability to compare the innovation results of the public sector.
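An equally weighted composite of the kind the SII represents can be sketched in a few lines. The example below is not the official SII methodology; the dimension names and the assumption that indicators are already normalized to [0, 1] are illustrative only.

```python
from statistics import mean

def summary_index(dimensions: dict[str, list[float]]) -> float:
    """Average the (already normalized) indicators within each dimension,
    then average the dimension scores with equal weights."""
    dimension_scores = [mean(indicators) for indicators in dimensions.values()]
    return mean(dimension_scores)

# Hypothetical economy with two normalized indicators per dimension:
example = {
    "framework_conditions": [0.6, 0.8],
    "investments": [0.5, 0.7],
    "innovation_activities": [0.9, 0.3],
    "impacts": [0.4, 0.6],
}
# summary_index(example) is about 0.6: each dimension contributes equally,
# so a single strong indicator cannot dominate the composite.
```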
The possibilities for assessing organizational innovation and the problems of using existing large sample surveys in European countries were analyzed by
Armbruster et al. (
2008). They recommended adapting the research methodology to the specifics of organizational innovation.
Nǎstase et al. (
2009) examined the different options for measuring innovation and their potential to complement each other for research and policy to stimulate innovation.
Gericke (
2013) identified four areas that were important in the management of innovation in the specific segment of small and medium-sized enterprises: the company’s innovation strategy; the internal innovation culture; the combination of internal research and development with open external sources; and the management of innovation cycles.
El Bassiti and Ajhoun (
2016) considered the assessment of the entire innovation process in three dimensions: Scale and Detail (“Innovation Granularity Scales”) related to the participants, new knowledge and the context of the innovation; Stages and Innovation Capacity (“Innovation Capability Stages”), related to the successive activities from the idea to the exploitation of the innovation; and “Innovation Maturity Levels”, characterizing the management of the innovation process.
To create a conceptual framework for the evaluation of innovations,
Gault (
2018) suggested using a systemic approach, including a general definition, a typology of innovations, innovation activities, and the institutional sectors, where innovations were implemented or their effects were manifested. The unification of definitions was a step towards the creation of adequate statistical information not only in the business sector but also in other institutional sectors, which would create conditions for completeness and comparability of data.
Serrano et al. (
2017) explored the problem of knowledge complementarity from different sources for product, process, organizational and commercial innovations. The indicators for innovation activities had three directions: internal (the performance of internal training and research and development activities), external (the acquisition of material and immaterial assets of an innovative nature, external training for innovation activities) and cooperation for the development of innovations.
Cirera and Muzi (
2020) analyzed the problem of information quality when conducting specialized surveys on innovation in enterprises. They found that in many cases of self-reporting there was a discrepancy in responses (a bias toward higher reported levels of innovation) caused by questionnaire formulation as well as cognitive problems. Similarly referring to input and output dimensions,
Taques et al. (
2021) analyzed 26 indicators of innovation grouped into three areas: input, intermediate, and output. They found that multidimensional indicators connected in a system gave better information about innovations than single ones, and were more comprehensive. Grouping the indicators in three fields: incoming (for the innovation potential); ongoing (on the status of innovation activities) and outcome (on the benefits of innovation),
Björk et al. (
2023) analyzed the measurement of innovation in firms. Based on 39 studies conducted over six years, they derived recommendations for adequately measuring the innovation efficiency of companies.
Ivanov and Avasilcai (
2014) performed a comparative analysis of four models in measuring innovation: “Balanced Scorecard”, “Malcolm Baldrige”, “Performance Prism” and “European Foundation for Quality Management”. They proposed a new model, uniting the characteristics of these models, in five main directions: Strategy, Processes, Leadership, Competencies, and Organizational Culture, of which Strategy was considered to be fundamental.
Lopes and Farinha (
2018) developed a metric of the innovation activity of innovation and business networks based on the conceptual model “multi-helix ecosystem for sustainable competitiveness”. The metric consists of four areas: network cooperation, economy area, social area, and environment. To study the structure and dynamics of innovation networks and synergistic effects,
Virkkala and Mariussen (
2021) generated by factor analysis four indicators: significance, expectations, experience and gap.
Li et al. (
2023) developed a composite index of innovation system development capacity based on three indicators: coordinated development capacity, evolutionary development capacity, and sustainable development capacity.
There were also numerous specialized indicators referring to various aspects of enterprise management or specific economic sectors. The measurement of innovativeness in services was carried out through service marks, which were used as an indicator to establish a relationship between services and innovation (
Schmoch 2003). Another way to measure innovativeness is the pairing approach, which provides the firm with information from which it can assess the quality of its innovation and its sustainability (
Shapiro 2006). TEPSIE was created as a tool for measuring social innovation (
Murray et al. 2010). The model was based on screening at different analytical levels that help define dimensions and individual indicators; the analytical levels were related to different stages of the innovation process. By combining different indicators and using multiple outputs and inputs,
Pan et al. (
2010) measured the differences in the effectiveness of the National Innovation Systems.
Brenner and Broekel (
2010) considered the problems in measuring and benchmarking innovation activities in a territorial aspect—between individual countries or regions. They analyzed four approaches: the overall impact of the territorial unit; the attractiveness of innovation; supporting innovation; and assessment of the impact of specific factors, and recommended that the measurement be carried out by individual industrial sectors and branches to adjust for regional characteristics.
Camison and Monfort (
2012) analyzed the problems in the individualized evaluation of innovations in the field of tourism. According to them, there were serious inaccuracies in the evaluation of the innovation activities of tourism companies, caused by the specific activities of the companies and their heterogeneous innovative behavior, which made international comparisons difficult. There was a need to complement existing studies with data for the technological and organizational innovation in tourism companies and their specific innovative capabilities.
Viso (
2013) considered the possibilities of measuring innovation not from a technological, but from a social point of view—as an action that takes place in a certain cultural environment with its specifics. A variety of actions and innovative combinations were possible, connected with the diversity of innovators in different aspects (professional, religious, linguistic, of different generations, gender, economic, organizational and product); the stability of the working environment, and the microclimate (lack of conflicts); the freedom of thought and decision-making and the availability of free time. The evaluation of innovation in the software industry was the subject of research by
Edison et al. (
2013). The authors combined a literature review with a survey of researchers and software developers to create a conceptual model of the main elements of innovation: determinants, inputs, outputs and performance.
Cruz-Cázares et al. (
2013) advocated the thesis of the relationship between innovation and efficiency. They monitored annual changes in technological innovation performance through the global Malmquist index, examining two inputs and two outputs of the innovation process. Problems in measuring innovation diffusion related to scale, time of diffusion, organizational demographics, publications, and collaboration were analyzed by
Nelson et al. (
2014). The authors suggested combining different metrics to achieve better results in measuring innovation diffusion.
Ivan and Despa (
2014) suggested the use of tools based on statistical indicators in measuring innovation in IT management, one of the main ones being the Innovation Factors Indicator (IFI). IFI was used to measure innovation in IT project management based on three aspects of software development—planning, research and development.
The management innovation (MI) measurement tool was used to assess the relationships between MI and the organization’s performance or its impact on technological innovativeness, and as a diagnostic tool for defining the management innovativeness of the company, as well as when comparing it with other organizations operating in the same industry (
Kraśnicka et al. 2017).
Iddris (
2016) examined the development of innovation capacity within supply chains based on a sample study and identified four dimensions of innovation capacity: idea management, idea implementation, collaboration, and learning. The conceptual framework for measuring innovation proposed by
Janger et al. (
2017) identified structural change and structural refinement as two key dimensions in both manufacturing and services. The modified metric was aimed at a narrowly defined high-tech understanding of innovation outcomes.
Gamito and Madureira (
2019) proposed a tool for measuring innovation introduced by organizations in rural areas—the Rural Innovation Indicator System (RIIS). This tool compared the performance of different organizations through a multidimensional set of indicators that monitored the innovative behavior of the diverse types of rural organizations. RIIS included the three main dimensions related to the stages of innovation: input resources; processes for achieving innovation; and innovation outputs.
The measurement of innovation openness was conducted through the ATOM (Aggregated Openness Measurement Technique) approach, which helped to characterize and measure innovation openness, based on the concepts of knowledge supply (KS) and innovation practice (IP). ATOM enabled the identification and measurement of the criticality of knowledge supplies, evaluation of the openness of the adopted innovation practices, and support for subsequent training in the way the project was managed (
Bellantuono et al. 2021).
Kalapouti et al. (
2020) examined the influence of patent applications, the level of development, the level of employment and the degree of technological diversity. The thesis of the authors was that the effectiveness of innovation was determined by the ratio of input and output resources, measured by the costs of research and development activity and human capital, and the technological knowledge diffusion that comes from the spatial and technological neighborhood.
Indicators used for other purposes could be adapted to analyze the performance of micro-level innovations. Such was the Balanced Scorecard (BSC), which was an indicator of performance and included the measurement of four main aspects of business: learning and growth, business processes, customers and finances. According to
Gama et al. (
2007), the BSC was a tool for measuring business results, but it could not measure their added value. The authors proposed using the innovation indicators from the BSC, organized in a system, precisely to measure that added value. A Business Intelligence dashboard could serve as a visualization tool that a company could create to present its results; the company would need to specify its goals in order to select appropriate key indicators and collect the necessary data for them, both from ad hoc surveys and from official sources. Using the capabilities of Business Intelligence dashboards,
Aimiuwu and Bapna (
2011) proposed an innovation index to track innovation capacity, which measured innovation by firm size, market power, motivation to innovate, availability of resources, as well as decentralization.
The literature review distinguished two important groups of indicators. The first group covered indicators that were concerned with and interpreted at the micro-level—in individual enterprises, reflecting the specific conditions for the emergence of innovations (product, process, organizational and marketing), as well as the innovation activities in the company that lead to innovations (
Aimiuwu and Bapna 2011;
Bellantuono et al. 2021;
Gama et al. 2007;
OECD/Eurostat 2018). These indicators were related to the place of innovation origin and reflected in detail the characteristics of the innovation itself, the participants in the innovation process, and its specific conditions and limitations. They were connected to technological, technical and administrative processes, the development of knowledge, and its sharing. Regarding the usefulness of the data, these indicators have the greatest cognitive importance. They can provide in-depth information about innovations and innovation activities, about potential and real difficulties, as well as about measures that would support the accelerated development of innovations and knowledge transfer (
Dziallas and Blind 2019). On the other hand, precisely because of the micro character of the individual indicators, they were dominated by specifics and concreteness, which made it difficult to establish general regularities and patterns, sometimes gave misplaced estimates and inaccuracies, and—due to the lack of unification—restricted implementation of comparative and dynamic analysis, planning, and implementation of a targeted policy to support innovation (
Ponta et al. 2021;
Taques et al. 2021). Furthermore, low-level indicators could not capture all the effects of innovation in firms, as they may have an impact on larger geographic, economic or institutional spheres (
Hoelscher et al. 2015).
In contrast, indicators at a high level of aggregation provided a generalized picture of innovation processes in a particular region, institutional sector or country. They were suitable for planning and reporting policies and strategies and could be extensively used for statistical and econometric analysis, as they meet the requirements for large data samples, information quality, completeness, and comparability (
OECD/Eurostat 2018). However, with them, a large part of the detailed, specialized information was lost, as the heterogeneous picture was presented in a small number of generalizing characteristics, which limited the possibilities for in-depth analysis.
Between these two extreme positions, there were a certain number of indicators that could be evaluated both at the micro and macroeconomic levels. This included indicators for patents, trademarks and citations, some financial indicators of companies, R&D expenditures, employed persons and their qualifications, and high-tech production. They provided information on the general state, structure and trends, allowing disaggregation to enrich the analysis and showing domain-specific regularities (
Erdin and Çağlar 2023;
Virkkala and Mariussen 2021). At the same time, some indicators were specific only to the relevant level of aggregation, group or territory, reflecting its specific features and its specific contribution to innovation development (
Banu 2018;
Gamito and Madureira 2019;
Gericke 2013). Sometimes, this contribution could not be established and allocated precisely in the lower levels and was lost (neutralized) at a higher level of aggregation.
The process of gathering information on innovation at the firm level was accompanied by problems related to low comparability, especially internationally; narrow coverage of the population, covering only part of the companies or economic activities; subjectivity of answers; lack of or limited representation; extended periods of observation and processing of the results; and limited scope, since the firm is observed, not the innovation (
Rammer and Es-Sadki 2023). Big data offered new possibilities to solve some of those problems and collect accurate and reliable data about innovations in enterprises (
Kinne and Axenbeck 2020). They offered certain advantages over traditional data collection: timeliness, low costs, better coverage, good accessibility, and flexibility; at the same time, some limitations existed, related to information bias, unknown accuracy and consistency, outdated information, language problems, and the need for interpretation (
Rammer and Es-Sadki 2023).
Big data differed from traditional data collection methods, as they rely on data generated for other purposes and can produce results in a short time. The data are not structured, and their processing requires new, advanced methods and computational power. Developments in information technologies and in the techniques of data mining, text mining, machine learning, web crawling, web content analysis, labelling, keyword search and the application of artificial intelligence facilitated the use of big data as an additional source of information on a firm’s innovation output. While big data have the potential to revolutionize data collection not only for innovations but in many areas, and present opportunities to reach a large segment of the population, they are restricted to the data published on the internet. They are useful as a supplement to official data, providing additional aspects and details for innovation at the company level (
Gök et al. 2015), and could be combined with other information in databases (
Ashouri et al. 2024) and official surveys (
Daas and van der Doef 2020;
Kinne and Lenz 2021), even if some results were controversial (
Nelhans 2020).
Further development of big data potential requires advances both in statistical methodologies for data collection and in information technologies for processing unstructured data, while showing potential to enhance firm-level innovation measurement and to support new indicators of knowledge sharing and diffusion. For this, several conditions must be satisfied: established coverage, bias correction, validation/reliability, transparency, and comparability (named guiding principles by
Rammer and Es-Sadki 2023).
The main conclusion from the literature review was related to the variety of forms of innovation metrics (a list of metrics is presented in
Appendix A). They combined individual, quantitative, qualitative, absolute and relative indicators, as well as statistical and non-statistical indicators, either in a list or connected in a system with a corresponding structure and a methodology for weighting and summarizing them into a composite indicator. Assessing the adequacy, applicability and suitability of the diverse and heterogeneous innovation metrics presented was a complex task. Solving it required, on the one hand, the creation of a classification scheme for their grouping and summarization, and on the other, the development of a system of criteria through which to establish both the statistical and cognitive characteristics of individual innovation metrics.
3. Methodology
Information on the individual indicators or groups of indicators developed, proposed and used in practice, called innovation metrics in this study, was collected through bibliographic research of scientific publications in the specialized Scopus and Web of Knowledge databases. These reference databases were chosen because they cover a large number of journal titles: Scopus included more than 47,000 in 2024, and Web of Knowledge more than 25,000. A sizeable share of those titles were open-access journals, about 7900 in Scopus and about 6500 in Web of Knowledge. Papers in both databases underwent double-blind peer review, warranting high-quality standards. Both databases have strong coverage of science, technology, and medicine, with a wide range of subjects closely connected with innovations and innovation measurement. Their search engines allowed elaborate search conditions across abstracts, titles, author names, keywords, and citations.
The process of publication selection for the study of the innovation metrics was implemented in two steps. First, a basic keyword search was performed (date: 8 April 2024) using the query “(Innovation OR Innovations) AND (Measurement OR Measure OR Indicators)” in the Author Keywords field. The Web of Knowledge results were restricted by the “Open Access” and “Business Economics” criteria and produced 211 documents. The Scopus results were restricted by the “Open Access” and “Business, Management and Accounting” criteria, producing 223 documents. The second step was manual screening to eliminate the duplication of papers that existed in both databases and to identify the studies relevant to the current research. The papers chosen had to be at least partially methodological, presenting new concrete indicators or a broad scheme that could be operationalized with indicators, or suggesting modifications and refinements of existing indicators or metrics of innovation activities. Papers that covered only empirical analysis of existing indicators and metrics were excluded from the review, as were papers that discussed indicators concerning the technical or technological aspects of particular innovations. Papers published before 2000 were also excluded, with two notable exceptions—a
Griliches (
1990) paper for patent indicator, and a
Coombs et al. (
1996) paper for literature-based innovation output indicator. The manual screening produced a total of 53 papers for the literature review. In addition, the latest methodological issues of the World Intellectual Property Organization for Global Innovation Index (
World Intellectual Property Organization 2024), European Commission for European Innovation Scoreboard (
European Commission: Directorate-General for Research and Innovation 2024), and OECD/Eurostat Oslo Manual (
OECD/Eurostat 2018) were also included in the review, as they presented important conceptual notes about the innovation measurement process.
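The deduplication step of the manual screening can be illustrated programmatically. The sketch below is an assumption about how records might be matched (by DOI when available, otherwise by normalized title); the actual screening described above was performed manually.

```python
def deduplicate(scopus: list[dict], wos: list[dict]) -> list[dict]:
    """Merge two bibliographic result lists, keeping one record per paper.
    A record is keyed by its DOI when present, otherwise by its
    stripped, lowercased title (a simplification of real-world matching)."""
    seen: set[str] = set()
    merged: list[dict] = []
    for record in scopus + wos:
        key = record.get("doi") or record["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged

# Two overlapping result sets yield three unique records:
scopus_hits = [{"doi": "10.1000/x", "title": "Paper A"}, {"title": "Paper B"}]
wos_hits = [{"doi": "10.1000/x", "title": "Paper A"}, {"title": "Paper C"}]
# len(deduplicate(scopus_hits, wos_hits)) == 3
```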
The large number of innovation metrics required, first, systematization and structuring, and second, the development of evaluation criteria and a subsequent comparative analysis.
Systematization was performed during the literature review: indicators or metrics that share common elements, cover the same aspects of innovation, or use the same source data were combined into one metric (e.g., patent indicators, innovation management indicators). Structuring of the innovation metrics was achieved by outlining their important features or attributes. Some classification features could be found in the Oslo Manual, specifically the object/subject approach; qualitative/quantitative measurement; survey type; data sources and purpose; aggregation; and measurement dimensions and hierarchy. In addition, three more aspects of innovation metrics were examined in the current study: data coverage, variable type, and variable scaling.
The measurement object attribute answers the question “What was measured?”, with possible outcomes ‘object’ when the innovation’s features were observed, and ‘subject’ when the actor’s characteristics were measured. The measurement type answers the question “How was it measured?”, with possible outcomes ‘qualitative’, ‘quantitative’ or ‘mixed’. The survey type determined the data collection target, with ‘census’ used for surveys that encompassed all units, ‘representative’ for random sample surveys, and ‘case study’ for all others. The case study survey type covers situations where indicators were developed only for a limited number of subjects, for a particular innovation type, or for selected units in a specific territory or economic activity; case studies were thus neither complete nor representative methods of data collection. The data source types were defined as ‘official’, ‘unofficial’ and ‘mixed’, respectively covering data supplied by official institutions (statistical or others), data from unofficial sources (such as big data or projects), and indicators that used a combination of both. The data purpose was connected with the sources, as official institutions usually collect data for administrative needs, while businesses collect data for commercial purposes; when both were used in indicator construction and evaluation, the ‘mixed’ category was chosen. The data aggregation attribute was linked with the question “What was the level of aggregation?”, with two possible outcomes: ‘individual’ and ‘summarized’. Individual indicators were computed at low levels (key performance indicators, R&D expenditures), while summarized indicators were computed at higher (regional or national) levels of aggregation.
Some of the indicators could be used at both levels; in the process of structuring, such metrics were classified based on their implementation: when the indicators or scheme were developed as a tool for enterprise innovation management, they were considered individual; when they were developed for the evaluation of national innovation, they were considered summarized.
The data coverage feature depended on the inclusion of all subjects in the target population. When all units were included, data coverage was complete; when only part of the units were included, data coverage was partial. This attribute was connected with, but not identical to, the survey type, as some census surveys excluded units when defining their target population (e.g., CIS 2018 excluded small firms with fewer than 10 employed persons, following Commission Regulation No 995/2012).
The types of variables used as indicators were defined as ‘absolute’ when absolute values were used, ‘relative’ when indicators were evaluated as shares, ratios, or percentages, and ‘mixed’ when both types were used in the metrics. The scaling of the variables was an important feature, as absolute variables often increased with the size of the subject and were therefore not suitable for comparative analysis. Scaled variables were transformed in some way and were comparable across different subjects, regions or countries.
The measurement dimensions determined whether the indicator assessed only one aspect of the innovations (‘single’) or many (‘multiple’). When multiple indicators were weighted and combined into one summary variable, the metrics were considered ‘synthetic’. The presence of an indicator structure was connected with the multiple-type metrics, when they consisted of groups, subgroups and individual indicators. When metrics covered several aspects but the indicators were not arranged in a hierarchy, the metrics were classified as a ‘list’.
In total, 11 attributes were included in the proposed classification scheme. The general form of the classification scheme and the individual groups are presented in
Appendix B.
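The scheme above lends itself to a simple data representation. The sketch below, in Python, encodes the 11 attributes with categories as named in the text; the validation helper, the example metric, and its assigned values are hypothetical illustrations, not data from the study.

```python
# Hypothetical sketch of the 11-attribute classification scheme described above.
# Attribute names and categories follow the text; the example metric and its
# values are illustrative assumptions, not the paper's classification.

CLASSIFICATION_SCHEME = {
    "measurement_object": {"object", "subject", "both"},
    "measurement_type": {"qualitative", "quantitative", "mixed"},
    "survey_type": {"census", "representative", "case study"},
    "data_source": {"official", "unofficial", "mixed"},
    "data_purpose": {"administrative", "commercial", "mixed"},
    "data_aggregation": {"individual", "summarized"},
    "data_coverage": {"complete", "partial"},
    "variable_type": {"absolute", "relative", "mixed"},
    "variable_scaling": {"adjusted", "unadjusted"},
    "measurement_dimensions": {"single", "multiple", "synthetic"},
    "indicator_structure": {"system", "list"},
}

def classify(metric: dict) -> dict:
    """Validate a metric description against the scheme and return it."""
    for attribute, value in metric.items():
        allowed = CLASSIFICATION_SCHEME.get(attribute)
        if allowed is None:
            raise KeyError(f"Unknown attribute: {attribute}")
        if value not in allowed:
            raise ValueError(f"{attribute}: {value!r} not in {sorted(allowed)}")
    return metric

# Illustrative entry (the values are assumptions, for demonstration only):
example = classify({
    "measurement_object": "both",
    "measurement_type": "quantitative",
    "survey_type": "census",
    "data_source": "official",
    "data_purpose": "administrative",
    "data_aggregation": "summarized",
    "data_coverage": "partial",
    "variable_type": "mixed",
    "variable_scaling": "adjusted",
    "measurement_dimensions": "synthetic",
    "indicator_structure": "system",
})
```

A dictionary of category sets keeps the scheme extensible: adding a twelfth attribute, should future work require one, is a one-line change.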
The criteria for evaluating the indicators were based broadly on the six criteria of the Oslo Manual. Some of those criteria were too abstract or generalized (‘
Serve the needs of actual and potential users’,
OECD/Eurostat 2018, p. 215) and needed contextualization related to the nature of the innovations, their interaction with the environment, implementation, diffusion and impact. The importance of theoretical background for innovation data quality was considered in the manual (
OECD/Eurostat 2018, p. 187), with its key role in data collection: survey planning, the definition of units, scope and variables, and the development of the questionnaire. In the present study, other important theoretical aspects were included, concerning data processing, the interpretation of results, and the connection with innovation practice and other scientific disciplines, forming a criteria group of their own.
Thus, seven groups of criteria were formed, and in each group, additional operationalized criteria were established, logically derived from scientific research on the issues of innovation measurement. The groups were as follows: Relevance, Accuracy, Reliability, Timeliness, Accessibility and Clarity, Comparability and Coherence, and Theoretical/Conceptual Soundness.
Relevance could be interpreted as thematic relevance, i.e., coverage of the inputs, processes and mechanisms, outputs, impact, and sources of innovation. In addition, one could consider target relevance, when the purpose of the derived indicators was to carry out a subsequent analysis of factor influences, relationships and dependencies (
Roszko-Wójtowicz and Białek 2016;
Walker et al. 2002), or to augment the measurement of input and output with an estimate of their ratio, i.e., efficiency (
Cruz-Cázares et al. 2013). Thirdly, a specific relevance could be indicated when separate but important aspects of the innovation activity were taken into account, such as the achievement of positive results (
McAleer and Slottje 2005), sustainability (
Shapiro 2006), conditions and cultural features of the environment (
Viso 2013), system connections and learning (
Manzini 2015), structural changes and upgrading (
Janger et al. 2017), synergy (
Virkkala and Mariussen 2021), and open innovation (
Carrasco-Carvajal et al. 2022).
The criterion of Reliability was related to obtaining stable data with little noise (
OECD/Eurostat 2018), and to obtaining identical results in repeated observations as well as in observations of a different group of units. Data from different sources must complement each other successfully (
Serrano et al. 2017). Samples in partial surveys should be representative (
Iddris 2016).
The Timeliness criterion considered the speed of reporting and publication of results, which increased the possibilities for, and the usefulness of, indicators in monitoring and decision-making (
OECD/Eurostat 2018). The simplicity of the computational procedures and the speed of obtaining primary data also improved the timeliness of information (
Ponta et al. 2021).
Accessibility and Clarity were linked to the comprehensibility of indicators. They should be easy to interpret and have developed metadata and interpretation guidelines (
OECD/Eurostat 2018). An additional benefit would be the availability of ready-made data from past periods or collected for other purposes, but also suitable for calculating innovation indicators (
Nagaoka et al. 2010).
Conceptual/theoretical soundness was based on the creation of a conceptual model and the adoption of common, standardized and unambiguous definitions (
Edison et al. 2013) to define, on the one hand, the object of the study: definitions not only of product innovation but also of the other types (technological, process, and organizational innovations). On the other hand, the entities at the different levels of aggregation must also be clearly defined. At the micro level, these were the individual enterprises, scientific organizations, and academic institutions; at the macro level, it was necessary to adopt the already created unified definitions of the institutional sectors, so as to connect the metric with the indicators of the system of national accounts (
Gault 2018).
The system of indicators should be flexible, allowing for the addition of new indicators (
Nǎstase et al. 2009) to reflect new, or existing but newly acquired, aspects of innovation and/or its actors. At the same time, the system of indicators must be balanced with respect to the individual measurement directions (
Stek and van Geenhuizen 2015), thus giving a correct assessment of the innovation processes, without shifting the focus onto one aspect or another merely because it was easier to measure or happened to be on the economic policy agenda at a particular time.
The system of indicators should be aligned with not only theory, but also with the innovation practice (
Hoelscher et al. 2015), thereby accounting for and supporting the validation of good practices in knowledge sharing and innovation implementation. The system must also correctly reflect innovations in their regional aspect (
Brenner and Broekel 2010), allowing one to distinguish innovations that arose in a certain region from those that diffused into it, as well as to distinguish the part of the beneficial effects that arises in a region after the innovation from the overall benefits to firms operating across multiple regions.
Unification of the operationalization of innovation was needed to eliminate subjective perception in surveys (
Cirera and Muzi 2020). This would facilitate the use of standardized questionnaires with qualitative indicators in different countries and regions, as they would be perceived in the same way, and responses would be processed more easily.
Based on the seven criteria groups, the specific requirements for the individual indicators and indicator systems, a criteria checklist was developed (
Appendix C), which enabled an objective assessment of the specific innovation metrics.
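The checklist-based assessment can be illustrated with a minimal sketch: each metric receives a boolean tick for every operationalized criterion, the criteria are grouped into the seven groups named above, and the total number of ticks gives the metric's score. The group names follow the text; the individual criterion labels and tick values below are hypothetical.

```python
# Minimal sketch of checklist scoring. The seven group names follow the text;
# the individual criteria and tick values here are illustrative assumptions
# (the actual checklist in Appendix C contains 32 criteria in total).

CRITERIA_GROUPS = [
    "Relevance", "Accuracy", "Reliability", "Timeliness",
    "Accessibility and Clarity", "Comparability and Coherence",
    "Theoretical/Conceptual Soundness",
]

def score(checklist):
    """Total number of satisfied criteria across all groups."""
    return sum(tick for group in checklist.values() for tick in group.values())

# Hypothetical assessment of one metric, two example criteria per group:
assessment = {
    "Relevance": {"thematic": True, "target": True},
    "Accuracy": {"objectivity": True, "bias detection": False},
    "Reliability": {"stable results": True, "data integration": False},
    "Timeliness": {"simple recording": True, "timely delivery": True},
    "Accessibility and Clarity": {"metadata": False, "ready-made data": True},
    "Comparability and Coherence": {"over time": True, "across levels": False},
    "Theoretical/Conceptual Soundness": {"common definitions": True, "flexibility": True},
}
print(score(assessment))  # → 10
```

Because Python treats `True` as 1 in a sum, the score is simply the count of ticked criteria, which matches the ranking procedure used in the Results section.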
4. Results
The performed literature review and the derived list of the metrics presented in the publications with their main characteristics were subject to classification and assessment. The classification described the various external features of metrics, i.e., characterized their formal side. The criteria checklist, on the other hand, described their nature, content and measurement qualities.
The full classification of the 23 metrics is presented in
Appendix D, and the summary is presented in
Table 1. The attribute concerned with the target of measurement (“Measuring Object/Subject”) showed that the largest group of metrics covered both the object (the innovation itself) and the subject (the innovation actors). Twelve metrics related to only one aspect of the innovation process: six served to measure the properties of innovations, and six measured the characteristics of the participants. According to the “Qualitative/Quantitative Measurement” feature, the largest share of metrics (11) offered only a numerical result, i.e., a quantification of innovations by calculating some indicator. Verbal or mixed presentation of the measurement result was found for three and nine metrics, respectively. It can be summarized that most of the metrics produced a quantitative result, which is a prerequisite for comparability over time and between participants.
In “Survey Type”, the census was prevalent (used by 16 metrics), while for the other seven, the survey was conducted for specific cases and units. Under “Data Sources”, eight metrics used official data sources, while data for 15 metrics were collected from unofficial sources, which might pose an issue for the reliability of the results and the conclusions that could be drawn from them. The metrics under the attribute “Data Purpose” were distributed among three groups. For 15 metrics, measurement data were collected for various business needs, while only four used data collected for administrative purposes. In four cases, the data originated from both administrative and business sources. Most of the data used for the metrics thus came from business sources. This could affect their scientific soundness and could limit the conclusions that follow from the applied measurement to only one economic unit, industry, or region. The “Data Aggregation” and “Data Coverage” attributes showed a balanced division of the metrics. Half of them were summarized, and half used individual data. Half offered full coverage, while the other half covered part of the target population.
In 17 cases, both absolute and relative values were used under the “Variable Type” attribute. This meant that, by applying the appropriate descriptive variables, the various aspects of innovation were more fully captured. The application of only absolute or only relative variables was limited to two and four metrics, respectively. For the “Scaling” attribute, most metrics (17) were not adjusted for population size or properties. Adjusted variables (per person, per sq. km, in %) were used primarily in the composite indicators (six metrics). According to the “Measurement Dimensions” feature, the complex nature of the innovations was captured by a group of indicators, which provided for the study of different innovation aspects (19 metrics), and in six of those, a composite indicator was provided. Only four metrics measured a single aspect. According to the “Indicator Structure” attribute, in 18 metrics the indicators were characterized by structure and subordination, i.e., they were organized into a system. Only five metrics consisted of individual indicators or a list of unconnected indicators.
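A summary of this kind can be produced by tallying the category frequencies of each attribute across the classified metrics. A minimal sketch, assuming each metric is stored as a dict of attribute/category pairs (the three sample records are hypothetical, not entries from Appendix D):

```python
from collections import Counter

# Hypothetical classified metrics; in the study there are 23 (Appendix D).
classified = [
    {"survey_type": "census", "data_source": "official"},
    {"survey_type": "census", "data_source": "unofficial"},
    {"survey_type": "case study", "data_source": "unofficial"},
]

def summarize(metrics, attribute):
    """Frequency of each category of one attribute, as in a summary table row."""
    return Counter(m[attribute] for m in metrics)

print(summarize(classified, "survey_type"))
# Counter({'census': 2, 'case study': 1})
```

Running `summarize` once per attribute reproduces the whole frequency summary from the full list of classified metrics.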
The assessment of the metrics was performed via the criteria checklist. The full results are presented in
Appendix E. In
Table 2 a summary is shown.
The Relevance of the analyzed metrics revealed that all of them were serving some purpose for the potential users, as they covered at least one but usually more aspects of the innovation process. Eighteen metrics were found suitable for analyzing the factors influencing innovation and the relationships and dependencies due to the prevailing quantitative nature of the indicators. In about half of the metrics (13), all important topics were covered—the conditions for implementation as input, the innovation activities as process, the actors, and the results/impacts as output. Less attention was paid to the specific aspects reflecting efficiency, effectiveness, and synergistic effects of innovations, while open innovations, sustainability, cultural elements, and structural changes were neglected.
From the point of view of the “Accuracy” criterion, it could be noted that most metrics provided objectivity and validity of assessment. The detection and correction of over- or underestimation of actual achievement were less often provided for, owing to neglect of possible heterogeneity in the data. As far as “Reliability” was concerned, only half of the metrics produced stable results or were suitable for data integration. Representative samples were used only in official data collection, and only for some of the indicators, calling the reliability of the others into question, especially when they were based upon a small number of case studies. The metrics performed better on the “Timeliness” criterion, where half of them were simple and easy to record, while two-thirds were delivered to users on time.
Most of the metrics possessed neither guides nor metadata, but this was partially compensated for by data being readily available for long periods or large populations. The lack of metadata and of a common methodology was reflected in the metrics’ performance on the “Comparability and Coherence” criterion. Comparability over time was the easiest to achieve, but fewer than half (nine) allowed for various levels of aggregation, and even fewer (five) could focus on important groups like SMEs or rural areas.
An established logical relationship between the constituent indicators (15 metrics) and the availability of a flexible system for adding new indicators (13 metrics), both characteristics under the “Theoretical Soundness” criterion, gave researchers useful guidance for the development and improvement of the methodologies used. The metrics were consistent with theory and with the innovation policy implemented at various levels, and the majority were based on commonly accepted definitions and concepts, which made them relevant and understandable to a wide range of users and facilitated their implementation. What was lacking, however, was balance in the indicator systems, ease of perception and operationalization, and, above all, a connection with regional development.
Based on the criteria results for the analyzed metrics, we ranked them in descending order by the total number of characteristics possessed by each. The highest score was 26, obtained by the European Innovation Scoreboard. Second place was occupied by the Global Innovation Index. Given that the maximum possible score was 32, no metric offered a complete and comprehensive assessment of innovation. Based on their scores, we divided all 23 metrics into three groups (
Table 3).
In the first group, we put all metrics with a score of 16 or above, i.e., those that met at least half of the criteria considered. In the third group, we singled out nine metrics with a score of 10 or below. All but one of them lacked scores in two or more criteria groups, making them the least suitable for measuring innovation. The middle group included seven metrics with scores between 11 and 15. They required refinement in terms of accuracy of assessment, accessibility of interpretation, and consistency across various levels or groups. In summary, the official innovation metrics performed better against the criteria requirements, while the specialized or broad models lacked concretization, theoretical foundation, efficient and reliable data collection, and coherence.
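The three-tier grouping can be sketched as a simple thresholding of the checklist scores (maximum 32). The boundaries below follow the description in the text (at least half of the criteria for the first group, a score of 10 or below for the third); apart from the reported top score of 26, the example scores are hypothetical.

```python
# Sketch of the three-tier grouping by checklist score (maximum possible = 32).
# Tier boundaries follow the paper's description; only the score of 26 for the
# European Innovation Scoreboard is reported in the text, the others are
# hypothetical placeholders.

def tier(score: int) -> int:
    """Assign a metric to one of the three groups described in the text."""
    if score >= 16:   # first group: at least half of the 32 criteria
        return 1
    if score >= 11:   # middle group: scores between 11 and 15
        return 2
    return 3          # third group: score of 10 or below

print(tier(26))  # European Innovation Scoreboard → 1
print(tier(12))  # hypothetical mid-range score   → 2
print(tier(8))   # hypothetical low score         → 3
```

A score of exactly 16 (half of the criteria) lands in the first group under this reading, which keeps the three groups exhaustive over the 0 to 32 range.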
5. Discussion
Innovation is among the most important sources of competitive advantage, recognized by researchers as a powerful means of realizing polyvalent outcomes and multidirectional benefits. Alongside the broad discussions on the intrinsic aspects of innovation, the scientific community and business have been concerned with the possibilities of measuring and evaluating innovations and, on this basis, using them as a benchmark for gaining market position, a gauge of the scope of impact, a tool for subsequent competitive struggle, etc. The literature review showed sustained interest, decades of effort, and numerous attempts to capture the complex and multivariate nature of innovation; to incorporate certain aspects of it into evaluation systems of indicators at different levels; to model relationships, dependencies and evaluation mechanisms; and to provide scientific explanations and obtain empirical evidence on why and how innovation drove the development of economies and contributed to societal well-being. In just over a century (since 1912), scientific thought has come a long way from Schumpeter’s creative destruction and the thesis that innovations grow as ever-expanding clusters that bring benefits and impacts and provoke further innovations and changes, to the modern complex, multidimensional evaluation systems of metrics that seek to measure and prove the importance and power of innovation as an economic category, technological lever and engine of growth.
Innovations were studied as a process from a temporal point of view, as statistical phenomena, and, in the last decade, attempts have also been made to evaluate them as a key competence characteristic integrated into the structure of a certain object, system, or network. Researchers used various metrics to define their performance characteristics, parameters, properties, components, etc. Their multi-dimensional and complex nature and wide range of performance indicators presupposed the immense variety of metrics used. This diversity of metrics was also provoked by different research interests, specific perspectives, and target orientations. There was a collective understanding that innovations could not, or could hardly, be measured and evaluated by one universal indicator (even an integrative one), since an in-depth evaluation covered aspects that were too diverse (
Erdin and Çağlar 2023;
Taques et al. 2021). Despite this weakness, their observation produced objective outcome metrics and provided valuable information on the potential of innovation output (e.g., patent output and efficiency data) to provide benefits, impact competitiveness at all hierarchical levels, and indicate the capacity of a given structure, system, or country to develop on a technological basis. On this foundation, prognostic assessments could be indicatively made for the possibilities of generating subsequent innovation and value flows, processes, and trends.
The complexity of innovation processes and the multi-component nature of innovation results complicated the systematization and classification of the ever-increasing variety of innovation results. To distinguish, measure and report innovations, not only were the attributes and features of individual types of innovation used, but clusterings of innovations and approaches to their classification (oriented to the individual, the structure, interactions, and innovation systems) were also introduced. A conceptual and etymological level of abstraction was reached in denoting innovation varieties, i.e., the systematization of innovation objects based on a two-level classifier, distinguishing the basic attributes for classifying the innovation objects and the typological concepts according to those attributes (
Gault 2018). This made it difficult to develop a universal metrication system: on the one hand, single indicators were unreliable, and on the other, it was difficult for selected single indicators to distinguish between several types of innovation. There was an identifiable need for common standards and for the development of metrics based on quantitative indicators. In practice, such metrics prevailed, but it was also essential to assess states, structures, and processes that were difficult to measure quantitatively. In such cases, it was recommended to use, or supplement with, qualitative indicators (for example, evaluation of sources of information and innovation, application of specific mechanisms, consideration of psychological aspects, etc.).
There was a certain contradiction, or duality, related to the fact that the dominant part of innovations was carried out at the micro level, the level of a separate entity (enterprise, organization, institution), to which the relevant evaluation metrics were applied. At the same time, many of the indicators used in statistical observations referred to macro-level measurements. Abstractly speaking, regardless of the size or scope of the object within which the innovations were carried out, the widespread use of the systems approach allowed these primary objects to be identified as small systems, to which it was appropriate to apply the relevant system metric. However, a critical point was the correct construction of the assessment methodologies at a higher hierarchical level (economic sector, region, country, or a wider system). When incorporating the primary data, it was necessary to aggregate them structurally and cumulatively into reliable, generalizing evaluation results that allowed the broadest possible comparability of state and dynamics. A key point was also the correct choice of the main thematic structural units, the primary objects, and the statistical units whose opinion and behavior would be analyzed, so that the evaluations would bring useful information with the potential to be used as effectively as possible at the managerial and prognostic level.
In practice, both simplified models for metrification as well as complex and wide-ranging systems have been established. They underwent their natural development, given the need to maintain a certain correspondence with the needs of science and practice for the maximum degree of measurability, comparability, and permissible scope, to cover the major areas of evaluation and to give an objective view, explanations, and evidence for certain statements. Regardless of the efforts to upgrade and enrich, there were also certain criticisms of their imperfections, limitations, gaps, etc. For example, SII was not a good enough measure of innovation performance—its value increased even if the innovation output resulting from additional input was zero (
Edquist et al. 2018). Some authors used the constituent SII indicators, most often analyzed through DEA, and likewise found overestimation (
Cirera and Muzi 2020;
Roszko-Wójtowicz and Białek 2016;
Sandor 2018;
Taques et al. 2021).
For the modern innovation process, the emphasis was placed on characteristics related to increased risk and the need for its diversification, the need for significant investments (of course, depending on the level of innovation radicality) and the efforts of many people. Since the beginning of the 21st century, the Open era had begun, in which innovations were increasingly the result of collaboration between different entities, whose formal boundaries were becoming increasingly “blurred” (
Carrasco-Carvajal et al. 2022). Ever-increasing competition has put significant pressure on the intensification of processes based on the sharing of time, resources, activities, efforts, and risk. The wide development of information technologies provided an opportunity to accelerate the network approach’s implementation (
Virkkala and Mariussen 2021). In this context, researchers’ interest in measuring and evaluating networked innovation processes, the contribution of individual participants, and the mechanisms of interrelationships and interactions that lead to successful innovation was growing. Specialized software has increasingly been used to help with metrication and assessment, and the inclusion of artificial intelligence to improve them was also being considered.
The strong connection between the several types of participants in innovation processes and the open nature of their interactions directed the focus to efforts to identify opportunities more clearly and to assess objectively how suitable existing metrics were for evaluating innovation networks. Not all could be used; not all that could be used would provide a reliable, objective way of measuring and evaluating. However, there were positive indications that, based on existing metrics, and with the help of information technology and artificial intelligence, improvements could be made, or existing metrics could be combined in different ways, to reach the desired level of reliability, suitability and objectivity of the metrication and related evaluation processes.
6. Conclusions
The innovation metrics were an important part of monitoring the innovation process and of policy development, implementation, and assessment, as well as of establishing a rich database for scientific research and business management. To successfully comply with the needs of all users, innovation metrics have to satisfy certain conditions regarding their reliability, accuracy, timeliness, and relevance.
The extensive literature review performed in the paper led to several results concerning the advancement of scientific research on the topic of innovation measurement. The findings can be summarized in the following directions. First, the exploration of the scientific papers found in Scopus and Web of Knowledge allowed the identification of 23 innovation metrics, ranging from broad outlines of areas for measurement to elaborated hierarchical systems with operationalized concrete variables, and from single indicators that measured only the output aspect of the innovation process to multidimensional schemes that covered inputs, outputs, actors, conditions, impacts and interactions with the environment. Second, the literature survey helped to produce two instruments that facilitated the analysis of the innovation metrics: a classification scheme with 11 attributes that allowed each metric to be described in detail, and a criteria checklist of seven criteria groups that allowed metrics to be evaluated against the needs of the users (relevance, timeliness, accessibility and clarity, comparability and coherence) and data quality (accuracy, reliability, theoretical soundness).
The practical implications of the produced instruments were connected with the possibility of other users or researchers applying them when assessing existing or newly created metrics, including at the stages of their design and development. Such analysis could help improve the metrics and avoid the deficiencies arising from conceptual problems or insufficient information for important areas. The performed ranking of the 23 metrics identified in the literature could be used by policymakers or business managers when choosing the appropriate indicators to suit their needs, and avoiding metrics that were either undeveloped, narrow, unreliable, or untimely. The ranking of the metrics also revealed several gaps existing in the majority of them, connected with the cultural environment, sustainability, open innovations, structural changes, and regional development. Those deficiencies need to be addressed in the future development of methodology and indicators to enhance the usefulness of the innovation metrics.
When interpreting the results of the study, several limitations should be considered. The selection of publications for the literature review was limited by the choice of databases, keywords, time span, open access, and subject area. It produced many publications, yet the narrow search criteria could have missed some important research and the corresponding innovation indicators or metrics. Additional exploration of publications in other scientific databases, with more keywords and broader subject areas, is advised in order to find more indicators and metrics and to cover methodological issues, especially the possibilities presented by big data, panel surveys, and their combination. At the same time, the structuring of the innovation metrics was performed from a somewhat statistically inclined point of view. Other classifications are also possible, from a functional point of view or in connection with the stages of the innovation process. Exploring the metrics from different sides would allow a better understanding of innovation measurement and enhance the clarity and usefulness of the collected data. The generated criteria checklist was limited to the features and needs identified in the literature review. More research in the area, including pilot surveys with representatives from different institutional sectors, could shed more light on the potential needs of academics, businesses and policymakers. The ranking of the 23 identified metrics should also be considered with caution, as it was based on the subjective judgment of the authors. More research in the field of innovation indicators and metrics could help refine the criteria and decrease the level of subjectivity.
The processing of the metrics through the instruments revealed that the Summary Innovation Index and the Global Innovation Index were the best metrics at the moment, although they too showed some deficiencies, particularly the absence of information about the mechanisms and sources, that is, about how innovations happen. On the other hand, the business metrics, like the Adapted Business Scorecard Model or Business Intelligence Dashboards, presented some individual data about innovations at the firm level but lacked aggregation and representativeness for the sector, region, or country as a whole. The solution to this problem is not an easy one and would require further effort from business, administration, academic circles and statistical organizations.