Article

Research on Risk Measurement Methods of Scientific and Technological Innovation: A Dynamic Tension Model Based on Novelty and Adaptation

1 Business School, Shandong University of Technology, Zibo 255000, China
2 Max Planck Institute for Solid State Research, Heisenbergstraße 1, 70569 Stuttgart, Germany
3 National Science Library (Chengdu), Chinese Academy of Sciences, Chengdu 610213, China
4 Institute of Science and Technology Information, Beijing Academy of Science and Technology, Beijing 100044, China
* Author to whom correspondence should be addressed.
Systems 2026, 14(2), 142; https://doi.org/10.3390/systems14020142
Submission received: 26 December 2025 / Revised: 22 January 2026 / Accepted: 27 January 2026 / Published: 29 January 2026
(This article belongs to the Section Complex Systems and Cybernetics)

Abstract

Grounded in knowledge recombination theory and innovation tension theory, this study develops a novel measurement framework for scientific and technological innovation (STI) risks that captures the dynamic and systemic equilibrium between novelty and adaptation. We first analyze the endogenous mechanisms through which STI risks emerge from knowledge recombination processes, and then propose a classification framework for knowledge recombination, along with quantifiable metrics for novelty and adaptation. Next, we introduce a risk classification system for STI and corresponding quantitative evaluation metrics, facilitating dynamic monitoring of innovation risk states. Finally, we validate the framework through an empirical case study in natural language processing (NLP). Our results reveal a persistent innovation tension within the STI system between novelty and adaptation. Emerging phrases and reinforced phrases demonstrate distinct risk profiles and distribution patterns, corresponding to differentiated structural and evolutionary regimes. These differences stem from their distinct mechanisms in the novelty–adaptation interaction within a complex innovation system. Specifically, in emerging phrases, novelty shows a stable positive linear correlation with Z-score, while adaptation exhibits a significant negative linear correlation with Z-score. In reinforced phrases, novelty displays a significant bimodal association with Z-score, and adaptation demonstrates a robust inverted U-shaped relationship with Z-score. Emerging knowledge combinations show significantly higher risk scores than reinforced combinations, with high-novelty–low-adaptation combinations consistently in the highest risk quantile across stages. Moreover, the risk threshold for emerging phrases increases monotonically across developmental phases. Thus, our framework advances innovation risk assessment from static categorization to dynamic, system-level evaluation, enabling tiered risk management and optimized resource allocation for high-potential innovation pathways.

1. Introduction

The rapid advancement of disruptive technologies (e.g., artificial intelligence, blockchain, and biotechnology) has transformed global economic systems, social frameworks, and even fundamental cognitive processes. However, societal attitudes remain ambivalent—while embracing technological breakthroughs’ transformative potential, concerns persist regarding R&D uncertainties and failure risks. This dichotomy between innovation’s transformative capacity and inherent risks has elevated technology risk management to a global priority. From a systems perspective, scientific and technological innovation (STI) constitutes a complex process characterized by nonlinear interactions, cumulative uncertainty, and endogenous risk emergence. The core challenge of STI risks stems from their inherent uncertainty and elevated probability of failure [1]. Cutting-edge research faces multiple fundamental challenges from its earliest phases, including difficulties in theoretical verification, unpredictable technological development pathways, and the complex requirements of cross-disciplinary integration [2,3]. High-risk research projects, frequently dismissed as “impossible”, often produce paradigm-shifting results [4]. Nevertheless, a substantial number of breakthrough innovation initiatives are abandoned owing to intractable technical barriers or unsuccessful interdisciplinary convergence [5]. Consequently, genuinely transformative research frequently encounters funding challenges owing to its inherent high-risk nature. To address this challenge, it becomes imperative to elucidate the systemic mechanisms underlying STI risk generation, while developing quantitative risk assessment methodologies and implementing a tiered risk management framework with dynamic adjustment capabilities. Such an approach would significantly contribute to optimal resource allocation by preventing technological path dependency and substantially improving research productivity.
Current scholarly work has established a fundamental attribute of technological risk—its inherent association with innovation novelty [4,6]. Novelty functions as both a primary evaluation metric for scientific advancements and a crucial factor influencing decisions about research funding distribution, academic recognition, and career prospects [7]. The quest for novelty fundamentally constitutes an investigation into uncharted territory, a process that intrinsically generates uncertainty and thereby elevates innovation risk [8]. Within the research process framework, novelty arises through the innovative synthesis of established conceptual paradigms and material elements [9]. From a systems standpoint, this synthesis can be understood as a knowledge recombination process occurring within a complex innovation system, primarily through either radical conceptual leaps or cross-disciplinary integration. Both pathways require the transcendence of traditional disciplinary boundaries and the deep integration of heterogeneous knowledge domains. Nevertheless, although such unconventional innovation strategies may generate transformative breakthroughs [10], they exhibit unique risk profiles characterized by elevated probabilities of technological trajectory miscalibrations and increased vulnerability to systemic integration failures.
This paper argues that high novelty itself does not necessarily lead to risk; rather, risk emerges only when the uncertainty brought by novelty exceeds the capacity of systemic adaptation. In other words, innovation risk is rooted in the imbalance of “innovation tension” arising from the continuous interaction between “novelty” and “adaptation.” As illustrated in Figure 1, novelty and adaptation constitute the two core dimensions of tension in the innovation process. On one hand, the pursuit of novelty entails actively breaking through existing cognitive and technological boundaries, which often results in unclear theoretical mechanisms, highly uncertain technical paths, and unpredictable application prospects, thereby significantly elevating uncertainty levels. On the other hand, adaptation reflects the ability to resolve critical bottlenecks, evolve synchronously with existing knowledge and technology, and propose acceptable new research paradigms; its core function is to mitigate and absorb uncertainty, ensuring that innovation can survive and grow within the existing scientific, technological, and institutional ecosystems.
The specific manifestation of innovation risk depends on the dynamic equilibrium of this tension. “High novelty-high adaptation” represents the ideal breakthrough innovation model, where robust adaptation effectively neutralizes the immense uncertainty brought by high novelty, rendering the risk controllable. However, in practice, this ideal state is relatively rare. More common is the “high novelty-low adaptation” combination, in which the high uncertainty introduced by novelty cannot be offset by commensurate adaptation, leading to a severe imbalance in tension and an extremely high risk of innovation failure. Conversely, “low novelty-low adaptation” signifies a state of stagnation characterized by low risk and low impact. Therefore, high novelty itself is not a sufficient condition for risk. A case in point is AlphaFold, which not only pioneered the application of deep learning to resolve the long-standing protein folding challenge, but also developed a sequence-structure-function analytical framework that seamlessly integrated with existing structural biology paradigms [11]. Conversely, innovations exhibiting limited adaptive capacity often face stagnation when encountering theoretical incompatibilities with established frameworks or insurmountable technological obstacles. The case of copper oxide superconductors illustrates this phenomenon: while their high-temperature superconducting characteristics represented a major advance, their theoretical understanding remains constrained by both incompatibility with conventional BCS theory [12,13] and the absence of viable alternative explanatory frameworks, resulting in predominantly empirical-level progress. This fundamental tension between novelty and adaptation represents the core risk-generation mechanism underlying STI processes. Although global research has advanced in STI risk identification, investigations into the fundamental origins and systemic mechanisms of risk emergence remain insufficient.
Accordingly, this study investigates the etiological sources and fundamental mechanisms underlying scientific and technological innovation (STI) risks. Employing novelty and adaptation as key dimensions, we develop measurement methodologies to establish a comprehensive risk classification and assessment framework for STI.

2. Related Research

2.1. Measurement Methodologies for STI Risks

Risk quantification in scientific and technological innovation (STI) represents a fundamental challenge in innovation management studies. Existing measurement approaches predominantly adopt two distinct paradigms. The first paradigm relies on subjective expert evaluations, drawing either on empirical estimations, pessimistic projections, realist assessments, and minimax regret analysis to calculate innovation success probabilities and potential losses, or on survey instruments and peer evaluation systems for risk appraisal [14]. Although operationally tractable, these approaches encounter valid criticisms concerning their capacity to objectively capture genuine risk magnitudes given the essential uncertainty and systemic complexity inherent to technological innovation processes [15]. The second paradigm employs quantitative analytical methods, utilizing either probabilistic modeling techniques (e.g., Bayesian networks) and multi-criteria decision frameworks (e.g., AHP), or multidimensional assessment systems incorporating technical viability, organizational capacity, financial stability, and policy alignment.
While enabling quantitative risk assessment, current methodologies remain constrained by their heavy reliance on subjective expert judgments and a narrow focus on externally observable parameters like technological maturity, market volatility, and environmental factors. More fundamentally, these approaches lack the capacity to uncover the inherent properties and underlying causal processes that characterize scientific and technological innovation risks. Furthermore, prevailing methodologies conceptualize innovation risks as static phenomena, thereby neglecting the inherently dynamic and evolutionary characteristics of innovation processes. Emerging research has introduced bibliometric measures of recombinant innovation as proxy indicators for scientific risk assessment [16,17], representing a methodological advancement in the field. Nevertheless, these unidimensional measures demonstrate limited capacity to fully characterize the multifaceted evolutionary dynamics inherent to innovation risk systems.

2.2. The Endogenous Mechanisms Underlying STI Risk Formation

2.2.1. The Intrinsic Link Between Novelty and Innovation Risk

The strong correlation between research novelty and associated risks has been empirically established across diverse scientific disciplines [6]. Novelty encompasses both the pursuit of innovative solutions and their inherent uncertainties and associated risks [16,18]. First, idea novelty correlates positively with implementation uncertainties (including usability, practicality, error-proneness, and reliability) [19], consequently elevating rejection probability. Second, championing novel ideas involves inherent risks, including implementation failure, social resistance, and indeterminate development timelines [19]. This uncertainty simultaneously increases evaluators’ cognitive load [20] while expanding risk exposure through unforeseeable outcomes.
Nerkar [21] established knowledge recombination as a fundamental mechanism for generating novelty. Building upon this theoretical foundation, we employ knowledge recombination theory to systematically analyze risk profiles across different innovation modalities. Local recombination operates within constrained knowledge boundaries, typically yielding incremental innovations but risking knowledge lock-in effects that may constrain future breakthrough potential [22]. In contrast, long-jump recombination integrates distant knowledge domains, increasing both breakthrough potential and associated risks. Radical knowledge combinations may face lower domain acceptance when bridging insufficiently compatible knowledge systems, potentially diminishing academic impact; they may also end in innovation failure or suboptimal performance, while introducing measurable risks and safety concerns [23]. Highly novel knowledge or technologies frequently experience delayed recognition due to challenges in immediate value assessment [24]. Delayed recognition often originates from either paradigm resistance in established research traditions or the necessary temporal diffusion process for cross-disciplinary validation [25]. This theoretical framework establishes a systematic relationship between novelty magnitude and corresponding risk exposure, suggesting that novelty indicators constitute reliable predictors of innovation risk in STI research.

2.2.2. Adaptation-Level Risk Assessment in STI

The trajectory of STI inherently involves evolving risk-uncertainty dynamics, with successful outcomes contingent upon adaptive capacity [26]. Adaptation, as a fundamental interdisciplinary construct, originates from Charles Darwin’s evolutionary theory in biological sciences [27]. Within biological sciences, this construct originally characterized evolutionary adaptation mechanisms involving heritable variation and selective pressures. Herbert Spencer’s [28] “survival of the fittest” principle exemplifies this concept. However, the scope of adaptation has since transcended biological contexts, extending to complex systems research. In artificial intelligence and social organizations, for instance, systems achieve self-optimization through non-evolutionary mechanisms—such as machine learning, algorithmic refinement, and innovation—without dependence on genetic inheritance. Such dynamic responsiveness embodies the fundamental principles of adaptive systems [29].
This study conceptualizes adaptation in technological innovation as a system’s capacity for self-regulation when confronting key challenges, including theoretical verification, technology trajectory selection, value proposition evaluation, and critical bottleneck resolution. This adaptive capacity operates as a technological selection mechanism, where highly adaptable innovations demonstrate enhanced environmental responsiveness, significantly decreasing obsolescence risks. Conversely, innovations exhibiting low adaptive capacity demonstrate significantly higher failure probabilities through primary mechanisms: technological path dependence, market non-adoption, and ecosystem incompatibility. Case studies demonstrate that low-adaptation innovations often carry significant risks. The cold fusion experiments conducted by Fleischmann and Pons [30] ultimately failed to replicate conventional nuclear fusion conditions, primarily because they lacked both a robust theoretical foundation and validated experimental protocols. In contrast, Einstein’s [31] theory of relativity addressed the limitations of Newtonian mechanics in high-speed and strong gravitational fields, introducing new research questions such as spacetime curvature and black holes, thereby gaining acceptance within the scientific community. These observations reveal that technological innovation adaptation operates through two principal mechanisms: problem-solving breakthroughs and paradigm-establishing potential through novel research trajectories.
Disruptive innovation arises through the complex interplay between polyrhythmic synchronization, diachronic evolution, and strategic asynchronicity [32,33]. Successful technological innovation typically depends on the coordinated convergence of three factors: epistemic development, technological maturation, and market readiness—an intrinsically adaptive process. The evolution of quantum mechanics exemplifies this pattern—Planck’s [34] quantum hypothesis was initially met with skepticism as it fundamentally challenged classical physical paradigms. However, the field attained dynamic synchronization through Einstein’s [35] photon theory and related advances, manifested in: experimental confirmation (photoelectric effect), new theoretical formalisms (matrix/wave mechanics), and paradigm acceptance—culminating in quantum theory’s transition from heterodoxy to orthodoxy. The collapse of the Superconducting Super Collider project exemplifies systemic misalignment among technical feasibility, fiscal sustainability, and political commitment. These failures demonstrate that successful innovation must exceed a critical adaptation threshold [29], where infrastructural, economic, and institutional developments co-evolve with technological advancement to resolve innovation paradoxes.
In summary, novelty and adaptation constitute two fundamental characteristics of STI risks. Lane’s research [36] suggests that the risk of a project may arise from its novelty, feasibility, or a combination of both. This conclusion indicates that risk can be measured based on its intrinsic properties. Therefore, the following sections will analyze the intrinsic characteristics of the two core dimensions, novelty and adaptation, to provide a theoretical foundation for constructing a risk measurement indicator system for technological innovation.

2.3. Analyzing the Intrinsic Properties of Novelty and Adaptability in STI Risks

2.3.1. Novelty

The formation mechanism of novelty operates through three interdependent dimensions: knowledge recombination, knowledge variation, and interdisciplinary integration [18,37]. Among these dimensions, knowledge recombination represents the central mechanism for innovation generation, which can be classified into two fundamental categories according to recombination scope [21,38]. Local recombination drives incremental innovation by optimizing intra-domain knowledge combinations, and radical recombination enables breakthrough innovation through cross-disciplinary knowledge integration [22,37,39,40,41]. This mechanism aligns with knowledge distance theory, where proximal knowledge recombination typically generates incremental innovations, whereas distal knowledge integration more frequently induces radical innovations.
Knowledge variation, as a fundamental innovation mechanism, generates transformative knowledge through imperfect replication in technological diffusion and paradigmatic expansion of conceptual frameworks. Interdisciplinary integration and knowledge variation create heterogeneous knowledge bases that enable recombination. This process enhances knowledge diversity while inducing qualitative transformations in existing knowledge structures [42] through the destabilization of entrenched technological pathways. High-knowledge-span innovations exhibit particularly pronounced creative transformations of heterogeneous knowledge, constituting a focal point in current innovation studies [43].
Current methods for measuring knowledge recombination novelty fall into two broad categories. The first, citation-scarcity metrics, evaluates novelty based on rare journal or reference combinations [10,24,25,44]; however, such approaches fail to account for variations in citation motivations [45] or the semantic content of the texts [46]. The second, semantic-similarity-based approaches such as text similarity analysis [47] and link prediction models [18], typically employs semantic similarity as the sole evaluation metric. These methods inadequately capture the evolutionary dynamics of knowledge networks, both intra- and inter-domain, consequently limiting their ability to detect radical innovations generated through cross-disciplinary knowledge recombination.

2.3.2. Adaptation

In complex network theory, adaptation is viewed as the potential ability or intrinsic quality of a node to attract connections, reflecting the node’s importance within the network [29,48,49]. Unlike the degree-based preferential attachment model, the adaptation model does not rely on a node’s knowledge of the global network structure. Instead, it determines a node’s ability to attract connections based on some observable features of the node itself [49]. Adaptation reflects the idea that “the higher the quality, the easier it is to connect,” rather than “the higher the degree, the easier it is to connect” [50]. Adaptation is typically modeled with various statistical distributions (such as the Gaussian, Laplace, and generalized normal distributions) that describe the probability of connections between nodes [48]; this diversity and plasticity allow flexible modeling based on specific system characteristics.
Adaptation models can be divided into static and dynamic types. Compared to static models, dynamic adaptation models better reflect the evolutionary characteristics of real networks: as the network gradually increases in nodes and edges over time, the adaptation of a node determines its probability of forming new connections [49]. Adaptation not only reflects the tendency of nodes to acquire connections but also, to some extent, embodies the robustness of their local neighborhoods to failures [50]. Dynamic models not only generate power-law degree distributions [51] but also better capture the vulnerability and heterogeneity characteristics within the network.
In the field of technological innovation, adaptation is a key factor that determines whether an innovation can evolve from the nascent stage to widespread application. It can be understood as the potential of a topic term or technology to attract new connections in the future. Evolutionary theory emphasizes that survival and reproductive ability form the basis of adaptation [26]; complex network theory defines adaptation as the ability of a node to attract new connections (or edges), which depends on the node’s absolute quality (intrinsic characteristics), rather than its initial number of connections [50,52]; technological innovation theory focuses on the innovation’s survival potential and its effectiveness in interacting with the environment. Innovations with insufficient adaptation are often replaced because they fail to effectively solve existing problems or propose new research questions. On the other hand, innovations with high adaptation not only attract more resources but also continue to drive development. For example, Aspembitova et al. [53] classified high-adaptation nodes into two categories: one that generates a large number of innovative activities in the short term and then exits the system, typically seen as speculative participants; and another that can exist long-term and continuously drive innovation, representing influential innovation entities. This indicates that adaptation not only determines whether a node can attract more connections but also closely relates to its sustained contribution within the system. In the specific context of a knowledge network, such adaptation is concretely manifested through three dimensions: (1) local association strength and the formation of “research consensus,” whereby the internal relationships among elements within a specific knowledge combination exhibit relative advantages at the local level; (2) global structural influence and hub positioning, as highly adaptive nodes tend to possess higher eigenvector centrality, indicating that they are connected not only to many nodes but, more importantly, to other core nodes; and (3) dynamic evolution and cross-boundary connectivity potential, reflected in a node’s ability to respond to changes in the knowledge environment and establish connections with emerging themes. While existing studies reveal the multidimensional characteristics of adaptation, there is still a lack of measurement for adaptation in the context of technological innovation, highlighting the urgent need to develop quantitative indicators based on its intrinsic characteristics.
In summary, the current literature emphasizes risk characterization and external determinants, while underrepresenting measurement methodologies for endogenous risk generation mechanisms. Novelty metrics inadequately address dynamic cross-domain knowledge interactions, and adaptation quantification in technological innovation lacks systematic approaches. This study consequently develops a novel dual-dimensional framework to elucidate intrinsic STI risk formation pathways and establish dynamic evaluation metrics for risk evolution analysis.

3. Research Framework and Methods

Our analytical framework comprises four interconnected modules (Figure 2). Module 1 focuses on the theoretical foundation by examining the core attributes of novelty and adaptation, thereby establishing the conceptual basis for metric development. Module 2 develops the structural framework by constructing the knowledge network architecture and detecting topic term recombination patterns, which enables differentiation between two distinct recombination types. Module 3 involves indicator construction, quantifying novelty and adaptation to assess STI risks. Here, adaptation is characterized by eigenvector centrality and co-occurrence frequency, while novelty is measured by topic similarity, cosine distance, and inter-topic distance. Module 4 is the empirical analysis, employing a macro-meso-micro hierarchical analytical framework. Specifically, the macro level categorizes domain development phases, the meso level tracks topic evolution paths, and the micro level calculates the novelty and adaptation of innovative phrases. Based on this, a Z index is constructed, while a validity analysis verifies the scientific rigor and reliability of the methodology. The four modules are executed sequentially, ultimately enabling a multidimensional analysis of STI risks.

3.1. Theoretical Framework: A Dimensional Analysis of STI Risk

Novelty and adaptation are two key dimensions for measuring technological innovation [6,26], and together they determine the risk characteristics and development trajectory of innovation activities. From a macro perspective, technological innovation presents a complex network structure with multiple intersecting paths, including exploring unknown domains, interdisciplinary integration, and technological breakthroughs (as shown in Figure 3). In this process, new nodes and connections continuously emerge, while redundant and low-adaptability nodes gradually exit; however, some nodes may become active again. Across these paths, the survival of innovation depends on whether it can achieve adaptation based on novelty. Figure 3 illustrates the three main paths from innovation generation to final survival.
The first mechanism is synchronicity, which refers to the coordinated development of theory, technology, and validation. Innovation needs to achieve a technological breakthrough and gain recognition from the academic community at the right time to sustain progress. Without these conditions, even innovations with high novelty may stagnate due to a lack of validation and support (as shown by A3 in Figure 3).
The second mechanism is the breakthrough of technological bottlenecks. When innovation can effectively address long-standing technological or theoretical challenges, it demonstrates strong survival and evolutionary capabilities. For example, GPT-based models have overcome the limitations of traditional natural language processing in long-range context modeling, coherence generation, and scalability [54], significantly improving technical performance and redefining the possibilities of human–computer interaction, thus reflecting high adaptation.
The third mechanism is the proposal of new research questions. These innovations do not directly solve practical problems but expand cognitive boundaries and stimulate new academic issues, gradually building a new knowledge system. Take the gene-editing technology CRISPR, for example; it not only solved the problem of targeted gene modification but also triggered a series of new scientific questions regarding ethics, off-target effects, and regulatory mechanisms [55], driving a shift in research paradigms. Such innovations are often overlooked but play a key role in scientific progress.
Beyond these three mechanisms, many innovative attempts still fail to survive. These aborted paths (such as D and E in Figure 3) reflect the risks caused by insufficient adaptation in the innovation process. For example, some concepts may be phased out because their theoretical ideas were ahead of their time, key technologies had not yet achieved breakthroughs, or academic consensus was lacking. A typical real-world case is early artificial intelligence research, which repeatedly fell into developmental troughs due to limitations in computing power and data resources. These “failure to adapt” processes essentially represent the inherent risks in technological innovation. In other words, risk is the external manifestation of the tension between novelty and adaptation: the higher the novelty and the greater the breakthrough, the more difficult it is to adapt to the environment and the greater the uncertainty; whereas a lack of adaptation makes innovation more likely to fail in its evolution. Therefore, the risk of technological innovation fundamentally arises from the internal tension between novelty and adaptation, and its measurement requires careful consideration of the dynamic balance between the two.
From the perspective of intrinsic mechanisms, novelty primarily results from the recombination of knowledge at varying distances, while adaptation is determined by the inherent characteristics of the innovation. Based on this theoretical framework, this study will construct a novelty indicator by quantifying knowledge distance and assessing adaptability through node features, thereby enabling a systematic evaluation of STI risks.

3.2. Framework Construction

Theoretical research indicates that there are two fundamentally differentiated innovation paths in the process of technological evolution: one is exploratory innovation, which creates new technological possibilities through distant knowledge recombination; the other is exploitative innovation, which achieves incremental optimization through localized knowledge recombination [56]. Based on the significant differences in knowledge characteristics between these two innovation paths, this paper classifies knowledge recombination patterns into two types: emerging phrases and reinforced phrases. This classification helps in comparing the potential differences in risk distribution and technology lifecycle between the two, while also revealing the differentiated mechanisms of action of these recombination patterns in the innovation process. To identify these two innovation patterns, this paper uses a complex knowledge network as the analytical foundation.
First, the entire research cycle is divided into four continuous phases ($P_1$, $P_2$, $P_3$, and $P_4$), as shown in Figure 4. Second, topic terms, as the basic units of knowledge, efficiently condense the core content of events and provide structured knowledge representation, making them suitable for characterizing the fundamental objects of knowledge recombination. Then, for the topic term extraction method, this paper adopts the BERTopic model, which, based on pre-trained language representations, offers higher clustering accuracy and efficiency compared to traditional topic models (e.g., LDA).
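To make the extraction step concrete, the following is a minimal sketch of per-phase topic modeling with BERTopic, assuming the corpus has already been grouped by phase; the variable names are illustrative, and only top_n_words = 10 follows the setting reported in Section 4.3:

```python
# Minimal sketch: per-phase topic-term extraction with BERTopic.
# `phase_docs` (hypothetical) maps phase labels to lists of preprocessed
# title+abstract strings; only top_n_words = 10 follows the paper (Section 4.3).
from bertopic import BERTopic

def extract_phase_topics(phase_docs):
    phase_topics = {}
    for phase, docs in phase_docs.items():
        model = BERTopic(top_n_words=10)       # 10 topic terms per topic
        topics, _ = model.fit_transform(docs)
        # get_topic(t) returns (term, weight) pairs; topic -1 collects outliers
        phase_topics[phase] = {
            t: [term for term, _ in model.get_topic(t)]
            for t in set(topics) if t != -1
        }
    return phase_topics
```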
On this basis, this paper constructs a cross-phase co-occurrence network to depict the dynamic evolution characteristics of the knowledge structure. Co-occurrence refers to the simultaneous appearance of two or more topic terms within the same context (e.g., a research article), and quantifies the strength of associations between knowledge units. To provide an intuitive representation, Figure 4 illustrates how co-occurrence is visualized in the network: the nodes represent topic terms, while the edges denote topic term combinations (e.g., edge a (B, H) represents a combination formed by topic terms B and H). The thickness of the edges reflects the co-occurrence frequency, with thicker edges indicating higher frequencies. Building on this representation, this paper identifies three types of co-occurrence relations based on the topic terms extracted by the BERTopic model in each phase, namely: (1) intra-topic co-occurrence (e.g., g, h in Topic 3 of Figure 4), revealing the internal structure of a single topic; (2) inter-topic co-occurrence within a phase (e.g., the connection c between Topic 2 and Topic 3), exploring associations between different topics; (3) cross-phase co-occurrence (e.g., the connections d, e, f between phases $P_2$ and $P_3$), tracking the dynamic evolution of knowledge. By comparing the topic term co-occurrence networks of adjacent development phases $P_{t-1}$ and $P_t$ (t = 2, 3, 4), emerging phrases and reinforced phrases are identified.
Emerging phrases are defined as topic term co-occurrence connections that were not present in the previous phase but first appear during phase transitions. Specifically, for adjacent phases $P_{t-1}$ and $P_t$, three types of topic term connections formed in phase $P_t$ are identified: intra-topic connections (e.g., i and k in Figure 4), cross-topic connections (e.g., j), and cross-phase connections (e.g., d, e, f). These are then compared with the intra-topic connections (g) and cross-topic connections (c) already present in phase $P_{t-1}$. New connections that do not appear in phase $P_{t-1}$ are identified as emerging phrases. Based on their combinatory characteristics, emerging phrases are classified into two types: recombination (new combinations of existing topic terms, e.g., b and e in Figure 4) and variation (combinations formed by introducing a new topic term, e.g., a and d).
Reinforced phrases are co-occurrence combinations that are significantly enhanced within existing relationships. These are topic term combinations that co-occur in phase $P_{t-1}$ with low frequency, but whose co-occurrence frequency significantly increases in phase $P_t$ (e.g., c, f).
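As a concrete illustration of this adjacent-phase comparison, the sketch below classifies term pairs into emerging and reinforced phrases; the input dictionaries are hypothetical, and the low/high cut-offs stand in for the phase-specific thresholds reported later in Table 1:

```python
# Sketch: classify topic term pairs by comparing adjacent-phase co-occurrence
# networks. `cooc_prev` and `cooc_curr` (hypothetical) map an unordered pair,
# e.g. frozenset({"neural", "translation"}), to its co-occurrence frequency in
# P_{t-1} and P_t; `low`/`high` stand in for the phase-specific thresholds.
def classify_phrases(cooc_prev, cooc_curr, low=10, high=20):
    emerging, reinforced = set(), set()
    for pair, freq in cooc_curr.items():
        if pair not in cooc_prev:
            emerging.add(pair)                  # first appears in P_t: emerging phrase
        elif cooc_prev[pair] < low and freq >= high:
            reinforced.add(pair)                # weak in P_{t-1}, consolidated in P_t
    return emerging, reinforced
```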
The rationale behind these threshold choices is to ensure alignment with the theoretical concepts being measured. For ‘emerging phrases,’ which signify exploratory innovation, we deliberately imposed no minimum co-occurrence frequency threshold, prioritizing structural novelty over scale to avoid systematically omitting nascent but potentially high-impact combinations. For ‘reinforced phrases,’ indicative of exploitative innovation, we focused on relative growth in co-occurrence strength rather than absolute values to circumvent biases from inter-phase scale differences.
By distinguishing these two knowledge recombination patterns, this study provides a theoretical basis and analytical framework for comparing their differences in novelty, adaptation, risk characteristics, and dynamic mechanisms within the technology life cycle. This enables the effective identification of the dynamic evolution of STI risks.

3.3. Indicator Construction

3.3.1. Novelty Indicators

This paper draws on several studies in the construction of novelty indicators. The CD index proposed by Funk et al. [57] quantifies the dynamism of technological change by measuring knowledge distance in citation networks [6], but due to its reliance on citation data, it cannot be directly applied to text-based topic word analysis. Therefore, this paper refers to the cluster connection model (such as jumps, new integrations, new bridges, etc.) proposed by Foster [4], clustering topic terms into different topics and analyzing both intra-topic and inter-topic connections. Additionally, it incorporates Kaplan’s [22] theory of local recombination and long-range jumps, emphasizing the role of distant knowledge recombination in disruptive innovation.
In addition, novelty can be measured from two perspectives: proximity and frequency. The former quantifies novelty by calculating the distance between a given idea and existing ideas, while the latter measures its novelty by counting the frequency of occurrence of a particular idea or its components [58]. Bavato’s [59] research emphasizes that when operationalizing novelty, it is necessary to more clearly integrate the temporal dimension.
Based on this concept, this paper constructs the novelty indicator from both the proximity and frequency perspectives: in the proximity dimension, the semantic similarity between topic terms and the semantic distance between topics are used to characterize the diversity of knowledge combinations (Formulas (1)–(3)); in the frequency dimension, emerging phrases and reinforced phrases are identified to reflect the dynamic intensity of knowledge recombination. At the same time, this paper introduces the time division of development phases ($P_1$–$P_4$), integrating these two perspectives with the temporal dynamics of knowledge evolution, thereby providing a more comprehensive measure of novelty.
The intra-topic novelty metric $N(T)_1$ is formally defined as:

$$N(T)_1 = \frac{1}{\left(\sum_{i=1,\,j=1}^{i=m,\,j=n} S_{i,j}\right)\big/ M}\,\left(1 - S_{i,j}\right) \tag{1}$$
where $i$ and $j$ represent topic terms, $S_{i,j}$ denotes the cosine similarity of the phrase pair $(i, j)$, and $M$ indicates the total number of possible intra-topic term combinations. The denominator term $\left(\sum_{i=1,\,j=1}^{i=m,\,j=n} S_{i,j}\right)/M$ reflects the overall semantic coherence of the topic, where higher values indicate more similar knowledge combinations within the topic. The multiplier term $(1 - S_{i,j})$ measures the semantic heterogeneity of specific phrase pairs. When the overall intra-topic similarity is high while the similarity between specific phrase pairs is low, such combinations are considered to possess high innovation value. This formula simultaneously accounts for both the stability of intra-topic knowledge structures and the potential for unconventional combinations.
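Under the reconstruction of Formula (1) above, a minimal computation of $N(T)_1$ for a single phrase pair might look as follows; the similarity dictionary is a hypothetical stand-in for cosine similarities produced by any sentence-embedding model:

```python
# Sketch of Formula (1): intra-topic novelty for one phrase pair.
# `intra_sims` (hypothetical) maps each of the M intra-topic pairs to its
# cosine similarity S_ij; the embedding model producing them is not specified.
def intra_topic_novelty(intra_sims, s_ij):
    m = len(intra_sims)                       # M = C(10, 2) = 45 for 10 terms per topic
    coherence = sum(intra_sims.values()) / m  # (sum of S_ij) / M: overall topic coherence
    return (1.0 / coherence) * (1.0 - s_ij)   # heterogeneous pairs score higher
```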
The inter-topic novelty metric $N(T)_2$ is computed through two complementary formulations:

$$N(T)_2 = e^{D_{p,q}}\left(1 - S_{i,j}\right) \tag{2}$$

$$D_{p,q} = 1 - S_{p,q} = 1 - \frac{\sum_{i=1,\,j=1}^{i=m,\,j=n} S_{i,j}}{N} \tag{3}$$
In these formulations, $p$ and $q$ denote topics, while $i$ and $j$ denote the topic terms belonging to these topics. $N$ indicates the total number of possible inter-topic term combinations. The similarity $S_{i,j}$ is calculated between a topic term from topic $p$ and a topic term from topic $q$. By averaging all pairwise similarities of the topic terms $(i, j)$ across the two topics, we obtain $S_{p,q}$, which reflects the overall similarity between topics $p$ and $q$. Finally, the distance is defined as $D_{p,q} = 1 - S_{p,q}$. Specifically, when $S_{p,q}$ approaches 0 (i.e., $D_{p,q}$ approaches 1), it indicates significant divergence in the knowledge bases of the two topics, suggesting that such combinations often yield breakthrough innovations. The introduction of the exponential function [60] ensures full representation of cross-disciplinary combinations’ innovative contributions. This design aligns with the core tenet of complex innovation theory—“long-distance recombination tends to generate breakthroughs”—while addressing traditional methods’ inadequacies in identifying cross-disciplinary innovations.
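A corresponding sketch of Formulas (2) and (3), assuming the $N$ pairwise similarities between two topics’ terms are available as a flat list:

```python
# Sketch of Formulas (2)-(3): inter-topic novelty.
# `cross_sims` (hypothetical) is the flat list of the N = 10 x 10 cosine
# similarities between the terms of topics p and q.
import math

def inter_topic_novelty(cross_sims, s_ij):
    s_pq = sum(cross_sims) / len(cross_sims)  # overall similarity between topics p and q
    d_pq = 1.0 - s_pq                         # knowledge distance D_pq
    return math.exp(d_pq) * (1.0 - s_ij)      # exponential boosts distant recombination
```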

3.3.2. Adaptation Indicators

The key to quantifying adaptation lies in translating the concept of “node absolute quality” from complex networks into operable indicators within the technological innovation knowledge network. Glor [29] proposed that adaptability is essentially the intrinsic value or potential influence of a node, while Bianconi et al. [61] introduced the adaptation-linkage co-evolution model, which emphasizes that a node’s adaptation is reflected not only in its inherent attributes but also in its ability to attract new connections during network evolution. This study’s application of complex network analysis methods to technological innovation knowledge networks builds on this theoretical foundation.
Drawing on this approach, this paper argues that in the technological innovation knowledge network, the adaptation of a topic term combination should simultaneously reflect both its local association strength and its global structural influence. The strength of local association is measured by the co-occurrence frequency between topic terms [62], which reflects the degree of coupling of innovative knowledge elements in actual research and corresponds to the “tendency to connect” in adaptation theory. The higher the co-occurrence frequency of a given topic-term combination, the more widely it is adopted in the current literature, indicating the academic community’s recognition and reinforcement of its association. Frequent co-occurrence implies a form of “research consensus”, meaning that the technological or conceptual pathway represented by the combination is being extensively validated and applied.
The influence of the global structure is measured by eigenvector centrality, which captures the position and importance of a topic term within the entire knowledge network. The higher the eigenvector centrality, the more likely a node is to occupy a core or hub position within the network, making it easier to diffuse influence, connect different research paths, and act as a bridge at the intersections of multiple knowledge streams. This positional advantage enables the integration of heterogeneous knowledge, forming new cognitive combinations or perspectives, thereby providing key conditions for overcoming technological bottlenecks or generating research agendas. Thus, eigenvector centrality not only reflects cross-domain influence and knowledge integration capability but also determines a node’s survival potential in long-term evolution. This characteristic corresponds to the “structural influence” emphasized in adaptation theory. This method not only avoids the local bias caused by relying solely on co-occurrence frequency but also overcomes the issue of neglecting insufficient actual knowledge coupling when only relying on network topology characteristics. The adaptation calculation formula constructed in this study is as follows:
$$\mathrm{Adaptation}_{ij} = \frac{1}{2}\left(\frac{f_{ij}}{\max(f_i)} + \frac{f_{ij}}{\max(f_j)}\right) + \frac{CE(i) + CE(j)}{\max\left(CE(i) + CE(j)\right)} \tag{4}$$
where $i$ and $j$ represent topic terms, $f_{ij}$ denotes the co-occurrence frequency of the topic term combination $(i, j)$, $f_i$ represents all co-occurrence frequencies in the row of topic term $i$ in the co-occurrence matrix, and $\max(f_i)$ refers to the maximum co-occurrence frequency within that row. Similarly, $f_j$ represents all co-occurrence frequencies in the column of topic term $j$ in the co-occurrence matrix, and $\max(f_j)$ refers to the maximum co-occurrence frequency within that column. $CE(i)$ and $CE(j)$ represent the eigenvector centralities of topic terms $i$ and $j$.
The left part of the formula, $\left(\frac{f_{ij}}{\max(f_i)} + \frac{f_{ij}}{\max(f_j)}\right)/2$, evaluates the actual association strength of topic term combinations in a given phase. This component reflects the core role of linkage numbers in nodal adaptation. The right part, $\frac{CE(i) + CE(j)}{\max(CE(i) + CE(j))}$, dynamically captures the influence of topic terms during structural changes in the network, aiming to compensate for the limitation of co-occurrence analysis in overlooking global network topological properties.
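A sketch of Formula (4) on a networkx co-occurrence graph follows; reading $\max(CE(i) + CE(j))$ as the maximum centrality sum over connected node pairs is one plausible interpretation of the normalizer, not a detail the text states explicitly:

```python
# Sketch of Formula (4) on a weighted co-occurrence graph; edge weights are
# co-occurrence frequencies. Normalizing by the maximum CE(i)+CE(j) over
# connected pairs is an assumption about the intended normalizer.
import networkx as nx

def adaptation(G, i, j):
    f_ij = G[i][j]["weight"]
    max_fi = max(d["weight"] for _, _, d in G.edges(i, data=True))  # strongest tie of i
    max_fj = max(d["weight"] for _, _, d in G.edges(j, data=True))  # strongest tie of j
    ce = nx.eigenvector_centrality(G, weight="weight")
    max_pair = max(ce[u] + ce[v] for u, v in G.edges())
    local = (f_ij / max_fi + f_ij / max_fj) / 2     # local association strength
    global_term = (ce[i] + ce[j]) / max_pair        # global structural influence
    return local + global_term
```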

4. Empirical Analysis

4.1. Data Collection and Preprocessing

Natural Language Processing (NLP), a core artificial intelligence discipline, synthesizes knowledge from linguistics, computer science, and machine learning [63]. To ensure high-quality data sources, this study utilizes papers obtained from the ACL Anthology website (https://aclanthology.org/ (accessed on 26 September 2024)). From the many conferences available, we specifically selected four premier conferences—ACL, EMNLP, NAACL, and COLING—spanning 1969–2024. We collected 39,575 initial records from the ACL Anthology. After data cleaning, 38,568 valid records were obtained. We selected paper abstracts (summarizing core concepts) and titles (precisely capturing research focus) as textual corpora, enhancing data quality through preprocessing, including stopword removal and lemmatization. Based on the technological evolution of NLP, we divided the data into four phases: Phase 1 (1969–1984), Phase 2 (1985–1999), Phase 3 (2000–2018), and Phase 4 (2019–2024). The rationale for this specific periodization is grounded in the major paradigm shifts within the NLP field, which are characterized by fundamental changes in core methodologies and research focuses. Specifically, Phase 1 (1969–1984) represents the foundational period dominated by rule-based and symbolic approaches, heavily influenced by formal linguistic theories. The transition to Phase 2 (1985–1999) is marked by the rise and eventual dominance of statistical methods and machine learning models (e.g., HMMs, maximum entropy models), signifying a shift from hand-crafted rules to learning from corpora. The boundary of Phase 3 (2000–2018) captures the era where shallow machine learning methods (e.g., SVMs) matured and were subsequently revolutionized by the widespread adoption of deep learning (e.g., word embeddings, sequence-to-sequence models, attention mechanisms). Finally, Phase 4 (2019–2024) is delineated by the paradigm-shifting impact of large-scale pre-trained language models (e.g., BERT, GPT series), which introduced the current era of generative AI and transfer learning as the dominant framework for NLP research. This periodization scheme is designed to align with these critical technological inflection points, thereby enabling a meaningful analysis of the evolution of research themes and trends across distinct eras of the field.
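For illustration, the preprocessing and phase bucketing described above could be implemented along the following lines; the paper does not name a specific toolkit, so the use of NLTK here is an assumption:

```python
# Sketch of corpus preprocessing (stopword removal, lemmatization) and phase
# assignment; the choice of NLTK is an assumption, not stated by the paper.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

for pkg in ("stopwords", "wordnet", "punkt"):
    nltk.download(pkg, quiet=True)

STOP = set(stopwords.words("english"))
LEMMA = WordNetLemmatizer()
PHASES = [("P1", 1969, 1984), ("P2", 1985, 1999), ("P3", 2000, 2018), ("P4", 2019, 2024)]

def preprocess(text):
    tokens = nltk.word_tokenize(text.lower())
    return [LEMMA.lemmatize(t) for t in tokens if t.isalpha() and t not in STOP]

def phase_of(year):
    return next((label for label, lo, hi in PHASES if lo <= year <= hi), None)
```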
In designing this study, we critically evaluated the strengths and limitations of both the data selection strategy and the adoption of NLP-based analytical methods. Focusing on four premier conferences (ACL, EMNLP, NAACL, and COLING) ensures the authoritativeness and representativeness of the corpus, while the use of titles and abstracts facilitates the extraction of core research themes suitable for large-scale topic modeling. However, this strategy may introduce publication bias by underrepresenting niche innovations from workshops or non-English venues, and the phase division—although aligned with major paradigm shifts—may simplify the inherently continuous nature of technological evolution. The primary advantage of NLP-based analysis lies in its scalability and efficiency, enabling systematic processing of nearly 40,000 records while reducing subjective bias inherent in manual reviews. Basic preprocessing steps, such as stopword removal and lemmatization, balance semantic retention and computational feasibility. Nevertheless, automated methods may struggle to capture subtle linguistic nuances, polysemy, or context-dependent meanings, and the results remain constrained by the performance of extraction algorithms. Despite these limitations, the constructed corpus and NLP framework provide a robust foundation for identifying macroscopic evolutionary patterns and trends within the field.
At the macro level, Figure 5 demonstrates the historical development and possible future trends of the NLP field by combining historical data with model-based projections. The trend forecast in Figure 5 is based on a logistic growth model, which is widely used in technology life cycle analysis to capture the typical S-shaped evolution of emerging technologies. In this study, the annual number of NLP-related publications is employed as a proxy for technological activity intensity. A nonlinear logistic function is fitted to the historical publication data using least-squares optimization, enabling the identification of key parameters such as the growth rate and the inflection point of the field’s development trajectory.
The fitted logistic curve serves two purposes: first, to characterize the long-term evolutionary pattern of NLP as it transitions from an exploratory phase to accelerated growth and gradual maturation; second, to provide a coarse-grained extrapolation of near-future trends. The trend forecast is constructed based on historical data models and is intended to provide a possible reference for the field’s development paths, rather than a deterministic prediction of the future. The figure clearly illustrates multiple technological paradigm shifts (e.g., from rule-based methods to statistical learning, deep learning, and large language models) and highlights key technological breakthroughs in each phase (e.g., Word2Vec (Gensim v4.3.3), BERT (Hugging Face Transformers v4.45.2), ChatGPT (OpenAI GPT-4)). Notably, the 2022 release of ChatGPT marked a critical inflection point, signifying NLP’s gradual transition into maturity and forecasting stabilized domain development. Through case study analysis, this study outlines how NLP’s evolution exhibits characteristic “innovation-bottleneck-breakthrough” cycles, where each technological iteration coincides with methodological or technical breakthroughs while encountering new risk constraints.
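The logistic fit can be sketched with SciPy as below; the synthetic counts and initial guesses are placeholders, since the underlying annual publication series is not reproduced here:

```python
# Sketch of the logistic (S-curve) trend fit via least squares; `counts` is a
# synthetic stand-in for the annual NLP publication series behind Figure 5.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """K: saturation level, r: growth rate, t0: inflection year."""
    return K / (1.0 + np.exp(-r * (t - t0)))

years = np.arange(1969, 2025, dtype=float)
rng = np.random.default_rng(0)
counts = logistic(years, 4000, 0.18, 2016) + rng.normal(0, 50, years.size)  # placeholder data

params, _ = curve_fit(logistic, years, counts, p0=[counts.max() * 1.5, 0.1, 2010], maxfev=10000)
projection = logistic(np.arange(2025, 2031), *params)  # coarse extrapolation, not a prediction
```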
The meso-scale topic evolution (Figure 6) exhibits clear technological succession, notably the paradigm shift from statistical to neural machine translation architectures. These topics were extracted from the above textual corpus using the BERTopic model, which clusters the documents into latent semantic topics for each phase and selects representative topics to serve as interpretable labels for visualization. The bandwidth in the Sankey diagram represents the relative strength of a topic’s continuation, evolution, or transition across different phases. A wider band indicates that the topic accounts for a larger share of research in adjacent phases, reflecting greater continuity or influence. The colors in the Sankey diagram are primarily used to distinguish different categories or technological directions of topics. The number of topics increases across phases: in the early phase, topics focus on grammar parsing and language knowledge; in the middle phase, machine translation emerges as the core topic; and in the later phase, topics clearly diversify, covering dialogue systems, visual-multimodal learning, and document generation, overall reflecting the evolution of technological integration, expansion, and differentiation.

4.2. Identification of Topic Term Recombination in Knowledge Networks

At the micro-topic phrase level, emerging and reinforced phrase clusters are identified using the methodology described in Section 3.2. We implement dynamically adjusted thresholds based on long-tail theory for sparse datasets. Phase $P_2$ employs lower thresholds (Table 1) to detect low-frequency emerging associations with innovative potential using lenient criteria. For instance, in $P_2$, a phrase combination appearing fewer than 10 times in $P_2$ but at least 20 times in $P_3$ is considered reinforced, reflecting its emergence in $P_2$ and subsequent consolidation in $P_3$. Later phases ($P_3$–$P_4$) increase thresholds per the Pareto principle, applying stringent high-frequency filters to reduce noise and identify significant phrase combinations.
The specific thresholds (e.g., <10 to ≥20) operationalize this by defining a significant relative increase, while their dynamic adjustment across phases (lower in $P_2$, higher in $P_3$–$P_4$) reflects the evolving nature of knowledge consolidation in a field focused on cutting-edge research. To assess the robustness of these threshold selections, we conducted a sensitivity analysis by testing alternative threshold sets (e.g., for $P_2$: $f_i < 5$/$f_i > 15$ and $f_i < 15$/$f_i > 25$; for $P_3$/$P_4$: $f_i < 15$/$f_i > 30$ and $f_i < 25$/$f_i > 50$, etc.). The results demonstrated that the core statistical patterns remained stable: (1) the identified sets of ‘emerging phrases’ and ‘reinforced phrases’ under different thresholds showed high overlap (Jaccard similarity coefficients > 0.75), and (2) the fundamental relationship whereby ‘emerging phrases’ vastly outnumber ‘reinforced phrases’ across all phases was consistently preserved. This indicates that our primary findings regarding the distribution and dynamics of the two recombination patterns are robust and not unduly sensitive to the specific threshold values chosen.
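The overlap check can be sketched by re-running the classification under alternative cut-offs and computing Jaccard similarity between the resulting sets, reusing the hypothetical classify_phrases helper from the sketch in Section 3.2:

```python
# Sketch of the threshold sensitivity check: Jaccard overlap between the
# reinforced-phrase sets produced by the baseline and alternative thresholds.
# Reuses the hypothetical `classify_phrases` sketch from Section 3.2.
def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 1.0

def sensitivity(cooc_prev, cooc_curr, baseline=(10, 20), alternatives=((5, 15), (15, 25))):
    _, base_reinforced = classify_phrases(cooc_prev, cooc_curr, *baseline)
    return {
        (low, high): jaccard(base_reinforced,
                             classify_phrases(cooc_prev, cooc_curr, low, high)[1])
        for low, high in alternatives
    }
```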
Emerging phrases substantially outnumber reinforced ones (Table 1), consistent with the ACL Anthology’s focus on cutting-edge research in top conference papers. Although the study covers four phases ($P_1$ to $P_4$), because an adjacent-phase comparison mechanism is used, phase $P_1$ serves only as the initial knowledge foundation, and the identification results are mainly reflected in phases $P_2$, $P_3$, and $P_4$. These clusters of topic terms form the basis for analyzing recombination features such as novelty and adaptation.

4.3. Novelty and Adaptation Calculation and Results Analysis

Building on the methodologies established in Section 3.3, we compute novelty and adaptation metrics for recombinant term clusters. Through iterative experimental validation, we set the number of extracted topic terms per topic to 10 (top_n_words = 10), which balances computational efficiency with accurate topic representation. Here, $M$ denotes the number of intra-topic combinations among the 10 topic terms within the same topic, excluding repetition and self-combination, which yields $M = C(10, 2) = 45$. $N$ denotes the number of inter-topic combinations, where 10 topic terms from one topic are paired with 10 topic terms from another topic through the Cartesian product, resulting in $N = 10 \times 10 = 100$. Based on the calculated novelty and adaptation metrics, this section systematically analyzes the distribution patterns and technological implications of innovation combinations. First, we analyze the novelty and adaptation distributions of topic term combinations (Figure 7). Emerging phrases and reinforced phrases show statistically significant differences in both novelty and adaptation metrics, as the subsequent analysis shows. Emerging phrases exhibit long-tail distributions in both novelty and adaptation values, as illustrated in Figure 7, consistent with the power-law characteristics commonly observed in scientific and technological innovation processes. Specifically, a few high-frequency combinations dominate the distribution head, whereas the tail comprises numerous low-frequency combinations. In contrast, reinforced phrases exhibit more concentrated distributions with consistently lower novelty scores, yet their adaptation values significantly exceed their novelty measures. This suggests that reinforced phrases predominantly follow technological trajectories characterized by incremental optimization of existing paradigms, rather than radical breakthroughs.
Based on the two core dimensions of novelty and adaptation, this study constructs an innovation risk assessment system that integrates both dynamic and static perspectives. On the static level, innovation is instantaneously classified using a four-quadrant matrix (Figure 8), identifying its basic attributes and risk-reward characteristics. On the dynamic level, by analyzing the distribution evolution of various types of innovations across different technology lifecycle stages ($P_2$, $P_3$, $P_4$), the changes in risk thresholds (Z-index thresholds), and the stage-specific variations in the relationship between novelty and adaptation, this system tracks and predicts the innovation risk trajectory. The aim of this system is to provide a theoretical and practical reference framework for research management, R&D investment, and technology planning.
Based on the quantitative analysis of novelty and adaptation, this paper constructs a four-quadrant innovation classification matrix (see Figure 8). Both axes are normalized (0–1), and the distinction between “high” and “low” is made relative to the midpoint of the scale. Technological innovations are thus categorized into four types: High Novelty-High Adaptation (HN-HA), High Novelty-Low Adaptation (HN-LA), Low Novelty-High Adaptation (LN-HA), and Low Novelty-Low Adaptation (LN-LA). The x-axis represents the novelty of the research, measuring the degree to which the innovation deviates from the existing knowledge system in the research field or discipline, reflecting its originality and theoretical breakthrough. The y-axis represents the adaptation of the innovation, reflecting its potential in practical applications, technological improvements, and solving real-world problems, as well as its compatibility with the existing technological system in a specific domain. With this framework, this paper systematically identifies the risk characteristics of different types of innovations and further explores their potential returns and evolutionary paths.
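A minimal sketch of the quadrant assignment is given below. The midpoint split at 0.5 on the normalized scales follows the description above, while the tie-handling rule (scores exactly at the midpoint assigned to "high") is an assumption; the example scores are taken from Table 2.

```python
def classify(novelty: float, adaptation: float, midpoint: float = 0.5) -> str:
    """Assign a combination with normalized (0-1) scores to one of four quadrants."""
    n = "HN" if novelty >= midpoint else "LN"      # high/low novelty
    a = "HA" if adaptation >= midpoint else "LA"   # high/low adaptation
    return f"{n}-{a}"

# Scores taken from Table 2:
print(classify(0.602, 0.809))  # "entity, recognition named"    -> HN-HA
print(classify(0.723, 0.308))  # "ellipsis, machine translation" -> HN-LA
print(classify(0.129, 0.991))  # "machine, translation"          -> LN-HA
print(classify(0.026, 0.294))  # "extension, rule"               -> LN-LA
```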
Quantitative analysis (Figure 9) demonstrates that emerging phrases span all four innovation types with distinct distribution patterns: HN-HA combinations are the least prevalent, whereas LN-LA dominates and exhibits progressive growth across the technology lifecycle. This distribution pattern emerges because technological paradigm maturation leads researchers to prioritize methodological refinements and verification studies over high-risk novel explorations, thereby accumulating LN-LA clusters. In P 2 , the relatively small number of high-novelty combinations may give the impression that novelty was lowest in this phase, but this actually reflects differences in publication scale rather than a lack of innovative breakthroughs. In fact, this period witnessed seminal high-novelty explorations such as statistical machine translation (e.g., the IBM Models, a series of early probabilistic models for translation developed at IBM Corporation in the late 1980s) and corpus-based part-of-speech tagging methods, showing that P 2 was not devoid of novelty, but that groundbreaking innovation was concentrated in a few pivotal combinations. In contrast, reinforced phrases predominantly cluster in low-novelty categories (LN-LA; LN-HA), exhibiting significantly higher density than emerging phrases. Their adaptation metrics show a significant lifecycle decline, as continuous optimization (e.g., hyperparameter tuning) yields diminishing returns when technical improvement space saturates, ultimately causing systemic performance degradation.
Table 2 reports the novelty and adaptation values of representative emerging phrases. Combined with a semantic analysis of the specific topic term combinations and their technical contexts, it reveals differences in technological pathways, application effectiveness, and development potential across the four types of innovation.
(1)
High Novelty-High Adaptation (HN-HA)—Breakthrough Innovation
This type of innovation effectively addresses long-standing key bottlenecks in the field or opens up new research paradigms, while being highly synchronized with current technological trends and societal needs. For example, the P 3 phase topic term combination “entity, recognition named” (named entity recognition; labels are assigned based on the extracted topics, their associated topic terms, and the research context) was formed by terms originating from Topic 10 (entity) and Topic 18 (recognition named) of the P 3 phase. This combination effectively solved the problem of automatically extracting key information from unstructured text, providing the core foundation for knowledge base construction, intelligent search, and semantic understanding, and giving rise to a series of new research directions, such as relation extraction and entity linking. The P 4 phase topic term combination “hate, speech” (hate speech detection) directly addresses the urgent need for content governance on social media; while improving automation in identification, it also raises important new issues such as algorithmic fairness, interpretability, and cross-cultural detection. Although the research and development process for such innovations is challenging and has a high failure rate, success brings significant rewards, with the main risk lying in the uncertainty of the technological path.
(2)
High Novelty-Low Adaptation (HN-LA)—Prospective Exploration
This type of research is highly forward-looking but lacks clear application scenarios, technical support systems, or academic consensus, making it difficult to validate and promote within the current research paradigm. For example, the P 2 phase topic term combination “ellipsis, machine translation” explores advanced linguistic phenomena without effectively solving basic language generation problems, and is thus disconnected from a usable technological environment. The P 3 phase topic term combination “semantic role labeling, topic” attempts to forcibly merge technologies at two different levels, which is theoretically novel but has limited practical effectiveness and application value (low adaptation). The P 4 phase topic term combination “audio, toxicity” involves cross-modal and sensitive content processing but faces limitations such as scarce labeled data, inconsistent evaluation standards, and narrow practical demand. The majority of HN-LA research fails, but a small fraction may evolve into disruptive HN-HA innovations once conditions mature (as with early research on neural networks).
(3)
Low Novelty-High Adaptation (LN-HA)—Applied Innovation
This type of innovation does not pursue theoretical breakthroughs but focuses on applying mature technologies to new scenarios or on optimizing and refining them, emphasizing the continuous fulfillment of practical needs and the expansion of related research areas. For example, the P 2 phase topic term combination “machine, translation” (machine translation) continuously improved coverage and accuracy during the era dominated by rule-based methods, effectively enhancing system performance and demonstrating high adaptability. The P 3 phase topic term combination “answer, question” (question-answering systems) utilized existing machine learning methods to improve information retrieval and automated processing capabilities, and gave rise to new issues such as multi-turn dialogue and cross-lingual question answering. The P 4 phase topic term combination “bert, model” (BERT model), although no longer novel after 2019, still significantly advanced subfields such as model compression, interpretability, and prompt engineering through its adaptation and optimization in downstream tasks. Innovations of this type have clear technological paths and relatively low implementation difficulty, but often face intense competition.
(4)
Low Novelty-Low Adaptation (LN-LA)—Declining Research
This type of innovation typically involves repetitive exploration of outdated methods or low-value patches to marginal problems, neither addressing core challenges nor attracting practical application or community attention. For example, the P 2 phase topic term combination “extension, rule” made minor improvements to rule-based systems despite the demonstrated superiority of statistical methods, failing to enhance generalizability or practical value. The P 3 phase topic term combination “bilingual, entity recognition” represents an overly niche research direction with limited broad demand and lacks significant impact. The P 4 phase topic term combination “method, transformer model” is broad and vague, likely representing minor, insignificant modifications to the transformer; although the transformer itself was HN-HA when first published, such simple applications have since degenerated into LN-LA. Innovations of this type carry high latent risks: investing in such research entails significant opportunity costs, wasting time and resources without accumulating competitive knowledge or technological advantages.
To provide a clear practical reference, Table 3 synthesizes key information for the above four types of innovation, including their core characteristics and risk-return profiles, representative cases (from the sample papers), and corresponding strategic recommendations.

4.4. Effectiveness Validation

Principal component analysis (PCA) is not a simple averaging of indicators: it performs an eigenvalue decomposition on the covariance matrix of the original indicators to identify the directions that explain the maximum variance in the data, with weights determined automatically by the correlation and variance structure of the variables. The resulting orthogonal components resolve multicollinearity between variables, avoid double-counting of information, and retain the key information embedded in the indicators' joint variation during the dynamic evolution process, capturing the synergistic effects of novelty and adaptation and their overall impact on innovation potential and risk levels. To quantify the dynamic interplay between novelty and adaptation in innovation evolution, we therefore developed a PCA-derived composite Z index integrating both dimensions, where higher values indicate greater innovation risk. The statistical results in Figure 10 show that the Z index of emerging phrases is higher than that of reinforced phrases, indicating stronger overall performance on the novelty-adaptation dimensions and hence higher uncertainty and risk. The factor loading coefficients, which reveal the relative contributions of novelty and adaptation to each principal component, are provided in Appendix A (Table A1). The results indicate that novelty and adaptation exhibit distinct factor structures across phases. In P 2 , the two dimensions load oppositely on the second component, highlighting their tension. In P 3 , they show balanced and partly synergistic loadings, indicating a transitional stage of coordination. By P 4 , novelty dominates the first component while adaptation defines the second, suggesting their differentiation into relatively independent dimensions. Overall, novelty captures the exploratory value of knowledge recombination, whereas adaptation reflects the system's adjustment capacity, and their dynamic interplay reveals the shifting balance between exploration and adaptation in transformative STI.
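The sketch below illustrates one way such a PCA-derived composite index can be computed. The random placeholder data, the variance-weighted combination of component scores, and the min-max rescaling to [0, 1] are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumed input: an (n, 2) array of [novelty, adaptation] scores for one phase.
rng = np.random.default_rng(0)
X = rng.random((200, 2))  # placeholder for the real novelty/adaptation data

X_std = StandardScaler().fit_transform(X)  # standardize both indicators
pca = PCA(n_components=2).fit(X_std)
scores = pca.transform(X_std)

# Composite Z: weight each component score by its explained variance ratio,
# then rescale to [0, 1] so that higher values indicate greater innovation risk.
z_raw = scores @ pca.explained_variance_ratio_
z = (z_raw - z_raw.min()) / (z_raw.max() - z_raw.min())

print(pca.explained_variance_ratio_)  # cf. "PCA Variance Explained" in Table 4
print(pca.components_)                # component directions; cf. the loadings in Table A1
```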
Our validation framework combines two complementary methods: Spearman's rank correlation between the novelty/adaptation measures and Z-scores for nonparametric assessment, and random forest regression for predictive validation. Spearman's coefficient offers robust correlation estimates via rank transformation, effectively capturing nonlinear patterns in long-tail distributed phrases. The random forest model, with its ability to automatically handle variable interactions and provide feature importance rankings, was employed as a complement. Spearman's coefficient supplements the random forest by providing statistical significance testing, whereas the random forest extends beyond Spearman's monotonicity constraint to detect complex patterns. This dual validation strategy avoids the strict distributional assumptions of traditional statistical tests while addressing the interpretability limitations of machine learning models. The experimental results are summarized in Table 4.
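A minimal sketch of this dual validation pipeline is given below, using synthetic data that merely mimic the reported direction of the effects for emerging phrases; the array names and model settings are assumptions.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor

# Assumed inputs: aligned 1-D arrays of novelty, adaptation, and Z scores
# for one phrase type and phase. Synthetic placeholders are used here.
rng = np.random.default_rng(1)
novelty = rng.random(300)
adaptation = rng.random(300)
z = 0.6 * novelty - 0.4 * adaptation + rng.normal(0.0, 0.05, 300)

# Nonparametric association with significance tests.
rho_n, p_n = spearmanr(novelty, z)
rho_a, p_a = spearmanr(adaptation, z)
print(f"novelty: rho={rho_n:.3f}, p={p_n:.2g}; adaptation: rho={rho_a:.3f}, p={p_a:.2g}")

# Predictive validation with feature-importance ranking.
X = np.column_stack([novelty, adaptation])
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, z)
print(dict(zip(["novelty", "adaptation"], rf.feature_importances_.round(3))))
```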
In emerging phrases, the Spearman test shows a stable positive rank correlation between novelty and the Z-score, and a significant negative correlation between adaptation and the Z-score (p < 0.001) (Figure 11). This association pattern uncovers the intrinsic mechanism of innovation risk in emerging phrases: technologies with high novelty (e.g., the introduction of the transformer architecture) often create value by breaking existing technological paradigms, but are typically associated with higher risks due to unverified pathways or uncertainties in implementation. In contrast, low adaptation reflects an incomplete theoretical framework and immature technical standards (e.g., early model compatibility issues with the transformer), failing to address existing bottlenecks or open new research directions, and thus significantly increasing innovation risk. This finding is consistent with the theoretical expectation that high innovation risk is typically accompanied by high novelty and low adaptation. Feature importance analysis using the random forest model further supports this conclusion, showing a balanced contribution distribution between the two indicators and no systemic bias in the construction of the Z index. Moreover, the results across all three phases are highly stable (correlation coefficient fluctuations < 0.03), confirming the robustness of the method.
In reinforced phrases, the novelty indicator exhibits a significant bimodal correlation with the Z-score (Figure 12). To gain deeper insight into its underlying patterns, we employed a data-driven segmentation approach: by minimizing the sum of squared errors (SSE), the optimal splitting threshold was determined to be Z = 0.45, ensuring an objective, statistically grounded threshold and avoiding biases from arbitrary cutoffs. Based on this threshold, the data were divided into low-risk (Z ≤ 0.45) and high-risk (Z > 0.45) intervals. The analysis shows that the low-risk interval exhibits an inverted U-shaped relationship, whereas the high-risk interval displays a U-shaped distribution. The adaptation metric maintains a robust inverted U-shaped correlation across the entire dataset (p < 0.001). Note that this analysis examines cross-sectional patterns across the entire dataset and does not segment the data by technological development stage: stage-wise segmentation would substantially reduce the sample size within each stage, making it difficult to establish stable statistical relationships. The pooled analysis instead captures the systemic patterns of reinforced phrases in the technological innovation ecosystem: the inverted U-shaped curve in the low-risk interval reflects common characteristics of mainstream technological trajectories, while the U-shaped distribution in the high-risk interval reveals the distinctive risk patterns of peripheral technological combinations.
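A minimal sketch of the SSE-based split search is shown below. Fitting a constant (the segment mean) on each side of a candidate split is one common variant of this procedure and is an assumption here, as are the toy data shaped to break near Z = 0.45.

```python
import numpy as np

# Toy data: a piecewise relationship with a break near Z = 0.45 plus noise.
rng = np.random.default_rng(2)
z = rng.random(400)
y = np.where(z <= 0.45, -(z - 0.2) ** 2, (z - 0.7) ** 2) + rng.normal(0, 0.01, 400)

def total_sse(split):
    """Total SSE when each side of the split is summarized by its mean."""
    left, right = y[z <= split], y[z > split]
    return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()

# Scan candidate splits over interior quantiles to avoid empty segments.
candidates = np.quantile(z, np.linspace(0.05, 0.95, 91))
best = min(candidates, key=total_sse)
print(f"optimal split threshold: Z = {best:.2f}")
```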
Because the preceding analysis of the relationships between novelty, adaptation, and the Z index was conducted on the aggregated dataset across all phases, it could not capture how risk standards shift between phases. We therefore adopted a phase-specific approach and calculated the high-risk threshold separately for each phase, using the 80th percentile of the Z index (results are shown in Table 4). The results indicate that the high-risk threshold for emerging phrases evolves dynamically across technological development phases. During P 2 , lower thresholds reflect academia's tolerant stance toward theoretical innovation: technical implementation barriers were low but validation methods were insufficient, with risks stemming primarily from uncertain path selection and academic community acceptance. As development progressed to P 3 , the threshold rose under stricter verification standards; novel technologies required rigorous theoretical scrutiny and experimental replication, implying higher costs for technical validation and resource investment. By P 4 , with dominant paradigms such as the transformer and BERT established, innovative combinations needed both compatibility with existing theoretical frameworks and stringent validation, and their innovative value had to withstand rigorous scientific verification. The continuous rise in emerging phrases' high-risk thresholds reflects risk reduction throughout the technology lifecycle: stricter standards effectively filter out low-quality innovations, ultimately yielding more reliable and mature solutions and paths.
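Computationally, the phase-specific threshold is a simple grouped percentile; the sketch below uses toy Z values and an assumed data frame layout.

```python
import pandas as pd

# Assumed layout: one row per emerging-phrase combination, with its phase and Z score.
df = pd.DataFrame({
    "phase": ["P2"] * 4 + ["P3"] * 4 + ["P4"] * 4,
    "z": [0.31, 0.45, 0.52, 0.58, 0.40, 0.50, 0.60, 0.66, 0.47, 0.55, 0.64, 0.70],
})

# High-risk threshold per phase: the 80th percentile of Z within each phase
# (cf. the "High-Risk Threshold" column of Table 4).
thresholds = df.groupby("phase")["z"].quantile(0.80)
print(thresholds)
```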
Unlike the continuously increasing trend observed for emerging phrases, the high-risk threshold of reinforced phrases exhibits a non-monotonic pattern (see Table 4). This variation reflects how the risk logic of incremental knowledge recombination adjusts with the evolution of technical paradigms. In phase P 2 , NLP was still in an exploratory stage, characterized by diverse theories and dispersed technical paths; the reinforcement of weakly associated combinations therefore signaled potential academic value, resulting in a relatively high threshold. In phase P 3 , the statistical machine learning paradigm became established and knowledge recombination mainly involved incremental optimization, leading the high-risk threshold to decline to 0.509 and reflecting the academic community's greater tolerance for minor innovations. In phase P 4 , the large-model paradigm became highly concentrated, requiring reinforced combinations to achieve significant improvement to stand out; the threshold rose sharply to 0.639, indicating the high barrier that incremental innovation faces under a dominant paradigm.
To further illustrate these patterns, Figure 13 and Figure 14 compare the dynamic risk distributions of emerging and reinforced phrases across the phases, highlighting significant differences between the two innovation types. Each point represents an innovation combination, with color intensity determined by the Z-score; reference lines at Z = 0.3, 0.5, and 0.7 mark low, medium, and high risk levels, respectively, making changes in risk across the phase evolution process visible.
Emerging phrases display clear gradient risk patterns. HN-LA combinations show the highest risk, indicating frontier technologies (e.g., early “transformer” applications) face uncertainty due to unverified stability; LN-HA combinations exhibit the lowest risk, corresponding to mature technologies (e.g., rule-based text classification) stabilized through extensive validation; while HN-HA and LN-LA combinations demonstrate intermediate risks, reflecting the dynamic balance between technological exploration and maturity.
Reinforced phrases’ risk evolution trajectories distinctly reflect dynamic technological lifecycle influences. Their low-novelty nature means that risks are primarily adaptation-driven, stemming from their incremental optimization attributes. In the early phase of technological development ( P 2 ), the LN-LA combination carries the highest risk, manifesting as severe fragmentation in research methodologies. Taking the sequence labeling field as an example, the ACL conference once featured 17 different architectures simultaneously, yet performance improvements were generally less than 2%, fully reflecting the high trial-and-error costs of the exploration phase. As technology advances ( P 3 ), the LN-HA combination surpasses the LN-LA combination in risk. This stems from the fact that reinforced phrases tend to focus on fine-tuning existing model parameters (e.g., RNN, LSTM) to optimize performance. While such research exhibits high adaptation in the short term, it lacks long-term competitiveness and is vulnerable to disruptive innovations. For example, the introduction of the transformer architecture in 2017 increased translation BLEU scores by over 10 points, directly undermining the research value of traditional methods [68]. However, with the rise of pre-trained models like BERT and GPT-3 ( P 4 ), the LN-LA combination dominates. At this phase, high adaptation instead implies greater path-locking risks, as excessive reliance on existing technological pathways suppresses disruptive innovation. Notably, the proportion of high-novelty combinations among reinforced phrases remains consistently below 15%, confirming their characteristic of incremental innovation.
In summary, technological innovation risk depends not only on the two characteristic dimensions of novelty and adaptation, but is also closely related to the lifecycle phase of technological development. As technology evolves from the embryonic phase to maturity, risk characteristics and threshold levels exhibit dynamic changes. This characteristic provides important insights for seizing innovation opportunities and mitigating technological risks.

5. Discussion

5.1. Main Findings

Scientific and technological innovation (STI) risk, characterized by intricate and evolving mechanisms, poses a fundamental challenge to technological advancement. This study investigates risk sources and applies recombination theory to categorize innovation into emerging phrases (frontier innovation) and reinforced phrases (incremental optimization), based on the novelty-adaptation coupling in knowledge recombination. We establish a quantitative assessment framework using a dual-dimensional novelty-adaptation metric and introduce a Z index for risk quantification. This yields a comprehensive theory-methodology-empirical framework that elucidates the dynamic relationship between knowledge recombination and innovation risk. Our findings offer new insights for innovation risk assessment, facilitating optimal resource allocation and R&D decision-making.
The empirical study in the NLP field yields the following main conclusions. Emerging phrases and reinforced phrases exhibit significant differences in risk characteristics and distribution patterns, and these differences stem from their distinct mechanisms in the “novelty-adaptation” tension. Emerging phrases demonstrate a long-tail distribution: the high-novelty extreme regions may catalyze disruptive innovations but carry higher risks, while mature technologies concentrate in low-risk areas. Emerging phrases encompass all four innovation types and exhibit distinct risk gradients: the novelty indicator shows a stable positive linear correlation with Z-score, while the adaptation indicator demonstrates a significant negative linear correlation with Z-score; the continuous rise in high-risk thresholds reflects the gradual decline of risk throughout the technology lifecycle. The High Novelty-Low Adaptation (HN-LA) combination carries the highest risk, the Low Novelty-High Adaptation (LN-HA) combination the lowest, while the High Novelty-High Adaptation (HN-HA) and Low Novelty-Low Adaptation (LN-LA) combinations present intermediate risk levels. Reinforced phrases concentrate primarily on low-novelty combinations, with risks varying across the technology lifecycle: the novelty indicator exhibits a significant bimodal relationship with Z-score, showing an inverted U-shaped pattern in the low-risk interval and a U-shaped pattern in the high-risk interval, while the adaptation indicator demonstrates a robust inverted U-shaped relationship with Z-score. Furthermore, emerging phrases exhibit significantly higher risks than reinforced phrases. This strong correlation between the magnitude of transformation and the degree of uncertainty provides crucial guidance for formulating innovation strategies across different knowledge states and technology lifecycle phases.

5.2. Management Implications

In managing technological innovation risks, attention should be paid to the dual characteristics of novelty and adaptation and their synergistic effects, taking into account the technology lifecycle and features of knowledge combinations to formulate differentiated management strategies. First, a classification-based dynamic management approach should be adopted. In the early phase, for forward-looking exploratory projects (HN-LA), an appropriate margin of tolerance should be maintained, allowing small-scale experimentation and phased validation while promptly eliminating directions with limited potential. For breakthrough innovation projects (HN-HA), key investments should be prioritized to compete for leadership and secure a core position in the early phases of the technology.
In the mid-phase, for projects that have been verified to have high potential, the focus should be on enhancing the adaptation of highly novel projects and promoting the transformation of forward-looking exploratory projects (HN-LA) into breakthrough innovation projects (HN-HA). At this phase, resource allocation should tilt toward high-potential projects, gradually increasing R&D investment to improve their market adaptation and technological maturity. For applied innovation projects (LN-HA), which already possess high practical value and relatively low implementation difficulty, large-scale investment can be made to ensure stable output and commercial returns. To address risks in technology integration, interdisciplinary collaborative innovation platforms can be established, optimizing team configuration through tools such as knowledge graphs to ensure effective integration of cross-disciplinary technologies.
In the late phase, innovation projects have entered maturity, and the focus should be on achieving returns through scaled production and market promotion. At this phase, investment in declining research projects (LN-LA) should be reduced, as these projects typically face high latent risks and lack long-term competitiveness. For applied innovation projects (LN-HA), large-scale investment should be maintained to ensure stable output and commercial returns, safeguarding the long-term benefits of innovation. Meanwhile, continued attention should be given to the market leadership of breakthrough innovation projects (HN-HA) to ensure they maintain a competitive advantage and drive further technological application and commercialization.
Although the empirical analysis in this study is conducted within the NLP domain, the proposed framework is not inherently limited to text-based innovation fields. At the theoretical level, the core mechanism—innovation risk arising from the dynamic tension between novelty and adaptation—is domain-agnostic and can be applied to a wide range of innovation contexts. At the structural level, the framework relies on the representation of knowledge elements and their recombination patterns within and across different layers (e.g., science, technology, and application), which can be generalized beyond topic–term networks.
In non-textual domains such as engineering or biomedical innovation, knowledge structures can be operationalized using alternative representations, including citation networks, patent classification codes, design component networks, molecular interaction networks, or experimental protocol linkages. These representations allow the construction of analogous knowledge recombination networks, enabling the measurement of novelty, adaptation, and their associated risks. Therefore, while NLP provides a data-rich and methodologically transparent case, the proposed framework offers a flexible and extensible approach for early-stage innovation risk identification across diverse technological domains.

5.3. Research Limitations and Future Prospects

This study has several limitations. First, it does not fully elucidate the magnitude of impact or the intrinsic mechanisms linking novelty-induced uncertainty factors to scientific and technological innovation (STI) risk, limiting a comprehensive understanding of this relationship. Future studies should investigate the underlying mechanisms connecting STI risk and uncertainty, including their influence pathways and key determinants. Second, regarding the correlation analysis of reinforced phrases, this study uses the SSE method to segment risk intervals; in future work, a quartile-based approach will be applied to fit each quartile separately, enabling a more detailed analysis of the relationships among novelty, adaptation, and risk. Additionally, the identification of topic terms and their co-occurrence networks fundamentally relies on the BERTopic modeling framework. While validation steps were undertaken, the results remain sensitive to specific modeling choices, including the embedding model, clustering algorithm, and hyperparameter settings (e.g., minimum cluster size, diversity threshold). Variations in these setups may affect topic granularity, consistency, and boundary definitions, thereby influencing the subsequent detection of emerging and reinforced phrases and the derived measurements of novelty and adaptation. Future research could enhance robustness by systematically comparing multiple topic modeling methods or by incorporating ensemble-based topic representations. Finally, this study focuses solely on natural language processing, which may limit the generalizability of the findings. Future research should expand to interdisciplinary contexts to validate the universality of the STI risk measurement framework.

Author Contributions

Conceptualization, X.H. and H.X.; Methodology, X.H.; Writing—original draft, X.H.; Data curation, X.H.; Formal analysis, X.H.; Investigation, H.X.; Writing—review and editing, H.X., R.H., C.L. and X.T.; Supervision, R.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 72274113), Shandong Provincial Social Science Foundation (No. 23CTQJ07), Shandong Provincial Natural Science Foundation (No. ZR2022MG052), Beijing Natural Science Foundation (No. 9242006) and the Taishan Scholar Foundation of Shandong province of China (tsqn202103069).

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We used ChatGPT for translation, proofreading, and grammar checking. We evaluated the output by cross-referencing the translated and revised content with the original text to ensure accuracy, consistency, and alignment with the intended meaning. Additionally, we reviewed the final version to confirm that all technical terms and concepts were appropriately conveyed.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Factor Loadings.

| Index | P2 PC1 | P2 PC2 | P3 PC1 | P3 PC2 | P4 PC1 | P4 PC2 | Communality |
|---|---|---|---|---|---|---|---|
| Novelty | 0.592 | 0.806 | 0.637 | −0.771 | 0.931 | 0.365 | 1.000 |
| Adaptation | 0.605 | −0.796 | 0.637 | 0.771 | −0.578 | 0.816 | 1.000 |
References

  1. Heidenreich, S.; Kraemer, T. Innovations—Doomed to Fail? Investigating Strategies to Overcome Passive Innovation Resistance. J. Prod. Innov. Manag. 2016, 33, 277–297. [Google Scholar] [CrossRef]
  2. Chen, J.; Wu, Z.; Dai, T. Research on Risk Identification Framework of Future Technology Based on the Perspective of Techno-Economic Security. Bull. Chin. Acad. Sci. 2023, 38, 570–579. [Google Scholar] [CrossRef]
  3. Luthfa, S. A Study of How Uncertainty Emerges in the Uncertainty-Embedded Innovation Process. J. Innov. Manag. 2019, 7, 46–79. [Google Scholar] [CrossRef]
  4. Foster, J.G.; Rzhetsky, A.; Evans, J.A. Tradition and Innovation in Scientists’ Research Strategies. Am. Sociol. Rev. 2015, 80, 875–908. [Google Scholar] [CrossRef]
  5. Wang, T. Toward an Understanding of Innovation Failure: The Timing of Failure Experience. Technovation 2023, 125, 102787. [Google Scholar] [CrossRef]
  6. Park, M.; Leahey, E.; Funk, R.J. Papers and Patents Are Becoming Less Disruptive over Time. Nature 2023, 613, 138–144. [Google Scholar] [CrossRef]
  7. Shibayama, S.; Yin, D.; Matsumoto, K. Measuring Novelty in Science with Word Embedding. PLoS ONE 2021, 16, e0254034. [Google Scholar] [CrossRef] [PubMed]
  8. D’Este, P.; Amara, N.; Olmos-Peñuela, J. Fostering Novelty While Reducing Failure: Balancing the Twin Challenges of Product Innovation. Technol. Forecast. Soc. Change 2016, 113, 280–292. [Google Scholar] [CrossRef]
  9. Barney, J.B.; Nelson, R.R.; Winter, S.G. An Evolutionary Theory of Economic Change. Adm. Sci. Q. 1987, 32, 315. [Google Scholar] [CrossRef]
  10. Uzzi, B.; Mukherjee, S.; Stringer, M.; Jones, B. Atypical Combinations and Scientific Impact. Science 2013, 342, 468–472. [Google Scholar] [CrossRef]
  11. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
  12. Bednorz, J.G.; Müller, K.A. Possible High Tc Superconductivity in the Ba−La−Cu−O System. Z. Phys. B—Condens. Matter 1986, 64, 189–193. [Google Scholar] [CrossRef]
  13. Anderson, P.W. The Resonating Valence Bond State in La2CuO4 and Superconductivity. Science 1987, 235, 1196–1198. [Google Scholar] [CrossRef] [PubMed]
  14. Linton, J.D. Improving the Peer Review Process: Capturing More Information and Enabling High-Risk/High-Return Research. Res. Policy 2016, 45, 1936–1938. [Google Scholar] [CrossRef]
  15. McGoun, E.G. The History of Risk “Measurement”. Crit. Perspect. Account. 1995, 6, 511–532. [Google Scholar] [CrossRef]
  16. Machado, D. Quantitative Indicators for High-Risk/High-Reward Research. In OECD Science, Technology and Industry Working Papers; OECD: Paris, France, 2021; pp. 2–12. [Google Scholar]
  17. Veugelers, R.; Wang, J.; Stephan, P. Do Funding Agencies Select and Enable Novel Research: Evidence from ERC. Econ. Innov. New Technol. 2025, 1–18. [Google Scholar] [CrossRef]
  18. Yin, D.; Wu, Z.; Shibayama, S. Measuring Risk in Science. J. Informetr. 2023, 17, 101426. [Google Scholar] [CrossRef]
  19. Mueller, J.S.; Melwani, S.; Goncalo, J.A. The Bias Against Creativity: Why People Desire but Reject Creative Ideas. Psychol. Sci. 2012, 23, 13–17. [Google Scholar] [CrossRef]
  20. Criscuolo, P.; Dahlander, L.; Grohsjean, T.; Salter, A. Evaluating Novelty: The Role of Panels in the Selection of R&D Projects. Acad. Manag. J. 2017, 60, 433–460. [Google Scholar] [CrossRef]
  21. Nerkar, A. Old Is Gold? The Value of Temporal Exploration in the Creation of New Knowledge. Manag. Sci. 2003, 49, 211–229. [Google Scholar] [CrossRef]
  22. Kaplan, S.; Vakili, K. The Double-Edged Sword of Recombination in Breakthrough Innovation. Strateg. Manag. J. 2015, 36, 1435–1457. [Google Scholar] [CrossRef]
  23. Wang, C.-J.; Yan, L.; Cui, H. Unpacking the Essential Tension of Knowledge Recombination: Analyzing the Impact of Knowledge Spanning on Citation Impact and Disruptive Innovation. J. Informetr. 2023, 17, 101451. [Google Scholar] [CrossRef]
  24. Lin, Y.; Evans, J.A.; Wu, L. New Directions in Science Emerge from Disconnection and Discord. J. Informetr. 2022, 16, 101234. [Google Scholar] [CrossRef]
  25. Wang, J.; Veugelers, R.; Stephan, P. Bias against Novelty in Science: A Cautionary Tale for Users of Bibliometric Indicators. Res. Policy 2017, 46, 1416–1436. [Google Scholar] [CrossRef]
  26. Orr, H.A. Fitness and Its Role in Evolutionary Genetics. Nat. Rev. Genet. 2009, 10, 531–539. [Google Scholar] [CrossRef] [PubMed]
  27. Lillie, R.S. The Fitness of the Environment. An Inquiry into the Biological Significance of the Properties of Matter. Science 1913, 38, 337–342. [Google Scholar] [CrossRef]
  28. Spencer, H. The Principles of Biology; D. Appleton and Company: New York, NY, USA, 1914; pp. 51–53. [Google Scholar]
  29. Glor, E.D. Building Theory of Organizational Innovation, Change, Fitness and Survival. Innov. J. Public Sect. Innov. J. 2015, 20, 1–167. [Google Scholar]
  30. Fleischmann, M.; Pons, S. Electrochemically Induced Nuclear Fusion of Deuterium. J. Electroanal. Chem. Interfacial Electrochem. 1989, 261, 301–308. [Google Scholar] [CrossRef]
  31. Einstein, A. Zur Elektrodynamik Bewegter Körper. Ann. Der Phys. 1905, 322, 891–921. [Google Scholar] [CrossRef]
  32. Garud, R.; Gehman, J.; Kumaraswamy, A.; Tuertscher, P. From the Process of Innovation to Innovation as Process. In The SAGE Handbook of Process Organization Studies; Langley, A., Tsoukas, H., Eds.; SAGE: London, UK, 2017; pp. 451–465. [Google Scholar]
  33. Pettigrew, A.M.; Woodman, R.W.; Cameron, K.S. Studying Organizational Change and Development: Challenges for Future Research. Acad. Manag. J. 2001, 44, 697–713. [Google Scholar] [CrossRef]
  34. Planck, M. Zur Theorie Des Gesetzes Der Energieverteilung Im Normalspectrum. VhDPG 1900, 2, 238. [Google Scholar]
  35. Einstein, A. Über Einen Die Erzeugung Und Verwandlung Des Lichtes Betreffenden Heuristischen Gesichtspunkt. Ann. Der Phys. 1905, 322, 132–148. [Google Scholar] [CrossRef]
  36. Lane, J.N. The Subjective Expected Utility Approach and a Framework for Defining Project Risk in Terms of Novelty and Feasibility—A Response to Franzoni and Stephan (2023), ‘Uncertainty and Risk-Taking in Science’. Res. Policy 2023, 52, 104707. [Google Scholar] [CrossRef]
  37. Fleming, L. Recombinant Uncertainty in Technological Search. Manag. Sci. 2001, 47, 117–132. [Google Scholar] [CrossRef]
  38. Becker, M.C.; Knudsen, T.; March, J.G. Schumpeter, Winter, and the Sources of Novelty. Ind. Corp. Change 2006, 15, 353–371. [Google Scholar] [CrossRef]
  39. Ethiraj, S.K.; Levinthal, D. Modularity and Innovation in Complex Systems. Manag. Sci. 2004, 50, 159–173. [Google Scholar] [CrossRef]
  40. Gavetti, G.; Levinthal, D. Looking Forward and Looking Backward: Cognitive and Experiential Search. Adm. Sci. Q. 2000, 45, 113–137. [Google Scholar] [CrossRef]
  41. Leahey, E.; Lee, J.; Funk, R.J. What Types of Novelty Are Most Disruptive? Am. Sociol. Rev. 2023, 88, 562–597. [Google Scholar] [CrossRef]
  42. Moaniba, I.M.; Su, H.-N.; Lee, P.C. Knowledge Recombination and Technological Innovation: The Important Role of Cross-Disciplinary Knowledge. Innovation 2018, 20, 326–352. [Google Scholar] [CrossRef]
  43. Ma, R.K.; Wang, Y.T. Diversity and Novelty of Knowledge Combination and the Formation of Breakthrough Inventions. Stud. Sci. Sci. 2020, 38, 313–322. [Google Scholar] [CrossRef]
  44. Trapido, D. How Novelty in Knowledge Earns Recognition: The Role of Consistent Identities. Res. Policy 2015, 44, 1488–1500. [Google Scholar] [CrossRef]
  45. Yan, Y.; Tian, S.; Zhang, J. The Impact of a Paper’s New Combinations and New Components on Its Citation. Scientometrics 2020, 122, 895–913. [Google Scholar] [CrossRef]
  46. Luo, Z.; Lu, W.; He, J.; Wang, Y. Combination of Research Questions and Methods: A New Measurement of Scientific Novelty. J. Informetr. 2022, 16, 101282. [Google Scholar] [CrossRef]
  47. Li, J.; Yang, X. Multidimensional Measurement Characterization of Novelty Papers: An Example from the Field of Human–Computer Interaction. Inf. Sci. 2024, 43, 93–106. [Google Scholar]
  48. Rodgers, N.; Tiňo, P.; Johnson, S. Fitness-Based Growth of Directed Networks with Hierarchy. J. Phys. Complex. 2024, 5, 035013. [Google Scholar] [CrossRef]
  49. Smolyarenko, I.E.; Hoppe, K.; Rodgers, G.J. Network Growth Model with Intrinsic Vertex Fitness. Phys. Rev. E 2013, 88, 012805. [Google Scholar] [CrossRef]
  50. Pham, T.; Sheridan, P.; Shimodaira, H. Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Sci. Rep. 2016, 6, 32558. [Google Scholar] [CrossRef]
  51. Hoppe, K.; Rodgers, G.J. Percolation on Fitness-Dependent Networks with Heterogeneous Resilience. Phys. Rev. E 2014, 90, 012815. [Google Scholar] [CrossRef]
  52. Xu, X.-J.; Zhang, L.-M.; Zhang, L.-J. Mutual Selection in Network Evolution: The Role of the Intrinsic Fitness. Int. J. Mod. Phys. C 2010, 21, 129–135. [Google Scholar] [CrossRef]
  53. Bianconi, G.; Barabási, A.-L. Competition and Multiscaling in Evolving Networks. Europhys. Lett. 2001, 54, 436. [Google Scholar] [CrossRef]
  54. Aspembitova, A.; Feng, L.; Melnikov, V.; Chew, L.Y. Fitness Preferential Attachment as a Driving Mechanism in Bitcoin Transaction Network. PLoS ONE 2019, 14, e0219346. [Google Scholar] [CrossRef] [PubMed]
  55. Memi, F.; Ntokou, A.; Papangeli, I. CRISPR/Cas9 Gene-Editing: Research Technologies, Clinical Applications and Ethical Considerations. Semin. Perinatol. 2018, 42, 487–500. [Google Scholar] [CrossRef]
  56. March, J.G. Exploration and Exploitation in Organizational Learning. Organ. Sci. 1991, 2, 71–87. [Google Scholar] [CrossRef]
  57. Funk, R.J.; Owen-Smith, J. A Dynamic Network Measure of Technological Change. Manag. Sci. 2017, 63, 791–817. [Google Scholar] [CrossRef]
  58. Cattani, G.; Deichmann, D.; Ferriani, S. Novelty: Searching for, Seeing, and Sustaining It. In The Generation, Recognition and Legitimation of Novelty; Emerald Publishing Limited: Leeds, UK, 2022; Volume 77, pp. 3–23. [Google Scholar]
  59. Bavato, D. Nothing New Under the Sun: Novelty Constructs and Measures in Social Studies. In The Generation, Recognition and Legitimation of Novelty; Cattani, G., Deichmann, D., Ferriani, S., Eds.; Emerald Publishing Limited: Leeds, UK, 2022; Volume 77, ISBN 978-1-80117-998-0. [Google Scholar]
  60. Newman, M. Networks: An Introduction; Oxford University Press: Oxford, UK, 2010; ISBN 978-0-19-920665-0. [Google Scholar]
  61. Bianconi, G.; Barabási, A.-L. Bose-Einstein Condensation in Complex Networks. Phys. Rev. Lett. 2001, 86, 5632–5635. [Google Scholar] [CrossRef] [PubMed]
  62. Liu, B.; Hu, M.; Cheng, J. Opinion Observer: Analyzing and Comparing Opinions on the Web. In Proceedings of the 14th International Conference on World Wide Web; Association for Computing Machinery: New York, NY, USA, 2005; pp. 342–351. [Google Scholar]
  63. Che, W.X.; Dou, Z.C.; Feng, Y.S.; Gui, T.; Han, X.P.; Hu, B.T.; Huang, M.L.; Huang, X.J.; Liu, K.; Liu, T.; et al. Towards a Comprehensive Understanding of the Impact of Large Language Models on Natural Language Processing: Challenges, Opportunities and Future Directions. Sci. Sin. Inf. 2023, 53, 1645–1687. [Google Scholar]
  64. Xie, Y.; Hu, Y.; Peng, W.; Bi, G.; Xing, L. COMMA: Modeling Relationship among Motivations, Emotions and Actions in Language-Based Human Activities. arXiv 2022, arXiv:2209.06470. [Google Scholar] [CrossRef]
  65. Brown, P.F.; Lai, J.C.; Mercer, R.L. Aligning Sentences in Parallel Corpora. In Proceedings of the 29th Annual Meeting on Association for Computational Linguistics; Association for Computational Linguistics: Berkeley, CA, USA, 1991; pp. 169–176. [Google Scholar]
  66. Dorr, B. Solving Thematic Divergences in Machine Translation. In Proceedings of the 28th Annual Meeting on Association for Computational Linguistics, Pittsburgh, PA, USA, 6–9 June 1990. [Google Scholar] [CrossRef]
  67. Pullum, G.K. Context-Freeness and the Computer Processing of Human Languages. In Proceedings of the 21st Annual Meeting on Association for Computational Linguistics; Association for Computational Linguistics: Cambridge, MA, USA, 1983; p. 1. [Google Scholar]
  68. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6000–6010. [Google Scholar]
Figure 1. Schematic Diagram of the Risk Generation Mechanism in STI.
Figure 2. A Novelty-Adaptation Dimensional Framework for STI Risk Assessment.
Figure 3. Conceptual Framework: STI Risk Formation Pathways. Note: the marked symbol indicates encountering innovation dilemmas.
Figure 4. Topic Term Recombination Diagram. Note: the P1 phase serves only as a knowledge base.
Figure 5. Life Cycle in NLP. Note: N denotes novelty; B denotes bottleneck.
Figure 6. Topic Evolution in NLP.
Figure 7. Evaluation of Novelty and Adaptation Metrics in Recombined Topic Terms.
Figure 8. Four-Quadrant Classification Diagram.
Figure 9. Quantitative Distribution of Different Innovation-Type Phrases.
Figure 10. Z-score Distribution of Emerging vs. Reinforced Phrases. *** denotes a statistically significant difference (p < 0.001).
Figure 11. Risk–Novelty–Adaptation Relationships in Emerging Phrases.
Figure 12. Risk–Novelty–Adaptation Relationships in Reinforced Phrases.
Figure 13. Z Index Distribution of Emerging Phrases in Novelty-Adaptation Space.
Figure 14. Z Index Distribution of Reinforced Phrases in Novelty-Adaptation Space.
Table 1. Results of Topic Term Combinations.

| Phase | Emerging Phrases | Reinforced Phrases | Threshold in P_{t−1} | Threshold in P_t |
|---|---|---|---|---|
| P2 | 7520 | 756 | f_i < 10 | f_i > 20 |
| P3 | 35,869 | 874 | f_i < 20 | f_i > 40 |
| P4 | 49,713 | 521 | f_i < 20 | f_i > 40 |

Note: the threshold columns give the reinforced-phrase criteria as co-occurrence frequencies in P_{t−1} and P_t.
Table 2. Preliminary Analysis of Novelty and Adaptation Metrics in Topic Term Combinations.

| Type | Phase | Topic Term Combination | Novelty | Adaptation |
|---|---|---|---|---|
| HN-HA | P2 | accent, intonation | 0.869 | 0.545 |
| | P2 | accent, discourse | 0.513 | 0.557 |
| | P3 | image, language | 0.694 | 0.624 |
| | P3 | latent dirichlet allocation, model | 0.655 | 0.563 |
| | P3 | entity, recognition named | 0.602 | 0.809 |
| | P3 | model, neural machine translation | 0.527 | 0.584 |
| | P4 | empathetic, model | 0.860 | 0.646 |
| | P4 | hate, speech | 0.824 | 0.740 |
| | P4 | code generation, language | 0.695 | 0.618 |
| | P4 | bias, model | 0.635 | 0.710 |
| HN-LA | P2 | intonation, sentence | 0.823 | 0.289 |
| | P2 | ellipsis, machine translation | 0.723 | 0.308 |
| | P2 | corpus, plan | 0.715 | 0.162 |
| | P3 | dirichlet, morphological | 0.927 | 0.281 |
| | P3 | entity recognition, sentiment analysis | 0.906 | 0.281 |
| | P3 | classification, named entity recognition | 0.894 | 0.333 |
| | P3 | chinese, latent dirichlet allocation | 0.888 | 0.242 |
| | P3 | semantic role label, topic | 0.838 | 0.153 |
| | P4 | hate speech detection, recognition | 0.999 | 0.218 |
| | P4 | hate speech, part of speech | 0.984 | 0.246 |
| | P4 | audio, toxicity | 0.997 | 0.282 |
| | P4 | neural machine translation, sentiment analysis | 0.934 | 0.306 |
| LN-HA | P2 | machine, translation | 0.129 | 0.991 |
| | P2 | corpus, text | 0.043 | 0.956 |
| | P2 | disambiguation, word | 0.223 | 0.824 |
| | P3 | answer, question | 0.163 | 1.000 |
| | P3 | coreference, resolution | 0.152 | 0.983 |
| | P3 | document, summarization | 0.295 | 0.970 |
| | P4 | bert, model | 0.207 | 0.787 |
| | P4 | automatic, speech recognition | 0.071 | 0.754 |
| | P4 | analysis, aspect based sentiment | 0.129 | 0.744 |
| LN-LA | P2 | extension, rule | 0.026 | 0.294 |
| | P2 | document, natural language | 0.033 | 0.263 |
| | P3 | bilingual, entity recognition | 0.022 | 0.261 |
| | P3 | language, word sense disambiguation | 0.034 | 0.345 |
| | P4 | extraction, text generation | 0.161 | 0.329 |
| | P4 | method, transformer model | 0.161 | 0.352 |
Table 3. Four-Quadrant Analysis of Innovation Types: Characteristics, Case Examples, and Strategic Recommendations.

| Quadrant Location | Core Characteristics & Risk-Return Profile | Case Examples from Sample Papers & Context | Strategic Recommendations for Reference |
|---|---|---|---|
| HN-HA | Breakthrough innovation: high return potential but high uncertainty. Solves foundational bottlenecks or creates new paradigms aligned with market needs. Primary risk: unproven technological paths and high R&D failure rates. | Case (COMMA Framework 1): a pioneering cognitive framework that creates novel NLP tasks for modeling motivation, emotion, and action, with clear pathways for community adoption [64]. | Long-term, ecosystem strategy. Tolerate early failures; invest in IP and platform building. |
| HN-LA | Prospective exploration: forward-thinking yet poorly validated. Explores radical ideas lacking immediate application or support. Primary risk: premature timing and unvalidated feasibility; most fail, but a few may become disruptive. | Case (Sentence Alignment): a pioneering method using only sentence length statistics, highly innovative yet untested for complex language or broad applicability [65]. | Exploratory incubation & continuous monitoring. Best supported by small-scale, exploratory projects. Establish rapid validation mechanisms and closely monitor the maturity of enabling technologies. |
| LN-HA | Applied innovation: stable returns with moderate risk. Optimizes and deploys mature technologies in new contexts. Primary risk: intense competition in “red oceans” and diminishing marginal returns. | Case (UNITRAN System): a systematic engineering solution that maps lexical-conceptual to syntactic structures to solve the specific, practical problem of thematic divergence in translation [66]. | Agile execution & market focus. Compete on speed, incremental improvement, and user experience. |
| LN-LA | Declining research: low value with high opportunity cost. Repetitive, marginal work on outdated problems. Primary risk: sunk costs and wasted resources that hinder progress. | Case (Theory of Context-Free Grammars): tightens constraints within a mature theoretical framework; a purely theoretical discussion with minimal practical application [67]. | Identification & exit. Use metrics (e.g., low Z-score) to identify and reallocate resources promptly. |

1 These pathways are typically manifested in that the framework translates core cognitive concepts (motivation, emotion, and action) into a series of concrete and operable natural language processing tasks (such as motivation recognition and emotion–action association modeling), and provides standardized evaluation benchmarks, data formats, and baseline model interfaces, thereby lowering the technical barriers to community adoption and facilitating method reproducibility, comparison, and extension.
Table 4. Validation Results.

| Recombination Type | Phase | PCA Variance Explained | Spearman ρ (Novelty) | Spearman ρ (Adaptation) | RF Importance (Novelty) | RF Importance (Adaptation) | High-Risk Threshold |
|---|---|---|---|---|---|---|---|
| Emerging phrases | P2 | 64.17% | 0.766 | −0.806 | 0.580 | 0.420 | 0.541 |
| | P3 | 59.40% | 0.742 | −0.775 | 0.415 | 0.585 | 0.621 |
| | P4 | 61.99% | 0.773 | −0.779 | 0.416 | 0.584 | 0.659 |
| Reinforced phrases | P2 | 54.54% | 0.698 | −0.743 | 0.465 | 0.535 | 0.523 |
| | P3 | 57.68% | 0.694 | 0.767 | 0.587 | 0.414 | 0.509 |
| | P4 | 55.07% | −0.710 | 0.752 | 0.429 | 0.571 | 0.639 |

Note: p < 0.001, α = 0.05.