Article

Novelty First Policy-Based Intelligent Review Framework (IRF) for the Evaluation of Research Proposals

1
Amrita CREATE, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam 690525, Kerala, India
2
Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam 690525, Kerala, India
3
Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam 690525, Kerala, India
*
Author to whom correspondence should be addressed.
Publications 2026, 14(2), 32; https://doi.org/10.3390/publications14020032
Submission received: 14 March 2026 / Revised: 30 April 2026 / Accepted: 11 May 2026 / Published: 15 May 2026
(This article belongs to the Special Issue AI in Academic Metrics and Impact Analysis)

Abstract

Despite its many limitations, peer review remains the preferred scheme for assessing research proposals at the individual level. Although scientometric assessment offers effective assessment frameworks, certain limitations, including the proven and potential misuse of scientometric indicators, hinder its wide adoption. Informed peer review is viewed as an effective way of harnessing the advantages of both peer review and scientometric/quantitative assessment, wherein each may compensate for the limitations of the other. However, informed peer review frameworks are still prone to many challenges inherent in scientometric assessment and peer review. This is where intelligent review frameworks, which can be more advanced and effective than informed review frameworks, become important. With the advent of AI and generative AI (GenAI), a plethora of opportunities are available to convert informed peer review frameworks into intelligent review frameworks, but not without challenges and concerns. In this work, we discuss the possible opportunities for effective AI intervention in an existing informed peer review framework to transform it into an intelligent review framework. Although the selected informed peer review framework emphasized the ‘novelty first’ policy, it did not provide any means or guidelines to execute it. The proposed conceptual ‘intelligent review framework’ addresses this gap by exploring the effective use of AI/ML techniques for the process and is envisioned to have the flexibility to adapt to future technological developments in AI, GenAI, etc. Possible challenges and a roadmap for evolution with anticipated technological changes are also discussed.

1. Introduction

As externally funded (mostly publicly funded) research projects are vital in determining the growth of science and technology, an intelligent and evolvable evaluation framework for the assessment of such proposals is indispensable for STI governance. Given that AI is expected to take over many traditional systems, either replacing humans or coexisting with them, experts approach it with extra caution for some applications. STI governance, especially research evaluation, is one such application, wherein even traditional scientometrics, which may offer not only evidence but also intelligence, is still heavily weighed against judgmental approaches such as peer review. With respect to research evaluation for STI governance, both scientometric/quantitative evaluation and peer review can benefit from AI’s transformative potential, perhaps later rather than sooner because of many challenges, including transparency issues due to the black box nature of some AI algorithms noted by Farber (2024). GenAI models are far from claiming human-like intelligence/decision-making ability (Raman et al., 2024), and some experts even advocate a blanket ban on them for peer review. However, blanket bans on any technological change cannot stand the test of time, and the prudent approach is to diligently explore the transformative potential of AI and to cautiously and systematically integrate AI models/approaches into systems such as peer review, which face heavy challenges because of human limitations in tackling the massive rise in research activities. With respect to quantitative/scientometric evaluation, to be truly AI-powered, a systemic transformation of the field in terms of objectives and evaluation methods/systems is needed (Milojević, 2025), wherein AI-powered scholarly discovery platforms can be decisive (Rahman et al., 2026).
The evaluation/assessment of research project proposals for fellowships/grants/awards at the individual level is highly challenging, especially in the case of thrust areas or national priority areas. Evaluations and decision-making that are purely based on peer review (which is widely adopted for ex ante evaluations at the individual level) can be tricky. In research publishing, corporate competition for reputation drives an emphasis on quality, which peer review is expected to ensure. In contrast, the government’s immunity to such market forces, power against accountability and systemic lack of transparency sometimes lead to elite capture and cronyism (Mom et al., 2018), wherein a ‘bureaucracy-elite academicians lobby’ exhibits favoritism in misappropriating/misallocating grants/resources via intentional bias in peer review. That study also revealed that women are more affected than men in this regard. Another crucial pitfall is the bias against novelty or fear/intolerance of risk (Fedorov et al., 2010; Edwards et al., 2011; Petsko, 2011; Wang et al., 2018; Azoulay & Greenblatt, 2025; Veugelers et al., 2025). All these factors, combined with the increased bureaucratization and administrative burden related to publicly funded projects, orchestrated not only for genuine requirements such as crisis response but also for bureaucratic overcontrol and social and political side-payments (Bozeman & Jung, 2017), hinder novelty in research.
On the other hand, while the use of scientometric assessment in nations that adopted performance-based funding systems (PRFS) for ex post evaluation (Hicks, 2012) is more sensible than peer review (Abramo et al., 2019; Abramo, 2024), there are also examples of misuse of scientometric/bibliometric indicators in the review of proposals for funding. Sometimes indicators can be a more effective and easier means of executing corrupt practices than peer review. The impact factor (IF) (Garfield, 1963) and the h-index (Hirsch, 2005) might be the most misused indicators for ex ante evaluation of research proposals from individual applicants, despite the cautions given when they were introduced. For example, Madhan et al. (2018) criticized Indian funding agencies, including the DST, for suffering from h-index syndrome and attributed this not only to the misuse of the IF and the h-index but also to the institutionalization of such misuse. Thus, indicators that are free from bias (favoring or giving undue advantage to some) and are not complex/sophisticated to interpret are vital for ensuring the success of scientometric assessment for the review of candidates for funding.
Informed peer review frameworks (IPRFs) that use both scientometric/quantitative assessment and peer review are sometimes regarded as better options (van den Besselaar & Sandström, 2020). However, IPRFs that take into account both bibliometric/quantitative scores and peer-review-based assessment scores to calculate final scores are also prone to the pitfalls of either or both assessment procedures. Earlier, Abramo et al. (2019) opined that as the scale of assessment increases, scientometrics might be more useful than peer review (indicating the importance of scientometric evaluation and informed peer review over peer review alone). Based on an analysis of the production functions of peer review and scientometric assessment for large-scale assessment, which considers the costs of labor (in this context, the number of peers or scientometricians at work) and materials (resources that support analysis), Abramo (2024) found that an increase in the volume of proposals overburdens peers and reemphasized the ‘benefit of scale’ of scientometric assessment in the long term. In OECD countries (including those having PRFSs) and in developing countries like India that mostly rely on ex ante assessment, a rise in the number of private universities and in overall doctoral enrollment is visible (Sarrico, 2022; Jayaram, 2023). Nations can no longer stick to peer review assessment alone, as the overburdening and burnout of peers might worsen the effects of the limitations of peer review. Thus, scientometric assessment cannot be kept away forever from ex ante evaluations. Although informed peer review can be costlier than either scientometric or peer-review-based assessment alone, as a rise in the volume of proposals is anticipated, the ‘benefit of scale’ pointed out by Abramo (2024) for scientometric assessment in the PRFS is applicable to ex ante informed peer review too, and in the long run, it may turn out to be cost effective.
Therefore, an informed peer review framework that ensures responsible use of indicators and is free from various previously discussed limitations of peer review should be an integral component of the national STI governance. This is the major motivation behind this work.
Recently, an informed peer review framework (IPRF) that attempted to minimize the drawbacks of both was introduced by Lathabai et al. (2023). The IPRF recommended the service of two teams, (i) scientometric/quantitative analysis teams and (ii) peer review teams, that work independently of each other. The responsibilities and functions of both the teams and the features and computations involved in IPRF are discussed in Section 2.3. Although the scientometric team computes a composite score viz. CS from citation-dependent indicators, it is almost immune to biases/limitations induced by citations, as its computation does not include raw citations or its simple variations, as in the case of IF and the h-index. Once proposals are ranked on the basis of the CS, those must be passed to the peer review team in a ‘least rank first’ fashion to review the merits of the proposal. Peer reviewers assess the proposals on the basis of the criteria set by them with or without the direction of funders, and CS has no role/influence other than deciding the order. To aid the peer review team and reduce the possibility of ineffective evaluation or biased/corrupt evaluation, a ‘novelty first policy’ was recommended. This means that while determining the quality/merit of a proposal, novelty should be given predominant consideration. The rationale is that novelty or new ideas, new theories or methods to solve problems (Bornmann et al., 2019b) are the key drivers of scientific/technological progress, and the earlier perspective of novelty as an unmeasured (Leahey et al., 2023) output of recombination (Kaplan & Vakili, 2015) is no longer valid after the introduction of methods to measure novelty, which we will eventually discuss. Thus, the diligent construct of the IPRF, which has an independent scientometric framework and a peer review framework, is supposed to have as much objectivity and balance as possible. 
However, the IPRF has several limitations, such as (i) limited/restricted analytical and operational space for the scientometric team and (ii) no guidance on how to assess novelty effectively despite the stress on novelty. With the advent of AI/ML, there are possibilities for addressing these drawbacks without worrying too much about cost. Thus, there is a need for a framework that incorporates AI/ML methods effectively in both the scientometric and peer review modules and thereby harnesses scientometrics and peer review better than the IPRF, other existing informed peer review models, scientometric evaluation models and peer review models. This gap is addressed in this work.
Therefore, we propose a conceptual framework that is envisioned to be comprehensive, robust, flexible and evolvable and retains the advantages of IPRF (Lathabai et al., 2023). This framework considers the fact that the wind of change in the form of AI is already blowing in scientometrics, which is also evident from works on hybrid human–AI tools for scientometrics (Correia et al., 2023) and many others. The proposed framework gives more responsibilities/functions to the scientometric team so that, in contrast to the IPRF, peer reviewers in this framework receive substantial support from the scientometric team. To ensure the ‘novelty first policy’, in addition to the proposals ranked according to the novelty-incorporated composite score (CS′), the list of novel papers (and possibly short summaries of their novel contributions) in the relevant literature will be handed over to the peer review team. This may significantly aid peer reviewers in determining the novelty of the current proposals. The novelty of a proposal can be determined by examining whether it has any theoretical novelty, methodical novelty or applicational novelty over the papers that are determined to be novel/disruptive/paradigm shift pivot papers in the fields/themes relevant to the call. Several other features envisioned for the framework that may be crucial for its evolvability are discussed in detail in Section 3. We call this framework the ‘Intelligent Review Framework’ or ‘IRF’, for reasons that follow from the above discussion.

2. Developments Relevant to the IRF

Although we have yet to discuss the possible structure, function and evolution of the proposed design of the IRF, we clarify here that although the assessment is supposed to be carried out at the individual level, the following assessments are necessary for comprehensive and effective quantitative evaluation of the individual: (i) assessment of the body of literature related to field(s) or thrust areas being funded to determine key developments (for comparative determination of the relevance/novelty of submitted proposals) and the key contributors (to identify the experts for peer review) and (ii) assessment of the expertise of individuals/candidates who submitted proposals as in the case of IPRF. Prominent scientometric methods, techniques or frameworks useful for these are discussed next.

2.1. Scientometrics for the Assessment of Novelty, Disruptiveness and Paradigm Shifts in Research Fields

Scientometrics offers a myriad of frameworks, tools and methods for assessing research fields by mining scientific/patent networks via network analysis frameworks and tools (Shibata et al., 2007; Liu & Lu, 2012; Lathabai et al., 2018) for disciplinary as well as interdisciplinary analysis (Leydesdorff & Rafols, 2011; Wagner et al., 2011). Topic modeling (Mohammadi & Karami, 2022) and approaches that combine network analysis and text mining (Klavans & Boyack, 2017; Kim et al., 2022; Yan et al., 2024) are also gaining momentum. Indicators that can pinpoint works with substantial novelty/disruption/paradigm shifting potential are crucial for the proposed design of the IRF.
  • Novelty, disruptiveness and paradigm shift indicators
As extensions of the discussion of the importance of knowledge recombination and technical refinement (Grant, 1996) and disruptive innovation theory by Christensen (1997), several indicators for novelty/disruptiveness and paradigm shift detection/prediction were introduced. Azoulay (2019) noted that if a paper receives more citations in such a way that the citing works acknowledge/cite few of its intellectual forebears (papers cited by the paper of interest), then that work must be of disruptive potential. Therefore, many indicators have been introduced on the basis of the cited references and citations received by papers or patents, represented as citation networks and networks derived from them, such as cocitation/bibliographic coupling networks and journal cocitation networks (reflecting knowledge origin, flow and recombination). Works that attempt to determine novelty via text or semantic similarity to previous works are also available. More such effective attempts can be expected in the AI era. In this work, we discuss 13 indicators (please also refer to Supplementary Table S1 for details) related to scientific and technological novelty, disruptiveness and paradigm shift detection/prediction.
Freeman’s betweenness centrality: Chen (2006) demonstrated the use of Freeman’s betweenness centrality (Freeman, 1977), and the rationale is that papers with high betweenness in cocitation networks are found in paths between two clusters/research fronts that lead to paradigm shifts.
Atypicality indicator: Uzzi et al. (2013) introduced a novel indicator with the rationale that atypical or unusual combinations of cited references (that lead to unusual journal pairings) can be suggestive of novelty. An atypicality score z > 0 indicates conventionality, and z < 0 indicates atypicality.
Novelty score U: Lee et al. (2015) developed the novelty score U, an improved version of the computationally intensive indicator by Uzzi et al. (2013), wherein atypicality/novelty is expressed as opposed to commonness. U is computed as the qth percentile (typically q = 10) of commonness at the paper level (obtained from commonness at the journal level).
Flow vergence index (FV index): Prabhakaran et al. (2015) defined this indicator/model on the rationale that papers/patents that received (i) more citations (than the number of citations made), (ii) a good number of citations from quality papers (which are, in turn, reasonably cited) or both are more likely to directly/indirectly transfer knowledge to more knowledge pathways in the citation network.
Flow Vergence (FV) gradient or $\nabla FV$: Lathabai et al. (2015) developed the FV gradient indicator to pinpoint pivots of paradigm shift among papers that have high FV potential. In a cited–citing pair/link where j cites i, the cited paper i will usually have more FV potential than the citing paper j. If instead the citing paper exhibits more FV potential ($FV_j > FV_i$, i.e., $\nabla FV_{ji} = FV_i - FV_j < 0$), then j can be a crucial paper associated with a paradigm shift. The FV gradient can predict paradigm shifts too (Prabhakaran et al., 2018).
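The sign test described above can be illustrated with a minimal Python sketch. It assumes that FV index values are already available for each paper and only evaluates the gradient on each cited–citing link; the paper identifiers, FV values and function names are hypothetical and are not taken from the cited works.

```python
def fv_gradients(fv, citations):
    """Return {(j, i): FV_i - FV_j} for every link in which j cites i."""
    return {(j, i): fv[i] - fv[j] for j, i in citations}


def pivot_candidates(fv, citations):
    """Citing papers j with a negative FV gradient on at least one link."""
    grads = fv_gradients(fv, citations)
    return sorted({j for (j, _i), g in grads.items() if g < 0})


# Hypothetical FV index values and (citing, cited) links.
fv = {"A": 5.0, "B": 2.0, "C": 7.5, "D": 1.0}
citations = [("B", "A"), ("C", "A"), ("D", "C")]

print(fv_gradients(fv, citations))
print(pivot_candidates(fv, citations))
```

In this toy network only C has more FV potential than a paper it cites, so it is the only candidate flagged for closer inspection.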
Patent-based novelty indicators: Verhoeven et al. (2016) devised three indicators, as patents can be novel if they possess novelty in recombination (NR) and novelty in technological origin (NTO) and scientific origin (NSO) of knowledge. Patents are assigned a binary value if they cause a new IPC combination (NR = 1), cause an IPC combination in the IPCs associated with them (NTO = 1) and cause a new IPC-WOS category combination (NSO = 1), rather than considering the frequency/magnitude. These three scores are then used to determine patents with different inherent novelties.
W score: Introduced by Wang et al. (2017), the W score treats as novel those papers that introduce journal combinations that are not only new (previously nonexistent) but also reused (i.e., recur in subsequent publications) after their introduction.
CD index: The consolidated disruptive index (CD index) (Funk & Owen-Smith, 2017) treats a patent k as substantially disruptive if, among the patents that cite k, a smaller share cites the predecessors of k. A method to compute the magnitude of future use of k was also provided by them.
Disruptiveness indicator (DI): L. Wu et al. (2019) introduced a disruptiveness indicator, DI, that considers the citation preferences of works that cite a focal work and of works that do not cite it. In other words, m is a work with high disruption potential if (i) the number of works that cite m but do not cite those cited by m exceeds the number of works that cite m and also cite those cited by m and (ii) there are fewer works that do not cite m but cite works that are cited by m.
Modified DI index or $DI_m^{nok}$: Recognizing that the consideration of works that do not cite m sometimes forces the DI indicator to behave in a fashion counterintuitive to its purpose, Q. Wu and Yan (2019) introduced $DI_m^{nok}$ (validated by Bornmann et al., 2019b), a corrected DI index whose formula omits the number of works that do not cite m but cite works that are cited by m.
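To make the two disruptiveness variants concrete, the following Python sketch counts the three groups of works for a focal paper m and evaluates both indices. It uses the commonly stated form DI = (n_i − n_j)/(n_i + n_j + n_k), where n_i is the number of works citing m but none of its references, n_j the number citing both m and at least one of its references, and n_k the number citing at least one reference of m without citing m; the modified variant drops n_k. The toy citation data and function names are hypothetical.

```python
def disruption_counts(focal, refs_of):
    """Count (n_i, n_j, n_k) for `focal` from {paper: set of cited papers}."""
    focal_refs = refs_of.get(focal, set())
    n_i = n_j = n_k = 0
    for paper, refs in refs_of.items():
        if paper == focal:
            continue
        cites_focal = focal in refs
        cites_focal_refs = bool(refs & focal_refs)
        if cites_focal and not cites_focal_refs:
            n_i += 1          # cites m only
        elif cites_focal:
            n_j += 1          # cites m and its references
        elif cites_focal_refs:
            n_k += 1          # cites m's references but not m
    return n_i, n_j, n_k


def di(n_i, n_j, n_k):
    """Disruptiveness indicator including the n_k term."""
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


def di_nok(n_i, n_j):
    """Modified variant that omits n_k from the denominator."""
    total = n_i + n_j
    return (n_i - n_j) / total if total else 0.0


# Hypothetical citation data: m cites r1 and r2; p1..p5 are later works.
refs_of = {
    "m": {"r1", "r2"},
    "p1": {"m"}, "p2": {"m"}, "p3": {"m", "r1"},
    "p4": {"r2"}, "p5": {"r1", "r2"},
    "r1": set(), "r2": set(),
}
n_i, n_j, n_k = disruption_counts("m", refs_of)
print(n_i, n_j, n_k, di(n_i, n_j, n_k), di_nok(n_i, n_j))
```

With these data the counts are (2, 1, 2), so the n_k term pulls the original index toward zero while the modified index depends only on the works that actually cite m.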
Depth and Breadth indicators: Bu et al. (2021) introduced two indicators related to disruptiveness, namely, depth and breadth. If many of the articles that cite paper i also cite other articles that cite i (i.e., if papers citing i are highly bibliographically coupled through i), then i can be disruptive, as it creates more depth. The breadth indicator is computed as (1 − depth); a high breadth indicates that i possibly caused knowledge flow to more works (which need not be well cited as of now but can give rise to multiple pathways).
The FV index (Prabhakaran et al., 2015) can also reflect both depth (as it incorporates eigenvector centrality) and breadth (as it incorporates the indegree/number of citations received).
Semantic novelty (SN) indicator: Instead of similarity via cited references, Shibayama et al. (2021) considered semantic similarity between cited references, where the semantic similarity of a cited reference pair is computed as the cosine similarity of their ‘word embedding vectors’, and the qth percentile of these similarities over all pairs associated with a work is taken as its score. Semantic novelty (SN) is the opposite of semantic similarity; papers whose references have less semantic similarity with each other are treated as more novel.
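A rough Python sketch of this idea is given below. It is not the authors' implementation: the two-dimensional ‘embeddings’ are hypothetical, the percentile uses simple linear interpolation, and expressing novelty as one minus the qth percentile of pairwise similarity is an assumption made only to illustrate ‘the opposite of semantic similarity’.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def semantic_novelty(ref_vectors, q=10):
    """Assumed form: 1 - q-th percentile of pairwise reference similarity."""
    sims = [cosine(u, v) for u, v in combinations(ref_vectors, 2)]
    return 1.0 - percentile(sims, q)


# Hypothetical embeddings of the references cited by two papers.
similar_refs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
diverse_refs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(semantic_novelty(similar_refs), semantic_novelty(diverse_refs))
```

The paper whose references point in different semantic directions receives the higher score, which is the behavior the indicator is meant to capture.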
Integrated novelty-impact indicator: Yan and Fan (2024) devised an indicator NI that reflects both novelty and impact, wherein the NI is computed as the weighted sum of impact and novelty. Impact is computed on the basis of citation sentiment, and novelty is computed on the basis of semantic novelty (via outlier detection in word embedding analysis). Papers with high impact according to citation sentiment and time heterogeneity and high semantic novelty can be highly novel.
Thus, most of the indicators are effective and valid (validated using seemingly effective methods) but computationally intensive. For instance, the effectiveness of the FV index was demonstrated by Prabhakaran (2018) using dynamic empirical validation in multiple fields, and its applicability in even volatile fields was validated by Lathabai et al. (2024). Patent-based novelty indicators were validated on R&D prize award (by R&D magazine) data and the EPO patent database by Verhoeven et al. (2016). The predictive power of the FV gradient was validated using static and dynamic analyses by Lathabai et al. (2018). A convergent validity test by Bornmann et al. (2019a) revealed that U performs better than W. The effectiveness of the disruptiveness indicator was validated using five different approaches, including one that involves Nobel Prize winners’ articles. The modified disruptiveness indicator was validated using a convergent validity test with F1000Prime by Bornmann et al. (2019b). The semantic novelty indicator was validated against self-reported novelty scores using a survey, and the integrated novelty-impact indicator was empirically validated in two fields; both validations appear in the respective introductory papers themselves. Thus, methods from the pre-AI era, especially network-based novelty/disruptiveness/paradigm shift indicators designed with profound methodical rigor, can be promising assets for the IRF. However, a trend toward the adoption of advanced text analysis and NLP techniques for the determination of semantic novelty is obvious. This might signal a gold rush toward the profound and proactive adoption of LLMs or GenAI, and the combination of pre-AI network-based novelty/disruptiveness/paradigm shift indicators with AI-powered approaches can work well in the scientometric evaluation part of the IRF.

2.2. Peer Review and AI

  • Peer review in the pre-AI era
Major factors that favor peer review as the preferred option in ex ante evaluation are (i) delays in the recognition/detection of novel works by scientometric/quantitative methods (Wang et al., 2017) and (ii) intimidatingly narrow tolerance of risk by funders, including public funders (Fedorov et al., 2010; Edwards et al., 2011; Petsko, 2011), associated with highly novel or offbeat research that deviates from mainstream research and the fear of going wrong (Boudreau et al., 2016; Lane et al., 2022). These often lead to suboptimal selections and the nurturing of incremental research (Park et al., 2015; D. Li, 2017). Although there are several studies on peer review for science funding, studies that consider the role of risk in the assessment of research proposals that leads to suboptimal selections are scarce. Through a linguistic analysis of review reports, van den Besselaar et al. (2018) revealed that peer reviews are more oriented toward the analysis of weak points in proposals than toward the assessment of novelty/innovativeness/high risk–high disruption potential. Linton (2016) suggested combining peer review with the Black–Scholes model (Black & Scholes, 1973), which won the Nobel Prize in economics, wherein the Black–Scholes model (adapted to the research assessment setting) provides a means to assess the relative attractiveness of projects (instead of financial value). The subjective expected utility (SEU) approach was recommended by Franzoni and Stephan (2023), in which peer reviewers provide values and probabilities for primary and secondary outcomes (along with verbal comments explaining the values), and a composite score is computed as the sum of the products of the values and probabilities of the outcomes.
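As a small worked illustration of the SEU composite score described above, the Python sketch below sums value × probability over the outcomes supplied by a reviewer. The outcome values, probabilities and the function name are hypothetical; the cited approach may define scales and outcome categories differently.

```python
def seu_score(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    for _value, prob in outcomes:
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return sum(value * prob for value, prob in outcomes)


# Hypothetical reviewer input: (value, probability) for a primary
# and a secondary outcome of one proposal.
risky_proposal = [(10.0, 0.2), (4.0, 0.6)]
safe_proposal = [(3.0, 0.9), (1.0, 0.9)]

print(seu_score(risky_proposal), seu_score(safe_proposal))
```

Here the high-risk proposal scores about 4.4 against about 3.6 for the safe one, showing how an explicit value and probability for each outcome lets a low-probability, high-value primary outcome still count in a proposal's favor.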
  • AI developments and possible effects on peer review
The concern about the misuse of AI in scientific writing (which is also applicable to research proposal development), which has prevailed since the introduction of Scigen (an AI writing tool capable of creating nonsensical papers) in 2005 (Bohannon, 2015), intensified manifold with the introduction of ChatGPT/GPT-3.5 (Stokel-Walker, 2022). AI tool usage in academic writing (also applicable to research proposal writing for funding) possibly enhances six areas (Khalifa & Albadawy, 2024): idea generation (L. Li et al., 2024), content structuring, literature synthesis, data management, editing and ethical compliance (Zairul et al., 2025). In the short term, advanced versions of some of the above-discussed AI-based research proposal generators, aiding tools or currently trending GenAIs may become successful/accepted, making the traditional approach to peer review (especially reviews focused on the weak points of proposals) completely useless. AI detectors can be useful as a first line of defence, but they are more prone to serious false positives than to false negatives (Dalalah & Dalalah, 2023). This calls for proactive action to aid reviewers (of both journals and proposals) with better tools and via the formulation of standards, procedures and guidelines for authors/candidates as well as reviewers.
Owing to the manifold increase in proposal submissions, the need to automate every process from initial proposal submission to the final decision will be felt sooner, making AI intervention inevitable. Munagandla et al. (2024) proposed AI-driven research proposal management systems that could considerably reduce administrative overheads, bottlenecks and delays arising from routine tasks such as document validation, reviewer assignment and tracking deadlines. More directly useful tools for reviewers are AI-based content analysis tools or summarizers, especially when a large volume of proposals needs to be analyzed. LLM-based text summarization and content analysis (Malinen, 2024) and LLM-based functionalities such as Query Master and PDF summarizers (Agarwal & Behera, 2024) can be helpful but are insufficient from a semantic point of view (Carius & Teixeira, 2024). Early versions of AI products integrated into Dimensions, Web of Science and ScienceDirect for content analysis/document summarization or field summarization, targeting both reviewers and researchers (Dimensions, 2024; Hanumanth, 2025; Elsevier Ltd., 2025), are also being developed. As these tools are trained on scientific publication data, they are supposed to perform better than other tools trained on internet sources (which are prone to fake information or misinformation).
Thus, most of the developments in AI usage for peer review, especially for ex ante review of research proposals, are still in the nascent stage, and the current policies, measures and regulations are not generally in favor of the use of these tools for automatic content analysis, the generation of review reports, etc. At this crucial juncture, we propose the design of the ‘Intelligent Review Framework’ or IRF. As already mentioned, its precursor, the IPRF introduced by Lathabai et al. (2023), needs to be discussed first.

2.3. Performance Assessment at the Individual Level and a Revisit to IPRF Based on Relevance-Based Expertise Scores

The use of raw indicators such as the IF and h, and of h-type indicators such as the h(2)-index (Kosmulski, 2006), g-index (Egghe, 2006), f and t indices (Tol, 2009), $h_m$ (Schreiber, 2008), $h_\alpha$ (Hirsch, 2019), etc., as individual assessment indicators has drawn criticism. Expertise indicators inspired by h and g, such as x and x(g) (Lathabai et al., 2021), which were originally designed to reflect the core competency areas and potential core competency areas of institutions, can be used for individual-level assessment, as in the IPRF by Lathabai et al. (2023). The key recommended features of the IPRF are listed below:
  • Two independent teams: (i) the first team, for scientometric/quantitative analysis, uses a scientometric framework to inform the second team, in the form of ranks, about the overall performance of candidates concerning the fields/themes considered for the award, and (ii) the second team (the peer review committee) qualitatively evaluates the merit of each proposal, wherein the order of consideration of proposals is based on the ranks provided by the first team during scientometric assessment (last rank to first rank). Notably, while the first team assigns ranks on the basis of the performance of the candidates, scores/ranks are not taken into account for initial screening or shortlisting. The order is assigned to make shortlisting easier. The details of the decision framework used by the peer review committee are discussed in the third section.
  • Scientometric framework (of the first team): Given that indicators such as the h-index or its variants and the IF are not suitable for the assessment of individuals for funding and related applications, indicators that are not too complex and sophisticated but are free from the limitations of both are necessary. Recently, Lathabai et al. (2021) introduced expertise indices such as x and x(g) (inspired by the h and g indices) that help determine the core competent areas (the top x thematic areas that have thematic strength ≥ x, where thematic strength can be reflected by indicators such as citations, altmetric score, etc.) and potential core competent areas (the x(g) − x areas below the x areas, where the x(g) areas managed to gather average citations ≥ x(g)) of an actor (such as institutions and individuals) and demonstrated them in the case of institutions. Although that framework uses keywords or important terms that are suitable for representing thematic areas (with NLP techniques used as applicable), the relevance of publications to the thematic areas to which they are mapped for the computation of thematic strengths was not considered. Relevance-incorporated expertise indices (wherein relevance scores come with the concepts associated with publications in the Dimensions database) were introduced to overcome this limitation and to reduce (if not eliminate) the need for NLP. Instead of the thematic strengths computed using the injection method, relevance-incorporated thematic strengths were proposed to be computed using the double injection method (Lathabai & Prabhakaran, 2023) for the computation of relevance-incorporated expertise indices. The relevance-incorporated expertise indices were defined as follows:
$x[R]$-index: A scientist is supposed to have an $x[R]$-index value of $x[R]$ if he/she has published papers in at least $x[R]$ thematic areas with relevance-incorporated thematic strengths exceeding $x[R]$. These $x[R]$ areas, which form the $x[R]$-core, can be treated as the core competency areas of the scientist.
$x(g)[R]$-index: A scientist is supposed to have an $x(g)[R]$-index value of $x(g)[R]$ if he/she has published papers in at least $x(g)[R]$ thematic areas such that the average relevance-incorporated thematic strength from these areas exceeds $x(g)[R]$.
Once the relevance-incorporated expertise indices are computed after the publication profile of each candidate who submitted proposals is analyzed, the core competency and potential core competency areas can be determined as the top $x[R]$ areas and the next $x(g)[R] - x[R]$ areas. This helps determine the composite score CS using the following formula:
$$CS = 0.35\,\alpha + 0.15\,\beta + 0.35\,\gamma + 0.15\,\delta \tag{1}$$
where α and β represent the level of agreement between the $t = |T|$ thematic areas considered for funding (possibly thrust/national priority areas; the list of areas in $T$ is preferably decided and communicated to the scientometric team by the funding agency) and the thematic areas that fall within the researcher’s lists of core competency (represented by $T_{x[R]}$) and potential core competency (represented by $T_{x(g)[R]} \setminus T_{x[R]}$), respectively. These can be computed as follows:
$$\alpha = \frac{|T \cap T_{x[R]}|}{|T|} = \frac{|T \cap T_{x[R]}|}{t} \tag{2}$$
$$\beta = \frac{|T \cap (T_{x(g)[R]} \setminus T_{x[R]})|}{|T|} = \frac{|T \cap (T_{x(g)[R]} \setminus T_{x[R]})|}{t} \tag{3}$$
γ and δ are the normalized strengths of a candidate in their core competency areas and potential core competency areas, respectively. For a candidate with $|T \cap T_{x[R]}| = n$ and $|T \cap (T_{x(g)[R]} \setminus T_{x[R]})| = m$,
$$\gamma = \frac{\sum_{i=1}^{n} S_i}{\sum_{k=1}^{t} \mathrm{max.TS}_k} \tag{4}$$
$$\delta = \frac{\sum_{j=1}^{m} V_j}{\sum_{k=1}^{t} \mathrm{max.TS}_k} \tag{5}$$
$S_i$ and $V_j$ represent the relevance-incorporated strengths of the candidate in an arbitrary core competent area $i \in T \cap T_{x[R]}$ and an arbitrary potential core competent area $j \in T \cap (T_{x(g)[R]} \setminus T_{x[R]})$, respectively. $\mathrm{max.TS}_k$ is the highest strength possessed by any candidate (who applied for funding) in area $k \in T$.
Specifically, $x[R]$ (the number of core competency areas) and $x(g)[R]$ (the number of core and potential core competency areas taken together) do not inherit the limitations of h and g, from which they are inspired, because they represent numbers of areas of expertise and because their computation is relevance-based. Additionally, they are used only to compute α and β (Equations (2) and (3)); hence, they do not enter the CS computation directly. α and β take values in [0, 1], as they are obtained by normalizing the number of core competency areas and the number of potential core competency areas relevant to the call by t = |T|, the total number of areas relevant to the call. This normalization further eliminates almost all possibilities of citation-related bias. A candidate can never have both α and β equal to 1, because the core and potential core competency areas are disjoint. Similarly, in the case of γ and δ, relevance incorporation in the computation of $S_i$ and $V_j$ and the normalization of their sums, as shown in Equations (4) and (5), significantly reduce the effect of citation-induced bias. Both γ and δ lie in [0, 1]. For an applicant to have γ = 1, he/she should be the one (among all the applicants) with the highest thematic strengths in all his/her core competent areas relevant to the call. Similarly, δ = 1 if the applicant has the highest thematic strengths in all his/her potential core competency areas relevant to the call. A candidate can never have both γ and δ equal to 1.
After the CS is computed, the ranking of the applicants (applications) is performed accordingly, and the applications are passed on to the peer review team in the determined order.
  3. Peer review procedure: The peer review team analyzes the proposals (in the specified order). The size of the shortlist (i.e., the number of candidates to be shortlisted and called for final presentation and interview) is determined as follows. If there are N awards in total (predetermined) and M groups (according to age/career length or stage), the total number of shortlisted candidates will be yN, and the number shortlisted from each group will be yN/M, where y can be 2, 3, etc., depending on N and on the time the selection board can afford to listen to presentations, seek clarifications, hold discussions and/or interview the candidates. From each group, proposals are shortlisted on the basis of evaluation according to parameters decided by the panel/board. Among the parameters used to evaluate (or assign scores), novelty was recommended to be of primary importance (and to be assigned more weightage rather than treating all the parameters equally). Apart from the ‘novelty first’ policy, the key highlight of the IPRF is the diligent effort to eliminate the influence of scientometric evaluation on the shortlisting decision by addressing two possible scenarios that may arise during the peer review process. These are as follows:
Scenario 1: If all the proposals from the top (y N/M) candidates (according to CS scores) are more meritorious (merit of proposals decided by qualitative parameters) than the other proposals are, then the top (y N/M) proposals can be shortlisted in the first pass itself.
Scenario 2: If one or more of the top (y N/M) proposals are not up to the mark and some of the other proposals with lower CS scores appear to be better in terms of merit, a second pass is required to decide the best among those, and if the proposals from lower CS-scored candidates are confirmed to be of better merit, those can be shortlisted. It can be suggested that such candidates receive training and guidance in that respect, which can be provided if needed and which, in a way, also contributes to capacity/expertise building.
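As an illustration of Equations (1) to (5) in the scientometric framework above, the composite score of a single candidate can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `composite_score`, the dictionary-based inputs and the toy data are assumptions made only for this example, and the relevance-incorporated strengths are taken as given rather than computed by the double injection method.

```python
def composite_score(T, core, potential, strengths, max_strength,
                    weights=(0.35, 0.15, 0.35, 0.15)):
    """Return (alpha, beta, gamma, delta, CS) for one candidate.

    T            : set of thematic areas relevant to the call
    core         : candidate's core competency areas, T_x[R]
    potential    : candidate's potential core competency areas
    strengths    : dict area -> candidate's relevance-incorporated strength
    max_strength : dict area -> highest strength of any applicant in that area
    """
    t = len(T)                            # assumes the call names at least one area
    core_hits = T & core                  # T intersected with T_x[R]
    pot_hits = T & (potential - core)     # T intersected with the potential-only areas
    alpha = len(core_hits) / t
    beta = len(pot_hits) / t
    denom = sum(max_strength[k] for k in T)
    gamma = sum(strengths[i] for i in core_hits) / denom
    delta = sum(strengths[j] for j in pot_hits) / denom
    w_a, w_b, w_g, w_d = weights
    cs = w_a * alpha + w_b * beta + w_g * gamma + w_d * delta
    return alpha, beta, gamma, delta, cs


# Hypothetical toy data (not taken from the paper).
T = {"a", "b", "c", "d"}
core = {"a", "b", "x"}        # "x" is outside the call and is ignored
potential = {"c"}
strengths = {"a": 8.0, "b": 6.0, "c": 3.0, "x": 9.0}
max_strength = {"a": 10.0, "b": 6.0, "c": 6.0, "d": 8.0}

alpha, beta, gamma, delta, cs = composite_score(
    T, core, potential, strengths, max_strength)
print(alpha, beta, round(gamma, 4), round(delta, 4), round(cs, 4))
```

In this toy case the candidate matches two of the four call areas as core areas and one as a potential core area, so α = 0.5 and β = 0.25, while γ and δ depend on how close the candidate's strengths are to the best applicant in each call area.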
Now, we are in a position to discuss the Intelligent Review Framework (IRF), which is a conceptual framework that also includes a computational framework. Specifically, the definitions and equations discussed in Section 3 are novel and are part of the computational framework of the IRF.

3. Intelligent Review Framework (IRF)

We propose the IRF on the basis of a comprehensive and diligent review of related literature (see Section 2) and different kinds of exposures, experiences and expertise of the authors. The prolonged exposure of two authors at both the receiving end as well as at various evaluation committees, the work experience of one author in a government establishment for STI policy development, our experience in the field of scientometrics and contributions that include various evaluative indicators, frameworks, etc., and our experience in scientometric analysis of AI/ML/LLMs, especially their reliability, ethical considerations, etc., and our empirical validation of the suitability of AI/ML/LLMs for decision-making in managerial/governance/SDG-related tasks were helpful in designing the IRF, a conceptual framework that also embeds a computational framework. The IRF has to be implemented by a permanent institutional structure that is part of the STI governance of nations. Nations are recommended to have a permanent institution viz. National Scientometric Task Force or NSTF (consisting of scientometricians, prominent members of the scientific community, HR professionals and developers proficient in AI/ML). Prominent members of the scientific community will have advisory roles unless they are selected as peer reviewers upon scientometric evaluation against a call. The tenure of the peer review panel ends when all the awardees submit the final project report. If the prominent member of the scientific community agrees to be a peer reviewer, he or she will not be a part of the advisory board until the candidates are awarded. After the declaration of awards, he/she will also be excluded from key discussions/meetings/reviews during the progress of the awarded projects in the capacity of advisory members. However, he/she has to attend those in the capacity as peer reviewers whenever possible. 
In the case of multiple calls from the same discipline with sufficient thematic overlap, an ad hoc scientometric team will be directed to prepare a wider list of possible experts to be invited for the panels to reduce the number of members engaged in multiple panels as much as possible to reduce the possibility of elite capture in NSTF and IRF. Of course, all these invoke new challenges in some disciplines where some of the experts might be in early career stages/not hold permanent positions, which is one of the legal requirements to be appointed as a panel member in many countries. Both teams are under the oversight of the NSTF; if there is any confusion regarding scientometric evaluation results that leads to the recommendation of peer reviewers or any general issues (common in any workplace) or issues specific to a particular call in the IRF, or if there is a need for more support and facilities, the NSTF (senior scientometricians in consultation with advisory board members concerning the field, HR officials) has to intervene and resolve such issues, and future training programs can be updated by flagging such issues, along with strategies to avoid/overcome them.
The implementation of the IRF requires two independently working teams, as in the case of the IPRF. The first team, i.e., the scientometric team, has a greater operational and analytical role and has more responsibility for supporting the peer review team. NSTF is supposed to (i) carry out scientometric research and development activities related to indicator assessment, field assessment, individual assessment, etc., with special emphasis on applications of AI/ML techniques for all these (including the development and management of relevant AI-powered information systems, databases and knowledge bases); (ii) watch, track and predict the dynamics of all disciplines and subfields, including current and potential national priority areas, to provide intelligence (higher gravity than evidence-based information) for STI policy making and governance; (iii) recruit/train scientometricians; (iv) track, determine and review ethical considerations related to AI/ML usage in general and specifically in research practices and STI governance along the technological advancement and evolution of AI/ML; and (v) form/depute ad hoc teams for aiding funding agencies with IRF activities. Thus, the function (v) of the NSTF is more crucial for the scientometric component of the IRF. In regard to the peer review part of the IRF, unlike the case of the IPRF and other existing systems, peer reviewers are selected from a recommended list prepared by the scientometric team on the basis of an evaluation of expertise using state-of-the-art methods to eliminate the effects of elite capture and cronyism. Additionally, peer reviewers are blind to candidates’ details (making peer review a double-blind process) to eliminate any chances of favoritism. These measures also eliminate the possibility of gender discrimination. NSTF might provide the service of discourse analysts for aiding peer reviewers in the qualitative analysis of proposals on request.
The operation of the IRF in relation to a call for proposals spans four phases: (1) ideation of the call and receipt of proposals, (2) determination of experts and formation of the peer review panel, (3) computation of composite scores and (4) peer review and shortlisting for interviews.

3.1. Phase-1

When funders ideate the funding program and the call for a proposal, they can consult the NSTF, who will constitute an ad hoc team (to be headed by senior NSTF officials) to conduct research on the literature and provide feedback on which traditional and interdisciplinary fields are to be considered (Figure 1). Specifically, the NSTF is supposed to continuously monitor, track and predict development in all disciplines and subfields, including national priority areas (see item (ii) of functions of the NSTF). For this purpose, they might consider vital general properties (Coccia, 2018) and the sociolinguistic and discipline genre properties behind the evolution of fields (Stichweh, 2001). In the initial phase of the IRF roadmap, the use of AI/ML/LLM is not recommended. However, in later phases, they may consider the use of computational sociolinguistic and discipline genre analysis with the aid of proven AI/ML/LLM tools. Specifically, according to function (i) of the NSTF, its R&D activities also cover the sociolinguistic and discipline genre aspects. The scientometric team also determines the thematic areas associated with the fields using state-of-the-art methods and tools. Currently, topic modeling techniques such as LDA and transformer-based techniques such as BERTopic and SciBERT can be used, or any suitable LLMs can possibly be used. Once the fields and thematic areas are considered, the aspects to be considered for research proposal evaluation (at phase 4) according to the fields determined as relevant for the call and level of emphasis on each aspect (other than novelty) demanded by such fields are communicated to the funders/regulators. If the funders/regulators are satisfied, they may initiate the call. By using their dedicated project/proposal management systems, if any, funders can receive applications. 
Once the call deadline ends, funders have to transfer the applicant information and proposals to the NSTF, which, with the help of the development team, can form a master database (MDB) to store the information in a conveniently organized form for use in later stages.

3.2. Phase-2

The scientometric team constituted by the NSTF performs detailed field analysis and thematic area analysis to determine novel, disruptive or paradigm-shift-related works and patents (using relevant networks such as paper citation networks, patent citation networks and other derived networks to compute the indicators discussed in Section 2.1) and performs content analysis of these works manually or via AI-based tools (Figure 2). Until the development team has developed effective tools for content analysis, existing tools such as those developed by Clarivate Analytics (Hanumanth, 2025) can be used. The team should prioritize the development of a single tool/package to determine all the novelty scores given in Section 2.1, because multiple tools/packages are currently required to compute them. Until the most effective novelty, disruptiveness or paradigm shift indicators are in place, the following procedure can be used.
Procedure for classification of works into different classes of novelty, disruptiveness or pivotal roles in a paradigm shift using an AI-powered scientometric approach
The rationale of this procedure is that papers/patents that are picked up by multiple novelty, disruptiveness and paradigm shift indicators can be treated as more important in the fields/themes and can be marked as high novelty/disruptive/paradigm shift papers/patents. Thus, depending on whether more indicators are highlighted or not, different grades/classes can be present for papers/patents in a field, which can be assigned by a clustering exercise. The steps of the procedure are given below.
  1. Computation of scores: For all works/patents in a field or thematic area, compute scores for novelty, disruptiveness or FV gradient scores.
  2. Unidirectionality of indication should be ensured: Unidirectionality means that all scores point in the same direction. For instance, a high value of some scores may indicate high novelty, disruptiveness or a pivotal role in a paradigm shift, whereas for other scores a low value may indicate the same. All scores should be transformed to ensure a unidirectional indication (preferably with a high score indicating more novelty, disruptiveness or a more pivotal role in a paradigm shift). Additionally, some indicators, such as that of Verhoeven et al. (2016), do not provide a score but classify patents. The scientometric team should develop an indicator that works on the same principle but provides a quantitative score, and adapt it to the assessment of papers. For instance, an expression such as the one below can be considered for papers.
$$NS = a\,N_R + b\,N_{SO} + c\,N_{TO}$$
$N_R$ = Number of new/first occurrences of Journal or WoS classification.
$N_{SO}$ = Number of new journal combinations or WoS classifications via papers associated with such journals or WoS classifications.
$N_{TO}$ = Number of connections between a WoS category/Journal and IPC that did not exist before.
The following values can be taken for a, b and c: a = 0.335, b = 0.335 and c = 0.33.
Similarly, the FV gradient, $FV_{ij}$ (Lathabai et al., 2015), is a score assigned to links in paper citation networks or patent citation networks, not to vertices (which represent papers/patents). Therefore, the following transformation can be considered. First, all the links are sorted in descending order of FV gradient and ranked. As links associated with (i) detected pivots have negative scores and (ii) predicted pivots have positive scores close to zero, the ranks of such links will be very high and high, respectively. The rank of each link can be assigned as a score to its source (the citing paper). Then, the scores of the detected pivots (sources of negative FV gradient links) will be extremely high, the scores of the predicted pivots (sources of links with an FV gradient close to zero) will be high, and the remaining papers will have lower scores than both of these categories. Thus, the FV gradient score, or FVG, of the citing paper (the source of a link) can be determined as follows:
$$FVG_i = r(FV_{ij})$$
where $FV_{ij}$ is the FV gradient weight of link $l_{ij}$ (from $i$ to $j$) and $r(\cdot)$ denotes its rank.
  3. Normalization (optional): Problems can arise because some indicators (in Section 2.1) take scores between 0 and 1 while others do not. Normalization of the latter can be considered if needed.
For instance, $FVG_i$ can be easily normalized to lie in ]0,1] by using
$$FVG(norm)_i = \frac{r(FV_{ij})}{\max\{R\}}$$
where $R$ is the set of ranks of the FV gradient values of all the links.
  4. Dimensionality reduction: To determine the most influential indicators or to discard the less relevant ones, techniques such as principal component analysis can be used.
  5. Clustering of papers/patents according to these scores: Appropriate unsupervised clustering techniques such as k-means or more advanced ML algorithms can be used to cluster papers according to the scores of the indicators selected after step 4.
  6. Cluster examination: Clusters having papers/patents scoring high for all indicators, clusters having papers scoring high for most of the indicators, clusters having papers scoring high for only some of the indicators, etc., can be determined. Papers that score high for multiple novelty/disruptiveness/paradigm shift indicators (i.e., those found in the most important clusters) can be selected for detailed content analysis.
  7. Content analysis: Content analysis of all the papers/patents in the most important clusters can be carried out both manually and using AI-aided tools. Thematic analysis using BERTopic, SciBERT, etc., can also be performed. For other clusters, only recent works/patents need to be analyzed rigorously. The specific contributions of all the selected papers, and how they advanced the applicable field(s), need to be listed (in an appropriate format) by the ad hoc scientometric team. Upon verification, if satisfactory, these results can be stored in another ad hoc database, ADB1. Peer reviewers need access to this information in phase 4, because it serves as a benchmark of the state of the art against which the novelty of proposals is to be evaluated.
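The score transformations in steps 2 and 3 of the procedure above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, `flip_direction` is only one possible way of reversing an indicator (the text does not prescribe one), ties between links are broken by sort order, and a paper with several outgoing links is assumed to keep its highest rank, which the text does not specify.

```python
def novelty_score(n_r, n_so, n_to, a=0.335, b=0.335, c=0.33):
    """NS = a*N_R + b*N_SO + c*N_TO with the weights suggested in the text."""
    return a * n_r + b * n_so + c * n_to


def flip_direction(scores):
    """Assumed helper: reverse an indicator for which low values mean novel."""
    hi, lo = max(scores), min(scores)
    return [hi + lo - s for s in scores]


def fvg_scores(links, normalize=False):
    """Turn FV gradients of links (i, j) into rank-based scores of citing papers i.

    links : dict mapping (citing, cited) -> FV gradient of that link
    """
    ordered = sorted(links, key=links.get, reverse=True)  # descending FV gradient
    max_rank = len(ordered)                               # max{R} with ordinal ranks
    scores = {}
    for rank, (citing, _cited) in enumerate(ordered, start=1):
        # Assumption: a paper with several links keeps its highest rank.
        scores[citing] = max(scores.get(citing, 0), rank)
    if normalize:
        scores = {paper: r / max_rank for paper, r in scores.items()}
    return scores


# Hypothetical links: p4 has a negative gradient (detected pivot),
# p3 has a gradient close to zero (predicted pivot).
links = {("p1", "p0"): 0.9, ("p2", "p0"): 0.5,
         ("p3", "p1"): 0.02, ("p4", "p2"): -0.3}
raw = fvg_scores(links)
norm = fvg_scores(links, normalize=True)
ns = novelty_score(2, 1, 0)
flipped = flip_direction([1, 2, 5])
print(raw, norm, ns, flipped)
```

With these toy links the detected pivot p4 receives the highest score and the predicted pivot p3 the next highest, as the ranking argument in step 2 intends.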
Specifically, in steps 1, 4, 5, 6 and 7 of the 7 step-procedure, AI/ML methods and techniques are applicable. For these steps, robustness can be ensured by consistency checks, i.e., by rerunning algorithms on different parameter settings as applicable and ensuring results are consistent or do not vary significantly. For instance, robustness of the clustering algorithms can be ensured by rerunning at different parameter settings and analyzing consistencies in terms of number of clusters, cluster memberships, inter cluster distances and other clustering quality indicators.
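The consistency check described above can be sketched for the clustering step as follows. This is a minimal sketch, not a prescribed implementation: it uses a plain k-means written with the standard library only, varies the random initialization (one possible "parameter setting"), and measures agreement between runs with the Rand index, a choice made here for illustration. The indicator scores are hypothetical, and the dimensionality reduction of step 4 is omitted.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two score vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centers[c])  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    return labels


def rand_index(l1, l2):
    """Fraction of point pairs on which two clusterings agree."""
    n = len(l1)
    agree = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            agree += (l1[i] == l1[j]) == (l2[i] == l2[j])
    return agree / total


# Hypothetical unidirectional indicator scores for six papers.
points = [(0.90, 0.80, 0.95), (0.85, 0.90, 0.90), (0.95, 0.95, 0.85),
          (0.10, 0.20, 0.15), (0.20, 0.10, 0.10), (0.15, 0.15, 0.20)]
runs = [kmeans(points, k=2, seed=s) for s in range(5)]
agreements = [rand_index(runs[0], other) for other in runs[1:]]
print(agreements)
```

Agreement values close to 1 across runs suggest stable cluster memberships; markedly lower values would call for the cautious examination that the paragraph recommends. The same comparison can be repeated over different numbers of clusters.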
Specifically, the implementation challenges associated with the 7-step procedure may vary across STEMM fields and can be greater in non-STEM fields. Fields in which the citation practices that determine various network structures (such as citation and cocitation) are more robust can be less challenging. In fields where these practices are less robust, the procedures, especially cluster examination and content analysis, should be performed more cautiously, and in case of doubt, senior scientometricians in the NSTF can be approached.
Second, the important authors or contributors within the fields or thematic areas can also be determined. As a rule of thumb, we recommend using the CS scores introduced by Lathabai et al. (2023) for the IPRF rather than selecting the most productive, most cited or highest h-index authors from the fields or themes. Other preferable options are the x-index (Lathabai et al., 2021) alone or the $\Psi_c$-index (Lathabai & Prabhakaran, 2023). The experts thus identified will be communicated to the funders. Funders can contact the experts on the list and invite them. Details of those who accept or express willingness to serve as peer reviewers on the panel can be shared with the ad hoc team, and the information will be stored in the ad hoc database ADB2.
The NSTF will update the National Expert Network database (if there is one), which, unlike the ad hoc databases created thus far, is to be created and managed by the NSTF as part of its ongoing STI data governance activities.

3.3. Phase-3

In this stage, the scientometric team has to extract the applicants’ details, including their publication information, from the master database MDB (created during phase 1). Once the details are extracted, the publication profile of each applicant must be created (Figure 3). This can be accomplished through an API search in the Dimensions database, which provides the concepts associated with publications and their relevance scores, as well as scores for impact attributes such as citations or Altmetric attention scores. Once the profiles are created, the relevance-incorporated thematic strengths of the applicant can be computed according to the double injection procedure for the IPRF given by Lathabai et al. (2023). This aids the determination of the $x[R]$ and $x(g)[R]$ indices, which, in turn, can be used to compute CS scores. Similarly, a suitable patent database, such as USPTO or those in which candidates have obtained patents, should be used to extract the inventive profiles of the applicants; intelligent topic modeling schemes can be employed to determine terms and their relevance for patents, and the same procedure given by Lathabai et al. (2023) can be used to determine the composite scores (CS scores) related to patents. Notably, both patents and papers need to be considered if the call is open to industry and academia alike and if applications are received from scientists and inventors in comparable numbers. To execute the novelty-first policy, the IRF additionally examines whether the applicants have already contributed any papers or patents that are novel, disruptive or pivotal to paradigm shifts. A search in ADB1 (created during phase 2) is required for this.
The number of novel, disruptive and paradigm shift pivotal role patents and papers can be used to compute the improved composite score CS′, which also emphasizes ‘novelty’, unlike the original CS used in the IPRF. CS′ is expressed as follows:
$$CS' = 0.25\,CS + 0.25\,CS(pat) + 0.25\,\eta_P + 0.25\,\eta_{Pat} \tag{8}$$
$$\eta_P = \frac{NDPSP}{TNDPSP} \tag{9}$$
$$\eta_{Pat} = \frac{NDPSPat}{TNDPSPat} \tag{10}$$
where NDPSP and NDPSPat are the numbers of novel/disruptive/paradigm-shift pivotal papers and patents contributed by the applicant, respectively, whereas TNDPSP and TNDPSPat are the total numbers of novel/disruptive/paradigm-shift pivotal papers and patents, respectively, found in the analyzed fields or thematic areas considered for the funding program. Notably, in the IRF, as the weight of the CS (used in the IPRF) is 0.25, the weights of the citation-dependent terms α, β, γ and δ are reduced to one-fourth of their values, which further reduces any residual effect of citation bias. Specifically, in Equation (8), the weight given to the novelty terms is 0.5 (0.25 each for the applicant’s proportionate contribution to the total number of novel papers and patents in the fields/themes related to the call). This emphasis on novelty in scientometric evaluation is the first crucial step in ensuring the ‘novelty first policy’. Applicants’ CS′ scores, along with the scores used in their computation, are stored in another ad hoc database, ADB3.
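The computation of the improved composite score in Equation (8) can be sketched as follows. This is a minimal sketch: the function name and argument names are hypothetical, the inputs are toy values, and returning a zero proportion when no novel works exist in the field is an assumption made here to avoid division by zero, not something the text specifies.

```python
def improved_composite_score(cs, cs_pat, ndps_papers, total_ndps_papers,
                             ndps_patents, total_ndps_patents):
    """CS' = 0.25*CS + 0.25*CS(pat) + 0.25*eta_P + 0.25*eta_Pat."""
    # Assumption: a proportion is taken as 0 when the field has no such works.
    eta_p = ndps_papers / total_ndps_papers if total_ndps_papers else 0.0
    eta_pat = ndps_patents / total_ndps_patents if total_ndps_patents else 0.0
    return 0.25 * cs + 0.25 * cs_pat + 0.25 * eta_p + 0.25 * eta_pat


# Hypothetical applicant: 3 of 40 novel papers and 1 of 20 novel patents.
cs_prime = improved_composite_score(cs=0.39, cs_pat=0.20,
                                    ndps_papers=3, total_ndps_papers=40,
                                    ndps_patents=1, total_ndps_patents=20)
print(round(cs_prime, 5))
```

Because each of the four terms lies between 0 and 1 and the weights sum to 1, the resulting score also lies between 0 and 1, which is what makes the later rescaling against peer review scores straightforward.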

3.4. Phase-4

In the previous stages, the proposals were not analyzed. In this stage, the proposals saved in the MDB have to be extracted by the development team. They then undergo thorough technical checks, including checks for AI-generated content (Figure 4). This is extremely important unless a robust AI-powered proposal management system is used at submission time itself, one that includes research integrity systems or FoSci systems that can screen proposals according to the research integrity guidelines stipulated by the emerging discipline of forensic scientometrics (McIntosh & Vitale, 2024). If the scientometric team finds any technical, ethical or research integrity issues, including AI-generated content, they can contact the NSTF for verification and confirmation. If the NSTF also finds the proposals suspicious, the scientometric team can advise funders either to seek clarifications from applicants (if the issues seem resolvable fairly quickly and the proposals can be resubmitted quickly) or to reject those proposals on technical grounds (if the issues are too serious to be resolved). The proposals cleared by the AI detector and technical issues checker, together with resubmitted proposals whose issues have been resolved, are considered eligible proposals. Eligible proposals are then sent to both the peer review panel and the AI peer review simulator.
Evaluation by an AI peer review simulator: The AI peer review simulator is not merely an AI peer review report generator. The simulator is supposed to be equivalent to human subject experts or peer reviewers in terms of knowledge. If not now, these simulators will soon be a reality. Currently, explorations of simulating Delphi studies using AI tools that are supposed to have human expert equivalent subject knowledge, and hence, can serve as Delphi panels, are ongoing. However, the perfect version of such tools will take many more years to be realized. Until such simulators are available, the best AI peer-review report generators can be used. The simulator/report generator is also asked to provide scores against the same evaluation criteria prescribed to the human peer review panel and a detailed report in the format given to its human counterpart. Details of the format and suggestive criteria given to peer reviewers are discussed in the following subsection. One part (the incorrect statements in AI-generated summaries) will be absent in the format given to the AI simulator. All the eligible proposals and their summaries, reports and scores generated by the AI peer review simulator/report generator will be saved in a knowledge base KB by the NSTF. Proposals and their summaries (from the KB), field analysis summaries and novel contribution details in the field (from ADB1) will be passed on to the peer review panel.
Evaluation by Peer Reviewers: Now, the peer review panel has to follow the procedure given below:
  • Quickly review the AI-generated summaries. Note the contributions and novel elements highlighted in those reports.
  • Thoroughly read the proposals. Analyze the proposals according to the instructions given to them (by the funders, scientific experts, technical experts and government/regulatory bodies in consultation with the NSTF).
  • Assign scores against various prescribed aspects/heads (also agreed upon by funders and regulators in consultation with the NSTF). If the call relates to a specific discipline, common basic aspects such as feasibility (technical as well as financial), methodological rigor and novelty should be evaluated at the level of emphasis demanded by the discipline. Ethical considerations, a major aspect that varies starkly across disciplines, should be evaluated as stipulated by relevant international and regional discipline-specific ethical guidelines and procedures, and if there is a dedicated board/body/committee, its adjudication can be sought if needed. Apart from this, any unique aspect relevant to that discipline alone should also be evaluated. In the case of thrust/priority areas, all the above general aspects and the aspects that become relevant only in transdisciplinary research assessment, along with unique aspects relevant only to a particular thrust/priority area, should be evaluated. If existing ethical frameworks/committees are not competent to assess the special ethical considerations demanded by the thrust/priority area, ad hoc committees (which can later be merged into the national research ethics and integrity framework) need to be formed, and they should operate in a fast-track mode. In particular, note that scores are given for the different criteria separately and that peer reviewers need not compute an aggregate score.
  • While scores are assigned against the novelty criteria, the field analysis summary and novel contribution details provided from the ad hoc database ADB1 need to be checked. It should be examined whether the proposals have theoretical, methodological or applicational novelty compared with the existing novel/disruptive/paradigm shift pivotal papers or patents and/or might cause disruptions or paradigm shifts if the projects are executed successfully. This is the decisive step in the peer review process to ensure the ‘novelty first policy’.
  • Prepare a detailed report in a prescribed format that comprises (i) a summary to be written by peer reviewers, (ii) the list/details of any incorrect statements made by AI-generated reports and why those are incorrect and (iii) remarks and recommendations about the proposal.
The peer reviewers’ reports, scores and recommendations will also be stored in the KB. This will enable comparison of (i) the proposal summaries written by human reviewers vs. those generated by the AI peer review simulator and (ii) the scores given by human reviewers vs. those given by the AI peer review simulator, which may lead to eventual improvement of the AI peer review simulator with the help of AI-powered knowledge management techniques and, hence, is vital for the roadmap of the IRF. On the basis of the evaluation score similarity and textual report similarity, we envision the evolutionary roadmap of the IRF (see Section 4).
Now, both the CS′ scores (AI-powered scientometric evaluation) and the scores given by peer review (AI-powered peer review) against various criteria are available in ADB3 and the KB, respectively, and can be used for shortlisting decisions. As both evaluations can be as efficient as possible (although they might not be absolutely perfect), the final score for proposals that might lead to shortlisting should be computed in the following way:
$$FS = \theta \, CS'_{(rsc)} + (1-\theta) \times \text{peer review score} \tag{11}$$
where the two weights sum to 1, so that the peer review score carries the weight $1-\theta$. To ensure peer review is given more importance, it is recommended to keep $\theta$ well below 0.5 (50%). Furthermore, $CS'_{(rsc)}$ denotes the rescaled CS′ score. The rescaling ensures scale compatibility with the peer review score: CS′ (computed by Equation (8)) lies in ]0,1], so if peer review scores are assigned in ]0,100], CS′ needs to be scaled up by a factor of 100. Thus, Equation (11) becomes
$$FS = \theta \, (100 \times CS') + (1-\theta) \times \text{peer review score} \tag{12}$$
The peer review score encompasses scores for novelty, scores for technical and financial feasibility (indicating the risks involved in the proposal) and scores for other aspects. If the aspects apart from novelty are taken together and denoted as the aggregate score, Equation (11) becomes
$$FS = \theta \, CS'_{(rsc)} + (1-\theta) \, (\mu \times \text{novelty score} + \pi \times \text{aggregate score}) \tag{13}$$
where $\mu + \pi = 1$.
Let $\theta = 0.4$ (so that $1-\theta = 0.6$) to ensure 60% weightage to peer review. Now, to ensure the novelty first policy, let $\mu = 0.6$, which makes $\pi = 0.4$; then Equation (13) becomes
$$FS = 0.4 \, CS'_{(rsc)} + 0.36 \times \text{novelty score} + 0.24 \times \text{aggregate score} \tag{14}$$
In Equation (14), 50% of the term $0.4 \, CS'_{(rsc)}$ is for novelty (see Equation (8)). This amounts to an overall 20% allotment for novelty determined by scientometric/quantitative evaluation. As 36% of the overall score is dedicated to the novelty score (qualitatively given by peer reviewers), the overall representation of the novelty aspect in FS is 56%, ensuring the ‘novelty first policy’. This, of course, is suggestive and can be regarded as a rule of thumb. As novelty is in a trade-off with risk (arising from technical and economic feasibility), the weightage to novelty (and hence, to aggregate scores) needs to be finalized according to how much risk tolerance is affordable for the particular call, through discussion among funders, regulators and scientific and technical experts in the country in consultation with the NSTF. A minimalistic demonstration on synthetic data (please see Section 3.5) might make this point clear.
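As a hedged illustration of the weighting in Equations (13) and (14), the computation can be sketched in Python as follows. The function and parameter names are illustrative assumptions and are not part of the framework; the peer review weight is taken to be 1 − θ, CS′ is assumed to lie in ]0,1], and the peer review scores are assumed to be on a 0–100 scale.

```python
def final_score(cs_prime, novelty, aggregate, theta=0.4, mu=0.6):
    """Sketch of Equation (13): FS = theta*CS'(rsc) + (1-theta)*(mu*novelty + pi*aggregate).

    cs_prime  : CS' in ]0, 1] (rescaled here by 100, as in Equation (12))
    novelty   : peer review novelty score on a 0-100 scale (assumed)
    aggregate : peer review aggregate score of the other aspects, 0-100 (assumed)
    """
    if not 0.0 <= theta <= 1.0 or not 0.0 <= mu <= 1.0:
        raise ValueError("theta and mu must lie in [0, 1]")
    pi = 1.0 - mu                    # weight of the aggregate score, since mu + pi = 1
    cs_rescaled = 100.0 * cs_prime   # bring CS' onto the peer review scale
    return theta * cs_rescaled + (1.0 - theta) * (mu * novelty + pi * aggregate)


def overall_novelty_weight(theta=0.4, mu=0.6):
    """Overall share of novelty in FS: half of the scientometric weight
    (see Equation (8)) plus the novelty share of the peer review weight."""
    return theta / 2.0 + (1.0 - theta) * mu


# Hypothetical applicant: CS' = 0.5, novelty score = 80, aggregate score = 100.
print(round(final_score(0.5, 80, 100), 2))   # 72.8
print(round(overall_novelty_weight(), 2))    # 0.56
```

With the default values θ = 0.4 and μ = 0.6, the second function returns the 56% overall novelty representation discussed above.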
Once FS is computed for all the applications, as in IPRF, applications have to be classified into $M$ groups according to effective age or stage of career. If the funders are willing to award $N$ proposals and the top level of NSTF and funders and peer reviewers are willing to hear presentations from and interview the top $yN$ applicants to make a final decision on the award (where $y$ can be 2, 3, etc.), then from each group, the top $yN/M$ applicants will be selected according to their FS score.
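The group-wise shortlisting step can be sketched as follows. This is a minimal illustration under stated assumptions: the tuple layout, the function name and the group labels are hypothetical, the number of groups M is taken from the data, yN/M is rounded down when it is not an integer, and ties are left in input order, none of which is prescribed by the framework.

```python
from collections import defaultdict


def shortlist(applicants, n_awards, y=2):
    """Select the top y*N/M applicants by FS from each career-stage group.

    applicants : list of (name, group, fs) tuples (hypothetical layout)
    n_awards   : N, the number of proposals the funders will award
    y          : interview multiplier (2, 3, etc.)
    """
    groups = defaultdict(list)
    for name, group, fs in applicants:
        groups[group].append((fs, name))

    m = len(groups)                      # M, number of groups present
    per_group = (y * n_awards) // m      # assumption: floor when not an integer

    result = {}
    for group, members in groups.items():
        # Highest FS first; sorted() is stable, so ties keep input order.
        ranked = sorted(members, key=lambda item: item[0], reverse=True)
        result[group] = [name for _, name in ranked[:per_group]]
    return result


# Hypothetical data: six applicants in two career-stage groups, N = 2, y = 2.
demo = [("a", "early", 70), ("b", "early", 80), ("e", "early", 75),
        ("c", "mid", 60), ("d", "mid", 90), ("f", "mid", 65)]
print(shortlist(demo, n_awards=2, y=2))
```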

3.5. Demonstration of the Computational Framework Using Synthetic Data of 7 Applicants of Same Effective Age/Career Length

Let a call for proposals be associated with five subfields/focus themes denoted by FT1, FT2, … and FT5. By the novelty computation process at phase-3, let the number of novel papers determined for five subfields/focus themes be 100, 125, 95, 65 and 150, respectively, making the total number of novel papers to be considered for the call = 535.
Now let us consider 7 applicants whose essential details required for the scientometric part (extracted and computed by the scientometric team) are given in Table 1 in quadruple format. Let us assume that no applicant has obtained a patent in the relevant subfields/themes and that the evaluation is focused on publication data only. The items from left to right indicate (i) whether the subfield/theme is one of the core competency subfields/themes (indicated by the letter C) or one of the potential core competency subfields/themes (denoted by P) of an applicant, (ii) the number of publications by the applicant in that particular subfield/theme, (iii) the citations received by the applicant's publications in that particular subfield/theme and (iv) the number of novel/disruptive/paradigm shift papers by the applicant in that particular subfield/theme.
Using Equations (1)–(5), CS for all 7 applicants can be computed. This is given in Table 2. Specifically note that $t = 5$ (as there are 5 subfields/themes) is used for the computation of $\alpha$ and $\beta$. For $\gamma$ and $\delta$, $\sum_{k=1}^{t} max.TS_k$ needs to be computed. From Table 1, $max.TS_{FT1} = 600$, $max.TS_{FT2} = 1950$, $max.TS_{FT3} = 2000$, $max.TS_{FT4} = 900$ and $max.TS_{FT5} = 2750$, making $\sum_{k=1}^{t} max.TS_k = 8200$.
Now, using the information of total number of novel papers by applicants (by summing up the number of novel papers by applicants in all applicable subfields/themes given in Table 1) and total number of novel papers considered for the call, CS′ can be computed as given in Table 3.
In stage-4, after examining proposals and comparing them against 535 novel papers from the 5 applicable subfields/themes, peer reviewers assign novelty scores. If novelty and aggregate scores for the submitted proposals by applicants are awarded as given in Table 3, FS scores can be computed. The case of FS score computation using overall novelty score weightage of 56% (weightage of novelty score on peer review 36%) is also given in Table 3, if no risk is identified with any of the proposals (aggregate scores = 100).
The most novel proposals are found to have high FS scores, and if only three out of the 7 can be shortlisted, C, F and A (in this particular order) will make it onto the shortlist (please see bold font in Table 3).
Now let us examine various scenarios for a range of overall novelty weightage 0.25–0.75, wherein the proposals by applicants who received top scores for novelties (in peer review) are associated with varied levels of risks. No-risk scenario is characterized by aggregate score of 100, low-risk scenario is characterized by 90 < aggregate score < 100, moderate-risk scenario is characterized by 75 < aggregate score ≤ 90, and high-risk scenario is characterized by aggregate score < 75. From all scenarios (except no-risk scenario), one specific representative aggregate score is chosen for computing FS scores.
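The scenario definitions above can be expressed as a small classification helper. This is only a sketch: the function name is hypothetical, and because the stated ranges leave an aggregate score of exactly 75 unassigned, the code treats it as high-risk, which is an assumption rather than something the text specifies.

```python
def risk_scenario(aggregate_score):
    """Map an aggregate score (0-100) to the risk scenario it characterizes."""
    if not 0 <= aggregate_score <= 100:
        raise ValueError("aggregate score must lie in [0, 100]")
    if aggregate_score == 100:
        return "no-risk"
    if aggregate_score > 90:
        return "low-risk"        # 90 < score < 100
    if aggregate_score > 75:
        return "moderate-risk"   # 75 < score <= 90
    return "high-risk"           # score < 75 (and, by assumption, exactly 75)


# Representative aggregate scores used in the four scenarios discussed here.
for score in (100, 91, 76, 50):
    print(score, risk_scenario(score))
```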
No-risk scenario: FS scores for the applicants over the novelty weightage range 0.25–0.75 in the no-risk scenario (aggregate score = 100) are shown in Figure 5. This indicates that if there are no issues/risks related to the technical and economic feasibility of the proposals, the most novel proposals will always make it onto the shortlist. However, the distinction between the top three and the others becomes clearer when the overall novelty weightage, i.e., the scientometric novelty share plus the peer review novelty share, $\frac{\theta}{2} + (1-\theta)\mu$, exceeds 0.35.
Low-risk scenario: In the low-risk scenario, let C, F and A (the applicants with the top novel proposals) be awarded an aggregate score of 91. Then, FS scores for the applicants over the novelty weightage range 0.25–0.75 are shown in Figure 6. This indicates that the effect of novelty weightage (emphasis) starts only when the overall weightage $\frac{\theta}{2} + (1-\theta)\mu$ crosses 0.44 (as C becomes eligible for shortlisting). At 0.6 (60% overall weightage), F also becomes eligible for shortlisting, which increases the complexity of decision-making and strengthens the need to carefully address the novelty–risk tradeoff. Thus, the range 0.44–0.6 is chosen as the ‘emphasis zone’, and the ranges below and above it are marked as the ‘under emphasis zone’ and the ‘over emphasis zone’, respectively.
Moderate-risk scenario: In the moderate-risk scenario, let C, F and A (the applicants with the top novel proposals) be awarded an aggregate score of 76. Then, FS scores for the applicants over the novelty weightage range 0.25–0.75 are shown in Figure 7. This indicates that the effect of novelty weightage (emphasis) starts only when the overall weightage $\frac{\theta}{2} + (1-\theta)\mu$ crosses 0.58 (as C becomes eligible for shortlisting). At 0.67 (67% overall weightage), F also becomes eligible for shortlisting, which increases the complexity of decision-making and strengthens the need to carefully address the novelty–risk tradeoff. Specifically note that as proposals become riskier, a higher novelty emphasis, and hence a higher risk tolerance level, is needed for them to be shortlisted. Furthermore, the emphasis zone (0.58–0.67) is narrower than in the low-risk scenario, indicating that fixing the overall weightage for novelty becomes more important as the proposals’ risks grow.
High-risk scenario: In the high-risk scenario, let C, F and A (the applicants with the top novel proposals) be awarded an aggregate score of 50. Then, FS scores for the applicants over the novelty weightage range 0.25–0.75 are shown in Figure 8. This indicates that the effect of novelty weightage (emphasis) starts only when the overall weightage $\frac{\theta}{2} + (1-\theta)\mu$ crosses 0.67 (as C becomes eligible for shortlisting). At 0.75 (75% overall weightage), F also becomes eligible for shortlisting, which increases the complexity of decision-making and strengthens the need to carefully address the novelty–risk tradeoff. Specifically note that the ‘emphasis zone’ shifts further to 0.67–0.75 and is narrowest in the high-risk scenario.
The above analysis used synthetic data for 7 applicants and one set of novelty scores (given in Table 3). How variation in the aggregate scores among the top novel applicants in the different scenarios might have affected FS is not covered. A series of simulation studies and sensitivity analyses on many large synthetic and real-world datasets (of awarded grants in the custody of funders) are required for a holistic understanding prior to implementation. Despite these limitations, the current analysis clearly indicates that it is not possible to fix the same weightage for all fields, as different fields involve different levels of risk and demand different levels of risk tolerance. Even within the same field, two calls involving different sets of subfields/themes require careful evaluation of risks, and the affordable risk tolerance should be fixed through extensive and inclusive stakeholder discussion before the overall weightage of the novelty score is determined, to ensure the right level of emphasis on novelty. Transdisciplinary fields/national priority areas cannot grow with low risk tolerance; they require at least moderate risk tolerance (and, in some cases, high risk tolerance). The choice of 56% weightage might be useful to ensure that novelty is emphasized with low and moderate risk tolerance.
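The kind of sensitivity sweep described in this section can be sketched as follows. The sketch rests on several assumptions that the text does not fix: θ is held at 0.4 while μ is varied to reach each overall novelty weightage w = θ/2 + (1 − θ)μ, the sweep uses steps of 0.01 over 0.25–0.75, and the two proposals are made-up inputs (they are not the Table 3 data and do not reproduce Figures 5–8).

```python
def final_score(cs_prime, novelty, aggregate, theta, mu):
    """FS as in Equation (13), with CS' rescaled by 100 and peer weight 1 - theta."""
    return theta * 100.0 * cs_prime + (1.0 - theta) * (
        mu * novelty + (1.0 - mu) * aggregate)


def crossover_weight(risky, safe, theta=0.4):
    """Smallest overall novelty weightage in 0.25-0.75 (step 0.01) at which the
    novel-but-risky proposal's FS exceeds that of the safer proposal.

    risky, safe : (cs_prime, novelty, aggregate) tuples (hypothetical layout)
    Returns None if the risky proposal never overtakes within the range.
    """
    for pct in range(25, 76):
        w = pct / 100.0
        mu = (w - theta / 2.0) / (1.0 - theta)   # invert w = theta/2 + (1-theta)*mu
        if not 0.0 <= mu <= 1.0:
            continue                              # weightage not reachable for this theta
        if final_score(*risky, theta, mu) > final_score(*safe, theta, mu):
            return w
    return None


# Made-up inputs: equal CS', one highly novel moderate-risk proposal
# (aggregate 76) and one less novel no-risk proposal (aggregate 100).
risky = (0.30, 95.0, 76.0)
safe = (0.30, 60.0, 100.0)
print(crossover_weight(risky, safe))   # 0.45
```

For these hypothetical inputs, the riskier proposal overtakes only once the overall novelty weightage reaches 0.45; lowering its aggregate score pushes that point higher, which mirrors the qualitative shift of the emphasis zone across the risk scenarios.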

4. Discussion of the Evolutionary Roadmap of the IRF

The current study introduced a conceptual framework, namely, the IRF, for the evaluation of research proposals at the individual level. Its novelty lies in the following facts: (i) it integrates a means to realize the ‘novelty first policy’ in the ex ante evaluation of research proposals, and (ii) it introduces an effective integration of AI-powered peer review and AI-powered scientometric assessment. Owing to its sophistication and scale, it cannot be demonstrated and validated in a pre-implementation manner. As the IRF relies on the ‘benefit of scale’ and requires the establishment of the NSTF or a similar functional body, which is a matter of national STI governance and policy, we can only recommend that national STI policymakers and other concerned ministerial/governance offices carefully analyze our proposed conceptual framework and critically evaluate the ‘benefit of scale’ against general, field-specific and country-specific challenges.
Upon cost/challenge–benefit analysis, if the establishment of the NSTF (or the alteration of existing similar systems to meet requirements) and the IRF is found to be feasible, a major challenge will be keeping up with the pace of AI development and filtering out the best tools according to parameters such as performance, safety and reliability. This is vital because, while the capabilities of AI/ML/LLMs are improving at a rapid pace, improvements in reliability are not keeping up, and in some applications, a decline in reliability (Allen & Peterson, 2026; Kübler et al., 2026) has also been observed. According to METR (Model Evaluation and Threat Research), the length of the tasks that AI agents/models can complete (at a 50% success rate) doubles approximately every 7 months (Kwa et al., 2025), and this result is becoming popular in some circles as Moore’s law for AI. However, for the functions that are to be carried out by the NSTF, including its IRF-related functions, a slower growth rate and a much slower adoption rate are anticipated, as a 50% success rate is not affordable in matters related to national policy. Apart from this, several other factors are decisive in determining the participatory evolution of AI in the IRF, especially for AI-powered peer review of proposals: data security and data governance in the AI era (Sharma & Sharma, 2025), both in general and in STI governance applications specifically; the ethical considerations of AI/ML/LLMs; the field-dependent and country-dependent variations in these; and the way these factors evolve.
The NSTF’s scientometric staff (not the ad hoc scientometric team in the IRF) should continuously track developments in AI/LLMs related not only to performance but also to reliability over time. By doing so, the NSTF will be able to prescribe the most reliable and best-performing model among the available options to the ad hoc team. Additionally, if efficient tools to detect hallucinations are developed in due course, those should be adopted. Before the summaries of the proposals generated by the AI-peer review simulator (not the scores and evaluation reports) are handed over to the peer reviewers, manual checking of these outputs for hallucinations and misinterpretations/out-of-context interpretations by the ad hoc scientometric team that operates the simulator is mandatory. Incorporation of AI summarization modules managed by or associated with scholarly databases like WoS, Scopus and Dimensions for retrieval-augmented generation (Lewis et al., 2020) can also improve reliability by reducing hallucinations. Prompt engineering techniques such as chain-of-thought prompting and negative constraint prompting (which prohibits the model from making up information) can also be used to address hallucination issues and improve reliability. Once the summaries are handed over to the peer reviewers, along with reviewing the proposal, they need to highlight any incorrect statements/claims in the summary generated by the AI-peer review simulator, which constitutes a second layer of reliability improvement. Owing to the importance of the IRF in STI governance, permanent human oversight should be in place to ensure reliability, as recommended by Banerjee et al. (2025), since some researchers argue that the inherent hallucinations in LLMs will not disappear completely.
The accuracy of the AI-powered peer review simulator can be tracked by the NSTF by examining the numeric as well as the text similarity/agreement scores computed at phase 4 (stored in KB2). Until both scores cross 0.5, only human evaluation can be considered valid. Once both scores cross 0.5 (50% success rate), the AI evaluation can be considered to complement the human evaluation under strict human supervision. In that case, human evaluation should be given more weight than AI evaluation in the computation of FS, as given in Equation (15).
$$FS = 0.4 \, CS'_{(rsc)} + 0.24 \times \text{novelty score (H)} + 0.12 \times \text{novelty score (AI)} + 0.16 \times \text{aggregate of other scores (H)} + 0.08 \times \text{aggregate of other scores (AI)} \tag{15}$$
In Equation (15), among the peer review scores, scores from human evaluation are given more weight (66.66%), and the weight of AI peer review is approximately 33.33%, to ensure caution against the potential limitations of AI peer review. However, the overall emphasis on the novelty-first policy is ensured in Equation (15) because the overall weight of novelty-related terms is maintained at 56%.
Suitable thresholds can be set (e.g., 0.95 for numerical agreement and 0.85 for text agreement), beyond which human supervision can be minimized, and AI scores can also be considered with equal weight for shortlisting. Then, the FS can possibly be computed with equal weight to that of the human and AI reviews (i.e., 50% each for human and AI peer review), resulting in FS computation in the following manner.
$$FS = 0.4 \, CS'_{(rsc)} + 0.18 \times \text{novelty score (H)} + 0.18 \times \text{novelty score (AI)} + 0.12 \times \text{aggregate of other scores (H)} + 0.12 \times \text{aggregate of other scores (AI)} \tag{16}$$
In Equation (16) too, the overall weight of novelty remains 56%, ensuring the emphasis on ‘novelty first policy’.
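The staged weighting of human and AI reviews can be sketched as follows. This is a hedged illustration, not a prescribed implementation: the function names are hypothetical, the thresholds 0.5, 0.95 and 0.85 are the example values mentioned above, the comparisons are taken as strict, and the human-to-AI split of the peer review weight is 2:1 in the complementary stage and 1:1 in the equal-weight stage.

```python
def review_shares(numeric_agreement, text_agreement,
                  start=0.5, numeric_full=0.95, text_full=0.85):
    """Return (human_share, ai_share) of the peer review weight.

    Both agreement scores are assumed to lie in [0, 1].
    """
    if numeric_agreement > numeric_full and text_agreement > text_full:
        return 0.5, 0.5              # equal weight, as in Equation (16)
    if numeric_agreement > start and text_agreement > start:
        return 2.0 / 3.0, 1.0 / 3.0  # human-weighted, as in Equation (15)
    return 1.0, 0.0                  # human-only evaluation, as in Equation (14)


def hybrid_final_score(cs_rescaled, nov_h, nov_ai, agg_h, agg_ai,
                       human_share, ai_share, theta=0.4, mu=0.6):
    """FS with human (H) and AI peer review scores blended before weighting.

    cs_rescaled : CS' already rescaled to the 0-100 peer review scale
    """
    novelty = human_share * nov_h + ai_share * nov_ai
    aggregate = human_share * agg_h + ai_share * agg_ai
    return theta * cs_rescaled + (1.0 - theta) * (
        mu * novelty + (1.0 - mu) * aggregate)


# Hypothetical scores for one proposal in the complementary stage.
h, a = review_shares(0.6, 0.6)
print(round(hybrid_final_score(50, 80, 70, 90, 60, h, a), 2))   # 66.8
```

Because μ and θ are unchanged across the stages, the novelty-related terms keep the same combined weight regardless of how the peer review weight is split between human and AI reviewers.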

Evolution of AI Participation in the IRF’s Peer Review Section: Two Scenarios

Let us consider two scenarios in which the reliability of AI/ML/LLMs has improved significantly over time and in which AI-powered peer review systems have evolved such that the agreement between AI-generated peer review and human-generated peer review has improved continuously over the years. If a scenario such as the one shown in Figure 9 (Scenario 1) is encountered, up to 2035, the evaluation can be completely human (wherein AI involvement is restricted to providing a proposal summary and aiding novelty identification). After that, AI peer review may be used to complement human evaluation under careful human supervision. In the scenario given in Figure 9, giving AI peer review equal weight in FS score computation remains questionable even in 2041. Therefore, FS will be computed as in Equation (14) up to 2035 and by using Equation (15) after that. In the case of scenario 2 (Figure 10), human evaluation (without the involvement of AI in score assignment) is the only option until 2031. After that, AI peer review can be used to complement human evaluation under careful human supervision until 2039. After 2039, AI peer review can be given more or less the same importance as that of human review, and the requirements of human supervision can be restricted to cases where wide deviations are encountered. Thus, in scenario 2, Equation (14) can be used up to 2031, Equation (15) can be used from 2031 to 2039, and subsequently, Equation (16) can be used. The possible evolutionary configuration of the IRF emphasizing HR:AI ratios at different phases that leads to an evolutionary roadmap of the IRF in scenario 2 (optimistic in terms of AI-powered peer review) is depicted in Figure 11.
Apart from the IRF-related functions of the NSTF, its continuous research endeavors may aid in the phase-1 activities of the IRF in the immediate or short-term future, such as (i) the assessment of LLMs and AI tools, as well as the development of sciento-LLMs or LLMs, and (ii) the development of AI-driven research proposal management systems such as the one proposed by Munagandla et al. (2024) to perform some of the mundane administrative tasks (not every task suggested by them) to aid funders. Assuming that AI developments in scientometric applications and peer review occur at a slower rate than that given by Moore’s law of AI, the HR:AI ratio in the IRF during configurational upgrades might also change in a slower and more careful way.
Although we have performed a comprehensive analysis of the existing state-of-the-art in scientometrics for conceptualizing the IRF, many uncertainties are related to this context, especially in terms of AI usage. Additionally, the gravity of the tracking challenges induced by the uncontrolled and disorganized structure and growth of scientometrics should be addressed while Sciento-LLMs, AI-powered novelty assessment frameworks (AINAFs) and AI-powered peer review systems are upgraded. The uncertainty related to solving those issues and challenges may be crucial for the roadmap. If some of the challenges are resolved sooner, the NSTF/IRF should also be prepared for that. If nations are going to have the NSTF and IRF as suggested, the benefits or threats of AI, no matter how soon they materialize, can be addressed and turned to advantage via diligent human intervention. The most crucial step for the NSTF and IRF lies in training, recruiting and providing ergonomic support to scientometricians throughout the transition toward Industry 5.0.

5. Conclusions

This study proposes a conceptual Intelligent Review Framework (IRF) as a policy-based, AI-enabled alternative to conventional proposal assessment systems that rely excessively either on peer review or on simplistic scientometric indicators. Its principal contribution lies in being, to the best of our knowledge, the first framework to operationalize a ‘novelty first policy’ in research proposal evaluation through an integrated architecture that combines AI-powered scientometric assessment, structured peer review and a phased governance design. The framework advances beyond the earlier informed peer review model by assigning a broader role to the scientometric team: identifying priority themes, detecting novel and disruptive works in the relevant literature, recommending experts, strengthening proposal screening and supporting the development of AI-based review tools. A major finding is that novelty can be treated not as a vague aspiration but as an assessable dimension through the combined use of novelty, disruptiveness and paradigm-shift indicators; thematic relevance scores; and improved composite measures such as CS′ and FS. The framework also contributes a governance mechanism in the form of a national scientometric task force, which can support long-term institutionalization, standardization and technological upgrading.
However, even if a nation has rich scientometric talent and institutions that can be upgraded to the NSTF, we recommend that national policymakers and responsible ministries/departments in government carefully weigh the challenges, including discipline-specific and country-specific challenges, against the ‘benefit of scale’ offered by the IRF before its implementation. Additionally, the massive scale (the establishment and sustenance of the NSTF and IRF or a suitable revamping of current institutions) demands a heavily budgeted, commissioned pre-implementation study. As part of that study, an expert survey (ensuring inclusivity and an appropriate level of representation of the various stakeholders) should be conducted concerning the IRF and the computational framework embedded in it. Apart from that, a series of simulation studies and sensitivity analyses on many large synthetic and real-world datasets (of awarded grants in the custody of funders) is vital to develop a holistic understanding of how to determine the affordable risk tolerance level in different fields/subfields/themes and thereby decide the level of emphasis on novelty for a call. Furthermore, as the framework is evolvable, validation is not a one-time exercise and should be carried out at each upgradation point specified in the evolutionary roadmap.
The IRF envisions ensured fairness in evaluation via many features. Firstly, the selection of peer reviewers via scientometric evaluation and the peer reviewers’ lack of access to applicants’ details significantly reduce the possibility of favoritism through elite capture and cronyism and of other biases, including gender bias. Secondly, and most importantly, the novelty emphasis in both scientometric and peer review assessments directly targets the bias against novelty and the fear/intolerance of risk. This might hinder the intentional or unintentional favoring of less novel proposals through the rejection of novel proposals on the mere ground that they are risky or ambitious. A satisfactory justification of why a proposal is risky and why the risk is beyond the predetermined tolerance standards of the applicable fields needs to be communicated to the applicant in the event of exclusion from the shortlist. How crucial the determination of the affordable risk tolerance level is in fixing the novelty weightage is demonstrated by a modest sensitivity analysis. Thirdly, to ensure responsible, ethical and fair usage of AI in peer review, any attempt by human peer reviewers to use AI/LLMs at their own end is prohibited. The proposal submissions (without applicants’ information), AI-generated summaries of the proposals, details of contributions of major novel publications related to the call, etc., will be available to peer reviewers in read-only format, and only via the system specifically designed by the NSTF. Access to these documents will be managed via temporary credentials.
Moreover, the study shows that the transition from informed review to intelligent review must remain cautious, staged and policy sensitive. AI can substantially strengthen large-scale proposal processing, reviewer support, thematic intelligence, technical screening and evidence-based shortlisting; however, human judgment remains indispensable, especially in the near term, because proposal evaluation involves contextual reasoning, ethical scrutiny and risk-sensitive interpretation. Another salient feature is that the IRF is explicitly evolvable: it incorporates a roadmap in which the human–AI balance can be recalibrated as AI systems mature and as agreement between AI-assisted and human review improves. Thus, the IRF should not be seen merely as an automation framework but as a governance-oriented review architecture designed to improve fairness and reduce misuse and bias rooted in elite capture and cronyism, which include gender bias, strengthen novelty recognition and prepare research funding systems for the next phase of AI-integrated scientific assessment.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/publications14020032/s1, Table S1: Major indicators related to novelty, disruptiveness and paradigm shift detection/prediction from scientific and patent literature.

Author Contributions

H.H.L.: conceptualization, methodology, investigation, writing—original draft, writing—review and editing, visualization; R.R.: conceptualization, investigation, resources, writing—original draft, writing—review and editing; P.N.: investigation, resources, writing—original draft, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AI: Artificial Intelligence
GenAI: Generative Artificial Intelligence
ML: Machine Learning
LLM: Large Language Models
IRF: Intelligent Review Framework
IPRF: Informed Peer Review Framework
STI: Science, Technology and Innovation
OECD: Organization for Economic Co-operation and Development
PRFS: Performance-based Research Funding Systems
DST: Department of Science and Technology (Government of India)
WoS: Web of Science
EPO: European Patent Office
SEU: Subjective Expected Utility
NSTF: National Scientometric Task Force
LDA: Latent Dirichlet Allocation
BERT: Bidirectional Encoder Representations from Transformers
ADB: Ad hoc Database
MDB: Master database
KB: Knowledge base
FoSCi: Forensic Scientometrics
HR: Human Resources
AINAF: AI-powered Novelty Assessment Frameworks
AIPRS: AI peer review simulators
Sciento-LLMs: Scientometric LLMs (i.e., LLMs dedicated to Scientometrics)

References

  1. Abramo, G. (2024). The forced battle between peer-review and scientometric research assessment: Why the CoARA initiative is unsound. Research Evaluation, rvae021. [Google Scholar] [CrossRef]
  2. Abramo, G., D’Angelo, C. A., & Reale, E. (2019). Peer review versus bibliometrics: Which method better predicts the scholarly impact of publications? Scientometrics, 121, 537–554. [Google Scholar] [CrossRef]
  3. Agarwal, N. S., & Behera, R. K. (2024). Harnessing the power of large language models for effective query resolution and document summarization. In 2024 6th international conference on computational intelligence and networks (CINE) (pp. 1–6). IEEE. [Google Scholar]
  4. Allen, R., & Peterson, A. (2026). Intelligence without integrity: Why capable LLMs may undermine reliability. arXiv, arXiv:2602.20440. [Google Scholar] [CrossRef]
  5. Azoulay, P. (2019). Small research teams’ disrupt science more radically than large ones. Nature, 566(7744), 330–332. [Google Scholar] [CrossRef] [PubMed]
  6. Azoulay, P., & Greenblatt, W. H. (2025). Does peer review penalize scientific risk taking? Evidence from NIH grant renewals (No. w33495). National Bureau of Economic Research.
  7. Banerjee, S., Agarwal, A., & Singla, S. (2025). Llms will always hallucinate, and we need to live with this. In Intelligent systems conference (pp. 624–648). Springer Nature. [Google Scholar]
  8. Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637–654. [Google Scholar] [CrossRef]
  9. Bohannon, J. (2015). Hoax-detecting software spots fake papers. Science, 348, 18–19. [Google Scholar] [CrossRef]
  10. Bornmann, L., Devarakonda, S., Tekles, A., & Chacko, G. (2019a). Do disruption index indicators measure what they propose to measure? The comparison of several indicator variants with assessments by peers. arXiv, arXiv:1911.08775. [Google Scholar]
  11. Bornmann, L., Tekles, A., Zhang, H. H., & Ye, F. Y. (2019b). Do we measure novelty when we analyze unusual combinations of cited references? A validation study of bibliometric novelty indicators based on F1000Prime data. Journal of Informetrics, 13(4), 100979. [Google Scholar] [CrossRef]
  12. Boudreau, K. J., Guinan, E. C., Lakhani, K. R., & Riedl, C. (2016). Looking across and looking beyond the knowledge frontier: Intellectual distance, novelty, and resource allocation in science. Management Science, 62(10), 2765–2783. [Google Scholar] [CrossRef]
  13. Bozeman, B., & Jung, J. (2017). Bureaucratization in academic research policy: What causes it? Annals of Science and Technology Policy, 1(2), 133–214. [Google Scholar] [CrossRef]
  14. Bu, Y., Waltman, L., & Huang, Y. (2021). A multidimensional framework for characterizing the citation impact of scientific publications. Quantitative Science Studies, 2(1), 155–183. [Google Scholar] [CrossRef]
  15. Carius, A. C., & Teixeira, A. J. (2024). Artificial Intelligence and content analysis: The large language models (LLMs) and the automatized categorization. AI & SOCIETY, 40, 2405–2416.
  16. Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377.
  17. Christensen, C. (1997). Patterns in the evolution of product competition. European Management Journal, 15(2), 117–127.
  18. Coccia, M. (2018). General properties of the evolution of research fields: A scientometric study of human microbiome, evolutionary robotics and astrobiology. Scientometrics, 117(2), 1265–1283.
  19. Correia, A., Grover, A., Jameel, S., Schneider, D., Antunes, P., & Fonseca, B. (2023). A hybrid human–AI tool for scientometric analysis. Artificial Intelligence Review, 56(Suppl. S1), 983–1010.
  20. Dalalah, D., & Dalalah, O. M. (2023). The false positives and false negatives of generative AI detection tools in education and academic research: The case of ChatGPT. The International Journal of Management Education, 21(2), 100822.
  21. Dimensions. (2024, November 14). Dimensions research GPT on ChatGPT | Dimensions. Dimensions. Available online: https://www.dimensions.ai/products/all-products/dimensions-research-gpt/ (accessed on 14 March 2025).
  22. Edwards, A. M., Isserlin, R., Bader, G. D., Frye, S. V., Willson, T. M., & Yu, F. H. (2011). Too many roads not taken. Nature, 470(7333), 163–165.
  23. Egghe, L. (2006). An improvement of the h-index: The g-index. ISSI Newsletter, 2(1), 8–9.
  24. Elsevier Ltd. (2025). ScienceDirect AI. Available online: https://www.elsevier.com/en-in/products/sciencedirect/sciencedirect-ai (accessed on 14 March 2025).
  25. Farber, S. (2024). Enhancing peer review efficiency: A mixed-methods analysis of artificial intelligence-assisted reviewer selection across academic disciplines. Learned Publishing, 37(4), e1638.
  26. Fedorov, O., Müller, S., & Knapp, S. (2010). The (un)targeted cancer kinome. Nature Chemical Biology, 6(3), 166–169.
  27. Franzoni, C., & Stephan, P. (2023). Uncertainty and risk-taking in science: Meaning, measurement and management in peer review of research proposals. Research Policy, 52(3), 104706.
  28. Freeman, L. C. (1977). A set of measures of centrality based on betweenness. Sociometry, 40, 35–41.
  29. Funk, R. J., & Owen-Smith, J. (2017). A dynamic network measure of technological change. Management Science, 63(3), 791–817.
  30. Garfield, E. (1963). Citation indices in sociological and historical research. American Documentation, 14(4), 289–291.
  31. Grant, R. M. (1996). Toward a knowledge-based theory of the firm. Strategic Management Journal, 17(Suppl. S2), 109–122.
  32. Hanumanth. (2025, March 5). AI solutions for academia & research | Clarivate. Clarivate. Available online: https://clarivate.com/ai/academia/ (accessed on 14 March 2025).
  33. Hicks, D. (2012). Performance-based university research funding systems. Research Policy, 41(2), 251–261.
  34. Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572.
  35. Hirsch, J. E. (2019). hα: An index to quantify an individual’s scientific leadership. Scientometrics, 118(2), 673–686.
  36. Jayaram, N. (2023). Recent trends in doctoral education in India. Innovations in Education and Teaching International, 60(5), 677–687.
  37. Kaplan, S., & Vakili, K. (2015). The double-edged sword of recombination in breakthrough innovation. Strategic Management Journal, 36(10), 1435–1457.
  38. Khalifa, M., & Albadawy, M. (2024). Using artificial intelligence in academic writing and research: An essential productivity tool. Computer Methods and Programs in Biomedicine Update, 5, 100145.
  39. Kim, E. H., Jeong, Y. K., Kim, Y., & Song, M. (2022). Exploring scientific trajectories of a large-scale dataset using topic-integrated path extraction. Journal of Informetrics, 16(1), 101242.
  40. Klavans, R., & Boyack, K. W. (2017). Research portfolio analysis and topic prominence. Journal of Informetrics, 11(4), 1158–1174.
  41. Kosmulski, M. (2006). A new Hirsch-type index saves time and works equally well as the original h-index. ISSI Newsletter, 2(3), 4–6.
  42. Kübler, J. M., Budhathoki, K., Kleindessner, M., Zhou, X., Yin, J., Khetan, A., & Karypis, G. (2026, April 23–27). When LLMs get significantly worse: A statistical approach to detect model degradations. Fourteenth International Conference on Learning Representations (ICLR 2026), Rio de Janeiro, Brazil. Available online: https://openreview.net/forum?id=cM3gsqEI4K (accessed on 29 April 2026).
  43. Kwa, T., West, B., Becker, J., Deng, A., Garcia, K., Hasin, M., Jawhar, S., Kinniment, M., Rush, N., Von Arx, S., Bloom, R., Broadley, T., Du, H., Goodrich, B., Jurkovic, N., Miles, L. H., Nix, S., Lin, T., Parikh, N., … Chan, L. (2025). Measuring AI ability to complete long tasks. arXiv, arXiv:2503.14499.
  44. Lane, J. N., Teplitskiy, M., Gray, G., Ranu, H., Menietti, M., Guinan, E. C., & Lakhani, K. R. (2022). Conservatism gets funded? A field experiment on the role of negative information in novel project evaluation. Management Science, 68(6), 4478–4495.
  45. Lathabai, H. H., George, S., Prabhakaran, T., & Changat, M. (2018). An integrated approach to path analysis for weighted citation. Scientometrics, 117, 1871–1904.
  46. Lathabai, H. H., Nandy, A., & Singh, V. K. (2021). x-index: Identifying core competency and thematic research strengths of institutions using an NLP and network based ranking framework. Scientometrics, 126(12), 9557–9583.
  47. Lathabai, H. H., & Prabhakaran, T. (2023). Contextual Ψ-index and its estimate for contextual productivity assessment. Scientometrics, 128(8), 4875–4886.
  48. Lathabai, H. H., Prabhakaran, T., & Changat, M. (2015). Centrality and flow vergence gradient based path analysis of scientific literature: A case study of biotechnology for engineering. Physica A: Statistical Mechanics and its Applications, 429, 157–168.
  49. Lathabai, H. H., Prabhakaran, T., & Raman, R. (2024). ChatGPT research: Insights from early studies using network scientometric approach. Journal of Scientometric Research, 13(3), 688–705.
  50. Lathabai, H. H., Singh, V. K., Singh, P., & Raman, R. (2023, July 2–5). An informed peer review framework for thrust-area performance-based funding. 19th International Conference of the International Society for Scientometrics and Informetrics (ISSI 2023) (V1, pp. 285–303), Bloomington, IN, USA.
  51. Leahey, E., Lee, J., & Funk, R. J. (2023). What types of novelty are most disruptive? American Sociological Review, 88(3), 562–597.
  52. Lee, Y. N., Walsh, J. P., & Wang, J. (2015). Creativity in scientific teams: Unpacking novelty and impact. Research Policy, 44(3), 684–697.
  53. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-T., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459–9474.
  54. Leydesdorff, L., & Rafols, I. (2011). Indicators of the interdisciplinarity of journals: Diversity, centrality, and citations. Journal of Informetrics, 5(1), 87–100.
  55. Li, D. (2017). Expertise versus bias in evaluation: Evidence from the NIH. American Economic Journal: Applied Economics, 9(2), 60–92.
  56. Li, L., Xu, W., Guo, J., Zhao, R., Li, X., Yuan, Y., Zhang, B., Jiang, Y., Xin, Y., Dang, R., Zhao, D., Rong, Y., Feng, T., & Bing, L. (2024). Chain of ideas: Revolutionizing research via novel idea development with LLM agents. arXiv, arXiv:2410.13185.
  57. Linton, J. D. (2016). Improving the peer review process: Capturing more information and enabling high-risk/high-return research. Research Policy, 45(9), 1936–1938.
  58. Liu, J. S., & Lu, L. Y. Y. (2012). An integrated approach for main path analysis: Development of the Hirsch index as an example. Journal of the Association for Information Science and Technology, 63(3), 528–542.
  59. Madhan, M., Gunasekaran, S., & Arunachalam, S. (2018). Evaluation of research in India: Are we doing it right? Indian Journal of Medical Ethics, 3(3), 221–229.
  60. Malinen, E. (2024). Interactive document summarizer using LLM technology. Available online: https://lutpub.lut.fi/bitstream/handle/10024/167149/Master%27s%20thesis%2C%20Esko%20Malinen.pdf?sequence=1&isAllowed=y (accessed on 14 March 2025).
  61. McIntosh, L. D., & Vitale, C. H. (2024). Forensic scientometrics: An emerging discipline to protect the scholarly record. arXiv, arXiv:2404.00478.
  62. Milojević, S. (2025). Science of science. Scientometrics, 130(6), 3195–3211.
  63. Mohammadi, E., & Karami, A. (2022). Exploring research trends in big data across disciplines: A text mining analysis. Journal of Information Science, 48(1), 44–56.
  64. Mom, C., Sandström, U., & van den Besselaar, P. (2018). Does cronyism affect grant application success? The role of organizational proximity. In STI 2018 conference proceedings (pp. 1579–1585). Centre for Science and Technology Studies (CWTS).
  65. Munagandla, V. B., Dandyala, S. S. V., Vadde, B. C., & Engineer, D. (2024). AI-driven optimization of research proposal systems in higher education. Revista de Inteligencia Artificial en Medicina, 15(1), 650–672.
  66. Park, H., Lee, J. J., & Kim, B. C. (2015). Project selection in NIH: A natural experiment from ARRA. Research Policy, 44(6), 1145–1159.
  67. Petsko, G. A. (2011). Risky business. Genome Biology, 12, 119.
  68. Prabhakaran, T. (2018). Studies on some indices for paradigm shift emerging area detection and productivity assessment in techno-scientific disciplines: A network scientometric approach [Doctoral dissertation, University of Kerala]. Available online: http://hdl.handle.net/10603/474902 (accessed on 13 March 2025).
  69. Prabhakaran, T., Lathabai, H. H., & Changat, M. (2015). Detection of paradigm shifts and emerging fields using scientific network: A case study of information technology for engineering. Technological Forecasting and Social Change, 91, 124–145.
  70. Prabhakaran, T., Lathabai, H. H., George, S., & Changat, M. (2018). Toward prediction of paradigm shifts from scientific literature. Scientometrics, 117(3), 1611–1644.
  71. Rahman, M. H., Shihab, M., & Naderuzzaman, M. (2026). Design and implementation concept of an AI-powered scholarly discovery platform for emerging research ecosystems. Open Access Journal on Engineering Applications, 1(02), 8–18.
  72. Raman, R., Lathabai, H. H., Mandal, S., Das, P., Kaur, T., & Nedungadi, P. (2024). ChatGPT: Literate or intelligent about UN sustainable development goals? PLoS ONE, 19(4), e0297521.
  73. Sarrico, C. S. (2022). The expansion of doctoral education and the changing nature and purpose of the doctorate. Higher Education, 84(6), 1299–1315.
  74. Schreiber, M. (2008). A modification of the h-index: The hm-index accounts for multiauthored manuscripts. Journal of Informetrics, 2(3), 211–216.
  75. Sharma, A. K., & Sharma, R. (2025). Data governance in the age of artificial intelligence: Challenges, best practices and regulatory compliance. Applied Marketing Analytics, 10(4), 390–403.
  76. Shibata, N., Kajikawa, Y., & Matsushima, K. (2007). Topological analysis of citation networks to discover the future core articles. Journal of the American Society for Information Science and Technology, 58(6), 872–882.
  77. Shibayama, S., Yin, D., & Matsumoto, K. (2021). Measuring novelty in science with word embedding. PLoS ONE, 16(7), e0254034.
  78. Stichweh, R. (2001). Scientific disciplines, history of. International Encyclopedia of the Social and Behavioral Sciences, 21, 287–290.
  79. Stokel-Walker, C. (2022). AI bot ChatGPT writes smart essays-should professors worry? Nature.
  80. Tol, R. (2009). The h-index and its alternatives: An application to the 100 most prolific economists. Scientometrics, 80(2), 317–324.
  81. Uzzi, B., Mukherjee, S., Stringer, M., & Jones, B. (2013). Atypical combinations and scientific impact. Science, 342(6157), 468–472.
  82. van den Besselaar, P., & Sandström, U. (2020). Bibliometrically disciplined peer review: On using indicators in research evaluation. Scholarly Assessment Reports, 2(1), 1–13.
  83. van den Besselaar, P., Sandström, U., & Schiffbaenker, H. (2018). Studying grant decision-making: A linguistic analysis of review reports. Scientometrics, 117, 313–329.
  84. Verhoeven, D., Bakker, J., & Veugelers, R. (2016). Measuring technological novelty with patent-based indicators. Research Policy, 45(3), 707–723.
  85. Veugelers, R., Wang, J., & Stephan, P. (2025). Do funding agencies select and enable novel research: Evidence from ERC. Economics of Innovation and New Technology.
  86. Wagner, C. S., Roessner, J. D., Bobb, K., Klein, J. T., Boyack, K. W., Keyton, J., Rafols, I., & Börner, K. (2011). Approaches to understanding and measuring interdisciplinary scientific research (IDR): A review of the literature. Journal of Informetrics, 5(1), 14–26.
  87. Wang, J., Lee, Y. N., & Walsh, J. P. (2018). Funding model and creativity in science: Competitive versus block funding and status contingency effects. Research Policy, 47(6), 1070–1083.
  88. Wang, J., Veugelers, R., & Stephan, P. (2017). Bias against novelty in science: A cautionary tale for users of bibliometric indicators. Research Policy, 46(8), 1416–1436.
  89. Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop and small teams disrupt science and technology. Nature, 566(7744), 378–382.
  90. Wu, Q., & Yan, Z. (2019). Solo citations, duet citations, and prelude citations: New measures of the disruption of academic papers. arXiv, arXiv:1905.03461.
  91. Yan, Z., & Fan, K. (2024). An integrated indicator for evaluating scientific papers: Considering academic impact and novelty. Scientometrics, 129(11), 6909–6929.
  92. Yan, Z., Xiang, B., Yu, D., & Shi, J. (2024). Identify the knowledge trajectory of internet of vehicles: From the perspective of main path analysis and topic analysis. IEEE Internet of Things Journal, 12, 1602–1612.
  93. Zairul, M., Syamil, M., & Fateh, M. (2025, June 3–5). RFlowZ-SS: A design science research approach to revolutionizing research proposal formulation through AI integration. International Grand Invention, Innovation and Design Expo (IGIIDeation) 2025 (p. 7), Virtual.
Figure 1. Workflow of the IRF in phase 1.
Figure 2. Workflow of the IRF in phase 2.
Figure 3. Workflow of the IRF in phase 3.
Figure 4. Workflow of the IRF in phase 4.
Figure 5. Dynamics of FS values in no-risk scenario.
Figure 6. Dynamics of FS values in low-risk scenario.
Figure 7. Dynamics of FS values in moderate-risk scenario.
Figure 8. Dynamics of FS values in high-risk scenario.
Figure 9. AI peer review evolution envisioned in scenario 1.
Figure 10. AI peer review evolution envisioned in scenario 2.
Figure 11. Evolutionary roadmap of the IRF that leads to scenario 2.
Table 1. Details (essential for computation of CS′) of 7 applicants.
| Applicant | TF1 | TF2 | TF3 | TF4 | TF5 |
|---|---|---|---|---|---|
| A | (C, 10, 500, 3) | --- | (C, 15, 750, 4) | --- | (P, 5, 100, 1) |
| B | (C, 15, 600, 2) | (P, 10, 275, 2) | --- | (C, 25, 950, 3) | --- |
| C | (P, 9, 110, 1) | --- | (C, 30, 2000, 6) | (P, 10, 100, 1) | --- |
| D | --- | (C, 25, 1350, 4) | (P, 7, 300, 2) | (P, 12, 210, 2) | --- |
| E | (P, 5, 90, 0) | (C, 35, 1950, 5) | --- | --- | (C, 15, 400, 1) |
| F | --- | (P, 8, 350, 0) | --- | --- | (C, 40, 2750, 7) |
| G | --- | --- | (P, 18, 350, 3) | (C, 25, 650, 0) | (P, 7, 400, 2) |
Table 2. CS computation of 7 applicants.
| Applicant | α | β | γ | δ | CS |
|---|---|---|---|---|---|
| A | 2/5 = 0.4 | 1/5 = 0.2 | (500 + 750)/8200 = 0.152 | 100/8200 = 0.012 | 0.225 |
| B | 2/5 = 0.4 | 1/5 = 0.2 | (600 + 950)/8200 = 0.189 | 275/8200 = 0.034 | 0.241 |
| C | 1/5 = 0.2 | 2/5 = 0.4 | 2000/8200 = 0.244 | (110 + 100)/8200 = 0.026 | 0.219 |
| D | 1/5 = 0.2 | 2/5 = 0.4 | 1350/8200 = 0.165 | (300 + 210)/8200 = 0.062 | 0.197 |
| E | 2/5 = 0.4 | 1/5 = 0.2 | (1950 + 400)/8200 = 0.287 | 90/8200 = 0.011 | 0.272 |
| F | 1/5 = 0.2 | 1/5 = 0.2 | 2750/8200 = 0.335 | 350/8200 = 0.043 | 0.224 |
| G | 1/5 = 0.2 | 2/5 = 0.4 | 650/8200 = 0.079 | (350 + 400)/8200 = 0.091 | 0.171 |
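
The CS column of Table 2 can be reproduced from the Table 1 tuples with a short Python sketch. The weights 0.35, 0.15, 0.35 and 0.15 are an assumption here: they were back-solved from the printed rows and are not quoted from the text, and the names `TABLE_1`, `cs_score` and `CITATION_BASE` are hypothetical. The denominator 8200 and the count of five TFs are taken as printed in Table 2.

```python
# Sketch only: weights are back-solved from the Table 2 rows (an assumption).
W_ALPHA, W_BETA, W_GAMMA, W_DELTA = 0.35, 0.15, 0.35, 0.15
N_TF = 5              # number of TF columns in Table 1
CITATION_BASE = 8200  # denominator printed in Table 2

# Table 1 tuples per applicant: (tag, second value, citations, novel papers)
TABLE_1 = {
    "A": [("C", 10, 500, 3), ("C", 15, 750, 4), ("P", 5, 100, 1)],
    "B": [("C", 15, 600, 2), ("P", 10, 275, 2), ("C", 25, 950, 3)],
    "C": [("P", 9, 110, 1), ("C", 30, 2000, 6), ("P", 10, 100, 1)],
    "D": [("C", 25, 1350, 4), ("P", 7, 300, 2), ("P", 12, 210, 2)],
    "E": [("P", 5, 90, 0), ("C", 35, 1950, 5), ("C", 15, 400, 1)],
    "F": [("P", 8, 350, 0), ("C", 40, 2750, 7)],
    "G": [("P", 18, 350, 3), ("C", 25, 650, 0), ("P", 7, 400, 2)],
}


def cs_score(entries):
    """Combine the four Table 2 components into CS using the assumed weights."""
    c_tagged = [e for e in entries if e[0] == "C"]
    p_tagged = [e for e in entries if e[0] == "P"]
    alpha = len(c_tagged) / N_TF                           # share of C-tagged TFs
    beta = len(p_tagged) / N_TF                            # share of P-tagged TFs
    gamma = sum(e[2] for e in c_tagged) / CITATION_BASE    # C-tagged citations
    delta = sum(e[2] for e in p_tagged) / CITATION_BASE    # P-tagged citations
    return W_ALPHA * alpha + W_BETA * beta + W_GAMMA * gamma + W_DELTA * delta


CS = {name: round(cs_score(entries), 3) for name, entries in TABLE_1.items()}
for name, value in CS.items():
    print(name, value)
```

Under these assumed weights the rounded values agree with all seven CS entries in Table 2.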
Table 3. CS′ and FS computation of 7 applicants (with 56% overall weightage to novelty).
| Applicant | Total number of novel/disruptive/paradigm shift papers | η_P | CS′ | CS′ (rsc) | Novelty score | Aggregate score | FS |
|---|---|---|---|---|---|---|---|
| A | 8 | 8/535 = 0.015 | 0.0600 | 6 | 80 | 100 | 55.201362 |
| B | 7 | 7/535 = 0.013 | 0.0636 | 6.36 | 75 | 100 | 53.5427314 |
| C | 8 | 8/535 = 0.015 | 0.0585 | 5.85 | 85 | 100 | 56.9416059 |
| D | 8 | 8/535 = 0.015 | 0.0530 | 5.3 | 78 | 100 | 54.1990449 |
| E | 6 | 6/535 = 0.011 | 0.0708 | 7.08 | 70 | 100 | 52.0316617 |
| F | 7 | 7/535 = 0.013 | 0.0592 | 5.92 | 82 | 100 | 55.888646 |
| G | 5 | 5/535 = 0.009 | 0.0452 | 4.52 | 65 | 100 | 49.2080921 |
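
The CS′ and FS columns of Table 3 can be checked with a similar Python sketch. Both formulas are assumptions back-solved from the printed values, not quoted from the text: CS′ = 0.25 × (CS + η_P), CS′ (rsc) = 100 × CS′, and FS = 0.40 × CS′ (rsc) + 0.36 × novelty score + 0.24 × aggregate score. The names `ROWS`, `final_score` and `NOVEL_BASE` are hypothetical. CS is taken from Table 2, and the novel-paper count of each applicant is the sum of the fourth tuple entries in Table 1 (for F this sum is 7).

```python
# Sketch only: the CS' and FS formulas are back-solved from Table 3 (an assumption).
NOVEL_BASE = 535  # denominator of eta_P as printed in Table 3

# applicant: (CS from Table 2, novel papers summed over the Table 1 tuples,
#             novelty score, aggregate score, FS as reported in Table 3)
ROWS = {
    "A": (0.225, 8, 80, 100, 55.201362),
    "B": (0.241, 7, 75, 100, 53.5427314),
    "C": (0.219, 8, 85, 100, 56.9416059),
    "D": (0.197, 8, 78, 100, 54.1990449),
    "E": (0.272, 6, 70, 100, 52.0316617),
    "F": (0.224, 7, 82, 100, 55.888646),
    "G": (0.171, 5, 65, 100, 49.2080921),
}


def final_score(cs, novel_papers, novelty, aggregate):
    """Return (CS', FS) under the assumed formulas."""
    eta_p = novel_papers / NOVEL_BASE
    cs_prime = 0.25 * (cs + eta_p)   # assumed: equal weight on CS and eta_P
    cs_rsc = 100 * cs_prime          # rescaled to the 0-100 range of the scores
    fs = 0.40 * cs_rsc + 0.36 * novelty + 0.24 * aggregate
    return cs_prime, fs


results = {name: final_score(*row[:4]) for name, row in ROWS.items()}
for name, (cs_prime, fs) in results.items():
    reported = ROWS[name][4]
    print(f"{name}: CS'={cs_prime:.4f}  FS={fs:.4f}  reported FS={reported}")
```

Because the CS inputs are rounded to three decimals, the computed FS agrees with the reported values only to about two decimal places. Under these assumed weights the novelty-related terms carry 0.36 + 0.40 × 0.5 = 0.56 of the total, which is consistent with the 56% stated in the caption of Table 3.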
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Lathabai, H.H.; Raman, R.; Nedungadi, P. Novelty First Policy-Based Intelligent Review Framework (IRF) for the Evaluation of Research Proposals. Publications 2026, 14, 32. https://doi.org/10.3390/publications14020032
