Publications
  • Essay
  • Open Access

28 January 2026

Interpreting Bibliometric Indicators as the “Blood Tests” of Research Systems

Department of Engineering and Science, Universitas Mercatorum, 00186 Rome, Italy

Abstract

The increasing emphasis on responsible research assessment has renewed the need for conceptual tools that help communicate the complementary roles of quantitative and qualitative evaluation. This essay proposes an interpretative metaphor that frames bibliometric indicators as the “blood tests” of research systems—heuristic devices that reveal multidimensional aspects of system vitality, balance, and dysfunction. The metaphor, grounded in standard categories of clinical diagnostics (hematological, hepatic, renal, lipidic, and cardiovascular panels), provides an accessible language for research scholars and policymakers. Each bibliometric technique—ranging from publication and citation counts to patent analysis, altmetrics, and topic modelling—is associated with a diagnostic function such as screening, monitoring, or early risk detection. By linking established principles of responsible metrics (DORA, Leiden Manifesto, Metric Tide, CoARA) with the professionalization of evaluators, the essay situates the metaphor within current debates on bibliometric literacy and the ethical interpretation of indicators. Rather than prescribing metrics or decision rules, the contribution invites reflection on how evaluators can interpret bibliometric signals diagnostically—as contextual evidence for institutional learning, strategic decision-making, and the cultivation of healthy, adaptive research systems. Consistent with the essay format, this contribution proposes neither a new evaluative methodology nor an empirical validation. Instead, it advances a heuristic and communicative framework intended to emphasize the holistic, contextual, and professionally informed interpretation of quantitative indicators in the evaluation of research activity.

1. Introduction

The debate surrounding the role of bibliometrics in research evaluation is reaching a pivotal phase. Since the San Francisco Declaration on Research Assessment (DORA, 2012), followed by the Metric Tide report (J. Wilsdon, 2016), the Leiden Manifesto for Research Metrics (2015) (Hicks et al., 2015), the Hong Kong Principles for Assessing Researchers (2019) (Moher et al., 2020), and most recently the CoARA Agreement on Reforming Research Assessment (CoARA, 2022), the scientific community has been calling for more responsible, balanced, and transparent use of quantitative indicators. The Global Research Council’s Statement of Principles on Sustainable Research (GRC, 2024) further reinforces this international consensus toward responsible and sustainable assessment practices. These initiatives build on earlier European discussions about alternative indicators, such as the Next-Generation Metrics report (J. R. Wilsdon et al., 2017), which emphasized openness, diversity, and community-driven approaches to impact measurement.
Research evaluation has undergone profound transformations over the past two decades. Initially dominated by informal reputational assessments and local hierarchies, research systems have progressively moved toward more standardized, quantitative, and comparative approaches. As Whitley (2000) noted, early research systems relied heavily on local reputations, with formal evaluation mechanisms only emerging recently. Ziman (2001) described this as part of the shift from “academic” to “post-academic” science, shaped by external accountability. DORA marked a watershed moment, explicitly questioning the overreliance on journal-based metrics such as the journal impact factor and advocating for the direct assessment of research outputs and broader contributions. The Leiden Manifesto offered ten principles for the responsible use of metrics, emphasizing transparency, contextualization, and integration with qualitative judgment.
The Hong Kong Principles added a crucial dimension: embedding research integrity and responsible conduct into evaluation frameworks. Most recently, the CoARA Agreement has galvanized a broad coalition of institutions, funders, and researchers to move toward a more qualitative, peer review-centered model. These documents collectively mark a paradigmatic shift away from simplistic, numbers-driven evaluations toward more nuanced, context-sensitive, and multifaceted approaches.
However, this evolution has sparked intense debate. While the intentions of these initiatives are laudable, several voices have cautioned against a wholesale rejection of quantitative methods. Abramo (2024) has critiqued CoARA for sidelining the expertise of the scientometric community, warning against “scientometric negationism.” Abramo argues that bibliometric indicators, when properly used, are not an enemy of fairness or rigor but a necessary complement to qualitative peer judgment, particularly in disciplines where standardized, comparable data are available. This debate resonates with recent efforts to articulate explicit Principles of Evaluative Bibliometrics in a DORA/CoARA Context (Torres-Salinas et al., 2025), which call for renewed dialogue between responsible metrics initiatives and the scientometric profession. Abramo’s critique is not a defense of blind metric worship but a call for professionalization. As he has emphasized, “there is no one-size-fits-all solution” in research assessment. Peer review is indispensable in the arts and humanities, but it is not infallible: it is subject to biases, power dynamics, inconsistency, and opacity, and it is extremely costly and time-consuming. Metrics, when responsibly deployed, can counterbalance some of these flaws by providing standardized, reproducible, and scalable indicators. Conversely, relying exclusively on metrics can distort researcher behavior, incentivizing quantity over quality and fostering citation cartels (Biagioli & Lippman, 2020). These considerations show that the key is not choosing between metrics and peer review but orchestrating them intelligently.
This essay is primarily addressed to scholars and practitioners in research policy and the science of science who are interested in conceptual and communicative approaches to the interpretation of bibliometric indicators. Rather than offering new indicators or empirical results, the purpose is to use the diagnostic metaphor as a reflexive and pedagogical tool—one that helps illustrate to both experts and decision-makers how quantitative and qualitative approaches can coexist and inform responsible assessment practices. Consistent with the essay format, this contribution is intentionally conceptual and reflective. Its aim is not to advance a prescriptive evaluative model, but to offer a shared interpretive language for making sense of bibliometric signals associated with research activity. The value of the proposed metaphor therefore lies in its capacity to structure reasoning and communication among scientometric expertise, peer judgement, and research policy, rather than in methodological operationalization.
The evolution of research evaluation practices is not occurring in a vacuum but reflects broader transformations in the governance of science and higher education. Over the past several decades, increasing public investment in research, heightened societal expectations, and global competition have created pressures for greater accountability, transparency, and performance measurement. As Power (1997) observes, we live in an “audit society,” where verification rituals pervade all sectors. Marginson (2006) further highlights that global competition has intensified performance pressures, while Hazelkorn (2011) shows how international rankings have reshaped institutional priorities. Movements like DORA, the Leiden Manifesto, the Hong Kong Principles, and the CoARA Agreement should be understood within this context: they are not merely technical interventions but also normative responses to the perceived excesses and distortions introduced by metric-based assessment systems.
Critics have argued that research evaluation is experiencing a “crisis of assessment,” characterized by an erosion of trust, a proliferation of rankings, and a shift from intrinsic to extrinsic motivations. As Strathern (1997) noted, “when a measure becomes a target, it ceases to be a good measure,” underscoring the dangers of gaming and metric distortion. Espeland and Sauder (2007) have shown how rankings reshape institutional behavior, while Muller (2018) warns of a “tyranny of metrics” that risks crowding out professional judgment. In this landscape, the challenge is not to reject measurement altogether but to reimagine it in ways that reinforce, rather than undermine, the fundamental values of scholarship. Bibliometrics, if used judiciously, with clear, specific objectives and within well-defined and context-sensitive frameworks, can be part of the solution. When applied with methodological rigor and interpreted by professionals with expertise in the field, bibliometric analysis has the potential to enhance fairness, transparency, and strategic decision-making in research assessment.

2. The Diagnostic Metaphor as an Interpretive Lens

In this context, the challenge is to reconceptualize bibliometrics not as a substitute for qualitative judgment, but as a set of complementary diagnostic signals whose meaning emerges through interpretation. Rather than functioning as evaluative tools in their own right, bibliometric indicators are considered here as sources of contextual evidence that can inform expert reflection. The diagnostic metaphor introduced in this section is intended as an interpretive lens for reasoning and communication within the research policy and scientometric communities. By drawing parallels with clinical diagnostics, it offers an accessible language through which evaluators, policymakers, and scholars can discuss the complementary roles of quantitative and qualitative evidence in research assessment.
Much like clinical diagnostics in medicine, bibliometrics can offer indicators of “health” that, when interpreted wisely, enrich rather than replace professional judgment.
I propose to conceptualize bibliometrics as the “blood test” of research activity: a diagnostic tool offering a measurable, multidimensional assessment of the health, vitality, and dynamics of research systems. Just as clinical blood tests examine various physiological markers to evaluate an individual’s health, bibliometric indicators provide diverse perspectives on the state of research at individual, institutional, and systemic levels.
Importantly, this metaphor should be approached with caution. Clinical diagnostics and research evaluation operate in fundamentally different domains. Biological health is, in many respects, a more clearly delineated target than scholarly value or intellectual contribution, which are inherently pluralistic, contested, and shaped by disciplinary cultures. Moreover, laboratory tests are supported by decades of validation and standardized reference ranges, whereas bibliometric indicators remain subject to debate, context dependency, and evolving interpretations. Unlike biochemical parameters, bibliometric indicators lack universally validated reference ranges. Their interpretive value depends on comparative baselines—across disciplines, institutions, or time periods—rather than fixed thresholds. This relativism does not undermine their diagnostic usefulness but underscores the need for contextual calibration and expert interpretation, much as reference values in medicine differ across populations and conditions.
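To make the idea of comparative baselines concrete, a minimal Python sketch is given below. It is purely illustrative and not part of the framework: the publication records and field averages are invented, and the normalization shown (citations divided by the mean citations of papers from the same field and year, in the spirit of commonly used field-normalized indicators) is only one of many possible calibrations.

```python
from collections import defaultdict

# Hypothetical publication records: (field, year, citations). Illustrative only.
publications = [
    ("oncology", 2021, 14),
    ("oncology", 2021, 3),
    ("mathematics", 2021, 2),
    ("mathematics", 2021, 0),
    ("oncology", 2022, 7),
]

# Step 1: build comparative baselines, i.e. the mean citations per field
# and publication year (the analogue of a population-specific reference range).
totals, counts = defaultdict(float), defaultdict(int)
for field, year, cites in publications:
    totals[(field, year)] += cites
    counts[(field, year)] += 1
baselines = {key: totals[key] / counts[key] for key in totals}

# Step 2: express each paper's citations relative to its own baseline.
# A value near 1.0 means "in line with the field-year average"; values far
# above or below it are signals to interpret in context, not verdicts.
def normalized_score(field, year, cites):
    expected = baselines[(field, year)]
    return cites / expected if expected > 0 else None

for field, year, cites in publications:
    print(field, year, cites, normalized_score(field, year, cites))
```

Read in this way, a score close to 1.0 plays the role of a reference value calibrated to a specific population, rather than a fixed threshold applied uniformly across disciplines.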
The metaphor should not be interpreted as a rigid mapping but as a heuristic device to stimulate reflection on the multidimensionality of evaluation. It is important to clarify the epistemic status of the correspondences proposed below. The mappings between clinical diagnostic categories and bibliometric techniques are not intended as fixed equivalences, nor as theory-driven functional classifications. They are heuristic analogies based on the diagnostic function performed—such as screening, monitoring, or early risk signalling—rather than on biological similarity or causal correspondence.
Accordingly, the diagnostic metaphor should be read as an interpretive lens designed to stimulate multidimensional reasoning about evaluation practices, not as a classificatory scheme or evaluative model.
To illustrate the heuristic potential of this metaphor, the table below presents an illustrative comparison between selected categories of clinical diagnostics and commonly used bibliometric techniques. This table is not intended as a prescriptive or exhaustive classification, but as a conceptual exercise to highlight some possible functional parallels in diagnostic reasoning. The clinical test categories included are drawn from major physiological systems routinely assessed in standard laboratory medicine and are used here solely for their illustrative value, in order to support reflection on the multidimensional interpretation of bibliometric indicators. The grouping follows the structure of widely adopted diagnostic panels in clinical practice—such as the Comprehensive Metabolic Panel and the Complete Blood Count (see, e.g., Burtis et al., 2015; Fischbach et al., 2021)—which encompass hematological, hepatic, renal, lipidic, and cardiovascular markers. The selection is therefore anchored in established medical taxonomies rather than arbitrary choice.
From a conceptual standpoint, these groups were chosen because they embody distinct diagnostic functions—baseline screening (e.g., hematological tests), functional monitoring (e.g., hepatic and renal profiles), and risk assessment (e.g., lipidic and cardiovascular markers). These functions are used here to illustrate different interpretive roles that bibliometric indicators may play when examining research activity, including the provision of descriptive overviews, the identification of emerging patterns or trends, and the signalling of potential imbalances or sources of strain. Each bibliometric indicator, like each clinical parameter, offers a partial window onto system functioning; considered together, they can support a more nuanced, multidimensional understanding of research performance. Table 1 summarizes these conceptual parallels, aligning major categories of laboratory tests with corresponding bibliometric approaches, based on functional analogies rather than direct equivalences. Each diagnostic group represents a specific evaluative function—such as screening, monitoring, or risk detection—mirrored in bibliometric practices that capture productivity, visibility, collaboration, innovation, and emerging trends.
Table 1. Conceptual framework: parallels between clinical diagnostic categories and bibliometric techniques.
Alternative correspondences could be proposed without undermining the validity of the metaphor, as its analytical relevance does not depend on any single pairing. Rather, it lies in the reasoning process that encourages evaluators to interpret indicators in relation to distinct diagnostic functions and to consider them jointly rather than in isolation.
The heuristic potential of the diagnostic metaphor can be illustrated through simple conceptual examples or interpretive reasoning. At the institutional level, a university may use bibliometric “blood tests” to complement qualitative self-assessment. For example, a decline in collaboration indicators—analogous to an electrolyte imbalance—combined with rising publication counts but stagnant citation impact—akin to elevated yet unproductive metabolic activity—may indicate emerging organizational imbalance. Such patterns can signal that the research system is under strain, prompting leadership to examine potential issues in coordination, resource distribution, or strategic alignment.
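To illustrate how such a joint reading might be supported in practice, the following sketch assembles a small, hypothetical indicator "panel" for one institution. The indicator names, values, and thresholds are assumptions made purely for illustration; the point is only that the signals are examined together and flagged for qualitative follow-up rather than acted upon mechanically.

```python
# Hypothetical yearly indicators for one institution (illustrative values only).
panel = {
    "publications":            {"previous": 410, "current": 470},    # output volume
    "field_normalized_impact": {"previous": 1.05, "current": 1.04},  # citation impact
    "co_authored_share":       {"previous": 0.62, "current": 0.51},  # collaboration
}

def relative_change(values):
    """Year-on-year change as a fraction of the previous value."""
    return (values["current"] - values["previous"]) / values["previous"]

changes = {name: relative_change(values) for name, values in panel.items()}

# Read the signals jointly: rising output with flat impact and falling
# collaboration is flagged for closer, qualitative examination; it does
# not by itself constitute a verdict on the institution.
if (changes["publications"] > 0.10
        and abs(changes["field_normalized_impact"]) < 0.05
        and changes["co_authored_share"] < -0.10):
    print("Pattern worth discussing with qualitative evidence:",
          {name: round(value, 2) for name, value in changes.items()})
```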
At the disciplinary level, mapping keyword co-occurrences and applying topic modelling—comparable to cardiac and inflammatory markers—can reveal early signs of thematic pressure or emerging opportunities. For example, a sudden concentration of attention on artificial intelligence within education research may signal both intellectual vitality and the risk of thematic saturation. In such cases, evaluators may integrate these indicators with expert peer judgement to determine whether the trend reflects genuine innovation or merely transient hype.
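As a simple illustration of this kind of thematic reading, the sketch below counts keyword co-occurrences and a crude concentration share for a hypothetical set of author keywords. Real analyses would rely on curated metadata and dedicated mapping or topic-modelling tools; the example only shows the type of signal an evaluator might then weigh against expert judgement.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists from recent papers in one field (illustrative).
papers = [
    ["artificial intelligence", "education", "assessment"],
    ["artificial intelligence", "chatbots", "education"],
    ["learning analytics", "education"],
    ["artificial intelligence", "education", "ethics"],
]

# Keyword co-occurrence counts: a rough "marker panel" for thematic structure.
cooccurrence = Counter()
for keywords in papers:
    for pair in combinations(sorted(set(keywords)), 2):
        cooccurrence[pair] += 1

# Share of papers mentioning the most frequent keyword: a crude signal of
# thematic concentration, to be interpreted alongside expert judgement.
keyword_counts = Counter(k for keywords in papers for k in set(keywords))
top_keyword, top_count = keyword_counts.most_common(1)[0]
concentration = top_count / len(papers)

print("Most frequent co-occurrences:", cooccurrence.most_common(3))
print(f"'{top_keyword}' appears in {concentration:.0%} of papers")
```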
Just as physicians are trained to read laboratory results in the context of patient history and environmental factors, so too should research evaluators be professionals with expertise in scientometrics, capable of contextualizing metrics and avoiding simplistic judgments. Meeting this demand requires the development of structured training and certification initiatives. International efforts, such as those promoted by ENRESSH (see Note 1) and the CoARA communities of practice, are beginning to outline curricula and guidelines for evaluator professionalism. Embedding bibliometric literacy into doctoral education and research management programs could further strengthen this professional field.
Equally important is recognizing contextual variables: just as environmental and social determinants shape health outcomes, institutional and geographic contexts influence research performance. Factors such as national funding levels, teaching loads, or access to research infrastructure must be considered, along with the career stage and trajectory of the evaluated subject. A bibliometric “diagnosis” without these considerations risks producing misleading or unjust conclusions.
Finally, effective and ethical research evaluation is not solely the responsibility of evaluators and institutions; it requires engagement from the evaluated as well. Researchers, particularly early-career scholars, need opportunities to develop bibliometric literacy and to communicate the diverse impacts of their work.
When approached responsibly, evaluation functions not merely as an external audit but as a catalyst for reflection, learning, and improvement. It creates a virtuous circle that helps researchers, teams, and institutions clarify goals, identify strengths and weaknesses, and pursue continuous quality enhancement. A responsible, dialogical approach is thus essential to foster not only fairness but also growth, innovation, and resilience in research environments.

3. Discussion: Implications, Limits and Responsible Use of the Diagnostic Metaphor

The diagnostic metaphor proposed in this essay raises important implications for contemporary debates on research evaluation, particularly in the context of responsible metrics and the professionalization of evaluative practices.

A further implication of the clinical analogy concerns the notion of systemic balance. In medicine, health is not defined by the maximization of individual parameters, but by their equilibrium within acceptable ranges; both deficiency and excess may signal dysfunction. Transposed to research evaluation, this perspective reinforces the diagnostic logic of the framework: not only low productivity, visibility, or collaboration may be problematic, but also their hypertrophy, when it leads to publication inflation, excessive specialization, citation pressure, or overdependence on specific indicators. The metaphor thus helps reframe bibliometric indicators as signals of balance or imbalance, rather than as targets to be maximized.

Its primary contribution lies not in operational prescription, but in reframing how bibliometric indicators are interpreted and communicated. A recurrent problem in research assessment is the tendency to translate indicators directly into decisions, often without sufficient contextualization or methodological expertise. The diagnostic analogy explicitly resists such mechanistic use. Just as laboratory results do not dictate medical treatment without clinical interpretation, bibliometric indicators should not determine evaluative outcomes without expert judgment, disciplinary knowledge, and qualitative assessment.

From this perspective, the limited operationalization of the framework is intentional. This essay argues that premature formalization—through rigid decision rules or flowcharts—risks reinforcing the very distortions that responsible metrics initiatives seek to correct. The diagnostic metaphor instead emphasizes interpretive literacy, professional responsibility, and dialogue among evaluators, researchers, and policymakers.

The framework also highlights the need for evaluator professionalization. Reading bibliometric “signals” diagnostically requires training, methodological competence, and ethical awareness. Without such expertise, indicators are prone to misuse, overinterpretation, or gaming. In this sense, the metaphor aligns with recent calls from CoARA and related initiatives to invest in evaluator training rather than in ever more complex metric systems.

Finally, the metaphor has clear communicative value. It offers a shared language that can help bridge gaps between scientometric experts and non-specialist decision-makers, facilitating more informed and transparent discussions about research performance, risk, and sustainability. Its usefulness therefore lies less in methodological novelty than in its capacity to support reflective, responsible evaluation practices.

4. Concluding Remarks

In conclusion, conceptualizing bibliometrics as the “blood tests” of research activity offers more than an appealing metaphor: it provides a communicative and interpretive perspective for understanding how diverse indicators contribute to the health and sustainability of research systems. Each metric, like a clinical parameter, offers a partial but meaningful signal—one that gains interpretive power only when contextualized by expert judgment and complemented by qualitative insights.
The metaphor is intentionally heuristic rather than prescriptive. By aligning bibliometric methods with standard categories of clinical diagnostics, it emphasizes the value of integration, professional literacy, and interpretive expertise in research evaluation. The illustrative examples included are intended to show how the metaphor can support reflective interpretation and informed discussion, rather than to prescribe evaluative outcomes or policy decisions.
This perspective also echoes recent calls from DORA, the Leiden Manifesto, the Metric Tide, and CoARA to move beyond metric minimalism and towards responsible, contextual, and transparent assessment practices. It suggests that the future of research evaluation will depend not on rejecting or embracing metrics, but on learning to read them diagnostically—as signs of systemic vitality, imbalance, or transformation.
Ultimately, the diagnostic metaphor invites a broader dialogue between the scientometric community and research policy stakeholders. It promotes a shared vocabulary for discussing the “health” of research ecosystems and highlights the ethical responsibility to nurture them with both scientific precision and interpretive care. As an essay, this contribution does not aim to close the debate on research evaluation, but to open a reflective space in which metrics can be discussed as diagnostic signals rather than as targets or proxies of value. By emphasizing interpretation over prescription, the framework invites further dialogue between the scientometric community and research policy stakeholders on how to cultivate healthier and more resilient research systems.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CoARA: Coalition for Advancing Research Assessment
DORA: Declaration on Research Assessment
CPK: Creatine Phosphokinase
LDH: Lactate Dehydrogenase
ESR: Erythrocyte Sedimentation Rate
ENRESSH: European Network for Research Evaluation in the Social Sciences and Humanities

Note

1. https://enressh.eu, accessed on 27 November 2025.

References

1. Abramo, G. (2024). The forced battle between peer-review and scientometric research assessment: Why the CoARA initiative is unsound. Research Evaluation, rvae021.
2. Biagioli, M., & Lippman, A. (Eds.). (2020). Gaming the metrics: Misconduct and manipulation in academic research. MIT Press.
3. Burtis, C. A., Bruns, D. E., & Tietz, N. W. (2015). Disorders of bone and mineral metabolism. In R. Edward (Ed.), Tietz textbook of clinical chemistry and molecular diagnostics (7th ed., p. 759). Chapter 39. Elsevier.
4. Coalition for Advancing Research Assessment (CoARA). (2022). Agreement on reforming research assessment. Available online: https://coara.eu/agreement (accessed on 27 November 2025).
5. Espeland, W. N., & Sauder, M. (2007). Rankings and reactivity: How public measures recreate social worlds. American Journal of Sociology, 113(1), 1–40.
6. Fischbach, F., Fischbach, M., & Stout, K. (2021). Fischbach’s A manual of laboratory and diagnostic tests. Lippincott Williams & Wilkins.
7. Global Research Council. (2024). Statement of principles on sustainable research. Available online: https://globalresearchcouncil.org/fileadmin//user_upload/GRC_2024_Statement_of_Principles.pdf (accessed on 27 November 2025).
8. Hazelkorn, E. (2011). Rankings and the reshaping of higher education: The battle for world-class excellence. Palgrave Macmillan.
9. Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520(7548), 429–431.
10. Marginson, S. (2006). Dynamics of national and global competition in higher education. Higher Education, 52, 1–39.
11. Moher, D., Bouter, L., Kleinert, S., Glasziou, P., Sham, M. H., Barbour, V., Coriat, A.-M., Foeger, N., & Dirnagl, U. (2020). The Hong Kong Principles for assessing researchers: Fostering research integrity. PLoS Biology, 18(7), e3000737.
12. Muller, J. (2018). The tyranny of metrics. Princeton University Press.
13. Power, M. (1997). The audit society: Rituals of verification. Oxford University Press.
14. San Francisco Declaration on Research Assessment (DORA). (2012). Available online: https://sfdora.org (accessed on 27 November 2025).
15. Strathern, M. (1997). Improving ratings: Audit in the British University system. European Review, 5(3), 305–321.
16. Torres-Salinas, D., Arroyo-Machado, W., & Robinson-García, N. (2025). Principles of evaluative bibliometrics in a DORA/CoARA context. InfluScience Ediciones.
17. Whitley, R. (2000). The intellectual and social organization of the sciences. Oxford University Press.
18. Wilsdon, J. (2016). The metric tide: Independent review of the role of metrics in research assessment and management. Sage.
19. Wilsdon, J. R., Bar-Ilan, J., Frodeman, R., Lex, E., Peters, I., & Wouters, P. (2017). Next-generation metrics: Responsible metrics and evaluation for open science. European Commission.
20. Ziman, J. (2001). Real science: What it is, and what it means. Cambridge University Press.
