The Innovative Potential of Artificial Intelligence Applied to Patient Registries to Implement Clinical Guidelines

Gangemi, Sebastiano; Allegra, Alessandro; Di Gioacchino, Mario; Gammeri, Luca; Cacciola, Irene; Canonica, Giorgio Walter

doi:10.3390/make8020038

Open Access Perspective

by Sebastiano Gangemi ¹, Alessandro Allegra ², Mario Di Gioacchino ³,⁴, Luca Gammeri ⁵, Irene Cacciola ⁶,⁷,*,† and Giorgio Walter Canonica ⁸,⁹,†

¹ Unit and School of Allergy and Clinical Immunology, Department of Clinical and Experimental Medicine, University of Messina, 98122 Messina, Italy

² Division of Hematology, Department of Human Pathology in Adulthood and Childhood “Gaetano Barresi”, University of Messina, 98125 Messina, Italy

³ Center for Advanced Studies and Technology, G. D’Annunzio University, 66100 Chieti, Italy

⁴ Institute for Clinical Immunotherapy and Advanced Biological Treatments, 65100 Pescara, Italy

⁵ Department of Biomedical and Dental Science and Morphofunctional Imaging, University of Messina, 98125 Messina, Italy

⁶ Department of Clinical and Experimental Medicine, University of Messina, 98124 Messina, Italy

⁷ Medicine and Hepatology Unit, Department of Medical Science, University Hospital of Messina, 98124 Messina, Italy

⁸ Personalized Medicine, Asthma and Allergy Unit, IRCCS Humanitas Research Hospital, 20089 Milan, Italy

⁹ Department of Biomedical Sciences, Humanitas University, 20072 Milan, Italy

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mach. Learn. Knowl. Extr. 2026, 8(2), 38; https://doi.org/10.3390/make8020038

Submission received: 26 November 2025 / Revised: 1 February 2026 / Accepted: 5 February 2026 / Published: 7 February 2026

(This article belongs to the Topic AI and Computational Methods for Modelling, Simulations and Optimizing of Advanced Systems: Innovations in Complexity, 2nd Edition)


Abstract

Guidelines provide specific recommendations based on the best available medical knowledge, summarizing and balancing the advantages and disadvantages of various diagnostic and treatment options. Currently, consensus methods are the best and most common practices for creating clinical guidelines, even though these approaches have several limitations. However, the rapid pace of biomedical innovation and the growing availability of real-world data (RWD) from clinical registries (containing data such as clinical outcomes, treatment variables, imaging, and laboratory results) call for a complementary paradigm in which recommendations are continuously stress-tested against high-quality, interoperable data and auditable artificial intelligence (AI) pipelines. AI applied to information retrieved from patient registries can optimize the process of creating guidelines, because it can analyze large volumes of data and support essential tasks such as feature identification, prediction, classification, and pattern recognition. In this work, we propose a four-phase lifecycle, comprising data curation, causal analysis and estimation, objective validation, and real-time updates, complemented by governance and machine learning operations (MLOps). A comparative analysis with consensus-only methods, a pilot protocol, and a compliance checklist are provided. We believe that AI will be a valuable support in drafting clinical guidelines, complementing expert consensus, ensuring continuous updates to standards, and providing a higher level of evidence. The integration of AI with high-quality patient registries has the potential to substantially modernize guideline development, enabling continuously updated, data-driven recommendations.

Keywords:

registry; guideline development; consensus methods; GRADE; generative artificial intelligence; explainability; dataset curation; bias; interpretability; prediction model

1. Introduction

Evidence-based medicine and personalized medicine should be the cornerstone of every patient’s treatment, the aim of all health researchers, and the foundation of effective clinical practice. Physicians must therefore adopt a methodical and individualized approach to clinical decision-making, grounding their choices in high-quality evidence and adapting recommendations to patient-specific characteristics [1]. In such a framework, evidence-based medicine provides information from which clinical guidelines can be established. Clinical guidelines provide structured, evidence-based recommendations that support the diagnosis and management of medical conditions. They provide specific recommendations based on the best available medical knowledge, summarizing it and balancing the advantages and disadvantages of various diagnostic and treatment options [2].

The consensus method, defined as general agreement among group members, whether explicit or implicit, is used to draft clinical guidelines. The process is accomplished by forming a guideline committee that includes experts covering all pertinent aspects of the medical condition in question [3].

Guidelines can be created for a wide variety of topics, but they primarily focus on disease diagnosis and therapeutic approaches. Developing guidelines requires significant resources, including clinicians and researchers with a wide range of expertise and financial support. Furthermore, the development of clinical guidelines should include a multidisciplinary approach and a systematic review of the evidence [4]. Therefore, it is essential to determine the topics of the recommendations, the composition of the working groups that develop the guidelines, the procedures by which these groups operate, and the crucial procedural issue of conflicts of interest. Next, it is necessary to identify and synthesize the evidence, select the types of results to include in the guidelines, classify and present the evidence, and integrate economic considerations. The final step is the transition from evidence to recommendations, with the subsequent review, reporting, and publication of the guidelines [5].

Today, artificial intelligence (AI) is increasingly gaining ground in scientific and medical practice and is finding wide application, from data analysis to diagnostic and decision-making support. Applying modern AI models to the analysis of medical records could be a valuable tool for developing more precise, consistently updated guidelines.

1.1. How to Generate Clinical Recommendations: The Consensus Development

Different processes have been proposed for developing guidelines, all based on a consensus method. One of them is the Delphi method, an organized approach for reaching consensus among experts [6]. It employs a series of questionnaires to gather knowledge from a panel of experts, and each round is refined based on the panel’s previous responses. This enables participants to consider the group’s opinion and modify their own evaluations in the subsequent round [7]. Typically, two rounds are used; more rounds increase panel dropout rates, although they may also improve output quality. The process is time-consuming because of the time needed to collect responses and customize questions. A 9-point Likert scale is commonly used for rating. In general, a statement is classified as reaching “no consensus” when at least one-third of respondents rate it at the opposite end of the scale from their peers [8].
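To make the rating rule concrete, the sketch below classifies a single statement rated by a panel on the 9-point scale. It assumes a RAND/UCLA-style convention that the text does not fully specify: ratings 1 to 3 form the low end and 7 to 9 the high end, “no consensus” is declared when at least one-third of the panel rates at each end, and otherwise the statement is classified by the range containing its median. The function name `classify_statement` and the output labels are hypothetical.

```python
from statistics import median


def classify_statement(ratings: list[int]) -> str:
    """Classify one Delphi statement rated on a 9-point Likert scale.

    Assumed RAND/UCLA-style convention: ratings 1-3 form the low end and
    7-9 the high end of the scale.
    """
    if not ratings or any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must be integers from 1 to 9")
    n = len(ratings)
    low = sum(1 for r in ratings if r <= 3)
    high = sum(1 for r in ratings if r >= 7)
    # No consensus: at least one-third of the panel sits at each end of the
    # scale (integer arithmetic avoids rounding issues with n / 3).
    if 3 * low >= n and 3 * high >= n:
        return "no consensus"
    med = median(ratings)
    if med >= 7:
        return "consensus: high"
    if med <= 3:
        return "consensus: low"
    return "uncertain"
```

For example, a nine-member panel rating a statement [1, 2, 1, 8, 9, 8, 5, 5, 5] has three members at each end, so the statement is returned as “no consensus” and would be carried into the next round.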

The Nominal Group Technique (NGT), a highly controlled in-person group interaction, offers an alternative approach that is particularly well suited to addressing complex problems [9]. Its four main phases are silent generation, round robin, clarification, and voting [10].
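The voting phase is often run as a ranked vote. The sketch below assumes one common scoring convention, which is not specified in the cited descriptions: each participant ranks up to k ideas, the first choice earns k points, the second k − 1, and so on, and points are summed across participants. The function `tally_ngt_votes` is hypothetical.

```python
from collections import Counter


def tally_ngt_votes(ballots: list[list[str]], k: int = 5) -> list[tuple[str, int]]:
    """Aggregate Nominal Group Technique ranked votes.

    Each ballot lists one participant's top ideas, best first. The first
    choice earns k points, the second k - 1, and so on.
    """
    scores: Counter = Counter()
    for ballot in ballots:
        if len(ballot) > k:
            raise ValueError("a ballot may rank at most k ideas")
        if len(set(ballot)) != len(ballot):
            raise ValueError("an idea may appear only once per ballot")
        for position, idea in enumerate(ballot):
            scores[idea] += k - position
    # Sort by total score (highest first), breaking ties alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With three participants ranking their top three ideas as (A, B, C), (B, A) and (C, B, A), idea B ranks first with 7 points, ahead of A with 6 and C with 4.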

The NGT has been widely used in various healthcare settings to achieve a range of objectives. Some notable applications include establishing and generating criteria for pharmacy practice [9], influencing practice change [11], and educating the profession on specific themes [12].

Consensus methods, such as the Nominal Group Technique (NGT) and the Delphi Technique, often resemble focus groups, which are widely used in pharmacy practice research. All these methods foster communication among participants, yet they diverge in their processes and outcomes. NGT’s structured approach and ranking system are well-suited for decision-making. The Delphi Technique is ideal for achieving expert consensus on complex issues. Focus groups provide in-depth insights into participants’ experiences and perceptions.

Additionally, the RAND/UCLA appropriateness method enhances the consensus-making process by combining elements of the Delphi and Nominal Group Techniques. Its iterative rating process improves the relevance and consistency of ratings, allowing experts to refine their evaluations based on peer insights. The method is applied both in literature reviews and in recommendation writing [13].

Lastly, consensus methodologies must use straightforward grading schemes to evaluate both the strength of the recommendations and the quality of the evidence. GRADE is based on systematic reviews and meta-analyses, and the subsequent grading is performed by experts free of conflicts of interest. Its widespread use reflects its success as a rigorous grading method. To maintain simplicity and clarity, GRADE rates the quality of evidence on four levels (high, moderate, low, and very low) and provides two recommendation grades, “strong” and “weak,” so that recommendations are understandable and easily applicable by practitioners and policymakers [14].
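Because later phases map registry signals onto GRADE certainty ratings, it helps to see the rating logic as a simple scoring walk. The sketch below is a simplification under stated assumptions: randomized evidence starts at “high” and observational evidence at “low”, each of the five downgrading domains removes one level for a serious concern or two for a very serious one, and upgrading factors add levels. Real GRADE judgments are qualitative and made by the panel; `grade_certainty` and the domain keys are hypothetical.

```python
from typing import Optional

LEVELS = ["very low", "low", "moderate", "high"]
DOWNGRADE_DOMAINS = {
    "risk_of_bias", "inconsistency", "indirectness", "imprecision", "publication_bias",
}


def grade_certainty(randomized: bool,
                    downgrades: Optional[dict[str, int]] = None,
                    upgrades: int = 0) -> str:
    """Illustrative GRADE-style certainty rating for one body of evidence."""
    # Starting point: randomized trials start at "high", observational at "low".
    score = 3 if randomized else 1
    for domain, steps in (downgrades or {}).items():
        if domain not in DOWNGRADE_DOMAINS:
            raise ValueError(f"unknown GRADE domain: {domain}")
        if steps not in (0, 1, 2):  # 1 = serious concern, 2 = very serious
            raise ValueError("each domain lowers certainty by 0, 1 or 2 levels")
        score -= steps
    if upgrades < 0:
        raise ValueError("upgrades cannot be negative")
    # Upgrading factors (large effect, dose-response gradient, plausible
    # confounding that would reduce the effect) chiefly concern observational data.
    score += upgrades
    # Clamp to the four GRADE levels.
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]
```

For instance, randomized evidence with a serious imprecision concern comes out as “moderate”, while observational registry evidence with a large effect is raised from “low” to “moderate”.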

1.2. Limits of Consensus Methods in the Implementation of Clinical Guidelines

Although consensus development conferences are the usual method for designing clinical guidelines, their drawbacks cannot be overlooked. Inconsistent methodology, poor procedural conceptualization, and variable decision-making processes pose significant challenges to achieving effective consensus [15]. Furthermore, the process can be time-consuming, often requiring extensive negotiation and compromise, and it requires significant resources, including personnel, materials, and money. Lastly, one notable issue is the potential influence of dominant voices within the conference.

While the Delphi method holds significant potential for creating high-quality reporting guidelines through expert consensus, its current use remains limited: in one systematic review, only 25% of reporting guidelines had used Delphi to reach consensus. Furthermore, most reporting guidelines developed through the Delphi technique exhibit modest reporting quality [16].

Thus, many hurdles remain: creating standards is challenging and requires skilled doctors, health services, researchers, group process leaders, and adequate funding. Other noteworthy areas of concern identified in a risk-of-bias assessment were the limited comprehensiveness of analytical procedures, the absence of thorough explanations of ethical issues, conflicts of interest among panelists, and the lack of information about the relationship between participants and the consensus group [17].

Furthermore, studies using the nominal group technique often omitted explanations of the systematic literature review and of the investigators’ role in the consensus process [18]. In light of all this, several studies have examined how the Delphi method is reported, critically evaluated its application and methodology, and highlighted the most critical points for improvement: expert selection, anonymity, round fatigue, and subjectivity in analysis [18,19,20,21,22].

Lastly, the development of clinical guidelines benefits from groups composed of both non-clinical and clinical stakeholders. These varied groups ultimately contribute to balanced, comprehensive, and patient-centered recommendations. However, this multiplicity of actors may lead to different interpretations of the same data, as patients and doctors may process, interpret, and react differently to the various types of uncertainty inherent in clinical decisions [23,24].

2. Towards a Different Conception of the Formulation of Guidelines: The Introduction of Clinical Registries and Artificial Intelligence

2.1. The Evolution of Guideline Automation: From CIGs to AI

The concept of automating clinical guidelines and defining structured clinical pathways is not new [25,26]. Over the past three decades, much research has focused on the transition from paper-based narrative text to Computer-Interpretable Guidelines (CIGs) [27,28]. Different works have led to the development of various formalisms designed to make clinical knowledge executable, such as the Arden Syntax for medical logic modules [29], the Guideline Interchange Format (GLIF) [30,31], PROforma [32], and the Guideline Elements Model (GEM) [33]. These efforts aimed to integrate guideline recommendations directly into Clinical Decision Support Systems to improve adherence and reduce errors [34].

However, these “first-generation” automated guidelines had limitations. They were primarily rule-based and relied on experts explicitly encoding knowledge into rigid “if-then” logic. While effective for standardizing care, they often struggled with the complexity of real-world patients whose comorbidities defy simple algorithmic rules. Furthermore, maintaining these executable models is labor-intensive: when clinical evidence changes, the code must be updated manually, often leading to a lag between evidence generation and bedside implementation [27,35].
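To illustrate why first-generation encodings are brittle, the sketch below writes a single recommendation as a fixed if-then rule in the spirit of a medical logic module. It is plain Python rather than Arden Syntax, and the thresholds, field names and output text are hypothetical placeholders; the point is that any change in the evidence requires someone to edit and redeploy this code by hand.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds hard-coded by an expert panel at authoring time.
EOSINOPHIL_THRESHOLD = 300   # cells/uL
EXACERBATION_THRESHOLD = 2   # exacerbations in the previous year


@dataclass
class Patient:
    eosinophils_per_ul: float
    exacerbations_last_year: int
    comorbidities: list[str] = field(default_factory=list)  # never read by the rule


def static_rule(patient: Patient) -> str:
    """A fixed 'if-then' rule of the kind first-generation CIGs encode."""
    if (patient.eosinophils_per_ul >= EOSINOPHIL_THRESHOLD
            and patient.exacerbations_last_year >= EXACERBATION_THRESHOLD):
        return "rule fires: consider therapy A"
    return "rule does not fire"
```

A patient with 350 cells/µL, two exacerbations and nasal polyps receives exactly the same output as one without comorbidities, because the rule never consults that field; a patient at 299 cells/µL is excluded regardless of how many exacerbations they had.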

The framework we propose represents a paradigm shift from ‘rule-based’ automation to ‘data-driven’ generation. Unlike traditional CIGs that execute static rules defined by experts, our proposed AI-registry framework uses machine learning to continuously derive and validate patterns from real-world data (RWD). This addresses the critical gap identified in prior literature: the need for systems that not only execute guidelines but also learn from their application in daily practice.

2.2. The Unexploited Wealth of Clinical Registries

Currently, a large amount of highly standardized, structured, and uniform data is accumulated in clinical registries, Electronic Medical Records, and Electronic Health Records, but it is not used in the formulation of clinical guidelines. Enormous amounts of biological data of increasingly high quality are being produced and quickly made available to researchers by clinical registries. These longitudinal, structured datasets could significantly enrich the evidence base for clinical decision-making, and leaving them unused represents a missed opportunity.

Patient registries gather information over time on a population characterized by specific illnesses, conditions, or exposures [36]. These registries are crucial for generating observational evidence, estimating future risk, tracking the course of an illness, measuring treatment outcomes, and assessing how a disease impacts individuals and healthcare systems.

Since the term “disease registry” lacks a universally accepted definition, several types of registries can be distinguished, which may account for the wide range of uses for which they are employed [37]. Primary care registers (PCRs) are crucial for public health surveillance because they contain the personal patient data that general practitioners typically collect during their daily work, including demographic information, prescriptions, and disease diagnoses. Because of the abundance of their data, PCRs are useful for passive sentinel surveillance [38].

Population-Based Patient Registries (PBPRs) are a distinct type of registry that aims to record all cases of a specific condition within a defined community, thereby eliminating sources of selection bias in epidemiological studies. PBPRs therefore play a crucial role both in assessing the efficacy of healthcare measures and in planning and evaluating disease control programs. In a PBPR, data gathering requires access to various sources, including clinical records, pathology reports, hospital discharge records, and death certificates, and meticulous attention is needed to ensure data quality. Undiagnosed cases, unclear diagnoses, underreporting, and incorrect coding are among the factors that can impair data quality [39].

Clinical registries, which focus on clinical treatment and hospital administration, differ from PBPRs: they are typically designed for quality-of-care monitoring and may therefore contain fewer population-level variables, although they often provide richer clinical granularity for the conditions they track.

This difference helps to explain the tension between epidemiological expectations of comparability and completeness and clinical demands for predictive precision, as PBPRs and clinical registries serve fundamentally different objectives [40].

The use of clinical registries brings significant benefits, improving care and disease management. For example, the use of clinical registries in diabetes has been shown to improve physician-patient care processes and clinical outcomes, especially in resource-limited clinics [41]. Another example concerns the use of clinical registries in ophthalmology, which has been shown to improve the quality of care and outcomes in cataract surgery, corneal transplantation, and macular degeneration [42].

Promoting the use of PBPRs for secondary data analysis (e.g., academic literature across databases, industry reports, and government publications) would strengthen the symbiotic relationship between data quality and usefulness. Using population-based registry data has been shown to improve the validity of results while saving time and money compared with conducting dedicated epidemiological studies. Registries are a valuable tool for gathering real-life evidence, essentially serving as observational studies that track the entire patient population [43]. Furthermore, linking registry data to other types of data for the same populations, such as socioeconomic, environmental, or dietary/lifestyle data, can encourage more focused studies to verify the validity of any associations identified.
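Linking registry records to contextual data often reduces to a keyed join. The sketch below assumes both sources share a geographic key (a hypothetical `area_code`) and attaches an area-level exposure, here an illustrative annual mean PM2.5 concentration, to each patient. Unmatched patients are kept and flagged rather than silently dropped, so the linkage rate can be reported. All field names and values are made up for illustration.

```python
from typing import Optional


def link_exposure(patients: list[dict], exposure_by_area: dict[str, float]) -> list[dict]:
    """Left-join patient records to an area-level exposure table.

    Unmatched patients are kept and flagged, so that linkage failures can be
    reported instead of silently shrinking the cohort.
    """
    linked = []
    for record in patients:
        value: Optional[float] = exposure_by_area.get(record["area_code"])
        linked.append({**record, "pm25_annual_mean": value, "linked": value is not None})
    return linked


def linkage_rate(linked: list[dict]) -> float:
    """Share of patient records that found a matching exposure value."""
    return sum(r["linked"] for r in linked) / len(linked) if linked else 0.0
```

A left join is used deliberately: an inner join would drop patients whose area has no exposure data, and if those areas differ systematically (for example, rural areas without monitoring stations) the linked cohort would no longer represent the registry population.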

Existing tools for exploiting registry data fall broadly into two categories: (1) data-processing tools enabling harmonization, cleaning, and transformation, and (2) data-exploration tools that support visualization and hypothesis generation. However, the current ecosystem remains fragmented across data types, clinical domains, and analytical tasks.

2.3. Current AI Applications and Existing Gaps

The EU AI Act describes AI as “a fast evolving family of technologies that contributes to a wide array of economic, environmental and societal benefits across the entire spectrum of industries and social activities” [44]. In this context, AI has gained prominence in data analysis and in the development of prediction models that use extensive biological data, including proteomics and genomics, enabling a wide range of new applications [45,46].

The potential of AI has advanced significantly, especially with the rise of Deep Learning (DL), thanks to advancements in computation and storage capacity [47]. Notably, deep convolutional neural network (CNN) models such as AlexNet demonstrated exceptional image classification performance, rekindling interest in the field and leading to increasingly advanced DL architectures. More recently, Transformer models, the architecture underlying large language models (LLMs) such as BERT and GPT-4, have further revolutionized natural language processing and comprehension by leveraging massive-scale datasets and cloud computing. These computational advances are, in turn, enabling rapid progress in various fields, including medicine.

However, the use of AI to analyze large amounts of data for diagnosing or treating diseases, or for managing treatments [48,49], must be distinguished from the analysis of registries to create clinical guidelines.

The literature offers few examples demonstrating the feasibility of using AI to analyze clinical records and generate clinical guidelines. To date, AI and computer technology have been applied to clinical guidelines only for the classification, extraction, representation, verification, and integration of guideline knowledge [50].

However, some research groups are working to develop AI-supported clinical registries that could inform future clinical guidelines.

In 2017, the Japan Ocular Imaging (JOI) registry, a nationwide database of ophthalmic images and clinical information, was established by the Japanese Ophthalmological Society (JOS). The data kept in each institution’s electronic medical records was automatically transferred by the JOI registry to cloud storage controlled by JOS. The information gathered is intended for research and surveys aimed at raising the standard of eye care. Currently, 22 academic hospitals, one private hospital, and two health checkup facilities are collaborating on the project [51].

OncoDoc uses AI to support the creation and use of guidelines [52]. With this method, clinicians could leverage OncoDoc’s hypertextual reading of the knowledge base, encoded as a decision tree, to control the implementation of guideline knowledge. This approach allows the physician to analyze the information provided in the context of their patients and categorize it into the closest formal equivalent. With notably high compliance scores, OncoDoc’s real-world experiment demonstrated that users were effectively using the system. In a second medical facility, the authors successfully tested the knowledge base and the implemented method, providing a clear illustration of how encoded Clinical Practice Guidelines (CPG) knowledge may be shared and reused between institutions [52].
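A guideline encoded as a decision tree, which the clinician browses instead of having the system decide automatically, can be sketched with a small recursive data structure. The node questions and leaf recommendations below are hypothetical placeholders and do not reproduce OncoDoc’s knowledge base; the function `navigate` simply follows the answers the clinician supplies and returns the visited path, so the reasoning remains visible.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    text: str  # a question at inner nodes, a recommendation at leaves
    children: dict[str, "Node"] = field(default_factory=dict)  # answer -> subtree

    def is_leaf(self) -> bool:
        return not self.children


def navigate(root: Node, answers: dict[str, str]) -> tuple[list[str], Optional[str]]:
    """Follow the clinician's answers through the tree.

    Returns the visited path and the recommendation reached, or None when an
    answer is missing or unexpected, leaving the decision to the clinician.
    """
    path: list[str] = []
    node = root
    while not node.is_leaf():
        path.append(node.text)
        answer = answers.get(node.text)
        if answer not in node.children:
            return path, None
        node = node.children[answer]
    path.append(node.text)
    return path, node.text


# Hypothetical two-level knowledge base with placeholder content.
EXAMPLE_TREE = Node("Criterion X present?", {
    "yes": Node("Criterion Y present?", {
        "yes": Node("Recommendation A"),
        "no": Node("Recommendation B"),
    }),
    "no": Node("Recommendation C"),
})
```

Returning the full path, and stopping rather than guessing when an answer is missing, mirrors the idea that the clinician controls how the encoded guideline is applied to the patient in front of them.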

While expert-led processes remain essential for interpreting evidence and contextualizing values and preferences, they can be slow to update and difficult to replicate across different contexts. With interoperable clinical registries, electronic health records, and patient-generated data now providing large-scale longitudinal RWD, we can hypothesize that guideline development could evolve into a data-driven mode, in which recommendations are iteratively challenged by registry-derived signals generated through transparent, verifiable, and causal AI pipelines. In Scheme 1, we conceptualize the key steps through which AI could be used to collect and analyze registry data and implement it in clinical guidelines.

Figure 1 schematizes a potential AI-enhanced registry framework and living guideline lifecycle.

Furthermore, with this work we propose a practical framework based on established pillars: FAIR data management [53], standardized data models and exchange (OMOP CDM, HL7 FHIR) [54,55], causal inference via target trial emulation [2,22,24], reporting standards for AI studies (CONSORT-AI, SPIRIT-AI) [56], governance principles for AI in guidelines (GIN) [57], and compliance with the EU Artificial Intelligence Act [44]. We also introduce operational elements (federated analyses, bias auditing, model cards, dataset cards, and MLOps for clinical AI) to make the approach implementable within existing guideline workflows.

3. Proposed Framework for AI-Enhanced Registries

3.1. Phase I—FAIR Data Curation and Interoperability

The first step in creating AI-enhanced registries that can be used for guideline development is data retrieval and analysis. Data must be shareable across different research groups and must therefore adhere to the FAIR Guiding Principles for Scientific Data Management and Stewardship. The FAIR principles (Findable, Accessible, Interoperable, and Reusable) define good data management practices [53].

Data producers and registry stewards should implement FAIR-by-design practices to ensure findability, accessibility, interoperability, and reusability [53,58,59]. We recommend mapping core datasets to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), where feasible, to enable standardized analytics and federated queries across institutions [54,55]. Adopting the OMOP CDM ecosystem can improve data collection, as Wang and colleagues recently demonstrated in oncology [54]; the authors also emphasize that continuous model development and iterative improvement are essential. The Fast Healthcare Interoperability Resources (FHIR) standard was introduced in 2011 by Health Level Seven International (HL7) and builds on previous HL7 standards; HL7 FHIR is one of the most widely used standards for healthcare data exchange [60]. HL7 FHIR resources and profiles facilitate granular, real-time interoperability across vendor systems. Each dataset and transformation should be documented via dataset cards and versioned extract-transform-load (ETL) procedures with quality checks (completeness, plausibility, conformance).
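The three families of quality checks named above can be sketched as simple rule functions run after each ETL step. The required fields, allowed codes and plausibility ranges below are hypothetical examples chosen for illustration; they are not values from any registry or from the OMOP CDM specification.

```python
from dataclasses import dataclass, field

# Hypothetical registry rules (illustrative values only).
REQUIRED_FIELDS = ["patient_id", "birth_year", "sex", "fev1_percent_pred"]
ALLOWED_SEX_CODES = {"F", "M", "U"}
PLAUSIBLE_RANGES = {"birth_year": (1900, 2025), "fev1_percent_pred": (10.0, 150.0)}


@dataclass
class QualityReport:
    completeness: float  # share of required cells that are filled
    plausibility_failures: list[tuple[int, str]] = field(default_factory=list)
    conformance_failures: list[tuple[int, str]] = field(default_factory=list)


def check_quality(rows: list[dict]) -> QualityReport:
    """Run completeness, conformance and plausibility checks on one extract."""
    n_cells = len(rows) * len(REQUIRED_FIELDS)
    filled = sum(1 for row in rows for name in REQUIRED_FIELDS if row.get(name) is not None)
    report = QualityReport(completeness=filled / n_cells if n_cells else 0.0)
    for i, row in enumerate(rows):
        # Conformance: coded values must come from the agreed value set.
        sex = row.get("sex")
        if sex is not None and sex not in ALLOWED_SEX_CODES:
            report.conformance_failures.append((i, "sex"))
        # Plausibility: numeric values must fall in credible ranges.
        for name, (low, high) in PLAUSIBLE_RANGES.items():
            value = row.get(name)
            if value is not None and not low <= value <= high:
                report.plausibility_failures.append((i, name))
    return report
```

Returning row indices alongside field names, instead of a single pass/fail flag, lets the report be stored with the versioned ETL run and makes it possible to trace each failure back to its source record.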

3.2. Phase II—Analysis and Causal Estimation

To mitigate confounding inherent to observational registries, analyses should emulate a target trial for each PICO question (eligibility, treatment strategies, assignment procedures, follow-up, outcomes, causal contrasts, analysis plan) [61]. State-of-the-art estimators (e.g., inverse probability weighting, g-computation, doubly robust methods, and heterogeneous treatment effect models such as causal forests) can be combined with predictive models to identify effect modifiers. All models require rigorous internal validation (cross-validation with temporal blocking) and external validation (independent registries) with calibration assessment alongside discrimination metrics.
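To show how the estimators named above relate, the sketch below computes inverse probability weighting (IPW), g-computation and the doubly robust augmented IPW (AIPW) estimator of the average treatment effect for a binary treatment A, a binary outcome Y and a single discrete confounder L. It assumes that L captures all confounding and that both treatments occur in every stratum (positivity). Because the nuisance models here are simple stratum frequencies and means, the three estimators coincide, which makes the example easy to check; real registry analyses would use flexible models for many confounders. All function names are hypothetical.

```python
from collections import defaultdict

Record = tuple[int, int, int]  # (confounder stratum L, treatment A, outcome Y)


def fit_nuisance(data: list[Record]):
    """Stratum-specific propensity P(A=1 | L) and outcome means E[Y | A, L]."""
    counts = defaultdict(lambda: [0, 0])   # L -> [patients, treated]
    sums = defaultdict(lambda: [0.0, 0])   # (L, A) -> [sum of Y, patients]
    for stratum, a, y in data:
        counts[stratum][0] += 1
        counts[stratum][1] += a
        sums[(stratum, a)][0] += y
        sums[(stratum, a)][1] += 1
    propensity = {}
    for stratum, (n_total, n_treated) in counts.items():
        if n_treated in (0, n_total):
            raise ValueError(f"positivity violated in stratum {stratum!r}")
        propensity[stratum] = n_treated / n_total
    outcome_mean = {key: total / n for key, (total, n) in sums.items()}
    return propensity, outcome_mean


def ate_estimates(data: list[Record]) -> dict[str, float]:
    """Average treatment effect by IPW, g-computation and doubly robust AIPW."""
    e, mu = fit_nuisance(data)
    n = len(data)
    ipw = sum(a * y / e[s] - (1 - a) * y / (1 - e[s]) for s, a, y in data) / n
    g_comp = sum(mu[(s, 1)] - mu[(s, 0)] for s, _, _ in data) / n
    aipw = sum(
        mu[(s, 1)] - mu[(s, 0)]
        + a * (y - mu[(s, 1)]) / e[s]
        - (1 - a) * (y - mu[(s, 0)]) / (1 - e[s])
        for s, a, y in data
    ) / n
    return {"ipw": ipw, "g_computation": g_comp, "aipw": aipw}


def naive_difference(data: list[Record]) -> float:
    """Unadjusted difference in mean outcome, for comparison."""
    treated = [y for _, a, y in data if a == 1]
    control = [y for _, a, y in data if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)
```

On the toy records `[(0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0)]`, where treatment is more common in stratum 1, all three adjusted estimators return 2/3 while the unadjusted difference is 0.5. The AIPW form is the one to prefer in practice: it stays consistent if either the propensity model or the outcome model is correctly specified.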

3.3. Phase III—Objective Validation and Reporting

An additional challenge is the generalizability of AI models across institutions, given variations in data collection practices and patient demographics. External validation on independent registries is therefore essential to ensure robustness and transferability. Outputs that inform practice changes should meet reporting and evaluation standards, such as CONSORT-AI and SPIRIT-AI for interventional AI trials, and analogous structured reporting for observational causal analyses [56,62,63]. CONSORT-AI is an extension of the CONSORT 2010 (Consolidated Standards of Reporting Trials) statement and provides guidelines for reporting clinical trials that evaluate interventions with an AI component. It was developed through a staged consensus process and adds 14 new items specific to AI interventions. Specifically, CONSORT-AI recommends providing detailed descriptions of the AI intervention, including instructions for use, context, input and output management, human–AI interaction, and error analysis [56]. SPIRIT-AI extends the SPIRIT 2013 statement, the international standard for reporting randomized clinical trial protocols, with 15 new items for protocols of trials evaluating AI interventions. These items include requirements for reporting the quality and completeness of input data and the investigation of error cases, as well as for defining the clinical context and the human–AI interaction involved [62].
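During external validation, discrimination and calibration are usually reported side by side. The sketch below computes the AUC through its rank (Mann-Whitney) interpretation, the probability that a randomly chosen patient with the outcome receives a higher predicted risk than one without it, and calibration-in-the-large as the ratio of observed to expected events. Choosing these two summaries is our illustrative assumption; calibration slopes and calibration curves would normally be reported as well.

```python
def auc(y_true: list[int], y_score: list[float]) -> float:
    """Area under the ROC curve via its Mann-Whitney interpretation."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    # Each (event, non-event) pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def observed_expected_ratio(y_true: list[int], y_prob: list[float]) -> float:
    """Calibration-in-the-large: observed events divided by predicted events."""
    expected = sum(y_prob)
    if expected <= 0:
        raise ValueError("predicted risks must sum to a positive number")
    return sum(y_true) / expected
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; rank-based implementations scale better. A model can have a high AUC in an external registry and still be poorly calibrated there (for example, an observed/expected ratio well above 1 when baseline risk is higher than in the development data), which is why both are needed.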

Results must be accompanied by confidence intervals or posterior intervals, sensitivity analyses (including negative controls where appropriate), and ablation of key assumptions. Transparency artifacts (model cards) document intended use, data provenance, performance across subgroups, and known limitations.
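A model card can be kept as a small, versionable record stored alongside the model. The fields below mirror the items listed in the text (intended use, data provenance, subgroup performance, known limitations); the class name, field names and any example values are hypothetical, and published model-card templates contain further sections.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCard:
    model_name: str
    version: str
    intended_use: str
    data_provenance: list[str]
    # subgroup name -> metric name -> value, e.g. {"age>=65": {"auc": 0.69}}
    subgroup_performance: dict[str, dict[str, float]] = field(default_factory=dict)
    known_limitations: list[str] = field(default_factory=list)

    def worst_subgroup(self, metric: str) -> str:
        """Name the subgroup with the lowest value of the given metric."""
        scored = {g: m[metric] for g, m in self.subgroup_performance.items() if metric in m}
        if not scored:
            raise KeyError(metric)
        return min(scored, key=scored.get)

    def to_json(self) -> str:
        """Serialize deterministically so the card can be versioned with the model."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Serializing with sorted keys keeps the JSON stable across runs, so a change in the card shows up as a meaningful diff in version control; `worst_subgroup` gives reviewers a quick pointer to where performance is weakest.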

3.4. Phase IV—Living Recommendations and Feedback

We advocate a living guideline process where registry-derived evidence triggers scheduled or event-driven reviews by the panel. Human-in-the-loop oversight remains essential: expert panels arbitrate between causal estimates, contextual values, and resource considerations, mapping signals into GRADE certainty ratings and Evidence-to-Decision (EtD) judgments. Recommendations and rationales are versioned, with data and code archives to support reproducibility.
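The combination of scheduled and event-driven reviews can be sketched as a small trigger function. The rules are our assumptions: a review becomes due either when a fixed interval (here 365 days) has elapsed since the last review, or when an updated registry estimate’s 95% confidence interval no longer contains the effect estimate that supported the current recommendation. All names are hypothetical, and the function only flags a signal; the decision stays with the panel.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Recommendation:
    version: str
    last_review: date
    supporting_effect: float  # effect estimate behind the current wording


@dataclass(frozen=True)
class RegistryUpdate:
    as_of: date
    estimate: float
    ci_low: float
    ci_high: float


def review_reasons(rec: Recommendation, update: RegistryUpdate,
                   interval: timedelta = timedelta(days=365)) -> list[str]:
    """Return the reasons (if any) why the panel should review `rec` now."""
    reasons = []
    # Scheduled trigger: the review interval has elapsed.
    if update.as_of - rec.last_review >= interval:
        reasons.append("scheduled: review interval elapsed")
    # Event-driven trigger: new evidence is incompatible with the old estimate.
    if not update.ci_low <= rec.supporting_effect <= update.ci_high:
        reasons.append("event: new interval excludes supporting estimate")
    return reasons
```

Returning a list of reasons, not a boolean, keeps the trigger auditable: each versioned recommendation can record why its review was opened.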

Table 1 shows the differences between the consensus-only method and the approach that combines AI and registries.

3.5. Case Scenario: Biologic Therapy Selection in Severe Asthma

To demonstrate the practical application of this framework, we consider its use in the management of severe asthma. In this section, we describe the “traditional” consensus approach. Subsequently, we describe the AI-registry approach to highlight the main differences between the two methods.

3.5.1. The Traditional Consensus Approach

Current guidelines recommend selecting biologic therapies, such as anti-IgE or anti-IL5, based on specific phenotypic biomarkers, such as blood eosinophil counts or IgE levels. These recommendations are derived from randomized clinical trials and updated periodically by expert panels. However, in real-world practice, many patients meeting these criteria are “non-responders” or “partial responders” due to complex traits (e.g., nasal polyps, BMI, age of onset) that strict consensus thresholds often overlook.

3.5.2. The AI-Registry Approach

An AI model trained on a longitudinal severe asthma registry (containing thousands of real-world patient profiles) does not rely on single cut-off values; instead, it performs continuous ‘phenomapping’.

The AI analyzes multimodal data (including spirometry, biomarkers, comorbidities, and environmental pollution data linked to the patient’s location).

Using causal inference methods, the AI could identify, for instance, that a specific sub-cluster of patients (such as late-onset asthma with high BMI and low FeNO) has a 40% higher response rate to Drug A than to Drug B, a nuance missed by the broad consensus guideline.

Finally, the system flags this signal to the guideline panel. The panel then validates this ‘data-driven phenotype’ and issues a specific, living recommendation for this subgroup, moving from a ‘one-size-fits-all’ guideline to precision recommendations.

3.6. Pilot Protocol (Conceptual Design)

Below, we outline the conceptual design of a study to test the concordance and clinical utility of AI-enhanced registry-derived signals by comparing them with the expert panel’s recommendations. The design uses a series of 30–50 PICO questions addressing a chronic immunoinflammatory condition.

For this conceptual design, we use a multi-registry observational study with target-trial emulation for each PICO, federated analysis across OMOP-mapped databases, and external validation in an independent registry.
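In a federated design, each site runs the same analysis locally and shares only aggregate results. A minimal way to combine them, sketched below, is fixed-effect inverse-variance pooling of site-level estimates and standard errors. It assumes the estimates are on an additive scale (for example, log rate ratios) and that between-site heterogeneity is negligible; otherwise a random-effects model would be preferable. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def pool_fixed_effect(estimates: list[float],
                      std_errors: list[float]) -> tuple[float, float, tuple[float, float]]:
    """Fixed-effect inverse-variance pooling of site-level estimates.

    Only aggregate results (estimate, standard error) leave each site.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty estimates and standard errors")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)
```

Sites with more precise estimates receive proportionally more weight: two sites reporting 0.0 (standard error 1.0) and 1.0 (standard error 0.5) pool to 0.8, because the second site carries four times the weight of the first.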

The population consists of adults meeting registry-defined criteria with adequate follow-up; inclusion/exclusion criteria mirror the emulated trial protocol.

Interventions and comparators are the standard-of-care strategies identified in the registry, each subjected to documented algorithmic assessment.

The primary outcome is agreement with the panel’s recommendations. Secondary outcomes include AUC/PR-AUC, calibration, decision curve analysis, time to recommendation update, and subgroup performance.
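The primary outcome and one of the secondary outcomes reduce to short computations. The sketch below computes raw percent agreement and Cohen’s kappa between the AI-derived and panel recommendations (kappa corrects for the agreement expected by chance), and the net benefit used in decision curve analysis at a threshold probability pt, NB = TP/n - (FP/n) × pt/(1 - pt). Using kappa as the chance-corrected agreement statistic is our assumption; the protocol above does not name one.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of PICO questions on which both sources give the same answer."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty, equally long lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_chance = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # undefined when both use one identical category
    return (p_observed - p_chance) / (1 - p_chance)


def net_benefit(y_true: list[int], y_prob: list[float], threshold: float) -> float:
    """Net benefit at threshold probability pt (decision curve analysis)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)
```

As an illustration, if the AI and the panel agree on 4 of 6 questions and each recommends option A for exactly half of them, raw agreement is 0.67 but kappa is only 0.33, because half of the agreement would be expected by chance alone. A decision curve is obtained by evaluating `net_benefit` over a range of thresholds.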

For the analysis in our pilot protocol, we use statistical methods such as doubly robust estimators and heterogeneous treatment-effect modeling, combined with predefined sensitivity analyses (negative control outcomes, alternative propensity specifications).

We would use CONSORT-AI/SPIRIT-AI-aligned reporting for any interventional component and structured observational reporting for emulations [56,62,63].

Governance aligns with the GIN Principles—transparency, preplanning, additionality, credibility, ethics, accountability, compliance, and evaluation [57,64]. For EU deployments, the EU AI Act (Regulation 2024/1689) establishes obligations for high-risk AI, including risk management, data governance, technical documentation, logging, human oversight, robustness, and cybersecurity [44]. Operationally, MLOps should include dataset/model versioning, performance and data drift monitoring, periodic retraining criteria, incident response, and independent audits. Equity is enforced through subgroup performance reporting (age, sex, geography, comorbidity) and bias-mitigation procedures [57]. Scheme 2 shows the compliance checklist.
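For the data drift monitoring mentioned above, one widely used summary is the population stability index (PSI), which compares the distribution of a variable in a baseline window with its distribution in a recent window over the same bins: PSI = Σ (actual − expected) × ln(actual / expected). The sketch below assumes the binned proportions have already been computed and applies a small floor to avoid log(0). The commonly quoted reading that values above roughly 0.25 indicate a substantial shift is a rule of thumb, not a regulatory threshold.

```python
import math


def population_stability_index(expected: list[float], actual: list[float],
                               eps: float = 1e-6) -> float:
    """PSI between a baseline (expected) and a recent (actual) binned distribution."""
    if not expected or len(expected) != len(actual):
        raise ValueError("need two non-empty distributions over the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        # Floor empty bins so the logarithm stays finite.
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total
```

For example, if a biomarker that was spread evenly over four bins at baseline shifts to proportions of 0.1, 0.2, 0.3 and 0.4, the PSI is about 0.23, a moderate shift that might prompt a closer look at model performance in that registry before any retraining decision.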

We now provide a practical clinical example of the proposed framework by outlining a pilot validation study in severe asthma, a heterogeneous condition for which selecting the optimal biologic therapy remains a complex challenge for standard guidelines:

- Objective: To compare the concordance and prognostic accuracy of AI-generated treatment recommendations derived from registry data against standard expert-based guidelines in patients eligible for multiple biologic therapies.

- Study Design: We will use a federated network of severe asthma registries mapped to the OMOP CDM. The study will focus on patients who meet eligibility criteria for more than one class of biologics (e.g., patients with both high eosinophils and high IgE).

  ▪ PICO Question Formulation: Instead of generic queries, we define specific causal questions, such as: “In adult patients with late-onset asthma, nasal polyps, and blood eosinophils >300 cells/µL, does initiation of anti-IL5R therapy result in a greater reduction in annualized exacerbation rates compared to anti-IgE therapy?”

  ▪ AI Analysis (Target Trial Emulation): For each PICO question, the AI pipeline will emulate a target trial:
    ▪ Population: Adults meeting registry inclusion criteria with >12 months of follow-up.
    ▪ Intervention and Comparator: Specific biologic therapies identified in the registry (Standard of Care A vs. B).
    ▪ Causal Estimation: Application of doubly robust estimators and heterogeneous treatment effect (HTE) models (e.g., causal forests) to estimate the individualized treatment effect (ITE), i.e., the conditional average treatment effect for each patient profile, adjusting for high-dimensional confounders (comorbidities, biomarkers, demographics).

  ▪ Comparison: The system will output a ranked recommendation for each patient, which will be compared with the recommendation that current consensus-based guidelines would make for the same patient.

- Outcome Measures:
  ▪ Primary Outcome: The rate of agreement between the AI-driven recommendation and the expert panel’s guideline recommendation.
  ▪ Secondary Clinical Validation: For cases where the AI and guidelines disagree, we will analyze the actual longitudinal patient outcomes in the registry.
  ▪ Performance Metrics: AUC for predicting super-responder status, calibration plots, and decision curve analysis to assess clinical net benefit.

- Governance: The pilot will adhere to CONSORT-AI reporting standards. A “human-in-the-loop” review board will audit the AI’s “rationale” (via SHAP values) to ensure biological plausibility before any finding is considered for a guideline update.
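Decision curve analysis, listed among the performance metrics above, compares the net benefit of acting on model predictions with two default strategies: treating all patients and treating none. The comparison runs across a range of threshold probabilities p_t. The standard definition is net benefit = TP/n − (FP/n) × p_t/(1 − p_t).

The sketch below assumes a binary outcome such as super-responder status and a model that outputs predicted probabilities. Both the outcomes and the probabilities are toy values.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of acting when the predicted probability is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    harm_weight = threshold / (1.0 - threshold)  # odds at the threshold probability
    return tp / n - (fp / n) * harm_weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act for every patient (treating none has net benefit 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data: 1 = super-responder; risk = hypothetical model probabilities.
y = [1, 0, 1, 0, 0]
risk = [0.9, 0.6, 0.4, 0.2, 0.1]
curve = [(t, net_benefit(y, risk, t), net_benefit_treat_all(y, t)) for t in (0.1, 0.25, 0.5)]
```

At a threshold of 0.25 the model's net benefit (1/3) exceeds that of treating everyone (0.2). A decision curve plots this comparison over the clinically relevant range of thresholds.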

Table 2 shows the main differences between traditional guideline development and AI-enhanced living guidelines.

4. Explainability, Reliability, and Interoperability in Artificial Intelligence Use

Numerous challenges must be overcome in our project to use AI to create guidelines based on registry data. For instance, a registry must ensure that the data it gathers are compatible with those of other registries, so that the collected data add value and gain utility. Earlier attempts to achieve patient-registry interoperability have had mixed success. Implementing an existing framework based on federated, semantic metadata registries may provide a fruitful solution [65].

Furthermore, even though AI techniques have demonstrated the ability to predict clinical outcomes, several obstacles still prevent their widespread clinical adoption [66].

Safeguarding user rights and ensuring ethical standards in AI implementation are key to making AI trustworthy and widely used. In recent years, EU legislators have updated regulations and issued additional provisions to ensure respect for users’ privacy and other fundamental rights. Key instruments include the General Data Protection Regulation 2016/679 (GDPR) [67] and the Artificial Intelligence Act 2024/1689 (AI Act) [44]. The AI Act adds rules specific to AI and automated systems, placing greater emphasis on privacy and requiring AI to adhere to principles that prioritize user control and accountability. The potential use of AI in developing clinical guidelines therefore requires systems that ensure the highest level of explainability and trustworthiness.

4.1. AI and Trustworthiness

AI algorithms must be reliable to be successfully integrated into clinical practice [68]. To build user trust and facilitate the integration of AI into a learning healthcare system, thorough reporting is required to improve the comprehension and interpretation of AI outputs [69]. To increase confidence in and adoption of AI in healthcare, the WHO has released important guidance. Among other things, it calls for greater transparency through information on the database, source code, data inputs, and analytical techniques used in AI algorithms [70].

One of the main problems that could arise from the application of AI is algorithmic bias, which describes “instances in which the application of an algorithm exacerbates existing inequalities in terms of socioeconomic status, race, ethnic background, religion, gender, disability, or sexual orientation, amplifying them and negatively impacting inequalities in health systems” [71]. To overcome these biases, it is essential to train algorithm developers and data scientists to recognize and address them. It is therefore crucial to create a supportive infrastructure that fosters confidence in these systems, reduces biases from the earliest stages of research, clinical evaluation, and development, and ensures the correct use of AI algorithms [44]. To maximize benefits and minimize risks to human rights, steps must be taken to uphold ethical commitments and promote trust. AI trials must adhere to ethical standards, taking into account data limitations, interpretability issues, and the new hazards associated with human–AI interactions [72].

Understanding the design, development, and clinical validation processes is essential to identify potential sources of bias and to prevent patient harm, which is unethical and could have severe consequences. This understanding helps to ensure a safe translation of AI algorithms into medical practice [56].

Transparency is necessary so that patients, medical end-users, and other stakeholders can evaluate the quality of AI algorithms. For example, none of the 14 CE-certified AI-based radiology solutions on the market in Europe at the time of the cited analysis disclosed potential performance limitations related to bias-relevant factors such as age and ethnicity. Most also lacked information on population characteristics and training-data collection [73].

Early registration of studies can reduce potential harm by requiring the disclosure of important AI algorithm components before clinical application, promoting the publication of unfavorable outcomes, and avoiding biases or overly optimistic interpretations of results, thereby improving study rigor and transparency.

Several studies have shown that AI can reinforce systemic health disparities [74,75]. Transparency is essential for detecting and removing such biases and for promoting accountability and continuous improvement. Although transparency alone cannot guarantee bias-free algorithms, it contributes significantly to these goals [76].

The FAIR principles (Findable, Accessible, Interoperable, and Reusable) aim to make data usable and reusable by both humans and machines [53]. Several studies describe in detail the technological and methodological prerequisites for the “FAIR-ification” of rare disease registries at the European level [77]. Despite these recommendations, achieving full FAIR compliance in disease databases remains challenging [78].

To address this gap, one study has proposed a set of minimal requirements for an AI algorithm registry. Registration would cover the entire model, including how information is presented to end users, the characteristics of the training data, details of the data collection process, and model specifications (such as the foundation model’s type and version, the manufacturer, and the tuning or grounding procedure applied). To enable a safe, open, and responsible integration of AI in healthcare, such a registry should publish general algorithm information rather than code, which also protects intellectual property. Because only broad algorithmic information is required, registration should not raise concerns about patent infringement. Once the registry is open for enrollment, AI algorithms should be registered before they are used in clinical settings and before a trial protocol is submitted for ethics approval [79].
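The minimal registry requirements summarized above can be expressed as a structured record. The Python dataclass below is a hypothetical schema. Its field names paraphrase the items listed in the text, and it deliberately has no source-code field. All example values are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class AlgorithmRegistryEntry:
    """Hypothetical minimal registry record; fields paraphrase the listed requirements.

    There is no source-code field: only general algorithm information is recorded.
    """
    algorithm_name: str
    end_user_presentation: str          # how outputs are presented to end users
    training_data_characteristics: str
    data_collection_process: str
    foundation_model_type: str
    foundation_model_version: str
    manufacturer: str
    tuning_or_grounding_procedure: str


def missing_fields(entry):
    """Names of required fields left blank; registration should be complete before clinical use."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


entry = AlgorithmRegistryEntry(
    algorithm_name="example-biologic-ranker",
    end_user_presentation="ranked therapy list with feature attributions",
    training_data_characteristics="adults with severe asthma from federated registries",
    data_collection_process="routine registry visits",
    foundation_model_type="none (task-specific model)",
    foundation_model_version="not applicable",
    manufacturer="example consortium",
    tuning_or_grounding_procedure="",
)
```

A check like `missing_fields(entry)` could gate registration. In this example it reports that the tuning or grounding procedure has not been documented.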

4.2. AI and Explainability

The limited explainability of models is another factor that can prevent AI from being used effectively to create therapeutic guidelines. AI tools are frequently seen as “black boxes.” If decisions are to be based on AI-generated predictions, users must understand how and why the algorithm arrived at its output. This is essential for the accountability of decisions, in medicine as in many other domains.

To increase explainability, the current WHO guidance on large multi-modal models encourages certain practices in algorithm development, such as disclosing internal testing results [70]. Given the possible influence of AI algorithms on patient care, traceability and thorough documentation of the development process and pre-clinical studies are crucial.

ML explainability has attracted significant attention in recent years. Two widely used methods for improving the interpretability of AI models are SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME). Despite their broad adoption, systematic evaluation of LIME and SHAP in clinical guideline contexts remains limited [80,81].

LIME is often preferred for its clarity in explaining individual predictions, whereas SHAP offers both global and local perspectives [82]. In one study, researchers developed an eXtreme Gradient Boosting (XGB)-based model to predict 10-year overall survival using breast cancer data from the Netherlands Cancer Registry and compared LIME and SHAP explanations of its predictions. The two methods showed 95.4% agreement [83].

Techniques like LIME and SHAP are a first step toward offering a more comprehensible explanation of complicated models than the models themselves can provide.
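SHAP attributions are based on Shapley values. A feature's attribution is its average marginal contribution to the prediction over all coalitions of the other features. For a model with only a handful of features, this can be computed exactly.

The sketch below assumes a simple baseline convention: features outside a coalition are set to a reference value. Its cost grows as 2^n in the number of features. Practical SHAP implementations rely on faster approximations or model-specific algorithms. The two-feature risk score is hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley attributions for one prediction (tiny models only: cost ~ 2**n)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their observed values; others take the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def risk_score(z):
    """Hypothetical score with two features and an interaction term."""
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[1]


x = [1.0, 1.0]
baseline = [0.0, 0.0]
phi = exact_shapley(risk_score, x, baseline)
```

The attributions are 0.6 and 0.4: the interaction term's 0.2 is split equally between the two features. They sum to f(x) − f(baseline) = 1.0, the efficiency property that makes Shapley-based explanations additive.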

Moreover, limited reproducibility is one of the biggest obstacles to AI in science and medicine. Published methods and results sections often describe preprocessing, model training, validation, and reporting procedures too superficially to allow replication [84]. Reproducibility is further hindered by the limited application of standardized criteria for ML model development and reporting and by the absence of consistent policies for sharing input data and source code [85].

Reproducibility requires complete program details and the source code used, accurate documentation of key features, and precise implementation instructions. Reproducible AI increases confidence in models and their outcomes [86].

Therefore, to produce highly valued and reliable scientific discoveries, every researcher and developer must strive for reproducibility, focus on accurate and thorough documentation, and provide the necessary details about the source code and data.
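One small, concrete step toward this kind of documentation is to store a machine-readable manifest alongside every analysis. The manifest fingerprints the input data and records the configuration, random seed, and software environment. The sketch below uses only the Python standard library. The manifest fields and the example extract are illustrative assumptions, not a published reporting standard.

```python
import hashlib
import json
import platform


def build_run_manifest(dataset_bytes, config, seed):
    """Minimal reproducibility record: data fingerprint, settings, seed, and environment."""
    return {
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "config": config,
        "seed": seed,
        "python_version": platform.python_version(),
    }


# Hypothetical registry extract and model settings, for illustration only.
data = b"patient_id,eosinophils\n1,350\n2,120\n"
manifest = build_run_manifest(data, {"model": "gradient_boosting", "max_depth": 4}, seed=2026)
manifest_json = json.dumps(manifest, sort_keys=True, indent=2)
```

Because the hash changes whenever a single byte of the extract changes, a reader can verify that a rerun used exactly the same data. The recorded seed and configuration cover the modelling side.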

5. Discussion

The rapidly expanding volume of multimodal clinical and biological data requires a radical rethinking of how medical recommendations are generated. It underscores the need for analytical frameworks that can synthesize this complexity into actionable recommendations. Traditional consensus methods, while essential for synthesizing expert opinion, have inherent limitations in resource requirements, update times, and susceptibility to subjective bias. The application of AI cannot ignore the quality of the underlying data; therefore, strict requirements for data entry into registries must be defined. The adoption of a common data model (CDM), as discussed in relation to the OMOP standard and the FHIR ecosystem, is not merely a technicality but a fundamental prerequisite for transforming registries from passive archives into active research tools [87]. Adhering to the FAIR principles is necessary to overcome current fragmentation and enable federated analyses, ensuring the statistical robustness needed to support clinical decisions. Integrating AI into guidelines should not be limited to predictive or associative models, which risk conflating correlation and causality. To avoid this, it is essential to adopt “Target Trial Emulation” approaches and doubly robust estimators, as outlined in Phase II of our framework.

This methodological rigor must also be reflected in scientific reporting. The European Life Sciences Data Infrastructure, ELIXIR, includes the ELIXIR Machine Learning Focus Group. This group developed a set of guidelines for reporting supervised AI methods in computational biology through a community-driven consensus process involving over 50 ML specialists [88]. The guidelines address key aspects of supervised machine learning in scientific papers and are collectively referred to as the DOME (Data, Optimization, Model, Evaluation) guidelines. For readers, investigators, and reviewers, the DOME recommendations aim to improve the reproducibility and transparency of published machine learning methods. They emphasize thorough statistical testing for accurate performance assessment and address critical issues such as generalization to independent data, efficient optimization, and model interpretability [89,90]. Comparable efforts include the AIMe registry, which focuses on describing machine learning/artificial intelligence methods for biomedical applications [91].

Furthermore, adherence to reporting standards such as CONSORT-AI and SPIRIT-AI is necessary. These standards not only support technical reproducibility but also enable expert panels to assess the “certainty of evidence” according to GRADE criteria, bridging the gap between algorithmic output and clinical judgment. The use of AI in clinical practice still faces resistance, primarily because of the “black box” nature of many advanced algorithms; explainability and trustworthiness are therefore crucial. It is not enough for a model to be accurate; it must also be ethical, transparent, and compliant with current regulations, such as the GDPR and the new EU AI Act. The principles developed by the Guidelines International Network (GIN) offer an essential compass in this regard, identifying transparency, preplanning, and accountability as the cornerstones of responsible AI use. Algorithms must be audited for racial, socioeconomic, and gender bias before being integrated into the guideline lifecycle, to avoid automating and amplifying existing inequalities [57].

The framework proposed in this work aims to overcome the static nature of current guidelines. Living guidelines are an evolution of traditional guidelines, dynamically updated as relevant new evidence becomes available. This is achieved through living systematic reviews that monitor the literature continuously. Expert panels then use the GRADE framework to modify only the affected recommendations, which are immediately published on digital platforms to ensure their real-time clinical relevance [92]. One example is the ESMO Living Guidelines, which offer continuously updated recommendations in a rapidly evolving field such as oncology [93].

The goal is to transition to a more advanced “Living Guidelines” model, in which registry-derived evidence, analyzed in real time by transparent and verified AI pipelines, triggers periodic or event-driven reviews. This approach does not replace the physician, researcher, or expert panel; rather, it supports them. For example, it frees human resources from manual data synthesis, allowing experts to focus on causal interpretation, value contextualization, and personalized treatment implementation. Computational methods will continue to advance, but ethical and methodological governance will determine the success of this revolution in evidence-based medicine.

6. Conclusions

Currently, guidelines are based on recommendations derived from the best available medical knowledge and summarize the potential advantages and disadvantages of different diagnostic and therapeutic options. Consensus methods remain the gold standard for guideline development, despite their limitations. From this perspective, the availability of regional, national, and local registries will enable AI to adapt clinical recommendations more directly and consistently to local, national, or regional realities. This perspective provides an actionable pathway (framework, pilot, and checklist) to accelerate responsible adoption. Future work should prioritize standardized causal protocols for recurrent PICO patterns, scalable federated learning with differential privacy, harmonized equity reporting, and automated evidence-to-decision tools that surface uncertainty and value trade-offs to panels. Prospective pilots are needed to quantify real-world impact on time-to-update, recommendation stability, and patient-important outcomes. The application of AI-based methods should not be considered an alternative to the current gold standard of expert consensus, but rather a supporting tool. In the not-too-distant future, it may be possible to combine traditional consensus methods with AI to develop clinical guidelines. This approach would allow standards to be updated continuously and provide more substantial evidence for clinical recommendations. AI-enhanced registries represent a promising avenue for modernizing guideline development, offering a scalable and transparent complement to traditional consensus-based methods.

Author Contributions

Conceptualization, S.G., M.D.G. and G.W.C.; methodology, S.G., A.A. and I.C.; writing—original draft preparation, S.G. and L.G.; writing—review and editing, S.G., M.D.G., L.G. and G.W.C.; visualization, A.A. and I.C.; supervision, S.G.; project administration, S.G. and G.W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

This article is a perspective paper and does not report or analyze original datasets. Therefore, data sharing is not applicable.

Acknowledgments

Generative artificial intelligence tools were used during the manuscript preparation process. Specifically, ChatGPT-5 Pro was employed to support the creation of tables included in the submission and for limited language and stylistic refinement. All scientific content, interpretations, and conclusions were developed, reviewed, and validated by the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Pfaffenlehner, M.; Behrens, M.; Zöller, D.; Ungethüm, K.; Günther, K.; Rücker, V.; Reese, J.P.; Heuschmann, P.; Kesselmeier, M.; Remo, F.; et al. Methodological Challenges Using Routine Clinical Care Data for Real-World Evidence: A Rapid Review Utilizing a Systematic Literature Search and Focus Group Discussion. BMC Med. Res. Methodol. 2025, 25, 8.
2. Eccles, M.; Grimshaw, J. Clinical Guidelines from Conception to Use; Radcliffe Medical Press: Buckinghamshire, UK, 2000; p. 120.
3. Bourrée, F.; Michel, P.; Salmi, L.R. Consensus Methods: Review of Original Methods and Their Main Alternatives Used in Public Health. Rev. Epidemiol. Sante Publique 2008, 56, e13–e21.
4. Shekelle, P.G.; Woolf, S.H.; Eccles, M.; Grimshaw, J. Clinical Guidelines: Developing Guidelines. BMJ 1999, 318, 593–596.
5. Shekelle, P.; Woolf, S.; Grimshaw, J.M.; Schünemann, H.J.; Eccles, M.P. Developing Clinical Practice Guidelines: Reviewing, Reporting, and Publishing Guidelines; Updating Guidelines; and the Emerging Issues of Enhancing Guideline Implementability and Accounting for Comorbid Conditions in Guideline Development. Implement. Sci. 2012, 7, 62.
6. Linstone, H.A.; Turoff, M.; Helmer, O. The Delphi Method Techniques and Applications; Linstone, H.A., Turoff, M., Eds.; Addison-Wesley Publishing Company: Boston, MA, USA, 1975.
7. McMillan, S.S.; King, M.; Tully, M.P. How to Use the Nominal Group and Delphi Techniques. Int. J. Clin. Pharm. 2016, 38, 655–662.
8. Cassar Flores, A.; Marshall, S.; Cordina, M. Use of the Delphi Technique to Determine Safety Features to Be Included in a Neonatal and Paediatric Prescription Chart. Int. J. Clin. Pharm. 2014, 36, 1179–1189.
9. Tully, M.P.; Cantrill, J.A. Exploring the Domains of Appropriateness of Drug Therapy, Using the Nominal Group Technique. Pharm. World Sci. 2002, 24, 128–131.
10. Claxton, J.D.; Ritchie, J.R.B.; Zaichkowsky, J. The Nominal Group Technique: Its Potential for Consumer Research. J. Consum. Res. 1980, 7, 308–313.
11. Gastelurrutia, M.A.; Benrimoj, S.I.C.; Castrillon, C.C.; De Amezua, M.J.C.; Fernandez-Llimos, F.; Faus, M.J. Facilitators for Practice Change in Spanish Community Pharmacy. Pharm. World Sci. 2009, 31, 32–39.
12. McMillan, S.S.; Kelly, F.; Sav, A.; Kendall, E.; King, M.A.; Whitty, J.A.; Wheeler, A.J. Consumers and Carers Versus Pharmacy Staff: Do Their Priorities for Australian Pharmacy Services Align? Patient 2015, 8, 411–422.
13. Jones, J.; Hunter, D. Qualitative Research: Consensus Methods for Medical and Health Services Research. BMJ 1995, 311, 376–380.
14. Guyatt, G.H.; Oxman, A.D.; Vist, G.E.; Kunz, R.; Falck-Ytter, Y.; Alonso-Coello, P.; Schünemann, H.J. GRADE: An Emerging Consensus on Rating Quality of Evidence and Strength of Recommendations. BMJ 2008, 336, 924–926.
15. Humphrey-Murto, S.; Varpio, L.; Gonsalves, C.; Wood, T.J. Using Consensus Group Methods Such as Delphi and Nominal Group in Medical Education Research. Med. Teach. 2017, 39, 14–19.
16. Banno, M.; Tsujimoto, Y.; Kataoka, Y. The Majority of Reporting Guidelines Are Not Developed with the Delphi Method: A Systematic Review of Reporting Guidelines. J. Clin. Epidemiol. 2020, 124, 50–57.
17. Medina, Y.F.; Mendieta, C.V.; Prieto, N.; Acosta Felquer, M.L.; Soriano, E.R. A Systematic Scoping Review of Essential Methodological Elements for Developing a Tool to Improve the Reporting of Consensus Studies in Classification, Diagnostic Criteria, and Guidelines Development. J. Multidiscip. Healthc. 2024, 17, 5813–5830.
18. Tugwell, P.; Knottnerus, J.A. The Need for Consensus on Consensus Methods. J. Clin. Epidemiol. 2018, 99, vi–viii.
19. Moher, D.; Schulz, K.F.; Simera, I.; Altman, D.G. Guidance for Developers of Health Research Reporting Guidelines. PLoS Med. 2010, 7, e1000217.
20. Waggoner, J.; Carline, J.D.; Durning, S.J. Is There a Consensus on Consensus Methodology? Descriptions and Recommendations for Future Consensus Research. Acad. Med. 2016, 91, 663–668.
21. Grant, S.; Booth, M.; Khodyakov, D. Lack of Preregistered Analysis Plans Allows Unacceptable Data Mining for and Selective Reporting of Consensus in Delphi Studies. J. Clin. Epidemiol. 2018, 99, 96–105.
22. Jünger, S.; Payne, S.A.; Brine, J.; Radbruch, L.; Brearley, S.G. Guidance on Conducting and REporting DElphi Studies (CREDES) in Palliative Care: Recommendations Based on a Methodological Systematic Review. Palliat. Med. 2017, 31, 684–706.
23. Wieringa, S.; Engebretsen, E.; Heggen, K.; Greenhalgh, T. Clinical Guidelines and the Pursuit of Reducing Epistemic Uncertainty. An Ethnographic Study of Guideline Development Panels in Three Countries. Soc. Sci. Med. 2021, 272, 113702.
24. Murray, R.; Sharp, M.; Razidan, A.; Hibbitts, B.; Ryan, M.; Mahtani, K.; Lynch, R.; Smith, S.; O’Neill, M.; Schünemann, H.; et al. Investigating How the GRADE Evidence to Decision (EtD) Framework Is Used in Clinical Guidelines: A Scoping Review Protocol. HRB Open Res. 2023, 6, 50.
25. De Bleser, L.; Depreitere, R.; De Waele, K.; Vanhaecht, K.; Vlayen, J.; Sermeus, W. Defining Pathways. J. Nurs. Manag. 2006, 14, 553–563.
26. Rotter, T.; Kinsman, L.; James, E.L.; Machotta, A.; Gothe, H.; Willis, J.; Snow, P.; Kugler, J. Clinical Pathways: Effects on Professional Practice, Patient Outcomes, Length of Stay and Hospital Costs. Cochrane Database Syst. Rev. 2010. Art. No.: CD006632.
27. Peleg, M. Computer-Interpretable Clinical Guidelines: A Methodological Review. J. Biomed. Inform. 2013, 46, 744–763.
28. De Clercq, P.A.; Blom, J.A.; Korsten, H.H.M.; Hasman, A. Approaches for Creating Computer-Interpretable Guidelines That Facilitate Decision Support. Artif. Intell. Med. 2004, 31, 1–27.
29. Hripcsak, G. Writing Arden Syntax Medical Logic Modules. Comput. Biol. Med. 1994, 24, 331–363.
30. Ohno-Machado, L.; Gennari, J.H.; Murphy, S.N.; Jain, N.L.; Tu, S.W.; Oliver, D.E.; Pattison-Gordon, E.; Greenes, R.A.; Shortliffe, E.H.; Barnett, G.O. The Guideline Interchange Format: A Model for Representing Guidelines. J. Am. Med. Inform. Assoc. 1998, 5, 357–372.
31. Boxwala, A.A.; Peleg, M.; Tu, S.; Ogunyemi, O.; Zeng, Q.T.; Wang, D.; Patel, V.L.; Greenes, R.A.; Shortliffe, E.H. GLIF3: A Representation Format for Sharable Computer-Interpretable Clinical Practice Guidelines. J. Biomed. Inform. 2004, 37, 147–161.
32. Fox, J.; Johns, N.; Lyons, C.; Rahmanzadeh, A.; Thomson, R.; Wilson, P. PROforma: A General Technology for Clinical Decision Support Systems. Comput. Methods Programs Biomed. 1997, 54, 59–67.
33. Shiffman, R.N.; Karras, B.T.; Agrawal, A.; Chen, R.; Marenco, L.; Nath, S. GEM: A Proposal for a More Comprehensive Guideline Document Model Using XML. J. Am. Med. Inform. Assoc. 2000, 7, 488–498.
34. Kawamoto, K.; Houlihan, C.A.; Balas, E.A.; Lobach, D.F. Improving Clinical Practice Using Clinical Decision Support Systems: A Systematic Review of Trials to Identify Features Critical to Success. BMJ 2005, 330, 765–768.
35. Latoszek-Berendsen, A.; Tange, H.; Van den Herik, H.J.; Hasman, A. From Clinical Practice Guidelines to Computer-Interpretable Guidelines. A Literature Overview. Methods Inf. Med. 2010, 49, 550–570.
36. Denton, E.; Hew, M.; Peters, M.J.; Upham, J.W.; Bulathsinhala, L.; Tran, T.N.; Martin, N.; Bergeron, C.; Al-Ahmad, M.; Altraja, A.; et al. Real-World Biologics Response and Super-Response in the International Severe Asthma Registry Cohort. Allergy 2024, 79, 2700–2716.
37. Chen, W.; Sadatsafavi, M.; Tran, T.N.; Murray, R.B.; Wong, C.B.N.; Ali, N.; Ariti, C.; Gil, E.G.; Newell, A.; Alacqua, M.; et al. Characterization of Patients in the International Severe Asthma Registry with High Steroid Exposure Who Did or Did Not Initiate Biologic Therapy. J. Asthma Allergy 2022, 15, 1491–1510.
38. Introduction to Public Health Surveillance | Public Health 101 Series | CDC. Available online: https://www.cdc.gov/training-publichealth101/php/training/introduction-to-public-health-surveillance.html (accessed on 1 November 2025).
39. Sørensen, H.T.; Sabroe, S.; Olsen, J. A Framework for Evaluation of Secondary Data Sources for Epidemiological Research. Int. J. Epidemiol. 1996, 25, 435–442.
40. Walters, S.; Maringe, C.; Butler, J.; Brierley, J.D.; Rachet, B.; Coleman, M.P. Comparability of Stage Data in Cancer Registries in Six Countries: Lessons from the International Cancer Benchmarking Partnership. Int. J. Cancer 2013, 132, 676–685.
41. Pollard, C.; Bailey, K.A.; Petitte, T.; Baus, A.; Swim, M.; Hendryx, M. Electronic Patient Registries Improve Diabetes Care and Clinical Outcomes in Rural Community Health Centers. J. Rural Health 2009, 25, 77–84.
42. Tan, J.C.K.; Ferdi, A.C.; Gillies, M.C.; Watson, S.L. Clinical Registries in Ophthalmology. Ophthalmology 2019, 126, 655–662.
43. Canonica, G.W.; Agache, I.; Schünemann, H.J.; Roche, N.; Price, D.; del Giacco, S. Next Generation Health Guidelines: The Role of Real-Life Data in Evidence-Based Medicine. Allergy 2024, 79, 12–14.
44. Regulation (EU) 2024/1689. EUR-Lex. Available online: https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng (accessed on 1 November 2025).
45. Allegra, A.; Tonacci, A.; Sciaccotta, R.; Genovese, S.; Musolino, C.; Pioggia, G.; Gangemi, S. Machine Learning and Deep Learning Applications in Multiple Myeloma Diagnosis, Prognosis, and Treatment Selection. Cancers 2022, 14, 606.
46. Danieli, M.G.; Tonacci, A.; Paladini, A.; Longhi, E.; Moroncini, G.; Allegra, A.; Sansone, F.; Gangemi, S. A Machine Learning Analysis to Predict the Response to Intravenous and Subcutaneous Immunoglobulin in Inflammatory Myopathies. A Proposal for a Future Multi-Omics Approach in Autoimmune Diseases. Autoimmun. Rev. 2022, 21, 103105.
47. Dick, K.; Humber, J.; Ducharme, R.; Dingwall-Harvey, A.; Armour, C.M.; Hawken, S.; Walker, M.C. The Transformative Potential of AI in Obstetrics and Gynaecology. J. Obstet. Gynaecol. Can. 2024, 46, 102277.
48. Allegra, A.; Mirabile, G.; Tonacci, A.; Genovese, S.; Pioggia, G.; Gangemi, S. Machine Learning Approaches in Diagnosis, Prognosis and Treatment Selection of Cardiac Amyloidosis. Int. J. Mol. Sci. 2023, 24, 5680.
49. Murdaca, G.; Caprioli, S.; Tonacci, A.; Billeci, L.; Greco, M.; Negrini, S.; Cittadini, G.; Zentilin, P.; Spagnolo, E.V.; Gangemi, S. A Machine Learning Application to Predict Early Lung Involvement in Scleroderma: A Feasibility Evaluation. Diagnostics 2021, 11, 1880.
50. Li, X.H.; Liao, J.P.; Chen, M.K.; Gao, K.; Wang, Y.B.; Yan, S.Y.; Huang, Q.; Wang, Y.Y.; Shi, Y.X.; Hu, W.B.; et al. The Application of Computer Technology to Clinical Practice Guideline Implementation: A Scoping Review. J. Med. Syst. 2023, 48, 6.
51. Miyake, M.; Akiyama, M.; Kashiwagi, K.; Sakamoto, T.; Oshika, T. Japan Ocular Imaging Registry: A National Ophthalmology Real-World Database. Jpn. J. Ophthalmol. 2022, 66, 499–503.
52. Séroussi, B.; Bouaud, J.; Antoine, É.C. ONCODOC: A Successful Experiment of Computer-Supported Guideline Development and Implementation in the Treatment of Breast Cancer. Artif. Intell. Med. 2001, 22, 43–64.
53. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3, 160018.
54. Wang, L.; Wen, A.; Fu, S.; Ruan, X.; Huang, M.; Li, R.; Lu, Q.; Lyu, H.; Williams, A.E.; Liu, H. A Scoping Review of OMOP CDM Adoption for Cancer Research Using Real World Data. NPJ Digit. Med. 2025, 8, 189.
55. Mitchell, M.; Wu, S.; Zaldivar, A.; Barnes, P.; Vasserman, L.; Hutchinson, B.; Spitzer, E.; Raji, I.D.; Gebru, T. Model Cards for Model Reporting. In Proceedings of the FAT* 2019: Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA, 29–31 January 2019; pp. 220–229.
56. Liu, X.; Rivera, S.C.; Moher, D.; Calvert, M.J.; Denniston, A.K. Reporting Guidelines for Clinical Trial Reports for Interventions Involving Artificial Intelligence: The CONSORT-AI Extension. BMJ 2020, 370, m3164.
57. Sousa-Pinto, B.; Marques-Cruz, M.; Neumann, I.; Chi, Y.; Nowak, A.J.; Reinap, M.; Awad, M.; Nothacker, M.; Trucl, M.; Brozek, J.; et al. Guidelines International Network: Principles for Use of Artificial Intelligence in the Health Guideline Enterprise. Ann. Intern. Med. 2025, 178, 408–415.
58. Boeckhout, M.; Zielhuis, G.A.; Bredenoord, A.L. The FAIR Guiding Principles for Data Stewardship: Fair Enough? Eur. J. Hum. Genet. 2018, 26, 931–936.
59. Wilkinson, M.D.; Sansone, S.A.; Méndez, E.; David, R.; Dennis, R.; Hecker, D.; Kleemola, M.; Lacagnina, C.; Nikiforova, A.; Castro, L.J. Community-Driven Governance of FAIRness Assessment: An Open Issue, an Open Discussion. Open Res. Eur. 2023, 2, 146.
60. Vorisek, C.N.; Lehne, M.; Klopfenstein, S.A.I.; Mayer, P.J.; Bartschke, A.; Haese, T.; Thun, S. Fast Healthcare Interoperability Resources (FHIR) for Interoperability in Health Research: Systematic Review. JMIR Med. Inform. 2022, 10, e35724.
61. Hernán, M.A.; Robins, J.M. Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. Am. J. Epidemiol. 2016, 183, 758–764.
62. Wicks, P.; Liu, X.; Denniston, A.K. Going on up to the SPIRIT in AI: Will New Reporting Guidelines for Clinical Trials of AI Interventions Improve Their Rigour? BMC Med. 2020, 18, 272.
63. Chen, D.; Arnold, K.; Sukhdeo, R.; Alla, J.F.; Raman, S. Concordance with CONSORT-AI Guidelines in Reporting of Randomised Controlled Trials Investigating Artificial Intelligence in Oncology: A Systematic Review. BMJ Oncol. 2025, 4, e000733.
64. Marques-Cruz, M.; Sousa-Pinto, B.; Wiercioch, W.; Reinap, M.; Neumann, I.; Chi, Y.; Nowak, A.; Awad, M.; Nothacker, M.; Brozek, J.; et al. Protocol for the Creation of the Guidelines International Network–McMaster Guideline Development Checklist Extension for Integrating Artificial Intelligence in the Guideline Enterprise (Guidelines-Artificial Intelligence Extension). Clin. Public Health Guidel. 2025, 2, e70038.
Nicholson, N.; Perego, A. Interoperability of Population-Based Patient Registries. J. Biomed. Inform. X 2020, 112, 100074. [Google Scholar] [CrossRef] [PubMed]
Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine Learning Applications in Cancer Prognosis and Prediction. Comput. Struct. Biotechnol. J. 2014, 13, 8–17. [Google Scholar] [CrossRef]
Regulation (EU) 2016/679 of the European Parliament and of the Council (General Data Protection Regulation). EUR-Lex. Available online: https://eur-lex.europa.eu/eli/reg/2016/679/oj/eng (accessed on 9 December 2025).
Benjamens, S.; Dhunnoo, P.; Meskó, B. The State of Artificial Intelligence-Based FDA-Approved Medical Devices and Algorithms: An Online Database. NPJ Digit. Med. 2020, 3, 118.
Badal, K.; Lee, C.M.; Esserman, L.J. Guiding Principles for the Responsible Development of Artificial Intelligence Tools for Healthcare. Commun. Med. 2023, 3, 47.
World Health Organization. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance. 2021, pp. 1–148. Available online: https://iris.who.int/bitstream/handle/10665/350567/9789240037403-eng.pdf (accessed on 25 November 2025).
Panch, T.; Mattie, H.; Atun, R. Artificial Intelligence and Algorithmic Bias: Implications for Health Systems. J. Glob. Health 2019, 9, 010318.
Perni, S.; Lehmann, L.S.; Bitterman, D.S. Patients Should Be Informed When AI Systems Are Used in Clinical Trials. Nat. Med. 2023, 29, 1890–1891.
Fehr, J.; Citro, B.; Malpani, R.; Lippert, C.; Madai, V.I. A Trustworthy AI Reality-Check: The Lack of Transparency of Artificial Intelligence Products in Healthcare. Front. Digit. Health 2024, 6, 1267290.
Seyyed-Kalantari, L.; Zhang, H.; McDermott, M.B.A.; Chen, I.Y.; Ghassemi, M. Underdiagnosis Bias of Artificial Intelligence Algorithms Applied to Chest Radiographs in Under-Served Patient Populations. Nat. Med. 2021, 27, 2176–2182.
Omiye, J.A.; Lester, J.C.; Spichak, S.; Rotemberg, V.; Daneshjou, R. Large Language Models Propagate Race-Based Medicine. NPJ Digit. Med. 2023, 6, 195.
Lambert, S.I.; Madi, M.; Sopka, S.; Lenes, A.; Stange, H.; Buszello, C.P.; Stephan, A. An Integrative Review on the Acceptance of Artificial Intelligence among Healthcare Professionals in Hospitals. NPJ Digit. Med. 2023, 6, 111.
Berger, A.; Rustemeier, A.K.; Göbel, J.; Kadioglu, D.; Britz, V.; Schubert, K.; Mohnike, K.; Storf, H.; Wagner, T.O.F. How to Design a Registry for Undiagnosed Patients in the Framework of Rare Disease Diagnosis: Suggestions on Software, Data Set and Coding System. Orphanet J. Rare Dis. 2021, 16, 198.
Raycheva, R.; Kostadinov, K.; Mitova, E.; Bogoeva, N.; Iskrov, G.; Stefanov, G.; Stefanov, R. Challenges in Mapping European Rare Disease Databases, Relevant for ML-Based Screening Technologies in Terms of Organizational, FAIR and Legal Principles: Scoping Review. Front. Public Health 2023, 11, 1214766.
van Genderen, M.E.; van de Sande, D.; Hooft, L.; Reis, A.A.; Cornet, A.D.; Oosterhoff, J.H.F.; van der Ster, B.J.P.; Huiskens, J.; Townsend, R.; van Bommel, J.; et al. Charting a New Course in Healthcare: Early-Stage AI Algorithm Registration to Enhance Trust and Transparency. NPJ Digit. Med. 2024, 7, 119.
Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. In Proceedings of the KDD’16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 4766–4775.
Freitas, A.A. Comprehensible Classification Models. ACM SIGKDD Explor. Newsl. 2014, 15, 1–10.
Jansen, T.; Geleijnse, G.; van Maaren, M.; Hendriks, M.P.; Ten Teije, A.; Moncada-Torres, A. Machine Learning Explainability in Breast Cancer Survival. Stud. Health Technol. Inform. 2020, 270, 307–311.
Haibe-Kains, B.; Adam, G.A.; Hosny, A.; Khodakarami, F.; Shraddha, T.; Kusko, R.; Sansone, S.A.; Tong, W.; Wolfinger, R.D.; Mason, C.E.; et al. Transparency and Reproducibility in Artificial Intelligence. Nature 2020, 586, E14–E16.
Gundersen, O.E.; Kjensmo, S. State of the Art: Reproducibility in Artificial Intelligence. Proc. AAAI Conf. Artif. Intell. 2018, 32, 1644–1651.
Hauschild, A.C.; Eick, L.; Wienbeck, J.; Heider, D. Fostering Reproducibility, Reusability, and Technology Transfer in Health Informatics. iScience 2021, 24, 102803.
Ahmadi, N.; Zoch, M.; Guengoeze, O.; Facchinello, C.; Mondorf, A.; Stratmann, K.; Musleh, K.; Erasmus, H.P.; Tchertov, J.; Gebler, R.; et al. How to Customize Common Data Models for Rare Diseases: An OMOP-Based Implementation and Lessons Learned. Orphanet J. Rare Dis. 2024, 19, 298.
Walsh, I.; Fishman, D.; Garcia-Gasulla, D.; Titma, T.; Pollastri, G.; Capriotti, E.; Casadio, R.; Capella-Gutierrez, S.; Cirillo, D.; Del Conte, A.; et al. DOME: Recommendations for Supervised Machine Learning Validation in Biology. Nat. Methods 2021, 18, 1122–1127.
Renaux, A.; Terwagne, C.; Cochez, M.; Tiddi, I.; Nowé, A.; Lenaerts, T. A Knowledge Graph Approach to Predict and Interpret Disease-Causing Gene Interactions. BMC Bioinform. 2023, 24, 324.
Versbraegen, N.; Gravel, B.; Nachtegael, C.; Renaux, A.; Verkinderen, E.; Nowé, A.; Lenaerts, T.; Papadimitriou, S. Faster and More Accurate Pathogenic Combination Predictions with VarCoPP2.0. BMC Bioinform. 2023, 24, 179.
Matschinske, J.; Alcaraz, N.; Benis, A.; Golebiewski, M.; Grimm, D.G.; Heumos, L.; Kacprowski, T.; Lazareva, O.; List, M.; Louadi, Z.; et al. The AIMe Registry for Artificial Intelligence in Biomedical Research. Nat. Methods 2021, 18, 1128–1131.
Akl, E.A.; Meerpohl, J.J.; Elliott, J.; Kahale, L.A.; Schünemann, H.J.; Agoritsas, T.; Hilton, J.; Perron, C.; Hodder, R.; Pestridge, C.; et al. Living Systematic Reviews: 4. Living Guideline Recommendations. J. Clin. Epidemiol. 2017, 91, 47–53.
ESMO Living Guidelines. Available online: https://www.esmo.org/guidelines/living-guidelines (accessed on 29 December 2025).

Scheme 1. Integration of AI-enhanced patient registries into clinical guideline formulation: data collection, analysis, validation, and model updating.

Figure 1. Schematic representation of the proposed AI-enhanced registry framework and living guideline lifecycle.

Scheme 2. Compliance Checklist (Excerpt).

Table 1. Comparative analysis: consensus-only versus AI + registries.

Dimension	Traditional Consensus (Delphi/NGT/RAND-UCLA)	AI + Real-World Registries (Proposed)
Evidence source	Literature + expert opinion	Registries/EHR/wearables; standardized via OMOP/FHIR [54,55,60]
Bias control	Subjective bias; limited reproducibility	Objective validation; causal design; cross-site consistency [61]
Update cycle	Years (periodic)	Continuous/event-driven (living)
Transparency	Narrative synthesis	FAIR metadata; auditable pipelines; model/dataset cards [53]
Validation	Expert review	Internal/external validation; calibration; sensitivity analyses [56,62]
Scalability	Limited by panel capacity	Federated analytics; code-to-data sharing [55]
Explainability	Expert rationale	XAI + effect modifier analysis; causal diagrams; model cards [55]
Cost & Time	High marginal cost; slow iteration	Lower marginal cost after setup; rapid iteration
Governance	Manual processes	GIN Principles; EU AI Act compliance; MLOps [44,57]
Integration with GRADE	Manual mapping	Automated signal extraction with panel adjudication to GRADE/EtD
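Table 1 names internal/external validation and calibration as the validation route for registry-derived models, together with cross-site consistency. As a minimal sketch, assuming each participating site can export records of the form (site, predicted probability, observed 0/1 outcome), the Brier score and a binned expected calibration error (ECE) can be computed per site and compared. The function names, the five equal-width bins and the toy records are illustrative assumptions, not part of the authors' framework.

```python
"""Per-site calibration check for registry-based risk predictions (illustrative sketch)."""
from typing import Dict, Iterable, List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and observed 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_table(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> List[Tuple[int, int, float, float]]:
    """Return (bin index, count, mean predicted risk, observed event rate)
    for each non-empty equal-width probability bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            obs_rate = sum(y for _, y in members) / len(members)
            rows.append((i, len(members), mean_pred, obs_rate))
    return rows


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> float:
    """Count-weighted mean absolute gap between predicted and observed rates across bins."""
    n = len(probs)
    return sum(
        count / n * abs(mean_pred - obs_rate)
        for _, count, mean_pred, obs_rate in reliability_table(probs, outcomes, n_bins)
    )


def validate_by_site(
    records: Iterable[Tuple[str, float, int]], n_bins: int = 5
) -> Dict[str, Dict[str, float]]:
    """Group (site, predicted probability, outcome) records by site and report
    sample size, Brier score and ECE for each site."""
    by_site: Dict[str, Tuple[List[float], List[int]]] = {}
    for site, p, y in records:
        probs, outcomes = by_site.setdefault(site, ([], []))
        probs.append(p)
        outcomes.append(y)
    return {
        site: {
            "n": len(probs),
            "brier": brier_score(probs, outcomes),
            "ece": expected_calibration_error(probs, outcomes, n_bins),
        }
        for site, (probs, outcomes) in by_site.items()
    }


if __name__ == "__main__":
    # Toy records, invented purely to exercise the functions.
    toy = [
        ("site_A", 0.9, 1), ("site_A", 0.8, 1), ("site_A", 0.2, 0), ("site_A", 0.1, 0),
        ("site_B", 0.9, 0), ("site_B", 0.7, 1), ("site_B", 0.3, 1), ("site_B", 0.2, 0),
    ]
    for site, metrics in validate_by_site(toy).items():
        print(site, {k: round(v, 3) for k, v in metrics.items()})
```

Because only aggregate metrics leave each site, functions like these could in principle run locally in a code-to-data setting. With few records per bin the ECE is unstable, so it is better read alongside the reliability table than on its own.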

Table 2. Main differences between traditional guideline development and AI-enhanced living guidelines.

Feature	Traditional Guideline Development (Current)	AI-Enhanced Living Guideline (Proposed)
Evidence Basis	RCTs (strict inclusion criteria, clean populations)	Real-world registry data (heterogeneous, complex patients differing in age, sex, ethnicity, etc.)
Update Speed	Periodic (every 1–5 years); static PDF/text	Continuous/triggered (e.g., monthly); dynamic digital alerts
Granularity	Broad phenotypes (e.g., “Eosinophilic Asthma”)	Micro-clusters (e.g., “obese, non-atopic, eosinophilic, very late-onset”)
Recommendation	“Consider anti-IL5 if Eosinophils > 300 cells/µL”	“Probability of remission with anti-IL5 is 85% for this specific phenotype; consider as first line”
Feedback Loop	Passive (clinicians read guidelines)	Active (clinician outcomes are fed back into the registry to retrain the model)
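Table 2 describes recommendations phrased as phenotype-specific probabilities and an active feedback loop in which clinician-reported outcomes return to the registry. The paper does not specify the underlying model; as a minimal sketch, assuming binary remission outcomes and a conjugate Beta-Binomial estimate kept per (phenotype, therapy) pair, the update-and-flag cycle could look like the following. The class names, the uniform Beta(1, 1) prior, the 0.05 review threshold and the toy counts are hypothetical.

```python
"""Per-phenotype remission estimates updated from registry feedback (illustrative sketch)."""
from dataclasses import dataclass, field
from typing import Dict, Tuple

Phenotype = Tuple[str, ...]


@dataclass
class BetaEstimate:
    """Beta(alpha, beta) posterior for the probability of remission."""

    alpha: float = 1.0  # Beta(1, 1), i.e. uniform, prior: an illustrative assumption
    beta: float = 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, remissions: int, non_remissions: int) -> None:
        # Conjugate Beta-Binomial update: remissions add to alpha, non-remissions to beta.
        if remissions < 0 or non_remissions < 0:
            raise ValueError("counts must be non-negative")
        self.alpha += remissions
        self.beta += non_remissions


@dataclass
class LivingEstimateStore:
    """Hypothetical store of (phenotype, therapy) estimates. `ingest` returns True
    when a batch of new outcomes moves the posterior mean by more than `trigger`,
    i.e. when the estimate should be flagged for panel review."""

    trigger: float = 0.05  # hypothetical review threshold on the absolute change
    estimates: Dict[Tuple[Phenotype, str], BetaEstimate] = field(default_factory=dict)

    def ingest(self, phenotype: Phenotype, therapy: str,
               remissions: int, non_remissions: int) -> bool:
        est = self.estimates.setdefault((phenotype, therapy), BetaEstimate())
        before = est.mean
        est.update(remissions, non_remissions)
        return abs(est.mean - before) > self.trigger

    def probability(self, phenotype: Phenotype, therapy: str) -> float:
        return self.estimates[(phenotype, therapy)].mean


if __name__ == "__main__":
    # Toy counts, invented only to show the update-and-flag cycle.
    store = LivingEstimateStore(trigger=0.05)
    ph = ("obese", "non-atopic", "eosinophilic", "very late-onset")
    for batch in [(17, 3), (1, 1), (2, 8)]:
        flagged = store.ingest(ph, "anti-IL5", *batch)
        print(batch, round(store.probability(ph, "anti-IL5"), 3),
              "review" if flagged else "no change")
```

Pooling raw counts ignores confounding, so a registry model of the kind proposed would rely on the causal designs listed in Table 1 rather than on this estimate. The sketch only shows how new outcomes can update a phenotype-level estimate and trigger panel review instead of silently changing a recommendation.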

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Gangemi, S.; Allegra, A.; Di Gioacchino, M.; Gammeri, L.; Cacciola, I.; Canonica, G.W. The Innovative Potential of Artificial Intelligence Applied to Patient Registries to Implement Clinical Guidelines. Mach. Learn. Knowl. Extr. 2026, 8, 38. https://doi.org/10.3390/make8020038
