Article

SemFedXAI: A Semantic Framework for Explainable Federated Learning in Healthcare

by Alba Amato 1,* and Dario Branco 2

1 Department of Political Science, University of Campania “L. Vanvitelli”, 81100 Caserta, Italy
2 Department of Engineering, University of Campania “L. Vanvitelli”, 81031 Aversa, Italy
* Author to whom correspondence should be addressed.
Information 2025, 16(6), 435; https://doi.org/10.3390/info16060435
Submission received: 26 April 2025 / Revised: 20 May 2025 / Accepted: 22 May 2025 / Published: 25 May 2025

Abstract

Federated Learning (FL) is emerging as a promising paradigm for AI model training in healthcare, enabling collaboration among institutions without revealing sensitive information. However, the lack of transparency in federated models complicates their deployment in healthcare settings, where understanding the decision process is of primary importance. This paper introduces SemFedXAI, a new framework that combines Semantic Web technologies and federated learning to improve the explainability of artificial intelligence models in healthcare. SemFedXAI extends traditional FL architectures with three key components: (1) Ontology-Enhanced Federated Learning, which enriches models with domain knowledge; (2) a Semantic Aggregation Mechanism, which uses semantic technologies to improve the consistency and interpretability of federated models; and (3) a Knowledge Graph-Based Explanation component, which provides contextualized explanations of model decisions. We evaluated SemFedXAI in an e-health case study, observing notable improvements in explanation quality and predictive performance compared to conventional federated learning methods. The findings highlight the potential of combining semantic technologies and federated learning to build more explainable and resilient AI systems in healthcare.

1. Introduction

Artificial Intelligence (AI) is rapidly transforming the healthcare field, opening new avenues to improve diagnosis, treatment, and disease management. However, the deployment of AI technologies in healthcare settings is hindered by two main challenges: data privacy and the lack of decision transparency. Data privacy is a chief concern, as health data are extremely sensitive and governed by regulations such as the General Data Protection Regulation (GDPR) in Europe. Healthcare institutions are reluctant to exchange data with each other, limiting the amount of information available to build accurate AI models. Federated learning (FL) has emerged as a revolutionary technique to overcome this obstacle by enabling collaborative training of a shared model while protecting the privacy of raw data [1,2]. In the FL paradigm, models are trained separately at each institution, and parameter updates are sent to a centralized server that aggregates them into a holistic global model [3]. The second main challenge lies in transparent decision-making. AI models, especially those based on deep learning, tend to be “black boxes”: their complexity and opaque decision processes make them difficult to understand. This lack of transparency is a crucial issue in healthcare, where professionals need to understand and trust AI-driven suggestions in order to adopt them within clinical protocols [4]. Explainable AI (XAI) aims to address this by designing methods that make AI algorithms more understandable and interpretable [5]. Both FL and XAI have attracted increasing academic interest, but their integration remains an open problem, particularly in healthcare, where both aspects are of the utmost relevance. In particular, the construction of coherent and understandable explanations in a federated environment is hindered by the distributed learning process and the heterogeneity of data across institutions. In this work, we propose SemFedXAI, a novel framework that combines Semantic Web technologies with federated learning to make artificial intelligence models more understandable in the healthcare field. The premise of SemFedXAI is that semantic technologies, such as ontologies and knowledge graphs, can provide a structured foundation for encoding and reasoning over domain knowledge, allowing more informative and contextualized explanations to be generated. Our framework extends traditional FL architectures with three key components:
  • Ontology-Enhanced Federated Learning: Integrates medical ontologies into the federated learning process to enrich models with domain knowledge.
  • Semantic Aggregation Mechanism: Uses semantic technologies to improve the consistency and interpretability of federated models during the aggregation process.
  • Knowledge Graph-Based Explanation: Provides contextualized explanations of model decisions based on knowledge graphs.
We implemented SemFedXAI by extending the fedbench framework [6] and evaluated it through a healthcare case study, demonstrating significant improvements in both explanation quality and predictive accuracy compared to traditional FL approaches. The main contributions of this paper are a new framework that combines Semantic Web technologies with federated learning to improve the explainability of artificial intelligence models in healthcare; a semantic aggregation algorithm that utilizes ontologies and knowledge graphs to produce consistent and interpretable federated models; a knowledge graph-based explanation technique that provides contextualized interpretations of model decisions in a federated environment; and an open-source implementation of the SemFedXAI framework, empirically evaluated in a healthcare setting. The rest of this paper is organized as follows: Section 2 reviews the relevant literature on federated learning, explainable AI, and the Semantic Web; Section 3 describes the SemFedXAI framework in detail; Section 4 presents the experimental design and findings; Section 5 discusses the limitations and implications of our approach; and Section 6 concludes the paper and outlines directions for future work.

2. Related Work

2.1. Federated Learning in Healthcare

Federated learning (FL) is a distributed learning paradigm that enables the collaborative training of models without sharing raw data. The paradigm is well suited to healthcare applications, where data privacy is particularly important. McMahan et al. [7] introduced FedAvg, which computes a weighted average of local model parameters to obtain a global model. Subsequently, several modifications have been proposed to address specific problems such as data heterogeneity [8], communication efficiency [9], and robustness to attacks [10]. In healthcare, FL has been applied to various problems such as adverse event prediction [11], disease diagnosis [12], and personalized treatment [13]. Rieke et al. [14] provided a comprehensive review of FL applications in healthcare, discussing opportunities and challenges. Despite this progress, the lack of explainability of federated models remains a significant impediment to their adoption in clinical practice.

2.2. Explainable AI and Interpretable Models

Explainable AI (XAI) aims to make AI models more interpretable and understandable [15]. XAI techniques fall into two general types: intrinsically interpretable techniques and post hoc explanation techniques [16]. Intrinsically interpretable techniques include models such as decision trees, association rules, and fuzzy systems, which are transparent by design [17]. They are highly interpretable, but often at the expense of predictive performance, especially on complex data. Post hoc explanation techniques, on the other hand, attempt to interpret the decisions of “black box” models after training. They encompass feature-importance-based approaches such as LIME [18] and SHAP [19], which attribute an importance to each input feature for a single prediction. Other approaches include example-based explanations [20], counterfactual explanations [21], and activation visualizations [22]. In medicine, XAI is especially crucial for earning clinicians’ trust and enabling the integration of AI into everyday practice [23]. The majority of current XAI methods, however, are tailored to centralized models and do not specifically tackle the inherent difficulties of federated environments.

2.3. Semantic Web and Knowledge Management in Healthcare

The Semantic Web provides a model for representing and integrating knowledge according to standards such as RDF, OWL, and SPARQL [24]. In the health sector, semantic technologies have been widely used to represent medical knowledge through ontologies such as SNOMED CT [25], the Gene Ontology [26], and ICD-10 [27]. Knowledge graphs, which represent entities and relations in graph form, have become powerful tools for integrating heterogeneous knowledge and reasoning over it [28]. In healthcare, knowledge graphs have been employed to support diagnosis [29], detect drug interactions [30], and personalize treatments [31]. There has been growing interest in combining semantic technologies with AI to improve the interpretability and transparency of models [32]. Methods such as neurosymbolic models [33] and knowledge-enhanced learning [34] aim to combine the predictive power of deep learning with the reasoning capabilities of symbolic representations.

2.4. Integration of FL, XAI, and Semantic Technologies

While progress has been made in the individual domains of FL, XAI, and the Semantic Web, their integration remains an emerging research area. Some recent works have explored the integration of FL and XAI [35,36] without exploring the promise of semantic technologies. Ducange et al. [37] explored federated learning of XAI models in healthcare, comparing fuzzy rule-based systems (interpretable by design) and neural networks with post hoc explanations; however, they did not consider the use of ontologies or knowledge graphs to improve explanations. Corcuera Bárcena et al. [38] proposed an FL-as-a-Service architecture for beyond-5G/6G networks with integrated explainability mechanisms, but without a focus on semantic technologies or healthcare applications. Our contribution differs from the preceding ones in its synergistic combination of FL, XAI, and semantic technologies within a single framework aimed at enhancing the explainability of AI models in healthcare. SemFedXAI is, to the best of our knowledge, the first framework to leverage ontologies and knowledge graphs to enhance both federated model fusion and explanation generation in a healthcare setting.

3. The SemFedXAI Framework

3.1. General Architecture

SemFedXAI extends the traditional client–server architecture of federated learning with both server-side and client-side semantic components. Figure 1 illustrates the general architecture of the framework.
On the server side, SemFedXAI includes three main components. The Semantic Aggregator aggregates local models using semantic techniques to improve the consistency and interpretability of the global model. The Knowledge Graph Manager maintains the global knowledge graph that integrates domain knowledge with model information and explanations. The Explanation Generator produces global, knowledge graph-based explanations for the federated model’s predictions. On the client side, SemFedXAI includes three complementary components. The Local Knowledge Enhancer enriches local data with semantic knowledge to improve training quality. The Ontology-Aware Model Trainer trains local models that incorporate ontological knowledge to improve their interpretability. Finally, the Local Explainer generates contextualized local explanations for local model predictions.
Figure 2 provides an overview of the SemFedXAI workflow as a sequence diagram. The steps can be summarized as follows (a minimal code sketch of one round follows the list):
  1. The server creates the global model and sends it to the clients;
  2. Each client enriches its dataset with semantic knowledge through the Local Knowledge Enhancer;
  3. Clients train local models using the Ontology-Aware Model Trainer;
  4. Clients generate local explanations using the Local Explainer;
  5. Clients send local model parameters and explanations to the server;
  6. The server aggregates the local models using the Semantic Aggregator;
  7. The server updates the global knowledge graph with the new information using the Knowledge Graph Manager;
  8. The server generates global explanations using the Explanation Generator;
  9. The server sends the updated global model to the clients;
  10. Steps 1–9 are repeated for a predefined number of rounds.
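To make the round structure concrete, the following is a minimal Python sketch of steps 1–9, using hypothetical component objects named after the architecture in Figure 1; the actual fedbench-based implementation may organize this differently.

```python
# A minimal sketch of one SemFedXAI round, assuming hypothetical component
# objects named after the architecture in Figure 1.
def run_round(server, clients):
    global_model = server.get_global_model()                          # step 1
    updates = []
    for client in clients:
        data = client.local_knowledge_enhancer.enrich(client.dataset)    # step 2
        model = client.ontology_aware_trainer.train(global_model, data)  # step 3
        expl = client.local_explainer.explain(model, data)               # step 4
        updates.append((model.get_weights(), expl, client.metrics))      # step 5
    new_weights = server.semantic_aggregator.aggregate(updates)          # step 6
    server.knowledge_graph_manager.update(updates)                       # step 7
    server.explanation_generator.generate_global(updates)                # step 8
    server.distribute_global_model(new_weights)                          # step 9

# One call per federated round (step 10):
# for _ in range(num_rounds):
#     run_round(server, clients)
```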

3.2. Ontology-Enhanced Federated Learning

Ontology-Enhanced Federated Learning extends the basic federated learning architecture by incorporating knowledge expressed in medical ontologies. This component operates mainly on the client side, where the Local Knowledge Enhancer relies on ontologies to enrich data before the training stage. Semantic data enrichment proceeds in several steps. The first stage, feature mapping, maps data features to concepts defined in the medical ontology; for example, the feature “systolic_blood_pressure” is linked to the ontology concept “SystolicPressure”. The process then proceeds to relation extraction, which identifies relationships between ontology concepts that clarify the semantic connections between features; for example, the ontology may state that “SystolicPressure” is related to “Hypertension”. Next, semantic feature generation leverages the identified relationships to construct new, semantically meaningful features; for example, a feature combining “systolic_blood_pressure” and “diastolic_blood_pressure” can be generated based on the relationship the ontology defines between them. The final stage is integration into the training data: the Ontology-Aware Model Trainer uses the enriched dataset to build local models that incorporate domain knowledge. This approach not only improves prediction accuracy but also enhances explainability, since semantic features maintain a direct relationship with domain concepts.
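As an illustration of the enrichment steps, the following is a minimal sketch of a Local Knowledge Enhancer, assuming a hypothetical `ontology` object with a `related_concepts` helper; the feature-to-concept mapping and the derived pulse-pressure feature are illustrative choices, not necessarily those used in the paper.

```python
import pandas as pd

# Illustrative mapping from dataset features to ontology concepts (step 1).
FEATURE_TO_CONCEPT = {
    "systolic_blood_pressure": "SystolicPressure",
    "diastolic_blood_pressure": "DiastolicPressure",
}

def enrich(df: pd.DataFrame, ontology) -> pd.DataFrame:
    enriched = df.copy()
    # Step 2: relation extraction, e.g., SystolicPressure -> {"Hypertension"}.
    relations = {
        concept: ontology.related_concepts(concept)  # hypothetical helper
        for concept in FEATURE_TO_CONCEPT.values()
    }
    enriched.attrs["concept_relations"] = relations  # kept for explanations
    # Step 3: semantic feature generation, e.g., a pulse-pressure feature
    # derived from the ontological link between the two pressure concepts.
    enriched["pulse_pressure"] = (
        df["systolic_blood_pressure"] - df["diastolic_blood_pressure"]
    )
    # Step 4: the enriched frame feeds the Ontology-Aware Model Trainer.
    return enriched
```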

3.3. Semantic Aggregation Mechanism

The Semantic Aggregation Mechanism extends the traditional FedAvg algorithm [7] by incorporating semantic information into the aggregation process. While FedAvg computes a weighted average of the local model parameters, our approach modifies the aggregation weights based on the semantic relevance of the models. The semantic aggregation algorithm operates by receiving local model parameters and associated metrics such as accuracy and loss, computing initial aggregation weights based on these metrics, similar to FedAvg. These weights are then enriched with semantic information extracted from the knowledge graph. After normalization of the enriched weights, the algorithm aggregates the model parameters using these weights.
Semantic enrichment of weights is based on several factors, including concept relevance, where models that have learned semantically relevant features for the problem at hand receive a greater weight; consistency with domain knowledge, where models whose predictions are more consistent with the domain knowledge represented in the ontology receive a greater weight; and semantic diversity, ensuring that models that capture semantically different aspects of the problem receive weights that reflect their complementarity. Formally, the semantically enriched weight for client i is defined as
$w_i^{sem} = w_i \cdot (1 + \alpha \cdot SR_i) \cdot (1 + \beta \cdot DC_i) \cdot \gamma^{SD_i}$
where the variables are defined as follows:
  • $w_i$ is the original metric-based weight;
  • $SR_i$ is the semantic relevance of model $i$;
  • $DC_i$ is the domain knowledge consistency of model $i$;
  • $SD_i$ is the semantic diversity of model $i$;
  • $\alpha$, $\beta$, and $\gamma$ are hyperparameters that control the influence of each factor.
This approach improves the predictive capacity of the global model while coherently incorporating semantic knowledge and increasing its interpretability.
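As an illustrative numerical example (with arbitrarily chosen values): given $w_i = 0.25$, $SR_i = 0.8$, $DC_i = 0.6$, $SD_i = 0.4$, and hyperparameters $\alpha = \beta = 0.5$, $\gamma = 0.9$, the enriched weight is $w_i^{sem} = 0.25 \cdot (1 + 0.4) \cdot (1 + 0.3) \cdot 0.9^{0.4} \approx 0.436$, which is then normalized jointly with the other clients’ enriched weights before aggregation.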

3.4. Knowledge Graph-Based Explanation

The Knowledge Graph-Based Explanation component generates contextualized interpretations of model outputs by leveraging a global knowledge graph. It operates at both the server level (Explanation Generator) and the client level (Local Explainer). Explanation generation starts with feature importance calculation, applying methods such as SHAP [19] to evaluate the contribution of each feature to a particular prediction. Semantic mapping then aligns the features and their importance scores with concepts in the knowledge graph. Domain knowledge enrichment augments the explanations with information drawn from the knowledge graph, such as relationships to other concepts, definitions, and contextual information. Finally, multi-level explanation generation produces explanations at varying levels of granularity, ranging from simple lists of key features to rich, contextualized narratives drawn from the knowledge graph. At the client level, the Local Explainer generates contextualized explanations for local model predictions based on the local knowledge graph. These explanations, together with the model parameters, are communicated to the server. At the server level, the Explanation Generator aggregates the local explanations and refines them by integrating them with the global knowledge graph. This approach guarantees that the explanations are aligned with domain expertise and understandable to end users. A salient feature of our method is that it can generate explanations that not only highlight the importance of features but also convey the semantic relationships between features and domain concepts. For example, in addition to stating that “systolic blood pressure” is a key feature in predicting “cardiovascular risk”, the explanation can describe how this relationship is grounded in the medical knowledge linking hypertension to cardiovascular disease.
The global knowledge graph operates exclusively on aggregated and anonymized data. It does not store any raw or patient-identifiable information. Instead, it encodes abstract concepts and semantic relationships—such as disease–symptom–treatment patterns—derived from standardized medical ontologies (e.g., SNOMED CT). These serve as semantic support for the explainability mechanisms. Likewise, local explanations are built on general ontological concepts and on model-derived elements such as the aggregate importance of features. This strategy ensures that neither the local nor the global explanations expose any individual-level data, maintaining compliance with privacy constraints in healthcare applications.
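A minimal sketch of this pipeline follows, assuming a scalar-output risk model, an illustrative namespace URI, and a hypothetical feature-to-concept mapping; the actual Explanation Generator may differ.

```python
import shap
from rdflib import Graph, Namespace

# Illustrative namespace for the medical ontology (assumed placeholder URI).
ONT = Namespace("http://example.org/semfedxai#")

def explain_prediction(model, background, sample, feature_names,
                       kg: Graph, feature_to_concept: dict) -> str:
    # Feature importance: SHAP values for one prediction. `model` is assumed
    # to be a callable returning a scalar risk score per sample.
    explainer = shap.Explainer(model, background)
    shap_values = explainer(sample)
    lines = []
    for name, value in zip(feature_names, shap_values.values[0]):
        concept = feature_to_concept.get(name)
        if concept is None:
            continue  # unmapped features are skipped in this sketch
        # Domain knowledge enrichment: pull directly related concepts,
        # e.g., SystolicPressure -> Hypertension -> CardiovascularDisease.
        rows = kg.query("SELECT ?o WHERE { ?s ?p ?o }",
                        initBindings={"s": ONT[concept]})
        context = ", ".join(str(r.o).rsplit("#", 1)[-1] for r in rows)
        lines.append(f"{name} (SHAP {float(value):+.3f}) -> {concept}: {context}")
    return "\n".join(lines)
```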

3.5. Implementation

The SemFedXAI architecture was implemented by extending the fedbench framework [6], a client–server architecture for federated learning. The implementation was written in Python (v3.10.12), using TensorFlow for model development, RDFLib for manipulating the knowledge graphs, and SHAP for generating explanations. The architecture follows a modular design that allows individual components to be replaced or upgraded; for example, different medical ontologies, aggregation algorithms, or explanation methods can be plugged in as needed. Algorithm 1 outlines the implementation of the Semantic Aggregator:
Algorithm 1: Semantic aggregator for federated learning
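The following is a hedged Python reconstruction of the aggregation logic, based on the description in Section 3.3; the three scoring helpers are placeholder stand-ins for the knowledge-graph computations, not the framework’s actual code.

```python
import numpy as np

# Placeholder semantic scores in [0, 1]; the real framework derives these
# from knowledge-graph queries (Section 3.3).
def semantic_relevance(kg, metrics):   # SR_i
    return metrics.get("semantic_relevance", 0.0)

def domain_consistency(kg, metrics):   # DC_i
    return metrics.get("domain_consistency", 0.0)

def semantic_diversity(kg, metrics):   # SD_i
    return metrics.get("semantic_diversity", 0.0)

def semantic_aggregate(local_params, client_metrics, kg,
                       alpha=0.5, beta=0.5, gamma=0.9):
    # Initial weights from client metrics (e.g., validation accuracy).
    w = np.array([m["accuracy"] for m in client_metrics], dtype=float)
    w /= w.sum()
    # Semantic enrichment of the weights (equation in Section 3.3).
    sr = np.array([semantic_relevance(kg, m) for m in client_metrics])
    dc = np.array([domain_consistency(kg, m) for m in client_metrics])
    sd = np.array([semantic_diversity(kg, m) for m in client_metrics])
    w = w * (1 + alpha * sr) * (1 + beta * dc) * (gamma ** sd)
    w /= w.sum()  # renormalize the enriched weights
    # Weighted, layer-wise aggregation of the local model parameters.
    return [np.sum([w_i * layer for w_i, layer in zip(w, layers)], axis=0)
            for layers in zip(*local_params)]
```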

4. Experiments and Results

4.1. Experimental Setup

To evaluate SemFedXAI’s efficacy, we conducted experiments within a healthcare case study, using synthetic data created to simulate realistic conditions for federated learning in healthcare settings.

4.1.1. Dataset

Synthetic data were generated to simulate the medical data of patients.
The choice to use synthetic data was mainly motivated by the need to (a) simulate a multi-client federated learning environment with specific characteristics (data heterogeneity and non-IID distribution) in a controlled way for a rigorous evaluation of the SemFedXAI framework, and (b) circumvent the stringent privacy and availability limitations associated with the use of real health data, especially in the early validation phase of the framework. The data were generated to reflect the typical characteristics of clinical data related to patients with different cardiovascular risk profiles, including 20 relevant variables (such as age, blood pressure, and glucose levels) and three risk classes (healthy, medium risk, and high risk). A total of 10,000 samples were generated and distributed across five clients, with class imbalance and data heterogeneity introduced through non-IID partitioning. To simulate realistic scenarios of data heterogeneity across different healthcare institutions, we partitioned the synthetic dataset among the five clients using a Dirichlet distribution applied to class labels. This approach is widely adopted to simulate non-IID (non-independent and identically distributed) data in federated settings. The degree of heterogeneity is controlled by the alpha parameter: lower values of alpha (e.g., 0.1) lead to highly skewed distributions where clients receive samples from only a few classes, while higher values (e.g., 10.0) approximate a more uniform, IID-like setting.
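A minimal sketch of such label-based Dirichlet partitioning follows; this is the standard recipe and may differ in detail from the generator used in our experiments.

```python
import numpy as np

def dirichlet_partition(labels, n_clients=5, alpha=5.0, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        rng.shuffle(idx)
        # Per-client share of this class; low alpha -> highly skewed shares.
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

# Example: 10,000 samples over 3 risk classes, split across 5 clients.
labels = np.random.default_rng(0).integers(0, 3, size=10_000)
parts = dirichlet_partition(labels, n_clients=5, alpha=0.5)
```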
We performed a series of experiments varying alpha in {0.1, 0.5, 1.0, 5.0, 10.0} to evaluate the robustness of SemFedXAI under different non-IID conditions. The results, summarized in Table 1, showed that even in the presence of high heterogeneity, the framework maintained acceptable performance.
These results confirm that SemFedXAI exhibits good stability and convergence across various degrees of non-IID data distributions.

4.1.2. Medical Ontology

A domain-specific ontology in the OWL format was designed to define concepts such as disease, symptom, treatment, and measurement and their relationships. This ontology includes mappings that relate characteristics of datasets to medical concepts and that lay a semantic basis for data enrichment and explanation generation.
It integrates domain-specific concepts with concepts and relationships drawn from standard clinical ontologies such as SNOMED CT and ICD-10, adapted for semantic enrichment and explanation purposes within the SemFedXAI framework. The ontology defines core classes such as Disease, Symptom, Measurement, Treatment, and RiskFactor and includes object properties like hasRiskFactor, hasSymptom, and isTreatedWith to represent clinical relationships. A representative example from the ontology is the following triple:
Hypertension    hasRiskFactor    HighSystolicPressure
This structure supports semantic reasoning and alignment between data features and medical knowledge.
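For illustration, the following RDFLib sketch builds this ontology fragment; the namespace URI is an assumed placeholder, not the one used in the paper.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF

ONT = Namespace("http://example.org/semfedxai#")  # placeholder URI
g = Graph()
g.bind("ont", ONT)

# Core classes: Disease, Symptom, Measurement, Treatment, RiskFactor.
for cls in ("Disease", "Symptom", "Measurement", "Treatment", "RiskFactor"):
    g.add((ONT[cls], RDF.type, OWL.Class))

# Object property and the representative triple from the text:
# Hypertension hasRiskFactor HighSystolicPressure.
g.add((ONT.hasRiskFactor, RDF.type, OWL.ObjectProperty))
g.add((ONT.Hypertension, RDF.type, ONT.Disease))
g.add((ONT.HighSystolicPressure, RDF.type, ONT.RiskFactor))
g.add((ONT.Hypertension, ONT.hasRiskFactor, ONT.HighSystolicPressure))

print(g.serialize(format="turtle"))
```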

4.1.3. Models and Configuration

We compared four approaches:
  • FedAvg: The standard federated averaging algorithm [7], without any semantics or explainability components;
  • FedXAI: A federated learning framework that includes post hoc interpretability using SHAP [19], but without semantic elements;
  • SemFed: A federated learning framework integrating semantic aspects, namely, Ontology-Enhanced Federated Learning and Semantic Aggregation, with optional post hoc explainability but without explicit integration of semantic explanations;
  • SemFedXAI: Our overall approach that captures all aspects presented in Section 3.
For all approaches, we used a neural network with the same architecture: two hidden layers with 128 and 64 neurons, respectively; ReLU activation; and 20% dropout. Training was performed for 100 federated rounds, with five local epochs per round, using the Adam optimizer with a learning rate of 0.001 and a batch size of 32.
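A sketch of this shared architecture in TensorFlow/Keras follows; the dropout placement after each hidden layer and the loss function are our assumptions, since the text specifies only the layer sizes, activation, dropout rate, optimizer, learning rate, and batch size.

```python
import tensorflow as tf

def build_model(n_features: int, n_classes: int = 3) -> tf.keras.Model:
    """Shared network: 128- and 64-unit hidden layers, ReLU, 20% dropout."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),  # assumed after each hidden layer
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="sparse_categorical_crossentropy",  # assumed loss
        metrics=["accuracy"],
    )
    return model

# Local training per round: model.fit(x, y, epochs=5, batch_size=32)
```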

4.1.4. Evaluation Metrics

Several measures were used to evaluate the approaches:
  • Predictive accuracy: the ability of a model to correctly forecast the outcome of novel data instances;
  • Explanation quality: accuracy in identifying the relevant features of the synthetic dataset;
  • Explanation comprehensibility: an indicator of the domain concepts and semantic relations occurring in the explanations;
  • Computational overhead: the running time and memory usage of the various components.

4.2. Results

4.2.1. Predictive Accuracy

Table 2 shows the predictive accuracy of the four approaches on the test dataset. These results correspond to a scenario with Dirichlet α = 5.0 , representing a moderate level of non-IID distribution across clients.
SemFedXAI achieved the highest accuracy (85.3%), followed by SemFed (82.1%), FedXAI (76.4%), and FedAvg (73.5%). These results demonstrate that integrating semantic components improves predictive accuracy, likely due to enriching the data with domain knowledge and semantic aggregation that promotes more consistent models.

4.2.2. Explanation Quality

Figure 3 shows the accuracy of the three approaches that include explainability components (FedXAI, SemFed with post hoc explanations, and SemFedXAI) in identifying important features.
SemFedXAI achieved the highest accuracy (0.85), followed by FedXAI (0.70) and SemFed with post hoc explanations (0.65). These results indicate that integrating semantic components significantly improves the quality of explanations, helping the model to identify the truly medically important features.

4.2.3. Explanation Comprehensibility

Figure 4 shows the explanation comprehensibility scores for the three approaches with explainability components.
SemFedXAI achieved the highest comprehensibility score (8.7/10), followed by SemFed with post hoc explanations (6.2/10) and FedXAI (4.5/10). These results demonstrate that knowledge graph-based explanations are significantly more understandable to users, due to the inclusion of domain concepts and semantic relations.

4.2.4. Computational Overhead

Table 3 shows the computational overhead of the four approaches in terms of execution time and memory usage. These measurements were obtained using a lightweight experimental setup to simulate constrained execution environments.
As expected, SemFedXAI has the highest computational overhead, with a 35% increase in execution time and a 28% increase in memory usage compared to FedAvg. However, considering the significant improvements in accuracy and explainability, we believe this overhead is acceptable for many healthcare applications where decision transparency is crucial.

4.3. Qualitative Analysis of Explanations

In addition to the quantitative metrics, we conducted a qualitative analysis of the explanations generated by the different approaches. Figure 5 shows examples of explanations for a “high risk” prediction.
The example clearly illustrates the difference in richness and contextualization of the explanations. FedXAI provides a simple list of important features along with their SHAP values. SemFed, which incorporates post hoc explanations, adds some semantic information, although in a limited way. In contrast, SemFedXAI delivers a rich explanation that not only highlights the importance of features but also contextualizes them through their relationships to medical concepts, supported by references to the underlying ontology. For example, while FedXAI simply indicates that “systolic blood pressure = 160” is an important feature, SemFedXAI explains that “systolic blood pressure = 160 is significantly above the normal range (90–120 mmHg) and indicates hypertension, which is a known risk factor for cardiovascular disease according to the medical ontology”. This semantic richness makes SemFedXAI’s explanations much more useful for healthcare professionals, who can easily connect the model’s predictions to their medical knowledge.

5. Discussion

5.1. Implications

The findings of our research demonstrate that the integration of semantic technologies and federated learning improves the accuracy and the interpretability of artificial intelligence models in the field of healthcare. This has several significant implications, including the following:
  • Increased trust: Richer and more exhaustive explanations provide a means to build healthcare professionals’ trust in AI systems, thus ensuring increased adoption within clinical settings.
  • Clinical decision-making support: Explanations based on knowledge graphs provide essential information to stakeholders that can enhance clinical decision-making by linking predictions made by the models to established medical knowledge.
  • Compliance with regulations: The proposed approach can ensure compliance with regulations like GDPR that require protection of personal data to be combined with algorithmic explainability.
  • Semantic interoperability: The use of standard ontologies can improve interoperability between heterogeneous AI systems within the healthcare sector, enabling more efficient integration and exchange of knowledge.

5.2. Limitations and Challenges

Despite the positive results, our approach is accompanied by several limitations and challenges that require further investigation:
  • Computational overhead: As outlined in Section 4.2.4, SemFedXAI demands significant computational resources. Algorithmic improvements and more efficient implementations could alleviate this cost.
  • Scalability: The computational experiments carried out in this work were executed with a limited number of clients and features. The scalability of the presented methodology in settings with numerous clients or large features is a direction that deserves further investigation.
  • Ontology quality: The effectiveness of SemFedXAI depends largely on the quality and completeness of the underlying medical ontology. Defective or incomplete ontologies can lead to incorrect explanations.
  • Evaluation with real users: Although we tested readability using quantitative metrics, experimental testing with real healthcare professionals is required to determine the real-world effectiveness of the produced explanations.
  • Privacy of explanations: The explanations provided have an inherent risk of leaking personal information related to the training data. Further work aimed at preserving the privacy of such explanations is needed.

5.3. Comparison with Alternative Approaches

It is important to compare SemFedXAI with alternative approaches for integrating FL and XAI:
  • Inherently interpretable models: An alternative approach includes the deployment of models based on intrinsic interpretability, such as fuzzy systems or decision trees, in a federated system [37]. Even with some level of transparency, such models tend to compromise prediction accuracy when faced with intricate datasets.
  • Local vs. global explanations: SemFedXAI generates both local (client-level) and global (server-level) explanations. Alternative approaches could focus only on one of the two levels, sacrificing either local customization or global consistency of explanations.
  • Neurosymbolic approaches: Recent work on neurosymbolic models [33] offers an interesting alternative for integrating symbolic knowledge into deep learning models. However, their application in federated contexts remains largely unexplored.
SemFedXAI stands out from these approaches for its synergistic integration of FL, XAI, and semantic technologies into a unified framework, specifically designed to improve the explainability of AI models in healthcare.

6. Conclusions and Future Work

6.1. Conclusions

In this paper, we presented SemFedXAI, a novel framework that integrates Semantic Web technologies with federated learning to improve the explainability of AI models in healthcare. SemFedXAI extends traditional FL architectures with three key components: Ontology-Enhanced Federated Learning, a Semantic Aggregation Mechanism, and Knowledge Graph-Based Explanation. Our experiments using a healthcare case study demonstrated that SemFedXAI significantly improves predictive accuracy, explanation quality, and explanation understandability compared to traditional approaches. The explanations generated by SemFedXAI are rich in semantic context, linking the model’s predictions to established medical knowledge represented in ontologies. These results highlight the potential of integrating semantic technologies into federated learning to develop more transparent and reliable technologies for use in healthcare, thus facilitating their adoption in clinical practice. SemFedXAI enables the development of explainable AI models that respect data privacy while providing semantically enriched and interpretable outputs aligned with medical knowledge. This framework is particularly useful in clinical decision support scenarios, where trust and transparency are essential. By allowing each institution to maintain control over its data while contributing to a shared model enriched by ontological knowledge, SemFedXAI enables collaboration without compromising confidentiality, a crucial requirement in real-world healthcare implementations.

6.2. Future Work

Several promising directions for future work emerge from this research. One potential avenue is the extension of SemFedXAI to real-world healthcare case studies, utilizing actual clinical data to assess its performance in practical scenarios. Although synthetic data were used in this study to enable controlled experimentation, future evaluations will target realistic datasets to assess clinical transferability. In particular, we plan to use the MIMIC-III dataset, a widely adopted critical care database, to test SemFedXAI in a real-world medical setting. For more lightweight simulations and baseline comparisons, the UCI Heart Disease dataset may also serve as a reference. These datasets will help us assess both the robustness and the practical relevance of the framework with real data distributions and constraints.
Another direction involves integrating the framework with standardized medical ontologies such as SNOMED CT, ICD-10, and LOINC to enhance interoperability with existing healthcare systems. Developing methods for multimodal explanations that combine different data types—such as text, images, and signals—in a federated context is also a key area of interest. Additionally, the application of differential privacy techniques to the generated explanations should be explored to ensure the protection of sensitive information. Supporting continuous learning scenarios, in which new data and knowledge are incrementally incorporated into the system, represents another promising extension. Engaging healthcare professionals in end-user evaluations will be essential for assessing the practical usefulness of the generated explanations and for collecting feedback to guide further improvements. Lastly, efforts should be directed toward computational optimizations, aiming to develop more efficient implementations of the semantic components and enhance the scalability of the overall framework.

Author Contributions

Conceptualization, A.A. and D.B.; methodology, A.A. and D.B.; software, A.A. and D.B.; validation, A.A. and D.B.; formal analysis, A.A. and D.B.; investigation, A.A. and D.B.; resources, A.A. and D.B.; data curation, A.A. and D.B.; writing—original draft preparation, A.A. and D.B.; writing—review and editing, A.A. and D.B.; visualization, A.A. and D.B.; supervision, A.A. and D.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–19. [Google Scholar] [CrossRef]
  2. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210. [Google Scholar] [CrossRef]
  3. Yurdem, B.; Kuzlu, M.; Gullu, M.K.; Catak, F.O.; Tabassum, M. Federated learning: Overview, strategies, applications, tools and future directions. Heliyon 2024, 10, e38137. [Google Scholar] [CrossRef] [PubMed]
  4. Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; Müller, H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1312. [Google Scholar] [CrossRef]
  5. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  6. Branco, D. Fedbench: Federated Learning Clients/Server Desktop Implementation. GitHub Repository, 2023. Available online: https://github.com/vanvitellicode/fedbench (accessed on 6 April 2025).
  7. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
  8. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
  9. Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar]
  10. Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How to backdoor federated learning. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, Palermo, Italy, 26–28 August 2020; pp. 2938–2948. [Google Scholar]
  11. Brisimi, T.S.; Chen, R.; Mela, T.; Olshevsky, A.; Paschalidis, I.C.; Shi, W. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 2018, 112, 59–67. [Google Scholar] [CrossRef]
  12. Liu, D.; Miller, T.; Sayeed, R.; Mandl, K.D. FADL: Federated-autonomous deep learning for distributed electronic health record. In Proceedings of the Machine Learning for Healthcare Conference, Virtual, 6–7 August 2021; pp. 811–837. [Google Scholar]
  13. Huang, L.; Shea, A.L.; Qian, H.; Masurkar, A.; Deng, H.; Liu, D. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 2019, 99, 103291. [Google Scholar] [CrossRef]
  14. Rieke, N.; Hancox, J.; Li, W.; Milletarì, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. NPJ Digit. Med. 2020, 3, 119. [Google Scholar] [CrossRef]
  15. Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G.Z. XAI—Explainable artificial intelligence. Sci. Robot. 2019, 4, eaay7120. [Google Scholar] [CrossRef]
  16. Molnar, C. Interpretable Machine Learning; Lulu.Com: Morrisville, NC, USA, 2020. [Google Scholar]
  17. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef]
  18. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  19. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017; pp. 4765–4774. [Google Scholar]
  20. Cai, C.J.; Winter, S.; Steiner, D.; Wilcox, L.; Terry, M. “Hello AI”: Uncovering the onboarding needs of medical practitioners for human-AI collaborative decision-making. Proc. ACM Hum. Comput. Interact. 2019, 3, 1–24. [Google Scholar] [CrossRef]
  21. Wachter, S.; Mittelstadt, B.; Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. J. Law Technol. 2017, 31, 841. [Google Scholar] [CrossRef]
  22. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  23. Tonekaboni, S.; Joshi, S.; McCradden, M.D.; Goldenberg, A. What clinicians want: Contextualizing explainable machine learning for clinical end use. In Proceedings of the Machine Learning for Healthcare Conference, Ann Arbor, MI, USA, 9–10 August 2019; pp. 359–380. [Google Scholar]
  24. Berners-Lee, T.; Hendler, J.; Lassila, O. The semantic web. Sci. Am. 2001, 284, 34–43. [Google Scholar] [CrossRef]
  25. Donnelly, K. SNOMED-CT: The advanced terminology and coding system for eHealth. Stud. Health Technol. Inform. 2006, 121, 279. [Google Scholar]
  26. Ashburner, M.; Ball, C.A.; Blake, J.A.; Botstein, D.; Butler, H.; Cherry, J.M.; Davis, A.P.; Dolinski, K.; Dwight, S.S.; Eppig, J.T.; et al. Gene ontology: Tool for the unification of biology. Nat. Genet. 2000, 25, 25–29. [Google Scholar] [CrossRef]
  27. World Health Organization. ICD-10: International Statistical Classification of Diseases and Related Health Problems: Tenth Revision; World Health Organization: Geneva, Switzerland, 2004. [Google Scholar]
  28. Hogan, A.; Blomqvist, E.; Cochez, M.; d’Amato, C.; Melo, G.D.; Gutierrez, C.; Kirrane, S.; Gayo, J.E.L.; Navigli, R.; Neumaier, S.; et al. Knowledge graphs. ACM Comput. Surv. 2021, 54, 1–37. [Google Scholar] [CrossRef]
  29. Rotmensch, M.; Halpern, Y.; Tlimat, A.; Horng, S.; Sontag, D. Learning a health knowledge graph from electronic medical records. Sci. Rep. 2017, 7, 5994. [Google Scholar] [CrossRef]
  30. Celebi, R.; Uyar, H.; Yasar, E.; Gumus, O.; Dikenelli, O.; Dumontier, M. Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction in realistic settings. BMC Bioinform. 2019, 20, 726. [Google Scholar] [CrossRef]
  31. Kamdar, M.R.; Musen, M.A. PhLeGrA: Graph analytics in pharmacology over the web of life sciences linked open data. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 321–329. [Google Scholar]
  32. Tiddi, I.; Schlobach, S. Knowledge graphs as tools for explainable machine learning: A survey. Artif. Intell. 2022, 302, 103627. [Google Scholar] [CrossRef]
  33. Garcez, A.D.A.; Lamb, L.C. Neurosymbolic AI: The 3rd Wave. arXiv 2020, arXiv:2012.05876. [Google Scholar] [CrossRef]
  34. Yang, J.; Xu, X.; Xiao, G.; Shen, Y. A Survey of Knowledge Enhanced Pre-trained Language Models. arXiv 2021, arXiv:2110.00269. [Google Scholar] [CrossRef]
  35. Peng, H.; Li, H.; Song, J.; Zheng, V.; Li, J. Differentially Private Federated Knowledge Graphs Embedding. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM ’21), Gold Coast, Australia, 1–5 November 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 1416–1425. [Google Scholar] [CrossRef]
  36. Xu, J.; Wang, F.; Tao, D. Federated learning for healthcare informatics. J. Healthc. Inform. Res. 2021, 5, 1–19. [Google Scholar] [CrossRef] [PubMed]
  37. Ducange, P.; Marcelloni, F.; Renda, A.; Ruffini, F. Federated Learning of XAI Models in Healthcare: A Case Study on Parkinson’s Disease. Cogn. Comput. 2024, 16, 3051–3076. [Google Scholar] [CrossRef]
  38. Corcuera Bárcena, J.L.; Ducange, P.; Marcelloni, F.; Nardini, G.; Noferi, A.; Renda, A.; Ruffini, F.; Schiavo, A.; Stea, G.; Virdis, A. Enabling federated learning of explainable AI models within beyond-5G/6G networks. Comput. Commun. 2023, 211, 356–375. [Google Scholar] [CrossRef]
Figure 1. General architecture of the SemFedXAI framework, which extends the traditional client–server architecture of federated learning with both server-side and client-side semantic components.
Figure 2. SemFedXAI sequence diagram.
Figure 3. Accuracy of different approaches in identifying important features.
Figure 4. Comprehensibility of explanations for different approaches.
Figure 5. Examples of explanations generated by the different approaches for a “high risk” prediction.
Table 1. Accuracy (%) under varying Dirichlet alpha values (non-IID scenarios).

| Alpha | FedAvg | FedXAI | SemFed | SemFedXAI |
|-------|--------|--------|--------|-----------|
| 0.1   | 41.2   | 45.6   | 52.3   | 56.9      |
| 0.5   | 54.8   | 59.3   | 66.5   | 70.4      |
| 1.0   | 64.7   | 69.0   | 75.2   | 79.1      |
| 5.0   | 73.5   | 76.4   | 82.1   | 85.3      |
| 10.0  | 75.1   | 78.2   | 84.0   | 86.7      |
Table 2. Predictive accuracy of different approaches.

| Approach  | Accuracy (%) |
|-----------|--------------|
| FedAvg    | 73.5         |
| FedXAI    | 76.4         |
| SemFed    | 82.1         |
| SemFedXAI | 85.3         |
Table 3. Computational overhead of different approaches.

| Approach  | Execution Time (s) | Memory Usage (MB) |
|-----------|--------------------|-------------------|
| FedAvg    | 120                | 450               |
| FedXAI    | 145                | 480               |
| SemFed    | 150                | 520               |
| SemFedXAI | 162                | 575               |
