Systematic Review

Federated Learning in Healthcare Ethics: A Systematic Review of Privacy-Preserving and Equitable Medical AI

by Bilal Ahmad Mir 1,†, Syed Raza Abbas 2,† and Seung Won Lee 2,3,4,5,6,*
1 Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
2 Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea
3 Department of Metabiohealth, Institute for Cross-Disciplinary Studies, Sungkyunkwan University, Suwon 16419, Republic of Korea
4 Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea
5 Personalized Cancer Immunotherapy Research Center, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea
6 Department of Family Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, 29 Saemunan-ro, Jongno-gu, Seoul 03181, Republic of Korea
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Healthcare 2026, 14(3), 306; https://doi.org/10.3390/healthcare14030306
Submission received: 18 December 2025 / Revised: 21 January 2026 / Accepted: 23 January 2026 / Published: 26 January 2026

Abstract

Background/Objectives: Federated learning (FL) offers a way for healthcare institutions to collaboratively train machine learning models without sharing sensitive patient data. This systematic review synthesizes the ethical dimensions of FL in healthcare between January 2020 and December 2024, integrating privacy preservation, algorithmic fairness, governance, and equitable access into a unified analytical framework. Methods: Following PRISMA guidelines, six databases (PubMed, IEEE Xplore, Web of Science, Scopus, ACM Digital Library, and arXiv) were searched; the review was registered in PROSPERO (CRD420251274110). Studies were selected if they described FL implementations in healthcare settings and explicitly discussed ethical considerations. Key data extracted included FL architectures; privacy-preserving mechanisms such as differential privacy, secure multiparty computation, and encryption; fairness metrics; governance models; and clinical application domains. Results: Of 3047 records, 38 met the inclusion criteria. The most common applications were in medical imaging and electronic health records, especially in radiology and oncology. Thematic analysis revealed four key ethical themes: algorithmic fairness, which addresses differences between clients and across attributes; privacy protection through formal guarantees and cryptographic techniques; governance models, which emphasize accountability, transparency, and stakeholder engagement; and equitable distribution of computing resources for institutions with limited resources. Considerable variation was observed in how fairness and privacy trade-offs were evaluated, and only a few studies reported real-world clinical deployment. Conclusions: FL has significant potential to promote ethical AI in healthcare, but advancement will require common fairness standards, workable governance plans, and systems that guarantee fair benefit sharing. Future studies should develop standardized fairness metrics, implement multi-stakeholder governance frameworks, and prioritize real-world clinical validation beyond proof-of-concept implementations.

1. Introduction

The fast growth of digital health technologies has generated large amounts of medical data, creating both new opportunities for improved patient care and ethical challenges related to privacy, consent, and equitable access [1,2]. Machine learning (ML) and artificial intelligence (AI) algorithms have demonstrated strong potential in drug discovery, disease diagnosis, treatment recommendations, and personalized medicine [3,4]. The traditional centralized approach to ML, which gathers data from multiple sources into a single repository, faces significant obstacles in healthcare [5]. Strict privacy regulations, like the GDPR in the European Union and HIPAA in the United States [6,7], institutional data-sharing restrictions, and growing concerns about patient privacy [8,9,10] are key barriers to this approach.
Applying AI in healthcare engages the fundamental ethical principles of accountability and equity, as algorithmic bias poses operational risks that may worsen existing disparities and undermine equitable care outcomes. ML models can unintentionally learn and reproduce systematic biases present in healthcare data through mechanisms such as unrepresentative training samples, historically biased clinical decisions encoded in labels, and differential data quality across demographic groups [11,12,13]. When biased models are deployed clinically, the principle of equitable care is threatened because these systems may produce differing diagnostic accuracy or treatment recommendations for different demographic groups [14,15]. Recent scholarship emphasizes that responsible medical AI implementation must extend beyond technical safeguards to include robust institutional governance, professional integrity, and the preservation of humanistic values [16].
By enabling collaborative model training across different institutions while keeping sensitive data localized, FL has emerged as a promising technique that partially addresses several of these concerns, though it does not eliminate all ethical challenges [17,18]. FL enables participating nodes to train models on their local datasets and share only model parameters or gradients with a central aggregation server, which subsequently generates an updated global model that is distributed back to all participants [19,20]. By shifting from data centralization to model propagation, this architecture fundamentally changes the data governance structure, improving privacy protection while still allowing the creation of reliable, widely applicable AI models trained on diverse datasets [21].
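To make the parameter-aggregation step concrete, the following minimal sketch simulates federated averaging (FedAvg), the canonical aggregation algorithm [5], with a toy logistic-regression model; the client data, model size, learning rate, and round count are illustrative placeholders rather than values from any included study.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient-descent steps on a
    logistic-regression model, starting from the current global weights."""
    w = global_weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient step on local data
    return w

def fedavg(client_weights, client_sizes):
    """Server-side FedAvg: average client parameters weighted by each
    client's local sample count (McMahan et al., 2017)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Simulated federation: three hospitals with different amounts of data.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(n, 4)), rng.integers(0, 2, size=n).astype(float))
           for n in (120, 300, 60)]

global_w = np.zeros(4)
for _ in range(10):                             # communication rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = fedavg(updates, [len(y) for _, y in clients])
```

Each round, the server receives only parameter vectors, never raw patient records, which is the governance shift described above.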
Although interest in federated learning for healthcare is rising, the literature is still split among technical, clinical, and ethical strands [22]. Many reviews focus on methods [18,23], privacy techniques [24,25,26], or particular clinical uses [27,28], but comprehensive, systematic analyses of the ethical issues are scarce. This review fills that gap by offering an integrated assessment of how federated learning confronts ethical challenges in healthcare AI: privacy preservation, fairness and bias mitigation, governance, transparency and explainability, and equitable access to AI technologies.

1.1. Comparison with Existing Reviews

Table 1 presents a detailed comparison between this systematic review and other recent reviews on FL in healthcare. Comparator reviews were selected based on (1) publication within the last three years, (2) focus on FL in healthcare settings, and (3) citation frequency in the field. This comparison highlights the unique contributions and focus areas of this work.

1.2. Contribution of This Review

This review condenses key ethical challenges and implementation-ready solutions for FL in healthcare, distinguishing actionable technical approaches from broader conceptual recommendations.
  • We compare privacy methods (differential privacy, secure multi-party computation, homomorphic encryption, and hybrids) using three evaluation criteria: strength of privacy guarantees, computational overhead, and impact on model utility.
  • We assess fairness methods across clients, demographic groups, and multi-objective settings, and critique commonly used metrics. We note that fairness metric definitions varied across studies, which we address as a limitation.
  • We synthesize governance models, highlighting procedural, relational, and structural mechanisms for ethical FL deployment.
  • We analyze strategies for equitable participation when institutions differ in compute, data quality, and expertise.
  • We highlight research gaps: absent standards, scarce clinical validation, limited patient-centered work, and tensions between fairness and privacy.

1.3. Research Objectives

This systematic review has four co-primary analytical objectives, treating privacy, fairness, governance, and equity as equally important ethical dimensions:
1. To synthesize privacy preservation techniques used in healthcare FL and assess their computational feasibility and trade-offs with model performance [9,24].
2. To analyze fairness and bias mitigation strategies across multiple levels (client-level, multi-dimensional, and attribute-level) and evaluate their impact on model equity and overall performance [11,30].
3. To examine governance frameworks and mechanisms that guide ethical implementation of FL in healthcare settings [29,31].
4. To assess strategies for equitable access and resource distribution in FL, particularly for institutions with limited computational capacity [19,32].

2. Methods

This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 standards [33] to guarantee methodological rigor, transparency, and reproducibility; the protocol was registered in PROSPERO (CRD420251274110). With a clear focus on ethical issues like privacy, fairness, and governance, the review methodology was designed to comprehensively identify, assess, and synthesize evidence on FL applications in healthcare [17,29].

2.1. Search Strategy

A thorough literature search was conducted across several electronic databases [34]: PubMed/MEDLINE, IEEE Xplore Digital Library, Web of Science, Scopus, ACM Digital Library, SpringerLink, and ScienceDirect. Given the rapidly evolving nature of this field, arXiv was also searched to ensure comprehensive coverage; however, only preprints that had subsequently been published in peer-reviewed venues were included in the final analysis [6,7].
The search strategy used a combination of controlled vocabulary terms and keywords related to FL, healthcare applications, and ethical considerations [17]. The primary search string used Boolean operators to combine concepts: (“federated learning” OR “federated machine learning” OR “distributed learning” OR “collaborative learning” OR “decentralized learning”) AND (“healthcare” OR “medical” OR “clinical” OR “health” OR “biomedical” OR “hospital”) AND (“ethics” OR “fairness” OR “bias” OR “privacy” OR “equity” OR “governance” OR “transparency” OR “accountability”). The search was adapted for each database to account for differences in indexing systems and search syntax. All searches were conducted between September and October 2025. The complete search strategies for each database, including database-specific syntax and filters, are provided in Supplementary Table S1.

2.2. Eligibility Criteria

Studies were included in this review if they met the following criteria [33]:
Inclusion Criteria:
  • Peer-reviewed journal articles or conference proceedings published between January 2020 and December 2024.
  • Studies explicitly addressing FL methodologies or frameworks in healthcare or medical contexts.
  • Research that incorporates or discusses ethical considerations, including but not limited to privacy preservation, fairness, bias mitigation, governance, transparency, or equity.
  • Original research articles, including empirical studies, methodological developments, framework proposals, or systematic evaluations.
  • Publications available in the English language.
  • Studies employing real-world healthcare data, synthetic medical datasets, or simulated federated healthcare scenarios.
Exclusion Criteria:
  • Duplicate publications or multiple reports of the same study.
  • Review articles, meta-analyses, opinion pieces, editorials, or commentaries (these were examined for reference mining but not included in primary analysis).
  • Conference abstracts without full papers.
  • Studies focusing solely on technical FL algorithms without healthcare application or ethical consideration.
  • Research not available in English.
  • Studies focusing exclusively on non-medical applications of FL.
  • Book chapters, dissertations, and gray literature.
  • Studies published before 2020 to ensure contemporary relevance.

2.3. Study Selection Process

The study selection process followed a systematic multi-stage approach [33]. Initial database searches yielded 3047 potentially relevant articles. Following removal of 612 duplicates using reference management software, 2435 unique articles remained for screening. Title and abstract screening was performed independently by two reviewers, with disagreements resolved through discussion or consultation with a third reviewer [17]. This process excluded 2167 articles that clearly did not meet inclusion criteria, leaving 302 articles for full-text review.
Full-text articles were assessed for eligibility against the predetermined inclusion and exclusion criteria [33]. During this stage, 230 articles were excluded for the following reasons: 87 had insufficient methodological detail, 74 had no clinical validation, 25 were not available in English, and 44 were published before 2020. This process resulted in 38 studies meeting all inclusion criteria for detailed data extraction and quality assessment.

2.4. Data Extraction

A standardized data extraction form was developed and piloted on five randomly selected included studies before full implementation [33]. Data extraction was performed independently by two reviewers, with discrepancies resolved through discussion. Extracted data elements included:
Study Characteristics: First author, publication year, country of origin, study design, healthcare domain, clinical specialty, sample size, and number of participating institutions [17].
Federated Learning Architecture: FL topology (centralized, decentralized, or hierarchical), distributed learning [35], aggregation algorithm, number of communication rounds, data partitioning approach (horizontal, vertical, or transfer learning), and handling of data heterogeneity [20,36].
Ethical Dimensions: Privacy-preservation techniques employed, fairness metrics and objectives, governance mechanisms, transparency and explainability methods, and stakeholder involvement strategies [24,29,30].
Privacy Techniques: Type of privacy-preservation method (differential privacy, homomorphic encryption, secure multi-party computation, etc.), privacy parameters, computational overhead, and privacy–utility trade-offs [9,24,37].
Fairness Approaches: Level of fairness addressed (client-level, attribute-level, multi-dimensional), fairness metrics used, bias mitigation techniques, and fairness–accuracy trade-offs [11,12].
Clinical Application: Medical domain (radiology, oncology, cardiology, etc.), task type (diagnosis, prognosis, treatment recommendation), data modalities (imaging, EHR, genomics, etc.), and reported clinical outcomes [38,39,40].
Performance Metrics: Model accuracy, area under the curve, F1-score, sensitivity, and specificity; fairness metrics such as demographic parity and equalized odds; privacy metrics such as epsilon values for differential privacy; and computational efficiency measures [30,37].

2.5. Quality Assessment

Due to the heterogeneity of included study designs (methodological developments, empirical evaluations, framework proposals, and case studies), no single standardized quality assessment tool was applicable. Instead, studies were classified as high, moderate, or low quality based on author-defined criteria assessing (1) clarity of study objectives, (2) appropriateness of FL methodology, (3) depth of ethical considerations, (4) transparency in reporting results, and (5) presence of validation. Quality assessment was performed independently by two reviewers, with disagreements resolved through consensus discussion.

3. Results

3.1. Study Selection and Characteristics

The PRISMA flow diagram in Figure 1 shows the systematic study selection process. The initial database search identified 3047 potentially relevant articles. After removing 612 duplicates, 2435 unique articles remained for title and abstract screening, resulting in the exclusion of 2167 articles. Full-text assessment of 302 articles led to the final inclusion of 38 studies that met all eligibility criteria and explicitly addressed ethical considerations in FL for healthcare.
Table 2 shows the key characteristics of the 38 included studies. The studies were published between 2020 and 2024, with a notable increase in publications after 2022, reflecting growing interest in the ethical considerations of healthcare FL. Geographically, studies were classified by corresponding author affiliation: the United States contributed 12 studies (31.6%) and China 9 (23.7%), making these two countries the largest contributors, followed by multi-national collaborations (8; 21.1%), European countries (6; 15.8%), and other regions (3; 7.9%).
The included studies used various research designs: methodological development studies (18; 47.4%) predominated, followed by empirical evaluations (12; 31.6%), framework proposals (6; 15.8%), and case studies (2; 5.3%). Medical imaging emerged as the predominant healthcare domain (16; 42.1%), followed by EHR (11; 28.9%), with smaller representations from genomics and wearable device applications. Among clinical specialties, radiology (9; 23.7%) and oncology (7; 18.4%) were the most frequently addressed, reflecting the maturity of AI applications in these fields.

3.2. Privacy-Preservation Techniques in Healthcare FL

Privacy preservation emerged as a central ethical consideration across all 38 included studies, though implementation approaches varied substantially. Table 3 summarizes the privacy-preservation techniques used in the included studies, their computational requirements, and reported effectiveness.
Homomorphic encryption, which enables computations on encrypted data, was employed in 12 studies, representing 31.6%. While it offers strong theoretical privacy guarantees, it incurs significant computational overhead, with several studies reporting substantially increased training times compared to plaintext training.
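For illustration, the sketch below implements a toy version of the Paillier cryptosystem, an additively homomorphic scheme, to show how a server can aggregate encrypted updates without decrypting any individual contribution. This is a didactic sketch only: the primes are insecurely small, model updates are reduced to single integers, and a real deployment would rely on a vetted cryptographic library with production-size keys.

```python
import math, random

# Toy Paillier keypair with insecurely small primes (demo only).
p, q = 1789, 1867
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(n + 1, lam, n2) - 1) // n, -1, n)    # uses generator g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                     # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Each site encrypts its (integer-scaled) model update; the server
# multiplies ciphertexts, which corresponds to adding the plaintexts,
# and never sees any individual update in the clear.
site_updates = [17, 42, 99]
aggregate_ct = 1
for u in site_updates:
    aggregate_ct = (aggregate_ct * encrypt(u)) % n2

assert decrypt(aggregate_ct) == sum(site_updates)  # 158
```

The computational overhead reported in this literature stems from performing such modular exponentiations over very large moduli for every model parameter.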
A notable finding was the prevalence of hybrid privacy-preservation methods, observed in 18 studies (47.4%), which combine multiple techniques to balance privacy, computational efficiency, and model performance. These hybrid methods outperformed single-technique implementations, providing robust privacy protection with minimal impact on model accuracy.

3.3. Fairness and Bias Mitigation in Healthcare FL

Fairness emerged as a critical ethical consideration, with 32 of 38 studies (84%) explicitly addressing algorithmic fairness or bias mitigation. Studies were considered to "explicitly address fairness" if they included fairness metrics, bias mitigation techniques, or substantive discussion of equity implications. The analysis revealed four distinct levels at which fairness was considered: client-level fairness, attribute-level fairness, multi-dimensional fairness (balancing multiple fairness objectives simultaneously), and intersectional fairness (addressing overlapping demographic vulnerabilities).
Table 4 summarizes key fairness levels explored in healthcare FL, along with corresponding metrics and mitigation strategies. The reported results indicate varying effectiveness across client, attribute, multi-dimensional, and intersectional fairness approaches.
Half of the studies (19 of 38; 50.0%) tackled client-level fairness, grounded in the ethical principle that all participating institutions, regardless of size, resources, or differences in their data distributions, quality, or volume, deserve equitably performing models for their patient populations. These methods included fair aggregation schemes that weighted each institution's contribution by how well the model performed for it, rather than simply by how much data it provided. The results showed substantial reductions in performance variation compared to standard federated averaging, with minimal sacrifice of overall model accuracy.
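A minimal sketch of this performance-weighted aggregation idea follows, assuming each institution reports a validation loss for the current global model; the weighting exponent is an illustrative choice (in the spirit of published fair-FL objectives such as q-FFL) rather than a scheme taken from any specific included study.

```python
import numpy as np

def fair_aggregate(client_weights, client_losses, q=1.0):
    """Loss-weighted aggregation: clients where the current model performs
    worse (higher validation loss) receive larger aggregation weights.
    q = 0 recovers a plain unweighted mean; larger q pushes the global
    model harder toward the worst-served institutions."""
    losses = np.asarray(client_losses, dtype=float)
    weights = losses ** q
    weights /= weights.sum()
    return sum(w * c for w, c in zip(weights, client_weights))

# Three institutions: the third is underserved by the current model.
params = [np.array([0.9, 1.1]), np.array([1.0, 1.0]), np.array([1.4, 0.6])]
losses = [0.21, 0.25, 0.58]
new_global = fair_aggregate(params, losses, q=2.0)
```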
Attribute-level fairness was the most common focus area, appearing in 22 studies (57.9%) and centering on reducing disparities in model performance across patient groups defined by characteristics such as race, age, gender, and economic background. Researchers frequently measured fairness using demographic parity and equalized odds, with many studies tracking multiple fairness measures at once. The most popular technique for addressing these disparities was adversarial debiasing, which uses competing neural networks to prevent the model from making predictions based on sensitive attributes while maintaining its ability to accurately predict clinical outcomes. These methods reduced disparities across demographic groups, though sometimes at the cost of modest reductions in overall accuracy.
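The two metrics named here have simple operational definitions, sketched below for binary predictions and a single binarized sensitive attribute; the data are synthetic and hypothetical, serving only to show how such an audit is computed.

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """Difference in positive-prediction rates between two groups.
    0.0 means the model flags both groups at the same rate."""
    rates = [y_pred[group == g].mean() for g in (0, 1)]
    return abs(rates[0] - rates[1])

def equalized_odds_diff(y_true, y_pred, group):
    """Maximum gap in true-positive and false-positive rates across groups.
    0.0 means errors are distributed equally between groups."""
    gaps = []
    for label in (1, 0):                      # TPR gap, then FPR gap
        mask = y_true == label
        rates = [y_pred[mask & (group == g)].mean() for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

# Hypothetical audit of a federated model's predictions.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)              # e.g., a binarized attribute
y_pred = (y_true & (rng.random(1000) > 0.2)).astype(int)

print(demographic_parity_diff(y_pred, group))
print(equalized_odds_diff(y_true, y_pred, group))
```

Demographic parity compares positive-prediction rates between groups, while equalized odds additionally conditions on the true label, which is why a model can satisfy one criterion while violating the other.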
Eleven studies explicitly recognized that healthcare machine learning systems must address fairness across multiple dimensions at once. The Federated Learning with Unified Fairness Objective framework marked notable progress by applying distributionally robust optimization to maintain consistent performance across all patient subgroups while sustaining acceptable overall accuracy. Evaluation results showed that these methods achieved more balanced improvements and retained overall model performance better than approaches targeting a single fairness objective.
One important limitation stood out: only four studies considered intersectional fairness, that is, how multiple demographic characteristics combine to create distinct patterns of vulnerability. Since health disparities often hit hardest at the intersection of multiple marginalized identities, this gap points to an important direction for future research.

3.4. Governance Frameworks for Ethical Healthcare FL

The analysis showed emerging but still underdeveloped governance frameworks for healthcare FL. Of the 38 included studies, 16 (42.1%) explicitly discussed governance mechanisms, with substantial heterogeneity in the components and depth of governance considerations, as shown in Figure 2.
An overview of governance frameworks in healthcare FL is summarized in Table 5, outlining critical practices such as privacy protocols, ethical oversight, stakeholder participation, and audit-based accountability structures.
Procedural mechanisms, addressing how FL should be conducted, were most extensively developed. Data privacy protocols were discussed in 12 of the 16 governance-focused studies, typically incorporating principles of data minimization, purpose limitation, and secure communication. Nine governance-focused studies mentioned ethical review processes, though detailed implementation guidance was limited. Consent procedures for FL presented unique challenges, as traditional informed consent frameworks designed for centralized data collection may not adequately address the distributed nature and ongoing learning characteristics of FL systems.
Stakeholder engagement received moderate attention. Nine studies highlighted the need for interdisciplinary collaboration between ethicists, clinicians, data scientists, and patient representatives. Few offered concrete methods for stakeholder engagement across the FL lifecycle, such as documented consultation processes, patient advisory involvement, or feedback mechanisms. While capacity building and institutional support were identified as essential, especially for resource-limited institutions, detailed frameworks to address these capacity gaps were largely absent.
Structural mechanisms, which define organizational roles and oversight structures, were least developed. Although seven studies (43.8% of governance-focused studies) mentioned the need for oversight bodies, health consumer representation, emphasized as essential for patient-centered AI development, was explicitly addressed in only two studies (12.5% of governance-focused studies), representing a significant gap given the patient-facing nature of healthcare AI applications.
This lack of clear role definition creates significant accountability gaps, as it remains unclear how responsibility should be assigned when federated models cause patient harm: whether to data providers, model developers, aggregation operators, or deploying institutions.

3.5. Comparative Analysis of FL Approaches

Table 6 presents a comparative analysis of different FL methods used in the included studies, evaluating their characteristics, advantages, challenges, and suitability for ethical healthcare applications.
Cross-silo FL, characterized by collaboration among a relatively small number of institutional participants, emerged as the dominant and most ethically suitable approach for healthcare applications. This architecture aligns well with existing healthcare data governance structures, institutional privacy requirements, and regulatory frameworks. A total of 20 studies (52.6%) used cross-silo FL, demonstrating practical feasibility in real-world healthcare settings.
Centralized FL with a trusted aggregation server was adopted in 25 studies (65.8%), providing an efficient and straightforward setup. This approach assumes the central server will not collude with adversaries or attempt to infer private information from received updates, an assumption that may require institutional safeguards in healthcare settings. To mitigate these issues, several studies implemented secure aggregation protocols and encrypted communication, showing that centralized architectures can still deliver strong privacy protection when properly designed.
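As a sketch of the secure aggregation idea, the toy protocol below has each pair of clients share a random mask that one adds and the other subtracts, so the server sees only masked updates while the masks cancel exactly in the sum. This simplifies Bonawitz-style secure aggregation (a protocol family not individually cited among the included studies): real systems derive masks from pairwise key agreement so the server cannot learn them, and they handle client dropout.

```python
import numpy as np

rng = np.random.default_rng(42)
n_clients, dim = 3, 4
updates = [rng.normal(size=dim) for _ in range(n_clients)]  # true updates

# Every pair (i, j) with i < j shares a random mask m_ij.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = []
for k in range(n_clients):
    u = updates[k].copy()
    for (i, j), m in masks.items():
        if k == i:
            u += m                 # lower-indexed peer adds the mask
        elif k == j:
            u -= m                 # higher-indexed peer subtracts it
    masked.append(u)               # this is all the server ever sees

# Masks cancel pairwise, so the server recovers only the exact sum.
server_sum = sum(masked)
assert np.allclose(server_sum, sum(updates))
```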
Decentralized designs that remove a central server were investigated in six studies and are intended to increase trust and avoid single points of failure. However, their practical use in healthcare is limited by more complex coordination and consensus requirements and higher communication overhead, which are significant barriers in contexts that require clear institutional oversight and accountability.
Hierarchical FL architectures, employing multiple levels of aggregation, were proposed in four studies (10.5%) as scalable solutions for large-scale healthcare networks. These methods offered advantages for regional or national health systems.
Notably, none of the decentralized FL studies provided explicit governance mechanisms for handling disputes or coordinating model updates, representing a significant gap for ethical implementation.

3.6. Clinical Application Domains and Outcomes

Machine learning methods for genomics and epitranscriptomics, such as m5C-Seq and ORI-Explorer, further motivate privacy-preserving collaborative learning [41,42]. The included studies addressed diverse clinical applications, with varying levels of maturity and real-world implementation. Table 7 summarizes the primary clinical application domains, specific tasks, data modalities, and reported outcomes.
Medical imaging, the most developed domain with 13 studies (35.1%), showed that FL models can achieve performance comparable to centralized models while preserving privacy, with most studies reporting only modest accuracy reductions. Fairness interventions in imaging applications demonstrated promise, with disparity reductions of 30–50% across demographic categories. Nonetheless, issues remained with addressing differences in label quality and guaranteeing representative training data across demographics.
Data heterogeneity, missing values, and complex temporal patterns were among the difficulties faced by EHR-based applications [43], which were analyzed in 11 studies (28.9%). On EHR data, privacy-preserving FL maintained reasonable accuracy levels comparable to centralized approaches, though performance varied depending on data heterogeneity and privacy mechanisms employed [44]. However, models often revealed significant performance disparities across racial, ethnic, and socioeconomic groups, raising ongoing concerns about fairness.
Since cancer care usually requires cooperation between specialist centers, each with distinct patient demographics and treatment methods, oncology applications (six studies; 15.8%) showed great promise for FL. While maintaining institutional autonomy and patient privacy, FL made it possible to establish multi-institutional models. Federated models for cancer diagnosis and therapy recommendation retained clinical utility comparable to centralized approaches while allowing validation across a variety of populations, according to several studies.
Six studies (15.8%) documented COVID-19 response applications, predominantly using retrospective data with limited real-time deployment, which showed both the potential and the limitations of FL in public health emergencies. FL enabled rapid multi-institution collaboration to develop diagnostic and prognostic models without centralizing patient data, but it also raised ethical questions about informed consent in emergency situations, equitable resource distribution, and guaranteeing model fairness when training data reflected disparate pandemic impacts across communities.

3.7. Methodological Quality Assessment

The included studies varied significantly in methodological rigor and reporting transparency according to the quality evaluation. Twelve of the 38 studies (31.6%) were rated high quality, with explicit study aims, suitable FL methodology, thorough ethical considerations, transparent reporting of privacy and fairness metrics, and thorough validation. Twenty studies (52.6%) were rated moderate quality, meeting most but not all quality standards, usually with limitations in validation techniques or the completeness of the ethical framework. Six studies (15.8%) were rated low quality, with serious flaws in methodology, ethical considerations, or reporting transparency.
Common methodological strengths included clear explanations of FL architectures, suitable aggregation algorithm selection, and systematic model performance evaluation. Commonly noted weaknesses included inadequate discussion of generalizability to various healthcare contexts, limited long-term evaluation of fairness and privacy properties, and limited real-world validation and actual clinical deployment (only five studies, roughly 13.2%).

4. Discussion

This systematic review characterizes the current state of FL applications that address ethical considerations, including privacy, fairness, governance, and equity, in healthcare settings, revealing both substantial progress and critical gaps. The synthesis of 38 studies demonstrates that FL offers a technically viable and ethically promising approach for collaborative healthcare AI development, addressing fundamental challenges of privacy preservation, algorithmic fairness, and distributed governance. However, the translation from proof-of-concept research to real-world clinical implementation remains limited, and significant ethical challenges require continued attention. To address the heterogeneity of included studies, we conducted stratified descriptive analyses by privacy technique (Table 3), fairness approach (Table 4), FL architecture (Table 6), and clinical application domain (Table 7), enabling identification of patterns across methodological categories.

4.1. Privacy-Preservation: Balancing Theoretical Guarantees with Practical Utility

Healthcare AI is starting to incorporate privacy-preserving strategies like secure aggregation and differential privacy, signaling a change from treating privacy as an afterthought to building it into model design [45,46]. Moderate epsilon levels (e.g., 1.0–5.0) typically maintain adequate accuracy, although striking a balance between privacy and model utility continues to be a major difficulty. Moreover, although privacy requirements may vary across clinical contexts and patient populations, the included studies rarely incorporated patient perspectives into privacy budget decisions [47,48].
Although homomorphic encryption provides robust privacy protection, its 10×–100× computational cost makes it unfeasible for large-scale healthcare models. Hybrid techniques that combine secure aggregation, trusted execution environments, or differential privacy show promise by increasing privacy at little performance cost, although they increase implementation complexity. Inadequate assessment of privacy threats unique to FL, such as gradient leakage and model inversion attacks [49,50,51], is a significant unresolved issue.
Privacy budget parameters (e.g., epsilon values for differential privacy) were typically selected by researchers based on prior literature or empirical utility-privacy trade-off experiments, with limited justification for clinical appropriateness.
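As an illustration of how an epsilon budget enters a federated round, the sketch below clips a client's update to bound its sensitivity and adds Gaussian noise using the classical (epsilon, delta) calibration; the clipping norm, epsilon, and delta are illustrative values, and the closed-form bound shown is valid for epsilon ≤ 1, whereas practical systems use tighter privacy accountants.

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=None):
    """Clip a client's model update to bound its sensitivity, then add
    Gaussian noise calibrated by the classical (epsilon, delta) bound.
    Treats one client's whole update as the unit of privacy."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(scale=sigma, size=update.shape)

# One client's raw gradient update, sanitized before leaving the site.
rng = np.random.default_rng(7)
raw_update = rng.normal(size=16)
private_update = dp_sanitize(raw_update, clip_norm=1.0, epsilon=1.0, rng=rng)
```

Smaller epsilon means more noise per round, which is the privacy–utility trade-off the included studies negotiated largely without patient input.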

4.2. Algorithmic Fairness: From Single Metrics to Multi-Dimensional Equity

With 84% of studies specifically addressing fairness, the importance of fairness in healthcare AI is becoming more widely acknowledged. The transition from single-metric to multi-dimensional fairness frameworks marks a significant advancement, addressing institutional discrepancies and attribute-level disparities simultaneously. Fair aggregation techniques reduce performance variation by 40–60% without significantly compromising overall performance, and client-level fairness guarantees consistent model performance across healthcare facilities serving heterogeneous populations. Adversarial debiasing and other attribute-level techniques reduced demographic inequalities by 30–70%, although they frequently resulted in an accuracy drop of 2–8%. Focusing solely on age, gender, or race may overlook other sources of disparity, such as geographic, socioeconomic, or institutional factors, that influence healthcare outcomes [52,53].
Multi-dimensional fairness frameworks enhanced this research by jointly optimizing several equity goals using distributionally robust optimization, balancing fairness and utility more successfully than sequential methods, although their complexity hampers practical use [54,55,56]. Some studies found that strict fairness restrictions reduced performance for majority groups, sparking moral discussions about striking a balance between equity and equality. These difficulties show that, for healthcare AI to be fair, technical solutions must align with ethical reasoning and community-based principles.

4.3. Governance: From Technical Solutions to Institutional Frameworks

There is a glaring disconnect between technical advancements in FL and institutional preparedness for ethical implementation, as only 42% of studies specifically addressed governance. Although techniques for privacy and fairness have improved, there are still few frameworks for monitoring, consent, and accountability. Liability is uncertain when models cause harm, since traditional informed consent and review procedures frequently ignore FL's shared responsibilities and continuous model updates [57,58]. Smaller or less-resourced institutions are at a disadvantage, since relational issues like stakeholder involvement and capacity building have received less attention in governance discussions than procedural mechanisms like data privacy protocols.
Only two studies (5.3%) explicitly included patient representation, although Table 3 indicates that 25% included some form of patient input, and structural governance systems that define responsibilities and decision-making procedures are still in their infancy. Emerging healthcare FL consortia offer examples of collaborative governance, although they frequently function without standardized frameworks, which limits interoperability [59]. Federation opacity, the inherent difficulty of auditing model behavior and tracing data contributions across distributed nodes without centralizing sensitive information, represents a significant ethical challenge for transparency and accountability.

4.4. Clinical Translation: Bridging the Gap Between Research and Practice

Only 13% of studies reported real-world clinical deployment, highlighting a major translational gap: proof-of-concept and simulated federations miss the organizational, regulatory, legal, and human-factor barriers that arise in practice [60,61]. Medical imaging, which has demonstrated strong performance in advanced ultrasound imaging tasks [62,63], appears closest to readiness: federated imaging models often reach 85–95% of centralized performance, likely thanks to standardized data and established multi-center collaborations, but scanner heterogeneity, population representativeness, and cross-site validation remain unresolved.
EHR-based FL faces larger hurdles due to heterogeneous systems, variable documentation and coding, and entrenched biases that make fairness hard to achieve [64]. Specialties with strong multicenter networks, like oncology and cardiology, show promise but contend with intellectual property, competition between systems, and transparency concerns. Federated efforts during COVID-19 demonstrated speed and privacy advantages but also exposed risks from compressed governance and fairness shortcuts when urgency overrides deliberative oversight.

4.5. Limitations

This systematic review has several limitations that should be considered when interpreting the findings. Geographic bias may have been introduced by the restriction to English-language publications, which may have excluded pertinent studies written in other languages. Quantitative meta-analysis was not possible due to the diversity of FL techniques and ethical frameworks among studies, which limited our capacity to draw firm comparative conclusions regarding the efficacy of different approaches.
The emphasis on ethical issues inevitably privileges explicitly normative discussions, whereas technical research may address the same issues using different terminology, which could have influenced study inclusion decisions. Conclusions from earlier studies may also not reflect contemporary ethical requirements or best practices, given the rapid changes in privacy rules, fairness standards, and governance expectations.

4.6. Implications for Practice and Policy

Treat privacy as mandatory: embed privacy-preserving techniques and set context-specific privacy budgets aligned with patient preferences and regulations [65]. Figure 3 summarizes the key challenges identified in this review and presents a solution framework for ethical healthcare FL implementation.
Make fairness evaluation routine: evaluate models across institutions and clinical subpopulations; choose metrics according to ethical and clinical priorities and manage trade-offs explicitly.
Define governance up front: set clear roles, decision processes, and accountability; include patient and stakeholder representation; and build institutional capacity for equitable participation.
Update regulation sensibly: clarify data permissions, liability, and approval paths for FL while avoiding unnecessary barriers; encourage regulatory harmonization for cross-border collaborations.
Run real-world pilots: prioritize evaluation of performance, privacy, fairness, governance, and user experience, and share results to accelerate broader, safer adoption.

4.7. Future Directions

  • Unified ethical frameworks combining technical, governance and consent standards.
  • Privacy methods tuned for clinical data (efficient HE/SMPC, hybrid verification).
  • Intersectional fairness that includes social determinants of health.
  • Patient-centered governance: clear explanations, consent, and ongoing engagement.
  • Resource-efficient FL approaches to enable equitable participation by low-resource and community institutions, measured by computational requirements, technical expertise needed, and infrastructure demands.
  • Real-world validation via pilot/pragmatic trials and longitudinal monitoring.
  • Federated explainability and transparency standards without centralizing data.
  • Standardized benchmarks including reference datasets, evaluation protocols for privacy–utility trade-offs, fairness assessment procedures, and governance compliance checklists.

5. Conclusions

This systematic review, conducted according to PRISMA 2020 and drawing on 3047 initially identified records, found that FL is both technically feasible and ethically promising for collaborative healthcare AI, but important translational gaps remain. Privacy methods such as differential privacy and secure aggregation are becoming more established, and hybrid approaches that combine techniques can provide strong protection with only modest losses in utility. Multi-dimensional fairness strategies show promise for reducing disparities, though achieving fairness gains often requires accepting accuracy trade-offs. Real-world clinical deployment remains limited, governance frameworks are underdeveloped, with insufficient attention to oversight and accountability mechanisms, and patient and community engagement is largely absent. There are no widely adopted ethical standards for healthcare FL, which leads to inconsistent implementation and evaluation. Moving FL from research to reliable practice will require embedding privacy and fairness mechanisms in explicit governance structures that specify responsibilities, accountability, and stakeholder participation. The central insight from this review is that technical advances in privacy and fairness must be matched by equally rigorous governance frameworks and stakeholder engagement to realize FL's ethical potential in healthcare. Real-world pilots that assess not only technological performance but also privacy attributes, fairness results, governance efficacy, and user experience should be prioritized. To facilitate cross-institutional cooperation without creating needless obstacles, standardized ethical frameworks and regulatory guidelines are required. To reach a consensus on acceptable privacy–utility trade-offs, fairness goals, and equitable benefit sharing, physicians, data scientists, ethicists, legislators, and patient representatives must collaborate on these technical and normative decisions. As healthcare systems increasingly adopt AI technologies, FL stands at a critical juncture: with sustained interdisciplinary collaboration, patient-centered governance, and commitment to rigorous real-world validation, it can fulfill its promise of enabling equitable, privacy-preserving medical AI that serves all patient populations.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/healthcare14030306/s1, Table S1: Database search strategy and retrieval counts.

Author Contributions

Conceptualization, B.A.M. and S.R.A.; methodology, B.A.M. and S.R.A.; formal analysis, B.A.M. and S.R.A.; investigation, B.A.M. and S.R.A.; writing—original draft preparation, B.A.M. and S.R.A.; writing—review and editing, S.W.L.; supervision, S.W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Sungkyunkwan University and BK21 FOUR (Graduate School Innovation), funded by the Ministry of Education, Korea. This research was also supported by the Ministry of Education and Ministry of Science & ICT, Republic of Korea (grant numbers: NRF [2021-R1-I1A2 (059735)], RS [2024-0040 (5650)], RS [2024-0044 (0881)], RS [2019-II19 (0421)], and RS [2025-2544 (3209)]).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
FL: Federated Learning
AI: Artificial Intelligence
ML: Machine Learning
DP: Differential Privacy
HE: Homomorphic Encryption
SMPC: Secure Multi-Party Computation
EHR: Electronic Health Records
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PROSPERO: International Prospective Register of Systematic Reviews
JBI: Joanna Briggs Institute
IoT: Internet of Things
P2P: Peer-to-Peer

References

  1. Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56. [Google Scholar] [CrossRef]
  2. Wang, Q.; Jiang, Q.; Yang, Y.; Pan, J. The burden of travel for care and its influencing factors in China: An inpatient-based study of travel time. J. Transp. Health 2022, 25, 101353. [Google Scholar] [CrossRef]
  3. Rieke, N.; Hancox, J.; Li, W.; Milletari, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. NPJ Digit. Med. 2020, 3, 119. [Google Scholar] [CrossRef]
  4. Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; DePristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A guide to deep learning in healthcare. Nat. Med. 2019, 25, 24–29. [Google Scholar] [CrossRef] [PubMed]
  5. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics; PMLR: London, UK, 2017; pp. 1273–1282. [Google Scholar]
  6. Voigt, P.; Von dem Bussche, A. The EU General Data Protection Regulation (GDPR): A Practical Guide, 1st ed.; Springer International Publishing: Cham, Switzerland, 2017; Volume 10, pp. 10–5555. [Google Scholar]
  7. Price, W.N.; Cohen, I.G. Privacy in the age of medical big data. Nat. Med. 2019, 25, 37–43. [Google Scholar] [CrossRef] [PubMed]
  8. Abbas, S.R.; Abbas, Z.; Zahir, A.; Lee, S.W. Federated learning in smart healthcare: A comprehensive review on privacy, security, and predictive analytics with IoT integration. Healthcare 2024, 12, 2587. [Google Scholar] [CrossRef]
  9. Brauneck, A.; Schmalhorst, L.; Kazemi Majdabadi, M.M.; Bakhtiari, M.; Völker, U.; Baumbach, J.; Baumbach, L.; Buchholtz, G. Federated machine learning, privacy-enhancing technologies, and data protection laws in medical research: Scoping review. J. Med. Internet Res. 2023, 25, e41588. [Google Scholar] [CrossRef]
  10. Xue, Q.; Xu, D.R.; Cheng, T.C.; Pan, J.; Yip, W. The relationship between hospital ownership, in-hospital mortality, and medical expenses: An analysis of three common conditions in China. Arch. Public Health 2023, 81, 19. [Google Scholar] [CrossRef]
  11. Chen, R.J.; Wang, J.J.; Williamson, D.F.; Chen, T.Y.; Lipkova, J.; Lu, M.Y.; Sahai, S.; Mahmood, F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat. Biomed. Eng. 2023, 7, 719–742. [Google Scholar] [CrossRef]
  12. Poulain, R.; Bin Tarek, M.F.; Beheshti, R. Improving fairness in ai models on electronic health records: The case for federated learning methods. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1599–1608. [Google Scholar]
  13. Pfohl, S.R.; Foryciarz, A.; Shah, N.H. An empirical characterization of fair machine learning for clinical risk prediction. J. Biomed. Inform. 2021, 113, 103621. [Google Scholar] [CrossRef]
  14. Rajkomar, A.; Hardt, M.; Howell, M.D.; Corrado, G.; Chin, M.H. Ensuring fairness in machine learning to advance health equity. Ann. Intern. Med. 2018, 169, 866–872. [Google Scholar] [CrossRef] [PubMed]
  15. Challen, R.; Denny, J.; Pitt, M.; Gompels, L.; Edwards, T.; Tsaneva-Atanasova, K. Artificial intelligence, bias and clinical safety. BMJ Qual. Saf. 2019, 28, 231–237. [Google Scholar] [CrossRef] [PubMed]
  16. Sallam, M.; Sallam, M. Ethical aspects of implementing generative artificial intelligence in medical education: A narrative review. Hist Philos Med. 2025, 7, 20. [Google Scholar] [CrossRef]
  17. Teo, Z.L.; Jin, L.; Liu, N.; Li, S.; Miao, D.; Zhang, X.; Ng, W.Y.; Tan, T.F.; Lee, D.M.; Chua, K.J.; et al. Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture. Cell Rep. Med. 2024, 5, 101419. [Google Scholar] [CrossRef] [PubMed]
  18. Zhang, F.; Kreuter, D.; Chen, Y.; Dittmer, S.; Tull, S.; Shadbahr, T.; Schut, M.; Asselbergs, F.; Kar, S.; Sivapalaratnam, S.; et al. Recent methodological advances in federated learning for healthcare. Patterns 2024, 5, 101006. [Google Scholar] [CrossRef]
  19. Li, M.; Xu, P.; Hu, J.; Tang, Z.; Yang, G. From challenges and pitfalls to recommendations and opportunities: Implementing federated learning in healthcare. Med. Image Anal. 2025, 101, 103497. [Google Scholar] [CrossRef]
  20. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210. [Google Scholar] [CrossRef]
  21. Xu, J.; Glicksberg, B.S.; Su, C.; Walker, P.; Bian, J.; Wang, F. Federated learning for healthcare informatics. J. Healthc. Inform. Res. 2021, 5, 1–19. [Google Scholar] [CrossRef]
  22. Antunes, R.S.; André da Costa, C.; Küderle, A.; Yari, I.A.; Eskofier, B. Federated learning for healthcare: Systematic review and architecture proposal. ACM Trans. Intell. Syst. Technol. (TIST) 2022, 13, 1–23. [Google Scholar] [CrossRef]
  23. Mothukuri, V.; Parizi, R.M.; Pouriyeh, S.; Huang, Y.; Dehghantanha, A.; Srivastava, G. A survey on security and privacy of federated learning. Future Gener. Comput. Syst. 2021, 115, 619–640. [Google Scholar] [CrossRef]
  24. Pati, S.; Kumar, S.; Varma, A.; Edwards, B.; Lu, C.; Qu, L.; Wang, J.J.; Lakshminarayanan, A.; Wang, S.h.; Sheller, M.J.; et al. Privacy preservation for federated learning in health care. Patterns 2024, 5, 100974. [Google Scholar] [CrossRef] [PubMed]
  25. Gu, X.; Sabrina, F.; Fan, Z.; Sohail, S. A review of privacy enhancement methods for federated learning in healthcare systems. Int. J. Environ. Res. Public Health 2023, 20, 6539. [Google Scholar] [CrossRef] [PubMed]
  26. Jin, J.; Wu, M.; Ouyang, A.; Li, K.; Chen, C. A novel dynamic hill cipher and its applications on medical IoT. IEEE Internet Things J. 2025, 12, 14297–14308. [Google Scholar] [CrossRef]
  27. Dayan, I.; Roth, H.R.; Zhong, A.; Harouni, A.; Gentili, A.; Abidin, A.Z.; Liu, A.; Costa, A.B.; Wood, B.J.; Tsai, C.S.; et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat. Med. 2021, 27, 1735–1743. [Google Scholar] [CrossRef]
  28. Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al. Federated learning in medicine: Facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 2020, 10, 12598. [Google Scholar] [CrossRef]
  29. Eden, R.; Chukwudi, I.; Bain, C.; Barbieri, S.; Callaway, L.; de Jersey, S.; George, Y.; Gorse, A.D.; Lawley, M.; Marendy, P.; et al. A scoping review of the governance of federated learning in healthcare. Npj Digit. Med. 2025, 8, 427. [Google Scholar] [CrossRef]
  30. Zhang, F.; Shuai, Z.; Kuang, K.; Wu, F.; Zhuang, Y.; Xiao, J. Unified fair federated learning for digital healthcare. Patterns 2024, 5, 100907. [Google Scholar] [CrossRef]
  31. Reddy, S.; Allan, S.; Coghlan, S.; Cooper, P. A governance model for the application of AI in health care. J. Am. Med. Inform. Assoc. 2020, 27, 491–497. [Google Scholar] [CrossRef]
  32. Gerke, S.; Minssen, T.; Cohen, G. Ethical and legal challenges of artificial intelligence-driven healthcare. In Artificial Intelligence in Healthcare; Elsevier: Amsterdam, The Netherlands, 2020; pp. 295–336. [Google Scholar]
  33. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  34. Rethlefsen, M.L.; Kirtley, S.; Waffenschmidt, S.; Ayala, A.P.; Moher, D.; Page, M.J.; Koffel, J.B. PRISMA-S: An extension to the PRISMA statement for reporting literature searches in systematic reviews. Syst. Rev. 2021, 10, 39. [Google Scholar] [CrossRef]
  35. Xue, B.; Zheng, Q.; Li, Z.; Wang, J.; Mu, C.; Yang, J.; Fan, H.; Feng, X.; Li, X. Perturbation defense ultra high-speed weak target recognition. Eng. Appl. Artif. Intell. 2024, 138, 109420. [Google Scholar] [CrossRef]
  36. Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60. [Google Scholar] [CrossRef]
  37. Sadilek, A.; Liu, L.; Nguyen, D.; Kamruzzaman, M.; Serghiou, S.; Rader, B.; Ingerman, A.; Mellem, S.; Kairouz, P.; Nsoesie, E.O.; et al. Privacy-first health research with federated learning. NPJ Digit. Med. 2021, 4, 132. [Google Scholar] [CrossRef] [PubMed]
  38. Truhn, D.; Arasteh, S.T.; Saldanha, O.L.; Müller-Franzes, G.; Khader, F.; Quirke, P.; West, N.P.; Gray, R.; Hutchins, G.G.; James, J.A.; et al. Encrypted federated learning for secure decentralized collaboration in cancer image analysis. Med. Image Anal. 2024, 92, 103059. [Google Scholar] [CrossRef]
  39. Brisimi, T.S.; Chen, R.; Mela, T.; Olshevsky, A.; Paschalidis, I.C.; Shi, W. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 2018, 112, 59–67. [Google Scholar] [CrossRef]
  40. Xue, B.; Zheng, Q.; Li, Z.; Wang, J.; Mu, C.; Yang, J.; Feng, X.; Fan, H.; Li, X. ISAR Weak Feature Enhancement With Perturbation Defense Using Hybrid Clustering Oversegmentation. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 6256–6274. [Google Scholar] [CrossRef]
  41. Abbas, Z.; Rehman, M.U.; Tayara, H.; Lee, S.W.; Chong, K.T. m5C-Seq: Machine learning-enhanced profiling of RNA 5-methylcytosine modifications. Comput. Biol. Med. 2024, 182, 109087. [Google Scholar] [CrossRef]
  42. Abbas, Z.; Rehman, M.U.; Tayara, H.; Chong, K.T. ORI-Explorer: A unified cell-specific tool for origin of replication sites prediction by feature fusion. Bioinformatics 2023, 39, btad664. [Google Scholar] [CrossRef]
  43. Li, J.; Li, J.; Wang, C.; Verbeek, F.J.; Schultz, T.; Liu, H. Outlier detection using iterative adaptive mini-minimum spanning tree generation with applications on medical data. Front. Physiol. 2023, 14, 1233341. [Google Scholar] [CrossRef]
  44. Xu, G.; Fan, X.; Xu, S.; Cao, Y.; Chen, X.B.; Shang, T.; Yu, S. Anonymity-enhanced Sequential Multi-signer Ring Signature for Secure Medical Data Sharing in IoMT. IEEE Trans. Inf. Forensics Secur. 2025, 20, 5647–5662. [Google Scholar] [CrossRef]
  45. Abbas, S.R.; Seol, H.; Abbas, Z.; Lee, S.W. Exploring the role of artificial intelligence in smart healthcare: A capability and function-oriented review. Healthcare 2025, 13, 1642. [Google Scholar] [CrossRef] [PubMed]
  46. Zhang, X.; Zhou, L.; Wang, S.; Fan, C.; Huang, D. Facilitating patient adoption of online medical advice through team-based online consultation. J. Theor. Appl. Electron. Commer. Res. 2025, 20, 231. [Google Scholar] [CrossRef]
  47. Zhao, Z.; Li, X.; Luan, B.; Jiang, W.; Gao, W.; Neelakandan, S. Secure internet of things (IoT) using a novel brooks Iyengar quantum byzantine agreement-centered blockchain networking (BIQBA-BCN) model in smart healthcare. Inf. Sci. 2023, 629, 440–455. [Google Scholar] [CrossRef]
  48. Xu, W.; Deng, J.; Yu, J.; Mao, S.; Li, Y.; Peng, Z.; Xiao, B. Blockchain-based Verifiable Decentralized Identity for Intelligent Flexible Manufacturing. IEEE Internet Things J. 2025, 12, 32366–32378. [Google Scholar] [CrossRef]
  49. Ma, Z.; Ma, J.; Miao, Y.; Li, Y.; Deng, R.H. ShieldFL: Mitigating model poisoning attacks in privacy-preserving federated learning. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1639–1654. [Google Scholar] [CrossRef]
  50. Geiping, J.; Bauermeister, H.; Dröge, H.; Moeller, M. Inverting gradients-how easy is it to break privacy in federated learning? Adv. Neural Inf. Process. Syst. 2020, 33, 16937–16947. [Google Scholar]
  51. Yin, H.; Mallya, A.; Vahdat, A.; Alvarez, J.M.; Kautz, J.; Molchanov, P. See through gradients: Image batch recovery via GradInversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2021; pp. 16337–16346. [Google Scholar]
  52. Marmot, M. Health equity in England: The Marmot review 10 years on. BMJ 2020, 368, m693. [Google Scholar] [CrossRef]
  53. Braveman, P.A.; Arkin, E.; Proctor, D.; Kauh, T.; Holm, N. Systemic and structural racism: Definitions, examples, health damages, and approaches to dismantling. Health Aff. 2022, 41, 171–178. [Google Scholar] [CrossRef]
  54. Crenshaw, K. Intersectionality and identity politics: Learning from violence against women of color. In Reconstructing Political Theory: Feminist Perspectives; Pennsylvania State University Press: University Park, PA, USA, 1997; pp. 178–193. [Google Scholar]
  55. Green, B. Escaping the impossibility of fairness: From formal to substantive algorithmic fairness. Philos. Technol. 2022, 35, 90. [Google Scholar] [CrossRef]
  56. Bauer, G.R. Incorporating intersectionality theory into population health research methodology: Challenges and the potential to advance health equity. Soc. Sci. Med. 2014, 110, 10–17. [Google Scholar] [CrossRef]
  57. Vayena, E.; Blasimme, A. Health research with big data: Time for systemic oversight. J. Law, Med. Ethics 2018, 46, 119–129. [Google Scholar] [CrossRef]
  58. Vayena, E.; Blasimme, A.; Cohen, I.G. Machine learning in medicine: Addressing ethical challenges. PLoS Med. 2018, 15, e1002689. [Google Scholar] [CrossRef]
  59. Gujar, P. Data standardization and interoperability. In Data Usability in the Enterprise: How Usability Leads to Optimal Digital Experiences; Springer: Berlin/Heidelberg, Germany, 2025; pp. 89–110. [Google Scholar]
  60. Li, G.; Wu, X.; Ma, X. Artificial intelligence in radiotherapy. Semin. Cancer Biol. 2022, 86, 160–171. [Google Scholar]
  61. Kelly, C.J.; Karthikesalingam, A.; Suleyman, M.; Corrado, G.; King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019, 17, 195. [Google Scholar] [CrossRef]
  62. Luan, S.; Yu, X.; Lei, S.; Ma, C.; Wang, X.; Xue, X.; Ding, Y.; Ma, T.; Zhu, B. Deep learning for fast super-resolution ultrasound microvessel imaging. Phys. Med. Biol. 2023, 68, 245023. [Google Scholar] [CrossRef]
  63. Yu, X.; Luan, S.; Lei, S.; Huang, J.; Liu, Z.; Xue, X.; Ma, T.; Ding, Y.; Zhu, B. Deep learning for fast denoising filtering in ultrasound localization microscopy. Phys. Med. Biol. 2023, 68, 205002. [Google Scholar] [CrossRef]
  64. Nasajpour, M.; Pouriyeh, S.; Parizi, R.M.; Han, M.; Mosaiyebzadeh, F.; Liu, L.; Xie, Y.; Batista, D.M. Federated learning in smart healthcare: A survey of applications, challenges, and future directions. Electronics 2025, 14, 1750. [Google Scholar] [CrossRef]
  65. Hu, F.; Yang, H.; Wei, S.; Hu, H.; Chen, Y.; Zhou, H. Spatial networks of China’s specialized, refined, distinctive, and innovative medical device firms based on parent–subsidiary contacts: Implications for regional health policy. Front. Public Health 2025, 13, 1676189. [Google Scholar] [CrossRef]
Figure 1. PRISMA 2020 flow diagram of the study selection process.
Figure 2. Distribution of ethical dimensions and study characteristics across all 38 included studies.
Figure 3. Key challenges and solution framework for ethical healthcare FL.
Table 1. Contribution comparison: this review vs. existing reviews.
| Review | Primary Focus | Ethical Emphasis | Studies Included | Key Limitations |
| --- | --- | --- | --- | --- |
| [17] | Clinical applications and technical architecture | Limited (privacy only) | 612 | Minimal fairness/governance focus; broad scope. |
| [8] | IoT integration, privacy, and security | Moderate (privacy and security) | Not specified | Limited fairness analysis; IoT-specific. |
| [19] | Challenges and recommendations | Moderate (privacy concerns) | 107 | Limited governance frameworks; technical focus. |
| [18] | Methodological advances | Limited (privacy techniques) | 89 | Minimal ethical framework; methods-focused. |
| [24] | Privacy preservation techniques | High (privacy only) | Review article | No fairness/governance; privacy-specific. |
| [29] | Governance mechanisms | High (governance only) | 39 | Limited privacy/fairness integration. |
| This Review | Comprehensive ethical healthcare FL | Very High (all dimensions) | 38 | Integrates privacy, fairness, governance, and equity; PRISMA-compliant; multi-dimensional analysis. |
Note: Ethical emphasis categories were author-defined according to the extent to which each review addressed the privacy, fairness, governance, and equity dimensions: 'Limited' = addresses one dimension minimally; 'Moderate' = addresses two dimensions; 'High' = addresses three dimensions comprehensively; 'Very High' = comprehensively addresses all four dimensions with integrated analysis.
Table 2. Characteristics of included studies (n = 38).
| Characteristic | Number of Studies | Percentage (%) |
| --- | --- | --- |
| Publication Year | | |
| 2020 | 3 | 7.9 |
| 2021 | 4 | 10.5 |
| 2022 | 6 | 15.8 |
| 2023 | 14 | 36.8 |
| 2024 | 11 | 28.9 |
| Geographic Region | | |
| United States | 12 | 31.6 |
| China | 9 | 23.7 |
| Multi-national | 8 | 21.1 |
| Europe | 6 | 15.8 |
| Other | 3 | 7.9 |
| Study Type | | |
| Methodological development | 18 | 47.4 |
| Empirical evaluation | 12 | 31.6 |
| Framework proposal | 6 | 15.8 |
| Case study | 2 | 5.3 |
| Healthcare Domain | | |
| Medical imaging | 16 | 42.1 |
| Electronic health records | 11 | 28.9 |
| Genomics | 4 | 10.5 |
| Wearable devices/IoT | 4 | 10.5 |
| Multi-domain | 3 | 7.9 |
| Clinical Specialty | | |
| Radiology | 9 | 23.7 |
| Oncology | 7 | 18.4 |
| Cardiology | 5 | 13.2 |
| Internal medicine | 4 | 10.5 |
| Neurology | 3 | 7.9 |
| Multiple specialties | 10 | 26.3 |
Table 3. Privacy-preservation techniques in healthcare federated learning.
| Technique | Studies (n) | Privacy Guarantee | Computational Overhead | Utility Trade-Off | Healthcare Applicability |
| --- | --- | --- | --- | --- | --- |
| Differential Privacy (DP) | 24 | DP | Low to Moderate | Moderate | High |
| Homomorphic Encryption | 12 | Information-theoretic | High | Low | Moderate |
| Secure Multi-party Computation | 8 | Information-theoretic | Very High | Low to Moderate | Low |
| Secure Aggregation | 15 | Computational | Low | Very Low | High |
| Trusted Execution Environment | 4 | Hardware-based | Moderate | Low | Moderate |
| Blockchain Integration | 6 | Cryptographic | High | Moderate | Moderate |
| Synthetic Data Generation | 5 | Semantic | Low | Moderate to High | High |
| Hybrid Approaches | 18 | Combined | Variable | Variable | High |
Note: Overhead, utility trade-off, and applicability are summarized qualitatively based on how each study reported runtime/communication cost, performance changes versus non-private baselines, and feasibility in real healthcare settings.
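To make the most frequently reported technique in Table 3 concrete, the sketch below illustrates a differential-privacy-style client update: the local model update is clipped to a fixed L2 norm and perturbed with calibrated Gaussian noise before it is sent for aggregation. This is a minimal illustration under stated assumptions, not the implementation of any included study; the names privatize_update, clip_norm, and noise_multiplier are ours.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise."""
    rng = rng if rng is not None else np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# Illustrative example: three hospitals privatize local updates, so the
# aggregator only ever sees clipped, noised values.
rng = np.random.default_rng(0)
updates = [rng.standard_normal(10) for _ in range(3)]
noisy = [privatize_update(u, rng=rng) for u in updates]
global_update = np.mean(noisy, axis=0)
```

The clipping bound caps each client's influence on the aggregate, which is what allows the added noise to be calibrated to a formal privacy guarantee; larger noise multipliers strengthen privacy at the cost of the utility trade-off summarized in Table 3.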
Table 4. Fairness approaches and metrics in healthcare federated learning.
| Fairness Level | Studies (n) | Primary Metrics | Mitigation Strategies | Reported Effectiveness |
| --- | --- | --- | --- | --- |
| Client-Level Fairness | 19 | Performance variance, worst-case accuracy, Gini coefficient | Fair aggregation, client weighting, resource allocation | High (variance reduction 40–60%) |
| Attribute-Level Fairness | 22 | Demographic parity, equalized odds, equal opportunity | Adversarial debiasing, reweighting, fair representation learning | Moderate to High (disparity reduction 30–70%) |
| Multi-Dimensional Fairness | 11 | Harmonic mean of fairness metrics, Pareto frontier analysis | Unified fairness objectives, constrained optimization | Moderate (balanced improvement 25–45%) |
| Intersectional Fairness | 4 | Subgroup-specific metrics | Hierarchical fairness constraints | Limited evidence |
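The attribute-level metrics in Table 4 are straightforward to compute from model outputs. The following minimal sketch, with illustrative function names and toy data of our own, shows the demographic parity difference and an equalized-odds gap for a binary prediction task with a binary protected attribute.

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """|P(yhat=1 | group 0) - P(yhat=1 | group 1)| for a binary attribute."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group gap in true-positive and false-positive rates."""
    gaps = []
    for y in (1, 0):  # y=1 yields the TPR gap, y=0 the FPR gap
        rates = [y_pred[(group == g) & (y_true == y)].mean() for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_diff(y_pred, group))        # 0.0
print(equalized_odds_gap(y_true, y_pred, group))     # ~0.33
```

In a federated setting these metrics can be computed per site and then aggregated, which connects attribute-level fairness to the client-level variance measures in the first row of Table 4.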
Table 5. Governance mechanisms in healthcare federated learning.
| Governance Component | Studies (n) | Key Elements |
| --- | --- | --- |
| Procedural Mechanisms | | |
| Data Privacy Protocols | 12 | Data minimization, purpose limitation, access controls, encryption standards |
| Ethical Review | 9 | IRB approval, ethics committee oversight, ethical impact assessment |
| Consent Procedures | 5 | Informed consent frameworks, opt-in/opt-out mechanisms, consent for secondary use |
| Formal Agreements | 6 | Data sharing agreements, participation contracts, liability frameworks |
| Relational Mechanisms | | |
| Stakeholder Involvement | 9 | Clinician engagement, patient representatives, ethics practitioners |
| Capability Building | 7 | Training programs, technical support, knowledge sharing |
| Institutional Support | 6 | Resource allocation, infrastructure investment, organizational commitment |
| Trust Building | 5 | Transparency initiatives, communication protocols, dispute resolution |
| Structural Mechanisms | | |
| Oversight Bodies | 7 | Federated learning consortia, governance boards, regulatory compliance structures |
| Role Definition | 6 | Data custodian roles, model developer responsibilities, end-user accountability |
| Health Consumer Representation | 2 | Patient advisory boards, community engagement, feedback mechanisms |
| Audit Mechanisms | 6 | Regular audits, performance monitoring, fairness assessments |
Note: Study counts reflect the frequency with which each governance component was discussed or proposed in the included studies, not necessarily evidence of real-world implementation.
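The audit mechanisms in the last row of Table 5 can be partially automated. The sketch below is a hypothetical per-round audit record of our own design: it logs per-site validation accuracy and flags rounds whose inter-site performance spread exceeds a tolerance, so the oversight body can review them. The threshold and field names are illustrative assumptions, not a standard from the included studies.

```python
from datetime import datetime, timezone

def audit_round(round_id, site_accuracy, max_spread=0.10):
    """Log per-site accuracy and flag rounds with a large inter-site spread."""
    values = list(site_accuracy.values())
    spread = max(values) - min(values)
    return {
        "round": round_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "per_site_accuracy": dict(site_accuracy),
        "accuracy_spread": round(spread, 4),
        "flagged_for_review": spread > max_spread,  # escalate to governance board
    }

record = audit_round(12, {"site_a": 0.91, "site_b": 0.84, "site_c": 0.88})
print(record["flagged_for_review"])  # False: the 0.07 spread is within tolerance
```

Persisting such records across training rounds gives the governance board an auditable trail linking the procedural mechanisms above to the fairness assessments they are meant to enforce.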
Table 6. Comparative analysis of federated learning approaches in healthcare.
| FL Approach | Studies (n) | Key Advantages | Ethical Challenges | Healthcare Suitability |
| --- | --- | --- | --- | --- |
| Centralized FL | 25 | Simple coordination, efficient aggregation, established protocols | Single point of failure, server trust requirements, potential privacy risks | High for institutional networks |
| Decentralized FL (P2P) | 6 | No central authority, enhanced resilience, distributed trust | Complex coordination, higher communication costs, consensus challenges | Moderate for multi-institutional |
| Hierarchical FL | 4 | Scalability, regional aggregation, flexible architecture | Complex governance, multi-level fairness considerations | High for large-scale deployments |
| Cross-Silo FL | 20 | Institution-level privacy, manageable participants, stable infrastructure | Data heterogeneity, fairness across institutions | Very High for healthcare |
| Cross-Device FL | 4 | Large-scale deployment, individual privacy | Device heterogeneity, limited healthcare validation | Moderate for wearables/mobile health |
| Vertical FL | 3 | Feature partitioning, complementary data | Complex privacy preservation, limited healthcare applications | Low currently |
| Federated Transfer Learning | 7 | Knowledge sharing across domains, limited data requirements | Domain shift challenges, fairness across source/target | Moderate for rare diseases |
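The centralized approach that dominates Table 6 typically relies on FedAvg-style aggregation: the coordinating server averages client parameters in proportion to local dataset size. The minimal sketch below assumes model parameters are represented as flat NumPy arrays; the function and variable names are ours, for illustration only.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))

# Illustrative example: three institutions with unequal cohort sizes
# contribute to one global model.
params = [np.array([0.2, 0.5]), np.array([0.4, 0.1]), np.array([0.3, 0.3])]
sizes = [1200, 300, 500]
global_params = fedavg(params, sizes)
```

Size-proportional weighting is also where client-level fairness concerns enter: large institutions dominate the average, which is why several included studies replace this scheme with the fair aggregation or client-weighting strategies listed in Table 4.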
Table 7. Clinical application domains in healthcare federated learning studies.
| Domain | Studies (n) | Primary Tasks | Data Modalities | Key Findings |
| --- | --- | --- | --- | --- |
| Medical Imaging | 13 | Lesion detection, tumor segmentation, disease classification | X-ray, CT, MRI, pathology images | FL models achieved 85–95% of centralized performance; fairness interventions reduced disparities by 30–50%. |
| Electronic Health Records | 11 | Risk prediction, treatment recommendation, readmission forecasting | Structured EHR, clinical notes | Privacy-preserving FL maintained 80–92% accuracy; significant fairness challenges across demographic groups. |
| Oncology | 6 | Cancer detection, treatment response prediction, survival analysis | Multi-modal: imaging + genomics + EHR | Federated models demonstrated clinical utility; privacy techniques showed minimal impact on performance. |
| Cardiology | 5 | Cardiovascular risk assessment, ECG analysis, heart disease prediction | ECG signals, cardiac imaging, EHR | FL enabled multi-center validation; fairness-aware training reduced gender disparities. |
| Genomics | 4 | Genetic risk scoring, biomarker discovery, pharmacogenomics | Genomic sequences, SNP data | Secure genomic FL feasible but computationally intensive; privacy critical given genetic data sensitivity. |
| Wearables/IoT | 4 | Health monitoring, early warning systems, chronic disease management | Sensor data, activity logs | Demonstrated feasibility for patient-centric FL; privacy preservation essential. |
| COVID-19 Response | 6 | Diagnosis, prognosis, resource allocation | Chest imaging, EHR, mobility data | Rapid deployment demonstrated FL agility; ethical challenges in crisis contexts identified. |
Note: Study counts reflect the specific clinical application tasks included in this table and may differ from counts reported using broader domain groupings elsewhere in the manuscript.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
