1. Introduction
The healthcare industry generates vast amounts of sensitive patient data, which have immense potential to revolutionize diagnostic decisions, advance medical research, and enable personalized treatment. However, the utilization of these data is hindered by significant privacy and security concerns. Data breaches and unauthorized access to patient information pose severe risks, making healthcare data highly vulnerable. Traditional methods of data sharing and centralized storage exacerbate these vulnerabilities, as they often do not comply with stringent privacy regulations and are susceptible to cyberattacks. Consequently, healthcare institutions face substantial barriers to collaborative data sharing, which is essential to maximize the value of healthcare data while maintaining patient trust and confidentiality. In recent years, the digitalization of healthcare data has further amplified concerns about privacy, security, and data authenticity. Health data, including medical histories, diagnoses, and treatment plans, are highly sensitive and a prime target for cybercriminals. Traditional approaches to securing healthcare data, such as centralized storage and management, are increasingly inadequate in the face of sophisticated cyber threats. Moreover, there is a growing reliance on data-driven artificial intelligence (AI) models, which require large datasets for optimal performance and thus inherently raise the risk of unauthorized access and data leakage during sharing.
Federated learning (FL) has proved to be a viable solution for privacy-protecting collaborative learning in healthcare. FL enables several healthcare organizations to work together to train machine learning models without directly sharing sensitive patient data. Instead of centralizing data, FL allows local model training within each institution. A central server receives only model updates, such as gradients or parameters, for aggregation. This decentralized approach significantly reduces the risk of data breaches by ensuring that raw patient data remain within each institution's secured boundaries. However, FL is not without limitations. Although it mitigates the risks associated with raw data sharing, the model updates exchanged throughout the training stage can still expose sensitive information. Malicious actors can potentially exploit these updates to infer private patient data, leading to privacy breaches.
Consider a hospital coalition developing an advanced AI disease prediction tool whose members insist on protecting confidential patient information. Combining homomorphic encryption with a federated learning framework allows them to achieve the workflow shown in
Figure 1. Before any collaboration begins, medical records and other patient information reside securely in protected databases, much like digital vaults. Homomorphic encryption keeps these data in ciphertext, creating an encrypted vault that protects the information while still allowing computations to operate on its contents. The hospitals encrypt their shared information before storing it in a cloud-based system, so raw data are never exposed to others. The cloud system performs numerical procedures, such as optimizing the AI model, directly on the encrypted data while maintaining complete data security for all participants. The final result emerges from homomorphic decryption of the encrypted output, and patient data remain protected throughout the entire process. With these measures, healthcare institutions can achieve better care outcomes through collaboration while preserving data privacy at all times in accordance with HIPAA and GDPR standards and ensuring patient security.
Additionally, the integration of cryptographic techniques, such as homomorphic encryption (HE), with FL frameworks introduces computational and communication overheads, which can hinder the scalability and efficiency of these systems. HE allows computations to be performed on encrypted data, ensuring end-to-end privacy during model updates and aggregation. However, the computational complexity of HE remains a significant deterrent to its application in deep learning for large-scale medical services. The combination of FL and HE offers a transformative approach to secure data collaboration in healthcare. By leveraging FL's decentralized model training and HE's ability to perform computations on encrypted data, healthcare institutions can collaboratively train models while maintaining strict privacy standards. This integrative approach not only protects patient confidentiality but also ensures compliance with legal regulations, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). Despite these advancements, several challenges remain. The scalability and efficiency of FL systems need to improve to handle the increasing complexity and interconnectedness of modern healthcare systems. Furthermore, the risk of adversarial attacks, such as model inversion and targeted queries, poses a significant threat to the privacy of trained models. These attacks can exploit vulnerabilities in the model to infer sensitive patient information, even after the training phase. Therefore, there is a pressing need for robust privacy-preserving mechanisms that minimize the amount of sensitive information exchanged during training while maintaining model accuracy.
This research aims to design a federated learning framework that preserves privacy and enables the secure collaboration of healthcare data. The primary goal is to reduce the computational overhead associated with cryptographic techniques while ensuring data confidentiality. By addressing these challenges, this study seeks to unlock the full potential of healthcare data for innovation and collaboration, paving the way for advancements in medical research, personalized treatment, and improved patient outcomes. Model fairness and interpretability are supported by a symmetric federated learning setup in which all clients use identical model configurations and participate equally in the learning process. Building on these challenges, the related work section reviews existing efforts to address privacy and security in healthcare data collaboration. The main contributions are as follows.
We present PPFLHE, a privacy-preserving federated learning framework employing homomorphic encryption (HE) for healthcare data, which preserves model privacy and computational efficiency while maintaining the data availability and model correctness that are critical for healthcare applications of FL.
To protect local gradients against inference attacks, CKKS is used to encrypt model updates on the client side, so that a malicious server sees only undecipherable ciphertext while authorized clients can easily interpret the decrypted results; the same protection applies when updates are shared.
On the server side, encrypted updates from three clients are aggregated with the FedAvg algorithm, reducing the delays caused by unresponsive participants and the communication overhead. We then investigate the efficiency and privacy of PPFLHE using the APTOS 2019 Blindness Detection dataset, achieving 83.19% accuracy with EfficientNet-B0 while maintaining privacy and low communication cost.
The paper is organized as follows. First, we review related work on FL and privacy techniques (Section 2); then, we outline the system model and methodology (Section 3). After that, we analyze the system architecture and security (Section 4), discuss the experimental results (Section 5), and draw our conclusions (Section 6).
2. Related Work
The rapid growth of healthcare data requires stronger privacy and security protections, driven by advances in AI capabilities and IoT technology. Prior work grounds privacy preservation in FL and blockchain-based security systems implemented with cryptographic approaches. This section examines the research on healthcare privacy to illustrate the approaches employed, the data used, and the results obtained.
Mantey et al. [
1] used stochastic modification minimized gradient (SMMG) and Newton distributed estimate (NDE) algorithms to establish privacy-preserving collaborative learning, which tackles gradient leakage problems within federated systems. Bo Wang et al. [
2] developed PPFLHE as a new security framework that defends healthcare federated learning using homomorphic encryption. It protects user privacy through client-side encryption of model updates and authentication-based access controls. The study used the APTOS 2019 Blindness Detection and CIFAR-10 datasets for experimental assessment. The system includes an acknowledgment (ACK) mechanism for managing inactive clients, which improves communication efficiency. The classification method achieves 81.53% accuracy, striking a favorable balance between model performance, data privacy, and transmission efficiency. In this work, the authors integrated cryptographic solutions with practical system-level optimizations to advance healthcare applications of federated learning.
Wang et al. [
3] examined IoMT privacy issues with a specific focus on recommender systems. Their framework protected user privacy while achieving high accuracy through innovations in optimization algorithms and homomorphic encryption. Through FL, Hijazi et al. [
4] created an FHE system which protects IoT security within cognitive cities by applying the N-BaIoT dataset to reduce latency along with communication overhead. Gu et al. [
5] performed a review of privacy enhancement strategies for healthcare FL, covering differential privacy, blockchain technologies, and hierarchical schemes applied to diabetic retinopathy and COVID-19 forecasting. The review recorded operational problems concerning model convergence and system performance, along with privacy-versus-precision conflicts in personal data management. According to Antunes et al. [6], applying FL to electronic health records requires solutions to interoperability and scalability problems. The authors created a privacy-preserving data-sharing platform that combines blockchain technology with differential privacy standards for handling diverse healthcare information.
Zhou et al. [
7] produced adaptive segmented CKKS homomorphic encryption to boost FL performance with federated averaging (FedAvg). Their approach achieved secure and efficient computations when implemented on the MNIST and CIFAR-10 datasets. Xu et al. [8] implemented FHE to conduct secure FL for breast cancer detection from mammogram images. Encrypting the complete model required substantial processing power and memory but delivered performance nearly identical to standard techniques while effectively resisting inference attacks. Khalid et al. [9] conducted a review of secure multiparty computation techniques, blockchain systems, and differential privacy solutions for protecting medical records such as EHRs, genomic data, and imaging data. The review examined multiple types of privacy intrusions and defense approaches, yet noted the need for legal and ethical frameworks that safeguard data privacy without compromising its value.
Liu et al. [
10] developed a privacy-protected FL system for medical IoT skin lesion classification in healthcare operations. The model secured data sharing and model aggregation by implementing HE encryption and then combining it with Shamir’s secret sharing method as well as Diffie–Hellman key exchange [
11]. Through its use of HAM10000 dataset data, the solution reached high accuracy results together with better computational speed. Qayyum et al. [
12] built an FL framework that performed clustering-based COVID-19 diagnostics using X-ray images together with ultrasound data. By modifying the VGG16 neural network, they achieved higher F1 scores than centralized approaches. The system established queryable data access across various sources to address heterogeneity problems and computational limits while maintaining privacy protection. Future work should concentrate on personalizing the model and scaling it up.
Researchers created IoThC through the integration of FL and blockchain concepts, which forms the basis of an extensive IoT healthcare framework as described by Singh et al. [
13]. Blockchain technology protects data through its immutability, while FL processes information across multiple users without requiring direct system connections [14]. The system combines differential privacy with homomorphic encryption protocols to protect sensitive data from leakage [15]. The case evaluation showed that the framework delivered effective privacy defense alongside stable ownership management in the system design. A privacy-protected collaborative filtering system for healthcare recommendations was created by Kaur et al. [
16] through the integration of multiparty random masking with polynomial aggregation and homomorphic encryption. Through their framework, the authors accomplished superior performance levels and better precision results when evaluating simulated healthcare and MovieLens datasets. A framework for mobile healthcare social networks data sharing was developed by Huang et al. [
17] by uniting both attribute-based and identity-based broadcast encryption methods. The platform provided a protected information exchange system for smart city healthcare providers by resolving problems with data processing overheads and exclusions within their simulated operational environment.
The research by Yang et al. [
18] introduced an integrated security solution to shield patient data during cloud-based medical data-sharing tasks. Integrating vertically partitioned processing with hybrid search protocols and anonymization methods lets closed-system data management preserve data usefulness while protecting privacy. Experimental trials on electronic medical records (EMRs) demonstrated the solution's practical viability [
19]. Research that produces privacy mechanisms capable of maintaining operational performance improves the protection of healthcare data. FL serves as an AI methodology for learning over distributed data systems without moving the data from its source. Encrypted, signed medical documents are stored on an immutable blockchain platform at each processing stage. The proposed solutions handle training data leakage, large-scale data handling needs, system integration requirements, and data unification challenges [
20,
21]. Li et al. [
22] explore federated learning (FL) in detail as a privacy-focused distributed machine learning architecture. Devices participate in model training through collaborative processes while sensitive user data remain on their devices. The article stands out from much dataset-focused research by not analyzing one specific dataset with fixed performance metrics. Instead, it performs an extensive analysis of FL's privacy architecture by examining both FedSGD and FedAvg methods and their known security weaknesses, including model poisoning, communication overloads, and inference attacks. The research puts forward a set of defenses, including fully homomorphic encryption (FHE), differential privacy, secure aggregation, and multiparty computation (MPC), to address these threats. Despite the absence of accuracy evaluations, the article functions as a foundational resource providing technical guidance on building scalable, privacy-protected, robust federated learning systems for healthcare, IoT, smart cities, and financial analytics. Altaf et al. [23] deliver an extensive evaluation of blockchain technology, examining its multi-tier system infrastructure, consensus mechanisms, field implementations, main frameworks, and security vulnerabilities.
Table 1 shows a comparative survey of previous operations concerning federated learning and homomorphic encryption on medical and simulated data.
The paper discusses how the blockchain configuration supports privacy features because it functions with decentralized protocols that benefit from cryptographic hashes as well as consensus mechanisms, which remove requirements for trusted third parties [
27]. The technology guarantees unalterable data integrity, which is vital for healthcare, finance, and IoT applications. Following a survey-based approach, the paper grounds its findings in existing sources rather than experimental analysis on a specific dataset [
28,
29]. The research paper collects and evaluates actual cases from multiple business domains, which include healthcare, energy, supply chain, and IoT, to show how blockchain technology improves data security and transparency as well as control over data [
24,
30]. The paper establishes that blockchain transforms digital trust and privacy, yet issues of scalability, energy consumption, and interoperability remain barriers to further development. Other research groups achieved successful outcomes by evaluating on medical records combined with standardized MNIST datasets in real medical settings. Later, in the methodology, this paper describes our method of joining FL and HE frameworks to allow secure, private healthcare data sharing [
31,
32]. FL stands today as the principal defense mechanism that medical organizations use to handle their various machine learning distribution operations. The study conducted by Xu et al. [
33] demonstrated how FL supports collaborative model development from EHRs through a process that prevents the exchange of original medical data. The research used the MIMIC-III, eICU, and Cerner Health Facts datasets, applying support vector machines, logistic regression, multilayer perceptrons, recurrent neural networks, autoencoders, and tensor factorization for model development.
Researchers in [
34] addressed distribution issues in non-IID data and FL heterogeneity using three main approaches called federated averaging [
35], agnostic FL, and federated multi-task learning represented by MOCHA and VIRTUAL. Model compression, federated dropout, and resource-aware client selection methods reduced communication overhead [
36]. Pan et al. [
37] suggested FedSHE, a privacy-preserving federated learning framework based on segmented CKKS homomorphic encryption. The framework addresses gradient leakage by optimizing the encryption parameters and using a segmented encryption technique for large model sizes. Evaluations on standard datasets show that FedSHE yields similar model performance with lower computational and communication overhead. It is more efficient and secure than Paillier-based and other CKKS-based methods, making fully homomorphic encryption practically feasible in realistic federated learning applications. A blockchain architecture developed by Iqbal et al. [
38] incorporated verification protocols for IoT devices, protection elements for reliable data, and secure energy trades implemented through smart contracts within net metering infrastructure. The authors combined moving average and ARIMA models with LSTM for energy consumption prediction. The models were trained on the smart home dataset hosted on Kaggle, which contains per-minute appliance records and weather measurements. The LSTM achieved the highest accuracy for energy consumption prediction [
25]. The Solidity smart contracts within the Ganache environment ensured secure, automated peer-to-peer energy transactions over a private Ethereum blockchain. The study created a private smart home system with advanced functionality, yet it still requires formal verification and performance enhancement [
39].
Li et al. [
40] developed a privacy-protected federated learning system that analyzes multi-site functional MRI (fMRI) data through decentralized operations. Their decentralized deep learning method protects medical data by keeping it at its source location. The Autism Brain Imaging Data Exchange (ABIDE I) sites NYU, UM, USM, and UCLA were used for model development and evaluation. Performance enhancement and privacy preservation are achieved through an approach that integrates differential privacy with domain adaptation methods, using Mixture of Experts and Adversarial Domain Alignment. The model provided dependable ASD diagnosis through valid biomarker assessment, achieving about 80% reliable results across different clinical datasets. The research shows that domain adaptation methods improve model performance in federated operations while protecting patient privacy [
41]. Almaiah et al. [
42] put together a system uniting supervised machine learning (SML) frameworks with cryptographic encryption and decryption methods (CPBED) for user verification in Internet of Medical Things (IoMT)-based cyber-physical systems (CPSs) and for healthcare dataset protection. Before encryption, the system performs distributed authentication of medical devices, maintaining data consistency while ensuring privacy. The dataset contained encrypted X-ray images that protected owner information during deep learning analysis. Encryption introduced some performance degradation but largely preserved accuracy, with only about a 1% reduction.
In addition to security and privacy, energy consumption is an emerging issue in federated edge learning systems, especially UAV-assisted MEC systems. Sharma et al. [43] worked on Mobile Edge Computing (MEC), where Unmanned Aerial Vehicle (UAV) solutions have shown great potential to offer on-demand computing services closer to the user equipment (UE), which in turn lessens latency and enhances quality of service (QoS). Nevertheless, energy consumption is a serious problem because both UAVs and mobile devices have limited battery capacity. This problem is more evident in 5G and beyond-5G (B5G) networks, where dynamic UE mobility and frequent task offloading require adaptive, smarter resource management. To deal with this, recent research discusses energy-efficient optimization techniques in NOMA-based UAV-assisted MEC. Notably, a multi-agent federated reinforcement learning (MAFRL) framework was suggested, in which Markov Decision Processes (MDPs) model the problem and Multi-Agent Reinforcement Learning (MARL) obtains an optimal energy-aware offloading policy. The findings indicate better energy efficiency than conventional centralized, single-agent schemes, showing the promise of integrating federated learning, UAV-based MEC, and NOMA technologies in next-generation networks to support energy-efficient edge computing at scale.
Pan et al. [
44] applied SplitFed Learning (SFL) to MRI image classification of brain tumors and compared it with other federated and centralized learning methods. They used a non-IID partitioned dataset composed of 3264 MRI scans and VGG19 to determine performance under different learning conditions. The experiment concluded that SFL offers an acceptable trade-off between privacy and performance, as it transmits only intermediate activations rather than the raw model parameters, thereby reducing communication overhead and providing more data security. Although centralized learning performed best, SFL provided similar results with enhanced privacy and reduced resource requirements, making it adequate for sensitive medical tasks.
4. Privacy-Preserving Federated Learning Framework with Homomorphic Encryption
This study focuses on integrating homomorphic encryption (HE) with federated learning (FL) technology to let different decentralized health facilities securely share sensitive data from their patients.
Table 2 details important values of the configuration and training parameters for our federated model. These parameters were carefully chosen to optimize model performance and computational efficiency while ensuring the confidentiality of the data through encryption.
The described configuration reflects the system's hardware structure and shows how privacy and performance interact in modern machine learning systems. During training, the model compares predictions against actual labels through cross-entropy loss to quantify mismatches. We optimize training with momentum SGD, which improves the convergence rate through momentum-accelerated parameter updates. A momentum parameter of 0.9 retains 90% of the past gradient direction, smoothing updates and controlling oscillations, and a learning rate of 0.01 keeps weight updates gradual. After five local steps, each federated node sends its model update to the server for aggregation into the global model. Training samples are processed in batches of 32 to balance processing time and model quality. Each of the three participating nodes trains on its own local dataset independently before sending the resulting model updates for aggregation. This setup allows decentralized devices to process private data locally, with the updates joining the global model without harming its accuracy. We next explore the main elements of the model, starting with a minimal local-training sketch below.
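To make this concrete, the following is a minimal sketch of one client's local training routine under the Table 2 parameters (cross-entropy loss, momentum SGD with a learning rate of 0.01 and momentum of 0.9, batch size 32, five local steps); the data loader is assumed to be provided, and interpreting the five local steps as local passes over the data is our simplification, not the exact code used in the experiments.

```python
import torch
import torch.nn as nn

def local_update(model, loader, local_steps=5, lr=0.01, momentum=0.9):
    """Run the client's local training and return its updated weights."""
    criterion = nn.CrossEntropyLoss()                  # cross-entropy loss (Table 2)
    optimizer = torch.optim.SGD(model.parameters(),    # momentum SGD (Table 2)
                                lr=lr, momentum=momentum)
    model.train()
    for _ in range(local_steps):                       # five local steps per round
        for images, labels in loader:                  # loader built with batch_size=32
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model.state_dict()                          # update later encrypted and sent
```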
4.1. Privacy Preserving Federated Learning Framework
The Privacy-Preserving Federated Learning with Homomorphic Encryption (PPFLHE) framework enables secure and efficient collaboration among sites handling sensitive healthcare data. The system has a symmetric structure: all clients begin with the same initialization procedure and follow a standardized aggregation method. Homomorphic encryption safeguards the confidentiality of model updates throughout the whole federated learning process.
4.2. Local Gradient Computation and Encryption
At every round $t$, the central server transmits the plaintext global model weights, denoted as $w^{(t)}$, to all participating clients. The model remains unencrypted on the client side, as decrypting it and simultaneously computing gradients is computationally expensive, particularly for large neural network models.

Each client $i$ uses its local dataset $D_i$ and the received model $w^{(t)}$ to compute its local gradient $g_i$. Since gradients are real-valued, we adopt the CKKS homomorphic encryption scheme, which supports approximate arithmetic over real numbers and enables the encryption of continuous values with controlled precision.

After computing the gradient, the client encrypts it using the public encryption key $pk$:

$$c_i = \mathrm{Enc}_{pk}(g_i) \tag{1}$$

For scenarios where gradient quantization is applied, the client may alternatively encrypt the quantized gradient:

$$c_i = \mathrm{Enc}_{pk}\big(Q(g_i)\big) \tag{2}$$

The encrypted gradient $c_i$ is then sent to the central server for secure aggregation.
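As an illustration, the sketch below encrypts a flattened gradient vector under CKKS using the open-source TenSEAL library; the context parameters and the toy gradient values are illustrative assumptions, not the exact configuration used in our experiments.

```python
import tenseal as ts

# Client-side CKKS setup (illustrative parameters; the secret key stays with the clients)
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40      # scaling factor controlling CKKS precision
context.generate_galois_keys()

def encrypt_gradient(grad_vector):
    """Encrypt a flattened, real-valued gradient so only ciphertext leaves the client."""
    return ts.ckks_vector(context, grad_vector)

c_i = encrypt_gradient([0.021, -0.013, 0.007])   # toy gradient values
```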
4.3. Homomorphic Aggregation
Once the server receives the encrypted gradients from each client, it performs aggregation using the additive homomorphic property of the CKKS encryption scheme. Let the encrypted gradients be denoted as shown below:

$$c_1, c_2, \ldots, c_N, \qquad c_i = \mathrm{Enc}_{pk}(g_i)$$

The server computes the encrypted sum of all gradients:

$$c_{\mathrm{sum}} = c_1 \oplus c_2 \oplus \cdots \oplus c_N \tag{3}$$

To calculate the average encrypted gradient, the server multiplies the result by $\frac{1}{N}$:

$$c_{\mathrm{avg}} = \frac{1}{N} \odot c_{\mathrm{sum}} \tag{4}$$

Only clients possessing the appropriate decryption key are able to decrypt $c_{\mathrm{avg}}$ and obtain the average gradient $\bar{g} = \frac{1}{N}\sum_{i=1}^{N} g_i$ according to Equation (4). This ensures that raw gradient values are never exposed to the server, thereby enabling secure collaborative learning through encrypted aggregation.
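Continuing the TenSEAL sketch above, a corresponding server-side routine computes Equations (3) and (4) entirely on ciphertexts; homomorphic addition and one plaintext scalar multiplication are the only operations the server needs, and the helper name is ours.

```python
def aggregate_encrypted(encrypted_grads):
    """Server-side secure aggregation over CKKS ciphertexts (Equations (3) and (4))."""
    c_sum = encrypted_grads[0]
    for c in encrypted_grads[1:]:
        c_sum = c_sum + c                          # homomorphic addition, Eq. (3)
    return c_sum * (1.0 / len(encrypted_grads))    # plaintext scalar multiply, Eq. (4)

# Only key-holding clients can recover the plaintext average:
# g_avg = c_avg.decrypt()
```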
4.4. Threat Model and Security Considerations
The clients encrypt their local gradients with CKKS, which enables approximate arithmetic over real numbers, before sharing training updates. This ensures that no raw gradients are ever exposed, even when the gradients must be aggregated. The server receives only encrypted updates and can perform only homomorphic addition, which greatly reduces the attack surface.
In this framework, model inversion attacks, in which the attacker aims to recover client information from gradients, are sturdily resisted. Adversaries cannot access raw model updates, because all updates are encrypted and decrypted locally. In addition, secure aggregation and the symmetric architecture, which prevents any particular client from having excessive sway over the global model, mitigate model poisoning attacks. Our construction follows well-established patterns in the existing literature [
2,
7,
24], which show that an efficient FL framework can effectively defend against inference and poisoning attacks while preserving model utility.
This algorithm provides a secure data collaboration method that lets healthcare organizations share information without sacrificing patient privacy. Algorithm 1 shows the steps: one model is initialized per healthcare facility and trained on local patient information. The models train on their individual datasets without giving raw patient data to a central organization. Each institution updates its local model using an optimizer that minimizes a given loss over its own dataset. The local model updates are encrypted before being shared with the system. At set intervals, the local model updates are aggregated to create one shared model instance for all participants. The global model is formed by averaging the encrypted model updates from all institutions, sharing knowledge between centers while keeping patient data hidden. The institutions receive the updated global model when the system completes its cycle. The procedure runs for K rounds, building up the global model through consistent updates without revealing patient details.
Algorithm 1: Privacy-preserving federated learning for secure healthcare data collaboration.

Inputs: $D_1, \ldots, D_N$: local datasets at $N$ institutions; $y_1, \ldots, y_N$: corresponding labels; $M_1, \ldots, M_N$: local models with weights $w_1, \ldots, w_N$; $\mathrm{Enc}(\cdot)$: encryption function; $K$: total number of global training rounds; $\tau$: global synchronization interval; $O_i$: local optimizers minimizing loss $L$.

Algorithm:
1. Initialize local models: $w_i \leftarrow$ random for $i = 1$ to $N$.
2. For $k = 1$ to $K$:
   - For each client $i = 1$ to $N$:
     - Compute gradient: $g_i = \nabla L(w_i; D_i, y_i)$
     - Update local model: $w_i \leftarrow O_i(w_i, g_i)$
     - Encrypt weights: $c_i = \mathrm{Enc}(w_i)$
   - If $k \bmod \tau = 0$:
     - Aggregate: $c_{\mathrm{glob}} = \frac{1}{N} \sum_{i=1}^{N} c_i$
     - For each client $i = 1$ to $N$: update the local model from the decrypted aggregate.
3. Repeat until $K$ rounds are complete.
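For illustration, one synchronization round of Algorithm 1 could be orchestrated as in the sketch below, reusing the `local_update`, `encrypt_gradient`, and `aggregate_encrypted` helpers from the earlier sketches; `clients` is a hypothetical list of objects holding a model and a data loader, and flattening all weights into a single CKKS vector is a simplification of what a production system would batch.

```python
import torch
import tenseal as ts

def federated_round(global_state, clients, context):
    """One global round of Algorithm 1: local training, encryption, and aggregation."""
    encrypted_updates = []
    for client in clients:
        client.model.load_state_dict(global_state)       # receive plaintext global model
        local_state = local_update(client.model, client.loader)
        flat = torch.cat([p.flatten() for p in local_state.values()])
        encrypted_updates.append(ts.ckks_vector(context, flat.tolist()))  # Enc(w_i)
    return aggregate_encrypted(encrypted_updates)        # server computes Eqs. (3)-(4)

# Clients holding the secret key decrypt and reload:
# avg = torch.tensor(c_avg.decrypt())  # then reshape back into per-layer tensors
```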
We next evaluate this secure data collaboration method in a real-world setting to measure its success.
5. Performance Analysis
In this section, we use the grading of diabetic retinopathy severity as a case study to evaluate the performance of the PPFLHE scheme. The effectiveness of this approach is demonstrated through experimental analysis, focusing on five key metrics on the APTOS 2019 Blindness Detection dataset: classification accuracy, F1 score, precision, recall, and the computational time required for encryption and decryption. Additionally, the privacy-preserving capabilities of the scheme are assessed, confirming its advantages through these detailed experiments. To run the experiments efficiently, we first set up a dedicated computing environment.
5.1. Experimental Setup
For our diabetic retinopathy (DR) detection experiments, we utilized an updated setup running on Python 3.10 with PyTorch (version 2.0.1). The experiments were performed on a Google Colab environment with a Tesla T4 GPU and 12 GB RAM.
5.2. Dataset
The research utilizes the APTOS 2019 Blindness Detection dataset, which provides 3662 labeled retinal images. The retinal images were collected from participants in rural India at Aravind Eye Hospital, a leading eye care provider. The dataset establishes an effective platform to train and test detection models for diabetic retinopathy, a condition that develops as diabetes progresses. Data preparation covers three essential tasks: collecting labeled images, quality checks, and image preprocessing for training. The dataset is valuable because it comes from real-life healthcare practice and exhibits the variation needed for testing and building medically oriented federated learning platforms. A minimal loading sketch follows.
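As a reference, a minimal PyTorch dataset wrapper for APTOS 2019 might look like the following, assuming the standard Kaggle layout (a train.csv with id_code and diagnosis columns, and a train_images/ folder of PNG files); this is a sketch, not our full preprocessing pipeline.

```python
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class APTOSDataset(Dataset):
    """Minimal loader assuming the standard Kaggle APTOS 2019 layout."""
    def __init__(self, csv_path, img_dir, transform=None):
        self.df = pd.read_csv(csv_path)          # columns: id_code, diagnosis (0-4)
        self.img_dir = img_dir
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(f"{self.img_dir}/{row.id_code}.png").convert("RGB")
        if self.transform:
            image = self.transform(image)
        return image, int(row.diagnosis)
```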
Building on this data source, the study conducts a comprehensive evaluation of the proposed secure data collaboration model.
5.3. Model Accuracy
To comprehensively assess the PPFLHE framework, we compared the performance of three different models, namely EfficientNet-B0, MobileNetV2, and ResNet50, on the APTOS 2019 dataset for diabetic retinopathy classification.
Table 3 indicates that EfficientNet-B0 reaches an accuracy of 83.19%, whereas MobileNetV2 and ResNet50 reach 81.53% and 78.00%, respectively. This testing shows that the framework can effectively support various model architectures while attaining high performance on the APTOS 2019 dataset. A minimal model-setup sketch is given below.
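As a reference point, adapting the torchvision implementations of the three compared backbones to the five APTOS severity grades only requires replacing the final classification layer; the ImageNet pretraining shown here is an assumption for the sketch, not a statement of our exact training pipeline.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5   # APTOS 2019 grades: 0 (no DR) through 4 (proliferative DR)

def build_model(name="efficientnet_b0"):
    """Build one of the three compared backbones with a 5-class DR head."""
    if name == "efficientnet_b0":
        m = models.efficientnet_b0(weights="IMAGENET1K_V1")
        m.classifier[1] = nn.Linear(m.classifier[1].in_features, NUM_CLASSES)
    elif name == "mobilenet_v2":
        m = models.mobilenet_v2(weights="IMAGENET1K_V1")
        m.classifier[1] = nn.Linear(m.classifier[1].in_features, NUM_CLASSES)
    elif name == "resnet50":
        m = models.resnet50(weights="IMAGENET1K_V1")
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)
    return m
```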
In addition to the APTOS 2019 performance experiments, we also tested the EfficientNet-B0 model on a new modality, the MosMedData CT scan dataset. The findings, which are depicted in
Table 4, indicate that the framework can be generalized to different areas of medical imaging.
EfficientNet-B0 outperformed both MobileNetV2 and ResNet50 in accuracy and efficiency during training on the APTOS dataset. On the MosMedData CT data, it attained an accuracy of 81.27%, showing high adaptability to multiple types of medical imaging. These findings demonstrate the significance of the architecture and training process in achieving high levels of performance. As revealed in
Table 5, model performance kept improving round by round.
5.3.1. Client Behavior and Accuracy Analysis
PPFLHE follows a symmetric federated learning model. Each participating client is configured identically, so the model and its training settings provide equal and fair representation. The experimental results suggest that as the number of clients is raised (C ∈ {3, 5, 10, 15}), the system's accuracy remains reliable, demonstrating the framework's scalability.
The results of multiple training rounds on the APTOS 2019 Blindness Detection Dataset are shown in
Figure 3. The experiments show that training behaves largely the same for any number of clients. Even with only three clients, the model shows strong and stable progress in every configuration.
5.3.2. Time Consumption in Federated Setup
These results report two findings: first, the number of clients has only a subtle effect on the availability of PPFLHE; second, the average classification precision varies only marginally across configurations. Additionally, the framework is robust to client dropout, which does not significantly affect the overall accuracy of the model. Network communication time increases noticeably as the number of clients grows. For example, with C = 3, the processing times correspond well to those reported in Figure 4 (initial rounds take around 200–250 s, depending on network traffic). However, when C grows to 10, the increase in coordination and encryption overhead results in worst-case scaling, and the time consumption increases.
Given the trade-off between performance and computational cost, we selected C = 3 as the optimal number of clients for this study, as shown in Table 2. To measure the framework's scalability in terms of time and computing power, the latency experiments included more clients, up to 10. Per-client encryption and decryption times (approximately 6–8 s) do not change, but the overall time per round increases with the number of clients due to the additional encryption, transfer, and decryption work performed by the server.
When PPFLHE was enabled, encryption took 6 to 8 s, resulting in a total latency per round of almost 215 s, much longer than the 105 s for non-HE FL. However, this extra processing comes with a slight improvement in model accuracy. Consequently, homomorphic encryption can safeguard data confidentiality while at the same time supporting slightly better model results. From the results in
Table 6, we conclude that privacy technology is important, as it keeps data secure while still allowing learning to progress properly.
5.3.3. Privacy Preserving Analysis
The Privacy-Preserving Federated Learning with Homomorphic Encryption (PPFLHE) framework is evaluated on the APTOS 2019 Blindness Detection dataset, with comparative experiments against a differential privacy (DP) approach. The classification accuracy results are shown in Figure 5, where it is seen that data utility degrades under DP noise. The prior studies [1,5,6,18] summarized in Table 1 above, along with the related work, are unable to guarantee model and data privacy and security at the same time. In comparison to the work in [2], our PPFLHE framework inherits the same privacy and security levels as [2,8] and, on top of that, outperforms the computationally intensive CKKS HE approach by achieving higher classification accuracy with lower communication overhead than [2]. In other words, PPFLHE realizes the trade-off between security, privacy, and performance in secure collaboration on healthcare data.
Through our design, we keep patient data secure while offering solid and promising models for healthcare institutions. The system maintains a symmetric architecture between clients, which ensures all learners receive equivalent opportunities while lowering the potential for leaking information about client relationships. The structure of our Privacy-Preserving Federated Learning with Homomorphic Encryption (PPFLHE) framework is depicted in
Figure 2; it guarantees an equitable computational load distribution among all participating clients, preventing any single institution from bearing a disproportionate share of the training process. Using the methodology described, the system increases the utilization of available resources in healthcare networks using federated learning techniques, achieving 83.19% accuracy on the APTOS 2019 Blindness Detection dataset while protecting data privacy.
As shown in
Figure 6, the experiments show that PPFLHE improves over successive training rounds. With time, the model becomes better at classification as training progresses and it learns from the APTOS 2019 Blindness Detection Dataset. In the early stages, the model has not yet captured the basic patterns of the data, so further rounds of training are needed for it to perform well. As described in
Section 4, homomorphic encryption enables the server to aggregate the models safely. The parallel handling of data in the PPFLHE framework helps strike a balance between preserving privacy and performing effective classification. These tests show that iterative learning and proper aggregation methods boost the performance of diabetic retinopathy detection with the EfficientNet-B0 model, reaching 83.19% accuracy.
5.4. Real-World Robustness Discussion
We demonstrated the real-world usability of the proposed framework using the APTOS 2019 Blindness Detection dataset, which consists of diabetic retinopathy images from patients. This ensures that the evaluation results accurately reflect actual medical problems. The testing included three clients. Because identical EfficientNet-B0 models are used and trained in the same way, every hospital or clinic node behaves similarly. This scheme balances the contributions of each model, which ensures it can run in real-world systems where healthcare is distributed.
6. Discussion
This section reviews and compares what the PPFLHE framework uncovered with the findings of existing studies in the literature. Previous studies, such as those by Mantey, Lee, and Zhou, have outlined the need to integrate encryption with federated learning to improve privacy. However, these studies usually do not explain in depth how privacy and model performance can be balanced.
In short, the proposed PPFLHE framework is beneficial for keeping data confidential, ensuring accurate models, and improving communication. It managed to retain an accuracy of 83.19% on the dataset, as shown in
Table 7. The inclusion of CKKS homomorphic encryption helps protect patients' privacy in healthcare systems while supporting low communication traffic and delivering good performance. We have also discussed the threat model of PPFLHE to detail how the symmetric client design and encryption framework address the primary attack avenues, namely model inversion and poisoning. This helps ensure that PPFLHE is not only privacy-preserving but also resilient to the adversarial actions prevalent in federated learning.
Although the initial experiments on the APTOS 2019 dataset showed promising results, further testing on the MosMedData CT dataset validated the strength of our approach. The PPFLHE framework was robust and performed consistently regardless of image modality, image resolution, or type of disease, indicating that it could be applied to a variety of clinical tasks. Data modality and distribution may affect training dynamics as well as encryption performance. However, PPFLHE's model-agnostic symmetric scheme, backed by CKKS-based homomorphic encryption, should generalize well, keeping its dependence on any particular model minimal.