RPFL: A Reliable and Privacy-Preserving Framework for Federated Learning-Based IoT Malware Detection
Abstract
1. Introduction
- What are the most effective ways to integrate homomorphic encryption into FL-IMD to enhance privacy without compromising model performance?
- How can blockchain and elliptic curve digital signatures improve the reliability and integrity of the FL-IMD aggregation process while mitigating single points of failure?
- What are the computational and communication overheads introduced by homomorphic encryption and blockchain, and how do they compare to existing FL-IMD approaches?
- What decentralized incentive mechanisms can be designed to encourage honest participation in FL-IMD while ensuring fairness and security?
- Reliable Aggregation and Privacy Preservation: We leverage ECDSA to ensure that only verified clients participate in aggregation. Additionally, homomorphic encryption (HE) protects the privacy of local model weights while maintaining model accuracy.
- Blockchain-Based Decentralized Mechanisms: We develop two smart contract-based decentralized schemes to address key challenges in FL-IMD:
  - A performance-based client participation evaluation mechanism to ensure fair and incentivized collaboration.
  - A decentralized tracking and reporting system to detect and mitigate aggregator failures in real time.
- Performance Evaluation: We conduct extensive experiments to evaluate the effectiveness of the RPFL framework. The results demonstrate that our approach:
  - Achieves comparable model accuracy to state-of-the-art FL-IMD methods.
  - Enhances the aggregation process by protecting privacy and improving reliability. While this introduces computational overhead, it remains manageable.
  - Reduces communication costs and latency compared to Blockchain-based Federated Learning using the InterPlanetary File System (BCFL-IPFS).
- Cost and Scalability Analysis: We analyze cost considerations related to communication and discuss constraints on scalability and model evaluation in the context of deployment challenges, while also exploring potential solutions.
2. Background and Related Work
2.1. FL-IMD Approaches
2.2. Addressing Advanced Security Threats in FL-IMD
2.3. Blockchain Integration for Enhanced Security
2.4. Enhancing FL-IMD with Privacy and Security Mechanisms
3. Preliminaries
3.1. Homomorphic Encryption
- Partially Homomorphic Encryption (PHE): Supports a single type of operation (either addition or multiplication, but not both).
- Somewhat Homomorphic Encryption (SWHE): Allows both addition and multiplication operations but only up to a limited extent, beyond which decryption becomes impractical.
- Fully Homomorphic Encryption (FHE): Permits unrestricted addition and multiplication operations on ciphertexts without accuracy loss. We focus on FHE in this paper.
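To make the homomorphic property concrete, the following toy Paillier implementation (a PHE scheme supporting only addition, unlike the CKKS FHE scheme used later in this paper) shows that multiplying two ciphertexts decrypts to the sum of the plaintexts. This is a self-contained sketch with toy-sized primes, not a production scheme:

```python
import math
import random

# Toy Paillier cryptosystem: multiplying two ciphertexts yields an
# encryption of the SUM of the plaintexts. The primes below are far
# too small for real security; this is for illustration only.

def keygen(p, q):
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
    mu = pow(lam, -1, n)           # simplification valid because g = n + 1
    return (n,), (n, lam, mu)      # public key, private key

def encrypt(pk, m):
    (n,) = pk
    r = random.randrange(1, n)     # random blinding factor
    return (pow(n + 1, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(sk, c):
    n, lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

pk, sk = keygen(1000003, 1000033)
n = pk[0]
c1, c2 = encrypt(pk, 42), encrypt(pk, 58)
assert decrypt(sk, (c1 * c2) % (n * n)) == 42 + 58  # additively homomorphic
```

The same idea, extended to both addition and multiplication on encrypted real-valued vectors, is what CKKS-style FHE provides.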
1. Encoding: This initial step converts a vector or tensor of real numbers into a plaintext polynomial within the ring. Key encoding parameters include:
   - Scaling Factor (Global Scale): Determines the precision of the encoding and is crucial for balancing precision and noise in the encrypted data.
   - Polynomial Modulus Degree (Poly Modulus Degree): Determines the number of coefficients in the plaintext polynomials, affecting both computational performance and the encryption’s security level.
   - Coefficient Modulus Sizes: Represented as a list of bit sizes, these determine the size of ciphertext elements and the overall security level. The length of this list bounds the number of supported encrypted multiplications.
2. Encryption: After encoding, the data are encrypted using public keys. This process adds noise to the data to enhance security and performs an intermediate rescaling step to manage the magnitude of plaintexts, thereby preventing noise accumulation.
3. Computation: Once encrypted, operations such as addition and multiplication are performed directly on the ciphertexts, enabling the encrypted information to be manipulated and processed without compromising its security.
4. Decryption: Finally, the encrypted data are decrypted using private keys to retrieve the original information. Decryption involves noise-suppression techniques to ensure the recovered data are accurate and reliable.
3.2. Blockchain
3.3. Elliptic Curve Digital Signature Algorithm (ECDSA)
4. Proposed Solution
4.1. Workflow Overview of the RPFL for IMD
4.2. RPFL System Components
4.3. RPFL Workflow Stages
- Clients select a coordinator, responsible for:
  - Generating and securely distributing Homomorphic Encryption (HE) keys.
  - Providing ECDSA private keys to clients for signature verification.
  - Distributing initial global model parameters and training settings.
  - Deploying the RPFL smart contract and Token contract on the blockchain.
- The coordinator selects an external aggregator to ensure model privacy.
- The aggregator receives the public key for signature verification.
- This prevents unauthorized participants from contributing malicious updates.
- Each client:
  - Receives and stores HE encryption keys securely.
  - Signs their Ethereum wallet address using their ECDSA private key.
  - Locally trains the malware detection model.
  - Encrypts the updated model weights before sharing them.
- If a client does not receive the updated global model from the aggregator within the specified time, it automatically reports the aggregator’s failure indicator via a smart contract.
- The aggregator follows these steps:
  - Timeout-based synchronization: Clients must submit updates within a specified time frame. If a client fails to send its model within the timeout period, it is excluded from that aggregation round.
  - Client authentication: The Ethereum wallet address of each client is verified via ECDSA.
  - Privacy-preserving aggregation: The encrypted model weights are aggregated using federated averaging.
  - Client commitment evaluation: The aggregator records client scores via the smart contract based on their performance (e.g., whether a client provides an update in this round and how quickly it completes training tasks). At the end of training, the system retrieves these scores and uses them to incentivize clients to participate actively and efficiently.
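The federated-averaging step above can be sketched on plaintext vectors as follows. Because CKKS supports addition and scalar multiplication, the same weighted sum can in principle be evaluated directly on encrypted weights; this plaintext sketch is for clarity, not the paper's exact code:

```python
def fedavg(client_weights, client_sizes):
    """FedAvg: average client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients holding 100 and 300 samples respectively.
avg = fedavg([[1.0, 2.0], [5.0, 6.0]], [100, 300])  # -> [4.0, 5.0]
```

With HE, each `w[i] * s` becomes a ciphertext-scalar product and the sum a ciphertext addition, so the aggregator never sees the plaintext weights.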
- Check Failure Report:
  - The coordinator examines aggregator failures using smart contract feedback.
  - Based on the severity, corrective actions are taken.
- Token Allocation:
  - Clients are rewarded proportionally based on their performance.
  - The coordinator manages token distribution using the pre-deployed Token contract.
4.4. Smart Contract
4.4.1. Access Control and Security
4.4.2. Core Functions of the Smart Contract
- Role and Access Management: Defines and manages roles (Coordinator, Aggregator, Client) to ensure that only authorized entities can perform specific actions within the contract.
- Round Management: Initiates and controls training rounds, including setting durations, tracking progress, and managing transitions between rounds based on client contributions.
- Evaluation and Scoring: Assigns scores to clients based on their level of commitment and contribution, serving as the basis for token-based rewards.
- Aggregator Failure Tracking: Allows clients to record a binary value in the smart contract, indicating the aggregator’s status. The coordinator can monitor these records for reliability assessment.
- Token Distribution: Manages the distribution of ERC-20 tokens as rewards for client contributions. The smart contract enables the coordinator to set or update the ERC-20 token used, providing flexibility in the reward mechanism.
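As a behavioral sketch of these core functions, written in plain Python rather than Solidity (the class name, role tags, and method names below are illustrative, not the paper's contract), role-gated scoring, failure reporting, and token accounting might look like:

```python
class RPFLContractMock:
    """Plain-Python sketch of the RPFL smart contract's core functions.
    Roles are simple string tags here; the real contract applies
    OpenZeppelin-style access control to Ethereum addresses."""

    def __init__(self, coordinator):
        self.roles = {coordinator: "COORDINATOR"}
        self.scores = {}            # client -> performance score
        self.agg_status = {}        # client -> 0/1 failure indicator
        self.tokens_per_round = {}  # round number -> tokens distributed

    def _require(self, sender, role):
        if self.roles.get(sender) != role:
            raise PermissionError(f"{sender} lacks role {role}")

    def set_aggregator(self, sender, addr):
        self._require(sender, "COORDINATOR")
        self.roles[addr] = "AGGREGATOR"

    def set_client_score(self, sender, client, score):
        self._require(sender, "AGGREGATOR")
        self.scores[client] = score

    def report_aggregator_status(self, client, status):
        self.agg_status[client] = status  # 1 signals a failure

    def set_tokens(self, sender, round_no, amount):
        self._require(sender, "COORDINATOR")
        self.tokens_per_round[round_no] = amount

    def count_total_tokens(self):
        return sum(self.tokens_per_round.values())
```

The `_require` check mirrors the on-chain role checks: only the coordinator can assign the aggregator role or record token totals, and only the aggregator can write client scores.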
4.5. Detailed Description of Proposed Schemes
4.5.1. Reliable and Privacy-Preserving Aggregation Process for FL-IMD
- Authentication and integrity: Preventing unauthorized clients from contributing to the global model.
- Data confidentiality: The aggregator cannot access the raw model updates.
- Privacy preservation: Clients’ training data and model parameters remain secure throughout the learning process.
Algorithm 1 Reliable and privacy-preserving aggregation process (client side)
Input: Initial model weights or encrypted global model weights, homomorphic encryption keys, Elliptic Curve Digital Signature Algorithm (ECDSA) private key, and local training data
Output: Encrypted model weights, Ethereum_address, and signed Ethereum address
Algorithm 2 Reliable and privacy-preserving aggregation process (aggregator side)
Input: Encrypted local model weights, Ethereum_address, and signed Ethereum address
Output: Encrypted global model weights
4.5.2. Decentralized Schemes for Addressing HE Integration Challenges
Scheme 1: Performance-Based Client Participation Evaluation
- Records and evaluates contributions securely without exposing model weights.
- Incentivizes active clients while ensuring privacy compliance.
Algorithm 3 Performance-based client participation evaluation (aggregator side)
Scheme 2: Aggregator Failure Mitigation
- Continuous monitoring: The aggregator’s response time and activity levels are tracked.
- Failure detection triggers: If the aggregator exhibits slow response times or becomes unresponsive, an alert is raised.
- Proactive intervention: The RPFL coordinator takes corrective actions before a complete system failure occurs.
- Performance evaluation: The aggregator’s reliability is assessed over time, helping to optimize network efficiency.
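The timeout-driven failure detection described above can be sketched as follows (the class and field names are illustrative; in RPFL the report would be a smart-contract transaction rather than a local list):

```python
import time

class FailureMonitor:
    """Client-side sketch: flags the aggregator as failed when the updated
    global model does not arrive before the round deadline."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable for deterministic testing
        self.round_start = None
        self.reports = []           # binary failure indicators

    def start_round(self):
        self.round_start = self.clock()

    def check(self, model_received):
        """Return True (and record a failure report) if the deadline
        passed without the global model arriving."""
        elapsed = self.clock() - self.round_start
        if not model_received and elapsed >= self.timeout_s:
            self.reports.append(1)
            return True
        return False
```

Injecting the clock lets the deadline logic be exercised without real waiting, which is also how such a monitor would be unit-tested before deployment.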
Algorithm 4 Aggregator failure mitigation (client-side perspective)
5. Experiments
5.1. Dataset Setting and Model Design for FL-IMD
- Mirai-based attacks: Includes attacks such as UDP flood, ACK flood, SYN flood, and scan attack.
- BASHLITE-based attacks: Includes TCP flooding, UDP flooding, and command injection attacks.
- 79% for training;
- 1% unused;
- 20% for testing.
- Upsampling (replicating minority samples) when fewer than 50,000 samples exist.
- Downsampling (random selection) when more than 50,000 samples exist.
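The resampling rule above can be sketched as follows (the function name and fixed seed are illustrative):

```python
import random

def balance(samples, target=50_000, seed=0):
    """Up- or downsample one class to exactly `target` records."""
    rng = random.Random(seed)
    if len(samples) < target:
        # Upsampling: replicate randomly chosen existing samples.
        return samples + rng.choices(samples, k=target - len(samples))
    # Downsampling: keep a random subset.
    return rng.sample(samples, target)
```

Applying this per class yields equally sized classes, which keeps the binary classifier's training set balanced across benign and malicious traffic.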
- 115 input neurons (one per feature);
- Two hidden layers (115 and 58 neurons) with ELU activation;
- Sigmoid activation for binary classification.
- Structured Data Compatibility: Unlike CNNs and RNNs, MLPs are well suited for tabular network traffic data.
- Baseline Consistency: Prior works [8] also utilized MLPs, ensuring a valid performance comparison.
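For reference, the two activations used by the MLP, written as pure-Python sketches of the standard definitions:

```python
import math

def elu(x, alpha=1.0):
    """ELU, used in the hidden layers: identity for x > 0,
    alpha * (e^x - 1) otherwise (smooth, avoids dead units)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def sigmoid(x):
    """Sigmoid, used in the output layer: squashes the logit into
    (0, 1) for binary benign/malicious classification."""
    return 1.0 / (1.0 + math.exp(-x))
```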
5.2. Simulation Tools and Parameters
- Seamless integration with PyTorch 2.3.1;
- Support for federated averaging (FedAvg) aggregation;
- Simulated FL with heterogeneous clients.
- CKKS Scheme: Supports encrypted operations on real numbers;
- Polynomial Modulus Degree: 8192, balancing security and efficiency;
- Coefficient Modulus Bit Sizes: [60, 40, 40, 60] for precision and encryption depth.
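These CKKS parameters map onto a TenSEAL context roughly as follows. This is a configuration sketch based on TenSEAL's documented API (`ts.context`, `SCHEME_TYPE.CKKS`, `global_scale`, `ckks_vector`) and is not verified here:

```python
import tenseal as ts

# CKKS context matching the parameters listed above.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40   # scaling factor: ~40 bits of precision
context.generate_galois_keys()   # enables rotations needed for aggregation

# Encrypt a slice of model weights as a CKKS vector.
enc_weights = ts.ckks_vector(context, [0.1, 0.2, 0.3])
```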
- Ganache CLI: Creates a reproducible local Ethereum blockchain for smart contract deployment and testing;
- Truffle Framework: Used to develop and deploy the Solidity-based contracts;
- Web3.py: Integrates federated learning with blockchain functionalities.
5.3. Performance Metrics, Methodology, and Baselines
- True Positive Rate (TPR), or recall, which measures the model’s ability to correctly detect malicious traffic: TPR = TP / (TP + FN).
- True Negative Rate (TNR), which evaluates how accurately the model classifies benign traffic: TNR = TN / (TN + FP).
- Accuracy, which reflects overall classification performance across both benign and malicious traffic: Accuracy = (TP + TN) / (TP + TN + FP + FN).
- Classical FL-IMD (without privacy mechanisms).
- FL-IMD with Differential Privacy (DP) to assess privacy impacts.
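The three metrics can be computed from confusion-matrix counts as follows (a minimal helper; the function name is illustrative):

```python
def detection_metrics(tp, fn, tn, fp):
    """TPR (recall), TNR, and accuracy from confusion-matrix counts."""
    tpr = tp / (tp + fn)                    # malicious flows correctly detected
    tnr = tn / (tn + fp)                    # benign flows correctly classified
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall correctness
    return tpr, tnr, acc

# Example: 90 of 100 attacks caught, 80 of 100 benign flows passed.
tpr, tnr, acc = detection_metrics(tp=90, fn=10, tn=80, fp=20)
```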
5.4. Results and Analysis
- Additional blockchain-IPFS interactions: Each model update must be uploaded to and retrieved from IPFS, introducing significant processing delays.
- Network latency overhead: The transmission of models through IPFS requires multiple communication cycles, further compounding delay.
6. Discussion
6.1. Communication Cost Analysis
6.2. Scalability Discussion
6.3. Model Evaluation Constraints
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Sun, P.; Shen, S.; Wan, Y.; Wu, Z.; Fang, Z.; Gao, X.Z. A survey of iot privacy security: Architecture, technology, challenges, and trends. IEEE Internet Things J. 2024, 11, 34567–34591.
2. Sánchez, B.B.; Alcarria, R.; Robles, T. A Probabilistic Trust Model and Control Algorithm to Protect 6G Networks against Malicious Data Injection Attacks in Edge Computing Environments. CMES Comput. Model. Eng. Sci. 2024, 141, 631–654.
3. Ma, Y.; Liu, L.; Liu, Z.; Li, F.; Xie, Q.; Chen, K.; Lv, C.; He, Y.; Li, F. A Survey of DDoS Attack and Defense Technologies in Multi-Access Edge Computing. IEEE Internet Things J. 2024, 12, 1428–1452.
4. Chen, J.; Yan, H.; Liu, Z.; Zhang, M.; Xiong, H.; Yu, S. When federated learning meets privacy-preserving computation. ACM Comput. Surv. 2024, 56, 1–36.
5. Heidari, A.; Jabraeil Jamali, M.A. Internet of Things intrusion detection systems: A comprehensive review and future directions. Clust. Comput. 2023, 26, 3753–3780.
6. Alsoufi, M.A.; Siraj, M.M.; Ghaleb, F.A.; Al-Razgan, M.; Al-Asaly, M.S.; Alfakih, T.; Saeed, F. Anomaly-Based Intrusion Detection Model Using Deep Learning for IoT Networks. Comput. Model. Eng. Sci. 2024, 141, 823–845.
7. Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-baiot—Network-based detection of iot botnet attacks using deep autoencoders. IEEE Pervasive Comput. 2018, 17, 12–22.
8. Rey, V.; Sánchez, P.M.S.; Celdrán, A.H.; Bovet, G. Federated learning for malware detection in IoT devices. Comput. Netw. 2022, 204, 108693.
9. Popoola, S.I.; Ande, R.; Adebisi, B.; Gui, G.; Hammoudeh, M.; Jogunola, O. Federated deep learning for zero-day botnet attack detection in IoT-edge devices. IEEE Internet Things J. 2021, 9, 3930–3944.
10. Wardana, A.A.; Sukarno, P.; Salman, M. Collaborative Botnet Detection in Heterogeneous Devices of Internet of Things using Federated Deep Learning. In Proceedings of the 2024 13th International Conference on Software and Computer Applications, Bali Island, Indonesia, 1–3 February 2024; pp. 287–291.
11. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
12. Regan, C.; Nasajpour, M.; Parizi, R.M.; Pouriyeh, S.; Dehghantanha, A.; Choo, K.K.R. Federated IoT attack detection using decentralized edge data. Mach. Learn. Appl. 2022, 8, 100263.
13. Thein, T.T.; Shiraishi, Y.; Morii, M. Personalized federated learning-based intrusion detection system: Poisoning attack and defense. Future Gener. Comput. Syst. 2024, 153, 182–192.
14. Sánchez, P.M.S.; Celdrán, A.H.; Xie, N.; Bovet, G.; Pérez, G.M.; Stiller, B. Federatedtrust: A solution for trustworthy federated learning. Future Gener. Comput. Syst. 2024, 152, 83–98.
15. Goh, E.; Kim, D.Y.; Lee, K.; Oh, S.; Chae, J.E.; Kim, D.Y. Blockchain-Enabled Federated Learning: A Reference Architecture Design, Implementation, and Verification. IEEE Access 2023, 11, 145747–145762.
16. Doan, T.V.T.; Messai, M.L.; Gavin, G.; Darmont, J. A survey on implementations of homomorphic encryption schemes. J. Supercomput. 2023, 79, 15098–15139.
17. Gentry, C. Fully homomorphic encryption using ideal lattices. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009; pp. 169–178.
18. Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic encryption for arithmetic of approximate numbers. In Proceedings of the Advances in Cryptology—ASIACRYPT 2017: 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 3–7 December 2017; Proceedings, Part I 23. Springer: Berlin/Heidelberg, Germany, 2017; pp. 409–437.
19. Bezuglova, E.; Kucherov, N. An Overview of Modern Fully Homomorphic Encryption Schemes. In Proceedings of the International Conference on Actual Problems of Applied Mathematics and Computer Science, Stavropol, Russia, 3–7 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 300–311.
20. Benaissa, A.; Retiat, B.; Cebere, B.; Belfedhal, A.E. Tenseal: A library for encrypted tensor operations using homomorphic encryption. arXiv 2021, arXiv:2104.03152.
21. OpenMined. TenSEAL: A Library for Homomorphic Encryption Operations on Tensors. Available online: https://github.com/OpenMined/TenSEAL/tree/main (accessed on 16 November 2024).
22. Beutel, D.J.; Topal, T.; Mathur, A.; Qiu, X.; Fernandez-Marques, J.; Gao, Y.; Sani, L.; Li, K.H.; Parcollet, T.; de Gusmão, P.P.B.; et al. Flower: A friendly federated learning framework. arXiv 2022, arXiv:2007.14390.
23. Adap. Flower: A Friendly Federated Learning Research Framework. 2024. Available online: https://github.com/adap/flower (accessed on 23 February 2025).
24. Lang, S. Algebraic Number Theory, 2nd ed.; Graduate Texts in Mathematics; Springer: Berlin/Heidelberg, Germany, 1994; Volume 110.
25. Deepa, N.; Pham, Q.V.; Nguyen, D.C.; Bhattacharya, S.; Prabadevi, B.; Gadekallu, T.R.; Maddikunta, P.K.R.; Fang, F.; Pathirana, P.N. A survey on blockchain for big data: Approaches, opportunities, and future directions. Future Gener. Comput. Syst. 2022, 131, 209–226.
26. Al-Nbhany, W.A.; Zahary, A.T.; Al-Shargabi, A.A. Blockchain-IoT healthcare applications and trends: A review. IEEE Access 2024, 12, 4178–4212.
27. Zheng, P.; Jiang, Z.; Wu, J.; Zheng, Z. Blockchain-based decentralized application: A survey. IEEE Open J. Comput. Soc. 2023, 4, 121–133.
28. Ressi, D.; Romanello, R.; Piazza, C.; Rossi, S. AI-enhanced blockchain technology: A review of advancements and opportunities. J. Netw. Comput. Appl. 2024, 225, 103858.
29. Ullah, S.; Zheng, J.; Din, N.; Hussain, M.T.; Ullah, F.; Yousaf, M. Elliptic Curve Cryptography; Applications, challenges, recent advances, and future trends: A comprehensive survey. Comput. Sci. Rev. 2023, 47, 100530.
30. Prakash, V.; Keerthi, K.; Jagadish, S.; Alkhayyat, A.; Soni, M. An Elliptic Curve Digital Signature Algorithm for Securing the Healthcare Data Using Blockchain Based IoT Architecture. In Proceedings of the 2024 International Conference on Data Science and Network Security (ICDSNS), Tiptur, India, 26–27 July 2024; pp. 1–5.
31. OpenZeppelin. Access Control Documentation. 2024. Available online: https://docs.openzeppelin.com/contracts/3.x/access-control (accessed on 8 December 2024).
| Study | Methodology | Strengths and Limitations |
|---|---|---|
| [9] | FL with deep learning using FedAvg for botnet attack detection | Strengths: High classification accuracy, privacy-preserving framework. Limitations: Lacks security mechanisms against adversarial attacks; no protection for model integrity. |
| [8] | FL with supervised and unsupervised learning for IoT malware detection | Strengths: Better performance than centralized models, improved privacy. Limitations: Does not address advanced security threats related to the privacy of shared model weights at the server. |
| [12] | FL-based deep autoencoder for anomaly detection | Strengths: High anomaly detection accuracy (98%), local training on edge devices. Limitations: Lacks protection against poisoning attacks; no privacy mechanisms for model updates. |
| [10] | Hierarchical FL-DNN with edge-fog-cloud computing for botnet detection | Strengths: High detection accuracy, scalable across IoT layers. Limitations: High communication overhead; lacks robustness against adversarial threats. |
| [13] | Personalized FL-based intrusion detection system with a server-side poisoned client detector using cosine similarity | Strengths: Detects poisoned clients. Limitations: Fails to protect the privacy of shared model weights at the server, leaving vulnerabilities for potential inference attacks. |
| Function | Main Actions | Description |
|---|---|---|
| Aggregator Selection | setAggregator | Assigns the AGGREGATOR_ROLE to an address and returns confirmation of the role assignment. |
| | revokeAggregator | Revokes the AGGREGATOR_ROLE from an address and returns confirmation that the role has been removed. |
| Round Management | currentRound | Returns the current training round number. |
| | secondsRemaining | Returns the number of seconds remaining in the current training round. |
| Evaluation and Scoring | setClientScore | Saves a performance-based score for a client. |
| | getClientScore | Returns a performance-based score for a client. |
| Aggregator Failure Tracking | reportAggregatorStatus | Records a binary value indicating the aggregator’s status (success or failure). |
| | getAggregatorStatus | Retrieves the binary value representing the aggregator’s status. |
| Token Distribution | countTokens | Returns the total number of tokens distributed in a given round. |
| | countTotalTokens | Returns the total tokens distributed across all rounds. |
| | setTokens | Records the total tokens utilized in a given round. |
| | distributeTokens | Distributes tokens to a specified wallet address. |
| Parameter | Value |
|---|---|
| Optimizer | Stochastic Gradient Descent (SGD) |
| L2-regularization (weight decay) | 0 |
| Learning rate (lr) | 0.5 |
| Batch size | 64 |
| Training epochs (E) | 4 |
| Number of rounds (T) | 30 |
| Number of experiment repetitions | 5 |
| Method | Total Training Time (30 Rounds) | Training Time per Round |
|---|---|---|
| Classical FL-IMD | 16.21 min | 32.42 s |
| Our approach with HE | 18.66 min | 37.32 s |
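From the timing figures above, the per-round cost of adding HE works out to roughly 15%:

```python
# Relative per-round overhead of HE, from the timing table above.
classical, with_he = 32.42, 37.32   # seconds per round
overhead_pct = (with_he - classical) / classical * 100
print(f"HE overhead: {overhead_pct:.1f}% per round")  # -> HE overhead: 15.1% per round
```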
| Metric | The Proposed Architecture | BCFL-IPFS |
|---|---|---|
| Number of model transmissions | | |
| Communication cost (assuming a model size of 94 kB) | | |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cite as: Asiri, M.; Khemakhem, M.A.; Alhebshi, R.M.; Alsulami, B.S.; Eassa, F.E. RPFL: A Reliable and Privacy-Preserving Framework for Federated Learning-Based IoT Malware Detection. Electronics 2025, 14, 1089. https://doi.org/10.3390/electronics14061089