Article

SplitML: A Unified Privacy-Preserving Architecture for Federated Split-Learning in Heterogeneous Environments

by
Devharsh Trivedi
1,*,
Aymen Boudguiga
2,
Nesrine Kaaniche
3 and
Nikos Triandopoulos
4
1
Department of Computer Science, Bowie State University, Bowie, MD 20715, USA
2
CEA-LIST, Université Paris-Saclay, 91405 Orsay, France
3
Télécom SudParis, Institut Polytechnique de Paris, 91011 Evry, France
4
Department of Computer Science, Brown University, Providence, RI 02912, USA
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(2), 267; https://doi.org/10.3390/electronics15020267
Submission received: 12 December 2025 / Revised: 2 January 2026 / Accepted: 3 January 2026 / Published: 7 January 2026

Abstract

While Federated Learning (FL) and Split Learning (SL) aim to uphold data confidentiality by localized training, they remain susceptible to adversarial threats such as model poisoning and sophisticated inference attacks. To mitigate these vulnerabilities, we propose SplitML, a secure and privacy-preserving framework for Federated Split Learning (FSL). By integrating IND-CPA$^D$ secure Fully Homomorphic Encryption (FHE) with Differential Privacy (DP), SplitML establishes a defense-in-depth strategy that minimizes information leakage and thwarts reconstructive inference attempts. The framework accommodates heterogeneous model architectures by allowing clients to collaboratively train only the common top layers while keeping their bottom layers exclusive to each participant. This partitioning strategy ensures that the layers closest to the sensitive input data are never exposed to the centralized server. During the training phase, participants utilize multi-key CKKS FHE to facilitate secure weight aggregation, which ensures that no single entity can access individual updates in plaintext. For collaborative inference, clients exchange activations protected by single-key CKKS FHE to achieve a consensus derived from Total Labels (TL) or Total Predictions (TP). This consensus mechanism enhances decision reliability by aggregating decentralized insights while obfuscating soft-label confidence scores that could be exploited by attackers. Our empirical evaluation demonstrates that SplitML provides substantial defense against Membership Inference (MI) attacks, reduces temporal training costs compared to standard encrypted FL, and improves inference precision via its consensus mechanism, all while maintaining a negligible impact on federation overhead.

1. Introduction

Machine Learning (ML) is a powerful tool that can solve various problems, yet it raises serious privacy concerns. Indeed, when ML models are trained on sensitive data, there is a risk that this data could be used to identify individuals or infer sensitive information about them. For instance, telecom companies implement advanced ML algorithms based on locally collected data through Security Incidents and Events Management (SIEM) to help determine potential cyber threats, protect their networks and customers’ data, and enhance security and privacy measures. Telecom data is one of the most sensitive data types, as any four location points are enough to uniquely re-identify 90% of individuals [1].
While SIEMs can be hosted on a standalone network device, they can also be deployed through cloud services offered by security service providers. These systems process logs in quasi-real-time but may also support offline log processing. They are responsible for storing, analyzing, and correlating logs. Data gathered by a standalone SIEM can be incomplete or non-representative, leading to inaccurate classification of incidents and thus to improper actions that can cause severe damage [2]. Collaboration between different SIEMs is encouraged to deal with this challenge. A collaborative SIEM system enables companies to quickly identify and respond to potential threats, thereby reducing the impact of security breaches. By collaborating and sharing their expertise, companies also enhance their security posture and foster customer trust. However, this raises many issues. When a client shares its logs with other organizations, this information can be exploited, and the reported incident can be used to attack vulnerable devices. This becomes even more critical if the reported incident involves a widely used device. The use of incident information can lead to the creation of valuable target profiles for security vendors or be sold to competitors as alert reports, directly damaging the company’s brand and reputation. Distributed ML algorithms, i.e., Federated Learning (FL) [3] and Split Learning (SL) [4], enable the training of a global model on decentralized data stored on multiple client devices without sharing these data with a central entity. (Refer to Appendix B.1 and Appendix B.2 for details.) This approach can benefit SIEM scenarios where sensitive proprietary data cannot be shared with peers.
Despite data being resident on the client device, confidentiality and privacy remain at risk. Distributed learning methods, including federated and split learning, provide local obfuscation but are not formal privacy mechanisms, offering no inherent guarantees of privacy. The deployment of collaborative learning in cybersecurity must account for the unique constraints of varied and limited environments, such as those found in IoT-based anomaly detection. As explored in [5], establishing a trustworthy infrastructure requires balancing privacy-preserving mechanisms with the operational challenges of Federated Learning, including node reliability and communication overhead. This work highlights the necessity of secure aggregation to mitigate privacy risks in IoT security, where individual nodes often lack the computational resources for heavy cryptographic operations.
Without these safeguards, even decentralized models remain vulnerable to membership inference and reconstruction attacks that can expose sensitive network topographies or device signatures. By situating SplitML  within this context, we acknowledge the shift toward decentralized architectures that utilize partitioned models to protect sensitive telemetry data while maintaining the efficacy of anomaly detection across heterogeneous and resource-constrained networks. Our framework builds upon these foundational challenges by employing model partitioning to offload sensitive computations from the edge while ensuring that the central aggregation process remains cryptographically secure.
Several attacks on privacy, such as Membership Inference [6,7] and Model Poisoning [8], must be considered. These attacks aim to infer sensitive information about the training data or clients. They exploit the relationship between the updates and the private features on which they were trained. By analyzing the global model, attackers might reconstruct training data or individual contributions, even if gradients are carefully designed. This is possible because the global model embodies aggregate information from participants’ data. SL involves exchanging Intermediate Representations (IR) between participants. Analyzing these IRs, even without access to raw data, may reveal sensitive information hidden within them. An attacker can reconstruct parts of the private training data used to build the model by feeding crafted inputs and observing model outputs. Dishonest users can inject manipulated updates into the training process to steer the model toward incorrect predictions or biased outcomes. These “poisoned” updates influence the global model, affecting everyone. Attackers might tamper with their data before training their local model and then contribute to the poisoned model updates. This way, they can subtly influence the global model without directly injecting malicious updates.
It is necessary to ensure confidentiality, integrity, and privacy while enhancing collaboration. Several solutions have been proposed that implement various Privacy Enhancing Technologies (PET), namely (i) privacy-preserving computation, e.g., Fully Homomorphic Encryption (FHE) and secure Multi-Party Computation (MPC), and (ii) Statistical Disclosure Control (SDC) techniques, e.g., Differential Privacy (DP). FHE enables computations on encrypted data without requiring decryption, ensuring confidential analysis, while DP perturbs data to conceal individual characteristics. (Refer to Appendix A.1 and Appendix A.2.)
These techniques aim to protect the privacy of both the client data and the associated model and prevent inference attacks while enabling practical model training in a distributed manner. However, their implementation in real-world scenarios brings new challenges with malicious or curious adversaries (a curious adversary is a passive entity that observes and learns from a system’s communication without altering it, whereas a malicious adversary is an active entity that can observe, alter, and inject false information into the system), which may collude to derive information regarding other participants. This paper proposes SplitML , a general framework to enhance collaborative yet personalized neural networks. Though  SplitML  can be used for any application, we focus on intrusion detection in this paper. In our scheme, the input layer, the output layer, and the learning task are shared across clients with a possible variation in hidden layers. Clients may want a deeper network as depth helps achieve more accurate models for their local distributions or a shallow network to reduce resource usage.
SplitML  supports heterogeneous client models, where the top layers (close to input data) of a client model that extract generic dataset features are shared with other client models. In contrast, the bottom layers (close to output labels) are more specific. The architectural designation of ‘top’ and ‘bottom’ layers in SplitML  is chosen to reflect the flow of authority in a decentralized SIEM environment, where input-proximal layers (top) are standardized for cross-institutional alignment, while output-proximal layers (bottom) are reserved for local classification refinement.
Here, the advantages of model heterogeneity are two-fold: (1) for training, it helps increase model accuracy over non-IID data, and (2) diverse predictions can be obtained for inference through consensus. We assume clients share features and labels while keeping their data and ML models private in a semi-honest threat model. (A semi-honest threat model (also known as “honest-but-curious”) assumes that an adversary will faithfully follow a protocol’s instructions but will attempt to learn as much private information as possible from the messages they observe.) Clients collaborate to train the top layers with the help of a central server and keep their bottom layers private. In  SplitML , clients train their models locally on their private data and collaborate to train only a set of generic (top) layers with FL by sharing encrypted weights. Thus, aggregation is performed on shared layers, although they might have been trained on different (overall) topologies.
By abstraction, the use of FL increases the size of the training dataset, so the feature extraction by the first layers will be more accurate. Clients have the same architecture for the output layer but may have different hidden layers to facilitate a personalized model. As such, they refrain from sharing them with other clients and ensure they will not be able to get any information about their dataset.
SplitML is fundamentally different from Vertical Federated Learning (VFL) in how the data is partitioned: VFL is a feature-partitioned approach, designed for scenarios where multiple clients share the same set of samples (users/entities) but each client possesses a different portion of the feature space ($A_1, A_2, \dots$). In contrast, SplitML is built upon Federated Split Learning, which is a sample-partitioned approach similar to Horizontal Federated Learning (HFL), where clients possess the same feature set ($A$) but hold different, non-overlapping samples ($D_1, D_2, \dots$). This sample partitioning is central to SplitML’s novelty, as it allows clients to use private bottom model layers for local feature extraction while collaboratively aggregating and training the common top layers, an architecture fundamentally different from the feature-split collaboration utilized by VFL.
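To make the distinction concrete, the short NumPy sketch below contrasts the sample (row) partitioning assumed by HFL and SplitML with the feature (column) partitioning of VFL; the array sizes, values, and client count are illustrative only.

```python
import numpy as np

# Toy dataset: 6 samples x 4 features, plus labels (values are illustrative).
X = np.arange(24, dtype=float).reshape(6, 4)
y = np.array([0, 1, 0, 1, 1, 0])

# Sample (horizontal) partitioning, as in HFL and SplitML:
# each client holds different rows but the full feature set A.
hfl_client_1 = (X[:3, :], y[:3])   # D_1: samples 0-2, all features
hfl_client_2 = (X[3:, :], y[3:])   # D_2: samples 3-5, all features

# Feature (vertical) partitioning, as in VFL:
# clients hold the same rows but disjoint feature subsets A_1, A_2.
vfl_client_1 = X[:, :2]            # A_1: features 0-1 for every sample
vfl_client_2 = X[:, 2:]            # A_2: features 2-3 for every sample

print(hfl_client_1[0].shape, vfl_client_1.shape)  # (3, 4) (6, 2)
```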
Fully Homomorphic Encryption (FHE) represents a comprehensive cryptographic solution that facilitates the direct execution of arbitrary computational functions on ciphertexts. Originally conceptualized by Gentry in 2009 [9], FHE enables data owners to utilize untrusted cloud computing services for data analysis while ensuring that sensitive information is never exposed in plaintext [10,11,12,13,14,15,16,17,18,19]. While FHE provides robust security, standard schemes frequently encounter limitations regarding substantial computational latency and significant ciphertext expansion. To mitigate these performance constraints, the Cheon-Kim-Kim-Song (CKKS) scheme [20] was introduced in 2017 as a specialized variant of FHE. Unlike traditional exact arithmetic schemes, CKKS is specifically optimized for approximate arithmetic on real and complex numbers. By treating encryption noise as part of the significant figures of the message, CKKS achieves high efficiency in numerical analysis, machine learning training, and statistical modeling. In this work, we employ the OpenFHE library [21,22] to implement a hybrid cryptographic architecture. Specifically, we utilize multi-key CKKS to secure the collaborative weight aggregation during the training phase and single-key CKKS to maintain privacy during the inference process.
Indistinguishability under Chosen-Plaintext Attack (IND-CPA) is the minimum security standard for modern Public-Key Cryptography (PKC). It is formalized by a game in which an attacker, given only the Public Key (PK), must fail to distinguish between the ciphertexts of two chosen messages with a success probability better than random chance; this guarantee ensures confidentiality and necessitates the use of probabilistic encryption. However, Li and Micciancio recently showed that the standard IND-CPA model is insufficient for the CKKS FHE scheme in multi-party contexts, demonstrating a key-recovery attack that is feasible when decryption results are shared among multiple parties, such as in a threshold FHE setting, thereby requiring a stronger adversarial security model than IND-CPA to maintain the confidentiality of the Secret Key (SK).
IND-CPA$^D$ protects against secret-key recovery under shared decryptions; it does not, by itself, prevent all forms of gradient leakage, which is why model partitioning and consensus inference are also required.
To mitigate this, OpenFHE subsequently extended the original CKKS scheme to operate under a stronger adversarial model that permits the sharing of decryption results among multiple parties, choosing a default configuration designed to tolerate a relatively large number of decryption queries involving the same or related ciphertexts. Specifically, OpenFHE’s CKKS implementation utilizes a countermeasure known as noise flooding, which involves adding a large, random Gaussian noise to the ciphertext just before decryption, mathematically achieving the security notion of IND-CPA$^D$ (a principle based on Differential Privacy). This method ensures that the statistical noise distribution of the output is independent of the SK, a guarantee that cannot be provided by the intrinsic, often data-dependent, noise generated during homomorphic operations.
Existing approaches for collaborative training, such as FL, SL, and their hybrids like Federated Split Learning (FSL) and SplitFed Learning (SFL), primarily offer privacy through data localization, but remain fundamentally vulnerable to various inference and poisoning attacks. We discuss previous works for privacy-preserving tasks in more detail in appendices and briefly compare our approach SplitML  (Figure 1c) with FL (Figure 1a), SL (Figure 1b), and the combinations of two approaches.
While primitives like SecAgg [23] use secure multi-party computation to protect the model weights during aggregation from the central server, they do not defend against an adversarial client’s ability to extract sensitive training data or exploit shared decryption results in advanced settings like threshold FHE. This limitation highlights a critical gap in existing secure aggregation schemes, which often focus on server-side privacy while overlooking the risks posed by malicious or compromised participants within the federation.
To further clarify the technical positioning of our framework, it is essential to compare SplitML with recent advancements in heterogeneous training, such as the S2FL (Heterogeneous Split Federated Learning) framework. S2FL [24] is primarily designed to enhance training efficiency and accuracy by partitioning models to accommodate hardware diversity among clients. While S2FL successfully addresses resource-aware splitting and convergence speed, SplitML extends these concepts by integrating a robust “Privacy-via-Encryption” layer using multi-key FHE. This cryptographic integration ensures that the weights of the shared layers remain encrypted throughout the entire aggregation cycle, preventing the central server from inspecting the model parameters.
Unlike S2FL, which focuses on the training logistics of a partitioned model, SplitML  introduces a unique collaborative inference phase where consensus is reached through encrypted activations. This allows our framework to not only handle architectural heterogeneity during training but also to provide a secure mechanism for resolving classification ambiguities through a cross-institutional consensus that is absent in standard S2FL implementations. By leveraging the diversity of local bottom layers, SplitML  ensures that the global model benefits from various localized feature extraction strategies without requiring the raw data or private model segments to be exposed to peer institutions.
SplitML generalizes FSL under the rigorous IND-CPA$^D$ security model using a combination of FHE with DP. Unlike traditional perturbation methods that inject calibrated Gaussian noise to satisfy $(\epsilon, \delta)$-Differential Privacy, SplitML adopts a ‘Privacy-via-Encryption’ approach. By utilizing the noise-flooding techniques inherent in the OpenFHE CKKS implementation, we ensure that the inherent rounding errors of approximate arithmetic obscure the individual contributions. This provides a functional barrier against inference attacks while avoiding the significant accuracy degradation typically observed when adding external noise to satisfy a formal privacy budget.
This approach not only provides robust security against well-known threats like Membership Inference (MI) attacks but also enables novel features, such as supporting heterogeneous client model architectures and utilizing multi-key CKKS for secure aggregation and consensus-based inference. Our experiments show that the (minimum) member accuracy of the MI attack on the top layers (for any number of shadow models) was about 37%, whereas for the bottom layers it was as low as 19%.
Our contributions in this paper can be summarized as follows:
  • We formalize FL and SL and present SplitML , a fused FL (for training) and SL (for inference) partitioned between ML model layers to reduce information leakage. The novelty stems from clients collaborating on partial models (instead of full models in FL) to enhance collaboration while reducing privacy risks; while federation helps improve feature extraction, horizontal splitting allows entities to personalize their models concerning their specific environments, thus improving results.
  • SplitML implements multi-key FHE with DP during training to protect against clients colluding with each other or a server colluding with clients under an honest-but-curious assumption. SplitML reduces time compared to training in silo while upholding privacy with IND-CPA$^D$ security.
  • We propose a novel privacy-preserving counseling process for inference. An entity can request a consensus by submitting an encrypted classification query using single-key FHE with DP to its peers.
  • We empirically show that SplitML  is robust against various threats such as poisoning and inference attacks.
This paper is organized as follows. First, we introduce the threat model in Section 2.1 and detail the proposed framework in Section 2.2. Then, Section 3 discusses security threats concerning the identified adversaries, and Section 4 reviews the empirical evidence before concluding in Section 6. We briefly discuss background and related work in the Appendix.

2. Our Framework

The following subsections define the security boundaries and structural layers (throughout this paper, ‘top layers’ refer to input-proximal layers, and ‘bottom layers’ to output-proximal layers) that enable SplitML  to facilitate collaborative learning.

2.1. Threat Model

SplitML  follows the standard client-server model, where trust is established before training starts (e.g., in the log analysis domain, the organizations (clients) may opt out of the corresponding SIEM platform (server) if they host adversarial clients). In an ideal scenario, all K + 1 participants, including the server and the K clients, act honestly and perform their assigned tasks. For each training round, clients train their local models on their own (unencrypted) data, and the server combines the (encrypted) weights for shared layers. After training, all clients perform inference locally (unencrypted) and do not share any information with their peers or the server. However, a client may want to perform consensus on a subset of data to benefit from heterogeneous models, in which case a client sends (encrypted) smashed data to its peers.
Our system only considers a scenario where the server, while curious, passively observes updates and cannot modify anything. It does not address a more robust scenario where a malicious server actively collaborates with clients and disrupts the training process. While honest-but-curious clients follow the protocol, they could potentially collaborate to learn about specific individuals in the training data (membership inference attack). Malicious clients, however, may deviate from the protocol for harmful purposes and send misleading updates (model poisoning attack) or attempt to extract the entire model from a particular client. Communication (e.g., the exchange of weights and activations) is encrypted in SplitML, and our scheme is secure under collusion as long as the number of colluding clients is lower than the $T$ clients required for fused decryption (fused decryption in multi-key FHE efficiently combines partial decryption results from multiple parties into a single plaintext), with $T = K$ for multi-key FHE and $T \le K$ for threshold-key FHE. Moreover, DP offers plausible protection in case $T - 1$ clients collude with the server to uncover the model weights of the target.
In contrast to conventional Federated Learning (FL), where the central server maintains full visibility of the global model architecture, SplitML  adopts the partitioning principles of Split Learning (SL) to strictly limit server-side exposure. A defining security enhancement in SplitML  is the preservation of data confidentiality during the training phase, wherein the server is systematically denied access to plaintext smashed data. Upon completion of collaborative training, the framework offers flexible execution paths: clients may perform inference locally to ensure maximum autonomy, or alternatively, engage in collaborative inference. In the latter mode, the framework shifts to a decentralized paradigm where clients delegate encrypted smashed data to self-selected peers via single-key FHE. This peer-to-peer consensus mechanism effectively eliminates the server’s involvement in processing intermediate results, further mitigating the risk of centralized data leakage.

2.2. Proposed Architecture

Our proposed solution SplitML (Definition 1) is a Federated Split Learning (FSL) approach where each client shares output labels (classes) $L$ and input attributes (features) $A$ but may have different ML models (hidden layers), except for the common top layers and the output layer. Clients train all their model layers locally on their private data and, for each training round $r$, send the weights of the shared layers, encrypted with an FHE scheme, to a federated server $S$. The server performs the averaging in the encrypted domain over the updates received from that round’s participants and returns the encrypted result to the clients.
Definition 1.
SplitML is a privacy-preserving, secure collaborative scheme for training and inference of a partially shared ML model deploying FL using an FHE scheme with $K > 1$ clients and an FL server $S$. A client local model $M_k$, $k \in \{1, \dots, K\}$, has a total of $n + t_k$ layers, where the $n \ge 1$ shared top layers extend up to the ‘cut/split layer’ $q$ and the remaining $t_k \ge 1$ layers are private (bottom) layers, including a common output layer. For each training round $r \in \{1, \dots, R\}$, participants $1 < m \le K$ train $M_k$ with their private (iid or non-iid) dataset $D_k$ with common attributes $A$ and common labels $L$. After each training round, the participants send their model weights, encrypted with FHE, for the $n$ layers ($[1, \dots, q]$) to $S$, where the server aggregates the weights with some averaging algorithm and sends the encrypted updates to the participants of the next round. Training continues until all $M_k$ achieve an acceptable accuracy or a fixed number of rounds $R$. The clients can further collaborate to form a consensus using FHE during inference for the model outputs with low confidence close to the classification boundary for some threshold $\lambda > 0$.
A primary advantage of SplitML is realized during the inference phase, where participants leverage model diversity to enhance classification reliability. This is particularly valuable in scenarios where a network domain administrator identifies specific log records for which the local model yields low confidence scores, suggesting a high probability of false positives or negatives. To resolve these ambiguities, the administrator reprocesses the suspicious logs and transmits the encrypted activations from the cut layer (the cut layer serves as the structural boundary between the shared common layers and the localized private model segments) to peer administrators. Upon receiving these encrypted activations ($Enc_{PK}(q)$), peer administrators execute a forward pass through their respective model segments, which may possess unique architectural variations or have been refined on different data distributions. The resulting encrypted outputs are returned to the requester, who performs decryption using a single-key FHE secret key ($Dec_{SK}$). A final classification is then established through a consensus mechanism based on either the majority of Total Labels (TL) or the highest aggregate Total Prediction (TP) scores. This architecture allows SplitML to operate in both a multi-institutional capacity, by aggregating data from various entities during training, and a cross-institutional capacity, by utilizing diverse models for inference. To maintain end-to-end security, the framework employs a hybrid cryptographic approach: encrypted layer weights are used for secure aggregation during the collaborative training phase, while encrypted activations facilitate the inference consensus. Consequently, SplitML effectively integrates the collaborative weight updates of Federated Learning for model development with the partitioned, privacy-preserving execution of Split Learning for inference.
  • SplitML can generalize FL with a parameter $\beta$ over model layers, where $\beta$ controls the proportion of layers to collaborate on. Thus, FL is realized when $k \le K$ clients collaborate to train all layers of the global ML model $M$; hence, $\beta = 1$, $|M| = n$, $t = 0$. A value of $\beta = 0$ indicates no collaboration, thus $|M_k| = t_k$, $n = 0$.
  • Transfer Learning (TL) [25,26,27,28,29] is realized for an architecture (e.g., a Convolutional Neural Network-Artificial Neural Network (CNN-ANN) model) where $K$ distinct clients collaborate to train the first $n$ (e.g., convolution) layers for feature extraction. For inference after training, clients retrain the remaining $t_k$ (e.g., Fully Connected (FC)) layers of their ML model $M_k$ until convergence on their private data without updating the first $n$ layers (effectively freezing the feature-extracting CNN layers).
We compare SplitML with existing approaches regarding resource requirements in Table 1. For each client $k$ in SplitML, the computation cost varies based on the specific local model segment $M_k$ rather than the fixed cost associated with a global model $M$ in standard FL. This flexibility allows nodes with disparate hardware capabilities to participate in the federation by adjusting the depth or complexity of their private bottom layers.

Furthermore, SplitML achieves significant bandwidth savings compared to FL, as clients only transmit model updates for the $n$ top (shared) layers rather than the full parameter set, where $|M| > n$. Unlike SplitFed Learning (SFL) [30], our framework supports clients with entirely different model architectures and encrypts all shared information; to our knowledge, it is the first framework in the literature to do both simultaneously. This combination of architectural heterogeneity and cryptographic confidentiality ensures that the system remains both scalable and secure against honest-but-curious aggregators. More details about SFL and its integration with our approach are described in Appendix B.3.
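The paper does not prescribe a specific deep-learning framework for the client models; the sketch below is a minimal tf.keras illustration, under our own naming (e.g., build_client_model), of how heterogeneous clients can share a common top layer and output layer while differing in private hidden layers, and how only the shared layer's weights would be exported for aggregation (encrypted under multi-key CKKS in SplitML). Layer sizes loosely follow the three-client models of Section 3.1; the input width is a placeholder.

```python
import tensorflow as tf

N_FEATURES = 20   # illustrative input width; the real log feature count is dataset-specific

def build_client_model(hidden_units):
    """One shared top layer + client-specific private hidden layers + common output layer."""
    inputs = tf.keras.Input(shape=(N_FEATURES,))
    x = tf.keras.layers.Dense(4, activation="relu", name="shared_0")(inputs)    # shared top layer (n = 1)
    for i, units in enumerate(hidden_units):                                     # private bottom layers (t_k varies)
        x = tf.keras.layers.Dense(units, activation="relu", name=f"private_{i}")(x)
    outputs = tf.keras.layers.Dense(2, activation="softmax", name="output")(x)   # common output layer
    return tf.keras.Model(inputs, outputs)

# Heterogeneous clients: identical shared/output layers, different private depth.
clients = [build_client_model([2]), build_client_model([]), build_client_model([2, 2])]

# Only the shared top layer's weights ever leave a client (encrypted in SplitML).
shared_updates = [m.get_layer("shared_0").get_weights() for m in clients]

# Plain FedAvg over the shared layer; in SplitML this step runs in the encrypted domain.
averaged = [sum(w) / len(shared_updates) for w in zip(*shared_updates)]
for m in clients:
    m.get_layer("shared_0").set_weights(averaged)
```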

2.3. Key Generation

Key generation is an offline setup phase in SplitML  before training. We use the OpenFHE [31] library to implement these procedures.

2.3.1. Public and Secret Keys

Multi-key FHE uses multiple encryption keys, one for each party. This makes it more difficult for any one party to decrypt the ciphertext, even if they are malicious. For a more secure (under collusion) multi-key approach, all $K$ clients participating in the training generate their public-private key pairs $PK_k, SK_k$, $k \in \{1, \dots, K\}$, in sequence (refer to Algorithm 1; for simplicity, we denote the various OpenFHE APIs for key generation as variations of a $KeyGen$ function). A shared Public Key $PK = PK_K$ is generated using each $PK_k$.
Algorithm 1 Public and Secret Keys Generation
Input: Each client $C_k$ performs $KeyGen()$ iteratively
Output: Public and private keypairs $PK_k, SK_k$ for each client $C_k$
  1: $PK_1, SK_1 \leftarrow KeyGen()$
  2: for each client $k$ in $\{2, \dots, K\}$ do
  3:        $PK_k, SK_k \leftarrow KeyGen(PK_{k-1})$
  4: end for
  5: $PK = PK_K$

2.3.2. Evaluation Key for Addition

Similarly, each operation-specific key, such as $EK_{Add}$ for addition, $EK_{Mult}$ for multiplication, and $EK_{Dec}$ for fused decryption, is generated from secret shares of all the $K$ clients (or some $T < K$ clients as per the threshold). Generating the evaluation key for addition is a two-pass process, as shown in Algorithm 2. In the first iteration, all clients generate their addition keys $EK_{Add_k}$ using their secret and public keys $SK_k$ and $PK_k$. In the second iteration, the final shared addition key $EK_{Add}$ is calculated from the client keys $EK_{Add_k}$ and $PK_k$.
Algorithm 2 Evaluation Keys Generation for Addition
Input: Keypairs $PK_k, SK_k$ for each client $C_k$
Output: Evaluation key for Addition $EK_{Add}$
  1: $EK_{Add_1} \leftarrow KeyGen(SK_1)$
  2: for each client $k$ in $\{2, \dots, K\}$ do
  3:        $EK_{Add_k} \leftarrow KeyGen(EK_{Add_1}, SK_k, PK_k)$
  4: end for
  5: $EK_{Add_{temp}} \leftarrow KeyGen(EK_{Add_1})$
  6: for each client $k$ in $\{2, \dots, K\}$ do
  7:        $EK_{Add_{temp}} \leftarrow KeyGen(EK_{Add_k}, EK_{Add_{temp}}, PK_k)$
  8: end for
  9: $EK_{Add} = EK_{Add_{temp}}$

2.3.3. Evaluation Key for Multiplication

The evaluation key for multiplication $EK_{Mult}$ is generated in four passes, as shown in Algorithm 3. The first two passes are similar to the process described for the additive evaluation key in Algorithm 2. In the first pass, all $K$ clients generate their local multiplication keys $EK_{Mult_k}$, $k \in \{1, \dots, K\}$, using their respective secret keys $SK_k$. During the second pass, clients calculate a temporary collective component $EK_{Mult_{temp}}$ by combining their local keys $EK_{Mult_k}$ with the collective public key $PK$. In the third pass, clients generate specialized shares $EK_{Mult_{temp_k}}$ using the temporary collective key, their individual secret keys $SK_k$, and the collective public key $PK$. This iterative process ensures that no single participant can reconstruct the full evaluation key or the secret keys of other parties. Finally, all $EK_{Mult_{temp_k}}$ shares are fused at the aggregator to yield the final evaluation key $EK_{Mult}$, which enables the homomorphic multiplication of ciphertexts encrypted under different keys. This four-pass synchronization is essential for maintaining the security of the multi-party CKKS scheme, as it allows for secure relinearization without compromising the confidentiality of the underlying secret parameters.

2.4. Training Phase

SplitML requires that all $K$ clients share common data attributes $A$, output labels $L$, model hyperparameters (batch size $bs$ and learning rate $lr$), the output (last) layer, and the model structure up to the first (top) $n$ layers before the training begins. Clients can have a variable architecture for the hidden layers, except the output layer, where the total (bottom) layers of a client are $t_k \ge 1$, $k \in \{1, \dots, K\}$, including the last layer, and the total layers of a client model are $|M_k| = n + t_k \ge 2$. We briefly justify using multi-key (or threshold) FHE over single-key FHE in collaborative training. (Multi-key FHE requires all participants to be online and share their partial decryptions to evaluate the fused decryption, which may be infeasible for organizations distributed over the internet. Threshold-FHE eases this by requiring $T$ out of $K$, $T \le K$, partial decryptions to calculate the fused decryption.)
Algorithm 3 Evaluation Keys Generation for Multiplication
Input: Keypairs $PK_k, SK_k$ for each client $C_k$
Output: Evaluation key for Multiplication $EK_{Mult}$
  1: $EK_{Mult_1} \leftarrow KeyGen(SK_1)$
  2: for each client $k$ in $\{2, \dots, K\}$ do
  3:        $EK_{Mult_k} \leftarrow KeyGen(EK_{Mult_1}, SK_k)$
  4: end for
  5: $EK_{Mult_{temp}} \leftarrow KeyGen(EK_{Mult_1})$
  6: for each client $k$ in $\{2, \dots, K\}$ do
  7:        $EK_{Mult_{temp}} \leftarrow KeyGen(EK_{Mult_{temp}}, EK_{Mult_k}, PK_k)$
  8: end for
  9: for each client $k$ in $\{K, \dots, 1\}$ do
10:        $EK_{Mult_{temp_k}} \leftarrow KeyGen(SK_k, EK_{Mult_{temp}}, PK_K)$
11: end for
12: $EK_{Mult_{temp}} \leftarrow KeyGen(EK_{Mult_{temp_K}}, EK_{Mult_{temp_{K-1}}})$
13: for each client $k$ in $\{K-2, \dots, 1\}$ do
14:        $EK_{Mult_{temp}} \leftarrow KeyGen(EK_{Mult_{temp_k}}, EK_{Mult_{temp}})$
15: end for
16: $EK_{Mult} = EK_{Mult_{temp}}$

2.4.1. Single-Key FHE

In the single-key FHE training (Figure 2a), one of the clients is chosen (at random) to generate the homomorphic encryption parameters: the Public Key $PK$ to encrypt client model updates, the Secret Key $SK$ to decrypt the averaged weights, and the Evaluation Key $EK$ to perform averaging in the cipher domain. $PK$ and $SK$ are shared with all clients, and $EK$ is shared with the FL server. All $K$ clients send their weights for the shared layers, encrypted under $PK$, to the server $S$, and $S$ averages the weights using $EK$.

Clients can decrypt this result using the shared secret $SK$ and update their models. However, single-key FHE is insecure in a multiparty setting because it allows any party with the key to decrypt the entire ciphertext. This means that if one party is malicious, they can collude with the other parties to decrypt the ciphertext and learn the secret data. To address this security vulnerability, multi-key FHE was proposed.

2.4.2. Multi-Key FHE

The multi-key (MK-FHE) training procedure is detailed in Algorithm 4 and Figure 2c. For every training round $r \in \{1, \dots, R\}$, all $K$ clients (alternatively, several participants $m$, $T \le m \le K$, are chosen in threshold-FHE) perform forward and backward propagation on their entire models $M_k$ on their private data $D_k$ and share the model updates for the first $n$ shared layers, encrypted under the common public key $PK$, with the central server $S$.

$S$ uses FedAvg [3] (or another federated averaging algorithm) to calculate the global model weights and share them with all clients. To calculate the averaged weights in the encrypted domain, $S$ uses the evaluation keys for addition $EK_{Add}$ and multiplication $EK_{Mult}$ to add all encrypted weights received from the clients and multiply the sum by $1/K$ to average. After receiving the encrypted result, clients partially decrypt it with their secret keys $SK_k$, and $S$ generates the fused decryption from the partial decryptions using the evaluation key for fused decryption $EK_{Dec}$ to obtain the final result in plaintext. Clients update the weights of the top layers accordingly before the next training round begins. Training continues until all the clients achieve their target accuracy or the maximum limit of rounds $R$.
Algorithm 4 Training
Input:  $D_k[A, L], bs, lr, PK, SK_k, EK_{Add}, EK_{Mult}, EK_{Dec}$
Output: Trained models $M_k$ for each client $C_k$
  1: for each round $r$ in $\{1, \dots, R\}$ do
  2:       for each client $k$ in $\{1, \dots, K\}$ do
  3:             Client $C_k$ trains model $M_k(D_k[A, L], bs, lr)$
  4:       end for
  5:       Server $S$ encodes a vector $V_{Mult}$ with values $1/K$
  6:       Server $S$ encrypts $V_{Mult}$ with $PK$
  7:       for each shared layer in $\{1, \dots, n\}$ do
  8:             for each client $k$ in $\{1, \dots, K\}$ do
  9:                   Client $C_k$ encrypts layer weights with $PK$
10:             end for
11:             Server $S$ adds the encrypted vectors into $V_{Add}$ with $EK_{Add}$
12:             Server $S$ multiplies $V_{Add}$ by $V_{Mult}$ with $EK_{Mult}$
13:             for each client $k$ in $\{1, \dots, K\}$ do
14:                   Client $C_k$ partially decrypts $V_{Mult}$ with $SK_k$
15:             end for
16:             Server $S$ generates the fused decryption using $EK_{Dec}$
17:             for each client $k$ in $\{1, \dots, K\}$ do
18:                   Client $C_k$ sets layer weights from the fused decryption
19:             end for
20:         end for
21: end for
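The NumPy sketch below mirrors one round of Algorithm 4 in plaintext so the arithmetic is easy to follow; in SplitML the commented steps are instead performed homomorphically with OpenFHE's multi-key CKKS (addition with $EK_{Add}$, multiplication with $EK_{Mult}$, partial decryptions fused with $EK_{Dec}$). Shapes and values are illustrative only.

```python
import numpy as np

K = 3                                   # number of clients
rng = np.random.default_rng(0)

# Flattened weights of one shared (top) layer after local training on D_k.
client_layer_weights = [rng.uniform(-2, 2, size=8) for _ in range(K)]

# Server-side aggregation, shown here in plaintext for clarity:
# 1. each client would first encrypt its weight vector under the shared PK;
# 2. the server adds the K (encrypted) vectors together (EK_Add in Algorithm 4);
summed = np.sum(client_layer_weights, axis=0)

# 3. the server multiplies by an encoded/encrypted vector of 1/K (EK_Mult);
averaged = summed * (1.0 / K)

# 4. each client would produce a partial decryption with SK_k and the server
#    would fuse them with EK_Dec; here the plaintext average is simply reused.
for k in range(K):
    client_layer_weights[k] = averaged.copy()   # clients overwrite the shared layer

print(np.round(averaged, 3))
```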

2.5. Inference Phase

After the training, all clients should have converged models with their target accuracy. The top $n$ layers have the same architecture and weights across clients, facilitating transfer learning and consensus. We propose a novel consensus approach in the encrypted domain, as detailed in Algorithm 5 and Figure 2b. For prediction, any client $C_j$ can choose to perform a consensus in the encrypted domain for the samples $D_j$ close to the classification boundary or within the range with high false positive/negative occurrences. (After federated training concludes, a client may observe the output range of false predictions and choose to perform consensus for the samples falling in this range to improve inference accuracy. For $Sigmoid$ as $f(z)$, we chose $\lambda = 0.05$ for the boundaries 0.05 (label 0) and 0.95 (label 1) by simulation, since all the false predictions were from the ranges $[0.00, 0.10]$ and $[0.90, 1.00]$.) For instance, clients may use the $Sigmoid$ activation function $f(z) \in [0, 1]$ for the output layer of a binary log classification where 0 indicates a ‘normal’ scenario and 1 an ‘anomaly’. A client $C_j$ may choose $\lambda = 0.10$ and collect samples $D_j$ for which $f(z) \in [0.40, 0.60]$, as the classification boundary is drawn at 0.50. First, client $C_j$ generates single-key FHE keys $PK_j, SK_j, EK_j$ (different from the multi-key FHE used in training) and shares $EK_j$ with the consensus peers. $C_j$ calculates the forward activations up to the cut layer $q$ for $D_j$, encrypts these activations $q_{D_j}$ using $PK_j$, and sends them to $m \ge 1$ chosen peers. Peers participating in the consensus receive $EK_j$ and $q_{D_j}$ and send back either the encrypted predicted label or the encrypted prediction values after performing homomorphic calculations on their bottom layers. The client decrypts these results using its secret key $SK_j$ and chooses a label based on the majority vote. We propose two variants for consensus results: total labels (TL) or total prediction (TP) scores:
  • The (voting) clients send a classification label (TL), and the consensus is done on a label majority.
  • The (voting) clients send a result of the final activation function (TP), which is summed up, and the label is chosen if the summation is higher than some required threshold.
In ensemble learning, majority voting (hard voting) and soft voting combine predictions from multiple models. TL represents majority voting, and TP represents soft voting. For a majority vote, each model “votes” for a class, and the most popular choice wins. It is simple but ignores confidence levels. Soft voting is more nuanced, considering each model’s “certainty” by averaging their predicted probabilities for each class. This can be more accurate, especially when models disagree slightly or deal with imbalanced data.
Consider a consensus setup with a $Sigmoid$ activation, $f(z) \in [0, 1]$, and $m = 10$ peers, with $f(z) = 0$ representing a “normal” class and $f(z) = 1$ an “anomaly” for a binary log classification. In a TL consensus, a sample is considered abnormal if a majority, e.g., 6 out of 10 participants (50% or more), classify the sample as 1. Meanwhile, all the predicted values are summed up for a TP consensus. A sample may be considered anomalous if the result exceeds some chosen threshold, e.g., 5.1 for 10 participants, given that the classification boundary is drawn at $f(z) = 0.5$ (for $Sigmoid$) and $5.1 > (0.5 \times 10)$.
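The following sketch reproduces this worked example in NumPy: ten peers return sigmoid outputs for one ambiguous sample, and the requesting client applies either TL (hard-vote) or TP (soft-vote) consensus. The peer scores shown are illustrative; in SplitML they would be decrypted from the peers' encrypted responses.

```python
import numpy as np

# Sigmoid outputs f(z) in [0, 1] returned by m = 10 consensus peers for one sample
# (already decrypted by the requesting client; values are illustrative).
peer_scores = np.array([0.61, 0.55, 0.48, 0.72, 0.66, 0.52, 0.44, 0.58, 0.70, 0.64])
m = len(peer_scores)

# TL (Total Labels, hard voting): each peer votes a label at the 0.5 boundary,
# and the sample is flagged as an anomaly if at least half of the peers vote 1.
votes = (peer_scores >= 0.5).astype(int)
tl_label = int(votes.sum() >= m / 2)

# TP (Total Predictions, soft voting): sum the raw scores and compare against a
# threshold slightly above the neutral value 0.5 * m (5.1 in the example above).
tp_label = int(peer_scores.sum() > 5.1)

print(f"TL consensus: {tl_label} ({votes.sum()}/{m} votes for anomaly)")
print(f"TP consensus: {tp_label} (score sum = {peer_scores.sum():.2f})")
```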
Algorithm 5 Inference
Input: Data subset $D_j$ from client $C_j$
Output: Class labels or prediction scores from $m$ clients
  1: Client $C_j$ creates a subset $D_j$ of its local data
  2: Client $C_j$ generates $PK_j, SK_j, EK_j \leftarrow KeyGen()$
  3: Client $C_j$ shares $EK_j$ with the other $m$ consensus clients
  4: Client $C_j$ generates activations $q_{D_j}$ for $D_j$ at cut layer $q$
  5: Client $C_j$ encrypts activations $q_{D_j}$ with $PK_j$
  6: for each consensus client $h \in \{1, \dots, m\}$ do
  7:       Client $C_h$ receives encrypted $q_{D_j}$ from $C_j$
  8:       Client $C_h$ performs calculations on $q_{D_j}$ with $EK_j$
  9:       Client $C_h$ sends the encrypted result back to Client $C_j$
10:       Client $C_j$ decrypts the results received from $C_h$ with $SK_j$
11: end for
12: Client $C_j$ considers a majority based on the decrypted labels or prediction values received from the consensus clients

2.6. Differential Privacy

Multi-key FHE is vulnerable to collusion, and shared model updates (during training) and cut-layer activations (during inference) can reveal substantial information about the local datasets in a federated setting. We use Differential Privacy (DP) to protect the privacy of honest clients. DP can be applied to local model updates before aggregation at each training round (or to activations during inference). This helps mitigate inversion and inference attacks with minimal impact on model utility. However, DP may not prevent extraction attacks or reduce the severity of the privacy violations that extraction enables [32]. Model extraction attacks can be mitigated using techniques such as model compression, obfuscation, and watermarking [33], together with security measures in the deployment environment. SplitML reduces privacy leakage under the honest-but-curious model with collusion through the cryptographic guarantees of IND-CPA$^D$ secure FHE. Li and Micciancio [34] showed that approximate FHE schemes such as CKKS [20] can leak information about the secret key. In some scenarios, the IND-CPA model may not be sufficient for the CKKS scheme because a decryption result can be used to perform a key-recovery attack. CKKS decryptions give direct access to the secret key given a ciphertext and its decryption, since the user holds $\tilde{ct} = (-\tilde{a}\tilde{s} + \tilde{m} + \tilde{e}, \tilde{a})$ and its decryption $\tilde{m} + \tilde{e}$. As a solution [35], we employ decryption of a CKKS ciphertext $\tilde{ct} = (\tilde{c}_0, \tilde{c}_1)$ as a randomized procedure, $\mathrm{Dec}(\tilde{ct})$: sample $\tilde{z} \leftarrow \tilde{D}_{\tilde{R}, \sigma}$ and return $\tilde{c}_0 + \tilde{c}_1 \tilde{s} + \tilde{z} \pmod{\tilde{q}}$, where $\tilde{D}_{\tilde{R}, \sigma}$ is a discrete Gaussian over the polynomial ring and $\sigma$ is its standard deviation. For $s > 0$ bits of statistical security, $\sigma = \sqrt{12\tau} \cdot 2^{s/2} \cdot \tilde{ct}.\tilde{t}$, where $\tau$ is the expected number of adversarial queries and $\tilde{ct}.\tilde{t}$ is the ciphertext error estimate.
This attack applies to the setting where multiple parties must share decryption results, e.g., in the multi-key (or threshold) FHE setting. By default, OpenFHE [31] chooses a configuration to prevent passive attacks where many decryption queries of the same or related ciphertexts can be tolerated (the lower bound for the tolerated number of such decryption queries is $\tilde{N}_{\tilde{d}} = 128$). For more robust adversarial models, the number of shared decryptions of the same or related ciphertexts can be increased at the cost of precision.
A recent investigation [36] into using homomorphic encryption’s intrinsic noise growth for DP found that this noise is highly dependent on the input messages, leading to potential privacy leakage. This case study showed that while a relaxed precision parameter could achieve a reasonable privacy budget ($\epsilon < 0.5$) over 50 iterations when message dependence was ignored, accounting for this dependence dramatically increased the leakage, resulting in a much worse privacy budget ($\epsilon \geq 2$) over the same iterations. To provide robust, localized protection for honest clients against collusion between curious clients and the server, SplitML employs a two-fold privacy-enhancing mechanism during encryption that is designed to persist because the noise is locally generated by each client, not collaboratively or centrally added. (We do not manually inject additional DP noise into the system; instead, we rely on two inherent features of the CKKS scheme for DP guarantees: the default OpenFHE configuration for noise flooding and the intrinsic noise generated during CKKS operations. Alternatively, for FHE schemes like BGV or BFV, achieving DP would typically involve adding extra noise directly to the model updates before they are encrypted.) This dual protection relies on: (1) the intrinsic errors inherent to the CKKS encryption scheme, and (2) the extra noise added via noise flooding through the default OpenFHE configuration, which is the mechanism used to achieve the stronger IND-CPA$^D$ security against key-recovery attacks.
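As a sketch only, the following NumPy snippet computes a flooding standard deviation from the $\sigma$ expression given above and applies it before a decryption result is released; the security bits, query count, and ciphertext error estimate are placeholder values, and in practice OpenFHE performs this noise flooding internally during CKKS (partial) decryption.

```python
import numpy as np

def flooding_sigma(stat_sec_bits: int, num_queries: int, ct_error_estimate: float) -> float:
    """sigma = sqrt(12 * tau) * 2^(s/2) * ct.t  (noise-flooding standard deviation)."""
    return float(np.sqrt(12 * num_queries) * (2 ** (stat_sec_bits / 2)) * ct_error_estimate)

# Placeholder parameters: s bits of statistical security, tau expected decryption
# queries, and an illustrative ciphertext error estimate.
sigma = flooding_sigma(stat_sec_bits=30, num_queries=128, ct_error_estimate=2 ** -40)

# Randomized decryption: the exact result m + e is flooded with Gaussian noise z so
# the released value no longer exposes the key-dependent ciphertext error e.
rng = np.random.default_rng(1)
exact_decryption = 0.7423            # m + e (illustrative)
released = exact_decryption + rng.normal(0.0, sigma)
print(sigma, released)
```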

3. Security Analysis

The following results analyze the efficacy of SplitML in mitigating both model poisoning and inference-based privacy leaks.

3.1. Model Poisoning Attacks

The integrity of decentralized learning is frequently challenged by Byzantine behaviors, wherein compromised participants submit erroneous or malicious weight updates. As established by [8], the traditional reliance on arithmetic mean functions for weight aggregation presents a significant vulnerability; a singular Byzantine adversary can disproportionately influence the global parameters, effectively derailing the convergence process. This vulnerability is often exploited through model poisoning strategies, where an attacker deliberately manipulates local model states.
By injecting backdoors or intentionally skewed gradients into the shared architecture, these adversaries can compromise the global model’s reliability or introduce hidden triggers that remain dormant until activated during inference. SplitML benefits from split learning because there is no single global model: an adversary may only influence the top (shared) feature-extraction layers and cannot significantly impact the outcome of an honest client’s model, as the bottom (personalized) layers will compensate for the propagated errors of the shared layers. The local model will be trained for multiple rounds (epochs) until the desired accuracy is achieved. In the following, we discuss different experiments and analyze the robustness of our scheme against model poisoning attacks under different settings.
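As a minimal illustration of the fragility of mean-based aggregation noted above, the snippet below averages one weight across clients when a single participant submits the poisoned value 999 used in our experiments (honest values lie in $[-2, 2]$).

```python
import numpy as np

honest = np.array([0.8, -1.2, 0.3, 1.5])      # honest updates for one weight, in [-2, 2]
poisoned = np.append(honest, 999.0)           # one Byzantine client submits 999

print(honest.mean())    #  0.35  -> a sensible aggregate
print(poisoned.mean())  # 200.08 -> the single outlier dominates the plain average
```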
First, we experimented with three small Neural Network (NN) models on a binary log classification problem. All three clients collaborate to update the weights of the first layer of 4 neurons with ReLU activation. Clients have the last (output) layer, with two neurons and Softmax activation, in common, where one neuron corresponds to the ‘normal’ and the other to the ‘anomaly’ class. Model-1 has a hidden layer of 2 neurons with ReLU activation, model-2 does not have any hidden layers, and model-3 has two hidden layers with two neurons each, where the first hidden layer uses ReLU and the second uses Sigmoid activation. We used the modified Loghub [37] HDFS_1 labeled data from Logpai; refer to Section 4.1 for details. We experimented with different configurations of SplitML and measured the trained models’ accuracy (Figure 3a,b, and Table 2).
In the first scenario, S1, all three clients have heterogeneous models, as described earlier. They perform honestly and collaborate on a shared input layer with four neurons. We achieved 96.64% validation accuracy for model-1, 95.90% for model-2, and 100% for model-3. We use this as a benchmark and compare the model accuracy of each client in the malicious settings S2 and S3, where clients send poisonous weights instead of the correct layer weights. In S2 (SplitML in an adversarial setting with heterogeneous models) and S3 (SplitML in a malicious setting with homogeneous models), only model-2 is honest, and the majority of clients (model-1 and model-3) are poisonous. We chose a considerable (poisonous) update value of 999 for these experiments, compared to values in $[-2, 2]$ in an honest setting. S2 achieved a remarkable 89.80% accuracy (a drop of only about 6%) with a malicious majority. In S3, we repeat the malicious-majority setting of S2, with all three clients having the same 4-layer architecture as model-3. We observed similar accuracy levels for the poisonous clients 1 and 3, while the honest client 2 suffered heavily, achieving only 53.38% accuracy (over a 40% loss), close to random guessing. We repeated this experiment in an FL setting, where clients collaborate on all layers instead of only the top layers as in S3. We observed similar accuracy metrics for FL as for S3.
To ensure the robustness of our findings, we validated the framework by conducting experiments across five distinct model architectures. In this configuration, all five clients collaboratively train the initial two layers, which function as the common top layers. The primary input layer consists of five neurons, followed by a second layer of four neurons, both of which utilize the Rectified Linear Unit (ReLU) activation function. The architecture culminates in a shared output layer featuring a single neuron with a Sigmoid activation function to facilitate binary classification.
The internal composition of these models is intentionally varied to simulate hardware and task heterogeneity among participants. Specifically, Model 1 incorporates an additional private hidden layer of two neurons using ReLU activation situated prior to the output. Model 2 and Model 4 similarly include a hidden layer, consisting of four and three neurons, respectively, both utilizing ReLU activations. In contrast, Model 3 represents a streamlined architecture without any supplementary hidden layers. Model 5 is configured with a two-layer private sub-network, where both layers contain two neurons and utilize ReLU activation functions. This deliberate variation in depth and width across the local segments of the models demonstrates the capacity of SplitML to manage architectural divergence. By maintaining fixed common layers for aggregation while allowing local layers to fluctuate, the framework proves its utility in real-world scenarios where clients possess different computational constraints or domain-specific requirements.
In an honest setting, all five client models, M1 to M5, send correct updates. Under the malicious setting, only clients 2 and 4 are honest, and a majority (3 out of 5), models 1, 3, and 5, send poisonous updates (vectors of 999 instead of honest values in the range $[-4, 4]$). Since the FL architecture collaborates on all the layers, we observed poor performance (Table 3), as expected, with very high losses. In FL, only models 1 and 2 managed to reach 96% accuracy, while model 3 achieved 56% accuracy, a drop of 35% compared to the honest setting. Moreover, models 4 and 5 in FL reported 0% accuracy.
FL leaks information (under the malicious setting), as the reported validation accuracy for these five models corresponds to their data distribution. Since the poisonous updates classify everything as anomalous, validation accuracy is inversely proportional to the fraction of normal-class samples in the dataset. Model-1 had 3.08% normal samples, Model-2 had 3.67%, Model-3 had 43.23%, and Model-4 and Model-5 had 100% normal samples (thus 0% accuracy).
SplitML performed exceptionally well, with the same model accuracy in the poisonous setting as in the honest one. Only model-3 performed poorly, with 56% accuracy, as it has an output layer right after the two shared layers and does not have any additional hidden layers to compensate for the propagated poisonous values and adjust the weights in the deeper hidden layers. While we present empirical evidence of inherent robustness to poisoning attacks due to our architecture, we do not offer active mitigations to prevent poisonous updates (refer to Appendix C.1), as measures based on the similarity of model updates would not help if the majority of the clients are malicious.

3.2. Inference Attacks

The following subsections provide a detailed discussion of membership inference and model inversion attacks. These evaluations are critical for quantifying the “Privacy-via-Encryption” efficacy of SplitML , as they represent the most common methods used by adversaries to extract sensitive training data from shared model updates. In our analysis, we demonstrate how the architectural partitioning of SplitML and the intrinsic noise flooding of the CKKS scheme successfully obscure these sensitive signals, preventing an honest-but-curious server from executing these privacy-compromising techniques.

3.2.1. Membership Inference

Membership Inference (MI) attacks, as formalized by [38], aim to jeopardize data privacy by ascertaining whether a specific record was utilized during the training phase. These privacy violations typically manifest when sensitive artifacts, such as model weights or activations, are exchanged between participating entities. A black-box MI attack leverages shadow models (Figure 4) and label-based queries to approximate a target model’s behavior. However, the architectural nuances of SplitML necessitate a more granular evaluation. Given the partitioned nature of the framework, we investigate adversarial vectors targeting both the (i) input-proximal top layers and (ii) the output-proximal bottom components. This dual-layered analysis accounts for the unique information leakage risks inherent in the split-model paradigm, where intermediate smashed data and gradient updates serve as potential signals for membership disclosure.
  • First, we attack (input) datasets and their gradients from the split layer to determine their membership.
  • We develop another attack model to infer membership from labels or predictions given the gradients from the cut layer.
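For context, the sketch below condenses the shadow-model pipeline of [38] into a few lines of scikit-learn; it is illustrative only (our experiments target the CNN splits described next), and the synthetic data, model choices, and helper names such as attack_features are ours.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Auxiliary data standing in for what the attacker can sample from the target distribution.
X, y = make_classification(n_samples=4000, n_features=16, n_informative=8, random_state=0)

def attack_features(model, X_part, y_part):
    """Per-record attack feature = class-confidence vector plus the true label."""
    probs = model.predict_proba(X_part)
    return np.hstack([probs, y_part.reshape(-1, 1)])

# Train several shadow models on disjoint splits of the auxiliary data and record
# confidence vectors for their members (label 1) and held-out non-members (label 0).
feats, members = [], []
for split in np.array_split(np.arange(len(X)), 4):
    X_in, X_out, y_in, y_out = train_test_split(X[split], y[split], test_size=0.5, random_state=0)
    shadow = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_in, y_in)
    for X_part, y_part, flag in [(X_in, y_in, 1), (X_out, y_out, 0)]:
        feats.append(attack_features(shadow, X_part, y_part))
        members.append(np.full(len(X_part), flag))

# The attack model learns to separate member from non-member confidence patterns;
# it would then be applied to the target model's (or split segment's) outputs.
attack = LogisticRegression(max_iter=1000).fit(np.vstack(feats), np.concatenate(members))
print("attack training accuracy:", attack.score(np.vstack(feats), np.concatenate(members)))
```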
We attacked a CNN model similar to [39] on the MNIST [40] dataset. The target model has 4 CNN and 2 ANN layers, with the first convolution layer using a 3 × 3 kernel and ReLU activation. It is followed by a max-pooling layer with a 2 × 2 kernel. The third convolution layer has a 3 × 3 kernel with ReLU activation, followed by a fourth max-pooling layer of 2 × 2. The fifth layer is a dense layer with 128 neurons and ReLU activation, followed by a 10-neuron output layer with Softmax activation. We refer to this full model with six layers as architecture A (Figure 5).
Architecture-B is this full model A’s first 4 CNN layers (top split), and C is the last 2 ANN layers (bottom split). We then create more models, keeping the top layers (as in A) the same and measuring MI attack accuracy with shadow models on different bottom layers after the split. Architecture-D has 3 ANN layers: 128 neurons with R e L U , 64 with R e L U , and 10 with S o f t m a x . E has the same number of (ANN) layers (2) as in A with 64 neurons instead of 128 in the first dense layer, keeping the same output layer. F has 2 layers as in A but 256 neurons rather than 128. In G, we remove the 128-neuron layer and keep the output layer. Finally, H has 128 neurons with T a n H and 10 with S i g m o i d for the output layer. Due to the intrinsics of the shadow models, the developed attack model will have good accuracy either on MI or non-MI. An attacker may develop two attack models complementing each other for high-confidence results.
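A minimal PyTorch sketch of architecture A and its B/C split is given below. The kernel sizes, pooling, and dense widths follow the description above; the convolutional filter counts (32 and 64) are assumptions, as the text does not specify them.

```python
import torch.nn as nn

top_B = nn.Sequential(                      # architecture B: the 4 "CNN" layers (top split)
    nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
    nn.MaxPool2d(2),
)
bottom_C = nn.Sequential(                   # architecture C: the 2 "ANN" layers (bottom split)
    nn.Flatten(),
    nn.Linear(64 * 5 * 5, 128), nn.ReLU(),  # 28x28 MNIST input shrinks to 5x5 feature maps
    nn.Linear(128, 10), nn.Softmax(dim=1),
)
model_A = nn.Sequential(top_B, bottom_C)    # full target model A = B followed by C
```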
Further, we empirically observed (Table 4) that the top layers, on average, leak more information than the bottom layers for MI. Nevertheless, splitting helps reduce the average attack accuracy on partial models compared to performing this attack on a complete model. Though attack accuracy hovered around 50% in the average case, which is analogous to random guessing, a well-trained target model can improve attack performance. While using Laplacian noise for DP is expected, we suggest adding extra Gaussian noise during encryption to further reduce attack accuracy and achieve IND-CPA-D security for approximate FHE; the scaling parameter Δ can be adjusted to retain precision despite this extra noise.

3.2.2. Model Inversion

Model inversion attacks seek to reconstruct the defining characteristics of a hidden input by analyzing the corresponding model outputs. As demonstrated in [41], this technique can be used to synthesize a representative input—such as a facial prototype—that maximizes the confidence score for a specific class. It is essential to distinguish this from Membership Inference (MI); while MI aims to identify specific training records, model inversion merely generates a generalized average of features that characterize an entire category. Consequently, the resulting artifacts are often semantically disconnected from any single actual data point. In the context of log anomaly detection, the efficacy of inversion attacks is significantly diminished. Unlike image processing, where feature averaging yields recognizable patterns, the discrete nature of text logs renders “average” representations semantically incoherent. SplitML further mitigates these risks through a dual-defense strategy. First, the integration of multi-key FHE ensures that reconstructive attacks are only theoretically plausible in a highly specific, colluding adversary model. Second, the injection of Gaussian noise during the encryption process introduces statistical perturbations that mask sensitive feature correlations. A detailed discussion of contemporary countermeasures against such threats is provided in Appendix C.

3.3. Model Extraction Attacks

Model Extraction (ME) attacks represent a significant threat to the confidentiality of machine learning assets, where an adversary aims to reconstruct a high-fidelity replica of a target model through strategic labeling queries [32]. While Federated Learning (FL) primarily prioritizes data localization, it remains inherently susceptible to Intellectual Property (IP) theft. In contrast, Split Learning (SL) architectures offer structural protection; by partitioning the model, they eliminate the possibility of a direct, wholesale download of the global parameters. SplitML further hardens this defense-by-design, as individual participants lack visibility into the complete neural architectures of their peers.
Despite these structural advantages, extraction remains a theoretical risk through indirect observation. Prior research by Li et al. [42] has demonstrated that malicious entities in SL environments can exploit gradient information to facilitate model theft. However, SplitML mitigates this specific vector by ensuring that peers are restricted to shared weight updates rather than raw gradient data during training. Furthermore, even in scenarios where adversaries attempt to reverse-engineer functionality via abundant inference queries [43], SplitML implements proactive countermeasures. Specifically, we advocate for a consensus mechanism based on Total Labels (TL) rather than high-precision Total Predictions (TP). By obfuscating the soft-label probability vectors and providing only the final categorical output, the framework drastically reduces the signal-to-noise ratio available to an attacker. This strategy thwarts the development of accurate surrogate models, which are often precursors to transferable adversarial attacks [44,45] or hardware-level threats like bit-flip attacks [46]. A detailed discussion on restricting query information and modifying return values is further explored in the work of Jagielski et al. [47].

4. Experimental Analysis

By benchmarking our framework against standard distributed learning paradigms, we demonstrate how architectural partitioning and encrypted aggregation affect the overall efficacy of the system in a multi-institutional SIEM context.

4.1. Dataset

Log anomaly datasets typically exhibit significant class imbalance, often being heavily skewed toward either “normal” or “anomalous” instances. To prevent the machine learning model from achieving misleadingly high accuracy through majority-class bias, we implemented a balanced sampling strategy. To validate the equilibrium of our processed dataset (Table 5), we employed a “Return-1 Model” baseline, which consistently predicts the “anomalous” class for every input. This baseline yielded an accuracy of 49.99% and a recall of 100%, confirming that the labels are evenly distributed. Under this convention, label-0 represents the normal class while label-1 signifies an anomaly.
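As a quick illustration, the behavior of such a trivial baseline can be reproduced with sklearn metrics; the label vector below is a balanced stand-in, not our actual HDFS labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0, 1] * 5000)        # balanced stand-in labels (0 = normal, 1 = anomalous)
return_1 = np.ones_like(y_true)         # "Return-1 Model": always predict the anomalous class

print(accuracy_score(y_true, return_1)) # ~0.5 on a balanced set
print(recall_score(y_true, return_1))   # 1.0, since every anomaly is flagged
```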
The primary data source for this study is the Loghub HDFS_1 repository provided by Logpai [37]. This 1.47 GB dataset comprises logs generated by Hadoop-based MapReduce jobs executed across more than 200 Amazon EC2 nodes over a duration of 38.7 h. Expert labeling was performed by Hadoop domain specialists. Out of the 11,175,629 total log entries, approximately 2.58% (288,250) were identified as anomalous.
To convert these unstructured logs into a structured format suitable for analysis, we utilized the Drain log parser [48], though the specific mechanics of the textual parsing are omitted here for conciseness. For our experimental framework, we curated a balanced subset consisting of 576,499 entries, characterized by seven features distributed uniformly across both classes. This dataset was subsequently partitioned into non-overlapping, independent and identically distributed (i.i.d.) subsets totaling 397,366 observations distributed among three distinct clients. Specifically, Client-1 was allocated 69,455 normal and 66,780 anomalous samples; Client-2 received 93,032 normal and 73,055 anomalous samples; and Client-3 was assigned 43,666 normal and 51,378 anomalous samples. For each client, the data was bifurcated into a training set comprising 80% of the local observations and a testing set containing the remaining 20%.

4.2. Results

Experimental results for the ML attacks were presented in Section 3. The computations were performed on a 2.4 GHz MacBook Pro with a Quad-Core Intel Core i5 processor and 8 GB of 2133 MHz LPDDR3 memory. We used Python 3.11 [49] with sklearn APIs [50] for the binary classifiers. We compared performance using the following measures: Precision, Recall, Accuracy, and F1-Score. We created three small NNs, representing three clients (Client-1, 2, 3) participating in our scheme for both training and inference.
The models share the first two Fully Connected (FC) layers, with the first layer having five neurons, 40 parameters, and ReLU activation, and the second layer having four neurons, 24 parameters, and ReLU activation. For simplicity, all the models have a common output layer with a single neuron and a Sigmoid activation function. Model-1 has a third layer with two neurons, ten parameters, and ReLU activation. Model-2 only has the three layers described earlier. In contrast, Model-3 has two additional FC layers: a third layer with three neurons, 15 parameters, and ReLU activation, and a fourth layer with two neurons, eight parameters, and ReLU activation.
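The paper's implementation framework is not specified; the following PyTorch sketch simply mirrors the layer and parameter counts listed above (40, 24, 10, 15, and 8 parameters for the respective layers):

```python
import torch.nn as nn

def shared_top():
    # Two shared FC layers: 7 input features -> 5 neurons (40 params) -> 4 neurons (24 params).
    return [nn.Linear(7, 5), nn.ReLU(), nn.Linear(5, 4), nn.ReLU()]

model_1 = nn.Sequential(*shared_top(),
                        nn.Linear(4, 2), nn.ReLU(),        # private layer, 10 params
                        nn.Linear(2, 1), nn.Sigmoid())     # common 1-neuron output
model_2 = nn.Sequential(*shared_top(),
                        nn.Linear(4, 1), nn.Sigmoid())     # no extra private layers
model_3 = nn.Sequential(*shared_top(),
                        nn.Linear(4, 3), nn.ReLU(),        # private layer, 15 params
                        nn.Linear(3, 2), nn.ReLU(),        # private layer, 8 params
                        nn.Linear(2, 1), nn.Sigmoid())
```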
All the clients train their models on plaintext data and only send encrypted weights to the FL server after a round is complete. In each round, all clients train their models for one epoch in parallel. In the first round, clients initialize their model weights with random values; for subsequent rounds, the weights of the shared layers are set to the values provided by the server. The server computes the result of each round using an FL algorithm (such as FedAvg) in the encrypted domain with the homomorphic evaluation key EK. Empirically, we observed that collaboration reduced the number of epochs required for convergence by roughly half on average. As shown in Figure 6, for a batch size of 64 and learning rates of 0.05 and 0.10, standalone training (S_Acc, S_Loss denote Standalone Accuracy/Loss) required six epochs to converge, whereas collaborative learning (C_Acc, C_Loss denote Collaborative Accuracy/Loss) achieved higher accuracy in only three epochs.
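Our implementation performs this aggregation with multi-key CKKS in OpenFHE; the sketch below uses the single-key TenSEAL CKKS library instead, purely to illustrate FedAvg in the encrypted domain (parameters and weight values are illustrative):

```python
import numpy as np
import tenseal as ts

# Single-key CKKS context (TenSEAL); SplitML itself uses multi-key CKKS via OpenFHE.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40                       # scaling factor Delta

# Flattened shared-layer weights from the three clients (random stand-ins).
client_weights = [np.random.uniform(-1, 1, 64) for _ in range(3)]
encrypted = [ts.ckks_vector(ctx, w.tolist()) for w in client_weights]

# FedAvg in the encrypted domain: sum the ciphertexts, then scale by 1/K.
enc_avg = (encrypted[0] + encrypted[1] + encrypted[2]) * (1.0 / 3)

print(np.allclose(enc_avg.decrypt(), np.mean(client_weights, axis=0), atol=1e-3))
```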
We further experimented with inference using the earlier trained models. For the Sigmoid activation function f(z) ∈ [0, 1], we set a boundary threshold λ = 0.05. We performed prediction consensus using both the predicted label TL (Section 2.5) and the prediction value TP (Section 2.5) approaches for the samples that fall in the prediction range [0.45, 0.55]. We set TL ≥ 2 and TP ≥ 1.5 to choose label-1 for three clients in a consensus.
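The consensus rules reduce to a few lines of Python; the helper below is a sketch of the thresholds just described (λ = 0.05, TL ≥ 2, TP ≥ 1.5), not the exact code used in our experiments:

```python
def needs_consensus(score, lam=0.05):
    # A sigmoid output inside [0.5 - lam, 0.5 + lam] is treated as ambiguous.
    return 0.5 - lam <= score <= 0.5 + lam

def consensus(scores, tl_threshold=2, tp_threshold=1.5):
    """TL counts the peers' hard labels; TP sums their raw sigmoid scores."""
    labels = [1 if s >= 0.5 else 0 for s in scores]
    tl_decision = int(sum(labels) >= tl_threshold)
    tp_decision = int(sum(scores) >= tp_threshold)
    return tl_decision, tp_decision

print(needs_consensus(0.52))            # True: this sample is sent to the peers
print(consensus([0.52, 0.47, 0.55]))    # (1, 1): both TL and TP vote for label-1
```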
We observed that TL performed better when most samples were classified as normal (prediction scores close to 0), and TP worked better when most were anomalous (scores close to 1). For example, in an observation for lr = 0.05 with 88 normal and 0 anomaly samples, Model-1 achieved 18.18%, Model-2 achieved 73.86%, Model-3 achieved 0.00%, TL achieved 75.00%, and TP achieved 31.82% accuracy; TL outperformed all individual models. In another observation for lr = 0.10 (Table 6) with 353 normal and 2,175 anomalies, Model-1 had 87.70%, Model-2 had 73.66%, Model-3 had 86.04%, TL had 73.66%, and TP recorded the highest accuracy of 88.77%.
In practice, a client may want to perform a consensus on the predictions most likely to be False Positives (FP) and False Negatives (FN). From our experiments, we observed that for our three clients with Sigmoid output, all the false predictions were from [0.00, 0.10] and [0.90, 1.00]. For one observation for lr = 0.05 with 1,388 FN and 23,663 FP collected from all three clients, Model-1 achieved 5.54%, Model-2 achieved 41.11%, Model-3 achieved 5.54%, TL achieved 5.54%, and TP achieved 20.80% accuracy. For another observation for lr = 0.10 with 3,166 FN and 29 FP, Model-1 achieved 0.69%, Model-2 achieved 26.29%, Model-3 achieved 26.26%, TL achieved 32.99%, and TP achieved 21.60% accuracy.
Key generation involves interaction between parties and is done before the training process begins. We observed ∼5–6 s processing time (zero network communication overhead, as all the clients were simulated locally on a single machine) to generate all private and shared keys. Federation overhead was ∼1.2–1.5 s of total training time of ∼25–30 s per epoch.

5. Discussion

The experimental results and security analysis presented in this study validate SplitML as a robust framework for decentralized SIEM environments. By partitioning the model into shared top layers and private bottom layers, the system successfully balances global collaborative learning with the need for local model heterogeneity. This architectural split ensures that participants can maintain unique local configurations while still benefiting from a collective global intelligence.
The complexity of the multi-key CKKS generation and the consensus inference procedure can be understood through a simple authority model. In the key-generation phase, participants collaboratively create a public evaluation key to ensure that no single party ever possesses the full secret key. This is similar to a multi-signature vault where all institutional participants must provide their partial shares to unlock the final fused result. Similarly, the consensus inference phase serves as a decentralized second opinion. When a client’s local private layers produce an ambiguous classification for a suspicious log entry, it sends an encrypted smashed data packet, which is an intermediate activation, to its peers. The peers process this through their shared layers and return an encrypted vote. The final decision is decrypted only after aggregation. This process ensures that neither participants nor servers ever see the raw telemetry or the individual local predictions of others.

5.1. Differential Privacy and Noise Flooding

A critical distinction must be made regarding the nature of the privacy guarantees provided by SplitML. While our approach utilizes the noise flooding inherent in the OpenFHE CKKS implementation to achieve IND-CPA-D security, this should be distinguished from formal (ϵ, δ)-Differential Privacy (DP). Traditional DP provides a mathematical bound on the probability of identifying a specific individual's contribution by injecting calibrated noise. In contrast, SplitML relies on a Privacy-via-Encryption paradigm.
The noise flooding mechanism ensures that decryption results do not leak the secret key or sensitive intermediate values in a multi-party setting. While this effectively thwarts reconstructive inference attacks by obscuring individual gradients and activations, it does not currently provide formal DP privacy budget accounting. This cryptographic protection is robust against the honest-but-curious server, but it does not technically satisfy the information-theoretic definition of DP unless specific (ϵ, δ) parameters are formally integrated into the noise generation process.

5.2. Large Language Model (LLM) Security

As distributed learning moves toward more complex architectures, the security of Large Language Models (LLMs) becomes a paramount concern. Recent surveys, such as Zhou et al. [51], have highlighted significant backdoor threats in LLMs. In these scenarios, malicious triggers can be embedded during the training or fine-tuning phase to bypass security filters. Furthermore, the deployment of LLMs in sensitive environments introduces the risk of fingerprinting attacks. In these cases, an adversary optimizes queries, sometimes using reinforcement learning, to identify and exploit model identities uniquely [52].
The SplitML architecture offers a potential defense-in-depth strategy for these emerging LLM threats. By keeping the output-proximal bottom layers, such as specialized classification heads or instruction-tuning layers, private to each institution, the framework can neutralize universal backdoor triggers that rely on global end-to-end weight manipulation [51]. Additionally, because gradients and activations are encrypted using multi-key FHE, the framework provides a cryptographic barrier against fingerprinting attempts. Protecting model updates in this manner prevents the intermodel discrepancy signals that offensive fingerprinting tools use to discriminate between models [52]. Integrating these LLM-specific defenses is a vital direction for the next generation of FSL.

6. Conclusions

This paper presents the SplitML framework, which is a unified architecture designed to resolve the critical privacy–utility trade-offs in distributed machine learning. By merging the structural advantages of FL and SL, SplitML enables model heterogeneity through a partitioned hierarchy. This involves sharing standardized top layers for collaborative feature extraction while keeping bottom layers private for localized classification.
The integration of MK-FHE ensures that sensitive activations and gradients remain encrypted during both training and inference. This provides a robust defense against reconstructive inference attacks. Furthermore, a collaborative consensus process facilitates high-precision global inference without exposing raw client telemetry. The evaluation of federation costs demonstrates that the multi-key FHE overhead is concentrated in the initial setup. Once the shared evaluation keys are established, the per-epoch training latency is dominated by local computation rather than cryptographic operations. This positioning of the security tax at the initialization phase makes SplitML a practical choice for SIEM operators who require long-term and continuous collaborative monitoring across heterogeneous network infrastructures.

7. Future Work

Despite these advancements, the current framework operates under a semi-honest or honest-but-curious threat model; while the architecture reduces the blast radius of poisoning through localized layer refinement, it lacks active cryptographic or statistical defenses to filter malicious updates in real-time. Furthermore, the reliance on a central aggregator assumes a passive adversary; it does not account for an actively malicious server capable of tampering with global weights or misrouting encrypted queries.
Future research will prioritize extending the security perimeter to include actively malicious servers by transitioning toward decentralized trust models. We propose deploying digital signature verification, multi-server distributed aggregation using threshold secret sharing, or a peer-to-peer (P2P) decentralized topology to eliminate the central aggregator as a single point of failure. These architectures distribute the flow of authority to ensure no single compromised entity can manipulate the global model or recover private data.
Additionally, we intend to integrate active Byzantine-robust aggregation techniques, such as Krum, Trimmed Mean, or Median-based filtering, to prune malicious gradients before they statistically contaminate the shared weights. We emphasize that SplitML is complementary to these robust aggregation methods; while our architectural partitioning limits the scope of an attack to the shared layers, these statistical filters would provide the necessary active defense to maintain the integrity of those shared parameters.
Finally, future iterations will formally incorporate fairness-aware constraints into the optimization objective. This will help to prevent bias and ensure equitable model performance across diverse and heterogeneous participants.

Author Contributions

All authors contributed to this study’s conceptualization and methodology. D.T. contributed to writing—original draft preparation. A.B. and N.K. contributed to writing—review and editing. D.T. contributed to visualization. N.T. contributed to supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing does not apply to this article.

Acknowledgments

During the preparation of this manuscript, the authors utilized Grammarly and Gemini to ensure grammatical accuracy and stylistic flow. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Background

We denote the notations used in this paper in Table A1, then briefly describe Fully Homomorphic Encryption (FHE) and Differential Privacy (DP) in the following subsections.
Table A1. Notations used in this paper.
Symbol: Name
Enc: Encryption
Dec: Decryption
Add: Addition
Mult: Multiplication
PK: Public Key
SK: Secret Key
EK_Add: Evaluation Key for Addition
EK_Mult: Evaluation Key for Multiplication
EK_Dec: Evaluation Key for Fused Decryption
Δ: Scaling Factor for FHE
λ: Classification Threshold
f(z): Activation Function f on Input z
S: Central (Federation) Server
K: (Total) Number of Clients
k: Client Index (k ∈ {1, …, K})
T: (Threshold) Number of Clients required for Fused Decryption (T ≤ K)
T′: Number of Colluding Clients
C_k: k-th Client
D_k: Dataset of k-th Client
o: Observation (Record) in a Dataset (o ∈ D_k)
A: Shared Attributes (Features)
L: Shared Labels
n: Number of Shared Layers
t_k: Number of Personalized Layers of k-th Client
M_k: ML model of k-th Client
|M_k|: Number of Total Layers of k-th Client (|M_k| = n + t_k)
q: Cut (Split) Layer
|q|: Size of the Cut Layer
q⃗: Activations from Cut Layer q
Ñ_d̃: Number of Decryption Queries
m: Number of Participants in Training
p: Training Participant Index (p ∈ {1, …, m})
P_p: p-th Participant
m′: Number of Participants in Consensus
h: Consensus Participant Index (h ∈ {1, …, m′})
R: Number of Training Rounds
r: Round Index (r ∈ {1, …, R})
α: Fraction of ML parameters with a Client
1 − α: Fraction of ML parameters with the Server
bs: Batch Size
lr: Learning Rate

Appendix A.1. Fully Homomorphic Encryption (FHE)

Fully Homomorphic Encryption (FHE) serves as a robust cryptographic primitive enabling direct arithmetic operations on ciphertexts. This capability establishes FHE as a primary candidate for privacy-centric computation and secure data storage [10,11,53,54,55]. Since the introduction of Gentry's seminal scheme in 2009 [9], the field has experienced substantial global interest, leading to numerous performance and security enhancements. Consequently, FHE has been integrated into a wide range of practical applications [12,13,14,15,16,17,18,56,57,58]. Based on their fundamental operations, these schemes are generally categorized into word-wise frameworks [20,59,60,61] and bit-wise frameworks [62,63]. FHE facilitates arbitrary computational logic on encrypted information, without requiring decryption, through a tripartite key structure: the public key (PK), the secret key (SK), and the evaluation key (EK). The public key serves the encryption process, whereas the secret key is reserved for decryption. The evaluation key enables the execution of arithmetic circuits on ciphertexts; while EK is typically derived from the secret key, certain implementations may generate it through the combined use of both public and private parameters.
This research adopts the CKKS [20] scheme, which distinguishes itself from alternative FHE frameworks such as BFV [61,64], BGV [65], and TFHE [62] through its unique management of encryption noise. Unlike schemes that treat noise as a strictly adversarial element, CKKS interprets noise as an inherent component of the message. This approach is analogous to floating-point arithmetic, where real numbers are represented via approximation. In this context, encryption noise does not compromise the Most Significant Bits (MSBs) of the plaintext m̃, provided it remains within predefined bounds. Upon decryption, the ciphertext yields an approximated value m̃ + ẽ, where ẽ represents the minimal residual noise. To mitigate potential precision loss during these operations, plaintexts are multiplied by a scaling factor Δ prior to encryption. Furthermore, CKKS supports batching techniques, allowing for the encoding of multiple plaintexts into a single ciphertext to facilitate Single Instruction Multiple Data (SIMD) processing. From a security perspective, CKKS is defined by a suite of probabilistic polynomial-time algorithms relative to the security parameter.
The algorithms are:
  • CKKS.KeyGen: generates a key pair.
  • CKKS.Enc: encrypts a plaintext.
  • CKKS.Dec: decrypts a ciphertext.
  • CKKS.Eval: evaluates an arithmetic operation on ciphertexts (encrypted data).
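To build intuition for the approximate decryption and the role of the scaling factor Δ described above, the following plain-Python sketch mimics the encode/decode behavior with an idealized additive noise term; it performs no actual encryption.

```python
DELTA = 2 ** 40                   # scaling factor applied before encryption

def encode(x, delta=DELTA):
    return round(x * delta)       # real value -> scaled integer message

def noisy_decode(m, noise, delta=DELTA):
    # CKKS decryption returns m + e; dividing by delta pushes the error into the low-order bits.
    return (m + noise) / delta

m = encode(3.141592653589793)
print(noisy_decode(m, noise=12_345))   # ~3.14159265..., the MSBs of the message survive
```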
Multi-key CKKS is a tuple of five probabilistic polynomial-time algorithms. Given a security parameter μ, the maximal multiplicative depth L̃ of evaluatable circuits, the number of clients K, and an access structure Ã, CKKS.KeyGen returns a public key PK, K secret keys SK_1 to SK_K, and an evaluation key EK. Given a public key PK and a message m̃, CKKS.Enc returns a ciphertext c̃. Given a secret key SK_k and a ciphertext c̃, CKKS.PDec returns a partial decryption ψ_k. Given an evaluation key EK, a circuit f, and ciphertexts c̃_1 to c̃_K, CKKS.Eval returns a ciphertext c̃_f. Given a set of partial decryptions {ψ_k}, k ∈ Ã, and a ciphertext c̃, CKKS.Combine returns a message m̃′ approximating m̃.
  • (PK, {SK_1, …, SK_K}, EK) ← CKKS.KeyGen(1^μ, L̃, K, Ã)
  • c̃ ← CKKS.Enc(m̃, PK)
  • ψ_k ← CKKS.PDec(c̃, SK_k)
  • c̃_f ← CKKS.Eval(f, EK, c̃_1, …, c̃_K)
  • m̃′ ≈ m̃ ← CKKS.Combine({ψ_k}_{k ∈ Ã}, c̃)

Appendix A.2. Differential Privacy (DP)

Differential Privacy (DP) [66] establishes a mathematical framework for data analysis that safeguards individual privacy by ensuring the results do not reveal sensitive information regarding any specific record. This objective is achieved through the integration of stochastic noise, which renders the distinction between an original dataset and its noisy counterpart statistically negligible. The magnitude of this noise is governed by the privacy budget, denoted as ϵ. Within this framework, a higher ϵ value generally corresponds to a more permissive privacy threshold, whereas lower values signify a more rigorous privacy guarantee. Formally, a mechanism M satisfies (ϵ, δ)-DP if, for all adjacent datasets x and y and for all possible outcome subsets R, the following condition is met: P[M(x) ∈ R] ≤ e^ϵ · P[M(y) ∈ R] + δ.
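As a concrete instance of such a mechanism, the classical Gaussian mechanism releases a value with noise calibrated to its sensitivity; the sketch below uses the standard bound σ = √(2 ln(1.25/δ)) · Δf / ϵ, which is valid for ϵ < 1 (the parameter values are illustrative).

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta):
    """Release `value` with Gaussian noise satisfying (epsilon, delta)-DP (for epsilon < 1)."""
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
    return value + np.random.default_rng().normal(0.0, sigma)

# Releasing a count query of L2-sensitivity 1 under (0.5, 1e-5)-DP.
print(gaussian_mechanism(42.0, sensitivity=1.0, epsilon=0.5, delta=1e-5))
```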
The application of random noise effectively masks private attributes, and the specific noise parameters can be calibrated to satisfy DP requirements [66,67] and related variants [68] specifically designed to protect machine learning training data [69]. DP methodologies have been successfully extended to various models, including regressions [70,71], Support Vector Machines (SVM) [72], Decision Trees (DT) [73], and Neural Networks (NN) [74]. In practice, noise injection can occur at the input level [75], within local updates prior to transmission [76], or at the centralized server [77].
By quantifying the added noise, one can compute a probability bound on information leakage, characterizing the difficulty for an adversary to reconstruct private features. The primary objective is to restrict the influence of individual samples on the global parameters, thereby preventing model extraction or inversion [78].
In multi-party settings, DP facilitates the training of shared models on decentralized data without compromising participant confidentiality [79]. However, the presence of honest-but-curious adversaries who may collude necessitates careful consideration of the noise source. If the centralized server is untrusted, it cannot be relied upon to generate noise, as the disclosure of noise parameters to specific clients would invalidate the DP protections. Consequently, participants often generate noise in a distributed manner [80,81]. This approach is particularly effective when employing a Gaussian distribution, as the additive stability of Gaussian noise simplifies the distributed aggregation process.
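The additive stability mentioned above means that each of K clients can contribute an independent Gaussian share of variance σ²/K so that only their sum carries the full protective variance σ²; a small NumPy check (with arbitrary σ and K) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(1)
K, sigma_total = 5, 2.0

# Each client draws noise with variance sigma_total**2 / K; no single party
# (including the server) ever knows the full noise protecting the aggregate.
local_noise = rng.normal(0.0, sigma_total / np.sqrt(K), size=(K, 100_000))
aggregate = local_noise.sum(axis=0)
print(round(aggregate.std(), 2))   # empirically close to sigma_total = 2.0
```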
Comparative studies by Dong et al. [82] indicate that the trade-off between model utility and defensive efficacy is primarily dictated by the noise magnitude rather than the specific distribution type, such as Gaussian or Laplacian. Nonetheless, both noise types inevitably lead to a non-negligible reduction in the target model’s accuracy. Expanding on this, Titcombe et al. [83] proposed applying additive Laplacian noise to intermediate data representations before transmission to a computational server. This unilateral defense obscures the communication between model segments, significantly complicating the adversary’s efforts to map intermediate representations back to raw input data. This decentralized application of noise is particularly valuable when the data holder lacks trust in the computational infrastructure.

Appendix B. Related Work

In this section, we present formal definitions for Federated Learning (Definition A1) and Split Learning (Definition A2) and discuss their inherent vulnerabilities.

Appendix B.1. Federated Learning

Federated Learning (FL), as conceptualized by Google [3], facilitates the decentralized training of machine learning models across distributed devices that harbor privacy-sensitive local datasets. This paradigm was specifically designed to address the “data islanding” problem, where regulatory or logistical constraints prevent the centralization of raw data. At the initiation of the training process, designated as round r = 0, a centralized orchestration server S initializes a global model M^0. This baseline model is subsequently disseminated to a subset of m participants selected from the total pool of K available clients for the current communication round. Upon the successful reception of the global parameters M^0, each participating client executes local on-device training, refining the model weights using their indigenous data samples. Following this local computation phase, participants transmit their respective updated model states, denoted as M_p^0, back to the centralized server. The orchestrator then performs a secure aggregation of these individual updates to synthesize the next iteration of the global model, M^1. This iterative cycle of local computation and global communication persists until the R-th round (r = R), at which point the server determines that the global model M^R has achieved a satisfactory state of convergence. Based on the distribution of data across the feature and sample spaces, FL is formally categorized into three primary taxonomies [84]:
  • Horizontal Federated Learning (HFL): This configuration is utilized when datasets share a significant overlap in their feature space but possess distinct sample IDs. HFL is often referred to as sample-partitioned federated learning, as it involves organizations that collect similar types of data from different user bases.
  • Vertical Federated Learning (VFL): VFL applies to scenarios where participants possess different features for a largely overlapping set of sample IDs. Furthermore, known as feature-partitioned federated learning, this type allows disparate organizations, such as a financial institution and a retail entity, to collaboratively train a model on a shared set of individuals without exchanging raw attributes.
  • Federated Transfer Learning (FTL): FTL addresses the most challenging scenario where participating entities share neither a significant portion of the sample IDs nor a common feature space. This approach leverages transfer learning techniques to bridge the gap between heterogeneous domains, allowing knowledge to be extracted from a source domain to improve a model in a distinct target domain.
Definition A1.
FL is a privacy-preserving collaborative distributed learning scheme for training a globally shared ML model M with K > 1 clients and an FL server S. For each training round r ∈ {1, …, R}, participants 1 < m ≤ K train all the n > 0 shared layers of M^(r−1) with their private datasets D_k, k ∈ {1, …, K}, with common attributes A and common labels L. After each training round, S averages the weights of M^(r−1) from all participants with some averaging algorithm and sends the updated model weights M^r to the participants of the next round. Training continues until M converges or a fixed number of rounds R is reached.
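In its standard FedAvg form, the server's averaging step weights each participant's update by its local dataset size; a minimal NumPy sketch is shown below (the dataset sizes are the per-client counts from Section 4.1, and the weight vectors are random stand-ins).

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of flattened shared-layer weights, as in FedAvg."""
    coeffs = np.array(client_sizes, dtype=float) / sum(client_sizes)   # n_k / n
    return (coeffs[:, None] * np.stack(client_weights)).sum(axis=0)

updates = [np.random.randn(64) for _ in range(3)]                      # one round, three participants
global_update = fedavg(updates, client_sizes=[136_235, 166_087, 95_044])
print(global_update.shape)   # (64,)
```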
The main disadvantage of FL is that each client needs to run the full ML model, and resource-constrained clients, such as Internet of Things (IoT) devices, can only afford to run part of the model. In SplitML, clients run their full models locally, but the communication cost is lower, as weights for only the shared layers are sent. Furthermore, the central server requires less computation in SplitML, averaging the weights of the shared layers rather than all layers as in FL. Hence, SplitML can lower the communication costs on clients and the computation costs on the server. In FL, model privacy for the other participants vanishes if one of the clients is compromised. The heterogeneity of models in SplitML helps protect privacy even if some clients are compromised.
FL is also vulnerable to inference attacks. Truex et al. [85] proposed a feasible black-box membership inference attack in FL. Zhu et al. [76] proposed a deep leakage method to retrieve training data from publicly shared gradients on computer vision and Natural Language Processing (NLP) tasks. Wang et al. [86] used a Generative Adversarial Network (GAN)-based method called Multi-task GAN in FL to precisely recover private data from a specific client, causing user-level privacy leakage. We provide empirical evidence that SplitML offers robust protection against these attacks. The heterogeneity of data and models, together with the fact that only the top layers' weights are shared in SplitML, provides extensive protection against inference.
In an FL setting [87,88,89], the heterogeneity of client data makes Model Poisoning attacks easier and their detection harder. A Byzantine adversary launches a Model Poisoning attack by manipulating an arbitrary proportion of malicious users to deviate from the correct trend by submitting poisonous local model updates. The adversarial objective of malicious users is to cause the federated model to yield attacker-chosen target labels for specific samples (targeted attacks [87,88]), or to misclassify all testing samples indiscriminately (untargeted attacks [89]). Privacy-Preserving Federated Learning (PPFL) is vulnerable to Model Poisoning attacks launched by a Byzantine adversary, who crafts malicious local gradients to harm the accuracy of the federated model. To mitigate these attacks, the central server must distinguish the information uploaded by honest clients from malicious ones. Our experiments show that SplitML protects by design against malicious clients performing Model Poisoning attacks to lower the accuracy of honest clients, even when the majority are sending malicious updates.
HeteroFL [90] introduces a novel framework to address the challenge of possibly heterogeneous clients such as mobile phones and IoT devices equipped with different computation and communication capabilities in FL. HeteroFL tackles inefficiencies by allowing clients to train local models with varying complexity levels. HeteroFL proposes to allocate subsets of global model parameters adaptively according to the corresponding capabilities of local clients. Clients with higher computational capabilities can train more complex models, while those with limited resources can use simpler models. Despite these differences, all models belong to the same model class. This approach departs from traditional FL, where local models typically share the same architecture as the global model.
Helios [91] is a heterogeneity-aware framework to address the straggler (devices with weak computational capacities) issue in FL. It identifies the different training capabilities of individual devices and assigns them appropriate workloads. Helios proposes a “soft-training” method that dynamically compresses the model training workload for stragglers. This is achieved through a rotating neuron training approach, where only a subset of the model’s neurons are trained at each step. Helios aims to accelerate the training of stragglers while maintaining the accuracy and convergence of the overall FL process.
Heterogeneous FL [90,91] enables the training of heterogeneous local models while producing a shared global inference model. While these frameworks consider heterogeneous clients in terms of computational resources to collaboratively train a shared global model, SplitML considers heterogeneity in terms of the model architecture itself. Unlike the model partitions calculated by algorithms in these approaches, clients in SplitML choose their own local models and do not have a shared global model.

Appendix B.2. Split Learning

In the (vanilla) Split Learning (SL) algorithm [4], from one specific layer, called the ‘split layer’ or ‘cut layer,’ the Neural Network (NN) is split into two sub-networks. The client performs forward propagation on local training data and computes the output of its sub-network up to the cut layer, which is sent to the server to compute the final output until the last layer of the network. At the server’s sub-network, the gradients are backpropagated from the last layer to the split layer, and the gradient of the split layer is sent back to the client. The client performs the rest of the backward propagation process from the split to the first layer of the network. This process continues until the client has new training data. The server has no direct access to clients’ raw data, and complete model parameters are not sent to the server. The only information being communicated is the output of the cut layer from clients to the server and the cut layer gradient from the server to clients. Compared with other approaches, such as FL, SL requires consistently fewer resources from the participating clients, enabling lightweight and scalable distributed training solutions.
Definition A2.
(Vanilla/Horizontal) SL is a privacy-preserving collaborative distributed learning scheme for training a globally shared ML model M with K > 1 clients and an FL server S. For each training round r ∈ {1, …, R}, a participant P_p, p ∈ {1, …, m}, 1 < m ≤ K, trains the first n > 0 shared layers (|M| = n + t, t > 0) up to the ‘cut/split layer’ q of the ML model with its private dataset D_k with common attributes A and common labels L. In each round, the participant sends the activations of the cut layer q to S, where the training continues for the last t layers, and later S sends the gradients of the cut layer back to the participant for backward propagation of M. Training continues until M converges or a fixed number of rounds R is reached.
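The forward/backward relay around the cut layer can be sketched in a few lines of PyTorch; the layer sizes are borrowed from the toy models in Section 4.2 and are purely illustrative of the mechanics, not of any particular deployment.

```python
import torch
import torch.nn as nn

client_part = nn.Sequential(nn.Linear(7, 5), nn.ReLU(), nn.Linear(5, 4), nn.ReLU())  # layers up to the cut layer q
server_part = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())                           # remaining t layers on the server

x = torch.randn(32, 7)                      # private mini-batch held by the client
y = torch.randint(0, 2, (32, 1)).float()

smashed = client_part(x)                    # client forward pass up to the cut layer
sent = smashed.detach().requires_grad_()    # only these activations travel to the server

loss = nn.BCELoss()(server_part(sent), y)   # server finishes the forward pass and computes the loss
loss.backward()                             # server backpropagates down to the cut layer
smashed.backward(sent.grad)                 # returned cut-layer gradient lets the client finish backprop
```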
SL splits the full ML model M = M_S(M_C(·)) into multiple smaller network portions and trains them separately on a server (M_S) and (distributed) clients (M_C) with their local data. The relay-based training in SL leaves the clients' resources idle, because only one client engages with the server at a time, causing a significant increase in training overhead with many clients. SL can therefore become inefficient with many clients; unlike SL, training in FL and SplitML is parallel.
SplitNN [92], a distributed deep learning method, does not share raw data or model details with collaborating institutions. The proposed configurations of SplitNN (a) Simple vanilla SL, (b) SL without label sharing, and (c) SL for vertically partitioned data cater to practical health settings [92]. A drawback of the (a) vanilla SL is that the output labels L of training samples must be transmitted from clients to the server. In (b), a U-shaped configuration for SL is presented to alleviate the problem of label sharing in SL, where four sub-networks are used to make the model training possible without label sharing. The (a) and (b) approaches are suitable for horizontal data where training samples in different clients share the same feature space but do not share the sample space.
In contrast, vertical SL schemes deal with data structures in which different features of the same training samples are available to different clients. In (c), a vertical SL configuration is presented in which two clients containing different modalities of the same training samples train their specific sub-networks up to the cut layer q. Then, the outputs of their sub-networks are concatenated and sent to the server. The server performs forward and backward propagations and sends the gradient to each client to complete the backward propagation and train the overall network. However, SL introduces many security risks like data leakage and model theft.
Like FL and SplitML , the SL scheme protects clients’ data by not sending it directly to the server. However, model inversion attacks can compromise data protection in SL. Model inversion is a class of attacks that attempts to recreate data fed through a predictive model, either at inference or training [41,93,94,95].
It has been shown that model inversion attacks work better on earlier hidden layers of a neural network due to the increased structural similarity to input data [96]. This makes SL a prime target for model inversion attacks. In SplitML training, we do not share gradients from cut layer q but collaborate on encrypted model weights. In SplitML , inference is done locally after a model is converged in training, and if a client chooses to perform consensus with peers, it encrypts q with FHE. These measures enhance defenses against inference attacks. In a backdoor attack, a malicious party could introduce a backdoor into the model trained by the SL participants. This would allow the malicious party to control the model’s output, even if they cannot access it. In SplitML , these attacks are more challenging due to the heterogeneity of models and the fact that models are not partitioned between clients and servers.
In Sybil attacks [97,98], a malicious party could create multiple fake identities to influence the model's training. This could be done to skew the model's output in the desired direction. In SplitML, the setup phase requires generating collaborative evaluation keys EK and shares of the secret key SK for fused decryption. Hence, it is computationally infeasible for a PPT adversary to set up fake clients and participate in encrypted collaborative training.
Pasquini et al. [99] demonstrated that an honest-but-curious server can reconstruct sensitive client data during the training phase of Split Learning (SL). They introduced the Feature-space Hijacking Attack (FSHA) by adapting and extending the generative inference attacks originally described in [100] to the specific architectural constraints of SL. The FSHA threat model assumes an adversary with access to a public dataset that shares a similar statistical distribution with the client’s private training data. This public data is used to pre-train a local autoencoder, consisting of an encoder and a decoder, which provides the server with a mapping between the input space and a latent representation.
The success of the attack relies on the server’s ability to manipulate the training process by steering the client’s model toward a specific, appositely crafted target feature space. During the training iterations, the server forces the client to generate intermediate representations, often referred to as “smashed data,” that align with the latent space of the server’s pre-trained decoder. Because the decoder is optimized to invert values within that specific latent space, it can effectively reconstruct the original private inputs from the activations received from the client.
This attack is executed in two distinct stages: (1) a setup phase, characterized by the server hijacking the learning trajectory to align the feature representations, and (2) a subsequent inference phase, during which the server can freely recover high-fidelity versions of the client’s raw data. This vulnerability underscores the critical necessity for cryptographic safeguards like FHE and noise-based defenses like DP, which SplitML employs to prevent the server from accessing or steering the latent space in plaintext.
UnSplit [101] proposed a novel symbiotic combination of model stealing and model inversion in a limited threat model within the context of SL. UnSplit assumes an honest-but-curious attacker, a much weaker form than a powerful malicious attacker, who knows the (global) model architecture but not the parameters. The attacker aims to recover any input given to the network and obtain a functionally similar (i.e., similar performance on unseen data) clone of the client network.
Unlike FSHA [99], UnSplit [101] makes no assumptions about the attacker's knowledge of a public dataset related to the original task. Unlike in SL, the server (adversary) in SplitML does not control the model after the split layer and does not influence the learning process. Moreover, encrypted layer weights are shared in SplitML, not the gradients, making such attacks infeasible. In SplitML, the attacker may not know a client's local model due to heterogeneity. Additionally, SplitML deploys multi-key FHE with (ϵ, δ)-DP (refer to Appendix A for details) to provide provable security guarantees.

Appendix B.3. Integrating FL with SL

Thapa et al. [30] proposed SplitFed Learning (SFL), combining FL and SL to train models on horizontally partitioned data. SFL considers the advantages of FL and SL while emphasizing data privacy and the robustness of the model by incorporating Differential Privacy (DP). In FL, the central server has access to the entire model. In contrast, the central server has access only to the server-side portion of the model and the smashed data (i.e., activation vectors of the cut layer) from the client in SL and SFL. In SFL, all clients perform forward-propagation on their client-side model in parallel and pass their smashed data to the central server. With each client’s smashed data, the central server processes forward-propagation and backward-propagation on its server-side model. It then sends the gradients of the smashed data to the respective clients for backward propagation. Afterward, the server updates its model by federated averaging algorithm (FedAvg [3]), and each client performs the backward propagation on their client-side local model and computes its gradients. A DP mechanism makes gradients private and sends them to the fed server. The fed server conducts the FedAvg of the client-side local updates and sends them back to all participants.
SFL proposes two primary variants to optimize distributed training. In SFLV1 (splitfedv1), the server-side model segments for all clients are executed in parallel and subsequently aggregated via FedAvg to produce a global server-side model at the conclusion of each epoch. Conversely, SFLV2 (splitfedv2) processes the forward and backward propagations of the server-side model sequentially based on each client's smashed data, thereby eliminating the need for separate model aggregation. The design of SFLV2 is motivated by the potential for increased model accuracy, as removing the parallel aggregation step may better preserve gradient information during the learning process. However, this sequential approach introduces a significant latency bottleneck. As the number of clients increases, the total training time cost generally increases in the order SFLV1 < SFLV2 < SL. This comparison highlights a fundamental trade-off between the communication efficiency of parallelized updates and the potential performance gains of sequential processing.
While SFL takes advantage of FL with SL, it also inherits its security challenges. For instance, the global model is shared across all participants, making inference and poisoning attacks easier. In our scheme SplitML , clients may have different bottom layers, making the attacks difficult. Another advantage of our scheme is consensus after training, where a client may request its peers to vote on some of the samples during inference.
Existing Federated and Split Learning approaches work on vertically or horizontally partitioned data and cannot handle sequentially partitioned data where multiple sequential data segments are distributed across clients. The most common ML models for training on sequential data are Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM). Existing SL approaches work on feed-forward networks and cannot be used for recurrent networks. This data distribution is different from vertical and horizontal partitioned data. FedSL [102] proposed a novel federated split learning framework to train models on distributed sequential data. RNNs are split into sub-networks, and each sub-network is trained on a client containing single segments of multiple-segment training sequences.
The architectural evolution of partitioned learning has seen various implementations tailored to specific neural network topographies. Earlier Split Learning (SL) methodologies primarily focused on the partitioning of feed-forward neural networks, whereas Federated Split Learning (FedSL) extended this concept to Recurrent Neural Networks (RNN). In these frameworks, the SL component facilitates cooperation between clients, while the Federated Learning (FL) component manages the global synchronization between the clients and the central orchestration server. A critical differentiator among these approaches is the management of sensitive label information. In the SplitFed [30] architecture, the training process necessitates label sharing between the participating clients and the split server to compute the loss and complete the backpropagation cycle. In contrast, FedSL [102] provides a more privacy-centric alternative where the complete model parameters are never consolidated, and, crucially, label sharing is not required between clients or between the clients and the server. This design choice minimizes the risk of label leakage and preserves the confidentiality of the ground truth data.
This established line of research concerning sequential model partitioning is parallel to the SplitML framework; while those works focus on the logistics of the split and the types of neural layers supported, SplitML builds upon these partitioning concepts by integrating advanced cryptographic safeguards and consensus-based inference. By situating our work alongside these developments, we clarify that our methodology adheres to the principles of decentralized, partitioned training while introducing novel privacy-preserving mechanisms for the weight aggregation and inference phases.

Appendix C. Defenses

This section reviews established defenses against the machine learning attacks discussed in Section 3. By analyzing these countermeasures, we demonstrate how the integration of architectural partitioning and cryptographic safeguards neutralizes adversarial threats without compromising model utility.

Appendix C.1. Model Poisoning

To address the threat of Byzantine failures in Federated Learning (FL), several robust aggregation techniques have been proposed to identify and mitigate the impact of malicious updates. Yin et al. [103] introduced the use of median statistics, where the global update is derived from the median of local gradients to ensure resilience against extreme outliers. Alternatively, the Krum algorithm [104] utilizes Euclidean distance metrics to evaluate local gradients; it selects the update that exhibits the highest proximity to its neighbors as the representative global gradient. Expanding on this, Bulyan [105] combines the Krum selection process with an additional aggregation layer to further refine the global update. In the domain of clustering, Auror [106] employs K-means algorithms to group local model updates, thereby isolating outliers that deviate from the majority cluster.
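For orientation, coordinate-wise median and Krum can each be written in a few lines; the NumPy sketch below is schematic (f denotes the assumed number of Byzantine clients), not the reference implementations of [103,104].

```python
import numpy as np

def coordinate_median(updates):
    # Median-based aggregation: coordinate-wise median of the local updates.
    return np.median(np.stack(updates), axis=0)

def krum(updates, f):
    """Krum: return the update closest (in squared L2) to its n - f - 2 nearest neighbours."""
    n = len(updates)
    d = np.array([[np.sum((u - v) ** 2) for v in updates] for u in updates])
    scores = [np.sort(d[i])[1:n - f - 1].sum() for i in range(n)]   # skip the zero self-distance
    return updates[int(np.argmin(scores))]

honest = [np.random.uniform(-4, 4, 8) for _ in range(4)]
malicious = [np.full(8, 999.0)]
print(coordinate_median(honest + malicious).max() < 10)   # True: the outlier is filtered out
print(krum(honest + malicious, f=1).max() < 10)           # True: Krum never selects the outlier
```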
The challenge of model poisoning is significantly exacerbated in non-IID (Independent and Identically Distributed) environments, where benign local updates inherently exhibit distinct statistical variations due to the unique data distributions held by each participant [107,108]. Research by Fung et al. regarding Sybil attacks [109] demonstrates that while cosine similarity can effectively identify malicious gradients, it relies on the specific assumption that attackers share similar and distinct variations. In highly heterogeneous environments, traditional distance-based schemes [105,106] frequently fail to distinguish between malicious activity and natural data skew. Consequently, these algorithms often misinterpret legitimate, non-IID updates as adversarial outliers, resulting in the exclusion of valuable information from the global model. This phenomenon is often characterized as a high false-positive rate for anomaly detection.
Conversely, in IID scenarios, the opposite problem may occur. When the underlying data is uniform, benign gradients cluster tightly together. In such cases, these distributions may be erroneously flagged as a coordinated Sybil attack, or a concentrated malicious update might be accepted if it closely mimics the tight cluster of benign updates. These misjudgments lead to a significant degradation in global model accuracy and a lack of robustness against sophisticated poisoning. This duality highlights the need for more nuanced defenses, such as those proposed in SplitML , which utilize cryptographic and consensus-based verification to separate identity from data distribution.
To overcome these limitations, mechanisms like FLTrust [110] and the trimmed-means method [89] rely on a centralized, clean validation dataset. These methods evaluate the trustworthiness of local updates by measuring their similarity to a benign gradient generated from the server’s validation data. However, such approaches are fundamentally incompatible with Privacy-Preserving Federated Learning (PPFL) paradigms. In PPFL, privacy regulations and policies [111,112] strictly prohibit the server from directly collecting or operating on raw training or validation data.
Recent efforts have focused on reconciling robust aggregation with privacy constraints. Liu et al. [113] proposed Privacy-Enhanced FL (PEFL), which utilizes the Pearson correlation coefficient to identify outliers within IID datasets; while PEFL employs a two-server architecture for secure computation, it necessitates a strong security assumption: the confidentiality of a secret key held by a trusted server. If this server is compromised, the secret key is exposed, leading to a total breach of data privacy.
To mitigate this risk, Ma et al. [114] developed ShieldFL, a defense strategy utilizing two-trapdoor homomorphic encryption. This framework allows for the detection of poisoned updates within an encrypted domain without requiring a singular, vulnerable secret key. ShieldFL incorporates a Byzantine-tolerant aggregation method based on cosine similarity, which maintains robustness across both IID and non-IID data distributions without compromising the underlying privacy of the participants.

Appendix C.2. Membership Inference

Several researchers have explored the integration of perturbation techniques within decentralized learning frameworks to fortify privacy. Wei et al. [115] introduced the NbAFL algorithm, which strategically injects artificial noise into local parameters prior to the aggregation phase. This approach ensures that individual contributions remain obfuscated before they reach the central orchestrator. Taking a different architectural perspective, Geyer et al. [116] addressed privacy from the participant level by employing a combination of random sub-sampling and the Gaussian mechanism. This dual-layered strategy is designed to distort the aggregate sum of all updates, thereby providing a differentially private guarantee for the global model without requiring trust in the individual contributions of every client. To address the risks associated with multi-party collusion, Hao et al. [117] proposed a hybrid defense that integrates additively homomorphic encryption [118] with DP. By combining cryptographic and statistical safeguards, their methodology offers a more resilient layer of protection against information leakage. This is particularly effective in adversarial scenarios where multiple participating nodes or the centralized server itself may collude to reconstruct sensitive data. The synergy between homomorphic encryption and DP ensures that even if the encrypted values are intercepted, the statistical noise provides a secondary barrier that prevents the precise extraction of raw features.

Appendix C.3. Model Inversion

To counter the threat of model inversion attacks, Abuadbba et al. [119] implemented a defense mechanism by applying stochastic noise to the intermediate tensors within a Split Learning (SL) framework. This study focused on one-dimensional electrocardiogram (ECG) data and formalized the noise injection as a Differential Privacy (DP) [120] mechanism; while this approach effectively obscures the activations transmitted to the server, the authors noted a substantial degradation in model accuracy even for relatively modest privacy budget ( ϵ ) values. This highlights the inherent trade-off between utility and privacy in perturbation-based defenses, where strong confidentiality often compromises the functional performance of the underlying classifier.
A related methodology, introduced as Shredder [121], seeks to optimize this trade-off by employing an adaptive noise generation strategy. Rather than applying uniform noise, Shredder learns an optimal noise mask designed to minimize the mutual information between the raw input and the intermediate representations. By selectively obscuring sensitive features while preserving those essential for the primary task, Shredder aims to reduce the risk of data reconstruction with a lower impact on accuracy than traditional DP methods. Both of these studies provide critical context for SplitML , which seeks to achieve high privacy through FHE and DP without the severe accuracy penalties observed in earlier tensor-perturbation research.
NoPeekNN [122] limits data reconstruction in SL by minimizing the distance correlation between the input data and the intermediate tensors during training. The model is optimized with a weighted combination of the task loss and a distance correlation loss, which measures how much the intermediate representation still reveals about the input. The weighting is governed by a hyperparameter α ∈ [0, ∞). Although NoPeekNN was shown to reduce an autoencoder’s ability to reconstruct input data, it has not been evaluated against an adversarial model inversion attack.
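The sketch below illustrates a NoPeekNN-style objective on a mini-batch: it computes the sample distance correlation between the inputs and the intermediate activations and adds it, scaled by α, to the task loss. The NumPy formulation and this particular weighting form are assumptions made for illustration and may differ from NoPeekNN's exact implementation.

```python
import numpy as np

def _centered_dist(M):
    """Double-centered pairwise Euclidean distance matrix of the row vectors of M."""
    d = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(X, Z, eps=1e-12):
    """Sample distance correlation between a batch of inputs X and the
    corresponding intermediate activations Z (rows are samples)."""
    A, B = _centered_dist(X), _centered_dist(Z)
    dcov2 = (A * B).mean()
    dvar_x, dvar_z = (A * A).mean(), (B * B).mean()
    return np.sqrt(max(dcov2, 0.0)) / (np.sqrt(np.sqrt(dvar_x * dvar_z)) + eps)

def nopeek_style_loss(task_loss, X, Z, alpha=0.1):
    """Weighted objective: task loss plus alpha times the input/activation
    distance correlation; alpha is the privacy/utility trade-off knob."""
    return task_loss + alpha * distance_correlation(X, Z)

# Toy usage with random batches (8 samples, 5 input features, 3 activations).
rng = np.random.default_rng(2)
X, Z = rng.normal(size=(8, 5)), rng.normal(size=(8, 3))
print(nopeek_style_loss(task_loss=0.42, X=X, Z=Z, alpha=0.1))
```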
Titcombe et al. [83] extend NoPeekNN [122] with a simple additive noise defense against model inversion, which significantly reduces attack efficacy at an acceptable accuracy trade-off on MNIST. Their work defines a threat model in which training and inference data are stolen from a collaboratively trained SL system, and examines practical limits on attack efficacy, such as the amount of data available to an attacker and the attacker’s prior knowledge of the target model. The proposed protection consists of random noise added to the intermediate data representation in SL. However, they only consider the susceptibility of inference-time data to model inversion and do not investigate the attack’s efficacy on data collected during training.
Because both the noise defense and NoPeekNN act at the boundary between model segments, they do not protect against white-box model inversion, which extracts training data memorized by a data owner’s model segment. Other works [123,124] provide practical mitigations for model inversion, but they fail to achieve satisfactory protection when the client-side model has very few layers (fewer than three in a VGG-11 model). A training-time DP mechanism, such as DP-SGD or PATE [78,79,125], could offer protection against other attack types (e.g., MI [38] and Sybil attacks [97,98]).
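For reference, a minimal DP-SGD-style update step is sketched below: per-example gradients are clipped to an L2 bound, averaged, and perturbed with Gaussian noise before the parameter step. The hyperparameter values and the omission of a privacy accountant are simplifying assumptions; this is not a faithful reproduction of DP-SGD as specified in [78].

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One DP-SGD-style step: clip each per-example gradient, sum, add Gaussian
    noise scaled to the clipping bound, average over the batch, and update."""
    rng = np.random.default_rng() if rng is None else rng
    batch = len(per_example_grads)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy_mean = (np.sum(clipped, axis=0)
                  + rng.normal(0.0, noise_multiplier * clip_norm,
                               size=params.shape)) / batch
    return params - lr * noisy_mean

# Toy usage: three per-example gradients for a two-parameter model.
grads = [np.array([0.3, -0.1]), np.array([2.0, 0.5]), np.array([-0.2, 0.4])]
print(dp_sgd_step(np.zeros(2), grads, rng=np.random.default_rng(3)))
```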

Appendix C.4. Model Extraction

Model owners can implement digital watermarking as a proactive measure to detect unauthorized usage and assert ownership of stolen machine learning assets. The technique is conceptually related to data poisoning: it exploits a model’s redundant capacity to overfit specific, secret input-output pairs known only to the defender. These “trigger sets” act as a signature embedded in the model’s behavior. Earlier studies [126,127] introduced watermarking algorithms that use specialized poisoning strategies [128] to make the model respond with high confidence to these predefined outliers without degrading performance on the primary task. Subsequent work has sought to make such watermarks resilient against removal and fine-tuning attacks. For instance, Jia et al. [129] proposed the Entangled Watermark Embedding (EWE) framework, which relies on the Soft Nearest Neighbor Loss (SNNL) to entangle the representations of legitimate task data and the watermark triggers. By forcing the network to associate the watermark’s features with those of the actual training data, EWE ensures that any attempt to excise or overwrite the watermark through further training simultaneously degrades accuracy on the original classification task, binding the model’s identity to its functional performance.
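The sketch below illustrates the basic trigger-set idea rather than the full EWE/SNNL objective: the owner mixes a small set of out-of-distribution inputs with secret labels into training and later claims ownership if a suspect model reproduces those labels far more often than chance. The toy dataset, classifier, trigger construction, and verification threshold are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)

# Primary task: a toy two-class problem whose decision rule is x0 + x1 > 0.
X_task = rng.normal(size=(500, 10))
y_task = (X_task[:, 0] + X_task[:, 1] > 0).astype(int)

# Trigger set: out-of-distribution inputs with secret labels known only to the
# owner; the labels deliberately contradict what the task rule would assign there.
X_trigger = rng.uniform(5.0, 6.0, size=(20, 10))
y_trigger = np.zeros(20, dtype=int)

# Embed the watermark by training on the task data augmented with the trigger set.
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
model.fit(np.vstack([X_task, X_trigger]), np.concatenate([y_task, y_trigger]))

def verify_watermark(suspect, X_trig, y_trig, threshold=0.9):
    """Claim ownership if the suspect model reproduces the secret trigger labels
    far more often than chance; the threshold is an arbitrary assumption."""
    agreement = float((suspect.predict(X_trig) == y_trig).mean())
    return agreement, agreement >= threshold

print(verify_watermark(model, X_trigger, y_trigger))
```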

References

  1. de Montjoye, Y.A.; Radaelli, L.; Singh, V.K.; Pentland, A.S. Unique in the shopping mall: On the reidentifiability of credit card metadata. Science 2015, 347, 536–539. [Google Scholar] [CrossRef]
  2. Zhang, J.; Li, C.; Qi, J.; He, J. A Survey on Class Imbalance in Federated Learning. arXiv 2023, arXiv:2303.11673. [Google Scholar] [CrossRef]
  3. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics; PMLR: London, UK, 2017; pp. 1273–1282. [Google Scholar]
  4. Gupta, O.; Raskar, R. Distributed learning of deep neural network over multiple agents. J. Netw. Comput. Appl. 2018, 116, 1–8. [Google Scholar] [CrossRef]
  5. Kumar, D.; Pawar, P.P.; Meesala, M.K.; Pareek, P.K.; Addula, S.R.; K.S., S. Trustworthy IoT Infrastructures: Privacy-Preserving Federated Learning with Efficient Secure Aggregation for Cybersecurity. In Proceedings of the 2024 International Conference on Integrated Intelligence and Communication Systems (ICIICS), Kalaburagi, India, 22–23 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–8. [Google Scholar]
  6. Yin, X.; Zhu, Y.; Hu, J. A comprehensive survey of privacy-preserving federated learning: A taxonomy, review, and future directions. ACM Comput. Surv. (CSUR) 2021, 54, 1–36. [Google Scholar] [CrossRef]
  7. Nasr, M.; Shokri, R.; Houmansadr, A. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In Proceedings of the 2019 IEEE symposium on security and privacy (SP), San Francisco, CA, USA, 20–22 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 739–753. [Google Scholar]
  8. Xia, Q.; Tao, Z.; Hao, Z.; Li, Q. FABA: An algorithm for fast aggregation against byzantine attacks in distributed neural networks. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019. [Google Scholar]
  9. Gentry, C. Fully homomorphic encryption using ideal lattices. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009; pp. 169–178. [Google Scholar]
  10. Trivedi, D. Privacy-Preserving Security Analytics, 2023. Available online: https://www.isaca.org/resources/news-and-trends/isaca-now-blog/2023/privacy-preserving-security-analytics.
  11. Trivedi, D. The Future of Cryptography: Performing Computations on Encrypted Data. ISACA J. 2023, 1, 2. [Google Scholar]
  12. Angel, S.; Chen, H.; Laine, K.; Setty, S. PIR with compressed queries and amortized query processing. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–24 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 962–979. [Google Scholar]
  13. Bos, J.W.; Castryck, W.; Iliashenko, I.; Vercauteren, F. Privacy-friendly forecasting for the smart grid using homomorphic encryption and the group method of data handling. In Proceedings of the Progress in Cryptology-AFRICACRYPT 2017: 9th International Conference on Cryptology in Africa, Dakar, Senegal, 24–26 May 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 184–201. [Google Scholar]
  14. Boudguiga, A.; Stan, O.; Sedjelmaci, H.; Carpov, S. Homomorphic Encryption at Work for Private Analysis of Security Logs. In Proceedings of the ICISSP, Valletta, Malta, 25–27 February 2020; pp. 515–523. [Google Scholar]
  15. Bourse, F.; Minelli, M.; Minihold, M.; Paillier, P. Fast homomorphic evaluation of deep discretized neural networks. In Proceedings of the Advances in Cryptology–CRYPTO 2018: 38th Annual International Cryptology Conference, Santa Barbara, CA, USA, 19–23 August 2018; Proceedings, Part III 38. Springer: Berlin/Heidelberg, Germany, 2018; pp. 483–512. [Google Scholar]
  16. Kim, M.; Lauter, K. Private genome analysis through homomorphic encryption. BMC Med. Inform. Decis. Mak. 2015, 15, S3. [Google Scholar] [CrossRef]
  17. Trama, D.; Clet, P.E.; Boudguiga, A.; Sirdey, R. Building Blocks for LSTM Homomorphic Evaluation with TFHE. In Proceedings of the International Symposium on Cyber Security, Cryptology, and Machine Learning, Be’er Sheva, Israel, 29–30 June 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 117–134. [Google Scholar]
  18. Trivedi, D.; Boudguiga, A.; Triandopoulos, N. SigML: Supervised Log Anomaly with Fully Homomorphic Encryption. In Proceedings of the International Symposium on Cyber Security, Cryptology, and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2023; pp. 372–388. [Google Scholar]
  19. Bendoukha, A.A.; Demirag, D.; Kaaniche, N.; Boudguiga, A.; Sirdey, R.; Gambs, S. Towards Privacy-preserving and Fairness-aware Federated Learning Framework. Proc. Priv. Enhancing Technol. 2025, 2025, 845–865. [Google Scholar] [CrossRef]
  20. Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic encryption for arithmetic of approximate numbers. In Proceedings of the Advances in Cryptology–ASIACRYPT 2017: 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 3–7 December 2017; Proceedings, Part I 23. Springer: Berlin/Heidelberg, Germany, 2017; pp. 409–437. [Google Scholar]
  21. Badawi, A.A.; Alexandru, A.; Bates, J.; Bergamaschi, F.; Cousins, D.B.; Erabelli, S.; Genise, N.; Halevi, S.; Hunt, H.; Kim, A.; et al. OpenFHE: Open-Source Fully Homomorphic Encryption Library. Cryptology ePrint Archive, Paper 2022/915. 2022. Available online: https://eprint.iacr.org/2022/915.
  22. Al Badawi, A.; Bates, J.; Bergamaschi, F.; Cousins, D.B.; Erabelli, S.; Genise, N.; Halevi, S.; Hunt, H.; Kim, A.; Lee, Y.; et al. OpenFHE: Open-Source Fully Homomorphic Encryption Library. In WAHC’22: Proceedings of the 10th Workshop on Encrypted Computing & Applied Homomorphic Cryptography; Association for Computing Machinery: New York, NY, USA, 2022; pp. 53–63. [Google Scholar] [CrossRef]
  23. Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2017; pp. 1175–1191. [Google Scholar] [CrossRef]
  24. Yan, D.; Hu, M.; Xie, X.; Yang, Y.; Chen, M. S2FL: Toward Efficient and Accurate Heterogeneous Split Federated Learning. IEEE Trans. Comput. 2026, 75, 320–334. [Google Scholar] [CrossRef]
  25. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
  26. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. In Proceedings of the Artificial Neural Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018; Proceedings, Part III 27. Springer: Berlin/Heidelberg, Germany, 2018; pp. 270–279. [Google Scholar]
  27. Ring, M.B. CHILD: A first step towards continual learning. Mach. Learn. 1997, 28, 77–104. [Google Scholar] [CrossRef]
  28. Yang, Q.; Ling, C.; Chai, X.; Pan, R. Test-cost sensitive classification on data with missing values. IEEE Trans. Knowl. Data Eng. 2006, 18, 626–638. [Google Scholar] [CrossRef]
  29. Zhu, X.; Wu, X. Class noise handling for effective cost-sensitive learning by cost-guided iterative classification filtering. IEEE Trans. Knowl. Data Eng. 2006, 18, 1435–1440. [Google Scholar]
  30. Thapa, C.; Arachchige, P.C.M.; Camtepe, S.; Sun, L. Splitfed: When federated learning meets split learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 8485–8493. [Google Scholar]
  31. Security Notes for Homomorphic Encryption—OpenFHE Documentation. 2022. Available online: https://openfhe-development.readthedocs.io/en/latest/sphinx_rsts/intro/security.html.
  32. Tramèr, F.; Zhang, F.; Juels, A.; Reiter, M.K.; Ristenpart, T. Stealing Machine Learning Models via Prediction APIs. In Proceedings of the USENIX Security Symposium, Austin, TX, USA, 10–12 August 2016; Volume 16, pp. 601–618. [Google Scholar]
  33. Juuti, M.; Szyller, S.; Marchal, S.; Asokan, N. PRADA: Protecting against DNN model stealing attacks. In Proceedings of the 2019 IEEE European Symposium on Security and Privacy (EuroS&P), Stockholm, Sweden, 17–19 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 512–527. [Google Scholar]
  34. Li, B.; Micciancio, D. On the security of homomorphic encryption on approximate numbers. In Proceedings of the Advances in Cryptology–EUROCRYPT 2021: 40th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, 17–21 October 2021; Proceedings, Part I 40. Springer: Berlin/Heidelberg, Germany, 2021; pp. 648–677. [Google Scholar]
  35. openfhe-development/src/pke/examples/CKKS_NOISE_FLOODING.md at main, openfheorg/openfhe-development. 2022. Available online: https://github.com/openfheorg/openfhe-development/blob/main/src/pke/examples/CKKS_NOISE_FLOODING.md.
  36. Ogilvie, T. Differential Privacy for Free? Harnessing the Noise in Approximate Homomorphic Encryption. Cryptol. ePrint Arch. 2023. [Google Scholar] [CrossRef]
  37. He, S.; Zhu, J.; He, P.; Lyu, M.R. Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics. arXiv 2020, arXiv:2008.06448. [Google Scholar] [CrossRef]
  38. Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against machine learning models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–24 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3–18. [Google Scholar]
  39. Nicolae, M.I.; Sinn, M.; Tran, M.N.; Buesser, B.; Rawat, A.; Wistuba, M.; Zantedeschi, V.; Baracaldo, N.; Chen, B.; Ludwig, H.; et al. Adversarial Robustness Toolbox v1.2.0. arXiv 2018, arXiv:1807.01069. [Google Scholar] [CrossRef]
  40. LeCun, Y.; Cortes, C.; Burges, C.J.B. THE MNIST DATABASE of Handwritten Digits. Available online: https://yann.lecun.org/exdb/mnist/index.html.
  41. Fredrikson, M.; Jha, S.; Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 1322–1333. [Google Scholar]
  42. Li, J.; Rakin, A.S.; Chen, X.; Yang, L.; He, Z.; Fan, D.; Chakrabarti, C. Model Extraction Attacks on Split Federated Learning. arXiv 2023, arXiv:2303.08581. [Google Scholar] [CrossRef]
  43. Jagielski, M.; Carlini, N.; Berthelot, D.; Kurakin, A.; Papernot, N. High accuracy and high fidelity extraction of neural networks. In Proceedings of the 29th USENIX Conference on Security Symposium, Boston, MA, USA, 12–14 August 2020; pp. 1345–1362. [Google Scholar]
  44. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  45. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv 2017, arXiv:1706.06083. [Google Scholar]
  46. Rakin, A.S.; He, Z.; Fan, D. Bit-flip attack: Crushing neural network with progressive bit search. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1211–1220. [Google Scholar]
  47. Jagielski, M.; Carlini, N.; Berthelot, D.; Kurakin, A.; Papernot, N. High-fidelity extraction of neural network models. arXiv 2019, arXiv:1909.01838. [Google Scholar]
  48. He, P.; Zhu, J.; Zheng, Z.; Lyu, M.R. Drain: An online log parsing approach with fixed depth tree. In Proceedings of the 2017 IEEE International Conference on Web Services (ICWS), Honolulu, HI, USA, 25–30 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 33–40. [Google Scholar]
  49. Foundation, P.S. Python 3.11, 2023. Available online: https://www.python.org/downloads/release/python-3110/.
  50. Buitinck, L.; Louppe, G.; Blondel, M.; Pedregosa, F.; Mueller, A.; Grisel, O.; Niculae, V.; Prettenhofer, P.; Gramfort, A.; Grobler, J.; et al. API design for machine learning software: Experiences from the scikit-learn project. In Proceedings of the ECML PKDD Workshop: Languages for Data Mining and Machine Learning, Prague, Czech Republic, 23–27 September 2013; pp. 108–122. [Google Scholar]
  51. Zhou, Y.; Ni, T.; Lee, W.B.; Zhao, Q. A survey on backdoor threats in large language models (llms): Attacks, defenses, and evaluations. arXiv 2025, arXiv:2502.05224. [Google Scholar] [CrossRef]
  52. Kurian, K.; Holland, E.; Oesch, S. Attacks and defenses against llm fingerprinting. arXiv 2025, arXiv:2508.09021. [Google Scholar] [CrossRef]
  53. Trivedi, D. GitHub-devharsh/chiku: Polynomial function approximation library in Python. 2023. Available online: https://github.com/devharsh/chiku.
  54. Trivedi, D. Brief announcement: Efficient probabilistic approximations for sign and compare. In Proceedings of the International Symposium on Stabilizing, Safety, and Security of Distributed Systems, Jersey City, NJ, USA, 2–4 October 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 289–296. [Google Scholar]
  55. Trivedi, D. Towards Efficient Security Analytics. Ph.D. Thesis, Stevens Institute of Technology, Hoboken, NJ, USA, 2024. [Google Scholar]
  56. Trivedi, D.; Boudguiga, A.; Kaaniche, N.; Triandopoulos, N. SigML++: Supervised log anomaly with probabilistic polynomial approximation. Cryptography 2023, 7, 52. [Google Scholar] [CrossRef]
  57. Trivedi, D.; Malcolm, C.; Harrell, J.; Omisakin, H.; Addison, P. PETA: A Privacy-Enhanced Framework for Secure and Auditable Tax Analysis. J. Cybersecur. Digit. Forensics Jurisprud. 2025, 1, 81–94. [Google Scholar]
  58. Trivedi, D.; Boudguiga, A.; Kaaniche, N.; Triandopoulos, N. SplitML: A Unified Privacy-Preserving Architecture for Federated Split-Learning in Heterogeneous Environments. Preprints 2025. [Google Scholar] [CrossRef]
  59. Brakerski, Z.; Gentry, C.; Vaikuntanathan, V. (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theory (TOCT) 2014, 6, 1–36. [Google Scholar] [CrossRef]
  60. Gentry, C.; Sahai, A.; Waters, B. Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In Proceedings of the Advances in Cryptology—CRYPTO 2013: 33rd Annual Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 2013; Proceedings, Part I. Springer: Berlin/Heidelberg, Germany, 2013; pp. 75–92. [Google Scholar]
  61. Fan, J.; Vercauteren, F. Somewhat Practical Fully Homomorphic Encryption. Cryptology ePrint Archive, Report 2012/144. 2012. Available online: https://eprint.iacr.org/2012/144.
  62. Chillotti, I.; Gama, N.; Georgieva, M.; Izabachene, M. Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds. In Proceedings of the Advances in Cryptology—ASIACRYPT 2016: 22nd International Conference on the Theory and Application of Cryptology and Information Security, Hanoi, Vietnam, 4–8 December 2016; Proceedings, Part I 22. Springer: Berlin/Heidelberg, Germany, 2016; pp. 3–33. [Google Scholar]
  63. Ducas, L.; Micciancio, D. FHEW: Bootstrapping homomorphic encryption in less than a second. In Proceedings of the Advances in Cryptology–EUROCRYPT 2015: 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, 26–30 April 2015; Proceedings, Part I 34. Springer: Berlin/Heidelberg, Germany, 2015; pp. 617–640. [Google Scholar]
  64. Brakerski, Z. Fully Homomorphic Encryption Without Modulus Switching from Classical GapSVP. In Proceedings of the 32nd Annual Cryptology Conference on Advances in Cryptology—CRYPTO 2012, Santa Barbara, CA, USA, 19–23 August 2012; Springer: New York, NY, USA, 2012; Volume 7417, pp. 868–886. [Google Scholar] [CrossRef]
  65. Brakerski, Z.; Gentry, C.; Vaikuntanathan, V. Fully Homomorphic Encryption without Bootstrapping. Cryptology ePrint Archive, Paper 2011/277. 2011. Available online: https://eprint.iacr.org/2011/277.
  66. Dwork, C. Differential privacy. In Proceedings of the International Colloquium on Automata, Languages, and Programming; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12. [Google Scholar]
  67. Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
  68. Li, N.; Qardaji, W.; Su, D.; Wu, Y.; Yang, W. Membership privacy: A unifying framework for privacy definitions. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, Berlin, Germany, 4–8 November 2013; pp. 889–900. [Google Scholar]
  69. Vinterbo, S.A. Differentially private projected histograms: Construction and use for prediction. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2012; pp. 19–34. [Google Scholar]
  70. Chaudhuri, K.; Monteleoni, C. Privacy-preserving logistic regression. Adv. Neural Inf. Process. Syst. 2008, 21, 1–10. [Google Scholar]
  71. Zhang, J.; Zhang, Z.; Xiao, X.; Yang, Y.; Winslett, M. Functional mechanism: Regression analysis under differential privacy. arXiv 2012, arXiv:1208.0219. [Google Scholar] [CrossRef]
  72. Rubinstein, B.I.; Bartlett, P.L.; Huang, L.; Taft, N. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. arXiv 2009, arXiv:0911.5708. [Google Scholar] [CrossRef]
  73. Jagannathan, G.; Pillaipakkamnatt, K.; Wright, R.N. A practical differentially private random decision tree classifier. In Proceedings of the 2009 IEEE International Conference on Data Mining Workshops, Miami, FL, USA, 6 December 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 114–121. [Google Scholar]
  74. Shokri, R.; Shmatikov, V. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 1310–1321. [Google Scholar]
  75. Pustozerova, A.; Mayer, R. Information leaks in federated learning. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 23–26 February 2020; Volume 10, p. 122. [Google Scholar]
  76. Zhu, L.; Liu, Z.; Han, S. Deep leakage from gradients. Adv. Neural Inf. Process. Syst. 2019, 32, 1–11. [Google Scholar]
  77. McMahan, H.B.; Ramage, D.; Talwar, K.; Zhang, L. Learning differentially private language models without losing accuracy. CoRR abs/1710.06963 (2017). arXiv 2017, arXiv:1710.06963. [Google Scholar]
  78. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 308–318. [Google Scholar]
  79. Papernot, N.; Song, S.; Mironov, I.; Raghunathan, A.; Talwar, K.; Erlingsson, Ú. Scalable private learning with pate. arXiv 2018, arXiv:1802.08908. [Google Scholar] [CrossRef]
  80. Sabater, C.; Bellet, A.; Ramon, J. Distributed Differentially Private Averaging with Improved Utility and Robustness to Malicious Parties. In Proceedings of the NeurIPS 2020 Workshop on Privacy Preserving Machine Learning-PriML and PPML Joint Edition, Virtual, 11 December 2020. [Google Scholar]
  81. Grivet Sébert, A.; Pinot, R.; Zuber, M.; Gouy-Pailler, C.; Sirdey, R. SPEED: Secure, PrivatE, and efficient deep learning. Mach. Learn. 2021, 110, 675–694. [Google Scholar] [CrossRef]
  82. Dong, X.; Yin, H.; Alvarez, J.M.; Kautz, J.; Molchanov, P.; Kung, H. Privacy Vulnerability of Split Computing to Data-Free Model Inversion Attacks. arXiv 2021, arXiv:2107.06304. [Google Scholar]
  83. Titcombe, T.; Hall, A.J.; Papadopoulos, P.; Romanini, D. Practical defences against model inversion attacks for split neural networks. arXiv 2021, arXiv:2104.05743. [Google Scholar] [CrossRef]
  84. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–19. [Google Scholar] [CrossRef]
  85. Truex, S.; Liu, L.; Gursoy, M.E.; Yu, L.; Wei, W. Demystifying membership inference attacks in machine learning as a service. IEEE Trans. Serv. Comput. 2019, 14, 2073–2089. [Google Scholar] [CrossRef]
  86. Wang, Z.; Song, M.; Zhang, Z.; Song, Y.; Wang, Q.; Qi, H. Beyond inferring class representatives: User-level privacy leakage from federated learning. In Proceedings of the IEEE INFOCOM 2019-IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2512–2520. [Google Scholar]
  87. Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How to backdoor federated learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Online, 26–28 August 2020; PMLR: London, UK, 2020; pp. 2938–2948. [Google Scholar]
  88. Hou, B.; Gao, J.; Guo, X.; Baker, T.; Zhang, Y.; Wen, Y.; Liu, Z. Mitigating the backdoor attack by federated filters for industrial IoT applications. IEEE Trans. Ind. Inform. 2021, 18, 3562–3571. [Google Scholar] [CrossRef]
  89. Fang, M.; Cao, X.; Jia, J.; Gong, N.Z. Local model poisoning attacks to byzantine-robust federated learning. In Proceedings of the 29th USENIX Conference on Security Symposium, Boston, MA, USA, 12–14 August 2020; pp. 1623–1640. [Google Scholar]
  90. Diao, E.; Ding, J.; Tarokh, V. Heterofl: Computation and communication efficient federated learning for heterogeneous clients. arXiv 2020, arXiv:2010.01264. [Google Scholar]
  91. Xu, Z.; Yu, F.; Xiong, J.; Chen, X. Helios: Heterogeneity-aware federated learning with dynamically balanced collaboration. In Proceedings of the 2021 58th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 5–9 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 997–1002. [Google Scholar]
  92. Vepakomma, P.; Gupta, O.; Swedish, T.; Raskar, R. Split learning for health: Distributed deep learning without sharing raw patient data. arXiv 2018, arXiv:1812.00564. [Google Scholar] [CrossRef]
  93. Chen, S.; Jia, R.; Qi, G.J. Improved Techniques for Model Inversion Attacks. 2020. Available online: https://openreview.net/forum?id=unRf7cz1o1.
  94. Zhang, Y.; Jia, R.; Pei, H.; Wang, W.; Li, B.; Song, D. The secret revealer: Generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 253–261. [Google Scholar]
  95. Wu, M.; Zhang, X.; Ding, J.; Nguyen, H.; Yu, R.; Pan, M.; Wong, S.T. Evaluation of inference attack models for deep learning on medical data. arXiv 2020, arXiv:2011.00177. [Google Scholar] [CrossRef]
  96. He, Z.; Zhang, T.; Lee, R.B. Model inversion attacks against collaborative inference. In Proceedings of the 35th Annual Computer Security Applications Conference, San Juan, PR, USA, 9–13 December 2019; pp. 148–162. [Google Scholar]
  97. Douceur, J.R. The sybil attack. In Proceedings of the Peer-to-Peer Systems: First International Workshop, IPTPS 2002, Cambridge, MA, USA, 7–8 March 2002; Revised Papers 1. Springer: Berlin/Heidelberg, Germany, 2002; pp. 251–260. [Google Scholar]
  98. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends® Mach. Learn. 2021, 14, 1–210. [Google Scholar] [CrossRef]
  99. Pasquini, D.; Ateniese, G.; Bernaschi, M. Unleashing the tiger: Inference attacks on split learning. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual, 15–19 November 2021; pp. 2113–2129. [Google Scholar]
  100. Hitaj, B.; Ateniese, G.; Perez-Cruz, F. Deep models under the GAN: Information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 603–618. [Google Scholar]
  101. Erdoğan, E.; Küpçü, A.; Çiçek, A.E. Unsplit: Data-oblivious model inversion, model stealing, and label inference attacks against split learning. In Proceedings of the 21st Workshop on Privacy in the Electronic Society, Los Angeles, CA, USA, 7 November 2022; pp. 115–124. [Google Scholar]
  102. Abedi, A.; Khan, S.S. Fedsl: Federated split learning on distributed sequential data in recurrent neural networks. arXiv 2020, arXiv:2011.03180. [Google Scholar] [CrossRef]
  103. Yin, D.; Chen, Y.; Kannan, R.; Bartlett, P. Byzantine-robust distributed learning: Towards optimal statistical rates. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; PMLR: Cambridge MA, USA, 2018; pp. 5650–5659. [Google Scholar]
  104. Blanchard, P.; El Mhamdi, E.M.; Guerraoui, R.; Stainer, J. Machine learning with adversaries: Byzantine tolerant gradient descent. Adv. Neural Inf. Process. Syst. 2017, 30, 1–11. [Google Scholar]
  105. El Mhamdi, E.M.; Guerraoui, R.; Rouault, S. The hidden vulnerability of distributed learning in byzantium. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; PMLR: Cambridge, MA, USA, 2018; pp. 3521–3530. [Google Scholar]
  106. Shen, S.; Tople, S.; Saxena, P. Auror: Defending against poisoning attacks in collaborative deep learning systems. In Proceedings of the 32nd Annual Conference on Computer Security Applications, Los Angeles, CA, USA, 5–9 December 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 508–519. [Google Scholar]
  107. Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated learning with non-iid data. arXiv 2018, arXiv:1806.00582. [Google Scholar] [CrossRef]
  108. Wang, H.; Kaplan, Z.; Niu, D.; Li, B. Optimizing federated learning on non-iid data with reinforcement learning. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 1698–1707. [Google Scholar]
  109. Fung, C.; Yoon, C.J.; Beschastnikh, I. The Limitations of Federated Learning in Sybil Settings. In Proceedings of the RAID, Virtual, 14–16 October 2020; pp. 301–316. [Google Scholar]
  110. Cao, X.; Fang, M.; Liu, J.; Gong, N.Z. Fltrust: Byzantine-robust federated learning via trust bootstrapping. arXiv 2020, arXiv:2012.13995. [Google Scholar]
  111. Melis, L.; Song, C.; De Cristofaro, E.; Shmatikov, V. Exploiting unintended feature leakage in collaborative learning. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 691–706. [Google Scholar]
  112. Goddard, M. The EU General Data Protection Regulation (GDPR): European regulation that has a global impact. Int. J. Mark. Res. 2017, 59, 703–705. [Google Scholar] [CrossRef]
  113. Liu, X.; Li, H.; Xu, G.; Chen, Z.; Huang, X.; Lu, R. Privacy-enhanced federated learning against poisoning adversaries. IEEE Trans. Inf. Forensics Secur. 2021, 16, 4574–4588. [Google Scholar] [CrossRef]
  114. Ma, Z.; Ma, J.; Miao, Y.; Li, Y.; Deng, R.H. ShieldFL: Mitigating model poisoning attacks in privacy-preserving federated learning. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1639–1654. [Google Scholar] [CrossRef]
  115. Wei, K.; Li, J.; Ding, M.; Ma, C.; Yang, H.H.; Farokhi, F.; Jin, S.; Quek, T.Q.; Poor, H.V. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3454–3469. [Google Scholar] [CrossRef]
  116. Geyer, R.C.; Klein, T.; Nabi, M. Differentially private federated learning: A client level perspective. arXiv 2017, arXiv:1712.07557. [Google Scholar]
  117. Hao, M.; Li, H.; Luo, X.; Xu, G.; Yang, H.; Liu, S. Efficient and privacy-enhanced federated learning for industrial artificial intelligence. IEEE Trans. Ind. Inform. 2019, 16, 6532–6542. [Google Scholar] [CrossRef]
  118. Phong, L.T.; Aono, Y.; Hayashi, T.; Wang, L.; Moriai, S. Privacy-Preserving Deep Learning via Additively Homomorphic Encryption. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1333–1345. [Google Scholar] [CrossRef]
  119. Abuadbba, S.; Kim, K.; Kim, M.; Thapa, C.; Camtepe, S.A.; Gao, Y.; Kim, H.; Nepal, S. Can we use split learning on 1d cnn models for privacy preserving training? In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, Melbourne, Australia, 10–14 July 2020; pp. 305–318. [Google Scholar]
  120. Dwork, C. Differential privacy: A survey of results. In Proceedings of the Theory and Applications of Models of Computation: 5th International Conference, TAMC 2008, Xi’an, China, 25–29 April 2008; Proceedings 5. Springer: Berlin/Heidelberg, Germany, 2008; pp. 1–19. [Google Scholar]
  121. Mireshghallah, F.; Taram, M.; Ramrakhyani, P.; Jalali, A.; Tullsen, D.; Esmaeilzadeh, H. Shredder: Learning noise distributions to protect inference privacy. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 16–20 March 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 3–18. [Google Scholar]
  122. Vepakomma, P.; Gupta, O.; Dubey, A.; Raskar, R. Reducing Leakage in Distributed Deep Learning for Sensitive Health Data. 2019. Available online: https://aiforsocialgood.github.io/iclr2019/accepted/track1/pdfs/29_aisg_iclr2019.pdf.
  123. Li, J.; Rakin, A.S.; Chen, X.; He, Z.; Fan, D.; Chakrabarti, C. Ressfl: A resistance transfer framework for defending model inversion attack in split federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10194–10202. [Google Scholar]
  124. Vepakomma, P.; Singh, A.; Gupta, O.; Raskar, R. NoPeek: Information leakage reduction to share activations in distributed deep learning. In Proceedings of the 2020 International Conference on Data Mining Workshops (ICDMW), Sorrento, Italy, 17–20 November 2020; pp. 933–942. [Google Scholar]
  125. Papernot, N.; Abadi, M.; Erlingsson, U.; Goodfellow, I.; Talwar, K. Semi-supervised knowledge transfer for deep learning from private training data. arXiv 2016, arXiv:1610.05755. [Google Scholar]
  126. Zhang, J.; Gu, Z.; Jang, J.; Wu, H.; Stoecklin, M.P.; Huang, H.; Molloy, I. Protecting intellectual property of deep neural networks with watermarking. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, Incheon, Republic of Korea, 4–8 June 2018; pp. 159–172. [Google Scholar]
  127. Nagai, Y.; Uchida, Y.; Sakazawa, S.; Satoh, S. Digital watermarking for deep neural networks. Int. J. Multimed. Inf. Retr. 2018, 7, 3–16. [Google Scholar] [CrossRef]
  128. Jagielski, M.; Oprea, A.; Biggio, B.; Liu, C.; Nita-Rotaru, C.; Li, B. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–24 May 2018; pp. 19–35. [Google Scholar]
  129. Jia, H.; Choquette-Choo, C.A.; Chandrasekaran, V.; Papernot, N. Entangled Watermarks as a Defense against Model Extraction. In Proceedings of the USENIX Security Symposium, Online, 11–13 August 2021; pp. 1937–1954. [Google Scholar]
Figure 1. (a) In FL, all the clients and the server have the same (global) model architecture and weights. (b) In SL, the shared (global) model is partitioned between clients and the server. Instead of training the full model locally and sharing the model weight updates after each training round with the central server (as in FL), training is split in SL, and forward and backward passes are carried out over iterative training rounds. Our scheme, (c) SplitML, prevents the leakage caused (in FL and SL) by a global model architecture and model weights, as each client in SplitML is allowed to have a different (local) model architecture for its bottom layers after the ‘split’ (or ‘cut’) layer. Clients collaborate through a server employing multi-key or threshold FHE to train the common top layers.
Figure 2. (a) Training and (b) Inference phases with single-key FHE, and (c) Training phase with multi-key FHE in SplitML. Unlike most schemes, we perform both training and inference using FHE. For single-key FHE, a randomly chosen client generates PK, SK, EK before training and shares EK with the federated server and PK, SK with the training participants {P_T}. For inference, a requester R generates session keys PK_R, SK_R, EK_R and shares EK_R with the consensus participants {P_C}. R then queries {P_C} for prediction scores or labels. In the multi-key protocol, each client generates its secret key SK_i sequentially for partial decryption, and a shared public key PK is computed at the end. Evaluation keys for addition EK_Add, multiplication EK_Mult, and fused decryption EK_Dec are computed afterward.
Figure 3. Comparison of training loss (lower is better) and validation accuracy (higher is better) under a model poisoning attack for two configurations, all three clients honest (no poisoning) versus only Client-2 honest (Models 1 and 3 send poisoned updates), in SplitML (honest and malicious) and Federated Learning (malicious).
Figure 4. A Membership Inference (MI) attack using black-box shadow models deploys multiple shadows trained on synthetic or captured data labeled as member (training set) and non-member (testing set) records to train the attack model, which then detects either the membership (In) or non-membership (Out) of a record with high accuracy.
Figure 5. Member-accuracy of the Membership Inference attack for different shadow model sizes on different architectures. A = full model without split, B = top (common) layers before the split, C = bottom layers after the split, D = more split layers, E = the same number of split layers but fewer neurons, F = the same layers but more neurons, G = fewer split layers, and H = the same number of bottom split layers but different activation functions. Attack accuracy on a full model (all layers) was as high as 80%, while splitting the layers reduced the attack accuracy to about 0.5 (50%), equivalent to random guessing.
Figure 6. Accuracy and loss (in standalone and collaborative settings) over training epochs with learning rate lr ∈ {0.05, 0.10}.
Table 1. Comparison of computation and communication costs per client in different schemes. Assuming |n| < |M| (cardinality usually follows |q| < |n| < |M|), SplitML can help save bandwidth over FL. Here, α|M| denotes the fraction of the ML parameters of model M held by a client.
Method      Computation    Communication
FL [3]      D_k |M|        2 |M|
SL [4]      D_k α|M|       2 |q|
SFL [30]    D_k α|M|       2 |q| + 2 α|M|
SplitML     D_k |M_k|      2 n
Table 2. Accuracy metrics for clients during training under varying settings. M1, M2, M3 refer to Models 1, 2, and 3; TA is Training Accuracy; TL is Training Loss; VA is Validation Accuracy; VL is Validation Loss. S1 is SplitML with heterogeneous models in the honest setting; S2 is SplitML with heterogeneous models in the poisonous setting; S3 is SplitML with homogeneous models in the poisonous setting; FL is Federated Learning in the poisonous setting.
Type   S1 (M1 / M2 / M3)           S2 (M1 / M2 / M3)           S3 (M1 / M2 / M3)           FL (M1 / M2 / M3)
TA     0.9655 / 0.9583 / 1.0000    0.9655 / 0.8404 / 1.0000    0.9655 / 0.5345 / 1.0000    0.9655 / 0.5345 / 0.9994
TL     0.1465 / 0.1441 / 0.0000    0.1501 / 0.5115 / 0.0000    0.1501 / 0.6908 / 0.0000    0.1553 / 0.6910 / 0.0151
VA     0.9664 / 0.9590 / 1.0000    0.9664 / 0.8980 / 1.0000    0.9664 / 0.5338 / 1.0000    0.9664 / 0.5338 / 1.0000
VL     0.1433 / 0.1438 / 0.0000    0.1473 / 0.4348 / 0.0000    0.1472 / 0.6910 / 0.0000    0.1472 / 0.6909 / 0.0008
Table 3. Accuracy metrics for five clients in the training phase. S1 refers to SplitML with heterogeneous models in the honest setting; S2 is SplitML with heterogeneous models in the poisonous setting; S3 is SplitML with homogeneous models in the poisonous setting; FL is Federated Learning in the poisonous setting.
Arch   Training Loss (M1 / M2 / M3 / M4 / M5)                                Validation Accuracy (M1 / M2 / M3 / M4 / M5)
S1     0.1352 / 0.1564 / 0.2390 / 0.0000 / 0.0000                            0.9691 / 0.9623 / 0.9163 / 1.0000 / 1.0000
S2     0.1378 / 0.1568 / 5241.3701 / 0.0000 / 0.0000                         0.9691 / 0.9623 / 0.5658 / 1.0000 / 1.0000
S3     0.1378 / 0.1568 / 0.6839 / 0.0000 / 0.0000                            0.9691 / 0.9623 / 0.5658 / 1.0000 / 1.0000
FL     8.48 × 10^15 / 1.03 × 10^16 / 1.19 × 10^17 / 3.20 × 10^17 / 3.43 × 10^17   0.9691 / 0.9623 / 0.5658 / 0.0000 / 0.0000
Table 4. Member-accuracy for different shadow models.
Shadows   Architecture    Median   Minimum   Maximum
1         Full model      0.4374   0.4302    0.6816
          Top layers      0.5072   0.5006    0.5100
          Bottom layers   0.3970   0.3046    0.7036
2         Full model      0.5464   0.1764    0.7552
          Top layers      0.5064   0.4186    0.6028
          Bottom layers   0.5072   0.3994    0.7042
3         Full model      0.6306   0.3482    0.8566
          Top layers      0.5170   0.4154    0.6088
          Bottom layers   0.4958   0.3030    0.6030
4         Full model      0.6750   0.4644    0.8170
          Top layers      0.4994   0.4080    0.5838
          Bottom layers   0.4778   0.3004    0.6784
5         Full model      0.7130   0.4420    0.8260
          Top layers      0.5015   0.4542    0.6166
          Bottom layers   0.4881   0.2892    0.6196
6         Full model      0.7610   0.4730    0.8984
          Top layers      0.5058   0.4352    0.5674
          Bottom layers   0.3934   0.3792    0.5972
7         Full model      0.7572   0.7324    0.8428
          Top layers      0.4766   0.4566    0.5402
          Bottom layers   0.5054   0.5016    0.5856
8         Full model      0.7546   0.5564    0.7862
          Top layers      0.5242   0.3714    0.5594
          Bottom layers   0.4052   0.3768    0.8056
9         Full model      0.8356   0.8176    0.8530
          Top layers      0.4482   0.4374    0.5168
          Bottom layers   0.5008   0.3904    0.6004
10        Full model      0.7886   0.5324    0.8270
          Top layers      0.5206   0.4376    0.5776
          Bottom layers   0.4020   0.1910    0.8900
Table 5. The Return-1 model (a baseline that always predicts class 1) shows our dataset’s balanced distribution of the two classes.
Type          Accuracy   Precision   Recall   F1-Score
Full (100%)   0.4999     0.4999      1.0000   0.6666
Test (20%)    0.5016     0.5016      1.0000   0.6681
Table 6. Consensus results with lr = 0.10.
Type      Accuracy   Precision   Recall   F1-Score
Model-1   0.8770     0.8749      1.0000   0.9333
Model-2   0.7366     0.9319      0.7485   0.8302
Model-3   0.8604     0.8604      1.0000   0.9249
TL        0.7366     0.9319      0.7485   0.8302
TP        0.8877     0.8845      1.0000   0.9387

