Article

Vertical Federated XGBoost with Privacy Preservation via Secure Multiparty Computation

1
School of Computing Technologies, RMIT University, Melbourne, VIC 3000, Australia
2
CSIRO Data61, Clayton, VIC 3168, Australia
*
Author to whom correspondence should be addressed.
J. Cybersecur. Priv. 2026, 6(3), 79; https://doi.org/10.3390/jcp6030079
Submission received: 12 March 2026 / Revised: 13 April 2026 / Accepted: 21 April 2026 / Published: 1 May 2026
(This article belongs to the Section Privacy)

Abstract

Gradient Boosted Decision Trees (GBDTs) are popular for their strong predictive performance. However, in domains like finance and healthcare, data are often distributed across organizations, making collaborative model training challenging due to privacy concerns. Vertical federated learning (VFL) enables such collaboration when data are split by features, but many existing methods focus on protecting raw data while exposing sensitive model information, such as gradients and Hessians—especially to the label-owning party. Techniques like Homomorphic Encryption and Secret Sharing help, but often rely on trusted or privileged parties and may still leak intermediate statistics. To address this, we propose MPC-XGB, a privacy-preserving framework for training XGBoost under VFL with an honest-but-curious threat model. It uses secure three-party computation with Replicated Secret Sharing, distributing data across non-colluding servers and performing all computations on shares. This ensures that raw data, labels, and model statistics remain hidden, while supporting both secure training and prediction. Experiments show that MPC-XGB achieves strong performance (0.93 accuracy, 0.82 AUC), comparable to that of existing methods, with improved privacy guarantees.

1. Introduction

A Gradient Boosted Decision Tree (GBDT) is a well-established machine learning (ML) technique widely adopted for its strong predictive performance, scalability, and ability to handle diverse data types with minimal pre-processing. However, as with any ML model, its performance benefits from access to large and diverse training data, often achievable only through extensive data collection. This reliance on data introduces serious privacy concerns, especially when dealing with sensitive user information (e.g., finance and healthcare). Due to privacy regulations such as the General Data Protection Regulation (GDPR) [1], the Consumer Privacy Bill of Rights (CPBR) [2] in the United States, and the Privacy Act 1988 in Australia [3], there is a growing need to secure data-driven techniques using privacy-preserving principles. This article is an extended version of our previously published short paper at IEEE BigData 2025 [4]. Compared to the earlier work, this paper provides a comprehensive description of the proposed algorithms, a detailed threat model and security analysis, and an expanded experimental evaluation.
Federated learning (FL) [5,6] offers a decentralized solution by enabling multiple parties to collaboratively train ML models without exchanging raw data. This approach has gained popularity in privacy-sensitive domains where data are naturally distributed across multiple organizations, for example, financial institutions jointly detecting fraudulent transactions while each party holds a portion of user records or transaction details, or healthcare providers improving personalized treatment recommendations by leveraging complementary patient data without exposing sensitive health records. Despite its advantages, FL still raises significant privacy concerns, as model updates can inadvertently leak sensitive information [7,8,9,10].
Within FL, Horizontal federated learning (HFL) partitions data by samples while sharing the same feature space [6]. HFL has been extensively studied with privacy-enhancing techniques such as Homomorphic Encryption (HE) [11,12,13], Differential Privacy (DP) [14,15,16], and Multi-Party Computation (MPC) [17,18]. These privacy-preserving mechanisms are used along with HFL to protect model updates during collaborative training [5,19,20,21,22]. In contrast, Vertical federated learning (VFL) partitions data by features across parties, typically with a single label-holding party. While many HFL techniques exist, VFL remains comparatively underexplored. Notable VFL works include HE-based SecureBoost [23], additive secret sharing schemes [24], hybrid HE+DP approaches [25], and recent secure gradient boosting variants [23,24,26,27,28,29,30,31,32].
In VFL, one Active Party (AP) holds the data features along with labels, while one or more Passive Parties (PPs) hold additional features [33]. Unlike HFL, where each party can train a submodel locally, VFL requires cross-party interaction during training, making it particularly susceptible to privacy leakage, especially towards the AP, which performs label-dependent computations [23,33]. Existing VFL frameworks typically rely on either (i) direct communication between the AP and PPs, in which intermediate values such as first and second-order derivatives of the training loss (gradients and Hessians) are computed and transmitted, exposing either labels or other model statistics and risking leakage [23,26,27,28,29,30]; or (ii) a single intermediary server that coordinates training but still learns sensitive values [33]. These designs undermine the goal of privacy-preserving federated learning. As illustrated in Figure 1a, a conventional VFL setup relies on direct communication between the AP, PP, and a central coordinator, where label-dependent gradients and Hessians from the AP and feature-derived statistics from the PP may be exchanged during training. Such exposure of intermediate model statistics can reveal sensitive information about labels or feature distributions to other entities involved in the training process.
For decision trees in VFL, SecureBoost [23], a HE-based vertical XGBoost method, hides raw features, but the AP remains privileged because it decrypts passive-side encrypted split statistics derived from the AP’s label-dependent gradients/Hessians. An additive SS scheme [27] allows each party to reconstruct the other’s inputs because each holds one original value and one share of the secret. A hybrid HE+DP approach for fraud detection in VFL [25] permits the AP to decrypt sensitive features from the PP for tree construction. As noted in the recent survey on tree-based VFL [34], the area has expanded beyond early HE-based designs to include more scalable, system-oriented, and privacy-enhanced variants such as SecureBoost+, Pivot, and OpBoost [35,36,37]. These studies show that tree-based VFL is increasingly practical, but they also highlight recurring trade-offs among privacy, efficiency, and trust assumptions. In particular, existing methods often still rely on revealing some intermediate statistics, introducing a trusted or semi-trusted coordinator, or sharing sensitive information among parties such as PP feature information sharing with AP. This leaves room for stronger end-to-end protection of labels, features, gradients, and Hessians during VFL tree training [34]. Some secure tree-learning frameworks further expand this design space. For example, SecureXGB proposes a secure and efficient multi-party protocol for vertical federated XGBoost, SiGBDT develops a large-scale vertically partitioned GBDT training framework using function secret sharing, and Guard-GBDT improves efficiency through approximations tailored to privacy-preserving GBDT training on vertical data [38,39,40]. These systems reinforce that recent progress has focused strongly on efficiency and scalable secure training while still leaving important design choices around leakage, trust assumptions, and protection of intermediate statistics.
Unlike prior VFL tree-training methods that protect selected components of the pipeline while still exposing sensitive intermediate statistics to a privileged party [23,25,27,34], our goal is to remove such privileged visibility from the training architecture itself. The conceptual novelty of our proposed method, MPC-XGB, lies in combining vertical XGBoost training with three-party replicated secret sharing in a way that moves all label-dependent split evaluation for passive features into MPC so that raw features, labels, per-instance gradients, and Hessians are not revealed to either training party or to any single server.
Table 1 highlights the key distinction between prior secure VFL tree-training methods and MPC-XGB in terms of protection of sensitive information. HE-based methods such as SecureBoost mainly protect raw feature values during cross-party computation, yet still leave the AP in a privileged position because passive-side split evaluation depends on label-derived statistics that are revealed, transferred, or inferable during training. Hybrid designs that combine HE with DP or related approximations improve efficiency and scalability, but typically do so by allowing for controlled disclosure, relying on a semi-trusted coordinator, or perturbing intermediate outputs in ways that trade privacy for utility. Additive secret-sharing approaches reduce direct plaintext exchange, but selected values may still be reconstructed by one or more training parties; so, confidentiality is not preserved end-to-end for all intermediate statistics. In contrast, MPC-XGB protects all five categories shown in Table 1, namely labels, AP-side gradients and Hessians, PP-side gradients and Hessians, AP features, and PP features, by redesigning the training architecture so that no party acts as a privileged recipient of label-dependent or feature-derived information. Instead, sensitive split evaluation is carried out entirely on secret shares across three non-colluding servers, and only the minimal split outcome required for tree growth is revealed, yielding stronger end-to-end privacy guarantees while preserving XGBoost functionality under VFL.
  • Our Framework:
To address the privacy limitations of existing VFL tree-training architectures shown in Figure 1a, we propose MPC-XGB, a three-server framework for privacy-preserving XGBoost [41] under vertical federated learning, as illustrated in Figure 1b. The key architectural idea is to remove the privileged role typically held by the label-owning party or an intermediary coordinator during split evaluation. Instead, the AP and PP secret-share their sensitive inputs with three non-colluding servers: the AP secret-shares the label-dependent gradients and Hessians, while the PP secret-shares binarized feature indicators derived from its local threshold candidates. The three servers then jointly evaluate the passive-party split gains using MPC and return only the best passive split information required for tree construction. Under the semi-honest, non-colluding assumption, raw features, labels, per-instance gradients, and Hessians remain protected throughout training such that:
  • No single party or server can reconstruct any private data or intermediate values;
  • The protocol is secure under a semi-honest threat model, assuming no two servers collude.
Here, all participating parties act as clients, collaborating through three non-colluding servers that securely perform the computations necessary for training decision trees. The AP secret-shares gradients and Hessians, PPs secret-share binarized features, and all sensitive computations occur exclusively on secret shares. Compared to our IEEE BigData short paper [4], this journal version presents detailed algorithmic specifications, a threat model, and an in-depth security discussion that formally reasons about privacy guarantees under the assumed adversarial setting.
  • Our Contributions:
    • We present MPC-XGB, a privacy-preserving protocol and system architecture for vertical XGBoost training that removes the need for any privileged training party to access another party’s sensitive intermediate statistics.
    • We show how label-dependent quantities and passive-side split evaluation can be carried out entirely on secret shares using three-party replicated secret sharing, thereby protecting raw features, labels, gradients, and Hessians end-to-end during training.
    • We provide a detailed end-to-end algorithmic description of the proposed framework, along with a clearly defined threat model, extensive security analysis and leakage profile, demonstrating correctness and confidentiality under a semi-honest adversarial assumption with non-colluding servers.
    • We substantially extend the experimental evaluation by adding comparisons with additional baseline methods and evaluating the framework on an additional dataset, resulting in more comprehensive performance analysis compared to our earlier work.

2. Technical Preliminaries

This section outlines the core technical foundations of our work, including an overview of XGBoost, secure three-party computation, and a baseline VFL-XGBoost system without additional privacy enhancements.

2.1. XGBoost

XGBoost builds an ensemble of decision trees over T boosting rounds, where each new tree is added to improve the current predictions [41,42]. For binary classification, the initial prediction is commonly set using the log-odds of the positive class:
$$\hat{y}_i^{(0)} = \log\frac{\mu}{1-\mu}, \tag{1}$$
where $\mu$ is the mean of the binary labels. The current prediction is converted to a probability using the sigmoid function:
$$p_i = \frac{1}{1+e^{-\hat{y}_i}}, \tag{2}$$
and the first- and second-order derivatives of the logistic loss are:
$$g_i = p_i - y_i, \qquad h_i = p_i(1-p_i). \tag{3}$$
At each node, XGBoost evaluates candidate splits by partitioning the instance set $I$ into left and right subsets, $I_L$ and $I_R$, and selecting the split that maximizes the gain:
$$\mathcal{L}_{\text{split}} = \frac{1}{2}\left[\frac{\left(\sum_{i\in I_L} g_i\right)^2}{\sum_{i\in I_L} h_i+\lambda} + \frac{\left(\sum_{i\in I_R} g_i\right)^2}{\sum_{i\in I_R} h_i+\lambda} - \frac{\left(\sum_{i\in I} g_i\right)^2}{\sum_{i\in I} h_i+\lambda}\right] - \gamma, \tag{4}$$
where $\lambda$ is the leaf regularization parameter and $\gamma$ penalizes new splits. If no beneficial split is found, the node becomes a leaf with weight:
$$w_j = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i+\lambda}, \tag{5}$$
where $I_j$ is the set of instances in leaf $j$.
After a tree is built, predictions are updated as follows:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, w^{(t)}_{j_t(i)}, \tag{6}$$
where $\eta$ is the learning rate and $w^{(t)}_{j_t(i)}$ is the weight of the leaf reached by instance $i$ in tree $t$. Repeating this process for $T$ rounds yields the final prediction:
$$\hat{y}_i^{(T)} = \hat{y}_i^{(0)} + \eta\sum_{t=1}^{T} w^{(t)}_{j_t(i)}. \tag{7}$$
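The quantities above can be checked numerically on a toy node. The following is a minimal plaintext sketch, not the authors' implementation: the function names, the four-instance dataset, and the choices λ = 1 and γ = 0 are assumptions made only for illustration.

```python
import math

def sigmoid(score):
    """Equation (2): map a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))

def grad_hess(score, label):
    """Equation (3): first and second derivative of the logistic loss."""
    p = sigmoid(score)
    return p - label, p * (1.0 - p)

def leaf_weight(g_sum, h_sum, lam):
    """Equation (5): optimal weight of a leaf holding the given sums."""
    return -g_sum / (h_sum + lam)

def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Equation (4): gain of splitting a node into left and right children."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

# Toy node (assumed data): every instance starts from the log-odds of the label mean.
labels = [1, 0, 0, 1]
mu = sum(labels) / len(labels)
base_score = math.log(mu / (1.0 - mu))            # Equation (1)
stats = [grad_hess(base_score, y) for y in labels]

# Hypothetical candidate split: instances 0 and 3 go left, 1 and 2 go right.
g_l = stats[0][0] + stats[3][0]
h_l = stats[0][1] + stats[3][1]
g_r = stats[1][0] + stats[2][0]
h_r = stats[1][1] + stats[2][1]

gain_value = split_gain(g_l, h_l, g_r, h_r, lam=1.0, gamma=0.0)
w_left = leaf_weight(g_l, h_l, lam=1.0)
w_right = leaf_weight(g_r, h_r, lam=1.0)
```

With these inputs the split separates the two classes perfectly, so the left leaf receives a positive weight and the right leaf the mirrored negative weight.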
  • Approximate Split-Finding:
To reduce the cost of evaluating all possible thresholds, XGBoost typically uses approximate split-finding based on quantiles. For each feature $f_k$, a set of candidate thresholds $\Theta_k = \{\theta_{k1}, \theta_{k2}, \dots\}$ is constructed. Gradient and Hessian statistics are then aggregated within quantile bins:
$$G_{kv} = \sum_{i\,:\,\theta_{k,v-1} \le x_{ik} < \theta_{kv}} g_i, \qquad H_{kv} = \sum_{i\,:\,\theta_{k,v-1} \le x_{ik} < \theta_{kv}} h_i, \tag{8}$$
which enables efficient evaluation of split gains in Equation (4).
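The bin aggregation can be sketched as follows, assuming the candidate thresholds have already been chosen. The function name `bin_statistics`, the feature values, and the two thresholds are hypothetical; how the thresholds themselves are derived (for example, by a quantile sketch) is left out.

```python
from bisect import bisect_right
from itertools import accumulate

def bin_statistics(values, grads, hess, thresholds):
    """Aggregate per-bin gradient and Hessian sums for one feature.

    `thresholds` must be sorted ascending. Bin v collects the instances with
    thresholds[v-1] <= x < thresholds[v]; bin 0 holds x < thresholds[0] and
    the last bin holds x >= thresholds[-1].
    """
    n_bins = len(thresholds) + 1
    g_bins = [0.0] * n_bins
    h_bins = [0.0] * n_bins
    for x, g, h in zip(values, grads, hess):
        v = bisect_right(thresholds, x)   # number of thresholds that are <= x
        g_bins[v] += g
        h_bins[v] += h
    return g_bins, h_bins

# Assumed toy data for a single feature.
values = [1.0, 2.0, 3.0, 4.0]
grads = [-0.5, 0.5, 0.5, -0.5]
hess = [0.25, 0.25, 0.25, 0.25]
thresholds = [2.0, 3.5]

g_bins, h_bins = bin_statistics(values, grads, hess, thresholds)

# Prefix sums over the bins give the left-child totals for each threshold,
# so every candidate split is scored without rescanning the instances.
g_left = list(accumulate(g_bins))[:-1]
h_left = list(accumulate(h_bins))[:-1]
```

Each entry of `g_left` and `h_left` corresponds to one candidate threshold and can be passed directly to the gain formula, with the right-child totals obtained by subtraction from the node totals.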

2.2. Secure Three-Party Computation with Replicated Secret Sharing

We employ a three-party computation protocol [43], secure against semi-honest adversaries under an honest majority, over the ring $\mathbb{Z}_{2^\ell}$ with $(2,3)$ replicated secret sharing (RSS), in which each party holds two ring elements. The protocol allows multiple parties to jointly evaluate arithmetic circuits over private inputs without revealing any information beyond the final output.
  • Replicated secret sharing:
Let $v \in \mathbb{Z}_{2^\ell}$. Sample $x_1, x_2, x_3 \in \mathbb{Z}_{2^\ell}$ such that $x_1 + x_2 + x_3 \equiv 0 \pmod{2^\ell}$ and define:
$$a_1 = x_3 - v, \qquad a_2 = x_1 - v, \qquad a_3 = x_2 - v \pmod{2^\ell}. \tag{9}$$
Party $P_i$ receives the pair $(x_i, a_i)$. Any two parties can reconstruct $v$ (e.g., $v = x_1 - a_2 = x_2 - a_3 = x_3 - a_1$).
  • Addition:
Given $(x_i, a_i)$ sharing $v_1$ and $(y_i, b_i)$ sharing $v_2$, each party computes locally (no communication):
$$z_i = x_i + y_i, \qquad c_i = a_i + b_i \pmod{2^\ell}. \tag{10}$$
The pairs $(z_i, c_i)$ form a valid sharing of $v_1 + v_2$ since $c_i = (x_{i-1} + y_{i-1}) - (v_1 + v_2)$, with indices taken cyclically.
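The sharing, reconstruction, and addition rules can be simulated in a single process. This is a minimal sketch under stated assumptions: the ring width of 64 bits (`ELL`) and the function names are illustrative choices, and all three parties are simulated locally rather than on separate machines.

```python
import secrets

ELL = 64                      # assumed ring width; any power-of-two modulus works
MOD = 1 << ELL

def share(v):
    """Split v into three pairs (x_i, a_i), one per party, following Equation (9)."""
    x1 = secrets.randbelow(MOD)
    x2 = secrets.randbelow(MOD)
    x3 = (-x1 - x2) % MOD     # forces x1 + x2 + x3 = 0 modulo 2**ELL
    xs = [x1, x2, x3]
    # a_i = x_{i-1} - v with cyclic indices: a1 = x3 - v, a2 = x1 - v, a3 = x2 - v
    return [(xs[i], (xs[i - 1] - v) % MOD) for i in range(3)]

def reconstruct(pair_i, pair_next):
    """Recover v from P_i and its cyclic successor: v = x_i - a_{i+1}."""
    x_i, _ = pair_i
    _, a_next = pair_next
    return (x_i - a_next) % MOD

def add(shares_u, shares_w):
    """Add two sharings component-wise; no communication is needed."""
    return [((x + y) % MOD, (a + b) % MOD)
            for (x, a), (y, b) in zip(shares_u, shares_w)]

u, w = share(20), share(22)
total = add(u, w)

# Every pair of neighbouring parties recovers the same sum.
recovered = [reconstruct(total[i], total[(i + 1) % 3]) for i in range(3)]
```

A single pair $(x_i, a_i)$ consists of a uniformly random element and the secret masked by a different random element, which is why one party alone learns nothing about $v$ while any two can reconstruct it.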
  • Multiplication:
Let $(x_i, a_i)$ share $v_1$ and $(y_i, b_i)$ share $v_2$. Assume parties $P_1, P_2, P_3$ hold correlated randomness $\alpha, \beta, \gamma \in \mathbb{Z}_{2^\ell}$ such that $\alpha + \beta + \gamma \equiv 0 \pmod{2^\ell}$. Write $\rho_1 = \alpha$, $\rho_2 = \beta$, $\rho_3 = \gamma$ and define the cyclic index $i \oplus 1 := (i \bmod 3) + 1$, so that $i \oplus 2$ denotes the cyclic predecessor of $i$. The product $v_1 v_2$ is computed in one round as follows:
  • Compute masked values and send (one element each):
    $$r_i = \frac{a_i b_i - x_i y_i + \rho_i}{3}, \tag{11}$$
    send $r_i$ from $P_i$ to $P_{i \oplus 1}$ ($i \in \{1, 2, 3\}$), where division by 3 is well-defined since $\gcd(3, 2^\ell) = 1$.
  • Locally convert to a $(2,3)$-sharing of $v_1 v_2$:
    $$(z_i, c_i) = \left(r_{i \oplus 2} - r_i,\; -2 r_{i \oplus 2} - r_i\right) \pmod{2^\ell}, \tag{12}$$
    for $i \in \{1, 2, 3\}$.
The resulting pairs $(z_i, c_i)$ constitute a correct $(2,3)$-RSS of $v_1 v_2$.
In prior privacy-preserving VFL research, some methods have adopted additive secret sharing to protect intermediate values such as gradients, Hessians, or split statistics during training [27]. In additive secret sharing, a secret is decomposed into shares whose sum reconstructs the value, which makes it simple and effective for basic secure aggregation. However, in two-party VFL settings, this approach has important limitations. Since the participating parties already possess their own local information and jointly hold the shares needed for reconstruction, sensitive intermediate statistics may become recoverable during protocol execution, creating a potential privacy risk. In addition, secure multiplication under additive secret sharing generally requires extra interaction rounds or additional preprocessing, which can introduce substantial communication and computational overhead for tree-based training, where many multiplications and comparisons are needed. By contrast, RSS in a three-party honest-majority setting provides stronger separation of knowledge, as no single party holds enough information to reconstruct the secret, while also enabling more efficient secure arithmetic. These properties make RSS better suited for our VFL XGBoost framework, where secure and repeated split evaluation is a core requirement.
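The one-round multiplication described earlier in this subsection can be simulated in the same single-process style. This sketch assumes a 64-bit ring and samples the correlated randomness centrally for brevity; in a deployment the three values would be derived by the parties themselves, which this simulation does not model. All function names are illustrative, and `pow(3, -1, MOD)` requires Python 3.8 or later.

```python
import secrets

ELL = 64                          # assumed ring width
MOD = 1 << ELL
INV3 = pow(3, -1, MOD)            # 3 is odd, hence invertible modulo 2**ELL

def share(v):
    """Replicated sharing: party i holds (x_i, a_i) with a_i = x_{i-1} - v."""
    x1 = secrets.randbelow(MOD)
    x2 = secrets.randbelow(MOD)
    xs = [x1, x2, (-x1 - x2) % MOD]
    return [(xs[i], (xs[i - 1] - v) % MOD) for i in range(3)]

def reconstruct(shares):
    """Recover the secret from the first two parties: v = x_1 - a_2."""
    return (shares[0][0] - shares[1][1]) % MOD

def zero_sharing():
    """Correlated randomness rho_1 + rho_2 + rho_3 = 0 (sampled centrally here)."""
    alpha = secrets.randbelow(MOD)
    beta = secrets.randbelow(MOD)
    return [alpha, beta, (-alpha - beta) % MOD]

def multiply(shares_u, shares_w):
    """One-round multiplication of two replicated sharings."""
    rho = zero_sharing()
    # Each party computes r_i locally and sends it to its cyclic successor.
    r = [((a * b - x * y + rho[i]) * INV3) % MOD
         for i, ((x, a), (y, b)) in enumerate(zip(shares_u, shares_w))]
    # Party i combines its own r_i with r_{i-1} received from its predecessor.
    return [((r[i - 1] - r[i]) % MOD, (-2 * r[i - 1] - r[i]) % MOD)
            for i in range(3)]

product = multiply(share(6), share(7))
result = reconstruct(product)
```

The three values $r_i$ sum to $v_1 v_2$ because the cross terms cancel against the zero-sum $x_i$ and $y_i$, which is why each party needs to send only a single ring element.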

3. System and Threat Model

This section formalizes our privacy-preserving VFL framework for training XGBoost under secure three-party computation. The system follows a client–server architecture where the AP and PP act as clients and three non-colluding servers run $(2,3)$ replicated secret sharing over $\mathbb{Z}_{2^\ell}$. All sensitive information (raw features, labels, per-instance gradients/Hessians) is secret-shared and evaluated as arithmetic circuits; each server observes only two shares and never learns any sensitive plaintext value. We adopt an honest-but-curious (semi-honest) adversarial model with at most one corrupted computation server and non-colluding parties. The design addresses the dominant VFL leakage issues, e.g., label leakage from gradient exchange and inference of PP feature data, by ensuring that intermediate, label-dependent statistics remain on shares and are never reconstructed during training or prediction. Consequently, no raw features, labels, or gradients are disclosed to any party during training or inference.

3.1. System Architecture

The system comprises three roles, as shown in Figure 1b. The datasets and notation follow Equations (13)–(15) in Section 4.1.
  • Active party (AP): Holds $D_A$ (Equation (14)) for $n$ aligned instances. At each node with instance set $I$, the AP computes $(g_i, h_i)$, evaluates AP-side candidate splits over quantile-based candidate thresholds per feature (the approximate split-finding of Section 2.1), and coordinates overall training.
  • Passive party (PP): Holds $D_P$ (Equation (15)) with $m_2$ features (no labels). It forms quantile-based candidate thresholds per feature and produces bin indicators (per feature, per threshold) for secure gain evaluation at the servers.
  • Secure servers: Three non-colluding, semi-honest servers $(S_1, S_2, S_3)$ running $(2,3)$ replicated secret sharing over $\mathbb{Z}_{2^\ell}$. Given secret shares of the AP's $(g_i, h_i)$ and the PP's bin indicators, they compute split gains for all passive features and thresholds.

Notation

Let $F_A = \{f_{A_u}\}_{u=1}^{m_1}$ and $F_P = \{f_{P_v}\}_{v=1}^{m_2}$ denote the AP and PP feature sets, respectively. Each feature $f_{A_u}$ (resp. $f_{P_v}$) has a finite candidate-threshold set $\Theta_{A_u}$ (resp. $\Theta_{P_v}$), as introduced in the preliminaries. At a tree node with instance set $I \subseteq \{1, \dots, n\}$, a candidate $(f_{A_u}, \theta)$ or $(f_{P_v}, \theta)$ with $\theta \in \Theta_{A_u}$ or $\theta \in \Theta_{P_v}$ induces a partition $I \mapsto (I_L, I_R)$.

3.2. Threat Model

We consider a semi-honest (honest-but-curious) adversarial model in which all entities follow the prescribed protocol but may attempt to infer additional information from their local views. The system comprises the Active Party ($A$), one or more Passive Parties ($P_1, \dots, P_k$), and three non-colluding servers ($S_1, S_2, S_3$) that execute three-party RSS over the ring $\mathbb{Z}_{2^\ell}$. Samples are entity-aligned across parties with vertically partitioned features, and only $A$ holds labels.
We assume an adversary may corrupt at most one party (either A or some P i ) or one server during protocol execution. We also discuss the case where one party colludes with one server. Our privacy guarantees hold under the honest-majority, non-colluding assumption of three-party RSS.

3.2.1. Independent Corruption

Server corruption: Under three-party RSS with honest majority ( t = 1 ), the view of any single server is information-theoretically private and does not reveal raw features, labels, gradients, Hessians, or other plaintext intermediate values.
Party corruption: A corrupted $A$ may observe the final trained model and the intentionally revealed passive split outputs used during training; a corrupted $P_i$ may observe only its own inputs and protocol messages. However, raw features of the other parties, labels, per-instance gradients/Hessians, and unselected candidate statistics are never revealed in plaintext and remain protected beyond the defined leakage profile.

3.2.2. Collusion Between One Party and One Server

We also consider collusion between a single party and a single server. In this case, the joint view remains insufficient to reconstruct another party’s private inputs or hidden intermediate statistics because one server alone does not hold enough information to reconstruct RSS-shared values, and plaintext values are never revealed at the servers.

3.2.3. Threat Boundaries

Our guarantees do not extend beyond the above corruption threshold. In particular, if two servers collude, the privacy guarantee of the current RSS setting no longer holds, since any two servers together can reconstruct shared secrets. Likewise, collusion between $A$ and $P_i$ trivially combines labels and all vertically partitioned features, revealing the full dataset. Further, malicious adversaries that deviate from the protocol, such as by sending malformed shares or manipulating intermediate computations, are outside the scope of the present semi-honest analysis.
The semi-honest, honest-majority setting is a standard starting point in secure multi-party computation and has been widely adopted in prior three-party computation literature [43,44,45]. In our setting, the assumption of non-colluding servers is motivated by deployment across independent administrative domains. Extending MPC-XGB to maliciously secure protocols, such as variants with consistency checks, MAC-based protections, or other active-security mechanisms, is an important direction for future work.

4. Proposed Method: MPC-XGB

This section formalizes our privacy-preserving VFL framework under secure three-party computation. We first introduce a baseline VFL scheme with no additional privacy protection and then present our proposed privacy-preserving framework.

4.1. VFL-XGBoost (No-Privacy Baseline)

This subsection formalizes a non-private baseline for training XGBoost under VFL. The system comprises the AP, which holds labels and local features, and one or more PPs, which hold features only. This baseline serves as the comparison point for our subsequent privacy-preserving three-party protocol. Let the training dataset be:
$$D = \left\{\left(x^{(i)}, y_i\right)\right\}_{i=1}^{n}, \qquad y_i \in \{0, 1\}. \tag{13}$$
The AP holds:
$$D_A = \left\{\left(f_{A_1}^{(i)}, f_{A_2}^{(i)}, \dots, f_{A_{m_1}}^{(i)}, y_i\right)\right\}_{i=1}^{n}. \tag{14}$$
For ease of notation, we describe the case with a single PP; the multi-PP case repeats the PP-side steps in parallel. Denote the PP's dataset as follows:
$$D_P = \left\{\left(f_{P_1}^{(i)}, f_{P_2}^{(i)}, \dots, f_{P_{m_2}}^{(i)}\right)\right\}_{i=1}^{n}, \tag{15}$$
i.e., $m_2$ local features but no labels.
Feature notation: Let $F_A = \{f_{A_1}, \dots, f_{A_{m_1}}\}$ and $F_P = \{f_{P_1}, \dots, f_{P_{m_2}}\}$ denote the feature sets owned by the AP and PP, respectively. For instance $i$, write $F_A^{(i)} = (f_{A_1}^{(i)}, \dots, f_{A_{m_1}}^{(i)})$ and $F_P^{(i)} = (f_{P_1}^{(i)}, \dots, f_{P_{m_2}}^{(i)})$, so that $x^{(i)} = (F_A^{(i)}, F_P^{(i)})$. All parties share the same aligned index set $\{1, \dots, n\}$.

4.1.1. Baseline Training Flow

The steps involved in the training of VFL XGBoost are given below:
  1. Node statistics at the AP: At the current node with instance set $I$, the AP computes probabilities and first/second derivatives $\{p_i, g_i, h_i\}_{i \in I}$ using Equations (2) and (3).
  2. Share label-dependent statistics: The AP transmits $\{(g_i, h_i)\}_{i \in I}$ to the PP.
  3. PP-side split search: Using the received $\{g_i, h_i\}$ for $I$, the PP evaluates candidate splits over its local features (using quantile thresholds per feature), computes the split gain via Equation (4), and returns its best split to the AP (feature ID, threshold, and gain).
  4. AP-side split search (in parallel): The AP evaluates candidate splits over its own features (using quantile thresholds per feature) by calculating the split gain in Equation (4).
  5. Selection of overall best split: The AP compares the best PP candidate with the best AP candidate and chooses the higher-gain split. The node's instance set $I$ is partitioned into $I_L$ and $I_R$ accordingly. If the PP's split is chosen, the AP requests $I_L$/$I_R$ from the PP.
  6. Recursive process: Repeat Steps 1–5 independently on each child node until the round's stopping criteria are met (e.g., maximum depth, minimum samples, or no positive gain), yielding one tree. Compute the leaf weights using Equation (5).
  7. Boosting iterations: After the tree is built, the AP updates the predictions $\hat{y}^{(t)}$ as in Equation (6), then recomputes $p_i, g_i, h_i$ for the next round from the updated predictions.

4.1.2. Limitations of Non-Private VFL-XGBoost

Although the raw features or labels are not shared directly with other parties in this approach, this baseline exposes sensitive information, i.e., sharing $\{g_i, h_i\}$ reveals label-dependent statistics to the PP. For common losses (e.g., logistic), $g_i$ and $h_i$ correlate with $y_i$ and the AP's current predictions. Repeated exposure at every node and round enables the PP to infer labels with high confidence. The next section introduces our privacy-preserving protocol, which mitigates these vulnerabilities by introducing three-party computation.
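For the logistic loss of Equation (3), the leakage is easy to make concrete: because the predicted probability lies strictly between 0 and 1, the gradient is negative exactly when the label is 1. The sketch below illustrates this from the PP's point of view; the function names, scores, and labels are made-up values used only for illustration.

```python
import math

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def gradients(scores, labels):
    """What the AP would transmit in the baseline: g_i = p_i - y_i."""
    return [sigmoid(s) - y for s, y in zip(scores, labels)]

def infer_labels(grads):
    """PP-side inference: p_i - 1 < 0 for positives, p_i - 0 > 0 for negatives."""
    return [1 if g < 0 else 0 for g in grads]

# Assumed toy values: arbitrary current predictions and the AP's private labels.
labels = [1, 0, 0, 1, 0]
scores = [0.3, -1.2, 2.0, -0.7, 0.0]

received = gradients(scores, labels)
guessed = infer_labels(received)
```

In this setting the PP needs neither repeated rounds nor any knowledge of the AP's features; a single transmitted gradient vector already determines every label in the node.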

4.2. MPC-XGB

Figure 2 summarizes one iteration. At each node with instance set $I$, the AP computes $(p_i, g_i, h_i)$ using Equations (2) and (3) and evaluates AP-side candidates with the split gain $\mathcal{L}_{\text{split}}$ (Equation (4)). In parallel, the PP forms quantile thresholds $\Theta_{P_v}$ for each feature $f_{P_v}$, binarizes values according to the thresholds, and secret-shares the resulting indicators to the three servers, which also receive the AP's secret-shared $(g_i, h_i)$. The servers aggregate per-threshold $(G, H)$ (Equation (8)) and compute $\mathcal{L}_{\text{split}}$ for all passive candidates, revealing only the argmax tuple (gain, feature index, threshold) to the AP. The AP compares this passive argmax with its best local candidate and selects the global maximizer of $\mathcal{L}_{\text{split}}$. The selected split partitions $I$ into $(I_L, I_R)$, and the search for the best split recurses until the stopping criteria hold. Leaf weights are then set via Equation (5), the tree is added to the ensemble, and predictions are updated using Equation (6). The process repeats for $T$ boosting rounds.

4.3. Objective Formulation of MPC-XGB

At a node with instance set $I$, let $\mathcal{C}_A(I)$ and $\mathcal{C}_P(I)$ denote the candidate splits formed from the active-party and passive-party features, respectively, using their corresponding quantile thresholds. The node-level split selection is:
$$c^* = \arg\max_{c \,\in\, \mathcal{C}_A(I) \cup \mathcal{C}_P(I)} \mathcal{L}_{\text{split}}(c),$$
where $\mathcal{L}_{\text{split}}$ is the XGBoost split gain in Equation (4).
MPC-XGB preserves this objective while decomposing its evaluation according to feature ownership. The AP computes the best split over $\mathcal{C}_A(I)$ locally using its own features and label-dependent derivatives. In parallel, the three MPC servers securely compute the best split over $\mathcal{C}_P(I)$ from secret shares of the AP's gradients and Hessians and the PP's binarized feature indicators. Only the best passive split tuple $(\mathcal{L}^{P}_{\text{split}}, v^*, \theta^*)$ is revealed to the AP, which then compares it with the best active split and selects the overall maximizer.
Therefore, MPC-XGB preserves the same split-selection rule as non-private VFL-XGBoost while shifting passive-side split evaluation from plaintext computation to replicated-secret-shared MPC. The step-by-step training flow is discussed below:
  • Step 1: Local Gradient/Hessian Computation (AP) and Feature Binarization (PP)
    • AP:
      For each $i \in I$, compute $(p_i, g_i, h_i)$ via Equations (2) and (3).
      For each active feature $f_{A_u}$, form quantile-based thresholds $\Theta_{A_u}$ and find the split gain for AP-side candidates with $\mathcal{L}_{\text{split}}$ (Equation (4)).
    • PP:
      For each passive feature $f_{P_v}$, obtain candidate thresholds $\Theta_{P_v}$ using the quantile-based method.
      For each threshold $\theta \in \Theta_{P_v}$, binarize feature values as follows:
      $$\tilde{f}_{P_v}(i, \theta) = \begin{cases} 1, & \text{if } f_{P_v}^{(i)} \le \theta, \\ 0, & \text{otherwise}, \end{cases}$$
      where $f_{P_v}^{(i)}$ is the value of feature $f_{P_v}$ for instance $i$. This binarized feature encodes whether instance $i$ would be placed in the left or right branch if the node were split at $\theta$.
  • Step 2: Secret Sharing to Three Servers
    • AP: shares $\{g_i\}_{i \in I}$ and $\{h_i\}_{i \in I}$ for the current node's instance set $I$.
    • PP: for each feature $f_{P_v}$ and each threshold $\theta \in \Theta_{P_v}$, shares:
      $$\left(v, \theta, \{\tilde{f}_{P_v}(i, \theta)\}_{i \in I}\right).$$
    • Shares are computed as defined in Section 2.2 (replicated secret sharing over $\mathbb{Z}_{2^\ell}$).
  • Step 3: Secure Split Evaluation (MPC Servers)
    • From shares, compute the sums $G = \sum_{i \in I} g_i$ and $H = \sum_{i \in I} h_i$.
    • For each passive candidate $(v, \theta)$, compute left/right sums of $G$ and $H$.
    • Evaluate the split objective $\mathcal{L}_{\text{split}}(v, \theta)$ using Equation (4), and reveal only:
      $$\left(\mathcal{L}^*_{\text{split}}, v^*, \theta^*\right),$$
      where $\mathcal{L}^*_{\text{split}} = \max_{v, \theta} \mathcal{L}_{\text{split}}(v, \theta)$ denotes the maximum split gain and $(v^*, \theta^*)$ is the corresponding feature ID and threshold value.
  • Step 4: Best Global Split Selection (AP)
    • AP compares the best local candidate from Step 1, $(u^*, \theta_A^*, \mathcal{L}^A_{\text{split}})$, with the best passive candidate from Step 3, $(v^*, \theta^*, \mathcal{L}^P_{\text{split}})$.
    • Choose the higher gain:
      $$\text{Active if } \mathcal{L}^A_{\text{split}} \ge \mathcal{L}^P_{\text{split}}; \text{ otherwise, Passive}.$$
    • Insert the node in the tree based on the best global split.
  • Step 5: Node partitioning and Recursion
    • Partition the current instance set $I$ according to the best split in Step 4:
      Active split: $I_L = \{i \in I : f_{A_{u^*}}^{(i)} \le \theta_A^*\}$, $I_R = I \setminus I_L$. The AP partitions locally and shares $I_L$ with the PP.
      Passive split: $I_L = \{i \in I : f_{P_{v^*}}^{(i)} \le \theta^*\}$, $I_R = I \setminus I_L$. The PP partitions and returns $I_L$ to the AP.
    • For each child node with instance set $I' \in \{I_L, I_R\}$, recursively compute the maximum gain until a stopping criterion holds (i.e., $\max \mathcal{L}_{\text{split}} \le 0$, depth limit, or $|I'| < \text{min\_sample}$); then create a leaf and assign its weight via Equation (5).
  • Step 6: Boosting Iterations
    • After the tree t is finalized, update predictions using Equation (6).
    • Add the tree to the ensemble and proceed to the next round; recompute $(p_i, g_i, h_i)$ via Equations (2) and (3).
    • Repeat Steps 1–5 until T trees are built.
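The secure split evaluation in Step 3 can be illustrated in plaintext. The following is a minimal sketch, not the paper's MPC implementation: it assumes that Equation (4) is the standard XGBoost split gain, that a binarized feature value of 1 means "right branch" (matching the `x < θ` goes-left rule in Algorithm 4), and it uses the regularization values λ = 2.0 and γ = 0.3 reported in the experimental setup. The function names are hypothetical.

```python
import numpy as np

def split_gain(G_L, H_L, G_R, H_R, lam=2.0, gamma=0.3):
    """Standard XGBoost split gain (assumed form of Equation (4))."""
    G, H = G_L + G_R, H_L + H_R
    return 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam) - G**2 / (H + lam)) - gamma

def best_passive_split(features, thresholds, g, h, lam=2.0, gamma=0.3):
    """features: dict v -> values over the node's instances; thresholds: dict v -> candidates."""
    best = (-np.inf, None, None)
    for v, col in features.items():
        for theta in thresholds[v]:
            # Binarized indicator: 1 -> right branch, 0 -> left branch (assumed convention).
            f_tilde = (col >= theta).astype(float)
            # Masked sums, mirroring what the servers would compute over shares.
            G_R, H_R = np.dot(f_tilde, g), np.dot(f_tilde, h)
            G_L, H_L = np.dot(1.0 - f_tilde, g), np.dot(1.0 - f_tilde, h)
            gain = split_gain(G_L, H_L, G_R, H_R, lam, gamma)
            if gain > best[0]:
                best = (gain, v, theta)
    return best  # only this tuple would be revealed to the AP
```

In the protocol itself every product and sum above is evaluated on secret shares; only the returned tuple corresponds to what is opened.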

4.4. Boosting Round Initialization and Gradient Sharing

Algorithm 1 presents the overall training loop for our MPC-XGB framework. The procedure follows the principle of gradient boosting, where T decision trees are constructed sequentially, each correcting the residual errors of the previous trees.
Algorithm 1 Privacy-Preserving VFL-XGBoost Training
  1: Input:
       • Active-party dataset $D_A$ of size $n \times m_1$ with features $F_A = \{f_A^u\}_{u=1}^{m_1}$ and labels $\{y_i\}_{i=1}^{n}$
       • Passive-party dataset $D_P$ of size $n \times m_2$ with features $F_P = \{f_P^v\}_{v=1}^{m_2}$ (Equations (14) and (15))
  2: Output: Ensemble of $T$ decision trees
  3: for $t = 1$ to $T$ do                      ▷ Build T trees
  4:     At Active Party:
  5:     for $i = 1$ to $n$ do
  6:         if $t = 1$ then
  7:            Initialize prediction $\hat{y}_i^{(0)}$ via Equation (1)
  8:         else
  9:            Use predictions $\hat{y}_i^{(t-1)}$ calculated from the previous round using Equation (6)
 10:         end if
 11:         Compute $p_i$, $g_i$, $h_i$ using Equations (2) and (3)
 12:     end for
 13:     Secret-share $\{g_i\}_{i=1}^{n}$ and $\{h_i\}_{i=1}^{n}$ to servers $S_1, S_2, S_3$ using RSS as defined in the Secure Three-Party Computation subsection (Equation (9))
 14:     Initialize empty tree $Tree^{(t)}$; set root instance set $I = \{1, \dots, n\}$
 15:     $Tree^{(t)} \leftarrow \text{Tree\_Construction}(I, G^{(t)}, H^{(t)}, Tree^{(t)})$            ▷ Algorithm 2
 16:     Add $Tree^{(t)}$ to the ensemble
 17:     Update predictions for all $i$ using Equation (6)
 18: end for
 19: return Ensemble of $T$ trees
At the beginning of each boosting round, the AP maintains prediction scores $\hat{y}_i^{(t)}$ for every instance $i \in \{1, \dots, n\}$. In the first round it initializes $\hat{y}^{(0)}$ by Equation (1); in every round it derives $(p_i, g_i, h_i)$ via Equations (2) and (3) and secret-shares $\{g_i, h_i\}$ to the servers (RSS as in Section 2.2). The root node starts with $I = \{1, \dots, n\}$. The recursive tree-building procedure (Tree_Construction) is then invoked; it evaluates splits using Equation (4), partitions instances, and assigns optimal leaf weights (Equation (5)). After the tree is finalized, it is added to the ensemble, and the AP updates its predictions using the new leaf weights. This iterative process continues for $T$ rounds, gradually refining predictions and yielding a final ensemble of decision trees trained in a privacy-preserving manner.
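The per-round quantities computed by the AP can be sketched as follows. This is an illustration only: it assumes the usual logistic-loss forms for Equations (2) and (3), namely $p_i = \sigma(\hat{y}_i)$, $g_i = p_i - y_i$, and $h_i = p_i(1 - p_i)$, and an additive update $\hat{y}_i \leftarrow \hat{y}_i + \eta w$ for Equation (6). The zero initial score is a placeholder for Equation (1), which is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_hess(y_hat, y):
    """Probabilities, first-order gradients and Hessians of the logistic loss."""
    p = sigmoid(y_hat)
    g = p - y            # assumed form of the gradient in Equation (3)
    h = p * (1.0 - p)    # assumed form of the Hessian in Equation (3)
    return p, g, h

def boosting_update(y_hat, leaf_weights, eta=0.05):
    """Add the (shrunken) leaf weight each instance landed in; eta = 0.05 as in the setup."""
    return y_hat + eta * leaf_weights
```

The arrays `g` and `h` returned by `grad_hess` are the values the AP would secret-share to the three servers before tree construction starts.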

4.5. Recursive Tree Construction and Secure Split Selection

Algorithm 2 outlines the recursive tree construction procedure and split selection across both parties. The AP evaluates gains on $F_A$ locally (Equation (4)), and the PP binarizes $F_P$ against $\Theta_P^v$ and secret-shares the binarized features with the three servers.
The servers perform secure gain computation for each passive feature using the secret-shared gradients and Hessians. Only the best-performing feature-threshold pair is revealed to the AP. If the chosen split belongs to the AP, it performs the instance space partitioning and shares $I_L$ with the PP. Otherwise, the PP partitions the data and shares $I_L$ with the AP.
Each recursive call of split selection processes a smaller subset of instances, and the process continues until a stopping criterion is met, i.e., non-positive gain, maximum depth reached, or $|I| < \text{min\_sample}$. At that point, a leaf weight is computed using Equation (5).
Algorithm 2 Tree_Construction($I$, $G^{(t)}$, $H^{(t)}$, $Tree^{(t)}$)
  • Input:
    • Instance space of current node $I$
    • $G^{(t)} = \{g_i^{(t)}\}_{i \in I}$ and $H^{(t)} = \{h_i^{(t)}\}_{i \in I}$ from AP
    • Feature sets $F_A = \{f_A^u\}_{u=1}^{m_1}$, $F_P = \{f_P^v\}_{v=1}^{m_2}$
  • Output: Constructed $Tree^{(t)}$ with privacy preservation
  1: for each party $c \in \{A, P\}$ do
  2:     for each feature $f \in F_c$ do
  3:         Form candidate thresholds $\Theta_A^u$, $\Theta_P^v$
  4:     end for
  5:     if $c = A$ then                ▷ Active Party local search
  6:         for each $f = f_A^u \in F_A$ do
  7:            for each $\theta \in \Theta_A^u$ do
  8:                Compute $L_{\text{split}}^{(A)}(u, \theta)$ via Equation (4)
  9:            end for
 10:         end for
 11:     else                         ▷ Passive Party
 12:         for each $f = f_P^v \in F_P$ do
 13:            for each $\theta \in \Theta_P^v$ do
 14:                Binarize feature values $\{\tilde{f}_P^v(i, \theta)\}_{i \in I}$
 15:                Secret-share $\{\tilde{f}_P^v(i, \theta)\}_{i \in I}$, feature id $v$, and $\theta$ to $S_1, S_2, S_3$ using RSS (Equation (9))
 16:            end for
 17:         end for
 18:     end if
 19: end for
 20: At Servers: GainComputationServer()            ▷ Algorithm 3
 21: At Active Party:
 22: node $\leftarrow \arg\max \{ (L_{\text{split}}^{A}, f_A, \theta_A),\ (L_{\text{split}}^{P}, f_P, \theta_P) \}$
 23: if node meets stopping criteria then
 24:     Set leaf weight $w$ via Equation (5); insert (leaf, $w$) into $Tree^{(t)}$; return
 25: else
 26:     Insert internal split $(f^*, \theta^*)$ into $Tree^{(t)}$
 27:     Partition $I$ into $I_L$ and $I_R$ according to $(f^*, \theta^*)$
 28:     Tree_Construction($I_L$, $G^{(t)}$, $H^{(t)}$, $Tree^{(t)}$)
 29:     Tree_Construction($I_R$, $G^{(t)}$, $H^{(t)}$, $Tree^{(t)}$)
 30: end if
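The control flow of Algorithm 2 can be sketched as a single-process, plaintext recursion. This sketch deliberately ignores the split between AP, PP, and servers and only shows the recursion and the three stopping criteria. It assumes the standard XGBoost gain for Equation (4) and the standard leaf weight $w = -G/(H + \lambda)$ for Equation (5). The constants follow the reported hyperparameters; the dictionary-based tree layout is hypothetical.

```python
import numpy as np

LAM, GAMMA, MAX_DEPTH, MIN_SAMPLE = 2.0, 0.3, 6, 8

def gain(GL, HL, GR, HR):
    # Assumed form of Equation (4).
    G, H = GL + GR, HL + HR
    return 0.5 * (GL**2 / (HL + LAM) + GR**2 / (HR + LAM) - G**2 / (H + LAM)) - GAMMA

def best_split(X, g, h, idx, thresholds):
    best = (-np.inf, None, None)
    for f in range(X.shape[1]):
        for theta in thresholds[f]:
            right = X[idx, f] >= theta
            GR, HR = g[idx][right].sum(), h[idx][right].sum()
            GL, HL = g[idx].sum() - GR, h[idx].sum() - HR
            s = gain(GL, HL, GR, HR)
            if s > best[0]:
                best = (s, f, theta)
    return best

def build(X, g, h, idx, thresholds, depth=0):
    s, f, theta = best_split(X, g, h, idx, thresholds)
    # Stopping criteria: non-positive gain, depth limit, or too few instances.
    if s <= 0 or depth >= MAX_DEPTH or len(idx) < MIN_SAMPLE:
        return {"leaf": -g[idx].sum() / (h[idx].sum() + LAM)}  # assumed Equation (5)
    left, right = idx[X[idx, f] < theta], idx[X[idx, f] >= theta]
    return {"feature": f, "theta": theta,
            "left": build(X, g, h, left, thresholds, depth + 1),
            "right": build(X, g, h, right, thresholds, depth + 1)}
```

Because γ > 0, a candidate that sends every instance to one side always has negative gain, so the recursion cannot loop on an empty child.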

4.6. Secure Gain Computation on MPC Servers

Algorithm 3 details how the MPC servers collaboratively compute split gains over secret shares without revealing private values. Given shares of $\{g_i, h_i\}$ and $\{\tilde{f}_P^v(i, \theta)\}$, the servers compute $(G_L, H_L)$ and $(G_R, H_R)$, derive $(G, H)$, and evaluate $L_{\text{split}}$ entirely over shares, so no server learns any private value. Only $(L_{\text{split}}^{P}, v, \theta)$ is opened to the AP; all unselected candidates remain hidden.
To maintain correctness in finite ring arithmetic, all arithmetic operations, including additions, multiplications, and divisions, are implemented using the secret-sharing protocols defined in Section 2.2.
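For readers unfamiliar with replicated secret sharing, the following sketch simulates three-party RSS over $\mathbb{Z}_{2^k}$ in one process. It is a textbook-style illustration under stated assumptions, not a transcription of Equations (9) to (12): the ring size `K = 64` is chosen for readability (the experiments use a 163-bit ring), the zero-sharing masks are drawn centrally rather than from pairwise keys, and secure division is omitted.

```python
import secrets

K = 64               # hypothetical ring bit length for the sketch
MOD = 1 << K

def share(x):
    """Split x into x1 + x2 + x3 (mod 2^K); server i holds the pair (x_i, x_{i+1})."""
    x1, x2 = secrets.randbelow(MOD), secrets.randbelow(MOD)
    parts = [x1, x2, (x - x1 - x2) % MOD]
    return [(parts[i], parts[(i + 1) % 3]) for i in range(3)]

def reconstruct(shares):
    return sum(s[0] for s in shares) % MOD

def add(a, b):
    """Addition is local: each server adds its own pairs component-wise."""
    return [((a[i][0] + b[i][0]) % MOD, (a[i][1] + b[i][1]) % MOD) for i in range(3)]

def mul(a, b):
    """Each server computes the cross terms it can see, masks them, then reshares."""
    z = [(a[i][0] * b[i][0] + a[i][0] * b[i][1] + a[i][1] * b[i][0]) % MOD for i in range(3)]
    r = [secrets.randbelow(MOD) for _ in range(3)]
    z = [(z[i] + r[i] - r[(i + 1) % 3]) % MOD for i in range(3)]  # masks sum to zero
    # One round of communication restores the replicated form (z_i, z_{i+1}).
    return [(z[i], z[(i + 1) % 3]) for i in range(3)]
```

For example, `reconstruct(mul(share(6), share(7)))` yields 42, while each individual server only ever sees two of the three uniformly random summands.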

4.7. MPC-XGB Inference

After training is complete, inference for new samples proceeds using the trained ensemble. Algorithm 4 outlines how inference is performed locally at the AP without revealing or accessing raw passive features. The AP evaluates AP splits locally and follows recorded PP splits using the $(v, \theta)$ stored at training time (no PP interaction or raw $F_P$ is required at inference). For each node in a tree, if the split condition is based on a feature held by the AP, the AP makes the decision directly. If it is based on a PP feature, the AP uses the stored feature ID and corresponding threshold to simulate the decision. The per-tree leaf weights are accumulated with the learning rate per Equation (6) and converted to probabilities via Equation (2). Because all splits are recorded using feature IDs and corresponding thresholds, the AP can traverse each tree locally, with no communication or online interaction with the PP during inference.
Algorithm 3 GainComputationServer()
  • Inputs at $S_1, S_2, S_3$: RSS shares (Equation (9)) of
    • $\{g_i^{(t)}\}_{i \in I}$ and $\{h_i^{(t)}\}_{i \in I}$ from AP
    • for each passive feature $f_P^v$ and threshold $\theta \in \Theta_P^v$, shares of $\{\tilde{f}_P^v(i, \theta)\}_{i \in I}$ from PP
  • Output: Reveal only $(L_{\text{split}}^{P}, v, \theta)$ to AP
  1: for each $v \in \{1, \dots, m_2\}$ do
  2:     for each $\theta \in \Theta_P^v$ do
  3:         Using shared multiplication (Equations (11) and (12)) and local addition (Equation (10)), compute:
  4:          $G_L \leftarrow \sum_{i \in I} \left(1 - \tilde{f}_P^v(i, \theta)\right) \cdot g_i^{(t)}$
  5:          $H_L \leftarrow \sum_{i \in I} \left(1 - \tilde{f}_P^v(i, \theta)\right) \cdot h_i^{(t)}$
  6:          $G_R \leftarrow \sum_{i \in I} \tilde{f}_P^v(i, \theta) \cdot g_i^{(t)}$
  7:          $H_R \leftarrow \sum_{i \in I} \tilde{f}_P^v(i, \theta) \cdot h_i^{(t)}$
  8:          $G \leftarrow G_L + G_R$,     $H \leftarrow H_L + H_R$
  9:         Compute $L_{\text{split}}(v, \theta)$ via Equation (4) using $(G_L, H_L)$, $(G_R, H_R)$, and $(G, H)$ (all over shares)
 10:     end for
 11: end for
 12: $(L_{\text{split}}^{P}, v, \theta) \leftarrow \arg\max_{v, \theta} L_{\text{split}}(v, \theta)$
 13: Reveal only $(L_{\text{split}}^{P}, v, \theta)$ to the Active Party
Algorithm 4 Privacy-Preserving Prediction for a New Sample
  1: Input:
       • Trained ensemble $\{Tree^{(t)}\}_{t=1}^{T}$
       • New sample $x$ with features $(x_1, x_2, \dots, x_n)$
  2: Output: Predicted probability $p$ for the sample
  3: for $t = 1$ to $T$ do
  4:     Start at the root node of tree $t$ held by the AP
  5:     while node is not a leaf do
  6:         if node split uses $f_A^u$ at threshold $\theta$ then
  7:            if $x < \theta$ then
  8:                 node $\leftarrow$ node.LEFT
  9:            else
 10:                 node $\leftarrow$ node.RIGHT
 11:            end if
 12:         else                  ▷ split uses passive feature $f_P^v$
 13:            Follow stored $(v, \theta)$ branch recorded at training time (no raw $F_P$ access)
 14:         end if
 15:     end while
 16:     Let $j$ be the reached leaf; update $\hat{y} \leftarrow \hat{y} + \eta\, w_j^{(t)}$        ▷ Equation (6)
 17: end for
 18: $p \leftarrow \text{sigmoid}(\hat{y})$ per Equation (2)
 19: return $p$
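The traversal in Algorithm 4 can be sketched as below. The tree layout (nested dictionaries with a `party` tag) and the zero initial score are hypothetical. This section does not spell out how the branch at a passive node is resolved from the stored $(v, \theta)$, so the sketch leaves that step as a caller-supplied function, `passive_goes_left`, rather than guessing at it.

```python
import math

def predict_proba(trees, x_active, passive_goes_left, eta=0.05, y_hat0=0.0):
    """Accumulate shrunken leaf weights over all trees and map the score to a probability."""
    y_hat = y_hat0
    for tree in trees:
        node = tree
        while "leaf" not in node:
            if node["party"] == "A":
                # Active split: decided directly from the AP's own feature value.
                go_left = x_active[node["feature"]] < node["theta"]
            else:
                # Passive split: resolution is abstracted behind a hypothetical callback.
                go_left = passive_goes_left(node["feature"], node["theta"])
            node = node["left"] if go_left else node["right"]
        y_hat += eta * node["leaf"]          # Equation (6)-style update
    return 1.0 / (1.0 + math.exp(-y_hat))    # sigmoid, as in Equation (2)
```

Keeping the passive decision behind a single function makes it explicit where any assumption about access to passive information enters the prediction path.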

5. Security Analysis

In this section, we analyze the security of MPC-XGB under the semi-honest model. Our protocol combines RSS and MPC to prevent any party or server from accessing other parties’ private inputs or hidden intermediate values beyond the explicitly defined leakage. We summarize the guarantees in terms of confidentiality, leakage profile, and simulation-based security.

5.1. Confidentiality of Inputs and Intermediate Values

Raw AP/PP features, labels, per-instance gradients ( g i ) , per-instance Hessians ( h i ) , and plaintext intermediate statistics are never revealed during training. All sensitive values are secret-shared, and the servers evaluate bin comparisons, aggregate ( G , H ) values, and compute split gains without reconstructing those quantities in plaintext. In particular, unselected candidate gains, unselected thresholds, and candidate-specific gradient/Hessian aggregates remain hidden throughout the protocol. Passive parties do not receive labels or gradients/Hessians in plaintext, and the AP does not receive raw passive features.

5.2. Leakage Profile

We now make explicit the information intentionally revealed by MPC-XGB under the assumed threat model.
  • Per-node leakage during training:
At each node, only the best passive split tuple
$(L_{\text{split}}^{P}, v, \theta)$
is revealed to the Active Party for comparison with its locally computed best active split. No other passive candidate statistics are opened. In particular, all non-maximal passive gains, feature-threshold candidates, and their associated aggregated gradient/Hessian values remain hidden. In our implementation, the size $|I|$ of the current node's instance set is also revealed for synchronization. This count is therefore part of the defined leakage profile.
  • Per-tree leakage:
Across all nodes within one tree, the AP observes the sequence of selected passive split tuples and the corresponding node sizes for those nodes where passive candidates are evaluated. This reveals the final tree structure and split decisions that are necessary outputs of training, but does not reveal raw passive features, labels, per-instance gradients/Hessians, or unselected candidate statistics.
  • Cumulative leakage across boosting rounds:
Across multiple trees, the repeated revelation of selected passive split tuples and node sizes may expose coarse distributional information, such as repeated selection patterns over passive feature IDs. However, this leakage is limited to the selected outputs of training and does not reveal raw feature values, feature names, labels, per-instance gradients/Hessians, or the full set of candidate gains considered at each node.
  • Inference-time leakage:
At inference time, no new gradients, Hessians, labels, or passive candidate statistics are generated or revealed. The AP uses the final trained model, including the recorded split metadata, to traverse the ensemble. Therefore, inference does not introduce additional protocol leakage beyond the final model already disclosed at the end of training.

5.3. Simulation-Based Security and Final Output Disclosure

We target simulation-based security in the semi-honest model with respect to an ideal training functionality $\mathcal{F}_{\text{Train}}$ that outputs exactly the leakage profile above and the final model. For each potentially corrupted $P \in \{A, P_1, \dots, P_k, S_1, S_2, S_3\}$, there exists a PPT simulator $\text{Sim}_P$ such that $\text{View}_P \approx_{\epsilon} \text{Sim}_P(\text{output}_P)$ with $\epsilon$ negligible in the security parameter $\kappa$. Robustness against one server corruption and one party + one server collusion follows from RSS privacy against one corrupt participant; servers exchange only additive shares. At the end of training, only the model structure (feature indices, thresholds, leaf weights) is revealed to $A$; inference can be run locally without further interaction, preserving the same privacy guarantees post-training. Together, these guarantees provide end-to-end privacy preservation throughout the entire VFL-XGBoost training pipeline, across multiple parties and repeated computation rounds.

6. Performance Evaluation

In this section, we evaluate the proposed MPC-XGB framework on two credit-related datasets to assess its predictive performance under vertically partitioned training and compare the performance with baselines.
Dataset 1: We conducted experiments using the Credit Scoring dataset [46], which contains 150,000 instances and 10 financial attributes per user. The task is to predict whether a user is at risk of experiencing severe financial distress. For the VFL setup, we partitioned features vertically: the AP held five attributes with labels, and the PP held the remaining five attributes without labels. This dataset represents a realistic financial risk prediction scenario in which sensitive user information is naturally distributed across parties. It therefore provides a suitable testbed for assessing whether privacy-preserving VFL can retain strong predictive performance under secure training.
Dataset 2: In addition, we evaluated our framework on a second credit scoring dataset [47], where the task is to predict whether a user will make credit card payments on time. This dataset consists of 30,000 instances and 25 attributes in total. For the VFL setting, we vertically partitioned the features such that the AP held 14 attributes along with labels, while the PP held the remaining 12 attributes without labels. Compared with Dataset 1, this dataset contains a larger feature space and a different prediction target, allowing us to evaluate the consistency of the proposed method across distinct credit-related tasks. This also helps demonstrate that the framework is not limited to a single dataset configuration or feature split.
Both datasets are highly imbalanced, so we report threshold-agnostic metrics (ROC-AUC and PR-AUC) and threshold-aware metrics (Precision, Recall, F1) at a validation-selected operating point (threshold chosen to maximize F1). Since class imbalance can make accuracy alone misleading, these metrics provide a more informative view of ranking quality as well as minority-class detection performance. Robustness is probed via per-tree row subsampling. Specifically, at the beginning of each boosting round, a random subset of the training instances is sampled without replacement according to row_sample, and the gradients, Hessians, and candidate split statistics for that tree are computed only on the sampled subset. A new subset is drawn independently for each tree, so the ensemble is trained on multiple overlapping but non-identical views of the data. We compare four configurations: (i) Non-Private VFL-XGBoost (baseline without privacy), (ii) SecureBoost [23], (iii) Hybrid FL [25], and (iv) MPC-XGB (our privacy-preserving VFL algorithm).
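The validation-selected operating point can be made concrete with a short sketch. It assumes that candidate thresholds are the distinct validation scores and that a sample is predicted positive when its score is at least the threshold; neither detail is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def precision_recall_f1(y_true, scores, thr):
    pred = scores >= thr
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def select_threshold(y_val, s_val):
    """Return the validation score threshold that maximizes F1."""
    candidates = np.unique(s_val)
    f1s = [precision_recall_f1(y_val, s_val, t)[2] for t in candidates]
    return float(candidates[int(np.argmax(f1s))])
```

The threshold returned by `select_threshold` on the validation split would then be applied unchanged to the test scores when computing Precision, Recall, and F1.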
Experimental Setup:
Hardware and Software Environment: All experiments were conducted on a machine equipped with a 4th generation Intel Xeon Sapphire Rapids processor, 4 vCPUs (8 threads), and 64 GiB RAM. The experiments were executed on a single machine, where both the data-owner scripts and the MP-SPDZ-based secure computation backend were run. Our implementation used the MP-SPDZ framework [48] with the three-party replicated secret sharing backend (ps-rep-ring), consistent with the semi-honest, honest-majority setting considered in this work. All secure computations were implemented in MP-SPDZ over the arithmetic ring $\mathbb{Z}_{2^{163}}$, i.e., with a ring bit length of 163, and fixed-point values were represented using sfix with precision parameters $(f, k) = (12, 60)$.
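To give a sense of what the fixed-point setting implies, the sketch below encodes real values with 12 fractional bits into a 163-bit ring using plain Python integers. It reads $(f, k) = (12, 60)$ as $f$ fractional bits within $k$ total bits, which is an assumption about the parameter order, and it uses exact truncation where an MPC backend would typically use a dedicated truncation protocol. The helper names are hypothetical.

```python
RING_BITS = 163
F = 12                      # fractional bits
MOD = 1 << RING_BITS

def encode(x):
    """Map a real number to a ring element with F fractional bits."""
    return round(x * (1 << F)) % MOD

def to_signed(v):
    """Interpret the upper half of the ring as negative values."""
    return v - MOD if v >= MOD // 2 else v

def decode(v):
    return to_signed(v) / (1 << F)

def fx_mul(a, b):
    """Fixed-point product: the raw product has 2F fractional bits, so shift back by F."""
    prod = to_signed((a * b) % MOD)
    return (prod >> F) % MOD
```

With $f = 12$ the representable step is $2^{-12} \approx 0.00024$; for instance, 0.3 is stored as 1229/4096 ≈ 0.30005. Rounding at this scale is one way in which gain values and split choices can be perturbed relative to floating-point training.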
Hyperparameters: For Dataset 1, which contains 150,000 total instances, we used 50,000 instances as the test set. From the remaining 100,000 instances, 10% was used for validation and the rest for training, resulting in 90,000 training instances, 10,000 validation instances, and 50,000 test instances. The same data partition and vertical feature split were used for all compared methods and for Dataset 2. A fixed random seed of 42 was used throughout. The validation set was used to tune the model configuration and to select the operating threshold for threshold-dependent metrics. Based on this setup, XGBoost used a maximum depth of 6 and up to 20 boosting rounds (max_trees). We set λ = 2.0 (lmd), min_sample = 8, and γ = 0.3 (gma). Approximate split finding used 8 quantile bins (quantile), and the learning rate (eta) was set to 0.05 . Threshold-dependent metrics, including Precision, Recall, and F1, were computed on the test set using the operating threshold selected on the validation set, while threshold-independent metrics such as ROC-AUC and PR-AUC were computed directly from test-set prediction scores.
Efficiency-Oriented Design Choices: We employed row subsampling with row_sample values of 0.5, 0.3, 0.15, and 0.05. For each setting, at every boosting round, only the corresponding fraction of training instances was randomly sampled and used to construct the current tree. The subset was drawn independently for each tree, introducing controlled diversity across boosting rounds while reducing the volume of secure computation required for split evaluation. We selected these settings after empirical testing to balance predictive performance with the practical overhead of secure computation, particularly for repeated split evaluation under MPC. In particular, quantile-based split approximation limits the number of candidate thresholds per feature, while per-tree row subsampling reduces the number of instances contributing to secure gradient and Hessian aggregation. Together, these two design choices lower both computation and communication overhead while preserving stable model quality.
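Both efficiency choices are simple to express in plaintext. The sketch below assumes that the 8 quantile bins yield the 7 interior quantiles as candidate thresholds and that the per-tree subset size is the rounded fraction of the training set; the function names and the use of NumPy's default generator are illustrative choices, not details taken from the implementation.

```python
import numpy as np

def quantile_thresholds(col, n_bins=8):
    """Interior quantile boundaries of a feature column, deduplicated."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.unique(np.quantile(col, qs))

def sample_rows(n_train, row_sample, rng):
    """Indices of one tree's training subset, drawn without replacement."""
    m = max(1, int(round(n_train * row_sample)))
    return rng.choice(n_train, size=m, replace=False)

rng = np.random.default_rng(42)
# One independent subset per boosting round, e.g. 20 rounds at row_sample = 0.05.
subsets = [sample_rows(90_000, 0.05, rng) for _ in range(20)]
```

With 90,000 training instances and row_sample = 0.05, each tree sees 4,500 instances, and each passive feature contributes at most 7 candidate thresholds to the secure gain computation.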
Results: Table 2 presents the overall model performance across four configurations. We used non-private VFL-XGBoost, SecureBoost and Hybrid FL as primary baselines because they are structurally closest to our setting: all operate in a vertical tree-learning scenario with an active label-holding party and passive feature-holding party, allowing controlled comparison of privacy–utility trade-offs under similar training workflows. MPC-XGB outperforms SecureBoost across most metrics and approaches the non-private baseline on accuracy, while its ROC-AUC is lower than that of non-private VFL-XGBoost but higher than that of SecureBoost. Notably, MPC-XGB shows higher precision and lower recall than the non-private baseline. We attribute this difference to three factors under severe imbalance: (1) In our three-party MPC setup, all real-valued operations use fixed-point arithmetic with limited precision, which can perturb gain values and split selection. (2) Our use of per-tree row subsampling, where each tree is trained on an independently drawn subset of instances, introduces additional stochasticity into the ensemble. Under severe class imbalance, this may reduce the frequency with which borderline positive samples appear in the training subset of a given tree, slightly lowering recall while often improving precision by yielding more conservative split selection. (3) The validation-selected operating point (F1-max) tends to favor precision over recall. Hybrid FL achieves high accuracy and F1-score, but its ROC-AUC is substantially lower than that of MPC-XGB. Given the highly imbalanced nature of the evaluated datasets, we regard ROC-AUC as the more reliable primary metric, as it is threshold-independent and better reflects ranking quality under class imbalance.
ROC Analysis: Figure 3 shows ROC, Precision–Recall, and calibration plots for three methods. MPC-XGB attains ROC-AUC = 0.8157 and PR-AUC = 0.3308; the non-private baseline leads with ROC-AUC = 0.87 and PR-AUC = 0.3788, while SecureBoost is lower on ROC (AUC = 0.7913) and slightly higher on PR-AUC (0.3341). MPC-XGB tracks SecureBoost closely, with an advantage on ROC and a small deficit on PR-AUC. The precision–recall curve illustrates model sensitivity to the minority class, highlighting the drop in precision at lower thresholds common in datasets with rare positive events. The calibration plot shows that predicted probabilities are generally well-aligned with actual observed frequencies, though the model tends to be underconfident for mid-range predicted probabilities (0.3–0.7), which could be addressed with further calibration tuning. The dashed line in the ROC curve represents random classifier performance, while in the calibration curve it indicates perfect calibration.
Scalability: To examine the system’s behavior under reduced sample sizes, Table 3 shows performance with decreasing data fractions on the first credit scoring dataset. We find that model performance remains broadly stable as the available data are reduced. In particular, the 5% setting remains close to the 50% setting on the main evaluation metrics, with the F1 score slightly improving from 0.4136 to 0.4158 and AUC remaining close (0.8109 vs. 0.8002). This indicates that, under our row-subsampled training setting, using only 5% of the data can still preserve competitive predictive performance while substantially reducing the cost of secure computation. We therefore report the communication and runtime costs for the 5% data setting, as it provides a better practical trade-off between efficiency and utility.
To evaluate the stability of MPC-XGB under aggressive row subsampling, we repeated the 5% sampling setting three times using different random seeds. The results remained consistent across runs: the model achieved accuracy = 0.9100 ± 0.0054 (mean ± standard deviation), precision = 0.3696 ± 0.0281, recall = 0.4752 ± 0.0417, F1 = 0.4158 ± 0.0226, and AUC = 0.8002 ± 0.0073. Overall, the results suggest that even under an aggressive subsampling setting, the model retains reasonably consistent predictive behaviour while substantially reducing the secure computation required during training.
We further evaluate scalability on the second credit scoring dataset (30,000 instances) by progressively reducing the available training samples. Table 4 reports the corresponding results, showing consistent trends where MPC-XGB maintains stable predictive performance across varying data fractions, further validating the robustness of the proposed framework across datasets of different sizes.
Computation cost and efficiency: For the 5% data setting, the end-to-end training time for MPC-XGB was approximately 48,325 s (13.42 h). The offline phase, which pre-generates correlated randomness and preprocessing material required for secure operations, consumed approximately 28,359 s and involved about 1.2 TB of communication across roughly 3.22 million protocol rounds. The online phase, which performs the actual secure evaluation of split gains during tree construction, required approximately 19,966 s and transferred about 33.7 GB of data over approximately 111.17 million rounds.
Across the entire training process, the MPC protocol executed approximately 14.49 billion secure multiplications and opened more than 50.37 million intermediate values required for protocol coordination. In addition, communication between the data owners (AP and PP) and the MPC servers accounted for approximately 984.7 MB over around 1.09 million rounds. For reference, the runtime for one MPC-XGB decision tree under the same 5% data setting was approximately 2416 s (40.3 min). These costs arise primarily from the repeated secure evaluation of candidate splits at each node of the boosted decision trees, where gradient and Hessian statistics must be aggregated securely across parties.
Because secure computation significantly increases computational and communication overhead compared to plaintext training, we adopt two design choices to keep the cost manageable. First, we employ quantile-based split selection, which limits the number of candidate thresholds evaluated for each feature and thus reduces the number of secure operations required during split evaluation. Second, we apply row subsampling, which reduces the number of instances processed at each boosting round, thereby lowering the number of MPC rounds required during training. These optimizations reduce the overall online computation cost while maintaining stable predictive performance. Although row subsampling can slightly reduce recall under severe class imbalance, it improves training efficiency and helps maintain stable model accuracy within the MPC setting.
Runtime comparison: To provide a direct runtime comparison under matched conditions, we also ran SecureBoost using the same dataset setting, feature partition, and hyperparameter configuration as the 5% MPC-XGB experiment. Under these same settings, SecureBoost completed training in approximately 1993 s (≈33.2 min), whereas MPC-XGB required approximately 48,325 s (≈13.42 h) for end-to-end training. This higher training cost is expected, since MPC-XGB performs split evaluation through three-party replicated secret sharing and secure fixed-point computation rather than relying on lighter protected or partially revealed training procedures. In return, MPC-XGB offers stronger privacy guarantees than SecureBoost by protecting not only raw features but also labels and intermediate training statistics such as gradients and Hessians during split evaluation. Although the training phase is more expensive, this cost is typically incurred only once during model construction. In contrast, inference in MPC-XGB is efficient, as prediction is carried out locally at the AP using the trained model without requiring repeated MPC interaction. Thus, the additional cost is concentrated in training, while deployment-time prediction remains fast and practical.
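The reported timings can be cross-checked with a few lines of arithmetic. All inputs below are figures quoted in the text; the assumption that all 20 permitted boosting rounds were actually run is ours, made only to compare the per-tree runtime against the total.

```python
offline_s, online_s = 28_359, 19_966      # offline and online phases (seconds)
secureboost_s = 1_993                     # SecureBoost under matched settings
per_tree_s, n_trees = 2_416, 20           # n_trees = 20 is an assumption (max_trees)

total_s = offline_s + online_s            # end-to-end MPC-XGB training time
total_hours = total_s / 3600
slowdown = total_s / secureboost_s        # MPC-XGB relative to SecureBoost
offline_share = offline_s / total_s       # fraction of time spent in preprocessing

print(total_s, round(total_hours, 2), round(slowdown, 1), round(offline_share, 2))
# prints: 48325 13.42 24.2 0.59
```

The two phases add up exactly to the reported 48,325 s, MPC-XGB is roughly 24 times slower than SecureBoost in this setting, and about 59% of the time is spent in the offline phase. Twenty trees at 2416 s each give 48,320 s, which is consistent with the total under the 20-round assumption.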
Overall, MPC-XGB achieves a strong privacy–utility trade-off. The proposed framework improves precision and accuracy compared to SecureBoost and Hybrid FL while remaining competitive with non-private VFL-XGBoost. At the same time, MPC-XGB provides stronger end-to-end privacy guarantees by ensuring that labels, features, and intermediate training statistics remain protected throughout the training process.

7. Conclusions and Future Work

We presented MPC-XGB, a privacy-preserving XGBoost framework for vertical federated learning using a secure three-party protocol. The system keeps both features and labels private throughout training and prediction, and removes the asymmetry that otherwise advantages the label-holding party (AP) by executing label-dependent computations inside MPC. On credit-scoring datasets, MPC-XGB delivers predictive performance comparable to that of non-private VFL-XGBoost and improves over SecureBoost while providing stronger privacy guarantees. Compared to prior methods, our approach supports full tree learning end-to-end with no reliance on a partially trusted intermediary server. The system also remains robust on reduced data subsets, which supports its applicability in practical federated settings.
Limitation: Training is time- and bandwidth-intensive; to manage time cost, we employed quantile-based split finding and per-tree row subsampling, which keep online rounds manageable in MPC.
Future work: We will focus on reducing the computational and communication cost of the framework through more efficient protocol design and system-level optimization via batching and packing of operations, adaptive fixed-point precision, and improved offline/online scheduling. We also plan to extend the framework to stronger threat models, particularly malicious adversaries, to provide stronger security guarantees in more challenging deployment settings.

Author Contributions

Conceptualization, A.R., E.H., M.Y., T.S. and X.Y.; Methodology, A.R., E.H., M.Y., T.S. and X.Y.; Software, A.R. and X.W.; Validation, A.R., X.W. and X.Y.; Formal analysis, A.R.; Investigation, A.R., E.H., M.Y., T.S. and X.Y.; Resources, A.R.; Data curation, A.R.; Writing—original draft preparation, A.R.; Writing—review and editing, A.R., E.H., M.Y., T.S. and X.Y.; Visualization, A.R.; Supervision, E.H., M.Y., T.S. and X.Y.; Project administration, M.Y. and X.Y.; Funding acquisition, M.Y. and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by CSIRO Data61, Australia and Australian Research Council Projects DP220102803, DP240102140, LP220200649.

Data Availability Statement

The datasets analyzed in this study are publicly available. The “Give Me Some Credit” dataset is available at: https://www.kaggle.com/c/GiveMeSomeCredit (2011) (accessed on 11 March 2026). The “Default of Credit Card Clients” dataset is available at: https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset (accessed on 11 March 2026). No new data were created during this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. The European Parliament; The Council of The European Union. General Data Protection Regulation (GDPR), Regulation (EU) 2016/679. In Official Journal of the European Union (OJ L 119); Publications Office of the European Union: Luxembourg, 2016.
  2. The White House. Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy. In White House Report (Consumer Privacy Bill of Rights Framework); The White House: Washington, DC, USA, 2012.
  3. Australian Government. Privacy Act 1988. In Federal Register of Legislation; Australian Government: Canberra, Australia, 1988.
  4. Ramay, A.; He, E.; Yang, M.; Sarwar, T.; Wang, X.; Yi, X. MPC-XGB: Privacy-Preserving Vertical Federated XGBoost via Secure Multiparty Computation. In Proceedings of the 2025 IEEE International Conference on Big Data (BigData); IEEE: Piscataway, NJ, USA, 2025; pp. 1807–1812.
  5. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017.
  6. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 2019, 10, 12.
  7. Mammen, P.M. Federated learning: Opportunities and challenges. arXiv 2021, arXiv:2101.05428.
  8. Liu, Z.; Guo, J.; Yang, W.; Fan, J.; Lam, K.Y.; Zhao, J. Privacy-Preserving Aggregation in Federated Learning: A Survey. In IEEE Transactions on Big Data; IEEE: Piscataway, NJ, USA, 2022.
  9. Yin, X.; Zhu, Y.; Hu, J. A Comprehensive Survey of Privacy-Preserving Federated Learning: A Taxonomy, Review, and Future Directions. ACM Comput. Surv. 2021, 54, 131.
  10. Li, Q.; Wen, Z.; Wu, Z.; Hu, S.; Wang, N.; Li, Y.; Liu, X.; He, B. A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection. IEEE Trans. Knowl. Data Eng. 2021, 35, 3347–3366.
  11. Zhang, C.; Li, S.; Xia, J.; Wang, W.; Yan, F.; Liu, Y. BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC 20); USENIX Association: Berkeley, CA, USA, 2020; pp. 493–506.
  12. Fang, H.; Qian, Q. Privacy preserving machine learning with homomorphic encryption and federated learning. Future Internet 2021, 13, 94.
  13. Yi, X.; Paulet, R.; Bertino, E. Homomorphic Encryption; Springer: Berlin/Heidelberg, Germany, 2014.
  14. Wei, K.; Li, J.; Ding, M.; Ma, C.; Yang, H.H.; Farokhi, F.; Jin, S.; Quek, T.Q.; Poor, H.V. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3454–3469.
  15. Truex, S.; Liu, L.; Chow, K.H.; Gursoy, M.E.; Wei, W. LDP-Fed: Federated Learning with Local Differential Privacy. In Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking; Association for Computing Machinery: New York, NY, USA, 2020; pp. 61–66.
  16. Dwork, C. Differential Privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP); Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12.
  17. Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical secure aggregation for federated learning on user-held data. arXiv 2016, arXiv:1611.04482.
  18. Cramer, R.; Damgård, I.; Maurer, U. General secure multi-party computation from any linear secret-sharing scheme. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques; Springer: Berlin/Heidelberg, Germany, 2000; pp. 316–334.
  19. Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security; Association for Computing Machinery: New York, NY, USA, 2019; pp. 1–11.
  20. Kaminaga, H.; Awaysheh, F.M.; Alawadi, S.; Kamm, L. MPCFL: Towards Multi-party Computation for Secure Federated Learning Aggregation. In Proceedings of the 16th IEEE/ACM International Conference on Utility and Cloud Computing; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1–10.
  21. McMahan, H.B.; Ramage, D.; Talwar, K.; Zhang, L. Learning Differentially Private Recurrent Language Models. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April–3 May 2018.
  22. McMahan, H.B.; Moore, E.; Ramage, D.; y Arcas, B.A. Federated Learning of Deep Networks Using Model Averaging. arXiv 2016, arXiv:1602.05629.
  23. Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Papadopoulos, D.; Yang, Q. SecureBoost: A lossless federated learning framework. IEEE Intell. Syst. 2021, 36, 87–98.
  24. Jayaram, K.R.; Muthusamy, V.; Thomas, G.; Verma, A.; Purcell, M. Towards End to End Secure and Efficient Federated Learning for XGBoost. In Proceedings of the AAAI Workshop on Federated Learning (FL-AAAI), Vancouver, BC, Canada, 28 February–1 March 2022.
  25. Zhang, H.; Hong, J.; Dong, F.; Drew, S.; Xue, L.; Zhou, J. A privacy-preserving hybrid federated learning framework for financial crime detection. arXiv 2023, arXiv:2302.03654.
  26. Chen, W.; Ma, G.; Fan, T.; Kang, Y.; Xu, Q.; Yang, Q. SecureBoost+: A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning. arXiv 2021, arXiv:2110.10927.
  27. Fang, W.; Zhao, D.; Tan, J.; Chen, C.; Yu, C.; Wang, L.; Wang, L.; Zhou, J.; Zhang, B. Large-Scale Secure XGB for Vertical Federated Learning. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2021; pp. 443–452.
  28. Xie, L.; Liu, J.; Lu, S.; Chang, T.H.; Shi, Q. An efficient learning framework for federated XGBoost using secret sharing and distributed optimization. ACM Trans. Intell. Syst. Technol. 2022, 13, 77.
  29. Feng, Z.; Xiong, H.; Song, C.; Yang, S.; Zhao, B.; Wang, L.; Chen, Z.; Yang, S.; Liu, L.; Huan, J. SecureGBM: Secure Multi-Party Gradient Boosting. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data); IEEE: Piscataway, NJ, USA, 2019; pp. 1312–1321.
  30. Fang, W.; Chen, C.; Tan, J.; Yu, C.; Lu, Y.; Wang, L.; Wang, L.; Zhou, J.; Liu, A.X. A Hybrid-Domain Framework for Secure Gradient Tree Boosting. arXiv 2020, arXiv:2005.08479.
  31. Li, Q.; Wu, Z.; Cai, Y.; Yung, C.M.; Fu, T.; He, B. FedTree: A Federated Learning System for Trees. In Proceedings of Machine Learning and Systems; MLSys.org: Indio, CA, USA, 2023; Volume 5.
  32. Cheng, Y.; Liu, Y.; Chen, T. FederBoost: Private Federated Learning for GBDT. arXiv 2021, arXiv:2107.01402.
  33. Liu, Y.; Kang, Y.; Zou, T.; Pu, Y.; He, Y.; Ye, X.; Ouyang, Y.; Zhang, Y.Q.; Yang, Q. Vertical Federated Learning: Concepts, Advances, and Challenges. IEEE Trans. Knowl. Data Eng. 2024, 36, 3615–3634.
  34. Qian, B.; Xie, Y.; Li, Y.; Ding, B.; Zhou, J. Tree-based Models for Vertical Federated Learning: A Survey. ACM Comput. Surv. 2025, 57, 241.
  35. Fan, T.; Chen, W.; Ma, G.; Kang, Y.; Fan, L.; Yang, Q. SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree. In Trustworthy Federated Learning; Springer: Singapore, 2024; pp. 365–381.
  36. Wu, Y.; Cai, S.; Xiao, X.; Chen, G.; Ooi, B.C. Privacy Preserving Vertical Federated Learning for Tree-based Models. Proc. VLDB Endow. 2020, 13, 2090–2103.
  37. Li, X.; Hu, Y.; Liu, W.; Feng, H.; Peng, L.; Hong, Y.; Ren, K.; Qin, Z. OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization. Proc. VLDB Endow. 2022, 16, 202–215.
  38. Wang, H.; Guo, Y.; Hu, S.; Luo, X.; Wang, M.; Xu, M. SecureXGB: A Secure and Efficient Multi-Party Protocol for Vertical Federated XGBoost. ACM Trans. Internet Technol. 2025, 3, 73.
  39. Jiang, Y.; Mei, F.; Dai, T.; Li, Y. SiGBDT: Large-Scale Gradient Boosting Decision Tree Training via Function Secret Sharing. In Proceedings of the 19th ACM Asia Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2024; pp. 274–288.
  40. Song, A.; Cui, S.; Bai, J.; Cheng, K.; Shen, Y.; Russello, G. Guard-GBDT: Efficient Privacy-Preserving Approximated GBDT Training on Vertical Dataset. arXiv 2025, arXiv:2507.20688.
  41. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
  42. Nielsen, D. Tree Boosting with XGBoost: Why Does XGBoost Win “Every” Machine Learning Competition? Master’s Thesis, NTNU, Trondheim, Norway, 2016.
  43. Araki, T.; Furukawa, J.; Lindell, Y.; Nof, A.; Ohara, K. High-throughput semi-honest secure three-party computation with an honest majority. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2016; pp. 805–817.
  44. Mohassel, P.; Rindal, P. ABY3: A Mixed Protocol Framework for Machine Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2018; pp. 35–52.
  45. Chida, K.; Genkin, D.; Hamada, K.; Ikarashi, D.; Kikuchi, R.; Lindell, Y.; Nof, A. Fast Large-Scale Honest-Majority MPC for Malicious Adversaries. In Advances in Cryptology—CRYPTO 2018; Springer: Cham, Switzerland, 2018; pp. 34–64.
  46. Kaggle. Give Me Some Credit: Dataset. Available online: https://www.kaggle.com/c/GiveMeSomeCredit (accessed on 10 March 2025).
  47. Yeh, I.C. Default of Credit Card Clients Dataset. Available online: https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset (accessed on 10 March 2025).
  48. Keller, M. MP-SPDZ: A Versatile Framework for Multi-Party Computation. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1575–1590.
Figure 1. (a) Conventional VFL architecture with direct communication. (b) Proposed MPC-XGB architecture with three non-colluding servers.
Figure 2. Training flow of MPC-XGB: Secure VFL XGBoost construction using three-party computation.
Figure 3. ROC (left), precision–recall (middle), and calibration (right) for MPC-XGB (blue), SecureBoost (green), and Non-Private VFL-XGBoost (red).
Table 1. Privacy preservation across VFL methods. ✓: protected, ✗: revealed.
Model | Labels (AP) | Grad/Hess (AP) | Grad/Hess (PP) | Features (AP) | Features (PP)
SecureBoost [23]
Additive SS [27]
Hybrid FL [25]
Our Work
Table 2. Comparison of performance metrics across Non-Private VFL XGBoost, SecureBoost, Hybrid FL, and MPC-XGB.
Metric | Non-Private VFL XGBoost | SecureBoost | Hybrid FL | MPC-XGB
Accuracy | 0.9015 | 0.92 | 0.94 | 0.93
Precision | 0.37 | 0.45 | 0.98 | 0.4900
Recall | 0.56 | 0.23 | 0.75 | 0.2500
F1 | 0.47 | 0.31 | 0.81 | 0.3300
AUC | 0.87 | 0.7913 | 0.66 | 0.8157
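As a reading aid for Table 2, the following minimal sketch shows how an F1 score follows from precision and recall as their harmonic mean, applied to the MPC-XGB column. The helper name `f1_score` is a hypothetical illustration and not part of the authors' implementation; the inputs are the rounded values reported in the table, so the result is only approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MPC-XGB column of Table 2 (reported, already rounded values).
mpc_xgb_f1 = f1_score(0.49, 0.25)
print(round(mpc_xgb_f1, 2))  # 0.33
```

The low recall dominates the harmonic mean, which is why the F1 score stays close to the recall value even though accuracy is high.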
Table 3. Impact of data size on model performance for MPC-XGB on Dataset 1.
Metric | 50% | 30% | 15% | 5%
Accuracy | 0.9323 | 0.9317 | 0.9171 | 0.9100
Precision | 0.4936 | 0.4901 | 0.4015 | 0.3696
Recall | 0.3177 | 0.2530 | 0.3848 | 0.4752
F1 | 0.4136 | 0.3345 | 0.3852 | 0.4158
AUC | 0.8109 | 0.7958 | 0.8119 | 0.8002
Table 4. Impact of data size on model performance for MPC-XGB on Dataset 2.
Metric | 50% | 30% | 15% | 5%
Accuracy | 0.84 | 0.82 | 0.80 | 0.7950
Precision | 0.50 | 0.57 | 0.5512 | 0.5445
Recall | 0.47 | 0.36 | 0.3250 | 0.3133
F1 | 0.49 | 0.441 | 0.409 | 0.3970
AUC | 0.70 | 0.69 | 0.6823 | 0.6618
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ramay, A.; He, E.; Yang, M.; Sarwar, T.; Wang, X.; Yi, X. Vertical Federated XGBoost with Privacy Preservation via Secure Multiparty Computation. J. Cybersecur. Priv. 2026, 6, 79. https://doi.org/10.3390/jcp6030079

