Vertical Federated XGBoost with Privacy Preservation via Secure Multiparty Computation
Abstract
1. Introduction
- Our Framework:
- No single party or server can reconstruct any private data or intermediate values;
- The protocol is secure under a semi-honest threat model, assuming no two servers collude.
- Our Contributions:
- We present MPC-XGB, a privacy-preserving protocol and system architecture for vertical XGBoost training that removes the need for any privileged training party to access another party’s sensitive intermediate statistics.
- We show how label-dependent quantities and passive-side split evaluation can be carried out entirely on secret shares using three-party replicated secret sharing, thereby protecting raw features, labels, gradients, and Hessians end-to-end during training.
- We provide a detailed end-to-end algorithmic description of the proposed framework, along with a clearly defined threat model, extensive security analysis and leakage profile, demonstrating correctness and confidentiality under a semi-honest adversarial assumption with non-colluding servers.
- Compared with our earlier work, we extend the experimental evaluation with additional baseline methods and an additional dataset, giving a more comprehensive performance analysis.
2. Technical Preliminaries
2.1. XGBoost
- Approximate Split-Finding:
2.2. Secure Three-Party Computation with Replicated Secret Sharing
- Replicated secret sharing:
- Addition:
- Multiplication:
- Compute masked values and send (one element each): send from to (), where division by 3 is well-defined since .
- Locally convert to a -sharing of : for .
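As a minimal sketch of the sharing, addition, and multiplication steps listed above, the following Python simulates all three servers in one process. It assumes the semi-honest three-party scheme of Araki et al. over the ring of integers modulo 2^64. The ring size, the helper names (`share`, `reconstruct`, `add`, `mul`), and the in-process generation of the zero-sum randomness are illustrative assumptions, not details taken from this paper.

```python
import secrets

K = 64                      # assumed ring bit-width
MOD = 1 << K
INV3 = pow(3, -1, MOD)      # exists because gcd(3, 2^K) = 1

def zero_triple():
    """Three random ring elements that sum to zero (correlated randomness)."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return [a, b, (-a - b) % MOD]

def share(v):
    """Server i holds (x_i, a_i) with a_i = x_{i-1} - v."""
    x = zero_triple()
    return [(x[i], (x[i - 1] - v) % MOD) for i in range(3)]

def reconstruct(sh, i=0):
    """Any two adjacent servers suffice: v = x_i - a_{i+1}."""
    return (sh[i][0] - sh[(i + 1) % 3][1]) % MOD

def add(s, t):
    """Addition is local: each server adds its own components."""
    return [((x + y) % MOD, (a + b) % MOD) for (x, a), (y, b) in zip(s, t)]

def mul(s, t):
    """One ring element sent per server: r_i goes from server i to server i+1."""
    alpha = zero_triple()
    r = [((a * b - x * y + alpha[i]) * INV3) % MOD
         for i, ((x, a), (y, b)) in enumerate(zip(s, t))]
    # Server i combines its own r_i with the received r_{i-1}.
    return [((r[i - 1] - r[i]) % MOD, (-2 * r[i - 1] - r[i]) % MOD)
            for i in range(3)]

a, b = share(6), share(7)
print(reconstruct(add(a, b)), reconstruct(mul(a, b)))  # 13 42
```

In a real deployment each server would hold only its own pair and the zero-sum randomness would come from pre-shared keys. The single-process simulation here only checks the arithmetic.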
3. System and Threat Model
3.1. System Architecture
- Active party (AP): Holds (Equation (14)) for n aligned instances. At each node with instance set I, the AP computes and evaluates AP-side candidate splits at quantile-based candidate thresholds per feature, following the approximate split-finding described in Section 2.1, and coordinates overall training.
- Passive party (PP): Holds (Equation (15)) with features (no labels). It forms quantile-based candidate thresholds per feature and produces bin indicators (per feature, per threshold) for secure gain evaluation at the servers.
- Secure servers: Three non-colluding, semi-honest servers () running replicated secret sharing over . Given secret shares of AP’s and PP’s bin indicators, they compute split gains for all passive features and thresholds.
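The PP's local preparation described above (quantile-based thresholds, then one bin indicator vector per feature and threshold) could be sketched as follows. The function names, the number of bins, and the convention that an indicator of 1 means "value below the threshold, so the instance goes left" are assumptions for illustration; the paper's own definition of the indicator is not reproduced in this text.

```python
from statistics import quantiles

def candidate_thresholds(values, n_bins=4):
    """Distinct quantile cut points for one feature (at most n_bins - 1)."""
    return sorted(set(quantiles(values, n=n_bins)))

def bin_indicators(values, thresholds):
    """Map each threshold to a 0/1 vector: 1 if value < threshold (left), else 0."""
    return {t: [1 if v < t else 0 for v in values] for t in thresholds}

income = [1, 2, 3, 4, 5, 6, 7, 8]          # one hypothetical PP feature column
thresholds = candidate_thresholds(income)
indicators = bin_indicators(income, thresholds)
print(indicators[4.5])  # [1, 1, 1, 1, 0, 0, 0, 0]
```

Each indicator vector is what the PP would secret-share with the servers, so the raw feature values never leave the PP.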
Notation
3.2. Threat Model
3.2.1. Independent Corruption
3.2.2. Collusion Between One Party and One Server
3.2.3. Threat Boundaries
4. Proposed Method: MPC-XGB
4.1. VFL-XGBoost (No-Privacy Baseline)
4.1.1. Baseline Training Flow
- Share label-dependent statistics: The AP transmits to the PP.
- PP-side split search: Using the received for I, the PP evaluates candidate splits over its local features (using quantile thresholds per feature), computes the split gain via Equation (4), and returns its best split to the AP (feature ID, threshold and gain).
- AP-side split search (in parallel): The AP evaluates candidate splits over its own features (using quantile thresholds per feature) by calculating the split gain in Equation (4).
- Selection of overall best split: The AP compares the best PP candidate with the best AP candidate and chooses the split with the higher gain. The node’s instance set I is partitioned into and accordingly. If the PP’s split is chosen, the AP requests / from the PP.
- Recursive process: Repeat Steps 1–5 independently on each child node until the round’s stopping criteria are met (e.g., maximum depth, minimum samples, or no positive gain), yielding one tree. Leaf weights are then computed via Equation (5).
- Boosting Iterations: After the tree is built, the AP updates the predictions as in Equation (6), then recomputes for the next round from the updated predictions.
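The baseline flow above can be illustrated with a small plaintext sketch. It assumes that Equations (4) and (5) take the standard XGBoost forms from Chen and Guestrin (gain with regularizers λ and γ, leaf weight −G/(H + λ)), since the equations themselves are not reproduced in this text. The function names, feature IDs, and toy values are hypothetical.

```python
def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Standard XGBoost gain for splitting a node into left/right children."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

def best_split(columns, thresholds, g, h, lam=1.0, gamma=0.0):
    """Scan every (feature, threshold) candidate; return (feature_id, threshold, gain)."""
    G, H = sum(g), sum(h)
    best = (None, None, float("-inf"))
    for fid, col in columns.items():
        for t in thresholds[fid]:
            G_L = sum(gi for gi, v in zip(g, col) if v < t)
            H_L = sum(hi for hi, v in zip(h, col) if v < t)
            gain = split_gain(G_L, H_L, G - G_L, H - H_L, lam, gamma)
            if gain > best[2]:
                best = (fid, t, gain)
    return best

def leaf_weight(G, H, lam=1.0):
    """Optimal leaf weight for a node with gradient sum G and Hessian sum H."""
    return -G / (H + lam)

# Toy node: the PP receives g and h in the clear, which is the leakage at issue.
g = [-0.5, -0.5, 0.5, 0.5]
h = [0.25, 0.25, 0.25, 0.25]
pp_best = best_split({"pp_f0": [1, 2, 3, 4]}, {"pp_f0": [2.5]}, g, h)
ap_best = best_split({"ap_f0": [4, 1, 3, 2]}, {"ap_f0": [2.5]}, g, h)
winner = max(pp_best, ap_best, key=lambda c: c[2])
print(winner[0], winner[1], round(winner[2], 4))  # pp_f0 2.5 0.6667
```

Note that `best_split` on the PP side needs the per-instance `g` and `h` vectors, which is exactly what the non-private baseline hands over.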
4.1.2. Limitations of Non-Private VFL-XGBoost
4.2. MPC-XGB
4.3. Objective Formulation of MPC-XGB
- Step 1: Local Gradient/Hessian Computation (AP) and Feature Binarization (PP)
- AP:
- PP:
  - For each passive feature , obtain candidate thresholds using the quantile-based method.
  - For each threshold , binarize feature values as follows: where is the value of feature for instance i. This binarized feature encodes whether instance i would be placed in the right or left branch if split at .
- Step 2: Secret Sharing to Three Servers
- AP: shares and for the current node’s instance set I.
- PP: for each feature and each threshold , shares:
- Shares are computed as defined in Section 2.2 (replicated secret sharing over ).
- Step 3: Secure Split Evaluation (MPC Servers)
- From shares, compute the sums and .
- For each passive candidate , compute left/right sums of G and H.
- Evaluate the split objective using Equation (4), and reveal only: where denotes the maximum split gain and is the corresponding feature ID and threshold value.
- Step 4: Best Global Split Selection (AP)
- AP compares the best local candidate from Step 1, , with the best passive candidate from Step 3, .
- Choose the higher gain:
- Insert the node into the tree based on the best global split.
- Step 5: Node Partitioning and Recursion
- Partition the current instance set I according to the best split in Step 4:
  - Active split: , . The AP partitions locally and shares with the PP.
  - Passive split: , . The PP partitions and returns to the AP.
- For each child node with instance set , recursively compute the maximum gain until a stopping criterion holds (i.e., , depth limit, or ); then create a leaf and assign its weight via Equation (5).
- Step 6: Boosting Iterations
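To make Steps 2 and 3 more concrete, the sketch below shows how the servers could accumulate left and right gradient sums for one passive candidate using only secret shares. It simulates the three servers in one process and assumes an Araki-et-al.-style replicated sharing over the integers modulo 2^64, a fixed-point scale of 2^16 for gradients, and indicators where 1 means "left". All of these, along with every function name, are assumptions for illustration. The gain evaluation of Equation (4), the comparison across candidates, and the argmax are not implemented here; they would also have to run on shares.

```python
import secrets

MOD = 1 << 64               # assumed ring size
INV3 = pow(3, -1, MOD)
SCALE = 1 << 16             # assumed fixed-point scale for g and h

def zero_triple():
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return [a, b, (-a - b) % MOD]

def share(v):
    """Server i holds (x_i, a_i) with a_i = x_{i-1} - v."""
    x = zero_triple()
    return [(x[i], (x[i - 1] - v) % MOD) for i in range(3)]

def reveal(sh):
    return (sh[0][0] - sh[1][1]) % MOD

def add(s, t):
    return [((x + y) % MOD, (a + b) % MOD) for (x, a), (y, b) in zip(s, t)]

def sub(s, t):
    return [((x - y) % MOD, (a - b) % MOD) for (x, a), (y, b) in zip(s, t)]

def mul(s, t):
    """Share multiplication; server i sends r_i to server i+1."""
    alpha = zero_triple()
    r = [((a * b - x * y + alpha[i]) * INV3) % MOD
         for i, ((x, a), (y, b)) in enumerate(zip(s, t))]
    return [((r[i - 1] - r[i]) % MOD, (-2 * r[i - 1] - r[i]) % MOD)
            for i in range(3)]

def encode(x):
    """Signed real -> fixed-point ring element."""
    return round(x * SCALE) % MOD

def decode(v):
    """Fixed-point ring element -> signed real."""
    return (v - MOD if v >= MOD // 2 else v) / SCALE

def shared_sum(shares):
    total = share(0)
    for s in shares:
        total = add(total, s)
    return total

def left_right_sums(stat_shares, bit_shares):
    """Shares of (left sum, right sum) for one (feature, threshold) candidate."""
    left = shared_sum([mul(s, b) for s, b in zip(stat_shares, bit_shares)])
    return left, sub(shared_sum(stat_shares), left)

# Step 2: the AP shares g; the PP shares one candidate's indicators.
g_sh = [share(encode(v)) for v in [-0.5, 0.25, 0.75]]
b_sh = [share(bit) for bit in [1, 1, 0]]
# Step 3: the servers operate on shares only.
G_L, G_R = left_right_sums(g_sh, b_sh)
# Opened here purely to check the sketch; the protocol would not reveal these sums.
print(decode(reveal(G_L)), decode(reveal(G_R)))  # -0.25 0.75
```

Because the indicators are shared as unscaled 0/1 integers, each product stays at the gradient's fixed-point scale and no truncation step is needed for the sums. The same routine applies unchanged to the Hessian shares.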
4.4. Boosting Round Initialization and Gradient Sharing
Algorithm 1: Privacy-Preserving VFL-XGBoost Training
4.5. Recursive Tree Construction and Secure Split Selection
Algorithm 2: Tree_Construction(I, , , )
4.6. Secure Gain Computation on MPC Servers
Algorithm 3: GainComputationServer()
4.7. MPC-XGB Inference
Algorithm 4: Privacy-Preserving Prediction for a New Sample
5. Security Analysis
5.1. Confidentiality of Inputs and Intermediate Values
5.2. Leakage Profile
- Per-node leakage during training:
- Per-tree leakage:
- Cumulative leakage across boosting rounds:
- Inference-time leakage:
5.3. Simulation-Based Security and Final Output Disclosure
6. Performance Evaluation
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. The European Parliament; The Council of The European Union. General Data Protection Regulation (GDPR), Regulation (EU) 2016/679. In Official Journal of the European Union (OJ L 119); Publications Office of the European Union: Luxembourg, 2016.
2. The White House. Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy. In White House Report (Consumer Privacy Bill of Rights Framework); The White House: Washington, DC, USA, 2012.
3. Australian Government. Privacy Act 1988. In Federal Register of Legislation; Australian Government: Canberra, Australia, 1988.
4. Ramay, A.; He, E.; Yang, M.; Sarwar, T.; Wang, X.; Yi, X. MPC-XGB: Privacy-Preserving Vertical Federated XGBoost via Secure Multiparty Computation. In Proceedings of the 2025 IEEE International Conference on Big Data (BigData); IEEE: Piscataway, NJ, USA, 2025; pp. 1807–1812.
5. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017.
6. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 2019, 10, 12.
7. Mammen, P.M. Federated learning: Opportunities and challenges. arXiv 2021, arXiv:2101.05428.
8. Liu, Z.; Guo, J.; Yang, W.; Fan, J.; Lam, K.Y.; Zhao, J. Privacy-Preserving Aggregation in Federated Learning: A Survey. In IEEE Transactions on Big Data; IEEE: Piscataway, NJ, USA, 2022.
9. Yin, X.; Zhu, Y.; Hu, J. A Comprehensive Survey of Privacy-Preserving Federated Learning: A Taxonomy, Review, and Future Directions. ACM Comput. Surv. 2021, 54, 131.
10. Li, Q.; Wen, Z.; Wu, Z.; Hu, S.; Wang, N.; Li, Y.; Liu, X.; He, B. A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection. IEEE Trans. Knowl. Data Eng. 2021, 35, 3347–3366.
11. Zhang, C.; Li, S.; Xia, J.; Wang, W.; Yan, F.; Liu, Y. BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC 20); USENIX Association: Berkeley, CA, USA, 2020; pp. 493–506.
12. Fang, H.; Qian, Q. Privacy preserving machine learning with homomorphic encryption and federated learning. Future Internet 2021, 13, 94.
13. Yi, X.; Paulet, R.; Bertino, E. Homomorphic Encryption; Springer: Berlin/Heidelberg, Germany, 2014.
14. Wei, K.; Li, J.; Ding, M.; Ma, C.; Yang, H.H.; Farokhi, F.; Jin, S.; Quek, T.Q.; Poor, H.V. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3454–3469.
15. Truex, S.; Liu, L.; Chow, K.H.; Gursoy, M.E.; Wei, W. LDP-Fed: Federated Learning with Local Differential Privacy. In Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking; Association for Computing Machinery: New York, NY, USA, 2020; pp. 61–66.
16. Dwork, C. Differential Privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP); Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12.
17. Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical secure aggregation for federated learning on user-held data. arXiv 2016, arXiv:1611.04482.
18. Cramer, R.; Damgård, I.; Maurer, U. General secure multi-party computation from any linear secret-sharing scheme. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques; Springer: Berlin/Heidelberg, Germany, 2000; pp. 316–334.
19. Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security; Association for Computing Machinery: New York, NY, USA, 2019; pp. 1–11.
20. Kaminaga, H.; Awaysheh, F.M.; Alawadi, S.; Kamm, L. MPCFL: Towards Multi-party Computation for Secure Federated Learning Aggregation. In Proceedings of the 16th IEEE/ACM International Conference on Utility and Cloud Computing; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1–10.
21. McMahan, H.B.; Ramage, D.; Talwar, K.; Zhang, L. Learning Differentially Private Recurrent Language Models. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April–3 May 2018.
22. McMahan, H.B.; Moore, E.; Ramage, D.; y Arcas, B.A. Federated Learning of Deep Networks Using Model Averaging. arXiv 2016, arXiv:1602.05629.
23. Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Papadopoulos, D.; Yang, Q. SecureBoost: A lossless federated learning framework. IEEE Intell. Syst. 2021, 36, 87–98.
24. Jayaram, K.R.; Muthusamy, V.; Thomas, G.; Verma, A.; Purcell, M. Towards End to End Secure and Efficient Federated Learning for XGBoost. In Proceedings of the AAAI Workshop on Federated Learning (FL-AAAI), Vancouver, BC, Canada, 28 February–1 March 2022.
25. Zhang, H.; Hong, J.; Dong, F.; Drew, S.; Xue, L.; Zhou, J. A privacy-preserving hybrid federated learning framework for financial crime detection. arXiv 2023, arXiv:2302.03654.
26. Chen, W.; Ma, G.; Fan, T.; Kang, Y.; Xu, Q.; Yang, Q. SecureBoost+: A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning. arXiv 2021, arXiv:2110.10927.
27. Fang, W.; Zhao, D.; Tan, J.; Chen, C.; Yu, C.; Wang, L.; Wang, L.; Zhou, J.; Zhang, B. Large-Scale Secure XGB for Vertical Federated Learning. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2021; pp. 443–452.
28. Xie, L.; Liu, J.; Lu, S.; Chang, T.H.; Shi, Q. An efficient learning framework for federated XGBoost using secret sharing and distributed optimization. ACM Trans. Intell. Syst. Technol. 2022, 13, 77.
29. Feng, Z.; Xiong, H.; Song, C.; Yang, S.; Zhao, B.; Wang, L.; Chen, Z.; Yang, S.; Liu, L.; Huan, J. SecureGBM: Secure Multi-Party Gradient Boosting. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data); IEEE: Piscataway, NJ, USA, 2019; pp. 1312–1321.
30. Fang, W.; Chen, C.; Tan, J.; Yu, C.; Lu, Y.; Wang, L.; Wang, L.; Zhou, J.; Liu, A.X. A Hybrid-Domain Framework for Secure Gradient Tree Boosting. arXiv 2020, arXiv:2005.08479.
31. Li, Q.; Zhaomin, W.; Cai, Y.; Yung, C.M.; Fu, T.; He, B. FedTree: A Federated Learning System for Trees. In Proceedings of Machine Learning and Systems; MLSys.org: Indio, CA, USA, 2023; Volume 5.
32. Cheng, Y.; Liu, Y.; Chen, T. FederBoost: Private Federated Learning for GBDT. arXiv 2021, arXiv:2107.01402.
33. Liu, Y.; Kang, Y.; Zou, T.; Pu, Y.; He, Y.; Ye, X.; Ouyang, Y.; Zhang, Y.Q.; Yang, Q. Vertical Federated Learning: Concepts, Advances, and Challenges. IEEE Trans. Knowl. Data Eng. 2024, 36, 3615–3634.
34. Qian, B.; Xie, Y.; Li, Y.; Ding, B.; Zhou, J. Tree-based Models for Vertical Federated Learning: A Survey. ACM Comput. Surv. 2025, 57, 241.
35. Fan, T.; Chen, W.; Ma, G.; Kang, Y.; Fan, L.; Yang, Q. SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree. In Trustworthy Federated Learning; Springer: Singapore, 2024; pp. 365–381.
36. Wu, Y.; Cai, S.; Xiao, X.; Chen, G.; Ooi, B.C. Privacy Preserving Vertical Federated Learning for Tree-based Models. Proc. VLDB Endow. 2020, 13, 2090–2103.
37. Li, X.; Hu, Y.; Liu, W.; Feng, H.; Peng, L.; Hong, Y.; Ren, K.; Qin, Z. OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization. Proc. VLDB Endow. 2022, 16, 202–215.
38. Wang, H.; Guo, Y.; Hu, S.; Luo, X.; Wang, M.; Xu, M. SecureXGB: A Secure and Efficient Multi-Party Protocol for Vertical Federated XGBoost. ACM Trans. Internet Technol. 2025, 3, 73.
39. Jiang, Y.; Mei, F.; Dai, T.; Li, Y. SiGBDT: Large-Scale Gradient Boosting Decision Tree Training via Function Secret Sharing. In Proceedings of the 19th ACM Asia Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2024; pp. 274–288.
40. Song, A.; Cui, S.; Bai, J.; Cheng, K.; Shen, Y.; Russello, G. Guard-GBDT: Efficient Privacy-Preserving Approximated GBDT Training on Vertical Dataset. arXiv 2025, arXiv:2507.20688.
41. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
42. Nielsen, D. Tree Boosting with XGBoost: Why Does XGBoost Win “Every” Machine Learning Competition? Master’s Thesis, NTNU, Trondheim, Norway, 2016.
43. Araki, T.; Furukawa, J.; Lindell, Y.; Nof, A.; Ohara, K. High-throughput semi-honest secure three-party computation with an honest majority. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2016; pp. 805–817.
44. Mohassel, P.; Rindal, P. ABY3: A Mixed Protocol Framework for Machine Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2018; pp. 35–52.
45. Chida, K.; Genkin, D.; Hamada, K.; Ikarashi, D.; Kikuchi, R.; Lindell, Y.; Nof, A. Fast Large-Scale Honest-Majority MPC for Malicious Adversaries. In Advances in Cryptology—CRYPTO 2018; Springer: Cham, Switzerland, 2018; pp. 34–64.
46. Kaggle. Give Me Some Credit: Dataset. Available online: https://www.kaggle.com/c/GiveMeSomeCredit (accessed on 10 March 2025).
47. Yeh, I.C. Default of Credit Card Clients Dataset. Available online: https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset (accessed on 10 March 2025).
48. Keller, M. MP-SPDZ: A Versatile Framework for Multi-Party Computation. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1575–1590.



| Model | Labels (AP) | Grad/Hess (AP) | Grad/Hess (PP) | Features (AP) | Features (PP) |
|---|---|---|---|---|---|
| SecureBoost [23] | ✓ | ✓ | ✗ | ✓ | ✓ |
| Additive SS [27] | ✓ | ✓ | ✓ | ✗ | ✗ |
| Hybrid FL [25] | ✓ | ✓ | ✗ | ✓ | ✗ |
| Our Work | ✓ | ✓ | ✓ | ✓ | ✓ |

| Metric | Non-Private VFL XGBoost | SecureBoost | Hybrid FL | MPC-XGB |
|---|---|---|---|---|
| Accuracy | 0.9015 | 0.92 | 0.94 | 0.93 |
| Precision | 0.37 | 0.45 | 0.98 | 0.4900 |
| Recall | 0.56 | 0.23 | 0.75 | 0.2500 |
| F1 | 0.47 | 0.31 | 0.81 | 0.3300 |
| AUC | 0.87 | 0.7913 | 0.66 | 0.8157 |

| Metric | 50% | 30% | 15% | 5% |
|---|---|---|---|---|
| Accuracy | 0.9323 | 0.9317 | 0.9171 | 0.9100 |
| Precision | 0.4936 | 0.4901 | 0.4015 | 0.3696 |
| Recall | 0.3177 | 0.2530 | 0.3848 | 0.4752 |
| F1 | 0.4136 | 0.3345 | 0.3852 | 0.4158 |
| AUC | 0.8109 | 0.7958 | 0.8119 | 0.8002 |

| Metric | 50% | 30% | 15% | 5% |
|---|---|---|---|---|
| Accuracy | 0.84 | 0.82 | 0.80 | 0.7950 |
| Precision | 0.50 | 0.57 | 0.5512 | 0.5445 |
| Recall | 0.47 | 0.36 | 0.3250 | 0.3133 |
| F1 | 0.49 | 0.441 | 0.409 | 0.3970 |
| AUC | 0.70 | 0.69 | 0.6823 | 0.6618 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Ramay, A.; He, E.; Yang, M.; Sarwar, T.; Wang, X.; Yi, X. Vertical Federated XGBoost with Privacy Preservation via Secure Multiparty Computation. J. Cybersecur. Priv. 2026, 6, 79. https://doi.org/10.3390/jcp6030079

