A Federated Generalized Linear Model for PrivacyPreserving Analysis
Abstract
:1. Introduction
1.1. Federated Learning
1.2. Generalized Linear Models
1.2.1. Estimation of a Centralized GLM
Algorithm 1 Fisher Scoringbased Centralized GLM 

2. Materials and Methods
2.1. Setup
2.2. Algorithm for a Federated GLM
2.3. Validation
2.3.1. Accuracy
Algorithm 2 Federated GLM 
Initialization Server
Initialization Node m

Linear Regression
Poisson Regression
Logistic Regression
2.3.2. Execution Time
3. Results and Discussion
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AI  Artificial intelligence 
FL  Federated learning 
GLM  Generalized linear model 
ML  Machine learning 
MLE  Maximum likelihood estimation 
References
 Sagiroglu, S.; Sinanc, D. Big data: A review. In Proceedings of the 2013 International Conference on Collaboration Technologies and Systems (CTS), San Diego, CA, USA, 20–24 May 2013; pp. 42–47. [Google Scholar]
 Hatcher, W.G.; Yu, W. A survey of deep learning: Platforms, applications and emerging research trends. IEEE Access 2018, 6, 24411–24432. [Google Scholar] [CrossRef]
 Hassani, H.; Huang, X.; Silva, E. Digitalisation and big data mining in banking. Big Data Cogn. Comput. 2018, 2, 18. [Google Scholar] [CrossRef][Green Version]
 Wuest, T.; Weimer, D.; Irgens, C.; Thoben, K.D. Machine learning in manufacturing: Advantages, challenges, and applications. Prod. Manuf. Res. 2016, 4, 23–45. [Google Scholar] [CrossRef][Green Version]
 Fildes, R.; Ma, S.; Kolassa, S. Retail forecasting: Research and practice. Int. J. Forecast. 2019, 35, 1–34. [Google Scholar] [CrossRef][Green Version]
 Li, L.; Fan, Y.; Tse, M.; Lin, K.Y. A review of applications in federated learning. Comput. Ind. Eng. 2020, 149, 106854. [Google Scholar] [CrossRef]
 Group, W.A. Federated Learning White Paper; Technical Report; WeBank AI Group: Beijing, China, 2018. [Google Scholar]
 Politou, E.; Alepis, E.; Patsakis, C. Forgetting personal data and revoking consent under the GDPR: Challenges and proposed solutions. J. Cybersecur. 2018, 4, tyy001. [Google Scholar] [CrossRef]
 van Veen, E.B. Observational health research in Europe: Understanding the General Data Protection Regulation and underlying debate. Eur. J. Cancer 2018, 104, 70–80. [Google Scholar] [CrossRef] [PubMed][Green Version]
 Piper, D. Data Protection Laws of the World; Technical Report; DLA Piper: MD, USA, 2020. [Google Scholar]
 Bukaty, P. The California Consumer Privacy Act (CCPA): An Implementation Guide; IT Governance Ltd.: Ely, UK, 2019. [Google Scholar]
 Dai, W.; Wang, S.; Xiong, H.; Jiang, X. Privacy preserving federated big data analysis. In Guide to Big Data Applications; Springer: Berlin/Heidelberg, Germany, 2018; pp. 49–82. [Google Scholar]
 Xu, J.; Wang, F. Federated Learning for Healthcare Informatics. arXiv 2019, arXiv:1911.06270. [Google Scholar] [CrossRef] [PubMed]
 Zhang, C.; Xie, Y.; Bai, H.; Yu, B.; Li, W.; Gao, Y. A survey on federated learning. Knowl.Based Syst. 2021, 216, 106775. [Google Scholar] [CrossRef]
 Lindell, Y.; Pinkas, B. Privacy preserving data mining. J. Cryptol. 2002, 15, 36–54. [Google Scholar] [CrossRef]
 Wild, E.; Mangasarian, O. PrivacyPreserving Classification of Horizontally Partitioned Data via Random Kernels; Technical Report; University of Wisconsin: Madison, WI, USA, 2007. [Google Scholar]
 Gao, D.; Ju, C.; Wei, X.; Liu, Y.; Chen, T.; Yang, Q. Hhhfl: Hierarchical heterogeneous horizontal federated learning for electroencephalography. arXiv 2019, arXiv:1909.05784. [Google Scholar]
 Tian, Z.; Zhang, R.; Hou, X.; Liu, J.; Ren, K. Federboost: Private federated learning for gbdt. arXiv 2020, arXiv:2011.02796. [Google Scholar]
 Zhao, L.; Ni, L.; Hu, S.; Chen, Y.; Zhou, P.; Xiao, F.; Wu, L. Inprivate digging: Enabling treebased distributed data mining with differential privacy. In Proceedings of the IEEE INFOCOM 2018IEEE Conference on Computer Communications, Honolulu, HI, USA, 15–19 April 2018; pp. 2087–2095. [Google Scholar]
 Slavkovic, A.B.; Nardi, Y.; Tibbits, M.M. “Secure” Logistic Regression of Horizontally and Vertically Partitioned Distributed Databases. In Proceedings of the Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), Omaha, NE, USA, 28–31 October 2007; pp. 723–728. [Google Scholar]
 Lu, C.L.; Wang, S.; Ji, Z.; Wu, Y.; Xiong, L.; Jiang, X.; OhnoMachado, L. WebDISCO: A web service for distributed cox model learning without patientlevel data sharing. J. Am. Med. Inform. Assoc. 2015, 22, 1212–1219. [Google Scholar] [CrossRef] [PubMed][Green Version]
 Bonawitz, K.; Eichner, H.; Grieskamp, W.; Huba, D.; Ingerman, A.; Ivanov, V.; Kiddon, C.; Konecny, J.; Mazzocchi, S.; McMahan, H.B.; et al. Towards Federated Learning at Scale: System Design. In Proceedings of the 2nd Conference on Systems and Machine Learning (SysML), Standford, CA, USA, 31 March–2 April 2019. [Google Scholar]
 McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communicationefficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
 McCullagh, P.; Nelder, J.A. Generalized Linear Models; Routledge: London, UK, 2019. [Google Scholar]
 R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022. [Google Scholar]
 Van Rossum, G.; Drake, F.L. Python 3 Reference Manual; Python Software Foundation: DE, USA, 2009. [Google Scholar]
 Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikitlearn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
 Guyon, I.; Gunn, S.; BenHur, A.; Dror, G. Result analysis of the NIPS 2003 feature selection challenge. Adv. Neural Inf. Process. Syst. 2004, 17, 1–8. [Google Scholar]
 MoncadaTorres, A.; Martin, F.; Sieswerda, M.; van Soest, J.; Geleijnse, G. VANTAGE6: An open source priVAcy preserviNg federaTed leArninG infrastructurE for Secure Insight eXchange. In Proceedings of the AMIA Annual Symposium Proceedings, Online, 14–18 November 2020; pp. 870–877. [Google Scholar]
 Hlavac, M. stargazer: WellFormatted Regression and Summary Statistics Tables; R Package Version 5.2.3; Social Policy Institute: Bratislava, Slovakia, 2022. [Google Scholar]
 Hartmann, F. Federated Learning. Master’s Thesis, Frei Universität Berlin, Berlin, Germany, 2018. [Google Scholar]
 Smits, D.; van Beusekom, B.; Martin, F.; Veen, L.; Geleijnse, G.; MoncadaTorres, A. An Improved Infrastructure for PrivacyPreserving Analysis of Patient Data. In Proceedings of the International Conference of Informatics, Management, and Technology in Healthcare (ICIMTH), Athens, Greece, 1 July–3 July 2022; Volume 295, pp. 144–147. [Google Scholar]
 Wenzel, H.H.; Norberg Hardie, A.; Bekkers, R.L.; Falconer, H.; Hogdall, C.K.; Jensen, P.T.; Lemmens, V.E.; Martin, F.; van Gestel, A.J.; MoncadaTorres, A.; et al. Using Federated Learning to Identify Women with Early Stage Cervical Cancer at Low Risk For Lymph Node Metastases. 2022; under review. [Google Scholar]
 Wenzel, H. Improving Quality of Cervical Cancer Care with (Inter)National Cancer Registry Data. Ph.D. Thesis, University of Groningen, Groningen, The Netherlands, 2022. [Google Scholar]
 Rosenbaum, P.R.; Rubin, D.B. The central role of the propensity score in observational studies for causal effects. Biometrika 1983, 70, 41–55. [Google Scholar] [CrossRef]
 Hamersma, D.T. A Comparison of the Quality of Breast Cancer Care in Norway and The Netherlands. Master’s Thesis, University of Twente, Enschede, The Netherlands, 2020. [Google Scholar]
 Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 12. [Google Scholar] [CrossRef]
 Nishio, T.; Yonetani, R. Client selection for federated learning with heterogeneous resources in mobile edge. In Proceedings of the ICC 2019–2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–7. [Google Scholar]
 Konečnỳ, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar]
Family  Parameter  Coefficients  Std. Error  pValues  zValues  

C  F  C  F  C  F  C  F  
Linear  (Intercept)  $0.069$  $0.069$  $0.045$  $0.045$  $0.129$  $0.129$  $1.520$  $1.520$ 
${x}_{1}$  $0.221$  $0.221$  $0.017$  $0.017$  0  0  $12.763$  $12.763$  
${x}_{2}$  $0.499$  $0.499$  $0.018$  $0.018$  0  0  $27.212$  $27.212$  
Poisson  (Intercept)  $0.595$  $0.595$  $0.021$  $0.021$  0  0  $28.465$  $28.465$ 
${x}_{1}$  $0.269$  $0.269$  $0.007$  $0.007$  0  0  $37.220$  $37.220$  
${x}_{2}$  $0.446$  $0.446$  $0.007$  $0.007$  0  0  $62.566$  $62.566$  
Logistic  (Intercept)  $0.104$  $0.104$  $0.064$  $0.064$  $0.106$  $0.106$  $1.617$  $1.617$ 
${x}_{1}$  $2.593$  $2.593$  $0.093$  $0.093$  0  0  $27.802$  $27.802$  
${x}_{2}$  $0.054$  $0.054$  $0.051$  $0.051$  $0.291$  $0.291$  $1.055$  $1.055$ 
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. 
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cellamare, M.; van Gestel, A.J.; Alradhi, H.; Martin, F.; MoncadaTorres, A. A Federated Generalized Linear Model for PrivacyPreserving Analysis. Algorithms 2022, 15, 243. https://doi.org/10.3390/a15070243
Cellamare M, van Gestel AJ, Alradhi H, Martin F, MoncadaTorres A. A Federated Generalized Linear Model for PrivacyPreserving Analysis. Algorithms. 2022; 15(7):243. https://doi.org/10.3390/a15070243
Chicago/Turabian StyleCellamare, Matteo, Anna J. van Gestel, Hasan Alradhi, Frank Martin, and Arturo MoncadaTorres. 2022. "A Federated Generalized Linear Model for PrivacyPreserving Analysis" Algorithms 15, no. 7: 243. https://doi.org/10.3390/a15070243