Privacy-Preserving K-Nearest Neighbors Training over Blockchain-Based Encrypted Health Data
Abstract
1. Introduction
- Most training phases operate on sensitive data samples, such as medical data reported by clinical wearable IoT devices, so private or confidential information may leak during training.
- Potential attackers may alter or tamper with data records during the data sharing process, causing the ML model to classify inaccurately.
- The data provider may lose control over the shared datasets, and the datasets may be replicated once they become available to collaborators.
- Blockchain technology is employed to establish secure and trustworthy IoT data sharing. Each data provider encrypts its IoT data locally with its own private key, and the encrypted data are recorded on a Blockchain through specially formatted transactions.
- We designed secure building blocks, namely secure polynomial operations (SPO: addition and subtraction), secure biasing operations (SBO), and secure comparison (SC), using a partially homomorphic cryptosystem (PHC), i.e., Paillier, and developed a secure k-NN training algorithm. No trusted third party is required.
- Rigorous analysis shows that the secure k-NN protects data privacy during training, achieves accuracy similar to that of general k-NN, and outperforms the previous state-of-the-art methods.
2. Related Work
2.1. Privacy-Preserving ML Training
2.2. Privacy-Preserving ML Classification
2.3. The Novelty of This Paper
3. Preliminaries
3.1. Notation
3.2. Homomorphic Cryptosystem
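- $\mathcal{M}$ and $\mathcal{C}$ are the message space and the ciphertext space, respectively, and all ciphertexts output by $\mathrm{Enc}_{pk}$ are elements of $\mathcal{C}$.
- $\mathrm{Dec}_{sk}(c_1 \circ c_2) = m_1 \cdot m_2$ holds for any $m_1, m_2 \in \mathcal{M}$, any $c_1$ output by $\mathrm{Enc}_{pk}(m_1)$, and any $c_2$ output by $\mathrm{Enc}_{pk}(m_2)$.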
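To illustrate the additive homomorphism that a Paillier-style cryptosystem offers, the following is a minimal toy sketch in Python. The tiny primes, the choice g = n + 1, and the function names (`keygen`, `encrypt`, `decrypt`, `add`, `sub`) are assumptions made for this example; it is neither the paper's implementation nor secure.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    # Toy primes (assumption, illustration only); real keys use primes of 512+ bits.
    n = p * q
    phi = (p - 1) * (q - 1)      # Euler phi-function of n
    mu = pow(phi, -1, n)         # inverse of phi mod n; valid because g = n + 1
    return n, (phi, mu)          # public key n, secret key (phi, mu)

def encrypt(n, m):
    n2 = n * n
    while True:                  # pick a random r coprime to n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    phi, mu = sk
    u = pow(c, phi, n * n)       # equals 1 + (m * phi mod n) * n
    return ((u - 1) // n * mu) % n

def add(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    return (c1 * c2) % (n * n)

def sub(n, c1, c2):
    # Multiplying by the inverse ciphertext subtracts the plaintexts (mod n).
    return (c1 * pow(c2, -1, n * n)) % (n * n)
```

For example, `decrypt(n, sk, add(n, encrypt(n, 20), encrypt(n, 22)))` recovers 42 without the party performing the addition ever seeing 20 or 22.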
3.3. k-Nearest Neighbors (k-NN)
Algorithm 1: Basic k-NN
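As a point of reference, basic k-NN classification on plaintext data can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and majority voting; the function name `knn_predict` and the toy samples are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs
    query: feature vector with the same dimension as the training vectors
    """
    # Sort the labeled samples by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda sample: math.dist(sample[0], query))
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data (assumption): two well-separated clusters.
samples = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
           ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b")]
```

A secure variant would have to compute the distances and the comparisons inside the sort over ciphertexts, which is what the building blocks in Section 5.3 are intended for.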
3.4. Blockchain System
- Decentralized: It is built on a peer-to-peer network as a shared ledger, and no trusted third party is required.
- Tamper-proof: Blockchain employs consensus protocols, such as Proof-of-Work (PoW), which make data manipulation impractical.
- Traceability: The remaining participants can easily verify the transactions between two parties in a Blockchain system.
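The tamper-evidence property can be illustrated with a hash-linked ledger. The sketch below is a deliberately simplified assumption: it omits the peer-to-peer network and the consensus protocol (e.g., PoW) entirely and shows only why changing a recorded payload is detectable. The names `block_hash`, `append_block`, and `is_valid` are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, payload):
    # Hash a canonical JSON encoding of the block contents.
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def append_block(chain, payload):
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "payload": payload,
                  "hash": block_hash(index, prev_hash, payload)})

def is_valid(chain):
    # Any participant can recompute every hash and check the links.
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each block stores the hash of its predecessor, altering one payload invalidates that block's hash and every link after it.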
4. Problem Description
4.1. System Design
- IoT devices sense and transmit valuable information, such as medical or smart-city data, over wired or wireless networks, e.g., ZigBee, 3rd generation (3G)/4th generation (4G), and Wireless Fidelity (WiFi). In this study, IoT devices do not participate in the data sharing and analysis processes because of their limited computational capabilities.
- Data providers gather all the data from the IoT devices within their range. Because the data contain sensitive information, each data provider encrypts them with partially homomorphic encryption and registers them in a Blockchain.
- The Blockchain-based IoT platform serves as a distributed database that gathers the encrypted IoT data from all data providers, maintains the protocols, and records all data in a shared ledger. The built-in consensus mechanism ensures that IoT data are shared in a secure and tamper-proof way.
- IoT data analysts aim to gain deep insight into the data registered in the Blockchain-based platform by using existing analysis techniques. Data analysts obtain encrypted data from the corresponding data providers in order to train the k-NN classifiers.
4.2. Threat Type
- Recognized Ciphertext Model. The data analyst can obtain only the encrypted IoT data registered in the Blockchain platform. The IoT data analyst can also record intermediate outputs, such as iteration steps, while running the secure training algorithm.
- Recognized Background Model. The IoT data analyst is assumed to know more details of the shared data than in the recognized ciphertext model, and may gather further information by using her prior knowledge. More specifically, the IoT data analyst can collude with some IoT data providers to infer the sensitive information of other participants.
4.3. Design Purposes
- When facing an honest-but-curious adversary, the data of the data analyst and of each individual data provider are protected from disclosure.
- When two or more parties collude with each other, the privacy of the data analyst and of each individual data provider is also protected from disclosure.
5. The Construction of Secure k-NN
5.1. System Overview
5.2. Encrypted Data Sharing via Blockchain
- The address of the data provider
- The encrypted version of data
- Name of the IoT device from where the data is generated
- The address of the data analyst
- The encrypted version of data
- Name of the IoT device from where the data is generated
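The fields listed above could be combined into a single transaction record along the following lines. This is only a sketch under stated assumptions: the class name `DataTransaction`, the field names, the example addresses, and the use of a SHA-256 digest as a transaction identifier are all hypothetical and are not specified by the paper.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class DataTransaction:
    provider_address: str   # address of the data provider
    analyst_address: str    # address of the data analyst
    encrypted_data: tuple   # ciphertexts, e.g., Paillier ciphertexts as integers
    device_name: str        # name of the IoT device that generated the data

    def tx_id(self):
        # Hypothetical identifier: digest of a canonical JSON encoding of the record.
        body = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

# Example values are placeholders, not real addresses or ciphertexts.
tx = DataTransaction(
    provider_address="provider-001",
    analyst_address="analyst-042",
    encrypted_data=(918273645, 564738291),
    device_name="wearable-ecg",
)
```

Only ciphertexts appear in the record, so recording it on the shared ledger does not expose the plaintext readings.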
5.3. Building Blocks
5.3.1. k-NN
5.3.2. Secure Polynomial Operations (SPO)
5.3.3. Secure Biasing Operations (SBO)
Algorithm 2: Secure Comparison
5.3.4. Secure Comparison (SC)
5.4. Training Algorithm of Secure k-NN
Algorithm 3: Secure k-NN Training Algorithm
6. Security Analysis
6.1. Background of Security Proof
6.2. Security Proof for Secure Comparison
6.3. Security Proof for Secure k-NN Training Algorithm
7. Performance Evaluation
7.1. Experiment Setup
7.1.1. Testbed
7.1.2. Dataset
7.1.3. Float Format Conversion
7.1.4. Key Length Setting
7.2. Evaluation Parameters
7.3. Efficiency
7.3.1. Building Blocks Evaluation
7.3.2. Scalability Evaluation
8. Conclusions
Author Contributions
Funding
Conflicts of Interest
| Signs | Interpretations | Signs | Interpretations |
|---|---|---|---|
| D | dataset | d and | model parameters |
| d | distance | | initial centroid |
| | record in dataset | t | threshold |
| | Euler phi-function | | class label |
| m | size of the dataset D | | the encryption of m under Paillier |
| A | labeled data's array | - | |
Datasets | Instances Number | Attributes Number | Discrete Attributes | Numerical Attributes |
---|---|---|---|---|
BCWD | 699 | 9 | 0 | 9 |
HDD | 303 | 13 | 13 | 0 |
DD | 768 | 9 | 0 | 9 |
Parameter | Model | BCWD | HDD | DD |
---|---|---|---|---|
Accuracy | SVM | | | |
Accuracy | Secure SVM | | | |
Accuracy | k-NN (t = 8) | | | |
Accuracy | Secure k-NN (t = 8) | | | |
Precision | SVM | | | |
Precision | Secure SVM | | | |
Precision | k-NN (t = 8) | | | |
Precision | Secure k-NN (t = 8) | | | |
Recall | SVM | | | |
Recall | Secure SVM | | | |
Recall | k-NN (t = 8) | | | |
Recall | Secure k-NN (t = 8) | | | |
Dataset | Time | Secure SVM | Secure k-NN |
---|---|---|---|
BCWD | Total | 3674 s | 3357.2 s |
BCWD | P | 2789 s | 2534 s |
BCWD | C | 1066 s | 860 s |
BCWD | | 3462 s | 3113 s |
HDD | Total | 2735 s | 2534 s |
HDD | P | 1761 s | 1520 s |
HDD | C | 924 s | 765 s |
HDD | | 2333 s | 1922 s |
DD | Total | 3959 s | 3709 s |
DD | P | 3199 s | 2920 s |
DD | C | 1045 s | 995 s |
DD | | 3773 s | 3527 s |
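The relative reduction in total running time implied by the "Total" rows of the table can be worked out with a few lines of Python. The figures are copied from the table; the helper name `relative_saving` and the rounding to one decimal place are choices made for this illustration.

```python
# Total running times in seconds, taken from the "Total" rows: (Secure SVM, Secure k-NN).
totals = {
    "BCWD": (3674.0, 3357.2),
    "HDD": (2735.0, 2534.0),
    "DD": (3959.0, 3709.0),
}

def relative_saving(svm_seconds, knn_seconds):
    # Percentage by which the secure k-NN total is below the secure SVM total.
    return (svm_seconds - knn_seconds) / svm_seconds * 100.0

savings = {name: round(relative_saving(*pair), 1) for name, pair in totals.items()}
print(savings)  # {'BCWD': 8.6, 'HDD': 7.3, 'DD': 6.3}
```

On these totals, the secure k-NN is roughly 6% to 9% faster than the secure SVM, depending on the dataset.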
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Haque, R.U.; Hasan, A.S.M.T.; Jiang, Q.; Qu, Q. Privacy-Preserving K-Nearest Neighbors Training over Blockchain-Based Encrypted Health Data. Electronics 2020, 9, 2096. https://doi.org/10.3390/electronics9122096