Maximized PrivacyPreserving Outsourcing on Support Vector Clustering
Abstract
:1. Introduction
 (1)
 A reformative SVC with elementary operations (RSVCEO) is proposed with an updated dual coordinate descent (DCD) solver for the dual problem. In the training phase, it prefers a linear method to approach the objective without losing validity. Furthermore, it easily trades time for space by calculating kernel values on demand.
 (2)
 A fresh convex decomposition clustering labeling (FCDCL) method is presented, by which iterations are not required for convex decomposition. In connectivity analysis, the traditional segmer sampling checks in feature space are also avoided. Consequently, to complete calculations in each step of the outsourcing scene, a single interaction is sufficient for prototype finding and connectivity analysis. Without a mass of essential iterations, the labeling phase will not hurt the outsourcing.
 (3)
 Towards privacypreserving, MPPSVC is proposed by introducing homomorphic encryption, twoparty computation, and linear transformation to reconstruct RSVCEO. Although these protocols limit the calculation types, MPPSVC works well with maximized outsourcing capability. In the field of SVC, to the best of our knowledge, it is the first known client–server privacypreserving clustering framework solving the dual problem without interaction. Furthermore, it can securely outsource the cluster number analysis and cluster assignment on demand.
2. Preliminaries
2.1. Support Vector Clustering
2.1.1. Estimation of a Trained Support Function
2.1.2. Cluster Assignments
2.2. Homomorphic Encryption
 (1)
 Encryption: Ciphertext $\left[\phantom{\rule{0.166667em}{0ex}}\left[m\right]\phantom{\rule{0.166667em}{0ex}}\right]={g}^{m}\xb7{r}^{n}\phantom{\rule{3.33333pt}{0ex}}mod\phantom{\rule{0.277778em}{0ex}}{n}^{2}$.
 (2)
 Decryption: Plaintext $m=(L(\left[\phantom{\rule{0.166667em}{0ex}}\left[m\right]\phantom{\rule{0.166667em}{0ex}}\right]\phantom{\rule{3.33333pt}{0ex}}mod\phantom{\rule{0.277778em}{0ex}}{n}^{2})/L({g}^{\lambda}\phantom{\rule{3.33333pt}{0ex}}mod\phantom{\rule{0.277778em}{0ex}}{n}^{2}))\phantom{\rule{3.33333pt}{0ex}}mod\phantom{\rule{0.277778em}{0ex}}n$.
 (3)
 Additive homomorphism: $\left[\phantom{\rule{0.166667em}{0ex}}[{m}_{1}+{m}_{2}]\phantom{\rule{0.166667em}{0ex}}\right]=\left[\phantom{\rule{0.166667em}{0ex}}\left[{m}_{1}\right]\phantom{\rule{0.166667em}{0ex}}\right]\xb7\left[\phantom{\rule{0.166667em}{0ex}}\left[{m}_{2}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, $\left[\phantom{\rule{0.166667em}{0ex}}[\alpha \xb7{m}_{1}]\phantom{\rule{0.166667em}{0ex}}\right]={\left[\phantom{\rule{0.166667em}{0ex}}\left[{m}_{1}\right]\phantom{\rule{0.166667em}{0ex}}\right]}^{\alpha}$. Here, ${m}_{1}$ and ${m}_{2}$ are two numeric features in PD and $\alpha $ is an integer constant. In practical situations, if $\alpha <0$, its equivalence class value $\alpha \phantom{\rule{3.33333pt}{0ex}}mod\phantom{\rule{0.277778em}{0ex}}n$ can be a substitution.
2.3. Privacy Violation and Limited Operations of SVC Outsourcing
2.3.1. Privacy Violation of SVC
2.3.2. Limited Operations of Homomorphic Encryption
3. Reformative Support Vector Clustering with Elementary Operations
3.1. DCD Solver for SVC’s Dual Problem
Algorithm 1 DCD solver for SVC’s dual problem in Equation (2). 
Require: Dataset $\mathcal{X}$, kernel width q, and penalty C Ensure: Coefficient vector $\mathit{\beta}$

3.2. Convex Decomposition without Iterative Analysis
 (1)
 If the division position is not in the SVs subset in which each one links up two nearest neighboring convexity and concavity, for instance, ${\mathbf{x}}_{21}$ is replaced by ${\mathbf{x}}_{22}$, then the corresponding convex hull ${\mathit{S}}_{2}$ has to be further split into two convex hulls enclosed by $\{{\mathbf{x}}_{21},{\mathbf{x}}_{22},{\mathbf{x}}_{25}\}$ and $\{{\mathbf{x}}_{22}^{\prime},{\mathbf{x}}_{23},{\mathbf{x}}_{24}\}$, respectively, even though all these SVs converge to ${\mathit{S}}_{2}$. Here, ${\mathbf{x}}_{22}^{\prime}$ is an imaginary sample that is extremely similar to ${\mathbf{x}}_{22}$. Because no overlapping region or intersection of vertices of convex hulls is allowed, unfortunately, this result conflicts with the definition that SVs in a convex hull will converge to the same SEV.
 (2)
 From the other perspective, if ${\mathbf{x}}_{21}$ is not in the division position which only includes $\{{\mathbf{x}}_{11},{\mathbf{x}}_{17},{\mathbf{x}}_{25}\}$, we have to find a substitute ${\mathbf{x}}_{21}^{\prime}$ to construct the convex hull ${\mathit{S}}_{2}$. ${\mathbf{x}}_{21}^{\prime}$ might locate between ${\mathbf{x}}_{11}$ and ${\mathbf{x}}_{21}$ or between ${\mathbf{x}}_{21}$ and ${\mathbf{x}}_{22}$. For the prior case, it means ${\mathbf{x}}_{21}$ is not the closest SV to ${\mathbf{x}}_{11}$ that violates the premise of ${\mathbf{x}}_{21}$ linking up two nearest neighboring convex hulls. For the latter case, no matter ${\mathbf{x}}_{21}^{\prime}$ is a new arrival SV or is just ${\mathbf{x}}_{22}$, the concavity will be formed along $\overrightarrow{{\mathbf{x}}_{12}{\mathbf{x}}_{11}{\mathbf{x}}_{21}^{\prime}}$ or $\overrightarrow{{\mathbf{x}}_{12}{\mathbf{x}}_{11}{\mathbf{x}}_{22}}$. However, it becomes true only when ${\mathbf{x}}_{21}$ is no longer on the cluster boundary or ${\mathbf{x}}_{11}$ converging to ${\mathit{S}}_{2}$. Unfortunately, this violates the hypothesis of ${\mathbf{x}}_{21}$ being a SV and ${\mathbf{x}}_{11}$ converging to ${\mathit{S}}_{1}$.
Algorithm 2 Noniterative convex decomposition. 
Require: SVs set ${\mathcal{X}}_{S}$, decomposition threshold ${\eta}_{1}$ Ensure: Convex hulls ${S}_{\mathrm{CH}}$, set of triples ${S}_{\mathrm{Tri}}$

3.3. Connectivity Analysis Irrespective of Sampling
3.4. Implementation of RSVCEO
Algorithm 3 RSVCEO. 
Require: Dataset $\mathcal{X}$, kernel width q, penalty C, and thresholds ${\eta}_{1},{\eta}_{2}$ or the cluster number K Ensure: Clustering labels for all the data samples

3.5. Time Complexity of RSVCEO
4. Maximized PrivacyPreserving Outsourcing on SVC
4.1. PrivacyPreserving Primitives
4.1.1. Secure Multiplication
4.1.2. Secure Comparison
4.1.3. Secure Vector Distance Measurement
Algorithm 4 Secure comparison protocol. 
Require: Server: $\left[\phantom{\rule{0.166667em}{0ex}}\left[{m}_{1}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, $\left[\phantom{\rule{0.166667em}{0ex}}\left[{m}_{2}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, ${K}_{\mathrm{pub}}$; Client: ${K}_{\mathrm{pri}}$ Ensure: Server output $\left[\phantom{\rule{0.166667em}{0ex}}[max({m}_{1},{m}_{2})]\phantom{\rule{0.166667em}{0ex}}\right]$ or $\left[\phantom{\rule{0.166667em}{0ex}}[min({m}_{1},{m}_{2})]\phantom{\rule{0.166667em}{0ex}}\right]$

4.1.4. Secure 1NN Query
Algorithm 5 Secure 1NN query protocol. 
Require: Encrypted data $\left[\phantom{\rule{0.166667em}{0ex}}\right[\mathbf{x}\left]\phantom{\rule{0.166667em}{0ex}}\right]$ and dataset SVs set $\left[\phantom{\rule{0.166667em}{0ex}}\right[\mathcal{X}\left]\phantom{\rule{0.166667em}{0ex}}\right]$ Ensure: the nearest neighbor $\left[\phantom{\rule{0.166667em}{0ex}}\left[{\mathbf{x}}_{1\mathrm{NN}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$ in $\left[\phantom{\rule{0.166667em}{0ex}}\right[\mathcal{X}\left]\phantom{\rule{0.166667em}{0ex}}\right]$

4.2. The Proposed MPPSVC Model
4.2.1. Preventing Data Recovery from Kernel Matrix
4.2.2. PrivacyPreserving DCD Solver
4.2.3. PrivacyPreserving Convex Decomposition
4.2.4. PrivacyPreserving Connectivity Analysis
Algorithm 6 Connectivity analysis in ED with K. 
Require: Server: $\left[\phantom{\rule{0.166667em}{0ex}}\left[{S}_{\mathrm{CH}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, $\left[\phantom{\rule{0.166667em}{0ex}}\left[{S}_{\mathrm{Tri}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, ${K}_{\mathrm{pub}}$, K; Client: ${K}_{\mathrm{pri}}$ Ensure: Server output encrypted array of labels $\left[\phantom{\rule{0.166667em}{0ex}}\left[{L}_{\mathrm{SV}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$

Algorithm 7 Connectivity analysis in ED with ${\eta}_{2}$. 
Require: Server: $\left[\phantom{\rule{0.166667em}{0ex}}\left[{S}_{\mathrm{CH}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, $\left[\phantom{\rule{0.166667em}{0ex}}\left[{S}_{\mathrm{Tri}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, ${K}_{\mathrm{pub}}$, $\left[\phantom{\rule{0.166667em}{0ex}}\left[{\eta}_{2}\right]\phantom{\rule{0.166667em}{0ex}}\right]$; Client: ${K}_{\mathrm{pri}}$ Ensure: Server output encrypted array of labels $\left[\phantom{\rule{0.166667em}{0ex}}\left[{L}_{\mathrm{SV}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$

4.2.5. PrivacyPreserving Remaining Samples Labeling
4.3. Work Mode of MPPSVC
4.4. Analysis of MPPSVC
4.4.1. Time Complexity of MPPSVC
 (1)
 For step 1, “Precomputed Q” and “Instance $K(\xb7,\xb7)$” denote doing Algorithm 1 of RSVCEO with the precomputed kernel matrix or calculating the row of kernel values on demand, respectively. Although $O\left(d{N}^{2}\right)$ seems timeconsuming, in practice, it is frequently much lower than $O\left({N}^{3}\right)$ of the classical SVC, which consumes space $O\left({N}^{2}\right)$. For MPPSVC, the client is recommended to outsource a precomputation in $O\left({N}^{2}\right)$ for the orthogonal matrix M following [27,28] before Step 1. Then, nothing has to be done by the client except decrypting $\overline{\mathit{\beta}}$ by $\mathit{\beta}=M\overline{\mathit{\beta}}$ in $O\left(N\right)$. The major workload of $O\left({N}^{2}\right)$ is moved to the server, which may afford the space complexity of $O\left({N}^{2}\right)$.
 (2)
 Step 2 is not in the classical SVC. The most recent convex decomposition based method [22] takes $O\left(\zeta {N}_{\mathrm{SV}}\right)$ and the SEVs based methods [12,29] require $O\left(\zeta N\right)$. Here, $\zeta $ is the number of iterations. For RSVCEO, the difference brought by the “Instance $K(\xb7,\xb7)$” is calculating $K({\mathbf{x}}_{j},\mathbf{x})$ with ddimensional input before getting the gradient in Equation (12). For MPPSVC, upon using the kernel matrix of SVs or not, the client’s complexity is $O(\ell {N}_{\mathrm{SV}}^{2})$ where $\ell =\{1,d\}$. Meanwhile, the server is designated by a polling program in $O\left(d{N}_{\mathrm{SV}}^{2}\right)$. Notice that the essential tasks of Algorithm 2 for the client in ED have been cut down even though they have similar time complexity with that of in PD.
 (3)
 In Step 3, connectivity analysis of the classical SVC is m times sampling checks for each SVspair. However, calculating Equation (3) requires ${N}_{\mathrm{SV}}$ SVs, the whole consumption is up to $O\left(m{N}_{\mathrm{SV}}^{3}\right)$. On the contrary, RSVCEO only requires $O\left(\rho {N}_{\mathrm{CH}}\right)$ due to the direct use of the convergence directions. $\rho \in (2,3]$ is because Algorithms 6 and 7 are provided for choice. Similarly, the client responses SMP, SVDMP, and SCP in Algorithm 6 or SCP in Algorithm 7 with $O\left({N}_{\mathrm{CH}}\right)$.
 (4)
 Step 4 is particular for the classical SVC while the prior leaves an adjacency matrix. Its complexity is $O\left({N}_{\mathrm{SV}}^{2}\right)$.
 (5)
 Similar to classification, the traversal of the whole set in Step 5 cannot be avoided for every method. Its complexity is $O\left(dN{N}_{\mathrm{SV}}\right)$ or $O\left(dN{N}_{\mathrm{CH}}\right)$ except for the server in MPPSVC. Using S1NNQ, the major operations are carried out with the client under protocols of SVDMP and SCP.
4.4.2. Communication Complexity of MPPSVC
4.4.3. Security of MPPSVC under the SemiHonest Model
 (1)
 According to Section 4.2.1 and Section 4.2.2, the execution image of the server is given by $\overline{Q}$ and C. $\overline{Q}$ is a diagonal matrix protected by the matrix M secretly held by the client and C is a singleuse parameter without strict limitation. Notice that from $\overline{Q}$ to the kernel matrix $\tilde{Q}$ is a onetomany mapping. No one can recover an Ndimensional vector by only one number. Without any plain item of $\tilde{Q}$, the server cannot infer any data sample even though it occasionally gets several data samples. When the server finishes Algorithm 1, the output $\left[\phantom{\rule{0.166667em}{0ex}}\left[\overline{\mathit{\beta}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$ is naturally protected by M.
 (2)
 For the second step, the major works are carried out on the client side as a response to the server. As marked in Figure 3, the accessible sensitive data are $\left[\phantom{\rule{0.166667em}{0ex}}\left[{\mathcal{X}}_{\mathrm{SV}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, $\left[\phantom{\rule{0.166667em}{0ex}}\left[\overrightarrow{\mathbf{x}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, $\left[\phantom{\rule{0.166667em}{0ex}}\left[{\eta}_{1}\right]\phantom{\rule{0.166667em}{0ex}}\right]$, $\left[\phantom{\rule{0.166667em}{0ex}}\left[{S}_{\mathrm{CH}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$ and $\left[\phantom{\rule{0.166667em}{0ex}}\left[{S}_{\mathrm{Tri}}\right]\phantom{\rule{0.166667em}{0ex}}\right]$. However, all of them are encrypted by ${K}_{\mathrm{pub}}$.
 (3)
 Step 3 depends on the output of Step 2. It includes SMP, SCP, and SVDMP whose prototypes are proved by [20]. The server can only get the number of clusters, but cannot infer any relationship between data samples in PD. Furthermore, the client can easily hide the real number by tuning the predefined parameters K or ${\eta}_{2}$. Then, an uncertain cluster number is meaningless for the server.
 (4)
 Step 5 employs S1NNQ. The server cannot infer the actual label for a plain data sample, even if it occasionally has several samples.
4.4.4. Security of MPPSVC under the Malicious Model
5. Experimental Results
5.1. Experimental Setup
5.2. Experimental Dataset
5.3. Experiments in the PlainDomain
5.3.1. Analysis of Parameter Sensitivity
5.3.2. Analysis of Iteration Sensitivity
5.3.3. Performance Related to Kernel Utilization
5.3.4. Benchmark Results for Accuracy Comparison
5.3.5. Effectiveness of Capturing Data Distribution
5.4. Experiments in the EncryptedDomain
5.4.1. Performance in the EncryptedDomain
5.4.2. Computational Complexity
6. Related Work
 (1)
 For model training, the core is solving the dual problem in Equation (2). Major studies prefer generic optimization algorithms, e.g., gradient descent, sequential minimal optimization [4], and evolutionary strategies [46]. Later, studies rewrote the dual problem by introducing the Jaynes maximum entropy [47], the positionbased weight [44], and the relationship amongst SVs [26]. However, conducting a solver with the full dataset suffers from huge consumption of kernel matrix. Thus, FSSVC [26] steadily selects the boundaries while VCC [8] samples a predefined ratio $\theta \in (0,1]$ of data. Other methods related to reducing the working set and divideandconquer strategy were surveyed in [4]. However, bottlenecks still easily appear due to the nonlinear strategy and the precomputed kernel matrix. Thus, FRSVC [22] employs a linear method to seek a balance between efficiency and memory cost.
 (2)
 For the labeling phase, connectivity analysis adopts a sampling check strategy for a long time. Reducing the number of sample pairs thus becomes the first consideration, for instance, using the full dataset by CG [1], the SVs by PSVC [44], the SEVs by RCG [24], and the transition points by ESVC [29,42]. Although the number of sample pairs is gradually reduced, they have a sideeffect of additional iterations in seeking SEVs or TS. Thus, CDCL [23] suggests a compromise way of using SVs to construct convex hulls, which are employed as substitutes of SEVs. For efficiency, CDCL reduces the average sample rate by a nonlinear sampling strategy. Later, FSSVC [26] and FRSVC [22] made further improvements by reducing the average sample m close to 1. Besides, Chiang and Hao [48] introduced a cell growth strategy, which starts at any data sphere, expands by absorbing fresh neighboring spheres, and splits if its density is reduced to a certain degree. Later, CCL [43] created a new way by checking the connectivity of two SVs according to a single distance calculation. However, too strict constraints emphasized on the solver degrade its performance. In fact, for these methods, the other pricey consumption is the adjacent matrix, which usually ranges from $O\left({N}_{\mathrm{SV}}^{2}\right)$ to $O\left({N}^{2}\right)$.
7. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
 BenHur, A.; Horn, D.; Siegelmann, H.T.; Vapnik, V.N. Support Vector Clustering. J. Mach. Learn. Res. 2001, 2, 125–137. [Google Scholar] [CrossRef]
 Saltos, R.; Weber, R.; Maldonado, S. Dynamic RoughFuzzy Support Vector Clustering. IEEE Trans. Fuzzy Syst. 2017, 25, 1508–1521. [Google Scholar] [CrossRef]
 Ye, Q.; Zhao, H.; Li, Z.; Yang, X.; Gao, S.; Yin, T.; Ye, N. L1Norm Distance MinimizationBased Fast Robust Twin Support Vector kPlane Clustering. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 4494–4503. [Google Scholar] [CrossRef] [PubMed]
 Li, H.; Ping, Y. Recent Advances in Support Vector Clustering: Theory and Applications. Int. J. Pattern Recogn. Artif. Intell. 2015, 29, 1550002. [Google Scholar] [CrossRef]
 Sheng, Y.; Hou, C.; Si, W. Extract Pulse Clustering in Radar Signal Sorting. In Proceedings of the 2017 International Applied Computational Electromagnetics Society Symposium–Italy (ACES), Florence, Italy, 26–30 March 2017; pp. 1–2. [Google Scholar]
 Lawal, I.A.; Poiesi, F.; Anguita, D.; Cavallaro, A. Support Vector Motion Clustering. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2395–2408. [Google Scholar] [CrossRef]
 Pham, T.; Le, T.; Dang, H. Scalable Support Vector Clustering Using Budget. arXiv 2017, arXiv:1709.06444v1. [Google Scholar]
 Kim, K.; Son, Y.; Lee, J. Voronoi CellBased Clustering Using a Kernel Support. IEEE Trans. Knowl. Data Eng. 2015, 27, 1146–1156. [Google Scholar] [CrossRef]
 Yu, Y.; Li, H.; Chen, R.; Zhao, Y.; Yang, H.; Du, X. Enabling Secure Intelligent Network with CloudAssisted PrivacyPreserving Machine Learning. IEEE Netw. 2019, 33, 82–87. [Google Scholar] [CrossRef]
 Song, C.; Ristenpart, T.; Shmatikov, V. Machine Learning Models that Remember Too Much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS’ 2017, Dallas, TX, USA, 30 October–3 November 2017; ACM: New York, NY, USA, 2017; pp. 587–601. [Google Scholar]
 Dritsas, E.; Kanavos, A.; Trigka, M.; Sioutas, S.; Tsakalidis, A. Storage Efficient Trajectory Clustering and kNN for Robust Privacy Preserving SpatioTemporal Databases. Algorithms 2019, 12, 266. [Google Scholar] [CrossRef] [Green Version]
 Jung, K.H.; Lee, D.; Lee, J. Fast supportbased clustering method for largescale problems. Pattern Recogn. 2010, 43, 1975–1983. [Google Scholar] [CrossRef]
 Paillier, P. PublicKey Cryptosystems Based on Composite Degree Residuosity Classes. In Proceedings of the 17th International Conference on Theory and Application of Cryptographic Techniques (EUROCRYPT’ 99), Prague, Czech Republic, 2–6 May 1999; pp. 223–238. [Google Scholar]
 Shan, Z.; Ren, K.; Blanton, M.; Wang, C. Practical Secure Computation Outsourcing: A Survey. ACM Comput. Surv. 2018, 51, 31:1–31:40. [Google Scholar] [CrossRef]
 Liu, X.; Deng, R.; Choo, K.R.; Yang, Y. PrivacyPreserving Outsourced Support Vector Machine Design for Secure Drug Discovery. IEEE Trans. Cloud Comput. 2019, 1–14. [Google Scholar] [CrossRef]
 Rahulamathavan, Y.; Veluru, S.; Phan, R.C.W.; Chambers, J.A.; Rajarajan, M. PrivacyPreserving Clinical Decision Support System using Gaussian Kernel based Classification. IEEE J. Biomed. Health Inform. 2014, 18, 56–66. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 Rahulamathavan, Y.; Phan, R.C.W.; Veluru, S.; Cumanan, K.; Rajarajan, M. PrivacyPreserving MultiClass Support Vector Machine for Outsourcing the Data Classification in Cloud. IEEE Trans. Dependable Secure Comput. 2014, 11, 467–479. [Google Scholar] [CrossRef] [Green Version]
 Karapiperis, D.; Verykios, V.S. An LSHbased Blocking Approach with A Homomorphic Matching Technique for Privacypreserving Record Linkage. IEEE Trans. Knowl. Data Eng. 2015, 27, 909–921. [Google Scholar] [CrossRef]
 Lin, K.P.; Chang, Y.W.; Chen, M.S. Secure Support Vector Machines Outsourcing with Random Linear Transformation. Knowl. Inf. Syst. 2015, 44, 147–176. [Google Scholar] [CrossRef]
 Samanthula, B.; Elmehdwi, Y.; Jiang, W. kNearest Neighbor Classification over Semantically Secure Encrypted Relational Data. IEEE Trans. Knowl. Data Eng. 2015, 27, 1261–1273. [Google Scholar] [CrossRef] [Green Version]
 Lin, K.P.; Chen, M.S. On the Design and Analysis of the PrivacyPreserving SVM Classifier. IEEE Trans. Knowl. Data Eng. 2011, 23, 1704–1717. [Google Scholar] [CrossRef]
 Ping, Y.; Tian, Y.; Guo, C.; Wang, B.; Yang, Y. FRSVC: Towards making support vector clustering consume less. Pattern Recogn. 2017, 69, 286–298. [Google Scholar] [CrossRef]
 Ping, Y.; Tian, Y.; Zhou, Y.; Yang, Y. Convex Decomposition Based Cluster Labeling Method for Support Vector Clustering. J. Comput. Sci. Technol. 2012, 27, 428–442. [Google Scholar] [CrossRef]
 Lee, J.; Lee, D. An Improved Cluster Labeling Method for Support Vector Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 461–464. [Google Scholar] [PubMed]
 Ping, Y.; Zhou, Y.; Yang, Y. A Novel Scheme for Accelerating Support Vector Clustering. Comput. Inform. 2011, 31, 1001–1026. [Google Scholar]
 Ping, Y.; Chang, Y.; Zhou, Y.; Tian, Y.; Yang, Y.; Zhang, Z. Fast and Scalable Support Vector Clustering for Largescale Data Analysis. Knowl. Inf. Syst. 2015, 43, 281–310. [Google Scholar] [CrossRef]
 Luo, C.; Zhang, K.; Salinas, S.; Li, P. SecFact: Secure Largescale QR and LU Factorizations. IEEE Trans. Big Data 2019, 1–13. [Google Scholar] [CrossRef]
 Zhou, L.; Li, C. Outsourcing EigenDecomposition and Singular Value Decomposition of Large Matrix to a Public Cloud. IEEE Access 2016, 4, 869–879. [Google Scholar] [CrossRef]
 Lee, J.; Lee, D. Dynamic Characterization of Cluster Structures for Robust and Inductive Support Vector Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1869–1874. [Google Scholar]
 Xu, R.; Wunsch, D.C. Clustering; A John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
 Frank, A.; Asuncion, A. UCI Machine Learning Repository. 2010. Available online: http://archive.ics.uci.edu/ml (accessed on 23 July 2018).
 Ping, Y.; Zhou, Y.; Xue, C.; Yang, Y. Efficient representation of text with multiple perspectives. J. China Univ. Posts Telecommun. 2012, 19, 101–111. [Google Scholar] [CrossRef]
 Graven, M.; DiPasquo, D.; Freitag, D.; McCallum, A.; Mitchell, T.; Nigam, K.; Slattery, S. Learning to Extract Symbolic Knowledge form The World Wide Web. In Proceedings of the 15th National Conference for Artificial Intelligence (AAAI’98), Madison, WI, USA, 26–30 July 1998; pp. 509–516. [Google Scholar]
 Lang, K. NewsWeeder: Learning to filter netnews. In Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, USA, 9–12 July 1995; pp. 331–339. [Google Scholar]
 Lewis, D.D. Reuters21578 Text Categorization Collection. 1997. Available online: http://kdd.ics.uci.edu/databases/reuters21578/ (accessed on 19 March 2012).
 Hersh, W.R.; Buckley, C.; Leone, T.J.; Hickam, D.H. Ohsumed: An Interactive Retrieval Evaluation and New Large Test Collection for Research. In Proceedings of the 17th Annual ACM SIGIR Conference, Dublin, Ireland, 3–6 July 1994; pp. 192–201. [Google Scholar]
 UNIBS. The UNIBS Anonymized 2009 Internet Traces. 18 March 2010. Available online: http://www.ing.unibs.it/ntw/tools/traces (accessed on 12 May 2011).
 Peng, J.; Zhou, Y.; Wang, C.; Yang, Y.; Ping, Y. Early TCP Traffic Classification. J. Appl. Sci.Electron. Inf. Eng. 2011, 29, 73–77. [Google Scholar]
 Guo, C.; Zhou, Y.; Ping, Y.; Zhang, Z.; Liu, G.; Yang, Y. A Distance Sumbased Hybrid Method for Intrusion Detection. Appl. Intell. 2014, 40, 178–188. [Google Scholar] [CrossRef]
 UCI. Kdd Cup 99 Intrusion Detection Dataset. 1999. Available online: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 10 February 2016).
 Hsieh, C.J.; Chang, K.W.; Lin, C.J.; Keerthi, S.S.; Sundararajan, S. A Dual Coordinate Descent Method for Largescale Linear SVM. In Proceedings of the 25th International Conference on Machine Learning (ICML ’08), Helsinki, Finland, 5–9 July 2008; ACM: New York, NY, USA, 2008; pp. 408–415. [Google Scholar]
 Lee, D.; Jung, K.H.; Lee, J. Constructing Sparse Kernel Machines Using Attractors. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 20, 721–729. [Google Scholar]
 Lee, S.H.; Daniels, K.M. Cone Cluster Labeling for Support Vector Clustering. In Proceedings of the 6th SIAM Conference on Data Mining, Bethesda, MD, USA, 20–22 April 2006; pp. 484–488. [Google Scholar]
 Wang, C.D.; Lai, J.H. Position Regularized Support Vector Domain Description. Pattern Recogn. 2013, 46, 875–884. [Google Scholar] [CrossRef]
 Garcia, S.; Herrera, F. An Extension on “Statistical Comparisons of Classifiers over Multiple Datasets” for all Pairwise Comparisons. J. Mach. Learn. Res. 2008, 9, 2677–2694. [Google Scholar]
 Jun, S.H. Improvement of Support Vector Clustering using Evolutionary Programming and Bootstrap. Int. J. Fuzzy Logic Intell. Syst. 2008, 8, 196–201. [Google Scholar] [CrossRef] [Green Version]
 Guo, C.; Li, F. An Improved Algorithm for Support Vector Clustering based on Maximum Entropy Principle and Kernel Matrix. Expert Syst. Appl. 2011, 38, 8138–8143. [Google Scholar] [CrossRef]
 Chiang, J.H.; Hao, P.Y. A New Kernelbased Fuzzy Clustering Approach: Support Vector Clustering with Cell Growing. IEEE Trans. Fuzzy Syst. 2003, 11, 518–527. [Google Scholar] [CrossRef]
 Hua, J.; Shi, G.; Zhu, H.; Wang, F.; Liu, X.; Li, H. CAMPS: Efficient and PrivacyPreserving Medical Primary Diagnosis over Outsourced Cloud. Inf. Sci. 2019. [Google Scholar] [CrossRef]
 Sumana, M.; Hareesha, K. Modelling A Secure Support Vector Machine Classifier for Private Data. Int. J. Inf. Comput. Secur. 2018, 10, 25–41. [Google Scholar] [CrossRef]
 Jia, Q.; Guo, L.; Jin, Z.; Fang, Y. Preserving Model Privacy for Machine Learning in Distributed Systems. IEEE Trans. Parallel Distrib. Syst. 2018, 29, 1808–1822. [Google Scholar] [CrossRef]
 Liu, X.; Choo, K.R.; Deng, R.H.; Lu, R.; Weng, J. Efficient and PrivacyPreserving Outsourced Calculation of Rational Numbers. IEEE Trans. Dependable Secure Comput. 2018, 15, 27–39. [Google Scholar] [CrossRef]
 Salinas, S.; Luo, C.; Liao, W.; Li, P. Efficient Secure Outsourcing of Largescale Quadratic Programs. In Proceedings of the ASIA CCS ’16: 11th ACM on Asia Conference on Computer and Communications Security, Xi’an, China, 31 May–3 June 2016; ACM: New York, NY, USA, 2016; pp. 281–292. [Google Scholar]
 GonzalezSerrano, F.J.; NaviaVazquez, A.; AmorMartin, A. Training Support Vector Machines with privacyprotected data. Pattern Recogn. 2017, 72, 93–107. [Google Scholar] [CrossRef]
Classical SVC [1]  RSVCEO  MPPSVC  

PreComputed $\tilde{\mathit{Q}}$  Instance $\mathit{K}(\xb7,\xb7)$  Client  Server  
Step 1.  DCD Solver for coefficient indicating SVs^{*}  $O\left({N}^{3}\right)$  $O\left({N}^{2}\right)$  $O\left(d{N}^{2}\right)$  $O\left(N\right)$  $O\left({N}^{2}\right)$ 
Step 2.  Convex decomposition in one iter. using SVs  —  $O\left({N}_{\mathrm{SV}}^{2}\right)$  $O\left(d{N}_{\mathrm{SV}}^{2}\right)$  $O(\ell {N}_{\mathrm{SV}}^{2})$  $O\left(d{N}_{\mathrm{SV}}^{2}\right)$ 
Step 3.  Connectivity analysis withprior knowledge  $O(m{N}_{\mathrm{SV}}^{2}\xb7{N}_{\mathrm{SV}})$  $O\left(\rho {N}_{\mathrm{CH}}\right)$  $O\left({N}_{\mathrm{CH}}\right)$  $O\left(\rho {N}_{\mathrm{CH}}\right)$  
Step 4.  Find the connected components by DFS  $O\left({N}_{\mathrm{SV}}^{2}\right)$  —  —  —  
Step 5.  Labeling the remaining data samples  $O\left(dN{N}_{\mathrm{SV}}\right)$  $O\left(dN{N}_{\mathrm{SV}}\right)$ or $O\left(dN{N}_{\mathrm{CH}}\right)$  $O\left(N{N}_{\mathrm{SV}}\right)$ or $O\left(N{N}_{\mathrm{CH}}\right)$  $O\left(dN{N}_{\mathrm{SV}}\right)$ or $O\left(dN{N}_{\mathrm{CH}}\right)$ 
Datasets  Dataset Description  

Dims  Size  # of Classes  
wisconsin  9  683  2 
glass  9  214  7 
wine  13  178  3 
movement_libras (mLibras)  90  360  15 
P2P traffic (P2PT)  4  9206  4 
WebKB  4  4199  4 
abalone  7  4177  29 
Reuters21578(Reuters)  10  9990  10 
Ohsumed (Oh)  23  13,929  23 
20Newsgroups (20NG)  20  13,998  20 
shuttle (sh)  9  43,500  7 
kddcup99  9  494,021  5 
Dataset  CG  RCG  ESVC  CCL  FSVC  PSVC  CDCL  VCC  FSSVC  FRSVC  RSVCEO 

wisconsin  0.7793  0.8035  0.1344  0.9076  0.6687  0.2574  0.8685  0.8029 ± 0.0514  0.9248  0.8798  0.8632 
glass  0.2771  0.2823  0.2743  0.2201  0.2458  0.2801  0.2911  0.2771 ± 0.0031  0.2998  0.2582  0.3540 
wine  0.5912  0.7928  0.4159  0.8190  0.8042  0.3809  0.8961  0.8088 ± 0.0563  0.8992  0.8483  0.8185 
mLibras  0.2422  0.2356  —  0.0899  0.1421  0.2541  0.3320  0.3065 ± 0.0181  0.3703  0.3366  0.3732 
P2PT  —  0.8815  —  —  0.8367  —  0.8917  0.7389 ± 0.0066  0.8815  0.8678  0.8807 
WebKB  —  0.3072  —  —  0.5144  —  0.4645  0.4434 ± 0.0156  0.5670  0.6395  0.5381 
abalone  —  0.0332  —  —  0.0515  —  0.0603  0.0710 ± 0.0011  0.0587  0.0657  0.0996 
Reuters  —  0.0148  —  —  0.4775  —  0.8064  0.4908 ± 0.0616  0.5831  0.7295  0.8338 
Oh  —  —  —  —  —  —  —  0.4280 ± 0.0292  0.4514  0.4840  0.5019 
20NG  —  —  —  —  —  —  —  0.4397 ± 0.0461  0.3628  0.4927  0.6084 
sh  —  —  0.59 [12]  —  0.58 [12]  —  —  0.5898 ± 0.0198  0.6857  0.8050  0.7337 
kddcup99  —  —  —  —  —  —  —  —  —  —  0.7621 
Methods  Average Ranks  Unadjusted p  ${\mathit{p}}_{\mathbf{Homm}}$ 

Control Method: RSVCEO, Average Rank = 2.1667  
CG  8.3333  5.2539 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{6}$  4.2031 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{5}$ 
RCG  6.7500  7.1174 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{4}$  0.0036 
ESVC  8.7083  1.3562 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{6}$  1.3562 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{5}$ 
CCL  8.0417  1.4315 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{5}$  1.0020 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{4}$ 
FSVC  7.1250  2.5028 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{4}$  0.0015 
PSVC  8.4583  3.3728 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{6}$  2.6982 $\times \phantom{\rule{3.33333pt}{0ex}}{10}^{5}$ 
CDCL  4.6250  0.0694  0.2083 
VCC  5.5000  0.0138  0.0553 
FSSVC  2.9167  0.5796  0.5796 
FRSVC  3.3749  0.3722  0.5796 
Dataset  Scaling Factor $\mathit{\gamma}$  

${\mathbf{10}}^{\mathbf{1}}$  ${\mathbf{10}}^{\mathbf{2}}$  ${\mathbf{10}}^{\mathbf{3}}$  ${\mathbf{10}}^{\mathbf{4}}$  ${\mathbf{10}}^{\mathbf{5}}$  
wisconsin  0.8254  0.8519  0.8519  0.8519  0.8519 
glass  0.1817  0.3197  0.3540  0.3540  0.3540 
wine  0.4272  0.8185  0.8185  0.8185  0.8185 
mLibras  0.3281  0.3264  0.3415  0.3415  0.3415 
P2PT  0.7757  0.7938  0.8048  0.8048  0.8048 
WebKB  0.4824  0.5235  0.4428  0.4795  0.4795 
abalone  0.0870  0.0915  0.0917  0.0917  0.0917 
Reuters  0.7860  0.7284  0.7063  0.7062  0.7062 
Oh  0.4851  0.4954  0.4898  0.4898  0.4898 
20NG  0.4752  0.5120  0.5171  0.5169  0.5169 
sh  0.6586  0.6346  0.6346  0.6596  0.6596 
kddcup99  0.6799  0.7629  0.7714  0.7620  0.7620 
Dataset  Step 1  Step 2  Step 3  Step 5  #SVs  #Convex Hulls  

C  S  C  S  C  S  C  S  
wisconsin  1.2524  0.0194  21.6361  43.4891  1.3200  9.4463  0.0200  1.4403  64  30 
glass  0.7072  0.0059  14.4307  22.4403  1.2900  9.4809  0.0164  1.3288  44  31 
wine  1.1567  0.0567  24.8310  40.4476  1.8900  21.4400  0.0244  3.0628  51  43 
mLibras  4.3020  0.0186  195.1682  305.3468  0.8100  67.2533  0.0713  12.9299  62  23 
P2PT  4.6369  0.0805  103.4480  1577.1325  21.1350  356.6295  0.0112  8.0530  661  471 
WebKB  0.7981  0.0057  41.7010  267.3922  6.1050  37.9054  0.0031  2.7794  266  137 
abalone  0.9594  0.0131  49.0764  225.5012  2.6700  17.8104  0.0052  2.9876  189  69 
Reuters  0.5403  0.0148  27.2192  59.0562  0.5250  4.5836  0.0081  1.4473  75  15 
Oh  1.9096  0.0171  93.8732  255.4895  1.1250  18.1687  0.0175  4.8598  113  25 
20NG  0.8993  0.0140  45.5341  77.9957  0.8250  16.2751  0.0152  2.8590  63  25 
sh  1.4285  0.0259  45.9365  166.4430  0.8550  6.2177  0.0103  2.3485  141  22 
kddcup99  0.4751  1.3404  24.8795  54.2872  0.6900  4.7367  0.0068  1.3015  74  17 
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ping, Y.; Hao, B.; Hei, X.; Wu, J.; Wang, B. Maximized PrivacyPreserving Outsourcing on Support Vector Clustering. Electronics 2020, 9, 178. https://doi.org/10.3390/electronics9010178
Ping Y, Hao B, Hei X, Wu J, Wang B. Maximized PrivacyPreserving Outsourcing on Support Vector Clustering. Electronics. 2020; 9(1):178. https://doi.org/10.3390/electronics9010178
Chicago/Turabian StylePing, Yuan, Bin Hao, Xiali Hei, Jie Wu, and Baocang Wang. 2020. "Maximized PrivacyPreserving Outsourcing on Support Vector Clustering" Electronics 9, no. 1: 178. https://doi.org/10.3390/electronics9010178