# Heterogeneous Distributed Big Data Clustering on Sparse Grids


## Abstract


## 1. Introduction

## 2. Clustering on Sparse Grids

## 3. Estimating Densities on Sparse Grids

#### 3.1. Sparse Grids

#### 3.2. The Sparse Grid Density Estimation

#### 3.3. Streaming Algorithms for the Sparse Grid Density Estimation

Algorithm 1: The streaming algorithm for computing the right-hand side $\mathbf{b}$.

Algorithm 2: The streaming algorithm for computing the matrix-vector multiplication $\mathbf{v}^{\prime} = (B + \lambda I)\,\mathbf{v}$.
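As a concrete illustration of what Algorithm 1 computes: the right-hand side $\mathbf{b}$ averages the basis-function evaluations over all data points, $b_i = \frac{1}{m}\sum_{j=1}^{m} \varphi_i(\mathbf{x}_j)$, so a single streaming pass over the dataset suffices. The sketch below uses the hat functions of Section 3.1 on a small nodal grid; the hierarchical sparse grid traversal of the actual OpenCL kernel is simplified to a flat list of index tuples, and all names are illustrative. Algorithm 2's matrix-vector product, used inside the CG solver for $(B + \lambda I)\boldsymbol{\alpha} = \mathbf{b}$, streams over grid points analogously.

```python
def hat(l, i, x):
    """1-D hat function phi_{l,i}(x) = max(0, 1 - |2^l * x - i|)."""
    return max(0.0, 1.0 - abs((2 ** l) * x - i))

def basis(l, idx, point):
    """d-dimensional basis function as a product of 1-D hats. Using one
    level l for all grid points is a nodal-grid simplification; the real
    kernel walks level-index pairs of the hierarchical sparse grid."""
    result = 1.0
    for i, x in zip(idx, point):
        result *= hat(l, i, x)
    return result

def rhs(grid, l, data):
    """b_i = (1/m) * sum_j phi_i(x_j), streaming once over the dataset."""
    m = len(data)
    b = [0.0] * len(grid)
    for x in data:                      # stream over the data points
        for g, idx in enumerate(grid):
            b[g] += basis(l, idx, x)
    return [v / m for v in b]

# Toy 2-D example: a single grid point at (0.5, 0.5), i.e. index (1, 1)
# at level 1, and two data points.
print(rhs([(1, 1)], 1, [(0.5, 0.5), (0.25, 0.75)]))   # → [0.625]
```

Because the outer loop runs over data points, each point is read exactly once, which is what makes the kernel compute-bound rather than bandwidth-bound.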

## 4. Other Steps

#### 4.1. Computing the k-Nearest-Neighbor Graph

Algorithm 3: A variant of the $\mathcal{O}\left({m}^{2}\right)$ k-nearest-neighbor algorithm that uses $b$ bins.
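Algorithm 3 refines the quadratic-complexity baseline with $b$ bins; the baseline itself can be sketched as follows (the binning, which avoids the full sort of the candidate distances, is omitted here, and the function name is illustrative):

```python
def knn_graph(data, k):
    """Brute-force O(m^2) k-nearest neighbors: for every point, scan all
    other points and keep the k smallest squared distances."""
    m = len(data)
    graph = []
    for i in range(m):
        dists = []
        for j in range(m):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(data[i], data[j]))
            dists.append((d2, j))
        dists.sort()                        # Algorithm 3 replaces this full
        graph.append([j for _, j in dists[:k]])  # sort with b bins
    return graph

# Two well-separated pairs of points; each point's nearest neighbor is
# its partner in the pair.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
print(knn_graph(points, 1))   # → [[1], [0], [3], [2]]
```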

#### 4.2. Pruning the k-Nearest-Neighbor Graph

Algorithm 4: A streaming algorithm for pruning low-density nodes and edges of the k-nearest-neighbor graph. The density function is evaluated at the location of the nodes and at the midpoints of the edges.
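Algorithm 4 needs the estimated density only as a black box. A minimal sketch under that assumption, with a hypothetical toy density standing in for the sparse grid estimate and an illustrative threshold $t$:

```python
def prune(graph, data, density, t):
    """Remove nodes whose density is below t, and edges whose midpoint
    density is below t (mirroring Algorithm 4). Pruned nodes keep an
    empty neighbor list."""
    keep = [density(x) >= t for x in data]
    pruned = []
    for i, neighbors in enumerate(graph):
        if not keep[i]:
            pruned.append([])
            continue
        kept_edges = []
        for j in neighbors:
            mid = tuple((a + b) / 2 for a, b in zip(data[i], data[j]))
            if keep[j] and density(mid) >= t:
                kept_edges.append(j)
        pruned.append(kept_edges)
    return pruned

# Toy density: high near the origin, zero elsewhere. Node 2 lies in the
# low-density region, so it and its incident edges are pruned.
density = lambda x: 1.0 if sum(v * v for v in x) < 0.5 else 0.0
data = [(0.0, 0.0), (0.1, 0.1), (2.0, 2.0)]
graph = [[1, 2], [0, 2], [0, 1]]
print(prune(graph, data, density, 0.5))   # → [[1], [0], []]
```

Evaluating the density at edge midpoints as well as at the nodes cuts edges that cross low-density valleys between two dense regions, which is what separates touching clusters.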

#### 4.3. Connected Component Detection

## 5. Implementation

#### 5.1. Node-Level Implementation

#### 5.2. Distributed Implementation

## 6. Results

#### 6.1. Hardware Platforms

#### 6.2. Datasets and Experimental Setup

#### 6.3. Node-Level Performance and Performance-Portability

#### 6.4. Clustering Quality and Parameter Tuning

#### 6.5. Distributed Results on Hazel Hen

#### 6.6. Distributed Results on Piz Daint

## 7. Discussion and Future Work

## 8. Materials and Methods

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References


**Figure 1.** The application of the sparse grid clustering algorithm to a 2-D dataset with three slightly overlapping clusters. After calculating the sparse grid density estimation and the k-nearest-neighbor graph, the graph is pruned using the density estimation. This splits the graph into three connected components.
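As the caption notes, the clusters are exactly the connected components of the pruned graph (Section 4.3). A minimal breadth-first sketch, assuming an adjacency-list representation in which pruned nodes keep an empty neighbor list; the edges are treated as undirected by symmetrizing the (directed) k-nearest-neighbor lists first:

```python
from collections import deque

def connected_components(graph):
    """Label each node with the index of its connected component via
    breadth-first search over the symmetrized adjacency lists."""
    m = len(graph)
    adj = [set(n) for n in graph]
    for i, neighbors in enumerate(graph):
        for j in neighbors:          # symmetrize: i-j counts as an edge
            adj[j].add(i)            # if it survives in either direction
    label = [-1] * m
    current = 0
    for start in range(m):
        if label[start] != -1:
            continue
        queue = deque([start])
        label[start] = current
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if label[j] == -1:
                    label[j] = current
                    queue.append(j)
        current += 1
    return label

# Two components {0, 1} and {2, 3}; node 4 is isolated (e.g. pruned away)
# and ends up as a singleton component.
print(connected_components([[1], [0], [3], [2], []]))   # → [0, 0, 1, 1, 2]
```

Singleton components such as node 4 above would be reported as noise rather than as clusters.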

**Figure 3.** The nodal grid and the sparse grid in Figure 3a,b both have discretization level $l=3$ and are identical for $d=1$. Both use hat functions $\varphi_{l,i}$ as basis functions, but in a nodal and a hierarchical formulation, respectively. Note that sparse grids employ fewer grid points than full grids of the same level for $d \ge 2$ (see Figure 2).

**Figure 4.** The effect of the regularization parameter $\lambda$ on a 2-D dataset with four data points. For smaller $\lambda$ values, the function becomes more similar to the initial density guess of Dirac $\delta$ functions. The function becomes smoother for higher values of $\lambda$.
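The role of $\lambda$ in the figure can be made precise via the linear system behind Algorithms 1 and 2, which determines the surplus vector $\boldsymbol{\alpha}$ of the density estimate. A sketch of the formulation, consistent with the notation of those algorithms (the exact definition of $B$ and of the regularization functional it absorbs follows Section 3.2 and is not reproduced here):

```latex
% The density estimate is a weighted sum of sparse grid basis functions;
% its surplus vector alpha solves a regularized linear system.
f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i \, \varphi_i(\mathbf{x}),
\qquad
(B + \lambda I)\,\boldsymbol{\alpha} = \mathbf{b}.
```

A larger $\lambda$ weights the identity (penalty) term more heavily, damping the surpluses $\alpha_i$ and thus smoothing $f$, which matches the behavior shown in the figure.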

**Figure 5.** The distributed clustering algorithm from the perspective of the manager node. The (inexpensive) assignment of index ranges is not shown.

**Figure 6.** The duration of the node-level experiments with one million data points. Because the 1M-100C dataset requires a larger grid, the density estimation takes up most of the overall runtime.

**Table 1.** The number of floating-point operations for the different OpenCL kernels and the arithmetic intensities (in floating-point operations per byte) for a work-group size (ws) of one thread and of 128 threads. The peak limit states the achievable fraction of the peak performance given the instruction mix of the computed kernels.

Kernel | FP Ops./Complexity | Arith. Int. (ws = 1) | Arith. Int. (ws = 128) | Peak Lim. (%) |
---|---|---|---|---|
density right-hand side | $N \cdot m \cdot d \cdot 6$ | 1.5 FB^{−1} | 192 FB^{−1} | 67% |
density matrix-vector | $\text{CG-iter.} \cdot N^2 \cdot d \cdot 14$ | 1.2 FB^{−1} | 149 FB^{−1} | 64% |
create graph | $m^2 \cdot d \cdot 3$ | 1.0 FB^{−1} | 129 FB^{−1} | 83% |
prune graph | $m \cdot N \cdot (k+1) \cdot d \cdot 6$ | 4.5 FB^{−1} | 576 FB^{−1} | 67% |

**Table 2.** The hardware platforms used in the distributed and node-level experiments. We list the frequency type that best matches our observations during the experiments.

Device | Type | Cores/Shaders | Frequency | Peak (SP) | Mem. Bandw. | Machine Balance |
---|---|---|---|---|---|---|
Tesla P100 | GPU | 3584 | 1.3 GHz (boost) | 9.5 TFLOPS | 720 GB s^{−1} | 12.9 FB^{−1} |
FirePro W8100 | GPU | 2560 | 0.8 GHz (max) | 4.2 TFLOPS | 320 GB s^{−1} | 13.2 FB^{−1} |
2x Xeon E5-2680v3 | CPU | 24 | 2.5 GHz (base) | 1.9 TFLOPS | 137 GB s^{−1} | 14.0 FB^{−1} |

**Table 3.** The datasets used in the node-level and distributed experiments.

Name | Size | Clust. | $\sigma$ | Dim. | Dist. | Noise | Type |
---|---|---|---|---|---|---|---|
10M-3C | 10M | 3 | 0.12 | 10 | $3 \cdot \sigma$ | 0% | distributed |
100M-3C | 100M | 3 | 0.12 | 10 | $3 \cdot \sigma$ | 0% | distributed |
1M-10C | 1M | 10 | 0.05 | 10 | $7 \cdot \sigma$ | 2% | node-level |
1M-100C | 1M | 100 | 0.05 | 10 | $7 \cdot \sigma$ | 2% | node-level |
10M-100C | 10M | 100 | 0.05 | 10 | $7 \cdot \sigma$ | 2% | node-level |

**Table 4.** The parameters used for configuring the clustering algorithm and the adjusted Rand index (ARI) for the node-level experiments. In the distributed runs, the threshold was specified as a fraction of the maximum surplus of the density function. The node-level runs used an absolute threshold value.

Name | λ | Threshold t | Level | Grid Points | CG $\epsilon$ | k | ARI | Type |
---|---|---|---|---|---|---|---|---|
1M-10C | 1E-5 | 667 | 6 | 76k | 1E-2 | 6 | 1.0 | node-level |
1M-100C | 1E-6 | 556 | 7 | 0.4M | 1E-2 | 6 | 0.85 | node-level |
10M-10C | 1E-5 | 1167 | 7 | 0.4M | 1E-2 | 6 | 1.0 | node-level |
10M-100C | 1E-6 | 1000 | 7 | 0.4M | 1E-2 | 6 | 0.90 | node-level |
10M-3C | 1E-6 | $0.7 \cdot \max(\boldsymbol{\alpha})$ | 7 | 0.4M | 1E-3 | 5 | - | distributed |
100M-3C | 1E-6 | $0.7 \cdot \max(\boldsymbol{\alpha})$ | 8 | 1.9M | 1E-3 | 5 | - | distributed |

**Table 5.** The node-level performance of the clustering algorithm. All results are for single-precision arithmetic. The performance was measured with the 10M-10C dataset and the parameters listed in Table 4. Note that the achievable peak performance is limited by the instruction mix to values significantly below 100%.

Kernel | Result Type | Tesla P100 | FirePro W8100 | 2xE5-2680v3 |
---|---|---|---|---|
dens. right-hand side | GFLOPS | 4584 | 2271 (753 MHz) | 1177 |
limit: 67% peak | peak (of lim.) | 48% (72%) | 59% (88%) | 61% (91%) |
dens. matrix-vector | GFLOPS | 4090 | 1939 (759 MHz) | 919 |
limit: 64% peak | peak (of lim.) | 43% (67%) | 50% (78%) | 48% (75%) |
create graph | GFLOPS | 5474 | 1433 (467 MHz) | 852 |
limit: 83% peak | peak (of lim.) | 58% (70%) | 60% (72%) | 44% (53%) |
prune graph | GFLOPS | 5360 | 1817 (822 MHz) | 1265 |
limit: 67% peak | peak (of lim.) | 56% (84%) | 43% (64%) | 66% (99%) |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pfander, D.; Daiß, G.; Pflüger, D.
Heterogeneous Distributed Big Data Clustering on Sparse Grids. *Algorithms* **2019**, *12*, 60.
https://doi.org/10.3390/a12030060
