# Refined Mode-Clustering via the Gradient of Slope


## Abstract


## 1. Introduction

- We propose a new clustering method based on the slope function that assigns an additional attribute label to each cluster (Section 3).
- We propose new two-sample tests that use the clustering result (Section 4).
- We introduce a visualization method using the detected clusters (Algorithm 3).
- We derive both statistical and computational guarantees of the proposed method (Section 7).

## 2. Review of Mode-Clustering

## 3. Clustering via the Gradient of Slope

#### 3.1. Refining the Clusters by the Gradient of Slope

**Lemma 1.**

#### 3.2. Type of Clusters

#### 3.3. Estimators

**Algorithm 1:** Slope minimization via gradient descent.

1. Input: the kernel density estimator ${\widehat{p}}_{n}\left(x\right)$ and a point $x$.
2. Initialize ${x}_{0}=x$ and iterate the following update until convergence ($\gamma$ is a step size that can be set to a constant):
$${x}_{t}={x}_{t-1}-\gamma \cdot {\nabla}^{2}{\widehat{p}}_{n}\left({x}_{t-1}\right)\nabla {\widehat{p}}_{n}\left({x}_{t-1}\right).$$
3. Output: the convergent point ${x}_{\infty}$.
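Since $\nabla\big(\tfrac{1}{2}\|\nabla \widehat{p}_n(x)\|^2\big) = \nabla^2 \widehat{p}_n(x)\,\nabla \widehat{p}_n(x)$, the iteration above is gradient descent on the squared gradient norm of the density estimate. A minimal sketch with a Gaussian kernel (the function names and the constant step size are our own; the KDE normalizing constant is dropped because it only rescales $\gamma$):

```python
import numpy as np

def kde_grad_hess(x, data, h):
    """Gradient and Hessian of a Gaussian KDE at x, up to one positive constant."""
    n, d = data.shape
    diff = (x - data) / h                        # (n, d) scaled differences
    w = np.exp(-0.5 * np.sum(diff**2, axis=1))   # Gaussian kernel weights
    grad = -(w[:, None] * diff).sum(axis=0) / (n * h)
    # Hessian of the kernel sum: sum_i w_i * (diff_i diff_i^T - I) / h^2
    hess = (np.einsum('i,ij,ik->jk', w, diff, diff) - w.sum() * np.eye(d)) / (n * h**2)
    return grad, hess

def slope_descent(x, data, h, gamma=0.01, tol=1e-8, max_iter=5000):
    """Iterate x_t = x_{t-1} - gamma * Hessian(p)(x) @ grad(p)(x) until convergence."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        g, H = kde_grad_hess(x, data, h)
        step = gamma * H @ g
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x
```

Running this from every data point and grouping points by their convergent local minima yields the clusters.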

## 4. Enhancements in Two-Sample Tests

**Algorithm 2:** Local two-sample test.

1. Combine the two samples ${G}_{1}$ and ${G}_{2}$ into one pooled sample ${G}_{\mathsf{all}}$ and compute ${r}_{0}=\frac{N}{N+M}$ from Equation (6).
2. Construct a kernel density estimator using ${G}_{\mathsf{all}}$ and its slope function, and apply Algorithm 1 to form clusters based on the convergent points.
3. Assign an attribute to each cluster according to Equation (2).
4. Let the robust clusters and boundary clusters be ${D}_{1},{D}_{2},\dots ,{D}_{J}$, where ${D}_{j}\subset {G}_{\mathsf{all}}$ for each $j$.
5. For each cluster ${D}_{j}$, compute ${r}_{j}$ from Equation (7), construct the Z statistic
$${Z}_{j}=\frac{{r}_{j}-{r}_{0}}{\sqrt{{r}_{0}(1-{r}_{0})/{n}_{j}}},$$
and find the corresponding p-value ${p}_{j}$.
6. Reject ${H}_{0}$ at significance level $\alpha$ if ${p}_{j}<\alpha /J$ for some $j$.
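Steps 5–6 reduce to a per-cluster proportion Z-test with a Bonferroni correction over the $J$ clusters. A sketch under the assumption that the cluster labels and group memberships are already available (the function name is ours):

```python
import numpy as np
from math import erf, sqrt

def local_two_sample_test(labels, from_g1, alpha=0.05):
    """Per-cluster Z-test of H0: both samples come from the same distribution.

    labels:  cluster label of each pooled point (robust/boundary clusters only)
    from_g1: boolean array, True if the point came from sample G1
    """
    from_g1 = np.asarray(from_g1, dtype=bool)
    r0 = from_g1.mean()                      # overall proportion N / (N + M)
    clusters = np.unique(labels)
    J = len(clusters)
    results = {}
    for c in clusters:
        mask = labels == c
        n_j = mask.sum()
        r_j = from_g1[mask].mean()           # within-cluster proportion r_j
        z = (r_j - r0) / sqrt(r0 * (1 - r0) / n_j)
        p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal p-value
        results[c] = (z, p, p < alpha / J)   # Bonferroni-corrected decision
    reject_h0 = any(r[2] for r in results.values())
    return r0, results, reject_h0
```

Rejecting when any cluster clears the $\alpha/J$ threshold controls the family-wise error rate at level $\alpha$.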

#### 4.1. An Approximation Method

## 5. Simulations

#### 5.1. Clustering

**Two-Gaussian mixture.** We sample $n=400$ points from a mixture of two-dimensional normals $N({\mathit{\mu}}_{1},\mathbf{\Sigma})$ and $N({\mathit{\mu}}_{2},\mathbf{\Sigma})$ with equal proportions under the following three scenarios:

- Spherical: ${\mathit{\mu}}_{1}=\mathbf{0}$, ${\mathit{\mu}}_{2}=3{e}_{1}+3{e}_{2}$, and $\Sigma ={I}_{2}$.
- Elliptical: ${\mathit{\mu}}_{1}=\mathbf{0}$, ${\mathit{\mu}}_{2}=3{e}_{1}+3{e}_{2}$, and $\Sigma =\mathrm{diag}(1,3)$. (Note that these clusters are elongated in the noise directions.)
- Outliers: the same construction as Spherical, but with 60 additional random noise points drawn from a uniform distribution over $(-5,8)\times (-5,8)$. By design, the outliers add only a little ambiguity to the two clusters.
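The three scenarios can be generated as follows (a sketch; the sampler name is ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def two_gaussian_sample(scenario, n=400, n_noise=60):
    """Draw one dataset for the Spherical / Elliptical / Outliers scenarios."""
    mu1, mu2 = np.zeros(2), np.array([3.0, 3.0])        # mu2 = 3 e1 + 3 e2
    cov = np.diag([1.0, 3.0]) if scenario == "Elliptical" else np.eye(2)
    comp = rng.integers(0, 2, size=n)                   # equal mixture proportions
    x = np.where(comp[:, None] == 0,
                 rng.multivariate_normal(mu1, cov, size=n),
                 rng.multivariate_normal(mu2, cov, size=n))
    if scenario == "Outliers":
        noise = rng.uniform(-5, 8, size=(n_noise, 2))   # uniform on (-5, 8)^2
        x = np.vstack([x, noise])
    return x
```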

**Four-Gaussian mixture.** To show how boundary clusters can serve as bridges among robust clusters, we consider a four-Gaussian mixture. We sample $n=800$ points from a mixture of four two-dimensional normals $N(0,0.1{I}_{2})$, $N(0.5{e}_{1},0.1{I}_{2})$, $N(0.5{e}_{2},0.1{I}_{2})$, and $N(0.5{e}_{1}+0.5{e}_{2},0.1{I}_{2})$ with equal proportions. We then apply our method and display the result in Figure 3. Each colored region is the basin of attraction of a local minimum of $s\left(x\right)$, and the red ‘+’s are the local minima corresponding to each basin of attraction. The robust clusters are clearly connected by the boundary clusters, so the additional attributes provide useful information on the connectivity among density modes.

**Comparison.** To better illustrate the strength of our proposed method, we generate an unbalanced four-Gaussian mixture. We sample $n=2400$ points from a mixture of four two-dimensional normals $N(0,0.5{I}_{2})$, $N(2{e}_{1},0.5{I}_{2})$, $N(5{e}_{2},0.5{I}_{2})$, and $N(2{e}_{1}+5{e}_{2},0.5{I}_{2})$ with proportions $\frac{5}{12},\frac{5}{12},\frac{1}{12},\frac{1}{12}$, respectively. We then apply our method and compare it with density-based spatial clustering of applications with noise (DBSCAN) [28] in Figure 4.

DBSCAN is a classical non-parametric, density-based clustering method that estimates the density around each data point by counting the points in a neighborhood and applies a threshold to identify core, border, and noise points. It requires two parameters: the minimum number of nearby points required to form a core point (minPts) and the radius of the neighborhood around a point (eps). Two points are connected if they are within distance eps of each other; clusters are the connected components of connected core points. Border points are points connected to a core point but without enough neighbors to be core points themselves. Here, we investigate the feasibility of using border points to detect the connectivity of clusters.

The two parameters minPts and eps are difficult to choose well. In the top two rows of Figure 4, we set minPts to 5 and 10 and vary eps to see whether the border points (gray points) reveal the connectivity of the core points. Our results show that border points cannot recover the connectivity of the top two clusters and the bottom two clusters at the same time. When we are able to detect the connectivity of the bottom two clusters (panel (f)), we are not able to find the top two clusters. Conversely, when we can find the connectivity of the top two clusters (panels (c,h)), the bottom two clusters have already merged into a single cluster.
The limitation of DBSCAN is that it is based on density level sets, so it is not applicable when the structures of interest involve different density values. In contrast, our method requires only one parameter, the bandwidth, and performs well in this case. As Figure 4i–l shows, our method detects the four robust clusters and their boundaries correctly. This result also shows that our method is robust to the bandwidth selection.
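The parameter sensitivity described above is easy to reproduce. A sketch using scikit-learn's `DBSCAN` (assuming scikit-learn is installed; the mixture follows the comparison setup, and the specific eps values are our own illustrative choices):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# Unbalanced four-Gaussian mixture (proportions 5:5:1:1, covariance 0.5 * I)
means = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 5.0], [2.0, 5.0]])
counts = [1000, 1000, 200, 200]                      # n = 2400 in total
x = np.vstack([m + np.sqrt(0.5) * rng.standard_normal((c, 2))
               for m, c in zip(means, counts)])

for eps in (0.2, 0.4, 0.6):
    labels = DBSCAN(eps=eps, min_samples=10).fit_predict(x)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int((labels == -1).sum())              # sklearn marks noise as -1
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

Small changes in eps move points between the core, border, and noise categories, which is exactly the sensitivity seen in the top rows of Figure 4.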

#### 5.2. Two-Sample Test

## 6. Real Data Application

#### 6.1. Applications to Astronomy

#### 6.2. Application to GvHD Data

**Algorithm 3:** Visualization based on the slope function.

1–4. The same steps as in Algorithm 2.
5. Let the robust clusters be $\{{R}_{1},{R}_{2},\dots ,{R}_{{J}_{1}}\}$ and the boundary clusters be $\{{B}_{1},{B}_{2},\dots ,{B}_{{J}_{2}}\}$.
6. For each pair ${R}_{{j}_{1}}$ and ${B}_{{j}_{2}}$, compute their Hausdorff distance (the minimal distance over all pairs of points):
$${\mathrm{edge}}_{{j}_{1},{j}_{2}}=\mathrm{Haus}\left({R}_{{j}_{1}},{B}_{{j}_{2}}\right).$$
7. Apply multidimensional scaling to the local minima corresponding to the robust and boundary clusters. Let their two-dimensional representation points be ${s}_{1}^{*},\cdots ,{s}_{{J}_{1}+{J}_{2}}^{*}$.
8. For each cluster ${D}_{j}$ in $\{{R}_{1},\dots ,{R}_{{J}_{1}},{B}_{1},\dots ,{B}_{{J}_{2}}\}$, plot a pie chart centered at the corresponding ${s}_{j}^{*}$ with radius proportional to $\sqrt{|{D}_{j}|}$. The pie chart contains two groups with ratios $\left(\frac{|{D}_{j}\cap {G}_{1}|}{|{D}_{j}|},\frac{|{D}_{j}\cap {G}_{2}|}{|{D}_{j}|}\right)$.
9. Label the robust and boundary clusters, and add an edge between robust cluster ${R}_{{j}_{1}}$ and boundary cluster ${B}_{{j}_{2}}$ if ${\mathrm{edge}}_{{j}_{1},{j}_{2}}\le 4\times \sqrt{{h}^{2}\times d}$, where $d$ is the number of dimensions.
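Steps 6 and 9 can be sketched as follows (the function names are ours; following the parenthetical in Step 6, the cluster distance is computed as the minimal distance over all point pairs, and the threshold is $4\sqrt{h^2 d}$):

```python
import numpy as np

def min_pair_distance(a, b):
    """Minimal Euclidean distance over all pairs of points from clusters a and b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # (|a|, |b|) distances
    return d.min()

def build_edges(robust, boundary, h):
    """Connect robust cluster R_{j1} and boundary cluster B_{j2} when their
    cluster distance is at most the threshold 4 * sqrt(h^2 * d)."""
    dim = robust[0].shape[1]                 # number of dimensions d
    thresh = 4 * np.sqrt(h**2 * dim)
    edges = []
    for j1, r in enumerate(robust):
        for j2, b in enumerate(boundary):
            if min_pair_distance(r, b) <= thresh:
                edges.append((j1, j2))
    return edges
```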

## 7. Theory

**Assumptions.**

- (P) The density function $p\left(x\right)$ is bounded and four-times continuously differentiable.
- (L) $s\left(x\right)$ is a Morse function.
- (K) The kernel $K$ is bounded and four-times continuously differentiable. Moreover, the collection of kernel functions and their partial derivatives up to the third order satisfies the VC-type conditions in Giné and Guillou [36]. See Appendix A for more details.

#### 7.1. Estimation Consistency

**Theorem 1.**

- (A1) There exists ${\eta}_{1}>0$ such that for any point x with $\parallel \nabla s\left(x\right)\parallel \le {\eta}_{1}$ and $0>-{\lambda}_{0}^{\prime}/2\ge {\lambda}_{(s,d)}\left(x\right)$, we have ${min}_{m\in \mathcal{S}}\parallel m-x\parallel \le \frac{{\lambda}_{0}^{\prime}}{2d{c}_{1}}$, where $0<{\lambda}_{0}^{\prime}\le \left|{\lambda}_{(s,l)}\left(m\right)\right|$ for $l=1,2,\dots ,d$ and $m\in \mathcal{S}$.

- $\left|\mathcal{S}\right|=|\widehat{\mathcal{S}}|$, and
- for every point $m\in \mathcal{S}$, there exists a unique element $\widehat{m}\in \widehat{\mathcal{S}}$ such that$$\parallel \widehat{m}-m\parallel =O\left({h}^{2}\right)+{O}_{P}\left(\sqrt{\frac{1}{n{h}^{d+4}}}\right).$$

**Corollary 1.**

**Theorem 2.**

**Remark.** It is possible to obtain clustering consistency in the sense that the clusterings based on s and ${\widehat{s}}_{n}$ are asymptotically the same [41]. In [41], the authors placed conditions on the density function and showed that mode-clustering of $\widehat{p}$ leads to a consistent partition of the data compared to mode-clustering of p. If we generalize their conditions to the slope s, we obtain a similar clustering consistency result.

#### 7.2. Algorithmic Consistency

- (A2) There are positive numbers ${R}_{0},{\eta}_{1},{\lambda}_{0}>0$ such that for all $x\in B(m,{R}_{0})$, where $m\in \mathcal{S}$ and $B(m,{R}_{0})$ is the ball with center m and radius ${R}_{0}$, all eigenvalues of the Hessian matrix ${\nabla}^{2}s\left(x\right)$ are above ${\lambda}_{0}$ and $\parallel \nabla s\left(x\right)\parallel \le {\eta}_{1}$.

**Theorem 3.**

## 8. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Appendix A. Proofs

**Proof of Lemma 1**:

**Proof of Theorem 1**:

**Theorem A1.**

**Theorem A2.**

**Proof of Theorem 2**:

**Lemma A1.**

- Property 1: When a function $f\left(x\right)$ has an L-Lipschitz continuous gradient, then$$f\left(x\right)-f\left(y\right)\le \langle x-y,\nabla f\left(y\right)\rangle +\frac{L}{2}{\parallel x-y\parallel}^{2}\phantom{\rule{1.em}{0ex}}\mathrm{for}\phantom{\rule{4.pt}{0ex}}\mathrm{every}\phantom{\rule{4.pt}{0ex}}x,y\in {\mathbb{R}}^{n}.$$In addition, the constant L is greater than or equal to the maximum eigenvalue of the Hessian matrix of $f\left(x\right)$.
- Property 2: Let ${f}^{*}=f\left({x}^{*}\right)={min}_{x}f\left(x\right)$, where ${x}^{*}$ is the true minimum of the function $f\left(x\right)$. The function $f\left(x\right)$ is called ${C}_{m}$-strongly convex if and only if there exists a constant ${C}_{m}>0$ such that $f\left(x\right)-\frac{{C}_{m}}{2}{\parallel x\parallel}^{2}$ is a convex function. In addition, for each step t, we have:$${f}^{*}-f\left({x}_{t}\right)\ge {({x}^{*}-{x}_{t})}^{T}\nabla f\left({x}_{t}\right)+\frac{{C}_{m}}{2}{\parallel {x}^{*}-{x}_{t}\parallel}^{2},$$which implies$${\left({x}_{t}-{x}^{*}\right)}^{T}\nabla f\left({x}_{t}\right)\ge f\left({x}_{t}\right)-{f}^{*}+\frac{{C}_{m}}{2}{\parallel {x}^{*}-{x}_{t}\parallel}^{2}.$$
- Property 3: Let ${f}^{*}=f\left({x}^{*}\right)=0$, where ${x}^{*}$ is the true minimum of the function $f\left(x\right)$. Assume the function $f\left(x\right)$ has an L-Lipschitz continuous gradient. Then, we have:$$f\left(x\right)\ge \frac{1}{2L}{\parallel \nabla f\left(x\right)\parallel}^{2}+{f}^{*}.$$
- Property 4: Under the settings of Property 2 and Property 3, we have:$${\parallel \nabla f\left(x\right)\parallel}^{2}\ge {C}_{m}^{2}{\parallel x-{x}^{*}\parallel}^{2}\ge \frac{2(f\left(x\right)-{f}^{*}){C}_{m}^{2}}{L}\ge 2f\left(x\right){C}_{m}^{2}/L.$$
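These properties can be checked numerically on a strongly convex quadratic $f(x)=\tfrac{1}{2}x^{T}Ax$ with $A$ symmetric positive definite, for which $L$ and $C_m$ are the largest and smallest eigenvalues of $A$ and $x^*=0$, $f^*=0$ (a sketch; the particular matrix is an arbitrary choice of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[3.0, 1.0], [1.0, 2.0]])      # symmetric positive definite
L = np.linalg.eigvalsh(A).max()             # Lipschitz constant of the gradient
Cm = np.linalg.eigvalsh(A).min()            # strong-convexity constant

f = lambda x: 0.5 * x @ A @ x               # x* = 0, f* = f(0) = 0
grad = lambda x: A @ x

for _ in range(100):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    # Property 1: the descent lemma
    assert f(x) - f(y) <= (x - y) @ grad(y) + 0.5 * L * np.sum((x - y)**2) + 1e-9
    # Property 2 (with x_t = x, x* = 0): strong-convexity lower bound
    assert -f(x) >= (-x) @ grad(x) + 0.5 * Cm * np.sum(x**2) - 1e-9
    # Property 3: Polyak-Lojasiewicz-type bound with f* = 0
    assert f(x) >= np.sum(grad(x)**2) / (2 * L) - 1e-9
    # Property 4: the chained bound
    assert np.sum(grad(x)**2) >= 2 * f(x) * Cm**2 / L - 1e-9
```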

**Proof of Lemma A1**:

**Proof of Theorem 3**:

## References

1. Li, J.; Ray, S.; Lindsay, B.G. A Nonparametric Statistical Approach to Clustering via Mode Identification. J. Mach. Learn. Res. 2007, 8, 1687–1723.
2. Chacón, J.E. Clusters and water flows: A novel approach to modal clustering through Morse theory. arXiv 2012, arXiv:1212.1384.
3. Arias-Castro, E.; Mason, D.; Pelletier, B. On the Estimation of the Gradient Lines of a Density and the Consistency of the Mean-Shift Algorithm. J. Mach. Learn. Res. 2016, 17, 1–28.
4. Chen, Y.C.; Genovese, C.R.; Wasserman, L. A comprehensive approach to mode-clustering. Electron. J. Stat. 2016, 10, 210–241.
5. Fukunaga, K.; Hostetler, L. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inf. Theory 1975, 21, 32–40.
6. Cheng, Y. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1995, 17, 790–799.
7. Carreira-Perpiñán, M.Á. A review of mean-shift algorithms for clustering. arXiv 2015, arXiv:1503.00687.
8. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer Series in Statistics; Springer: New York, NY, USA, 2001.
9. Hennig, C.; Meila, M.; Murtagh, F.; Rocci, R. Handbook of Cluster Analysis; CRC Press: Boca Raton, FL, USA, 2015.
10. York, D.G.; Adelman, J., Jr.; Anderson, J.J.E.; Bahcall, N.A.; Yasuda, N. The Sloan Digital Sky Survey: Technical Summary. Astron. J. 2000, 120, 1579–1587.
11. Comaniciu, D.; Meer, P. Mean shift analysis and applications. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece, 20–27 September 1999; Volume 2, pp. 1197–1203.
12. Chacón, J.E.; Duong, T. Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting. Electron. J. Stat. 2013, 7, 499–532.
13. Chacón, J.E. A population background for nonparametric density-based clustering. Stat. Sci. 2015, 30, 518–532.
14. Chen, Y.C. A tutorial on kernel density estimation and recent advances. Biostat. Epidemiol. 2017, 1, 161–187.
15. Scrucca, L. Identifying Connected Components in Gaussian Finite Mixture Models for Clustering. Comput. Stat. Data Anal. 2016, 93, 5–17.
16. Bonis, T.; Oudot, S. A fuzzy clustering algorithm for the mode-seeking framework. Pattern Recognit. Lett. 2018, 102, 37–43.
17. Jiang, H.; Kpotufe, S. Modal-set estimation with an application to clustering. In Artificial Intelligence and Statistics; PMLR: Fort Lauderdale, FL, USA, 2017; pp. 1197–1206.
18. Menardi, G. A Review on Modal Clustering. Int. Stat. Rev. 2015, 84.
19. Morse, M. Relations Between the Critical Points of a Real Function of n Independent Variables. Trans. Am. Math. Soc. 1925, 27, 345–396.
20. Milnor, J.; Spivak, M.; Wells, R. Morse Theory (AM-51); Annals of Mathematics Studies; Princeton University Press: Princeton, NJ, USA, 1963; Volume 51.
21. Banyaga, A.; Hurtubise, D. Lectures on Morse Homology; Texts in the Mathematical Sciences; Springer: Amsterdam, The Netherlands, 2013.
22. Matsumoto, Y. An Introduction to Morse Theory; American Mathematical Society: Providence, RI, USA, 2002.
23. Wasserman, L. All of Nonparametric Statistics; Springer Texts in Statistics; Springer: Berlin/Heidelberg, Germany, 2006.
24. Chacón, J.E.; Duong, T.; Wand, M.P. Asymptotics for general multivariate kernel density derivative estimators. Stat. Sin. 2011, 21, 807.
25. Scott, D. Multivariate Density Estimation: Theory, Practice, and Visualization; Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 2015.
26. Breiman, L.; Meisel, W.; Purcell, E. Variable Kernel Estimates of Multivariate Densities. Technometrics 1977, 19, 135–144.
27. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman and Hall: London, UK, 1986.
28. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
29. Székely, G.J.; Rizzo, M.L. Testing for equal distributions in high dimensions. InterStat 2004, 5, 1249–1272.
30. Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A Kernel Two-sample Test. J. Mach. Learn. Res. 2012, 13, 723–773.
31. Massey, F.J. The Kolmogorov-Smirnov Test for Goodness of Fit. J. Am. Stat. Assoc. 1951, 46, 68–78.
32. Bond, J.R.; Kofman, L.; Pogosyan, D. How filaments of galaxies are woven into the cosmic web. Nature 1996, 380, 603.
33. Koester, B.; McKay, T.; Annis, J.; Wechsler, R.H.; Evrard, A.; Bleem, L.; York, D. A MaxBCG catalog of 13,823 galaxy clusters from the Sloan Digital Sky Survey. Astrophys. J. 2007, 660, 239–255.
34. Koester, B.P.; McKay, T.A.; Annis, J.; Wechsler, R.H.; Evrard, A.E.; Rozo, E.; Bleem, L.; Sheldon, E.S.; Johnston, D. MaxBCG: A Red-Sequence Galaxy Cluster Finder. Astrophys. J. 2007, 660, 221–238.
35. Brinkman, R.R.; Gasparetto, M.; Lee, S.J.J.; Ribickas, A.J.; Perkins, J.; Janssen, W.; Smiley, R.; Smith, C. High-Content Flow Cytometry and Temporal Data Analysis for Defining a Cellular Signature of Graft-Versus-Host Disease. Biol. Blood Marrow Transplant. 2007, 13, 691–700.
36. Giné, E.; Guillou, A. Rates of strong uniform consistency for multivariate kernel density estimators. In Annales de l'Institut Henri Poincaré (B) Probability and Statistics; Elsevier: Amsterdam, The Netherlands, 2002; Volume 38, pp. 907–921.
37. Genovese, C.R.; Perone-Pacifico, M.; Verdinelli, I.; Wasserman, L. The geometry of nonparametric filament estimation. J. Am. Stat. Assoc. 2012, 107, 788–799.
38. Genovese, C.R.; Perone-Pacifico, M.; Verdinelli, I.; Wasserman, L. Nonparametric ridge estimation. Ann. Stat. 2014, 42, 1511–1545.
39. Vieu, P. A note on density mode estimation. Stat. Probab. Lett. 1996, 26, 297–307.
40. Chazal, F.; Fasy, B.; Lecci, F.; Michel, B.; Rinaldo, A.; Wasserman, L. Robust topological inference: Distance to a measure and kernel distance. J. Mach. Learn. Res. 2017, 18, 5845–5884.
41. Chen, Y.C.; Genovese, C.R.; Wasserman, L. Statistical inference using the Morse-Smale complex. Electron. J. Stat. 2017, 11, 1390–1433.
42. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course, 1st ed.; Springer: New York, NY, USA, 2014.
43. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747.

**Figure 1.** Using two clustering methods to learn the cosmic webs. **Left**: the raw galaxy data from the Sloan Digital Sky Survey. **Middle**: the clustering result using the conventional mode/mean-shift clustering; this method fails to detect the connectivity among clusters. **Right**: the clustering result based on our method, where the color indicates different types of clusters.

**Figure 2.** Simulations with different data settings. Panels (**a**,**d**), (**b**,**e**), and (**c**,**f**) display, respectively, the three simulation scenarios: Spherical, Elliptical, and Outliers. In panels (**a**–**c**), each colored region is the basin of attraction of a local minimum of $s\left(x\right)$, while the grey regions belong to outlier clusters. Panels (**d**–**f**) provide an example of the clustering of data points. Points labeled purple, green, and orange are assigned to robust, boundary, and outlier clusters, respectively.

**Figure 3.** Example of the basins of attraction of a Gaussian mixture. Four groups of data are separated into three types of clusters. We partition the space into 10 parts. ‘R’ represents the region of the robust cluster, ‘B’ represents the region of the boundary cluster, and ‘O’ represents the region of the outlier cluster.

**Figure 4.** Panels (**a**–**h**) display the simulations using DBSCAN with different parameter settings, where minPts is the minimum number of points required to form a dense region and eps is the radius of a neighborhood with respect to a certain point. Panels (**i**–**l**) display the simulations using our proposed method with different bandwidths, where h is the bandwidth selected according to Equation (8). In panels (**a**–**h**), each colored region is a cluster detected by DBSCAN, while the gray and black points are border points and outliers, respectively. In panels (**i**–**l**), points labeled blue, orange, and green are assigned to robust, boundary, and outlier clusters, respectively.

**Figure 5.** Power analysis of the proposed method. We compare the power of our two-sample test with the energy test, the kernel test, and the KS test applied separately to the first and to the second variable. In the **left** panel, we vary the variance of the second Gaussian. In the **right** panel, we fix the two distributions and increase the sample size. In both cases, our method has higher power than these naive approaches.

**Figure 6.** The gradient flow method is better at detecting the ‘Cosmic Web’ [32] in our universe. For comparison, we perform k-means clustering with 20 centers and traditional mode-clustering. The blue “×”s are the points from image analysis. The comparison results do not structurally correlate with the locations of the blue “×”s.

**Figure 7.** Visualization of the GvHD dataset. We apply Algorithm 3 for visualization. Blue lines represent the connections among clusters. Each pie chart shows how the points of the corresponding cluster are divided between the positive group and the control group.

**Table 1.** Summary of the estimated proportion in each group. Note that “Proportion” in the table refers to the proportion of the positive group.

Cluster | Proportion | 5% CI | 95% CI | Z Score | Cluster Type
---|---|---|---|---|---
1 | 0.910 | 0.900 | 0.920 | 46.980 | Robust Cluster
2 | 0.010 | 0.010 | 0.020 | −69.620 | Robust Cluster
3 | 0.680 | 0.650 | 0.720 | 5.550 | Robust Cluster
4 | 0.370 | 0.350 | 0.390 | −17.570 | Boundary Cluster
5 | 0.800 | 0.770 | 0.830 | 11.470 | Boundary Cluster
6 | 0.410 | 0.380 | 0.440 | −9.920 | Boundary Cluster
7 | 0.920 | 0.900 | 0.940 | 19.170 | Robust Cluster
8 | 0.420 | 0.370 | 0.470 | −5.930 | Boundary Cluster
Overall Proportion | 0.570 | | | |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Zhang, K.; Chen, Y.-C. Refined Mode-Clustering via the Gradient of Slope. *Stats* **2021**, *4*, 486-508. https://doi.org/10.3390/stats4020030