Open Access Article
Algorithms 2018, 11(2), 19; doi:10.3390/a11020019

Common Nearest Neighbor Clustering—A Benchmark

Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
* Author to whom correspondence should be addressed.
Received: 28 July 2017 / Revised: 8 September 2017 / Accepted: 25 January 2018 / Published: 9 February 2018
(This article belongs to the Special Issue Clustering Algorithms 2017)

Abstract

Cluster analyses are often conducted with the goal of characterizing an underlying probability density, with the data-point density serving as an estimate of it. Here, we test and benchmark the common nearest neighbor (CNN) cluster algorithm. This algorithm assigns a spherical neighborhood of radius R to each data point and estimates the data-point density between two data points as the number of data points N in the overlapping region of their neighborhoods (step 1). The main principle of the CNN cluster algorithm is cluster growing: clusters are grown by sequentially adding data points, which effectively positions the cluster borders along an iso-surface of the underlying probability density. This yields a strict partitioning with outliers, in which the clusters represent peaks in the underlying probability density, termed core sets (step 2). The removal of the outliers on the basis of a threshold criterion is optional (step 3). The benchmark datasets address a series of typical challenges, including datasets with a very high dimensional state space and datasets in which the cluster centroids are aligned along an underlying structure (Birch sets). The performance of the CNN algorithm is evaluated with respect to these challenges. The results indicate that the CNN cluster algorithm can be useful in a wide range of settings. Cluster algorithms are particularly important for the analysis of molecular dynamics (MD) simulations. We demonstrate how the CNN cluster results can be used as a discretization of the molecular state space for the construction of a core-set model of the MD, which improves the accuracy compared to conventional full-partitioning models. The software for the CNN clustering is available on GitHub.
Keywords: density-based clustering; molecular dynamics simulations; Markov state models; core sets; milestoning
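
The three steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GitHub implementation, and several details are assumptions made here for illustration: the names `cnn_cluster`, `radius`, `min_common` and `min_cluster_size` are hypothetical, two points are linked only if they lie within R of each other and share at least N common neighbors (the two points themselves are not counted), and the optional threshold criterion of step 3 is taken to be a minimum cluster size.

```python
import numpy as np


def cnn_cluster(points, radius, min_common, min_cluster_size=2):
    """Illustrative common-nearest-neighbor clustering; label -1 marks outliers."""
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Step 1: neighborhoods of radius R and common-neighbor counts.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    neighbors = dist <= radius
    np.fill_diagonal(neighbors, False)  # a point is not its own neighbor
    counts = neighbors.astype(int)
    common = counts @ counts  # common[i, j] = points inside both neighborhoods

    # Assumed link criterion: mutual neighbors sharing at least N points.
    linked = neighbors & (common >= min_common)

    # Step 2: grow clusters by sequentially adding linked points.
    labels = np.full(n, -1)
    n_clusters = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if labels[j] == -1:
                    labels[j] = n_clusters
                    stack.append(j)
        n_clusters += 1

    # Step 3 (optional): clusters below the size threshold become outliers.
    sizes = np.bincount(labels, minlength=n_clusters)
    keep = np.flatnonzero(sizes >= min_cluster_size)
    remap = np.full(n_clusters, -1)
    remap[keep] = np.arange(len(keep))
    return remap[labels]


# Two small groups and one isolated point (made-up data).
square = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
data = np.vstack([square, square + 10.0, [[50.0, 50.0]]])
print(cnn_cluster(data, radius=0.5, min_common=2))
```

The dense pairwise distance matrix keeps the sketch short but scales quadratically in memory, so it is only suitable for small datasets.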

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Lemke, O.; Keller, B.G. Common Nearest Neighbor Clustering—A Benchmark. Algorithms 2018, 11, 19.

Algorithms, EISSN 1999-4893, published by MDPI AG, Basel, Switzerland