Open Access Data Descriptor

The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms

by Alfred Ultsch 1 and Jörn Lötsch 2,3,*
1 DataBionics Research Institute, University of Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany
2 Institute of Clinical Pharmacology, Goethe-University, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
3 Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Project Group Translational Medicine and Pharmacology TMP, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
* Author to whom correspondence should be addressed.
Received: 27 December 2019 / Revised: 25 January 2020 / Accepted: 28 January 2020 / Published: 30 January 2020
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics)
In the context of data science, data projection and clustering are common procedures. Choosing an appropriate analysis method is crucial to avoid faulty pattern recognition, so it is necessary to know the properties, and especially the limitations, of projection and clustering algorithms. This report describes a collection of datasets that are grouped together in the Fundamental Clustering and Projection Suite (FCPS). The FCPS contains 10 datasets named "Atom", "Chainlink", "EngyTime", "Golfball", "Hepta", "Lsun", "Target", "Tetra", "TwoDiamonds", and "WingNut". Common clustering methods occasionally identified non-existent clusters or assigned data points to the wrong clusters in the FCPS datasets. Likewise, common data projection methods could only partially reproduce the data structure on a two-dimensional plane. In conclusion, the FCPS dataset collection addresses general challenges for clustering and projection algorithms, such as lack of linear separability, different or small inner class spacing, classes defined by data density rather than data spacing, no cluster structure at all, outliers, or classes that are in contact. The collection is designed to address specific problems of structure discovery in high-dimensional spaces.
Dataset: Available as a supplementary file in this submission, link www.mdpi.com/xxx/s1.
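The challenge of "lack of linear separability" named in the abstract can be illustrated with a small sketch. The code below does not load the FCPS data; it builds a noise-free, synthetic stand-in loosely modelled on the idea of two interlocked rings (the kind of structure the name "Chainlink" suggests), which is an assumption made for illustration only. It then compares two simple grouping rules that are also illustrative choices, not the methods evaluated in the report: assignment to the nearest class centroid (a best-case stand-in for centroid-based partitioning) and a distance-threshold linkage implemented with union-find. The names `ring`, `EPS`, and `N` are hypothetical.

```python
import math

def ring(n, center, plane):
    """Return n evenly spaced points on a unit circle lying in the given plane."""
    pts = []
    for k in range(n):
        t = 2 * math.pi * k / n
        if plane == "xy":
            offset = (math.cos(t), math.sin(t), 0.0)
        else:  # "xz"
            offset = (math.cos(t), 0.0, math.sin(t))
        pts.append(tuple(c + o for c, o in zip(center, offset)))
    return pts

N = 100  # points per ring (assumed value for this sketch)
ring_a = ring(N, (0.0, 0.0, 0.0), "xy")  # ring A passes through the centre of ring B
ring_b = ring(N, (1.0, 0.0, 0.0), "xz")  # ring B passes through the centre of ring A
points = ring_a + ring_b
truth = [0] * N + [1] * N

# Rule 1: nearest-centroid assignment, using the true class centroids.
def centroid(pts):
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))

centroids = [centroid(ring_a), centroid(ring_b)]
centroid_labels = [
    min(range(2), key=lambda c: math.dist(p, centroids[c])) for p in points
]
centroid_accuracy = sum(
    1 for lab, t in zip(centroid_labels, truth) if lab == t
) / len(points)

# Rule 2: link every pair of points closer than EPS, then take connected components.
EPS = 0.2  # assumed threshold: larger than the within-ring spacing, smaller than the ring gap
parent = list(range(len(points)))

def find(i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

for i in range(len(points)):
    for j in range(i + 1, len(points)):
        if math.dist(points[i], points[j]) < EPS:
            parent[find(i)] = find(j)

roots = [find(i) for i in range(len(points))]
n_components = len(set(roots))

print("nearest-centroid accuracy:", centroid_accuracy)
print("threshold-linkage components:", n_components)
```

In this construction the plane that separates the two centroids cuts through both rings, so the centroid rule puts roughly a third of each ring in the wrong class, while the linkage rule recovers exactly the two rings. The real FCPS datasets contain noise and other difficulties, so results on them may differ from this idealised case.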
MDPI and ACS Style

Ultsch, A.; Lötsch, J. The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms. Data 2020, 5, 13.
