Algorithms 2018, 11(11), 177; https://doi.org/10.3390/a11110177
Understanding and Enhancement of Internal Clustering Validation Indexes for Categorical Data
Donlinks School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
Received: 16 September 2018 / Revised: 29 October 2018 / Accepted: 29 October 2018 / Published: 4 November 2018
Abstract
Clustering is one of the main tasks of machine learning. Internal clustering validation indexes (CVIs) measure the quality of candidate partitions so that a locally optimal clustering result can be selected in an unsupervised manner, and they can also act as the objective function of a clustering algorithm. In this paper, we first studied several well-known internal CVIs for categorical data clustering and proved that evaluating partitions with different numbers of clusters is ineffective without an inter-cluster separation measure or assumption; the accuracy of the separation measure, along with its coordination with the intra-cluster compactness measure, can notably affect performance. Then, aiming to enhance internal clustering validation, we proposed a new internal CVI, clustering utility based on the averaged information gain of isolating each cluster (CUBAGE), which measures both the compactness and the separation of a partition. The experimental results supported our findings on the existing internal CVIs and showed that the proposed CUBAGE outperforms other internal CVIs with or without a pre-known number of clusters.
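The point about compactness-only indexes can be illustrated with a small sketch. The code below is not CUBAGE, whose definition is not given in this abstract. It assumes a generic entropy-based compactness score (the size-weighted within-cluster entropy, summed over attributes), and the function names and toy data are made up for illustration. Because the score has no separation term, it can only stay the same or improve as clusters are split further, so it favors the trivial all-singletons partition.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def expected_entropy(rows, labels):
    """Size-weighted within-cluster entropy, summed over attributes.

    Lower values mean more compact clusters. There is deliberately
    no inter-cluster separation term (assumption for this sketch).
    """
    n = len(rows)
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    total = 0.0
    for members in clusters.values():
        # zip(*members) yields one tuple of values per attribute.
        within = sum(attribute_entropy(col) for col in zip(*members))
        total += len(members) / n * within
    return total


# Hypothetical toy data: six objects described by two categorical attributes.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y"), ("b", "z"), ("b", "z")]
one_cluster = [0] * 6
two_clusters = [0, 0, 0, 1, 1, 1]
singletons = list(range(6))

scores = [expected_entropy(rows, p) for p in (one_cluster, two_clusters, singletons)]
print(scores)
```

On this toy data the score falls from one cluster to two clusters to six singletons, so picking the partition with the best compactness alone would always choose the largest number of clusters. This is why an index needs a separation measure or an equivalent assumption to compare partitions of different sizes.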
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style
Gao, X.; Yang, M. Understanding and Enhancement of Internal Clustering Validation Indexes for Categorical Data. Algorithms 2018, 11, 177.
Algorithms
EISSN 1999-4893
Published by MDPI AG, Basel, Switzerland