Open Access Article
Algorithms 2018, 11(11), 177; https://doi.org/10.3390/a11110177

Understanding and Enhancement of Internal Clustering Validation Indexes for Categorical Data

Donlinks School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
*
Author to whom correspondence should be addressed.
Received: 16 September 2018 / Revised: 29 October 2018 / Accepted: 29 October 2018 / Published: 4 November 2018

Abstract

Clustering is one of the main tasks of machine learning. Internal clustering validation indexes (CVIs) measure the quality of candidate partitions in an unsupervised manner in order to identify the locally optimal clustering result, and they can also act as the objective function of a clustering algorithm. In this paper, we first studied several well-known internal CVIs for categorical data clustering and proved that evaluating partitions with different numbers of clusters is ineffective without an inter-cluster separation measure or assumption; the accuracy of the separation measure, along with its coordination with the intra-cluster compactness measure, can notably affect performance. Then, aiming to enhance internal clustering validation, we proposed a new internal CVI, clustering utility based on the averaged information gain of isolating each cluster (CUBAGE), which measures both the compactness and the separation of a partition. The experimental results supported our findings on the existing internal CVIs and showed that the proposed CUBAGE outperforms the other internal CVIs whether or not the number of clusters is known in advance.
Keywords: machine learning; clustering; internal clustering validation index; categorical data
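
The abstract's point about compactness-only indexes can be illustrated with a small sketch. The code below is not CUBAGE and is not taken from the paper; it assumes a generic compactness score (the size-weighted sum of per-attribute Shannon entropies within each cluster), and the function names and toy data are hypothetical. It shows why such a score, lacking any separation term, cannot by itself compare partitions with different numbers of clusters.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def within_cluster_entropy(rows, labels):
    """Size-weighted sum of per-attribute entropies inside each cluster.

    Lower means more compact. There is deliberately no separation term.
    """
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    n = len(rows)
    total = 0.0
    for members in clusters.values():
        weight = len(members) / n
        # zip(*members) iterates over the attributes (columns) of the cluster.
        total += weight * sum(attribute_entropy(col) for col in zip(*members))
    return total


# Hypothetical toy data: two categorical attributes, two natural groups.
rows = [
    ("red", "small"), ("red", "small"), ("red", "large"),
    ("blue", "large"), ("blue", "large"), ("blue", "small"),
]

scores = {
    "k=1": within_cluster_entropy(rows, [0] * 6),
    "k=2": within_cluster_entropy(rows, [0, 0, 0, 1, 1, 1]),
    "k=6": within_cluster_entropy(rows, list(range(6))),
}
print(scores)
```

On this toy data the score falls as the number of clusters grows and reaches zero when every object is its own cluster, so minimizing it alone would always favor the trivial all-singletons partition. An index that also rewards inter-cluster separation is one way to counter that bias.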

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Gao, X.; Yang, M. Understanding and Enhancement of Internal Clustering Validation Indexes for Categorical Data. Algorithms 2018, 11, 177.

Algorithms, EISSN 1999-4893, published by MDPI AG, Basel, Switzerland