Open Access Article

Clustering Heterogeneous Data with k-Means by Mutual Information-Based Unsupervised Feature Transformation

Department of Electronic Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
*
Author to whom correspondence should be addressed.
Academic Editor: Raúl Alcaraz Martínez
Entropy 2015, 17(3), 1535-1548; https://doi.org/10.3390/e17031535
Received: 8 December 2014 / Revised: 9 March 2015 / Accepted: 17 March 2015 / Published: 23 March 2015
Traditional centroid-based clustering algorithms produce varying degrees of inaccuracy when applied to heterogeneous data with both numerical and non-numerical features. There are two reasons: the Hamming distance used to measure dissimilarity between non-numerical values does not provide optimal distances between different values, and problems arise when the Euclidean distance and the Hamming distance are combined. In this study, mutual information (MI)-based unsupervised feature transformation (UFT), which can transform non-numerical features into numerical features without information loss, was used with the conventional k-means algorithm to cluster heterogeneous data. UFT assigns numerical values that preserve the structure of the original non-numerical features while also having the properties of continuous values. Experiments and analysis on real-world datasets showed that the integrated UFT-k-means clustering algorithm outperformed other algorithms on heterogeneous data with both numerical and non-numerical features.
Keywords: feature transformation; k-means; clustering heterogeneous data; numerical features; non-numerical features
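
The two difficulties named in the abstract can be illustrated with a minimal sketch. It does not implement the article's UFT; it only shows the baseline being criticized. The records, the feature values, and the weight `gamma` are hypothetical, chosen for illustration, and the weighted sum in `mixed_distance` is just one naive way of combining the two distances.

```python
import math

def hamming(a, b):
    """Count positions where two tuples of categorical values differ."""
    return sum(x != y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance between two tuples of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mixed_distance(p, q, gamma=1.0):
    """Naive mixed distance: Euclidean on the numerical part plus
    gamma times Hamming on the non-numerical part.
    `gamma` is a hypothetical weight, not a value from the article."""
    (num_p, cat_p), (num_q, cat_q) = p, q
    return euclidean(num_p, num_q) + gamma * hamming(cat_p, cat_q)

# Hypothetical records: (numerical features, non-numerical features)
a = ((30.0, 52000.0), ("red",))
b = ((31.0, 52500.0), ("orange",))
c = ((30.0, 52000.0), ("blue",))

# Problem 1: Hamming distance treats every pair of distinct values alike,
# so it cannot express that one value may be "closer" to another.
ham_ab = hamming(a[1], b[1])
ham_ac = hamming(a[1], c[1])

# Problem 2: when the two distances are summed, the unscaled numerical
# part can swamp the categorical part (or the reverse, depending on gamma).
d_ab = mixed_distance(a, b)
d_ac = mixed_distance(a, c)

print(ham_ab, ham_ac)
print(d_ab, d_ac)
```

Here both categorical pairs are at Hamming distance 1, while the mixed distance between `a` and `b` is driven almost entirely by the second numerical feature. Mapping non-numerical values to numbers before clustering, as the abstract describes, lets a single Euclidean distance be used in k-means instead.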
MDPI and ACS Style

Wei, M.; Chow, T.W.S.; Chan, R.H.M. Clustering Heterogeneous Data with k-Means by Mutual Information-Based Unsupervised Feature Transformation. Entropy 2015, 17, 1535-1548.
