Open Access Article
Entropy 2015, 17(3), 1535-1548; doi:10.3390/e17031535

Clustering Heterogeneous Data with k-Means by Mutual Information-Based Unsupervised Feature Transformation

Department of Electronic Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
* Author to whom correspondence should be addressed.
Academic Editor: Raúl Alcaraz Martínez
Received: 8 December 2014 / Revised: 9 March 2015 / Accepted: 17 March 2015 / Published: 23 March 2015

Abstract

Traditional centroid-based clustering algorithms cluster heterogeneous data, which mix numerical and non-numerical features, with varying degrees of inaccuracy. The reason is twofold: the Hamming distance used to measure dissimilarity between non-numerical values does not provide optimal distances between different values, and further problems arise when the Euclidean and Hamming distances are combined. In this study, mutual information (MI)-based unsupervised feature transformation (UFT), which can transform non-numerical features into numerical features without information loss, was used with the conventional k-means algorithm to cluster heterogeneous data. For each original non-numerical feature, UFT provides numerical values that preserve the structure of that feature while having the properties of continuous values. Experiments on and analysis of real-world datasets showed that the integrated UFT-k-means clustering algorithm outperformed other algorithms on heterogeneous data with both numerical and non-numerical features.
Keywords: feature transformation; k-means; clustering heterogeneous data; numerical features; non-numerical features
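
The distance problem described in the abstract can be illustrated with a small sketch. It is not the UFT algorithm from the paper: the `codes` mapping is a hypothetical stand-in for the numerical values a feature transformation might produce, and the weight `gamma` and the function names are assumptions made for illustration only. The sketch shows that the Hamming distance treats every pair of distinct categories as equally far apart, whereas numerical codes allow a plain Euclidean distance, as used by k-means, to express graded differences.

```python
import math


def hamming(a, b):
    """Count the positions at which two categorical tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance between two numerical tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mixed_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Weighted sum of a Euclidean part and a Hamming part.

    gamma is a hypothetical weight; choosing it is one of the
    difficulties of combining the two distances.
    """
    return euclidean(num_a, num_b) + gamma * hamming(cat_a, cat_b)


# Hamming distance: every pair of distinct categories is equally far apart.
d_small_medium = hamming(("small",), ("medium",))
d_small_large = hamming(("small",), ("large",))

# Hypothetical numerical codes (NOT produced by UFT) standing in for
# the output of a categorical-to-numerical transformation.
codes = {"small": 0.0, "medium": 0.4, "large": 1.0}


def transform(num, cat):
    """Append the numerical code of each categorical value to the numerical part."""
    return tuple(num) + tuple(codes[c] for c in cat)


a = transform((1.0,), ("small",))
b = transform((1.0,), ("medium",))
c = transform((1.0,), ("large",))

# With numerical codes, Euclidean distance can rank the pairs differently.
d_ab = euclidean(a, b)
d_ac = euclidean(a, c)
```

Once every feature is numerical, the transformed tuples can be passed to any standard k-means implementation without a separate categorical distance or a weight such as `gamma`.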
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Wei, M.; Chow, T.W.S.; Chan, R.H.M. Clustering Heterogeneous Data with k-Means by Mutual Information-Based Unsupervised Feature Transformation. Entropy 2015, 17, 1535-1548.

Entropy, EISSN 1099-4300, published by MDPI AG, Basel, Switzerland