
CDE++: Learning Categorical Data Embedding by Enhancing Heterogeneous Feature Value Coupling Relationships

College of Computer, National University of Defense Technology, Changsha 410000, China
* Author to whom correspondence should be addressed.
Entropy 2020, 22(4), 391; https://doi.org/10.3390/e22040391
Received: 4 March 2020 / Revised: 21 March 2020 / Accepted: 27 March 2020 / Published: 29 March 2020
(This article belongs to the Special Issue Information Theoretic Feature Selection Methods for Big Data)
Categorical data are ubiquitous in machine learning tasks, and their representation plays an important role in learning performance. The heterogeneous coupling relationships between features and feature values reflect the characteristics of real-world categorical data and need to be captured in the representations. This paper proposes an enhanced categorical data embedding method, CDE++, which captures heterogeneous feature value coupling relationships in the representations. Based on information theory and the hierarchical couplings defined in our previous work CDE (Categorical Data Embedding by learning hierarchical value coupling), CDE++ adopts mutual information and marginal entropy to capture feature couplings and designs a hybrid clustering strategy to capture multiple types of feature value clusters. Moreover, an autoencoder is used to learn non-linear couplings between features and value clusters. The categorical data embeddings generated by CDE++ are low-dimensional numerical vectors that can be applied directly to clustering and classification, where they achieve the best performance compared with other categorical representation learning methods. Parameter sensitivity and scalability tests are also conducted to demonstrate the superiority of CDE++.
Keywords: categorical data; data embedding; heterogeneous couplings; hybrid clustering strategy; autoencoder; clustering; classification
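
As a rough illustration of the information-theoretic quantities named in the abstract, the sketch below estimates marginal entropy and mutual information between two categorical features from empirical counts. It is not the CDE++ implementation: the function names, the normalization in `normalized_coupling`, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of one categorical feature, from empirical counts."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), using the empirical joint distribution."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_coupling(xs, ys):
    """Hypothetical coupling score: I(X; Y) divided by the smaller marginal entropy.

    This normalization is an assumption for illustration, not the paper's formula.
    """
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


# Toy data (made up): shape is fully determined by color, size is independent of it.
color = ["red", "red", "blue", "blue"]
shape = ["round", "round", "square", "square"]
size = ["S", "L", "S", "L"]

coupled = normalized_coupling(color, shape)
uncoupled = normalized_coupling(color, size)
print(coupled, uncoupled)
```

On this toy data the strongly coupled pair (color, shape) scores higher than the independent pair (color, size), which is the kind of signal a feature coupling measure is meant to expose.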
MDPI and ACS Style

Dong, B.; Jian, S.; Zuo, K. CDE++: Learning Categorical Data Embedding by Enhancing Heterogeneous Feature Value Coupling Relationships. Entropy 2020, 22, 391.
