Open Access Article

Discretization Based on Entropy and Multiple Scanning

Department of Electrical Engineering and Computer Science, University of Kansas, 3014 Eaton Hall, Lawrence, KS 66045, USA
Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, Rzeszow 35-225, Poland
Entropy 2013, 15(5), 1486-1502;
Received: 28 February 2013 / Revised: 16 April 2013 / Accepted: 18 April 2013 / Published: 25 April 2013
(This article belongs to the Special Issue Big Data)


In this paper we present an entropy-driven methodology for discretization. Recently, the original entropy-based discretization was enhanced by including two options for selecting the best numerical attribute. In the first option, Dominant Attribute, the attribute with the smallest conditional entropy of the concept given the attribute is selected for discretization, and then the best cut point for that attribute is determined. In the second option, Multiple Scanning, all attributes are scanned a number of times, and the best cut points are selected for all attributes at the same time. We present the results of experiments on 17 benchmark data sets, including large data sets with 175 attributes or 25,931 cases. For comparison, we also include the results of experiments on the same data sets using global versions of two well-known discretization methods, Equal Interval Width and Equal Frequency per Interval. The entropy-driven technique enhanced both of these methods by converting them into globalized methods. Our experimental results show that Multiple Scanning is significantly better than both Dominant Attribute and the better of the Globalized Equal Interval Width and Globalized Equal Frequency per Interval methods (two-tailed test, 0.01 significance level).
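The attribute-selection rules summarized above can be illustrated with a small Python sketch. It assumes base-2 Shannon entropy, candidate cut points at the midpoints between consecutive distinct values, and a table stored as a dict mapping attribute names to lists of numeric values; all function names are hypothetical. `multiple_scanning` is a deliberately simplified reading of the Multiple Scanning idea (each scan tries one new cut per attribute and keeps it only if it lowers the conditional entropy); it is not the paper's exact procedure, and any stopping rule or interval merging the paper uses is not reproduced.

```python
import math
from collections import Counter


def entropy(labels):
    """Base-2 Shannon entropy of a list of concept labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(keys, labels):
    """H(concept | key): entropy of labels within each block of equal keys, weighted by block size."""
    n = len(labels)
    blocks = {}
    for k, y in zip(keys, labels):
        blocks.setdefault(k, []).append(y)
    return sum(len(b) / n * entropy(b) for b in blocks.values())


def candidate_cuts(values):
    """Assumed candidate set: midpoints between consecutive distinct values."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]


def interval_index(x, cuts):
    """Interval number of x given sorted cuts (values equal to a cut go left)."""
    return sum(x > c for c in cuts)


def best_cut(values, labels, existing=()):
    """Return (cut, entropy) for the new cut that, added to `existing`,
    minimizes H(concept | discretized attribute); None if no candidate is left."""
    best = None
    for c in candidate_cuts(values):
        if c in existing:
            continue
        cuts = sorted([*existing, c])
        h = conditional_entropy([interval_index(x, cuts) for x in values], labels)
        if best is None or h < best[1]:
            best = (c, h)
    return best


def dominant_attribute(table, labels):
    """Attribute with the smallest H(concept | attribute), each value treated as its own block."""
    return min(table, key=lambda a: conditional_entropy(table[a], labels))


def multiple_scanning(table, labels, scans=2):
    """Simplified, hypothetical sketch: every scan visits all attributes and
    adds at most one cut per attribute, only if it lowers the conditional entropy."""
    cuts = {a: [] for a in table}
    for _ in range(scans):
        for a, values in table.items():
            current = conditional_entropy([interval_index(x, cuts[a]) for x in values], labels)
            found = best_cut(values, labels, cuts[a])
            if found is not None and found[1] < current:
                cuts[a] = sorted(cuts[a] + [found[0]])
    return cuts


if __name__ == "__main__":
    labels = ["low", "low", "high", "high"]
    table = {"a1": [1.0, 2.0, 3.0, 4.0], "a2": [1.0, 2.0, 1.0, 2.0]}
    print(dominant_attribute(table, labels))  # a1
    print(best_cut(table["a1"], labels))      # (2.5, 0.0)
    print(multiple_scanning(table, labels))   # {'a1': [2.5], 'a2': []}
```

In this toy table, `a2` carries no information about the concept, so its conditional entropy never drops and it receives no cut, while `a1` is separated cleanly by a single cut at 2.5.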
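For reference, the two baseline methods mentioned above can be sketched in their plain, per-attribute form. This is not the globalized variant evaluated in the paper, which the abstract describes as enhanced by the entropy-driven technique; that globalizing step is not reproduced here. Placing equal-frequency cuts at the midpoint between neighbouring sorted values is an assumption, and the function names are hypothetical.

```python
def equal_width_cuts(values, k):
    """Split [min, max] into k intervals of equal width; return the k - 1 cut points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Aim for about n / k values per interval; cuts are midpoints between
    neighbouring sorted values (assumed convention). Boundaries that fall
    between two equal values are skipped, so fewer than k intervals may result."""
    v = sorted(values)
    n = len(v)
    if n < 2:
        return []
    cuts = []
    for i in range(1, k):
        # Position of the i-th boundary, clamped to a valid split index.
        j = min(max(round(i * n / k), 1), n - 1)
        if v[j - 1] < v[j]:
            cut = (v[j - 1] + v[j]) / 2
            if cut not in cuts:
                cuts.append(cut)
    return cuts


if __name__ == "__main__":
    print(equal_width_cuts([0.0, 10.0], 4))             # [2.5, 5.0, 7.5]
    print(equal_frequency_cuts([1, 2, 3, 4, 5, 6], 3))  # [2.5, 4.5]
```

Equal Interval Width ignores how the values are distributed, whereas Equal Frequency per Interval adapts to it; neither looks at the concept labels, which is the gap the entropy-driven methods address.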
Keywords: numerical attributes; entropy; discretization; data mining
This is an open access article distributed under the Creative Commons Attribution License (CC BY 3.0).

MDPI and ACS Style

Grzymala-Busse, J.W. Discretization Based on Entropy and Multiple Scanning. Entropy 2013, 15, 1486-1502.

Entropy EISSN 1099-4300. Published by MDPI AG, Basel, Switzerland.