Entropy 2013, 15(5), 1486-1502; doi:10.3390/e15051486

Discretization Based on Entropy and Multiple Scanning

1 Department of Electrical Engineering and Computer Science, University of Kansas, 3014 Eaton Hall, Lawrence, KS 66045, USA
2 Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, Rzeszow 35-225, Poland
Received: 28 February 2013 / Revised: 16 April 2013 / Accepted: 18 April 2013 / Published: 25 April 2013
(This article belongs to the Special Issue Big Data)


In this paper we present an entropy-driven methodology for discretization. Recently, the original entropy-based discretization was enhanced by including two options for selecting the best numerical attribute. In the first option, Dominant Attribute, the attribute with the smallest conditional entropy of the concept given the attribute is selected for discretization, and then the best cut point for that attribute is determined. In the second option, Multiple Scanning, all attributes are scanned a number of times, and at the same time the best cut points are selected for all attributes. We present the results of experiments on 17 benchmark data sets, including large data sets with 175 attributes or 25,931 cases. For comparison, we also include the results of experiments on the same data sets using the global versions of two well-known discretization methods, Equal Interval Width and Equal Frequency per Interval. The entropy-driven technique enhanced both of these methods by converting them into globalized methods. Our experiments show that the Multiple Scanning methodology is significantly better than both the Dominant Attribute methodology and the better of the results of the Globalized Equal Interval Width and Equal Frequency per Interval methods (two-tailed test, 0.01 level of significance).
Keywords: numerical attributes; entropy; discretization; data mining
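The Dominant Attribute step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy data, the choice of midpoints between consecutive distinct values as candidate cut points, and the use of base-2 logarithms are all assumptions made for illustration. The sketch performs a single step only (pick the attribute with the smallest conditional entropy of the concept, then pick one cut point for it) and omits any recursion or stopping criterion.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of concept labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(values, labels):
    """H(concept | attribute), treating each distinct value as one block."""
    blocks = {}
    for value, label in zip(values, labels):
        blocks.setdefault(value, []).append(label)
    total = len(labels)
    return sum(len(block) / total * entropy(block) for block in blocks.values())


def best_cut_point(values, labels):
    """Candidate cut point with the smallest conditional entropy.

    Assumption: candidates are midpoints between consecutive distinct values.
    Returns None when the attribute has a single distinct value.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda q: conditional_entropy([v < q for v in values], labels),
    )


def dominant_attribute_step(table, labels):
    """One Dominant Attribute step over a dict: attribute name -> values."""
    name = min(table, key=lambda a: conditional_entropy(table[a], labels))
    return name, best_cut_point(table[name], labels)


# Hypothetical toy data: "length" separates the two concepts, "height" does not.
labels = ["low", "low", "high", "high"]
table = {
    "height": [1.4, 1.6, 1.4, 1.6],
    "length": [10.0, 10.0, 20.0, 20.0],
}
print(dominant_attribute_step(table, labels))  # ('length', 15.0)
```

In this toy table, the conditional entropy of the concept given "height" is 1 bit and given "length" is 0 bits, so "length" is selected and the only candidate cut point, 15.0, is returned.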
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Grzymala-Busse, J.W. Discretization Based on Entropy and Multiple Scanning. Entropy 2013, 15, 1486-1502.

Entropy, EISSN 1099-4300, published by MDPI AG, Basel, Switzerland