Entropy 2013, 15(5), 1486-1502; doi:10.3390/e15051486
Article

Discretization Based on Entropy and Multiple Scanning

Received: 28 February 2013; in revised form: 16 April 2013 / Accepted: 18 April 2013 / Published: 25 April 2013
(This article belongs to the Special Issue Big Data)
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract: In this paper we present an entropy-driven methodology for discretization. Recently, the original entropy-based discretization was enhanced with two options for selecting the best numerical attribute. In the first option, Dominant Attribute, the attribute with the smallest conditional entropy of the concept given the attribute is selected for discretization, and then the best cut point is determined. In the second option, Multiple Scanning, all attributes are scanned a number of times, and at the same time the best cut points are selected for all attributes. The results of experiments on 17 benchmark data sets, including large data sets with 175 attributes or 25,931 cases, are presented. For comparison, the results of experiments on the same data sets using the global versions of the well-known discretization methods Equal Interval Width and Equal Frequency per Interval are also included. The entropy-driven technique enhanced both of these methods by converting them into globalized methods. The results of our experiments show that the Multiple Scanning methodology is significantly better than both Dominant Attribute and the better of the Globalized Equal Interval Width and Equal Frequency per Interval results (two-tailed test, 0.01 level of significance).
Keywords: numerical attributes; entropy; discretization; data mining
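
The attribute-selection step that the abstract attributes to the Dominant Attribute option can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the author's implementation: the function names (`conditional_entropy`, `dominant_attribute`, `best_cut_point`), the toy data, and the convention of placing candidate cut points at midpoints between consecutive distinct values are all hypothetical. The recursive application of the procedure and the Multiple Scanning option are not covered.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of concept labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(values, labels):
    """H(concept | attribute): entropy of the labels within each block of
    cases sharing an attribute value, weighted by block size."""
    n = len(labels)
    blocks = {}
    for v, y in zip(values, labels):
        blocks.setdefault(v, []).append(y)
    return sum(len(b) / n * entropy(b) for b in blocks.values())


def dominant_attribute(table, labels):
    """Name of the attribute with the smallest conditional entropy."""
    return min(table, key=lambda a: conditional_entropy(table[a], labels))


def best_cut_point(values, labels):
    """Cut point (assumed: midpoint between consecutive distinct values)
    whose two-interval split gives the smallest conditional entropy.
    Returns (cut, entropy), or None if the attribute has one distinct value."""
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        below = [v < cut for v in values]  # two blocks: below / not below the cut
        h = conditional_entropy(below, labels)
        if best is None or h < best[1]:
            best = (cut, h)
    return best


# Hypothetical toy data: "height" separates the two concepts, "weight" does not.
table = {"height": [1.0, 1.0, 2.0, 2.0], "weight": [5.0, 6.0, 5.0, 6.0]}
labels = ["a", "a", "b", "b"]

chosen = dominant_attribute(table, labels)
cut, h = best_cut_point(table[chosen], labels)
print(chosen, cut, h)
```

In this toy table, "height" is selected because its conditional entropy is zero while that of "weight" is one bit, and the only candidate cut point for "height" is 1.5.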

MDPI and ACS Style

Grzymala-Busse, J.W. Discretization Based on Entropy and Multiple Scanning. Entropy 2013, 15, 1486-1502.

AMA Style

Grzymala-Busse JW. Discretization Based on Entropy and Multiple Scanning. Entropy. 2013; 15(5):1486-1502.

Chicago/Turabian Style

Grzymala-Busse, Jerzy W. 2013. "Discretization Based on Entropy and Multiple Scanning." Entropy 15, no. 5: 1486-1502.

Entropy EISSN 1099-4300 Published by MDPI AG, Basel, Switzerland