Entropy 2011, 13(7), 1229-1266; doi:10.3390/e13071229
Article

On Accuracy of PDF Divergence Estimators and Their Applicability to Representative Data Sampling

Marcin Budka *, Bogdan Gabrys and Katarzyna Musial
Received: 28 May 2011 / Accepted: 2 July 2011 / Published: 8 July 2011
Abstract: Generalisation error estimation is an important issue in machine learning. Cross-validation, traditionally used for this purpose, requires building multiple models and repeating the whole procedure many times in order to produce reliable error estimates. It is, however, possible to accurately estimate the error using only a single model, if the training and test data are chosen appropriately. This paper investigates the possibility of using various probability density function (PDF) divergence measures for representative data sampling. As it turns out, the first difficulty one needs to deal with is estimation of the divergence itself. In contrast to other publications on this subject, the experimental results provided in this study show that in many cases reliable estimation is not possible unless samples consisting of thousands of instances are used. Exhaustive experiments on divergence-guided representative data sampling have been performed using 26 publicly available benchmark datasets and 70 PDF divergence estimators, and their results are analysed and discussed.
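For illustration only: a minimal sketch of one widely used family of sample-based divergence estimators, the k-nearest-neighbour Kullback-Leibler estimator (in the style of Wang, Kulkarni and Verdú). It is not taken from the paper; the function name, the default k = 5 and the brute-force distance computation are assumptions made here to keep the example self-contained. The demo also hints at the abstract's central observation that accurate estimates require samples of thousands of instances.

```python
import numpy as np

def knn_kl_divergence(x, y, k=5):
    """k-nearest-neighbour estimate of KL(P||Q) from samples x ~ P and y ~ Q.

    Uses the estimator  (d/n) * sum_i log(nu_k(i) / rho_k(i)) + log(m/(n-1)),
    where rho_k(i) is the distance from x_i to its k-th nearest neighbour in x
    (excluding x_i itself) and nu_k(i) is the distance to its k-th nearest
    neighbour in y.  Brute-force O(n^2) distances, fine for small samples.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = len(x), len(y)
    d = x.shape[1]
    # Sorted distances within x: index 0 is the zero self-distance,
    # so index k is the k-th nearest neighbour excluding the point itself.
    rho = np.sort(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1), axis=1)[:, k]
    # Sorted distances from each x_i to all points of y: index k-1 is the k-th NN.
    nu = np.sort(np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1), axis=1)[:, k - 1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

# Demo: KL(N(0,1) || N(1,1)) has the closed form 0.5.
rng = np.random.default_rng(42)
p = rng.normal(0.0, 1.0, size=2000)
q = rng.normal(1.0, 1.0, size=2000)
print(knn_kl_divergence(p, q))  # approaches 0.5 as the sample size grows
```

With a few hundred points the estimate is noticeably noisy; only in the low thousands does it settle near the analytical value, which is consistent with the sample-size caveat raised in the abstract.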
Keywords: cross-validation; divergence estimation; generalisation error estimation; Kullback-Leibler divergence; sampling
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



MDPI and ACS Style

Budka, M.; Gabrys, B.; Musial, K. On Accuracy of PDF Divergence Estimators and Their Applicability to Representative Data Sampling. Entropy 2011, 13, 1229-1266.

AMA Style

Budka M, Gabrys B, Musial K. On Accuracy of PDF Divergence Estimators and Their Applicability to Representative Data Sampling. Entropy. 2011; 13(7):1229-1266.

Chicago/Turabian Style

Budka, Marcin, Bogdan Gabrys, and Katarzyna Musial. 2011. "On Accuracy of PDF Divergence Estimators and Their Applicability to Representative Data Sampling." Entropy 13, no. 7: 1229-1266.


Entropy EISSN 1099-4300, published by MDPI AG, Basel, Switzerland