Article

On the Mistaken Use of the Chi-Square Test in Benford’s Law

Independent Researcher, New York, NY 10002, USA
Academic Editors: Claudio Lupi, Roy Cerqueti and Marcel Ausloos
Stats 2021, 4(2), 419-453; https://doi.org/10.3390/stats4020027
Received: 1 April 2021 / Revised: 8 May 2021 / Accepted: 12 May 2021 / Published: 28 May 2021
(This article belongs to the Special Issue Benford's Law(s) and Applications)
Benford’s Law predicts that the first significant digit (the leftmost nonzero digit) of numbers in real-life data is distributed among the digits 1 to 9 approximately as log10(1 + 1/d) for digit d, so that low digits occur in the first position much more frequently than high digits. Researchers, data analysts, and statisticians typically rush to apply the chi-square test to verify compliance with, or deviation from, this statistical law. In almost all cases of real-life data this approach is mistaken and has no basis in mathematical statistics, yet it has become a dogma, or rather an impulsive ritual, in the field of Benford’s Law to apply the chi-square test to whatever data set the researcher is considering, regardless of its true applicability. The mistaken use of the chi-square test has led to much confusion and many errors, and has done a great deal to undermine trust and confidence in the whole discipline of Benford’s Law. This article is an attempt to correct course and bring rationality and order to a field which has otherwise demonstrated harmony and consistency in all of its results, manifestations, and explanations. In addressing its first research question, the article demonstrates that real-life data sets typically do not arise from random and independent selections of data points from some larger universe of parental data, as the chi-square approach supposes; this conclusion is reached by examining how several real-life data sets are formed and obtained. In addressing its second research question, the article demonstrates that the chi-square approach is really about the reasonableness of the random selection process and the Benford status of that parental universe of data, and not solely about the Benford status of the data set under consideration, since the chi-square test focuses exclusively on whether the entire process of data selection was probable or too rare.
In addition, this article compares the chi-square statistic with the Sum of Squared Deviations (SSD) measure of distance from Benford, pitting one measure against the other, and concludes with a strong preference for the SSD measure.
Keywords: Benford’s Law; digits; digit distribution; chi-square test; chain of distributions; order of magnitude; sum of squared deviations; threshold values
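The two measures contrasted in the abstract can be illustrated with a minimal Python sketch. It is not taken from the article: the function names are hypothetical, and the SSD is assumed here to be the sum over the nine first digits of the squared difference between observed and Benford proportions, both expressed in percent. The chi-square statistic is the standard Pearson form over first-digit counts.

```python
import math

def benford_probs():
    """Benford first-digit probabilities log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """Leading significant digit of a nonzero number, read off scientific notation."""
    return int(f"{abs(x):.15e}"[0])

def digit_counts(values):
    """Count how often each digit 1..9 appears as the first digit (zeros are skipped)."""
    counts = [0] * 9
    for v in values:
        if v != 0:
            counts[first_digit(v) - 1] += 1
    return counts

def chi_square(counts):
    """Pearson chi-square statistic of observed first-digit counts against Benford."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, benford_probs()))

def ssd(counts):
    """Sum of squared deviations from Benford, in percent units (assumed definition)."""
    n = sum(counts)
    return sum((100 * c / n - 100 * p) ** 2 for c, p in zip(counts, benford_probs()))

if __name__ == "__main__":
    # Illustrative data only: the first 200 powers of 2.
    counts = digit_counts([2 ** k for k in range(1, 201)])
    print("counts:", counts)
    print("chi-square:", chi_square(counts))
    print("SSD:", ssd(counts))
```

Under these assumptions, multiplying every digit count by the same factor leaves the SSD unchanged but multiplies the chi-square statistic by that factor, so the chi-square verdict depends on the size of the data set as well as on its digit proportions.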

MDPI and ACS Style

Kossovsky, A.E. On the Mistaken Use of the Chi-Square Test in Benford’s Law. Stats 2021, 4, 419-453. https://doi.org/10.3390/stats4020027

