# A New Benford Test for Clustered Data with Applications to American Elections

## Abstract

## 1. Introduction

## 2. Benford’s Law and Generalizations

## 3. Empirical Estimates Using 2004 Election Data

#### 3.1. Modeling with the Discrete Weibull

#### 3.2. Data and Model Estimation

## 4. Comparisons of 1-BL$_{3}$ and 1-BL$_{10}$ to Potential Vote Distributions

#### 4.1. Simulations Using Chi-Squared Testing

#### 4.2. Simulations Using Binomial Testing

## 5. Applications of 1-BL$_{3}$ to the 2004 US Presidential Election

#### 5.1. Chi-Square Test Results

#### 5.2. Binomial Test Results

## 6. Discussion

## 7. Conclusions and Future Research

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest


**Figure 1.** Vote Distributions Across Precincts in the US 2004 Presidential Election for Selected Counties.

**Figure 4.** Discrete Weibull Parameter Estimation and Kolmogorov-Smirnov Goodness-of-Fit Test Results for Ohio, Colorado, and Wisconsin, US 2004 Presidential Election. p-values above 0.05 indicate conformance to the discrete Weibull distribution.

**Figure 5.** First Digit Base 10 Monte Carlo Analysis of Simulated Discrete Weibull Distributions with Various Choices of Parameterizations and Precinct Sizes Using Chi-Squared Testing. p-values above 0.05 (plotted in blue) indicate conformance to 1-BL$_{10}$.

**Figure 6.** First Digit Base 3 Monte Carlo Analysis of Simulated Discrete Weibull Distributions with Various Choices of Parameterizations and Precinct Sizes Using Chi-Squared Testing. p-values above 0.05 (plotted in blue) indicate conformance to 1-BL$_{3}$.
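The kind of Monte Carlo experiment summarized in the caption above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the discrete Weibull is the floor of a continuous Weibull draw, so that $P(X \ge x) = \exp(-(x/\alpha)^{\beta})$, and it tests only the two possible base-3 leading digits (1 and 2) with one degree of freedom; the paper's exact test construction may differ. The function names and the parameter values `300.0`, `2.0`, and `400` are hypothetical.

```python
import math
import random

def first_digit(n, base=3):
    """Leading digit of a positive integer n written in the given base."""
    while n >= base:
        n //= base
    return n

def sample_discrete_weibull(alpha, beta, rng):
    """Floor of a continuous Weibull draw (scale alpha, shape beta).
    Assumed parametrization: P(X >= x) = exp(-(x / alpha) ** beta)."""
    return math.floor(rng.weibullvariate(alpha, beta))

def chi2_first_digit_base3(counts):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for
    the base-3 first digits of the positive entries of `counts`."""
    digits = [first_digit(c, 3) for c in counts if c > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one positive count")
    # Benford probabilities in base 3: log_3(2) for digit 1, log_3(3/2) for digit 2.
    expected = {1: n * math.log(2, 3), 2: n * math.log(1.5, 3)}
    observed = {d: digits.count(d) for d in (1, 2)}
    stat = sum((observed[d] - expected[d]) ** 2 / expected[d] for d in (1, 2))
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

rng = random.Random(0)  # fixed seed so the run is repeatable
votes = [sample_discrete_weibull(300.0, 2.0, rng) for _ in range(400)]
stat, p = chi2_first_digit_base3(votes)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Repeating the last four lines over a grid of `alpha`, `beta`, and precinct counts would give one p-value per cell, which is the shape of result the figure reports.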

**Figure 7.** First Digit Base 3 Monte Carlo Analysis of Simulated Discrete Weibull Distributions with Various Choices of Parameterizations and Precinct Sizes Using Binomial Testing. p-values above 0.05 (plotted in blue) indicate conformance to 1-BL$_{3}$.
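Because a base-3 first digit can only be 1 or 2, conformance can also be checked with an exact binomial test on the number of precincts whose leading digit is 1. The sketch below is one possible construction, assuming a two-sided test that sums the probabilities of all outcomes no more likely than the observed one; the text does not say which variant the authors used. The function names and the example numbers (300 precincts, 170 with leading digit 1) are hypothetical.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def binom_test_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed count k."""
    log_obs = binom_log_pmf(k, n, p)
    tol = 1e-9  # tolerance so floating-point ties are counted
    total = sum(math.exp(lp)
                for lp in (binom_log_pmf(i, n, p) for i in range(n + 1))
                if lp <= log_obs + tol)
    return min(1.0, total)

p_one = math.log(2, 3)  # Benford probability that the base-3 first digit is 1
# Hypothetical county: 300 precincts, 170 of them with leading base-3 digit 1.
print(binom_test_two_sided(170, 300, p_one))
```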

d | Probability First Digit d | Probability Second Digit d |
---|---|---|
0 | | 0.1197 |
1 | 0.3010 | 0.1139 |
2 | 0.1761 | 0.1088 |
3 | 0.1249 | 0.1043 |
4 | 0.0969 | 0.1003 |
5 | 0.0792 | 0.0967 |
6 | 0.0669 | 0.0934 |
7 | 0.0580 | 0.0904 |
8 | 0.0512 | 0.0876 |
9 | 0.0458 | 0.0850 |

d | Probability First Digit d | Probability Second Digit d |
---|---|---|
0 | | 0.4022 |
1 | 0.6309 | 0.3247 |
2 | 0.3691 | 0.2732 |
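The entries in the two digit-probability tables above follow from the standard Benford formulas in base $b$: $P(D_1 = d) = \log_b(1 + 1/d)$ for the first digit, and $P(D_2 = d) = \sum_{d_1=1}^{b-1} \log_b\bigl(1 + 1/(b\,d_1 + d)\bigr)$ for the second. A short sketch that reproduces both tables, with illustrative function names:

```python
import math

def first_digit_prob(d, base=10):
    """P(first significant digit = d) under Benford's law, for 1 <= d < base."""
    return math.log(1 + 1 / d, base)

def second_digit_prob(d, base=10):
    """P(second significant digit = d), for 0 <= d < base.
    Sums over every possible first digit d1."""
    return sum(math.log(1 + 1 / (base * d1 + d), base)
               for d1 in range(1, base))

for base in (10, 3):
    print(f"base {base}")
    for d in range(base):
        # A first digit is never 0, so that cell is left blank.
        first = f"{first_digit_prob(d, base):.4f}" if d >= 1 else "      "
        print(f"  d={d}  first={first}  second={second_digit_prob(d, base):.4f}")
```

In base 3 the first digit takes only two values, so the first-digit law reduces to a single probability, $\log_3 2 \approx 0.6309$.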

**Table 3.** Maximum Likelihood Estimates and Goodness-of-Fit Test Results for Vote Counts for Selected Counties in the 2004 US Presidential Election: President George W. Bush. p-values above 0.05 (plotted in blue) indicate conformance to the discrete Weibull distribution.

County | $\widehat{\alpha}_{\mathrm{Bush},i,j}$ | $\widehat{\beta}_{\mathrm{Bush},i,j}$ | p-Value | Number of Precincts |
---|---|---|---|---|
Milwaukee County, WI | 351.457 | 1.410 | 0.321 | 560 |
Stark County, OH | 287.717 | 2.289 | 0.472 | 364 |
Grand County, CO | 205.389 | 2.706 | 0.726 | 12 |

**Table 4.** Maximum Likelihood Estimates and Goodness-of-Fit Test Results for Vote Counts for Selected Counties in the 2004 US Presidential Election: Senator John F. Kerry. p-values above 0.05 indicate conformance to the discrete Weibull distribution.

County | $\widehat{\alpha}_{\mathrm{Kerry},i,j}$ | $\widehat{\beta}_{\mathrm{Kerry},i,j}$ | p-Value | Number of Precincts |
---|---|---|---|---|
Milwaukee County, WI | 598.1 | 2.2 | 0.223 | 560 |
Stark County, OH | 289.057 | 4.085 | 0.304 | 364 |
Grand County, CO | 167.35 | 3.188 | 0.614 | 12 |
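To make the roles of the two estimated parameters concrete, here is a minimal sketch of a discrete Weibull probability mass function and the log-likelihood that maximum likelihood estimation would maximize. It assumes a scale/shape form with $P(X \ge x) = \exp(-(x/\alpha)^{\beta})$ for integer $x \ge 0$, which is consistent with the magnitudes in the tables above but is not confirmed by the surviving text. The function names and the three example counts are hypothetical.

```python
import math

def dweibull_sf(x, alpha, beta):
    """P(X >= x) for integer x >= 0 under the assumed scale/shape form."""
    return math.exp(-((x / alpha) ** beta))

def dweibull_pmf(x, alpha, beta):
    """P(X = x) = P(X >= x) - P(X >= x + 1)."""
    return dweibull_sf(x, alpha, beta) - dweibull_sf(x + 1, alpha, beta)

def dweibull_cdf(x, alpha, beta):
    """P(X <= x), the quantity a Kolmogorov-Smirnov comparison would use."""
    return 1.0 - dweibull_sf(x + 1, alpha, beta)

def log_likelihood(counts, alpha, beta):
    """Sum of log-probabilities of the observed counts. Raises ValueError if
    a count is so far in the tail that its probability underflows to zero."""
    return sum(math.log(dweibull_pmf(c, alpha, beta)) for c in counts)

# Hypothetical precinct counts, scored at two candidate parameter settings.
counts = [100, 200, 300]
print(log_likelihood(counts, 250.0, 2.0))
print(log_likelihood(counts, 2500.0, 2.0))
```

The setting with the higher log-likelihood fits the counts better; a numerical optimizer over `alpha` and `beta` would turn this comparison into an estimate.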

State | George W. Bush (R) | John F. Kerry (D) | Number of Counties |
---|---|---|---|
North Carolina | 56.02% (1,961,166) | 43.58% (1,525,849) | 100 |
Vermont | 38.80% (121,180) | 58.94% (184,067) | 16 |
Wisconsin | 49.32% (1,478,120) | 49.70% (1,489,504) | 72 |
Ohio | 50.81% (2,859,768) | 48.71% (2,741,167) | 88 |
Colorado | 51.69% (1,101,255) | 47.02% (1,001,732) | 64 |

**Table 6.** First Digit Base 3 Benford’s Analysis on Selected Battleground and Non-Battleground States in the US 2004 Presidential Election.

State | County | Candidate | $\chi^{2}$ Stat | p-Value | Number of Precincts | Adjusted p-Value |
---|---|---|---|---|---|---|
Colorado | Arapahoe | John F. Kerry | 21.152 | <0.001 | 364 | 0.001 |
Colorado | El Paso | John F. Kerry | 20.513 | <0.001 | 378 | 0.001 |
Colorado | Jefferson | George W. Bush | 38.050 | <0.001 | 324 | <0.001 |
Colorado | Jefferson | John F. Kerry | 108.747 | <0.001 | 324 | <0.001 |
North Carolina | No Anomalies Detected | | | | | |
Wisconsin | No Anomalies Detected | | | | | |
Ohio | Ashtabula | John F. Kerry | 39.385 | <0.001 | 127 | <0.001 |
Ohio | Butler | John F. Kerry | 14.594 | 0.001 | 289 | 0.013 |
Ohio | Geauga | George W. Bush | 15.196 | <0.001 | 96 | 0.013 |
Ohio | Geauga | John F. Kerry | 20.812 | <0.001 | 96 | 0.001 |
Ohio | Greene | George W. Bush | 13.860 | 0.001 | 142 | 0.016 |
Ohio | Greene | John F. Kerry | 36.189 | <0.001 | 142 | <0.001 |
Ohio | Lorain | John F. Kerry | 43.509 | <0.001 | 239 | <0.001 |
Ohio | Miami | John F. Kerry | 14.669 | 0.001 | 82 | 0.013 |
Ohio | Muskingum | John F. Kerry | 13.971 | 0.001 | 85 | 0.016 |
Ohio | Portage | John F. Kerry | 27.249 | <0.001 | 129 | <0.001 |
Ohio | Summit | John F. Kerry | 68.917 | <0.001 | 475 | <0.001 |
Vermont | No Anomalies Detected | | | | | |
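The table above reports an adjusted p-value next to each raw p-value but does not name the adjustment. One common choice when many county-level tests are run at once is the Benjamini-Hochberg false discovery rate procedure; the sketch below assumes that method purely for illustration, and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from four separate tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by the number of tests divided by its rank, so the adjustment depends on how many counties and candidates were tested in total, not only on the rows flagged as anomalous.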

**Table 7.** Analysis of Whether Anomalous Counties from Table 6 Deviate from Benford’s Law.

County | Candidate | $\alpha_{x,i,t}$ | $\beta_{x,i,t}$ | Number of Precincts | p-Value (KS-Test) | Should Conform to BL but Fails? |
---|---|---|---|---|---|---|
Arapahoe | John F. Kerry | 326.634 | 2.973 | 364 | 0.003 | |
El Paso | John F. Kerry | 231.420 | 2.499 | 378 | 0.507 | * |
Jefferson | George W. Bush | 400.330 | 3.650 | 324 | 0.001 | |
Jefferson | John F. Kerry | 352.800 | 4.717 | 324 | <0.001 | |
Ashtabula | John F. Kerry | 208.527 | 4.538 | 127 | <0.001 | |
Butler | John F. Kerry | 218.783 | 2.869 | 289 | 0.037 | |
Geauga | George W. Bush | 348.690 | 4.200 | 96 | 0.003 | |
Geauga | John F. Kerry | 228.818 | 3.869 | 96 | 0.099 | * |
Greene | George W. Bush | 383.344 | 2.628 | 142 | 0.395 | * |
Greene | John F. Kerry | 244.027 | 2.272 | 142 | 0.494 | * |
Lorain | John F. Kerry | 367.292 | 2.964 | 239 | 0.074 | * |
Miami | John F. Kerry | 235.479 | 4.963 | 82 | 0.010 | |
Muskingum | John F. Kerry | 215.593 | 3.613 | 85 | 0.054 | * |
Portage | John F. Kerry | 347.487 | 4.109 | 129 | <0.001 | |
Summit | John F. Kerry | 365.481 | 3.712 | 475 | <0.001 | |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Anderson, K.M.; Dayaratna, K.; Gonshorowski, D.; Miller, S.J.
A New Benford Test for Clustered Data with Applications to American Elections. *Stats* **2022**, *5*, 841-855.
https://doi.org/10.3390/stats5030049
