
Are COVID-19 Data Reliable? A Quantitative Analysis of Pandemic Data from 182 Countries

School of Business and Management, IU International University of Applied Sciences, 10247 Berlin, Germany
Sydney Medical School, Nepean Clinical School, University of Sydney, Sydney, NSW 2747, Australia
Author to whom correspondence should be addressed.
COVID 2021, 1(1), 137-152;
Submission received: 17 June 2021 / Revised: 3 July 2021 / Accepted: 7 July 2021 / Published: 16 July 2021


When it comes to COVID-19, access to reliable data is vital. It is crucial for the scientific community to use data reported by independent territories worldwide. This study evaluates the reliability of the pandemic data disclosed by 182 countries worldwide. We collected COVID-19 daily infections, deaths, tests, and vaccinations reported since the beginning of the coronavirus pandemic and assessed their conformity with Benford’s law. It is commonly accepted that the frequency of leading digits of the pandemic data should conform to Benford’s law. Our analysis of Benfordness indicates that most countries distributed partially reliable data over the past eighteen months. Notably, the UK, Australia, Spain, Israel, and Germany, followed by 22 other nations, provided the most reliable COVID-19 data within the same period. In contrast, twenty-six nations, including Tajikistan, Belarus, Bangladesh, and Myanmar, published less reliable data on the coronavirus spread. In this context, over 31% of countries worldwide seem to have improved reliability. Our measurement of Benfordness moderately correlates with the Johns Hopkins Global Health Security Index, suggesting that the quality of data may depend on national healthcare policies and systems. We conclude that economically or politically distressed societies have declined in conformity to the law over time. Our results are particularly relevant for policymakers worldwide.

1. Introduction

The World Health Organization’s (WHO) announcement on 10 January 2020 [1] confirmed first cases of the novel coronavirus—also known as COVID-19 or SARS-CoV-2—in Wuhan City, China, on 31 December 2019. After eighteen months, the world now faces millions of incidents, and a tidal wave of data on COVID-19 spread has emerged. Since the beginning of the pandemic, all countries affected by the virus unanimously reported two metrics, “New Cases”—individuals testing positive for the virus, and “New Deaths”—the number of death incidents due to infection with the virus [2]. After introducing COVID-19 tests, countries commenced reporting “New Tests”—the number of daily tests conducted. The emergence of the COVID-19 vaccines by pharmaceutical companies enabled national vaccination programs. As such, “New Vaccinations”—the number of people vaccinated daily—were reported by countries that adapted their preventive measures against the deadly virus. Today, we know that the pandemic’s growth follows the logistic law [3].
There is no doubt that access to reliable data is vital for the fight against the pandemic. Scientists typically develop new insights based on publicly available data. Restricting interventions colloquially known as lockdowns, travel bans, and flying restrictions have been implemented by many governments in numerous countries around the world. It is for this reason that public confidence in data is vital for the effective implementation of these interventions and policies. Public doubts about the credibility of COVID-19 data can cause a sluggish response to the pandemic in the beginning or, in the worst-case, civil disobedience [2].
Several researchers conducted forensic studies on COVID-19 data. Koch and Okamura [4] confirmed the reliability of the official data coming from COVID-19 flashpoints, the United States, China, and Italy. Idrovo and Manrique–Hernández likewise reported China’s conformity to the law [5]. Jackson and Sambridge [6] assessed data from 51 countries from 16 January 2020 to 9 April 2020, providing some evidence for the overall reliability of COVID-19 data worldwide. Wei and Vellwock studied the pandemic data from over twenty hot spots. They confirmed the Benfordness of a few countries, such as the US, Spain, the UK, France, China, Belgium, Pakistan, and Italy. Russia and Iran were identified as clear violators of the law [7]. Lee et al. found that Japan did not provide reliable data [8]. According to Isea, China, Germany, Brazil, Venezuela, Norway, South Africa, Singapore, Ecuador, Egypt, Ireland, France, Australia, Colombia, India, Russia, and Croatia comply with the law. However, Italy, Portugal, the Netherlands, the UK, Denmark, Belgium, and Chile did not pass the three tests carried out [9]. In the most extensive study of COVID-19 data, Farhadi investigated over 100,000 integers from 154 countries and applied three goodness-of-fit measures [2]. Approximately 28% of countries adhered well to the anticipated distribution frequency, while six countries disclosed fully undependable data. Farhadi recommended replicating the same study by applying additional tests based on larger sets of observations at a later point in time. Clearly, extending the timeframe and improving the statistical measurement should add value to the body of knowledge. Table 1 summarizes the main characteristics of the prior research into COVID-19 data reliability.
To this point, the previous studies on COVID-19 data faced two major challenges: (a) inconsistent data typically limited to a small sample size and (b) inconsistent application of various statistical techniques to evaluate the reliability of the data [2,3,4,5,6,7,8,9]. Our primary rationale is to overcome the prior limitations and provide a comprehensive assessment of the most recent data. Hence, we aim to assess Benfordness by extending the number of statistical tests and maximizing the timeframe of the analysis to over eighteen months since the beginning of the COVID-19 pandemic.

2. Method

2.1. Benford’s Law and Goodness-of-Fit Tests

Benford’s law (BL), also known as “the law of the first digits,” is a widely known technique for the detection of data manipulation and fraud. Its core idea relates to the frequency of leading digits in naturally generated datasets. The idea relies on Newcomb’s findings of 1881, which addressed the probabilities of first-digit numbers by the Equation (1) [10]:
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d \in \{1, 2, 3, \ldots, 9\} \tag{1}$$
According to BL, the leading digits of numbers follow a particular logarithmic pattern: 30.1% for one, 17.6% for two, 12.5% for three, 9.7% for four, 7.9% for five, 6.7% for six, 5.8% for seven, 5.1% for eight, and 4.6% for nine [10,11]. See Table 2.
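The percentages quoted above follow directly from Equation (1). A minimal Python sketch, assuming nothing beyond the formula itself (the function name `benford_expected` is illustrative and does not come from the paper):

```python
import math


def benford_expected():
    """Expected first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_expected()
for digit, p in probs.items():
    # Rounded to three decimals, these are the shares listed in Table 2.
    print(digit, round(p, 3))
```

Because the nine terms telescope to log10(10), the probabilities sum to one, which is a quick sanity check on any implementation.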
In an artificially generated dataset, the frequencies diverge from the BL distribution. Benfordness can only be applied to data with a geometrical tendency and characterized by the non-existence of minima and maxima. BL is common practice in social sciences and has been applied in various disciplines, such as finance and accounting [12,13,14], politics [15,16], and pandemics [2,3,4,5,6,7,8,9]. The body of knowledge is built on different goodness-of-fit tests to assess the divergence of the observed and expected frequencies. The techniques commonly used are Kolmogorov-Smirnov statistic, Chi-square, Euclidean distance, Mean Absolute Deviation, z-values, and Sum of Squared Difference.
The Kolmogorov–Smirnov statistic (K-S) is a non-parametric test for discrete data and quantifies the empirical distance between the observed and expected cumulative frequencies [17,18]. The K-S statistic was frequently applied to detect potential anomalies and non-compliance of data in prior research [2,5]. The statistic is calculated as $D_n = \max_x |F_n(x) - F(x)|$, where $F_n$ is the cumulative observed distribution and $F$ is Benford’s cumulative distribution. Non-conformity is indicated if $D_n$ exceeds the cut-off $K/\sqrt{N}$, where $N$ is the total number of leading digits observed in the sample; equivalently, the null hypothesis is rejected if $\sqrt{N} D_n > K$. At a 5% significance level, $K$ is set to 1.36; at 1%, $K$ is set to 1.63 [2,17]. The non-parametric K-S test is distribution-free and more powerful when the sample size is small [2,17]. This technical characteristic is of particular importance in the case of small samples of epidemics. The K-S statistic is one of the most precise techniques to assess BL reliability in the context of COVID-19 data [17].
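As a hedged illustration of the K-S procedure, the sketch below computes $D_n$ from first-digit frequencies and compares it with a cut-off of $K/\sqrt{N}$. The cut-off form is an assumption inferred from the cut-off values reported in Table 3 (for example, 1.36/√1358 ≈ 0.037 for the United Kingdom); the function names are illustrative, and the input frequencies are the rounded United Kingdom values from Table 3.

```python
import math

# Benford's expected first-digit probabilities for digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def ks_statistic(observed_freqs):
    """D_n: largest absolute gap between observed and Benford cumulative shares."""
    cum_obs = 0.0
    cum_exp = 0.0
    d_n = 0.0
    for obs, exp in zip(observed_freqs, BENFORD):
        cum_obs += obs
        cum_exp += exp
        d_n = max(d_n, abs(cum_obs - cum_exp))
    return d_n


def ks_cutoff(n, k=1.36):
    """Assumed cut-off K / sqrt(N); k=1.36 corresponds to the 5% level."""
    return k / math.sqrt(n)


# First-digit frequencies for the United Kingdom, as printed in Table 3.
uk = [0.295, 0.178, 0.122, 0.094, 0.088, 0.082, 0.059, 0.039, 0.043]
d_n = ks_statistic(uk)
cutoff = ks_cutoff(1358)
print(round(d_n, 3), round(cutoff, 3), d_n <= cutoff)
```

With these inputs the sketch gives a statistic of about 0.015 against a cut-off of about 0.037, in line with the United Kingdom row of Table 3.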
Another prevalent technique is the Pearson’s chi-square ($\chi^2$) goodness-of-fit test with a confirmatory null hypothesis [7,9,19]. It is common knowledge that the $\chi^2$ test is sensitive to the sample size and cannot make reliable inferences when the dataset consists of 5000 observations or more [2,19]. Principally, if the sample size is too big, the null hypothesis will likely be rejected (even if there is no significant difference between the actual and expected subsets). For a small sample size, $\chi^2$ encounters difficulties in measurement as well. The test is computed as follows, cf. Equation (2):
$$\chi^2 = \sum_{i=1}^{9} \frac{(\tilde{p}_i - p_i)^2}{p_i} \tag{2}$$
As a less sensitive technique to the sample size, researchers apply the d-factor (d*), which is calculated by the Equation (3):
$$d^* = \frac{\sqrt{\sum_{i=1}^{9} (\tilde{p}_i - p_i)^2}}{1.03606} \tag{3}$$
where $\tilde{p}_i$ and $p_i$ are the observed and expected frequencies [2,7,19]. The d* ultimately measures the Euclidean distance between the observed and expected frequencies of leading digits, normalized by 1.03606, the maximum possible distance [19]. A d-factor (d*) equal to 0.0 suggests full conformity to BL, while the highest Euclidean distance, d* = 1.0, signifies full non-conformity. A d* above 0.2 indicates non-conformity with the law [2,7,19].
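Equations (2) and (3) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the optional `n` argument (which scales the sum by the sample size, as the conventional Pearson statistic for proportions does) is an addition not stated in the text, and the example frequencies are the rounded United Kingdom values from Table 3.

```python
import math

# Benford's expected first-digit probabilities for digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
D_STAR_NORMALIZER = 1.03606  # maximum possible distance, as given in the text


def chi_square(observed_freqs, n=None):
    """Sum of (observed - expected)^2 / expected, following Equation (2).

    If n is given, the sum is multiplied by n (an assumption for illustration;
    the scaling is not part of Equation (2) as printed).
    """
    total = sum((o - e) ** 2 / e for o, e in zip(observed_freqs, BENFORD))
    return total if n is None else n * total


def d_star(observed_freqs):
    """Normalized Euclidean distance from Benford's proportions, Equation (3)."""
    squared = sum((o - e) ** 2 for o, e in zip(observed_freqs, BENFORD))
    return math.sqrt(squared) / D_STAR_NORMALIZER


# First-digit frequencies for the United Kingdom, as printed in Table 3.
uk = [0.295, 0.178, 0.122, 0.094, 0.088, 0.082, 0.059, 0.039, 0.043]
print(round(d_star(uk), 3), d_star(uk) > 0.2)
```

For the United Kingdom frequencies the distance comes out at about 0.022, well below the 0.2 threshold named above.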
In this study, we further apply additional tests, i.e., Mean Absolute Deviation (MAD) and Sum of Squared Difference (SSD), which are less dependent on the sample size [20,21,22]. The Equation (4) shows the MAD calculation as the average absolute deviation of observed and expected frequencies:
$$\mathrm{MAD} = \frac{1}{k} \times \sum_{i=1}^{k} |O_i - E_i| \tag{4}$$
SSD takes the sum of the squares of the deviation of each digit’s frequency from the expected frequency of BL [22]. SSD is calculated by the Equation (5):
$$\mathrm{SSD} = \sum_{i=1}^{k} (O_i - E_i)^2 \times 10^4 \tag{5}$$
For both the MAD and SSD statistics, $k$ is the number of leading digit bins, and $O_i$ and $E_i$ are the observed and expected Benford’s frequencies. A MAD > 0.015 and an SSD > 100 indicate non-conformity with the law [21,22].
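The two statistics in Equations (4) and (5) are simple enough to check by hand. The following Python sketch is illustrative only (function names are hypothetical) and uses the rounded first-digit frequencies of the United Kingdom and El Salvador rows of Table 3 as inputs.

```python
import math

# Benford's expected first-digit probabilities for digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def mad(observed_freqs):
    """Mean absolute deviation between observed and expected shares, Equation (4)."""
    k = len(BENFORD)
    return sum(abs(o - e) for o, e in zip(observed_freqs, BENFORD)) / k


def ssd(observed_freqs):
    """Sum of squared differences scaled by 10^4, Equation (5)."""
    return sum((o - e) ** 2 for o, e in zip(observed_freqs, BENFORD)) * 1e4


# First-digit frequencies as printed in Table 3 (rounded to three decimals).
uk = [0.295, 0.178, 0.122, 0.094, 0.088, 0.082, 0.059, 0.039, 0.043]
el_salvador = [0.223, 0.385, 0.072, 0.093, 0.052, 0.037, 0.039, 0.049, 0.049]

for name, freqs in [("United Kingdom", uk), ("El Salvador", el_salvador)]:
    # Thresholds from the text: MAD > 0.015 or SSD > 100 signals non-conformity.
    print(name, round(mad(freqs), 3), round(ssd(freqs), 1),
          mad(freqs) > 0.015, ssd(freqs) > 100)
```

The MAD values agree with Table 3 after rounding (about 0.006 and 0.047). The SSD values can differ slightly from the tabulated ones because the inputs here are frequencies already rounded to three decimals.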
To be consistent with prior research, we postulate the following null hypothesis for all countries in scope: H0: COVID-19 data from $J_i$ adhere to BL, where $J_i$ stands for an individual jurisdiction in scope. We operationalize the statistical measures and goodness-of-fit tests based on a 0.05 significance level.

2.2. COVID-19 Data Sampling

Consistent with the extant body of knowledge, we aim for a sample size of 1000 observations or larger for each country (also referred to as a jurisdiction or territory), although the literature does not specify a minimum threshold for the analysis of BL compliance. The measurement of Benfordness only makes sense if the COVID-19 samples are not too small, because small datasets cannot detect non-conformity to BL. The quality of the assessment rises as the sample size and range of the underlying dataset grow.
This study draws on the largest dataset available, covering 182 countries worldwide. We collected a sample from the COVID-19 database of the Centre for Systems Science and Engineering at Johns Hopkins University and the Our World in Data repository [23]; our dataset includes daily observations, such as New Cases, New Deaths, New Tests, and New Vaccinations. By adding new variables to the COVID-19 observed samples, i.e., “New Tests” and “New Vaccinations” (which were previously not available), our study incorporates the most comprehensive dataset on SARS-CoV-2 globally. We collected 216,378 integers, which were reported by 182 countries from 21 January 2020 to 6 June 2021. We use “New Vaccinations” as an additional measure, which was not yet reported and therefore not available in prior research [6]. We also documented the leading digits of 86,917 new cases, 77,137 new deaths, 42,322 new tests, and 10,092 new vaccinations worldwide. Small samples of seven jurisdictions ($n < 100$), including the Vatican and Tanzania, were removed from our analysis. Small samples cannot detect non-conformity to the law. Our final dataset consists of 176 countries with about 873 COVID-19 related observations on average.
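To make the sampling step concrete, the sketch below shows one way to turn a series of daily counts into first-digit frequencies. It is a hedged illustration: the function names and the sample values are hypothetical, and skipping zero or negative entries (which have no leading digit between 1 and 9) is an assumption, since the text does not state how such entries were handled.

```python
from collections import Counter


def leading_digit(value):
    """First significant digit of a positive integer; None for zero or negatives."""
    if value <= 0:
        return None
    while value >= 10:
        value //= 10
    return value


def first_digit_frequencies(daily_counts):
    """Return (n, frequencies for digits 1..9) over the positive observations."""
    digits = [d for d in (leading_digit(v) for v in daily_counts) if d is not None]
    if not digits:
        raise ValueError("no positive observations to analyse")
    tally = Counter(digits)
    n = len(digits)
    return n, [tally[d] / n for d in range(1, 10)]


# Hypothetical daily counts, for illustration only (not data from the study).
sample_counts = [0, 12, 135, 9, 27, 1004, 18]
n, freqs = first_digit_frequencies(sample_counts)
print(n, [round(f, 3) for f in freqs])
```

The returned `n` is the sample size that a minimum-size filter, such as the $n < 100$ exclusion described above, would be applied to.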

3. Results

For each country, we evaluated the consistency of the pandemic data’s leading digits with BL. We operationalized the goodness-of-fit tests, K-S, $\chi^2$, d*, MAD, and SSD, at a 0.05 significance level. 32% of countries worldwide showed evident improvements in Benfordness, while 68% demonstrated non-conformity with BL. The results are coded as a global map (see Figure 1). As emphasized in Figure 2, twenty-seven countries (or 15.3%) disclosed data that fully comply with the law (green countries in Figure 1). For these countries, we accept the null hypothesis based on all tests. Twenty-six countries (or 14.8%) did not meet the requirements of BL tests based on all measures (see Figure 3). These jurisdictions failed to pass all goodness-of-fit tests. We rejected the null hypothesis for these jurisdictions, which are highlighted red on our global map. Notably, 123 countries or 69.9% demonstrated partial compliance with BL as they could pass at least one of the goodness-of-fit tests, flagged yellow (see Figure 4). Grey areas were excluded from our analysis as they do not offer a sufficient sample size. Figure 5 summarizes the global distribution of COVID-19 Benfordness.
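The colour coding described above (green when all tests are passed, red when none is passed, yellow otherwise) can be expressed as a small helper. This is only a sketch of the stated rule: the function name and the example outcomes are hypothetical, and the per-test pass or fail decisions are taken as given inputs, because the exact decision thresholds behind the published flags are not reproduced here.

```python
def conformity_flag(test_passed):
    """Map per-test outcomes to a map colour code.

    test_passed: dict of test name -> bool (True if the test was passed).
    Returns "G" (all passed), "R" (none passed), or "Y" (at least one passed).
    """
    if not test_passed:
        raise ValueError("at least one test outcome is required")
    passed = sum(1 for ok in test_passed.values() if ok)
    if passed == len(test_passed):
        return "G"
    if passed == 0:
        return "R"
    return "Y"


# Hypothetical outcomes for three jurisdictions, for illustration only.
examples = {
    "all pass": {"K-S": True, "chi2": True, "d*": True, "MAD": True, "SSD": True},
    "none pass": {"K-S": False, "chi2": False, "d*": False, "MAD": False, "SSD": False},
    "mixed": {"K-S": False, "chi2": False, "d*": True, "MAD": True, "SSD": True},
}
for label, outcomes in examples.items():
    print(label, conformity_flag(outcomes))
```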
Table 3 summarizes the results of the statistical analyses for all countries. The K-S, MAD, and $\chi^2$ statistics reveal similar patterns, while Euclidean distance does not always flag non-conformity to BL. According to the K-S statistics, the most substantial evidence for abnormalities regarding BL was found for Bangladesh (cut-off: 0.037; K-S: 0.177 at a 5% significance level), followed by Myanmar and Kuwait. Based on the $\chi^2$ test, we specifically recognized Bangladesh with the most notable non-conformity (n: 1332; $\chi^2$: 177.155; p < 0.001). See Table 4 for the list of abbreviations.
In line with prior research [7,16,23], a Euclidean distance above 0.2 can be considered an indicator of BL non-conformity. In this context, Tajikistan showed by far the largest distance from Benford’s proportions according to the Euclidean statistic (d*: 0.306), followed by El Salvador (d*: 0.226), and Nicaragua and Timor (d*: 0.211). Similarly, Tajikistan, Belarus, Nicaragua, and El Salvador yielded large MAD and SSD statistics. Consistent with the previous measures, with a MAD of 0.082 and an SSD of 1007.9, Tajikistan ranked first for non-conformity to the law in our assessment. Tajikistan, Belarus, and El Salvador failed to confirm the null hypothesis based on all goodness-of-fit tests (see Table 3). It is noteworthy that the null hypothesis can be confirmed for the UK, Spain, Australia, Israel, and Germany, as well as twenty-two other nations. According to the K-S statistics, the UK (n: 1358, cut-off: 0.037, K-S: 0.015) and Australia (n: 878, cut-off: 0.046, K-S: 0.021) disclosed the most reliable data worldwide.

4. Conclusions

Can we rely on COVID-19 data? To address the central question of this study, we investigated data from 182 countries worldwide. To improve the accuracy of the statistical measurements, we applied five statistical tests. Farhadi applied K-S, chi-square, and d* statistics in a timeframe of nine months [2]; Koch and Okamura used the Kuiper test (a modified K-S test), chi-square, and d* for [4]; Wei and Vellwock operationalized d* statistics for circa eight months [7]. The results confirm that 15.34% of countries fully adhere to BL. This is, to some extent, in agreement with prior research [2,3,4,5,6,7,8,9]. Most European countries, except Latvia, Luxembourg, the Netherlands, and Sweden, demonstrated conformity by satisfying at least one of the tests. This is also true for the Americas and APAC regions. Apparently, the US revealed a declining pattern of BL reliability. Remarkably, the UK (n: 1358, d*: 0.022; K-S: 0.015), Australia (n: 878, d*: 0.025, K-S: 0.021), Spain (n: 690, d*: 0.026, K-S: 0.028), and Israel (n: 1335, d*: 0.028, K-S: 0.026) overwhelmingly conform to BL. This group could evidently maintain or improve the quality of COVID-19 incident reports compared to prior research; in a nine-month study of COVID-19 data, Farhadi identified the following measures for the UK (n: 593, d*: 0.076; K-S: 0.051), Australia (n: 497, d*: 0.029, K-S: 0.031), Spain (n: 353, d*: 0.086, K-S: 0.074), and Israel (n: 597, d*: 0.081, K-S: 0.087) [2]. This development is in line with organizational learning theory, which suggests that proficiency in the fight against pandemics grows with experience in managing pandemics over time. These countries introduced advanced programs to counter the epidemic within their national borders, such as non-pharmaceutical measures (lockdowns, travel bans, and social distancing) or progressive vaccinations.
Consistent with Farhadi’s results, Germany was able to maintain the BL reliability (n: 911, d*: 0.040; K-S: 0.043 as of 6 June 2021, versus n: 408, d*: 0.032; K-S: 0.026 as of 22 September 2020). The US, which was identified as one of the champions of Benfordness by Farhadi, could only partially pass the goodness-of-fit tests. With a non-confirmatory K-S statistic (n: 1389, cut-off: 0.036, K-S: 0.050), the null hypothesis for the US was rejected based on one test in our study. This suggests a decreased Benfordness of the US data over the past nine months from 24 February 2020 to 6 June 2021. Similarly, China conformed to BL in prior studies [2,4,5,24]; according to our results, however, China with 572 observations did not pass three tests applied in our evaluation.
Furthermore, 14.8% of the countries did not pass any of the tests applied in this study. Therefore, we rejected the null hypothesis for these cases. The most significant irregularities occurred in Tajikistan, Belarus, Myanmar, Turkey, Bangladesh, and El Salvador. In January 2021, Tajikistan declared itself COVID-19 free, which explains the lack of BL conformity in our assessment [25]. Distressed social-economic conditions can explain the poor reliability of pandemic data in Belarus and Myanmar [26,27]. Experts posit that Bangladesh has seen a decline in COVID-19 incidents as it reduced the number of tests affecting the recorded infection rates [28]. Turkey, which could not pass any goodness-of-fit test in our study, strives for a speedy recovery of its suffering hospitality industry [29]. It is noteworthy that 70% of El Salvador’s population work in an informal economy [30]. These distressed countries have been facing major economic, social, or political challenges during the course of the pandemic.
Notably, Johns Hopkins University (JHU) studied national capabilities, procedures, systems, and policies to respond to epidemic challenges across 195 countries [24]. Considering the Global Health Security Index of JHU (GHSI), we recognize consistency in the case of Tajikistan, ranked 144 for early detection and reporting. However, Latvia remains questionable. As a former member of the Soviet Union, it was placed second in the GHSI after the US and Australia; surprisingly and contrary to the JHU’s results, Latvia significantly violates BL. This was also confirmed in prior research [2]. Furthermore, JHU identified the highest scores for “early detection and reporting for epidemics of potential international concern” for the countries with significant BL conformity, such as Australia (ranked 2nd), the UK (ranked 6th), Germany (ranked 10th), or Spain (ranked 7th). However, countries with poor BL conformity demonstrated much weaker epidemic reporting capabilities worldwide. We followed Farhadi’s approach [2] and, in a further step, applied Pearson’s product-moment correlation analysis to explore the relationship between the GHSI scores and the goodness-of-fit tests. Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, and homoscedasticity. We identified a negative partial correlation of moderate strength between the GHSI score and the K-S statistic (r: −0.27, n: 176, p < 0.0005), which suggests that countries with a higher GHSI score tend to provide more reliable COVID-19 data. Our results are consistent with prior investigations [2].
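For readers who want to see the mechanics of the correlation step, the sketch below computes Pearson's product-moment coefficient in plain Python. It is an illustration under loud assumptions: the function name is hypothetical, and the eight (GHSI score, K-S) pairs are a small excerpt read from the rows of Table 3, so the resulting coefficient is not expected to match the r of −0.27 reported for all 176 countries.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (GHSI score, K-S statistic) pairs for eight rows of Table 3; a small,
# non-representative excerpt used only to demonstrate the calculation.
excerpt = [
    (78, 0.015), (75, 0.024), (52, 0.038), (70, 0.046),
    (84, 0.050), (54, 0.066), (44, 0.132), (30, 0.157),
]
ghsi_scores = [score for score, _ in excerpt]
ks_values = [ks for _, ks in excerpt]
r = pearson_r(ghsi_scores, ks_values)
print(round(r, 2))
```

A negative coefficient here has the same reading as in the text: higher GHSI scores go together with smaller K-S distances from Benford's distribution.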
Contradictions with the law may be an indicator of data manipulation or the act of creating artificially fabricated data [2,3,4,5,6,7,8,9]. If so, there is a strong need to understand the actual numbers of deaths. Irregularities with BL may also pertain to varying national policies and limited capacity to detect and report COVID-19 incidents; the WHO claimed early on that some nations do not have sufficient access to testing kits [31]. To effectively fight the spread of any disease in the future, we recommend establishing global governance for pandemic measurement [2,32,33]. Having access to reliable data is vital in the fight against disastrous pandemics and submicroscopic infectious organisms.

5. Future Research

In our study, we identified tangible improvements in the quality of pandemic data, as well as diminishing data reliability in the context of COVID-19 data in several countries worldwide. It is of particular importance to understand the formative indicators and causality of these developments. To address this issue accurately, one shall further examine the economic, political, and social conditions of these jurisdictions and the impact of the COVID-19 pandemic on Benfordness. In the case of the red-flagged countries, we still need to gain a deeper understanding of the domestic healthcare processes and epidemic reporting policies. Based on the key results of this study, the authors recommend reproducing the forensic BL assessment in a later re-examination of the pandemic data. Ensuring BL reliability may be a fundamental step in the fight against epidemics in general, as well as COVID-19 in particular. We further recommend investigating the regressive and correlative relationships of our results with the social-political and economic transformation indices, e.g., the Bertelsmann Transformation Index (or BTI). Through this approach, evidence may be found for a causal relationship between the social-economic and political variables and the phenomenon of BL non-conformity of epidemic data. Ultimately, we hope to have contributed to the combat against SARS-CoV-2 with this paper.

6. Limitation

The data gathered from all countries in our study are affected by diverging public health systems and data reporting policies. Lack of common policies for reporting on a global basis, particularly in developing countries, may have affected the efficiency and effectiveness of COVID-19 reports.

Author Contributions

Conceptualization, N.F.; methodology, N.F.; software, N.F.; validation, N.F., H.L.; formal analysis, N.F.; investigation, H.L.; resources, H.L.; data curation, N.F., H.L.; writing—original draft preparation, N.F.; writing—review and editing, N.F., H.L.; visualization, N.F.; project administration, N.F., H.L.; funding acquisition, N/A; Both authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Our data on confirmed cases and deaths come from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (see, last accessed on 15 July 2021). Testing for COVID-19: this data is collected by the Our World in Data team from official reports (see, last accessed on 15 July 2021). Vaccinations against COVID-19 data is collected by the Our World in Data (see, last accessed on 15 July 2021).

Conflicts of Interest

The authors declare no conflict of interest.


References

  1. World Health Organization. Coronavirus Disease (COVID-19) Outbreak; WHO: Geneva, Switzerland, 2020; Available online: (accessed on 13 June 2021).
  2. Farhadi, N. Can we rely on Covid-19 data? An assessment of data from over 200 countries. Sci. Prog. 2021, 104, 1–19.
  3. Castorina, P.; Iorio, A.; Lanteri, D. Data Analysis on Coronavirus Spreading by Macroscopic Growth Laws. Cornell University, 2020. Available online: (accessed on 13 June 2021).
  4. Koch, C.; Okamura, K. Benford’s Law and COVID-19 Reporting. 2020. Available online: (accessed on 13 June 2021).
  5. Idrovo, A.J.; Manrique-Hernandez, E.F. Data Quality of Chinese Surveillance of COVID-19: Objective Analysis Based on WHO’s Situation Reports. Asia Pac. J. Public Health 2020, 32, 165–167.
  6. Sambridge, M.; Jackson, A. National COVID Numbers—Benford’s Law Looks for Errors. Nature 2020. Available online: (accessed on 13 June 2021).
  7. Wei, A.; Vellwock, A.E. Is COVID-19 Data Reliable? A Statistical Analysis with Benford’s Law. 2020. Available online: (accessed on 13 June 2021).
  8. Lee, K.-B.; Han, S.; Jeong, Y. COVID-19 flattening the curve, and Benford’s law. Phys. A 2020.
  9. Isea, R. How valid are the Reported Cases of People Infected with Covid-19 in the World? Int. J. Coronaviruses 2020, 1, 53–56.
  10. Newcomb, S. Note on the Frequency of Use of the Different Digits in Natural Numbers. Am. J. Math. 1881, 4, 39–40.
  11. Benford, F. The Law of Anomalous Numbers. Proc. Am. Philos. Soc. 1938, 78, 551–572.
  12. Durtschi, C.; Hillison, W.; Pacini, C. The Effective Use of Benford’s law to assist in Detecting Fraud in Accounting Data. JFAR 2004, 5, 17–34.
  13. Grammatikos, T.; Papanikolaou, N.I. Applying Benford’s law to Detect Accounting Data Manipulation in the Banking Industry. J. Financ. Serv. Res. 2021, 59, 115–142.
  14. Cho, T.W.; Gaines, B.J. Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance. Am. Stat. 2007, 61, 218–223.
  15. Roukema, B.F. A first-digit anomaly in the 2009 Iranian presidential election. J. Appl. Stat. 2014, 41, 164–199.
  16. Furlan, L.V. Das Harmoniegesetz der Statistik: Eine Untersuchung über die Metrische Interdependenz der Sozialen Erscheinungen; Verlag für Recht und Gesellschaft: Basel, Switzerland, 1948.
  17. Simard, R.; L’Ecuyer, P. Computing the Two-Sided Kolmogorov–Smirnov Distribution. J. Stat. Softw. 2011, 39, 1–18.
  18. Bushee, B. Benford’s Law. Wharton University, 2018. Available online: (accessed on 13 June 2021).
  19. Goodman, Q. The promises and pitfalls of Benford’s law. Significance 2016, 13, 38–41.
  20. Nigrini, M.J. Benford’s Law Applications for Forensic Accounting, Auditing and Fraud Detection; Wiley Corporate, F&A: Hoboken, NJ, USA, 2012.
  21. Slepkov, A.D.; Ironside, K.B.; DiBattista, D. Benford’s Law: Textbook Exercises and Multiple-Choice Testbanks. PLoS ONE 2019, 10, e0117972.
  22. Kossovsky, A.E. Benford’s Law: Theory, the General Law of Relative Quantities, and Forensic Fraud Detection Applications; World Scientific: Singapore, 2014.
  23. Dong, E.; Du, H.; Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet 2020, 20, 533–534. Available online: (accessed on 12 July 2021).
  24. Johns Hopkins University. Global Health Security Index. 2019. Available online: (accessed on 13 June 2021).
  25. Putz, C. If Only It Were That Easy: Tajikistan Declares Itself COVID-19 Free. The Diplomat. 2021. Available online: (accessed on 13 June 2021).
  26. Vector, D. What’s Happening in Belarus? Here Are the Basics. New York Times. 2021. Available online: (accessed on 13 June 2021).
  27. Cuddy, A. Myanmar Coup: What Is Happening and Why? BBC News. 2021. Available online: (accessed on 13 June 2021).
  28. Deutsche Welle. Why Bangladeshis No Longer Fear the Coronavirus. 2021. Available online: (accessed on 13 June 2021).
  29. Yackley, A.J. Dollar Blow for Turkey as Tourism Season Runs into the Sand. Financial Times. 2021. Available online: (accessed on 13 June 2021).
  30. Associated Press. El Salvador’s President Wants Bitcoin as Legal Tender. 2021. Available online: (accessed on 13 June 2021).
  31. Farge, E. WHO to Start Coronavirus Testing in Rebel Syria; Iran Raises Efforts, Official Says. Available online: (accessed on 13 June 2021).
  32. Alberti, T.; Faranda, D. On the uncertainty of real-time predictions of epidemic growths: A COVID-19 case study for China and Italy. Commun. Nonlinear Sci. Numer. Simul. 2020, 90, 105372.
  33. Balsari, S.; Buckee, C.; Khanna, T. Which COVID-19 Data Can You Trust? Harvard Business Review. 2020. Available online: (accessed on 13 June 2021).
Figure 1. Reliability of COVID-19 data worldwide.
Figure 2. Twenty-seven countries with the highest level of conformity with BL.
Figure 3. Twenty-six countries with the lowest level of conformity with BL.
Figure 4. 123 countries with a moderate level of conformity with BL.
Figure 5. Distribution of conformity to BL.
Table 1. Prior research into COVID-19 data reliability measured by Benford’s law.
| Researcher | Variables | Period | Number of Countries |
|---|---|---|---|
| Idrovo and Manrique-Hernández | Confirmed cases, suspected cases, and deaths, cumulated confirmed cases and deaths | 21 January 2020–15 March 2020 | 1 |
| Koch and Okamura | Confirmed cases, deaths, and recoveries | 20 January 2020–10 April 2020 | 3 |
| Lee, Han, and Jeong | Number of deaths | 22 January 2020–6 April 2020 | 10 |
| Jackson and Sambridge | Cumulated confirmed cases and deaths | 16 January 2020–9 April 2020 | 51 |
| Isea | Daily cases and deaths | 29 December 2019–30 April 2020 | 23 |
| Wei and Vellwock | Total confirmed cases, daily cases and deaths | not stated–1 September 2020 | 20 |
| Farhadi | Daily cases, deaths, tests | 31 December 2019–24 September 2020 | 154 |
Table 2. Benford’s Law Distribution of First Digit.
| First Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Benford’s frequency | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |
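Table 3. Results of the leading digit distribution analysis.

| Location | Improve | Flag | N | GHSI Score | GHSI Rank | d* | Chi Square | p-Value | K-S | Cut-Off | SSD | MAD | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| United Kingdom | + | G | 1358 | 78 | 2 | 0.022 | 0.269 | 0.874 | 0.015 | 0.037 | 5.308 | 0.006 | 0.295 | 0.178 | 0.122 | 0.094 | 0.088 | 0.082 | 0.059 | 0.039 | 0.043 |
| Cape Verde | + | G | 883 | 75 | 5 | 0.035 | 1.062 | 0.588 | 0.024 | 0.046 | 12.976 | 0.011 | 0.294 | 0.159 | 0.143 | 0.106 | 0.086 | 0.061 | 0.074 | 0.044 | 0.033 |
| Democratic Rep. of Congo | - | G | 992 | 52 | 42 | 0.039 | 8.025 | 0.018 | 0.038 | 0.043 | 16.050 | 0.011 | 0.292 | 0.177 | 0.151 | 0.116 | 0.073 | 0.069 | 0.048 | 0.033 | 0.040 |
| Sao Tome and Principe | - | R | 279 | 31 | 139 | 0.170 | 47.748 | 0.000 | 0.175 | 0.081 | 308.597 | 0.039 | 0.459 | 0.194 | 0.118 | 0.093 | 0.054 | 0.029 | 0.025 | 0.011 | 0.018 |
| Burkina Faso | - | R | 464 | 30 | 145 | 0.150 | 55.086 | 0.000 | 0.157 | 0.063 | 242.833 | 0.035 | 0.442 | 0.190 | 0.127 | 0.067 | 0.058 | 0.034 | 0.041 | 0.024 | 0.017 |
| Costa Rica | - | R | 1081 | 45 | 62 | 0.101 | 40.234 | 0.000 | 0.126 | 0.041 | 109.721 | 0.028 | 0.378 | 0.225 | 0.105 | 0.080 | 0.051 | 0.040 | 0.045 | 0.041 | 0.035 |
| El Salvador | - | R | 1155 | 44 | 65 | 0.226 | 72.338 | 0.000 | 0.132 | 0.040 | 546.613 | 0.047 | 0.223 | 0.385 | 0.072 | 0.093 | 0.052 | 0.037 | 0.039 | 0.049 | 0.049 |
| Equatorial Guinea | + | Y | 155 | 16 | 195 | 0.081 | 5.403 | 0.067 | 0.074 | 0.109 | 71.277 | 0.023 | 0.355 | 0.161 | 0.129 | 0.097 | 0.110 | 0.026 | 0.071 | 0.019 | 0.032 |
| Papua New Guinea | - | Y | 212 | 28 | 155 | 0.093 | 7.982 | 0.018 | 0.081 | 0.093 | 92.590 | 0.024 | 0.382 | 0.156 | 0.118 | 0.085 | 0.094 | 0.033 | 0.038 | 0.061 | 0.033 |
| Central African Republic | - | Y | 211 | 27 | 159 | 0.104 | 7.976 | 0.019 | 0.088 | 0.094 | 117.203 | 0.027 | 0.389 | 0.152 | 0.085 | 0.128 | 0.066 | 0.071 | 0.047 | 0.033 | 0.028 |
| St. Vincent & the Grenadines | - | Y | 162 | 34 | 117 | 0.135 | 11.659 | 0.003 | 0.113 | 0.107 | 194.677 | 0.033 | 0.414 | 0.111 | 0.093 | 0.080 | 0.074 | 0.080 | 0.037 | 0.074 | 0.037 |
| South Sudan | - | Y | 384 | 22 | 180 | 0.092 | 11.672 | 0.003 | 0.079 | 0.069 | 91.440 | 0.023 | 0.380 | 0.138 | 0.107 | 0.091 | 0.096 | 0.049 | 0.049 | 0.060 | 0.029 |
| South Korea | + | Y | 1299 | 70 | 9 | 0.061 | 1.826 | 0.401 | 0.046 | 0.038 | 40.559 | 0.017 | 0.298 | 0.133 | 0.129 | 0.126 | 0.105 | 0.085 | 0.049 | 0.042 | 0.033 |
| New Zealand | - | Y | 711 | 54 | 35 | 0.079 | 16.060 | 0.000 | 0.066 | 0.051 | 66.206 | 0.019 | 0.235 | 0.205 | 0.120 | 0.105 | 0.108 | 0.084 | 0.060 | 0.044 | 0.038 |
| United States | - | Y | 1389 | 84 | 1 | 0.070 | 38.599 | 0.000 | 0.050 | 0.036 | 52.595 | 0.019 | 0.351 | 0.150 | 0.088 | 0.084 | 0.071 | 0.082 | 0.065 | 0.048 | 0.060 |
| Hong Kong | - | Y | 574 | 28 | 156 | 0.092 | 21.168 | 0.000 | 0.084 | 0.057 | 89.973 | 0.022 | 0.385 | 0.153 | 0.108 | 0.078 | 0.057 | 0.066 | 0.051 | 0.064 | 0.037 |
| Bosnia and Herzegovina | - | Y | 1030 | 43 | 79 | 0.050 | 21.187 | 0.000 | 0.063 | 0.042 | 26.605 | 0.014 | 0.305 | 0.208 | 0.152 | 0.089 | 0.081 | 0.051 | 0.042 | 0.038 | 0.034 |
| San Marino | + | Y | 290 | 31 | 139 | 0.100 | 19.638 | 0.000 | 0.125 | 0.080 | 106.677 | 0.028 | 0.379 | 0.197 | 0.134 | 0.114 | 0.055 | 0.062 | 0.031 | 0.014 | 0.014 |
| Saint Lucia | - | Y | 236 | 34 | 117 | 0.115 | 13.386 | 0.001 | 0.144 | 0.089 | 143.057 | 0.032 | 0.390 | 0.225 | 0.131 | 0.068 | 0.047 | 0.030 | 0.047 | 0.038 | 0.025 |
| Sierra Leone | + | Y | 399 | 38 | 92 | 0.101 | 15.642 | 0.000 | 0.112 | 0.068 | 109.284 | 0.025 | 0.391 | 0.198 | 0.110 | 0.070 | 0.048 | 0.060 | 0.045 | 0.038 | 0.040 |
| South Africa | - | Y | 1307 | 55 | 34 | 0.051 | 9.914 | 0.007 | 0.064 | 0.038 | 27.618 | 0.014 | 0.323 | 0.208 | 0.135 | 0.094 | 0.054 | 0.047 | 0.049 | 0.050 | 0.040 |
| Trinidad and Tobago | - | Y | 853 | 37 | 99 | 0.066 | 15.502 | 0.000 | 0.081 | 0.047 | 47.259 | 0.019 | 0.347 | 0.211 | 0.101 | 0.086 | 0.083 | 0.050 | 0.047 | 0.042 | 0.033 |
| Dominican Republic | - | Y | 1205 | 38 | 91 | 0.069 | 14.702 | 0.001 | 0.071 | 0.039 | 51.746 | 0.019 | 0.255 | 0.151 | 0.163 | 0.122 | 0.090 | 0.078 | 0.051 | 0.052 | 0.037 |
| Antigua and Barbuda | - | Y | 167 | 29 | 147 | 0.182 | 28.846 | 0.000 | 0.194 | 0.105 | 354.099 | 0.044 | 0.467 | 0.174 | 0.156 | 0.060 | 0.054 | 0.030 | 0.012 | 0.036 | 0.012 |
| Cote d’Ivoire | - | Y | 985 | 45 | 62 | 0.070 | 25.310 | 0.000 | 0.089 | 0.043 | 52.255 | 0.020 | 0.338 | 0.221 | 0.132 | 0.090 | 0.062 | 0.037 | 0.041 | 0.039 | 0.041 |
| Sri Lanka | + | Y | 1105 | 34 | 120 | 0.094 | 38.592 | 0.000 | 0.087 | 0.041 | 95.596 | 0.019 | 0.386 | 0.177 | 0.083 | 0.080 | 0.072 | 0.057 | 0.058 | 0.048 | 0.038 |
| United Arab Emirates | - | Y | 1313 | 47 | 56 | 0.078 | 51.925 | 0.000 | 0.102 | 0.038 | 66.126 | 0.023 | 0.349 | 0.223 | 0.132 | 0.096 | 0.059 | 0.043 | 0.037 | 0.034 | 0.027 |
| Saudi Arabia | - | Y | 1345 | 49 | 47 | 0.128 | 37.702 | 0.000 | 0.125 | 0.037 | 176.310 | 0.037 | 0.251 | 0.100 | 0.178 | 0.153 | 0.129 | 0.073 | 0.042 | 0.025 | 0.048 |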
Table 4. Abbreviations.

| Abbreviation | Meaning |
|---|---|
| Location | Country or territory |
| Improve | Change in conformity with Benford’s law over the reporting period: positive (+) or negative (-) |
| Flag | Red (R), Green (G), or Yellow (Y) |
| N | Sample size (number of observations) |
| Chi square | Chi-square goodness-of-fit test statistic |
| p-value | p-value of the chi-square test |
| K-S | Kolmogorov-Smirnov statistic |
| Cut-Off | Kolmogorov-Smirnov cut-off value |
| SSD | Sum of squared differences |
| MAD | Mean absolute deviation |
| 1 to 9 | Observed frequency of the leading digits 1 to 9, one column per digit |
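As an illustration of how several of these measures relate to the digit frequencies, the following Python sketch recomputes MAD, SSD, K-S, the K-S cut-off, and d* from the United Kingdom row of Table 3. The formulas are assumptions inferred from common practice, not definitions stated in this section: SSD is taken in squared percentage points, the cut-off is taken as the usual 5% critical value 1.36/sqrt(N), and d* is taken as the Euclidean distance divided by its largest possible value (all observations starting with 9). The chi-square statistic is left out because the exact variant used is not stated here.

```python
import math

# First-digit frequencies (digits 1..9) and sample size from the
# United Kingdom row of Table 3.
observed = [0.295, 0.178, 0.122, 0.094, 0.088, 0.082, 0.059, 0.039, 0.043]
n = 1358

# Benford's expected first-digit probabilities.
expected = [math.log10(1 + 1 / d) for d in range(1, 10)]
diffs = [o - e for o, e in zip(observed, expected)]

# Mean absolute deviation across the nine digits.
mad = sum(abs(x) for x in diffs) / 9

# Sum of squared differences; the factor 1e4 converts proportions to
# percentage points (assumption, chosen to match the table's scale).
ssd = sum(x * x for x in diffs) * 1e4

# Kolmogorov-Smirnov statistic: largest absolute gap between the
# observed and expected cumulative distributions.
cumulative_gap = 0.0
ks = 0.0
for x in diffs:
    cumulative_gap += x
    ks = max(ks, abs(cumulative_gap))

# Assumed cut-off: asymptotic 5% critical value for the K-S test.
cutoff = 1.36 / math.sqrt(n)

# Assumed d*: Euclidean distance normalised by the maximum distance,
# reached when every observation has leading digit 9.
max_distance = math.sqrt(
    sum(e * e for e in expected[:-1]) + (1 - expected[-1]) ** 2
)
d_star = math.sqrt(sum(x * x for x in diffs)) / max_distance

print(round(mad, 3), round(ks, 3), round(cutoff, 3), round(d_star, 3))
print(round(ssd, 3))
```

Under these assumptions the rounded MAD, K-S, cut-off, and d* agree with the tabulated values for the United Kingdom. The SSD comes out slightly below the tabulated 5.308, which is expected if the published figure was computed from unrounded frequencies.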
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Farhadi, N.; Lahooti, H. Are COVID-19 Data Reliable? A Quantitative Analysis of Pandemic Data from 182 Countries. COVID 2021, 1, 137-152.