Are COVID-19 Data Reliable? A Quantitative Analysis of Pandemic Data from 182 Countries

Abstract: When it comes to COVID-19, access to reliable data is vital. It is crucial for the scientific community to use data reported by independent territories worldwide. This study evaluates the reliability of the pandemic data disclosed by 182 countries worldwide. We collected COVID-19 daily infections, deaths, tests, and vaccinations since the beginning of the coronavirus pandemic and assessed their conformity with Benford's law. It is commonly accepted that the frequency of the leading digits of pandemic data should conform to Benford's law. Our analysis of Benfordness shows that most countries distributed partially reliable data over the past eighteen months. Notably, the UK, Australia, Spain, Israel, and Germany, followed by 22 other nations, provided the most reliable COVID-19 data within the same period. In contrast, twenty-six nations, including Tajikistan, Belarus, Bangladesh, and Myanmar, published less reliable data on the coronavirus spread. In this context, over 31% of countries worldwide seem to have improved reliability. Our measurement of Benfordness moderately correlates with Johns Hopkins' Global Health Security Index, suggesting that the quality of data may depend on national healthcare policies and systems. We conclude that economically or politically distressed societies have declined in conformity to the law over time. Our results are particularly relevant for policymakers worldwide.


Introduction
The World Health Organization's (WHO) announcement on 10 January 2020 [1] confirmed the first cases of the novel coronavirus (also known as COVID-19 or SARS-CoV-2) in Wuhan City, China, on 31 December 2019. Eighteen months later, the world faces millions of incidents, and a tidal wave of data on the spread of COVID-19 has emerged. Since the beginning of the pandemic, all countries affected by the virus unanimously reported two metrics: "New Cases", the number of individuals testing positive for the virus, and "New Deaths", the number of deaths due to infection with the virus [2]. After the introduction of COVID-19 tests, countries commenced reporting "New Tests", the number of daily tests conducted. The emergence of COVID-19 vaccines developed by pharmaceutical companies enabled national vaccination programs. As such, "New Vaccinations", the number of people vaccinated daily, was reported by countries that adopted this preventive measure against the deadly virus. Today, we know that the pandemic's growth follows the logistic law [3].
There is no doubt that access to reliable data is vital in the fight against the pandemic. Scientists typically develop new insights based on publicly available data. Restrictive interventions, colloquially known as lockdowns, travel bans, and flight restrictions, have been implemented by governments in numerous countries around the world. It is for this reason that public confidence in data is vital for the effective implementation of these interventions and policies. Public doubts about the credibility of COVID-19 data can cause a sluggish initial response to the pandemic or, in the worst case, civil disobedience [2]. Several researchers have conducted forensic studies on COVID-19 data. Koch and Okamura [4] confirmed the reliability of the official data coming from the COVID-19 flashpoints: the United States, China, and Italy. Idrovo and Manrique-Hernández likewise proved China's conformity to the law [5]. Jackson and Sambridge [6] assessed data from 51 countries from 16 January 2020 to 9 April 2020, providing some evidence for the overall reliability of COVID-19 data worldwide. Wei and Vellwock studied the pandemic data from over twenty hot spots and confirmed the Benfordness of a few countries, such as the US, Spain, the UK, France, China, Belgium, Pakistan, and Italy; Russia and Iran were identified as clear violators of the law [7]. Lee et al. discovered that Japan did not provide reliable data [8]. According to Isea, China, Germany, Brazil, Venezuela, Norway, South Africa, Singapore, Ecuador, Egypt, Ireland, France, Australia, Colombia, India, Russia, and Croatia comply with the law, whereas Italy, Portugal, the Netherlands, the UK, Denmark, Belgium, and Chile did not pass the three tests carried out [9]. In the most extensive study of COVID-19 data, Farhadi investigated over 100,000 integers from 154 countries and applied three goodness-of-fit measures [2].
Approximately 28% of countries adhered well to the anticipated distribution frequency, while six countries disclosed fully undependable data. Farhadi recommended replicating the same study at a later point in time, applying additional tests based on larger sets of observations. Clearly, extending the timeframe and improving the statistical measurement should add value to the body of knowledge. Table 1 summarizes the main characteristics of the prior research into COVID-19 data reliability. Up to this point, previous studies on COVID-19 data have faced two major challenges: (a) inconsistent data, typically limited to a small sample size, and (b) inconsistent application of various statistical techniques to evaluate the reliability of the data [2][3][4][5][6][7][8][9]. Our primary rationale is to overcome these limitations and provide a comprehensive assessment of the most recent data. Hence, we aim to assess Benfordness by extending the number of statistical tests and maximizing the timeframe of the analysis to over eighteen months since the beginning of the COVID-19 pandemic.

Benford's Law and Goodness-of-Fit Tests
Benford's law (BL), also known as "the law of the first digits," is a widely known technique for the detection of data manipulation and fraud. Its core idea relates to the frequency of leading digits in naturally generated datasets. It relies on Newcomb's findings of 1881, which expressed the probability of each first digit d by Equation (1) [10]:

P(d) = log10(1 + 1/d), for d = 1, 2, ..., 9 (1)

According to BL, the leading digits of numbers follow a particular logarithmic pattern: 30.1% for one, 17.6% for two, 12.5% for three, 9.7% for four, 7.9% for five, 6.7% for six, 5.8% for seven, 5.1% for eight, and 4.6% for nine [10,11]. See Table 2. In an artificially generated dataset, the frequencies diverge from the BL distribution. BL can only be applied to data with a geometrical tendency and characterized by the non-existence of minima and maxima. BL is common practice in the social sciences and has been applied in various disciplines, such as finance and accounting [12][13][14], politics [15,16], and pandemics [2][3][4][5][6][7][8][9]. The body of knowledge is built on different goodness-of-fit tests to assess the divergence of the observed and expected frequencies. The techniques commonly used are the Kolmogorov-Smirnov statistic, Chi-square, Euclidean distance, Mean Absolute Deviation, z-values, and Sum of Squared Differences.
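For readers who wish to reproduce the expected proportions, Equation (1) can be evaluated directly. The following minimal Python sketch (not part of the original analysis) prints the nine Benford frequencies:

```python
import math

# Expected first-digit frequencies under Benford's law: P(d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in sorted(benford.items()):
    print(f"{d}: {p:.1%}")  # 1: 30.1%, 2: 17.6%, ..., 9: 4.6%
```

The nine probabilities sum to one, since every positive number has exactly one leading digit.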
The Kolmogorov-Smirnov statistic (K-S) is a non-parametric test for discrete data that quantifies the empirical distance between the observed and expected frequencies [17,18]. The K-S statistic was frequently applied to detect potential anomalies and non-compliance of data in prior research [2,5]. The statistic is calculated as D_n = max_x |F_n(x) − F(x)|, where F_n is the cumulative observed distribution and F is Benford's cumulative distribution. Manipulation is evident if D_n exceeds the cutoff K/√N, where N is the total number of leading digits observed in a probability sample; conversely, the null hypothesis can be accepted if √N · D_n < K. Considering the critical value at 5% significance, K is set to 1.36; at 1%, K is set to 1.63 [2,17]. The non-parametric K-S test is distribution-free and more powerful when the sample size is small [2,17]. This technical characteristic is of particular importance in the case of small epidemic samples. The K-S statistic is one of the most precise techniques to assess BL conformity in the context of COVID-19 data [17].
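The K-S procedure described above can be sketched as follows; the digit counts are hypothetical and serve only to illustrate the computation of D_n and the 5% cutoff 1.36/√N:

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def ks_statistic(digit_counts):
    """D_n = max |F_n(x) - F(x)| over the nine first-digit bins."""
    n = sum(digit_counts)
    observed = [c / n for c in digit_counts]
    d_n, cum_obs, cum_exp = 0.0, 0.0, 0.0
    for o, e in zip(observed, BENFORD):
        cum_obs += o   # cumulative observed distribution F_n
        cum_exp += e   # Benford's cumulative distribution F
        d_n = max(d_n, abs(cum_obs - cum_exp))
    return d_n, n

# Hypothetical digit counts for illustration (not real country data)
counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
d_n, n = ks_statistic(counts)
cutoff = 1.36 / math.sqrt(n)  # critical value at the 5% level
print(d_n, cutoff, d_n < cutoff)  # H0 is retained if D_n < cutoff
```

Because the illustrative counts track Benford's proportions almost exactly, D_n falls well below the cutoff and the null hypothesis is retained.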
Another prevalent technique is Pearson's chi-square (χ2) goodness-of-fit test with a confirmatory null hypothesis [7,9,19]. It is well known that the χ2 test is sensitive to the sample size and cannot make reliable inferences when the dataset consists of 5000 observations or more [2,19]. Principally, if the sample size is too big, the null hypothesis will likely be rejected even if there is no significant difference between the actual and expected subsets; for a small sample size, χ2 encounters difficulties in measurement as well. The test is computed by Equation (2):

χ2 = Σ (O_i − E_i)^2 / E_i, summed over i = 1, ..., 9 (2)

where O_i and E_i are the observed and expected counts of leading digit i. As a technique less sensitive to the sample size, researchers apply the d-factor (d*), which is calculated by Equation (3):

d* = sqrt( Σ (p_i − p̂_i)^2 ) / 1.03606 (3)

where p_i and p̂_i are the observed and expected frequencies [2,7,19]. The d* ultimately measures the Euclidean distance between the measured and expected frequencies of leading digits, normalized by 1.03606, the maximum possible distance [19]. A d-factor equal to 0.0 suggests full conformity to BL, while the highest Euclidean distance, d* = 1.0, signifies full non-conformity. A d* above 0.2 indicates non-conformity with the law [2,7,19].
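A minimal sketch of the χ2 statistic of Equation (2) and the d* statistic of Equation (3), again on hypothetical digit counts; the 1.03606 normalization follows the definition above:

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def chi_square(digit_counts):
    """Pearson chi-square against Benford's expected counts."""
    n = sum(digit_counts)
    return sum((c - n * e) ** 2 / (n * e) for c, e in zip(digit_counts, BENFORD))

def d_factor(digit_counts):
    """Normalized Euclidean distance d*; a value above 0.2 flags non-conformity."""
    n = sum(digit_counts)
    dist = math.sqrt(sum((c / n - e) ** 2 for c, e in zip(digit_counts, BENFORD)))
    return dist / 1.03606  # maximum possible Euclidean distance

# Hypothetical digit counts for illustration (not real country data)
counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
print(chi_square(counts), d_factor(counts))
```

For these near-Benford counts, χ2 stays far below the 5% critical value of 15.51 (8 degrees of freedom) and d* stays far below the 0.2 threshold.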
In this study, we further apply additional tests, i.e., Mean Absolute Deviation (MAD) and Sum of Squared Difference (SSD), which are less dependent on the sample size [20][21][22].
Equation (4) shows the MAD calculation as the average absolute deviation of the observed and expected frequencies:

MAD = (1/k) Σ |O_i − E_i| (4)

SSD takes the sum of the squares of the deviations of each digit's frequency from the expected BL frequency [22] and is calculated by Equation (5):

SSD = Σ (O_i − E_i)^2 (5)

For both the MAD and SSD statistics, k is the number of leading-digit bins, and O_i and E_i are the observed and expected Benford frequencies (expressed in percent for the SSD). A MAD > 0.015 and an SSD > 100 indicate non-conformity with the law [21,22].
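The MAD and SSD statistics of Equations (4) and (5) can be sketched the same way; note that we assume the SSD is computed with frequencies in percent, the convention under which the SSD > 100 cutoff applies:

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def mad(digit_counts):
    """Mean Absolute Deviation of observed vs. expected proportions."""
    n = sum(digit_counts)
    return sum(abs(c / n - e) for c, e in zip(digit_counts, BENFORD)) / len(BENFORD)

def ssd(digit_counts):
    """Sum of squared deviations, with frequencies in percent
    (assumed convention, so that the SSD > 100 cutoff applies)."""
    n = sum(digit_counts)
    return sum((100 * c / n - 100 * e) ** 2 for c, e in zip(digit_counts, BENFORD))

# Hypothetical digit counts for illustration (not real country data)
counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
print(mad(counts), ssd(counts))  # conformity: MAD <= 0.015 and SSD <= 100
```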
To be consistent with prior research, we postulate the following null hypothesis for all countries in scope, namely H_0: COVID-19 data from J_i adhere to BL, where J_i stands for an individual jurisdiction in scope. We operationalize the statistical measures and goodness-of-fit tests at a 0.05 significance level.

COVID-19 Data Sampling
Consistent with the extant body of knowledge, we desire a sample size of 1000 observations or larger for each country (also referred to as a jurisdiction or territory), although no minimum threshold for the analysis of BL compliance is specified in the body of knowledge. The measurement of Benfordness only makes sense if the COVID-19 samples are not too small, since small datasets cannot reliably reveal non-conformity to BL. The quality of the assessment rises as the size and range of the underpinning dataset grow.
This study draws on the largest dataset available, covering 182 countries worldwide. We collected our sample from the COVID-19 database of the Centre for Systems Science and Engineering at Johns Hopkins University and the Our World in Data repository [23]; our dataset includes daily observations of New Cases, New Deaths, New Tests, and New Vaccinations. By adding "New Tests" and "New Vaccinations" to the observed samples, variables that were technically not available in prior research [6], our study incorporates the most comprehensive dataset on SARS-CoV-2 globally. We collected 216,378 integers reported by 182 countries from 21 January 2020 to 6 June 2021, documenting the leading digits of 86,917 new cases, 77,137 new deaths, 42,322 new tests, and 10,092 new vaccinations worldwide. Small samples from seven jurisdictions (n < 100), including the Vatican and Tanzania, were removed from our analysis, as such samples cannot reveal non-conformity to the law. Our final dataset consists of 176 countries with about 873 COVID-19 related observations per country on average.
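For illustration, extracting leading digits from a daily series could be done as in the sketch below; the series shown is hypothetical, not actual country data, and zero counts are skipped because they carry no leading digit:

```python
def leading_digit(value):
    """Return the first significant digit of a positive number, else None."""
    value = abs(value)
    if value == 0:
        return None
    while value >= 10:
        value //= 10   # strip trailing digits of large numbers
    while 0 < value < 1:
        value *= 10    # scale up fractions, should any occur
    return int(value)

# Hypothetical daily series of new cases for illustration
daily_new_cases = [12, 134, 0, 987, 45, 3, 2071]
digits = [d for v in daily_new_cases if (d := leading_digit(v)) is not None]
print(digits)  # → [1, 1, 9, 4, 3, 2]
```

The resulting digit lists per country and metric are then tallied into the nine first-digit bins used by the goodness-of-fit tests.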

Results
For each country, we evaluated the consistency of the leading digits of the pandemic data with BL. We operationalized the goodness-of-fit tests (K-S, χ2, d*, MAD, and SSD) at a 0.05 significance level. 32% of countries worldwide showed evident improvements in Benfordness, while 68% demonstrated non-conformity with BL. The results are coded as a global map (see Figure 1). As emphasized in Figure 2, twenty-seven countries (or 15.3%) disclosed data that fully comply with the law (green countries in Figure 1); for these countries, we accept the null hypothesis based on all tests. Twenty-six countries (or 14.8%) did not meet the requirements of BL on any measure (see Figure 3). These jurisdictions failed all goodness-of-fit tests; we rejected the null hypothesis for them, and they are highlighted red on our global map. Notably, 123 countries, or 69.9%, demonstrated partial compliance with BL, passing at least one of the goodness-of-fit tests; these are flagged yellow (see Figure 4). Grey areas were excluded from our analysis as they do not offer a sufficient sample size. Figure 5 summarizes the global distribution of COVID-19 Benfordness.
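The three-colour classification described above can be summarized in a short sketch; the test names and pass/fail results are hypothetical placeholders:

```python
def classify(test_results):
    """test_results: dict mapping test name -> True if the country passed.
    Returns the map colour used in the classification scheme above."""
    passed = sum(test_results.values())
    if passed == len(test_results):
        return "green"   # full conformity: all tests passed
    if passed == 0:
        return "red"     # non-conformity: no test passed
    return "yellow"      # partial conformity: at least one test passed

# Hypothetical results for illustration
full = {"KS": True, "chi2": True, "d*": True, "MAD": True, "SSD": True}
none = {"KS": False, "chi2": False, "d*": False, "MAD": False, "SSD": False}
some = {"KS": True, "chi2": False, "d*": True, "MAD": False, "SSD": True}
print(classify(full), classify(none), classify(some))  # green red yellow
```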

Table 3 summarizes the results of the statistical analyses for all countries. The K-S, MAD, and χ2 statistics reveal similar patterns, while the Euclidean distance does not always flag non-conformity to BL.
According to the K-S statistics, the most substantial evidence for abnormalities regarding BL was found for Bangladesh, with a cut-off of 0.037 and a K-S statistic of 0.177 at the 5% significance level, followed by Myanmar and Kuwait. Based on the χ2 test, we specifically recognized Bangladesh as the most notable non-conformer (n: 1332; χ2: 177.155; p < 0.001). See Table 4 for the list of abbreviations.
In line with prior research [7,16,23], a Euclidean distance above 0.2 can be considered an indicator of BL non-conformity. In this context, Tajikistan showed by far the largest distance from Benford's proportions according to the Euclidean statistic (d*: 0.306), followed by El Salvador (d*: 0.226), Nicaragua and Timor (d*: 0.211). Similarly, the very same countries, Tajikistan, Belarus, Nicaragua, and El Salvador, elicited large MAD and SSD statistics. Corresponding to the previous measures, with a MAD of 0.082 and an SSD of 1007.9, Tajikistan ranked first for non-conformity to the law in our assessment. Tajikistan, Belarus, and El Salvador failed all goodness-of-fit tests (see Table 3). It is noteworthy that the null hypothesis can be confirmed for the UK, Spain, Australia, Israel, and Germany, as well as twenty-two other nations. According to the K-S statistics, the UK (n: 1358, cut-off: 0.037, K-S: 0.015) and Australia (n: 878, cut-off: 0.046, K-S: 0.021) disclosed the most reliable data worldwide.

Conclusions
Can we rely on COVID-19 data? To address the central question of this study, we investigated data from 182 countries worldwide. To improve the accuracy of the statistical measurements, we applied five statistical tests. Farhadi applied K-S, chi-square, and d* statistics over a timeframe of nine months [2]; Koch and Okamura used the Kuiper test (a modified K-S test), chi-square, and d* [4]; Wei and Vellwock operationalized d* statistics for circa eight months [7]. Our results show that 14.8% of countries do not adhere to BL at all. This is, to some extent, in agreement with prior research [2][3][4][5][6][7][8][9]. Most European countries, except Latvia, Luxembourg, the Netherlands, and Sweden, demonstrated conformity by satisfying at least one of the tests. This is also true for the Americas and APAC regions. The US, however, revealed a declining pattern of BL reliability. Remarkably, the UK (n: 1358, d*: 0.022, K-S: 0.015), Australia (n: 878, d*: 0.025, K-S: 0.021), Spain (n: 690, d*: 0.026, K-S: 0.028), and Israel (n: 1335, d*: 0.028, K-S: 0.026) overwhelmingly conform to BL. This group evidently maintained or improved the quality of COVID-19 incident reports compared to prior research; in a nine-month study of COVID-19 data, Farhadi identified the following measures for the UK (n: 593, d*: 0.076, K-S: 0.051), Australia (n: 497, d*: 0.029, K-S: 0.031), Spain (n: 353, d*: 0.086, K-S: 0.074), and Israel (n: 597, d*: 0.081, K-S: 0.087) [2]. This development is in line with organizational learning theory, which suggests that proficiency in the fight against pandemics grows by gaining experience in managing them over time. These countries introduced advanced programs to counter the epidemic within their national borders, such as non-pharmaceutical measures (lockdowns, travel bans, and social distancing) or progressive vaccination campaigns.
Consistent with Farhadi's results, Germany was able to maintain its BL reliability (n: 911, d*: 0.040, K-S: 0.043 as of 6 June 2021, versus n: 408, d*: 0.032, K-S: 0.026 as of 22 September 2020). The US, which was identified as one of the champions of Benfordness by Farhadi, only partially passed the goodness-of-fit tests. With a non-confirmatory K-S statistic (n: 1389, cut-off: 0.036, K-S: 0.050), the null hypothesis for the US was rejected based on one test in our study. This suggests a decreased Benfordness of the US data in the roughly nine months since Farhadi's earlier assessment. Similarly, China conformed to BL in prior studies [2,4,5,24]; according to our results, however, China, with 572 observations, did not pass three of the tests applied in our evaluation. Furthermore, 14.8% of the countries did not pass any of the tests applied in this study, and we therefore rejected the null hypothesis for these cases. The most significant irregularities occurred in Tajikistan, Belarus, Myanmar, Turkey, Bangladesh, and El Salvador. In January 2021, Tajikistan declared itself COVID-19 free, which explains the lack of BL conformity in our assessment [25]. Distressed socio-economic conditions can explain the poor reliability of pandemic data in Belarus and Myanmar [26,27]. Experts posit that Bangladesh has seen a decline in recorded COVID-19 incidents because it reduced the number of tests, affecting the recorded infection rates [28]. Turkey, which could not pass any goodness-of-fit test in our study, strives for a speedy recovery of its suffering hospitality industry [29]. It is noteworthy that 70% of El Salvador's population works in an informal economy [30]. These distressed countries have been facing major economic, social, or political challenges during the course of the pandemic.
Notably, Johns Hopkins University (JHU) studied national capabilities, procedures, systems, and policies to respond to epidemic challenges across 195 countries [24]. Considering JHU's Global Health Security Index (GHSI), we recognize consistency in the case of Tajikistan, ranked 144th for early detection and reporting. Latvia, however, remains questionable: as a former member of the Soviet Union, it was placed second in the GHSI after the US and Australia; surprisingly, and contrary to JHU's results, Latvia significantly violates BL. This was also confirmed in prior research [2]. Furthermore, JHU identified the highest scores for "early detection and reporting for epidemics of potential international concern" for countries with significant BL conformity, such as Australia (ranked 2nd), the UK (ranked 6th), Germany (ranked 10th), and Spain (ranked 7th), whereas countries with poor BL conformity demonstrated much weaker epidemic reporting capabilities. Following Farhadi's approach [2], we further operationalized Pearson's product-moment correlation analysis to explore the relationship between the GHSI scores and the goodness-of-fit tests. Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, and homoscedasticity. We identified a moderate negative correlation between the GHSI score and the K-S statistic (r: −0.27, n: 176, p < 0.0005), suggesting that countries with a high GHSI score tend to provide more reliable COVID-19 data. Our results are consistent with prior investigations [2].
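Pearson's product-moment correlation used here can be computed directly; the GHSI scores and K-S values below are hypothetical placeholders, chosen so that higher scores pair with lower K-S statistics and therefore yield a negative r, mirroring the direction of our finding:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical GHSI scores and K-S statistics for illustration
ghsi = [75, 60, 85, 40, 55]
ks   = [0.02, 0.05, 0.015, 0.09, 0.06]
print(pearson_r(ghsi, ks))  # negative: higher GHSI, lower K-S
```

Since a lower K-S statistic indicates better BL conformity, a negative correlation between GHSI and K-S is precisely what the "higher capability, more reliable data" interpretation predicts.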
Contradictions with the law may indicate data manipulation or the creation of artificially fabricated data [2][3][4][5][6][7][8][9]. If so, there is a strong need to understand the actual numbers of cases and deaths. Irregularities with BL may also pertain to varying national policies and a limited capacity to detect and report COVID-19 incidents; the WHO claimed early on that some nations do not have sufficient access to testing kits [31]. To effectively fight the spread of any disease in the future, we recommend establishing global governance for pandemic measurement [2,32,33]. Access to reliable data is vital in the fight against disastrous pandemics and submicroscopic infectious organisms.

Future Research
In our study, we identified tangible improvements in the quality of pandemic data, as well as diminishing data reliability, in several countries worldwide. It is of particular importance to understand the formative indicators and causality of these developments. To address this issue accurately, future work should examine the economic, political, and social conditions of these jurisdictions and the impact of the COVID-19 pandemic on Benfordness. In the case of the red-flagged countries, we still need to gain a deeper understanding of domestic healthcare processes and epidemic reporting policies. Based on the key results of this study, the authors recommend reproducing the forensic BL assessment in a later re-examination of the pandemic data. Ensuring BL reliability may be a fundamental step in the fight against epidemics in general, and COVID-19 in particular. We further recommend investigating the regressive and correlative relationships of our results with socio-political and economic transformation indices, e.g., the Bertelsmann Transformation Index (BTI). Through this approach, evidence may be found for a causal relationship between socio-economic and political variables and the phenomenon of BL non-conformity of epidemic data. Ultimately, we hope to have contributed to the fight against SARS-CoV-2 with this paper.

Limitation
The data gathered from the countries in our study are affected by diverging public health systems and data reporting policies. The lack of common global reporting policies, particularly in developing countries, may have affected the efficiency and effectiveness of COVID-19 reporting.