Pandemic Growth and Benfordness: Empirical Evidence from 176 Countries Worldwide

In the battle against the Coronavirus, over 190 territories and countries independently work toward one end goal: to stop the pandemic growth. In this context, a tidal wave of data has emerged since the beginning of the COVID-19 crisis. Extant research shows that the pandemic data are only partially reliable: only a small group of nations publishes reliable records on COVID-19 incidents. We collected global data from 176 countries and explored the causal relationship between average growth ratios and progress in the reliability of pandemic data. Furthermore, we replicated and operationalized the results of prior studies regarding the conformity of COVID-19 data to Benford's law (BL). Our outcomes confirm that the average growth rates of new cases in the first nine months of the Coronavirus pandemic explain improvement or deterioration in Benfordness and thus in the reliability of COVID-19 data. We found significant evidence for the notion that nonconformity to BL rises with the growth of new cases in the initial phases of outbreaks.


Introduction
In January 2020, the World Health Organization (WHO) confirmed the first cases of Coronavirus, also known as COVID-19 or SARS-CoV-2 in Wuhan City, China [1]. With millions of incidents and deaths to date, a tidal wave of data on COVID-19 has emerged. Since the outbreak of the virus, countries unanimously reported two metrics, "new cases" (individuals testing positive for the virus) and "new deaths" (the daily number of deaths) [2].
Having access to reliable data is vital. Policymakers use statistics to make life-saving decisions on restricting interventions colloquially known as lockdowns, travel bans, and social distancing. Similarly, scientists use pandemic data to detect the characteristics of the germ and respond accordingly.
There have been irregularities in Coronavirus data. Several forensic studies have emerged, inter alia, Koch and Okamura [2], Idrovo and Manrique-Hernández [3], Wei and Vellwock [4], Lee et al. [5], and Isea [6]. Table 1 summarizes the prior research into the COVID-19 data. Jackson and Sambridge [7] evaluated Coronavirus data from 51 countries from 16 January 2020 to 9 April 2020. In one of the most comprehensive studies, Farhadi examined over 100,000 integers from 154 countries [8], with the primary outcome that approximately 28% of countries published reliable data on the Coronavirus spread. In contrast, six countries disclosed entirely inconsistent records in the first nine months of the global pandemic. In a further step, Farhadi and Lahooti [9] investigated the "progress of Benfordness" across 182 countries from 21 January 2020 to 6 June 2021. They used prior results to explain observed improvements in COVID-19 reliability. The dataset, as well as the goodness-of-fit tests, were further extended to inspect the reliability of over 200,000 integers. Evidence was found that approximately 32% of nations worldwide accomplished measurable progress in Benfordness, while 68.2% showed no explicit improvement.
All of these studies applied "the law of the first digits," also known as Benford's law (BL), a widely used forensic technique for detecting fraudulent data in academia and the business world. The core idea relates to the frequency of leading digits in naturally generated datasets, given by Equation (1) [10]:
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9 \qquad (1)$$
According to BL, the leading digits one to nine follow a particular logarithmic distribution: 30.1% for one, 17.6% for two, 12.5% for three, 9.7% for four, 7.9% for five, 6.7% for six, 5.8% for seven, 5.1% for eight, and 4.6% for nine [10,11]; see Table 2. In an artificially generated data set, the observed frequencies are very likely to deviate from the BL distribution.
BL is suited to data that show a geometric tendency and are not bounded by built-in minima or maxima. It has been widely applied in several disciplines such as finance and accounting [12,13], politics [14], and epidemiology [2][3][4][5][6][7][8][9].
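The comparison between observed leading digits and the Benford frequencies can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of the cited studies: the function names are hypothetical, and the distance returned here is the plain Euclidean distance between proportions, without any normalization that a d* statistic may apply.

```python
import math
from collections import Counter

# Expected Benford probabilities for leading digits 1..9: log10(1 + 1/d)
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(n: int) -> int:
    """Return the first significant digit of a non-zero integer."""
    return int(str(abs(n))[0])


def benford_distances(values):
    """Compare leading-digit frequencies of `values` with Benford's law.

    Returns (chi-square statistic, Kolmogorov-Smirnov statistic,
    Euclidean distance between observed and expected proportions).
    """
    # Zero counts have no leading digit, so they are dropped (an assumption).
    digits = [leading_digit(v) for v in values if v > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no positive values to test")
    counts = Counter(digits)
    observed = [counts.get(d, 0) / n for d in range(1, 10)]

    chi_square = n * sum((o - e) ** 2 / e for o, e in zip(observed, BENFORD))

    # K-S statistic: largest gap between the two cumulative distributions.
    cum_obs = cum_exp = ks = 0.0
    for o, e in zip(observed, BENFORD):
        cum_obs += o
        cum_exp += e
        ks = max(ks, abs(cum_obs - cum_exp))

    euclid = math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, BENFORD)))
    return chi_square, ks, euclid
```

With eight degrees of freedom, the 5% critical value of the chi-square test is 15.507, so a statistic above that value would count as a rejection of conformity at that level.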
Previous research on COVID-19 has three major shortcomings. First and foremost, it operationalized inconsistent data sets with varying and frequently small sample sizes. Second, it utilized a variety of statistical techniques to assess Benfordness. Finally, due to their narrow scope, which was typically limited to forensic analysis of COVID-19 data, earlier studies did not elucidate the determinants of BL non-compliance.
A better understanding of changes in Benfordness is crucial. If the pandemic data are only partly trustworthy, as found in previous studies, then restrictive measures such as social distancing, lockdowns, or travel bans rest on a flawed evidence base and may prove ineffective. For this reason, it is imperative to gain insights into why some countries are improving compliance with BL while others are only partially complying or, in the worst case, not complying at all. This paper is therefore concerned with identifying the key drivers of irregularities and improvements in BL compliance.

Pandemic Growth
The new Coronavirus follows the simplest model with lumped parameters, known as the autonomous logistic sigmoid function [15,16]; see Equation (2):
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{N_\infty}\right) \qquad (2)$$
where N is the total number of people affected by the epidemic, N∞ represents the entire population (or the maximum number of incidents), and r is the growth rate of the epidemic [16]. An infectious disease spreading in its full range is limited to the entire population, N∞. Historical outbreaks differ widely in scale: the severe acute respiratory syndrome (SARS) outbreak of 2002 [17,18] was contained at about 8096 cases in 29 territories, whereas the Spanish flu of 1918 killed an estimated 50-100 million people [19,20]. Accordingly, the number of daily incidents is proportional to the number of existing cases, as underlined in Equation (3):
$$N_d = (1 + E \cdot p)^{d} \, N_0 \qquad (3)$$
where N_0 is the number of new cases on the first day of the pandemic; N_d is the number of new instances on a given day; E is the average number of people exposed to new infection on a given day; p is the probability of each exposure becoming an infection; and d is the number of days since the beginning of the pandemic [20]. Logistic growth of the new Coronavirus can only be stopped when either E or p declines, an ineluctable fact since viruses are subject to growth limitations at some point, for instance after herd immunity has occurred or when vaccination programs have been initiated. In consequence, even in the worst-case scenario in which COVID-19 spreads widely, a group of people may no longer be exposed to the virus; in other words, they will be immunized [20]. This group is expected to increase to N∞ over time. The number of new daily cases grows progressively on the sigmoid curve before hitting the inflection point, following constantly rising slopes. In contrast, after passing the inflection point, the number of incidents grows degressively, following constantly decreasing slopes.
One of the critical statistics to monitor pandemics is thus the growth ratio (g_d); see Equation (4):
$$g_d = \frac{N_d}{N_{d-1}} \qquad (4)$$
where N_d is the number of new cases on a particular day and N_{d−1} is the same number on the previous day. The "growth ratio" (sometimes referred to as the "growth factor") rests consistently above one (g_d > 1) on the exponential part of the sigmoid curve, before the curve's inflection point is reached. At the inflection point, the growth ratio equals one (g_d = 1). It then falls below one (g_d < 1) after moving away from the inflection point. This is why g_d can show both progressive and degressive growth patterns. Logic suggests that interventions and restricting policies, such as lockdowns or social distancing, turn progressive growth into degressive growth. If countries publish average growth ratios below or equal to one, it can be concluded that the spread of the infectious disease is slowing. Ma postulated that "the initial exponential growth rate of an epidemic is an important measure of the severeness of the epidemic and is also closely related to the basic reproduction number" [18].
Another important concept in our view is the volatility of the pandemic's progressive or degressive growth. We denote the volatility as the fluctuation in the daily growth of new cases, measured by the standard deviation of the growth ratios; see Equation (5), where δ stands for the standard deviation (or volatility), N is the number of observations, and g_{d_i} represents the growth ratio on a particular day. As expected, fluctuation in growth ratios occurs when daily incidents show sudden upward and downward movements. This may pertain to delayed reporting, e.g., when COVID-19 reports are filed with a time lag. Larger volatility may signal inconsistent testing and reporting capabilities.
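The growth ratio and its volatility can be illustrated with a short Python sketch. The function names are hypothetical, the handling of zero-case days is an assumption of this sketch, and the population form of the standard deviation (division by the number of observations) is assumed, since the exact form is not restated here.

```python
import math


def growth_ratios(new_cases):
    """Day-over-day growth ratios g_d = N_d / N_(d-1).

    Days whose predecessor reported zero cases are skipped because the
    ratio is undefined there (an assumption of this sketch).
    """
    return [cur / prev for prev, cur in zip(new_cases, new_cases[1:]) if prev > 0]


def mean_and_volatility(ratios):
    """Average growth ratio and its standard deviation (volatility)."""
    n = len(ratios)
    mean = sum(ratios) / n
    # Population variance (divide by n); a sample form would divide by n - 1.
    variance = sum((g - mean) ** 2 for g in ratios) / n
    return mean, math.sqrt(variance)


# Hypothetical daily new-case counts, for illustration only.
ratios = growth_ratios([10, 20, 40, 40, 20])
average, volatility = mean_and_volatility(ratios)
```

In this toy series the ratios are 2.0, 2.0, 1.0, and 0.5: growth is progressive at first, flat at the turning point, and degressive afterwards.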
To illustrate the impact of growth ratios and volatility, we have compiled Table 3 and Figure 1, including the results of the most notable cases in our study. Table 4 summarizes the correlations between all variables. In previous studies, Tajikistan, Belarus, Bangladesh, Iran, and Turkey were the major BL non-compliant countries [8,9]. These countries have striking statistical characteristics: our evaluation showed nonconformity to BL, low average growth rates, and low volatility in new cases. Unexpectedly, Tajikistan proclaimed itself to be COVID-19-free in early 2021 [21]. Belarus faced ongoing political unrest and mass protests, which increased the risk of infections [22]. Bangladesh acknowledged an instant decline in Coronavirus cases with no reasonable explanation [23]. The Government of Turkey pushed for a revision of the epidemic guidelines to discourage reporting of new SARS-CoV-2 cases [24]. In contrast, the countries that conform to BL have substantially greater growth ratios and volatilities. Australia, which has one of the world's most transparent and well-developed public health systems, had a more realistic growth ratio of 1.26 and a volatility of 151%. Australia consistently adhered to Benfordness throughout the period and even showed progress in disseminating reliable data. We grouped these diverse jurisdictions by their analogous characteristics; see Figure 1. The prominent non-compliant countries disclosed daily growth rates that averaged close to 1.0. Compliant countries such as Israel, Australia, and Germany showed much higher average growth ratios and volatilities. We believe that this growth could drive change in Benfordness.
Table 4. Correlation of Benfordness, periodic growth ratios, and volatility.
We, therefore, hypothesize that epidemic growth ratios increase the distance from the expected Benford frequencies:
Hypothesis (H1). Pandemic growth increases the distance to Benford's law.
Hypothesis (H2). Average growth ratio in the early stage determines the future pandemic growth.

Hypothesis Testing
The structural equation modeling (SEM) technique was utilized to examine the hypotheses in our research. We used the partial least squares (PLS) method to support the explanatory research [25] and to analyze both structural and measurement models. We chose SmartPLS software to further explore the predictive power of the theoretical framework [26]. Our objective has been to uncover the growing complexity of Coronavirus growth and its causal interrelationship with the reliability of epidemic data. Explanatory research is beneficial when theory has not yet been established [26][27][28]. By employing PLS-SEM, we benefited from the high statistical power of the method [29,30]. As an alternative to covariance-based methods, PLS-SEM facilitates variance-based structural equation modeling. PLS is especially suitable for early phases of research, when the phenomenon is new and no theories are yet in place, and it is particularly appropriate for predictive studies [31].
To assess a change in Benfordness, we incorporated the results of earlier research and subsequently analyzed two phases of the Coronavirus pandemic data. The first and second phases covered 31 December 2019 to 24 September 2020 and 25 September 2020 to 6 June 2021, respectively. The simultaneous system is composed of two endogenous constructs and one exogenous construct, as shown in Figure 2, including the latent variables "Benfordness Change," "Growth Phase One," and "Growth Phase Two." Each of the two growth constructs, growth phase one and growth phase two, comprises one indicator, i.e., "average growth rate phase one" (or AGRP1) for the period 31 December 2019 to 24 September 2020, and "average growth rate phase two" (or AGRP2) for the period 25 September 2020 to 6 June 2021. The "Change in Benfordness" involves three reflective indicators: BL changes captured by Chi-square (CHI-Delta), K-S statistics (KS-Delta), and d* or d-factor (d-Delta). These statistics were used in prior studies [2][3][4][5][6][7][8][9]. We regard these items as reflective indicators of the endogenous construct "Change in Benfordness" as they conceptually measure the same phenomenon. According to Jarvis, MacKenzie, and Podsakoff [25], the causality direction for reflective constructs runs from the construct to the item; a change in the indicator values will not change the construct.

COVID-19 Data Sampling
We collected data from the COVID-19 database of the Centre for Systems Science and Engineering at Johns Hopkins University. Our sample consisted of 87,011 integers on daily new cases and 77,236 on daily new deaths reported worldwide between 21 January 2020 and 6 June 2021. For the growth analysis, we purposefully excluded other variables, such as new deaths, new tests, and new vaccinations, since these items can be moderated and influenced by domestic public health systems and policies in different regions or countries. We focused on new cases only to capture and study the pandemic's logistic growth curve in two phases: phase one, from 21 January 2020 to 24 September 2020 (248 days), and phase two, from 25 September 2020 to 6 June 2021 (255 days).
On average, each country provided 344.69 observations in the first phase of our study. Logic suggests that countries with a smaller population or more limited health care capabilities may have yielded narrower data sets on the COVID-19 spread. However, both the average growth ratio of the logistic curve and the statistical tests for BL conformity depend on the sample size. A small sample size adversely affects the statistical testing and measurement of pandemic growth.
Thus, we additionally focused on those states with comparable and significant sample sizes above the average of 344.69 observations in the first phase. This led to an additional sample of 102 states that clearly met the specifications for acceptable sample size in PLS-SEM [6]. A frequently used methodology for estimating the minimum sample size in PLS-SEM is the ten-times rule of thumb [28], stating that the minimum sample size should be more than ten times the maximum number of inner or outer model linkages pointing at any latent variable. All 102 countries supplied over 570 observations in the second phase.
Prior research already measured and provided countries' distance to BL by applying multiple statistical tests. The variables commonly used in previous studies were the Kolmogorov-Smirnov statistic, the Chi-square (χ²) goodness-of-fit test, and the Euclidean distance [2][3][4][5][6][7][8][9][10]. To be consistent with prior research, we operationalized the same variables to assess the distance to Benford's frequencies. For this reason, we replicated the results of previous studies [9] and calculated the changes in the COVID-19 goodness-of-fit tests, i.e., ∆ = τ_B/τ_A, where τ_A and τ_B are countries' goodness-of-fit statistics for the first and second phases, respectively. A ∆ < 1 signifies measurable progress in Benfordness, while a ∆ > 1 suggests the opposite, a worsening development. To exclude the undesired effect of serial correlation, we conducted the Durbin-Watson test, resulting in an acceptable value of d = 1.975 for the variables AGRP1 and AGRP2. The rule of thumb suggests that if 1.5 < d < 2.5, then autocorrelation is not a cause for concern.
We computed the day-to-day growth ratios for all the countries covered in earlier reports [8,9]. We then calculated the average growth rates on a country-by-country basis and initialized the periodic average growth rates as defined in Equation (6), where n is the number of daily growth ratios per country. We applied a Monte Carlo simulation of the pandemic logistic growth and conformity to BL based on the Chi-square goodness-of-fit test. Our simulation included 1000 iterations based on the average daily new cases of randomly selected countries. We observed that 49% of randomly created cases had a growth ratio larger than one; on average, 52% of simulated cases violated the threshold for BL conformity. See Table 5 for the variables and Table 6 for the aggregate results of changes in Benfordness.
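Two of the quantities in this section are simple enough to sketch directly. The snippet below is illustrative only: it assumes that the change in Benfordness is the ratio of the second-phase to the first-phase goodness-of-fit statistic (so that values below one mean a smaller distance to BL), and it uses the textbook Durbin-Watson formula; the function names and sample numbers are hypothetical.

```python
def benfordness_change(tau_first, tau_second):
    """Ratio of second-phase to first-phase distance to Benford's law.

    Assumed form: values below 1 indicate improvement, above 1 deterioration.
    """
    return tau_second / tau_first


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals. Values near 2 suggest
    no first-order autocorrelation."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def autocorrelation_is_negligible(d, low=1.5, high=2.5):
    """Rule of thumb from the text: 1.5 < d < 2.5."""
    return low < d < high


# Hypothetical chi-square statistics for one country in phases one and two.
delta = benfordness_change(tau_first=20.0, tau_second=10.0)
```

Here `delta` is below one, which under the stated assumption would be read as progress in Benfordness for that hypothetical country.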

Results
We tested Hypotheses H1 and H2 using the samples of 176 and 102 countries and confirmed the explanatory power of the model in the sample. By simplifying the reflective indicators of the endogenous construct "Change in Benfordness," we found a significant improvement in the out-of-sample measures and the predictive power of the PLS-SEM. Cases with missing values were excluded using listwise deletion: each row containing a missing value was deleted, and only the remainder of the sample was used.

Measurement Model
First, we examined the robustness of all constructs. Item loadings exceeded the threshold of 0.708, meaning that each construct explained over 50% of the variance of its indicators, as suggested in the literature [28][29][30][31]. For the sample of 102 countries, however, PLS signaled loadings slightly below this threshold for KS-delta (0.699) and d-delta (0.683), indicating that both items may be removed in the context of the smaller sample. Single-item constructs attained loadings of 1.000 for the items AGRP1 and AGRP2. The use of single-item constructs is an acceptable practice in PLS-SEM. The factor loading for each item on its respective construct was highly significant (p < 0.0001), as evidenced by the T-statistics, which supports convergent validity. See Table 7-outer loadings. Second, we checked internal consistency reliability, mainly by using Jöreskog's composite reliability [32]. Values between 0.70 and 0.90 are generally considered satisfactory to good. For constructs with more than one item, a value greater than 0.95 is problematic because it indicates that the items are redundant, which impairs construct validity. Our single-item constructs scored 1.000, as expected, because these constructs depend completely (100 percent) on one item.
Third, convergent validity is concerned with the extent to which the construct explains the variance in its reflective items. As one of the building blocks of PLS model evaluation, and consistent with the guidelines of Fornell and Larcker [12], we required the average variance extracted (AVE) to reach a threshold of 0.50 or higher, implying that the construct explains at least 50 percent of the variance in its reflective items [26,33]. See Table 8: Measurement Model Evaluation, Changes in Benfordness. Fourth, we tested discriminant validity by applying the Heterotrait-Monotrait ratio (HTMT). The HTMT is defined as the mean of item correlations across constructs relative to the (geometric) mean of the average correlations for items measuring the same construct. Problems with discriminant validity are present when HTMT values are high [26], with a threshold of 0.90 for structural models. According to Rönkkö and Cho, the HTMT is indeed a new application of the parallel reliability coefficient [26]. The HTMT criterion outperforms classical approaches to assessing discriminant validity, such as the Fornell-Larcker criterion and (partial) cross-loadings, which are largely unable to detect a lack of discriminant validity [34]. Our analysis confirms that the HTMT coefficients of all constructs did not exceed the recommended threshold of 0.90. See Table 9: discriminant validity by Heterotrait-Monotrait ratio (HTMT).
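The two reliability measures used in the measurement model follow directly from standardized outer loadings. The sketch below uses the standard formulas for the average variance extracted and for composite reliability; the loadings are hypothetical values chosen for illustration and are not the values reported in Tables 7 and 8.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of a construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability for standardized loadings:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each error variance is 1 - loading^2."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical loadings for a three-item reflective construct.
example_loadings = [0.8, 0.8, 0.8]
ave = average_variance_extracted(example_loadings)
rho_c = composite_reliability(example_loadings)
```

For these illustrative loadings the AVE is 0.64 and the composite reliability is about 0.84, so both would pass the thresholds discussed above. A single-item construct with a loading of 1.000 yields 1.0 for both measures by construction.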

Structural Model
Before evaluating the structural model, multicollinearity was screened using the variance inflation factors (VIF), which specify the degree to which the variance of an estimate is inflated by correlated predictors [26]. The VIF values should be close to or below 3. Multicollinearity is induced by a high correlation between the independent variables; it would affect the statistical power of the coefficients and weaken the reliability of the estimated p values. Our model, estimated on both samples, is not affected by highly correlated predictor variables. See Table 10: collinearity statistics VIF. Following the satisfactory evaluation of the measurement model, we evaluated the structural model against standard criteria, including (a) the coefficient of determination R², (b) the path coefficients, and (c) the out-of-sample predictive power [35]. To test the predictive accuracy of the PLS path model, the coefficient of determination R² was computed. As a general guideline, Hair et al. recommended the following thresholds: substantial (equal to or above 0.75), moderate (close to 0.50), and weak (less than 0.25) results [28]. In our explanatory research, the endogenous constructs "change in Benfordness" and "growth rate phase two" achieved R² values of 0.370 and 0.388 for the sample of 176 countries and 0.590 and 0.437 for the sample of 102 countries [26]. The R² indicates only the explanatory power of the model within the sample. The higher values for the smaller sample can be attributed to its statistical power, as it includes only those countries with a sufficient number of COVID-19 records. See Table 11: coefficient of determination R². All path coefficients are positive (i.e., in the expected direction) and statistically significant (at p < 0.05). To model the interaction effects, we followed Chin et al. [33]. The interaction terms were expressed by multiplying the corresponding indicators of the predictor and moderator constructs. We also adhered to their recommended hierarchical process to construct and compare the models with and without the respective interacting constructs.
Table 12 shows the results of the structural model with interaction effects for both samples with 176 and 102 countries.
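The multicollinearity screening described at the start of this section can be illustrated for the simplest case of two predictors, where the VIF reduces to 1 / (1 − r²) with r the Pearson correlation between them. This is a simplified sketch under that two-predictor assumption, with hypothetical function names and data; models with more predictors require regressing each predictor on all the others.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictor values: the first pair is uncorrelated,
# the second pair is almost collinear.
vif_uncorrelated = vif_two_predictors([1, 1, -1, -1], [1, -1, 1, -1])
vif_collinear = vif_two_predictors([1, 2, 3, 4, 5, 6],
                                   [1.1, 1.9, 3.2, 3.8, 5.1, 6.0])
```

The uncorrelated pair gives a VIF of 1, the lowest possible value, while the nearly collinear pair lands far above the guideline of 3.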
Further assessment of the PLS model for predictability, as suggested by Shmueli et al. [35] and Hair et al. [26], affirmed moderate out-of-sample predictive power for our results. The current SmartPLS software allowed us to retrieve the k-fold cross-validated prediction errors and their summary statistic, the root mean square error (RMSE), to evaluate the predictive performance of our PLS path model. Independently of the PLS model, a linear regression (LM) benchmark also yields prediction errors. In the LM approach, each endogenous indicator variable is regressed on all exogenous indicator variables to generate predictions. Thus, a side-by-side comparison of the PLS-SEM and LM outcomes indicates whether the use of a theoretically grounded path model improves (or at least does not worsen) the predictive performance of the indicators at hand. In this study, the PLS-SEM RMSE was lower than the LM RMSE for the reflective indicators Chi-square Delta and K-S Delta, which the literature treats as evidence of predictive power. The changes in Benfordness captured in this study therefore had out-of-sample predictive power. Table 13 includes all results of the out-of-sample predictive power assessment. Based on the evaluation of the structural and measurement models, we can confirm the two hypotheses: H1, pandemic growth reduces Benfordness, and H2, pandemic growth in the initial phase determines future growth.
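The linear regression benchmark and its cross-validated RMSE can be sketched as follows. This is not the SmartPLS routine: it is a minimal stand-in that assumes a single exogenous indicator, a deterministic fold assignment, and hypothetical data, and it only shows how an out-of-sample RMSE is obtained for comparison with a path model.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kfold_rmse(xs, ys, k=5):
    """Root mean square error of out-of-fold predictions.

    Observation i is assigned to fold i % k (a simplification; real
    procedures usually shuffle and repeat the split).
    """
    squared_errors = []
    pairs = list(zip(xs, ys))
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        held_out = [p for i, p in enumerate(pairs) if i % k == fold]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        squared_errors.extend((y - (slope * x + intercept)) ** 2 for x, y in held_out)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data: an exactly linear relation and a noisy variant.
xs = [float(i) for i in range(20)]
ys_exact = [2 * x + 1 for x in xs]
ys_noisy = [y + (1 if i % 2 == 0 else -1) for i, y in enumerate(ys_exact)]
rmse_exact = kfold_rmse(xs, ys_exact)
rmse_noisy = kfold_rmse(xs, ys_noisy)
```

A path model would be judged to have predictive power for an indicator when its own cross-validated RMSE is lower than the benchmark value computed this way.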

Findings
Initial exponential growth in the first nine months of the global pandemic explains the overall progress in line with BL and the future development of the pandemic.
This raises a question: why can the initial growth explain changes in Benfordness? Logic suggests that BL non-compliance worsens when local authorities are confronted with uncontrolled epidemic growth in the early stages of the pandemic. These decision-makers may have understated the reported number of COVID-19 incidents, which would ultimately result in a reverse development of Benfordness.
Our results confirm that the leading violators of BL showed similar behaviors in previous studies. Notably, Belarus and Iran, the top BL violators, demonstrated the widest distance from BL frequencies in previous studies, based on at least one of the statistical tests. COVID-19 has exacerbated existing and, in some cases, deep-rooted political, economic, social, and security problems in those countries. Many of the challenges have troubled social cohesion in these countries, such as in Iraq as reported by the United Nations (https://reliefweb.int/report/iraq/impact-covid-19-social-cohesion-iraq; last accessed on 11 September 2021). On 16 March 2020, Alexander Lukashenko, the president of Belarus, denied the threat of Coronavirus. He called for people to work in the fields and drive tractors to overcome the pandemic: "You just have to work, especially now, in a village. There the tractor will heal everyone. The fields will cure everyone." [36]. Ali Khamenei, the Supreme Leader of the Islamic Republic, downplayed the threat of the Coronavirus, banned vaccines from the United States and the United Kingdom, and expelled Médecins Sans Frontières, who provided pro bono health services to Iranians. The British BBC news channel reported in early August 2021 that the numbers of deaths and new cases in Iran were nearly triple and double the official figures, respectively [37]. The pandemic emerged at a time when public faith in the clerical regime was at a low ebb. Iran was amid economic turmoil, and regular protests were erupting throughout the country.
The vast majority of the countries showing significant BL improvements, such as Israel, the United Kingdom, or Australia, had an average growth ratio greater than or equal to 1.00 during the first nine months of the pandemic. According to the Johns Hopkins University Global Health Risk Index (GHRI) [38], these countries have vibrant public health capabilities to adequately react to epidemic outbreaks. However, the correlation analysis between the GHRI scores and the change in Benfordness showed no statistically significant relationship in this context. We conclude that higher GHRI scores do not explain changes in BL conformity. Not unexpectedly, the observed cases of BL compliance, such as the United Kingdom, Israel, Australia, and Germany, adopted advanced vaccination programs and even played a leading role in cutting-edge research into COVID-19 vaccine development. Complementing previous research, our findings addressed the critical determinants of data reliability [2][3][4][5][6][7][8][9]. Figure 3 shows all countries based on their changes in Benfordness as measured by the metrics stated. Russia, Switzerland, and Iraq have demonstrated notable reverse development in BL conformity based on ∆KS, ∆d, and ∆Chi. Overall, the statistical tests conducted in this study support both proposed hypotheses. The moderate out-of-sample predictive power for changes in Benfordness suggests that the proposed theory may be applicable to data from future pandemics. Given the evidence provided by our research, policymakers should give due consideration to the pandemic growth trajectory in countries affected by infectious diseases. To implement effective public policies to decelerate the outbreak, policymakers and scientists need to scrutinize epidemic data for anomalies in the logistic growth rates. Inconsistent growth ratios from outlier territories might signal poor conformity of pandemic data to BL.

Limitation
Notable cases in our study indicate that improvement in Benfordness depends on logistic growth in the initial phases of a pandemic. We did not identify any qualitative factors that led to a larger distance from the frequencies of the leading digits expected under BL. In addition, we did not explore the different variants of COVID-19, i.e., alpha, beta, gamma, and delta, with particular attention to their transmissibility, which possibly contributed to the progression of the Coronavirus outbreak.