Modeling COVID-19 Cases Statistically and Evaluating Their Effect on the Economy of Countries

COVID-19 infections have plagued the world and led to deaths with a heavy pneumonia manifestation. The main objective of this investigation is to evaluate the performance of certain economies during the crisis derived from the COVID-19 pandemic. The gross domestic product (GDP) and global health security index (GHSI) of the countries belonging–or not–to the Organization for Economic Cooperation and Development (OECD) are considered. In this paper, statistical models are formulated to study this performance. The models’ specifications include, as the response variable, the GDP variation/growth percentage in 2020, and as the covariates: the COVID-19 disease rate from its start in March 2020 until 31 December 2020; the GHSI of 2019; the countries’ risk by default spreads from July 2019 to May 2020; belongingness or not to the OECD; and the GDP per capita in 2020. We test the heteroscedasticity phenomenon present in the modeling. The variable “COVID-19 cases per million inhabitants” is statistically significant, showing its impact on each country’s economy through the GDP variation. Therefore, we report that COVID-19 cases affect domestic economies, but that OECD membership and other risk factors are also relevant.


Introduction and Review of Literature
The first cases of pneumonia caused by an unknown pathogen were reported through global news in December 2019, in Wuhan, China [1,2]. The pathogen was identified as a novel enveloped ribonucleic acid betacoronavirus [3]. It has been named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [3], and the disease it causes is known as COVID-19. The virus belongs to the phylogenetic tree of SARS-CoV [4].
The World Health Organization (WHO) declared the COVID-19 outbreak to be a global pandemic. On 27 March 2020, the number of cases exceeded 500,000, and it has continued to rise since that date [5]. Over 170 countries have been affected, with the United States (US) having the most confirmed cases [6]. Through their policies, governments have implemented processes to contain the virus [7,8].
Outbreaks of diseases such as SARS require special attention [9] because their economic, social and financial impact can be potentially significant. Different studies considering belongingness or not to the Organization for Economic Cooperation and Development (OECD) have analyzed the effect of COVID-19 on countries' economies [10–13]. Some investigations indicate that there are no adequate econometric models to estimate: (i) the probable cost of a pandemic; (ii) the benefits of policies to mitigate the effects of diseases; or (iii) the distribution of a disease's costs and benefits within an economy [14]. However, some findings suggest that increases in COVID-19 infected cases and deaths are associated with a significant increase in market illiquidity and volatility [15,16]. Moreover, restrictions and lockdowns imposed by governments for social protection contribute to the deterioration of liquidity and stability of markets.
The COVID-19 pandemic could have a real economic and financial impact, as the gross domestic product (GDP) and hours worked fall by 20% (from trend). The GDP is the most employed measure of the country's global economic activity. It represents the totality of final goods and services produced within a country during a specified time, such as one year. Some studies indicate that it is difficult for countries to recover their economic levels prior to the pandemic crisis [17]. A relationship between US returns, uncertainty, and COVID-19 cases during the first and second waves of the outbreak could be replicated by the rest of the world [18].
To the best of our knowledge, no studies have statistically evaluated the effect of the COVID-19 pandemic on the economy of countries belonging or not to the OECD. Therefore, econometric models are needed to make this evaluation. This affirmation is supported by our review of the literature, which has also allowed us to identify variables included in similar studies, as shown in Table 1. This table reports variables which are linked to our research, such as COVID-19 cases, economic effects and their relationship to COVID-19, the GDP of a country, the belongingness or not to the OECD, and the global health security index (GHSI), linking our study to countries' health indicators. Therefore, the objective of this research is to statistically evaluate the economic effect of the crisis generated by the COVID-19 pandemic on countries. We evaluate the performance of the countries' economies belonging or not to the OECD.
Statistical models are formulated to study the effect of the COVID-19 pandemic on these economies and to test for heteroscedasticity. We use, as the response variable, the GDP variation/growth, which is calculated as a percentage over a year. For example, if a country has a GDP of $M in 2019 and a GDP of $N in 2020, then the GDP variation percentage in 2020 is ((N − M)/M) × 100%. The GDP variation percentage (∆GDP%) indicates the progress of the country's economy. Note that the ∆GDP% is employed instead of the logarithmic difference because the time unit utilized in the measurement is one year. In contrast, the logarithmic difference is commonly used for variations over short time intervals, where the changes tend to zero. The logarithmic approximation is good for small changes, but the two metrics can differ for large percentage changes. The covariates (independent variables) that could model the response (dependent) variable already mentioned are the following: (i) The number of COVID-19 cases per million inhabitants, with a negative effect on the ∆GDP% being expected [30,31]; (ii) The GHSI, where a relationship between the health impact of the COVID-19 pandemic and the ∆GDP% is expected; (iii) The country's average risk and its standard deviation (StdDev), with this risk being measured by country default spreads (CDS), where a higher risk StdDev is expected to have a negative impact on the ∆GDP%; (iv) Belongingness of the country to the OECD group, where a positive effect on the ∆GDP% is expected if the country belongs to the OECD; (v) The GDP per capita (GDPpc) in each country, where a higher ∆GDP% is expected for OECD countries when facing the COVID-19 pandemic [32,33].
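As a minimal illustration of the difference between the percentage variation and the logarithmic approximation, the following Python sketch compares both measures for a small and a large change. The GDP levels are hypothetical values chosen for illustration and are not taken from the data of this study; the function names are also illustrative.

```python
import math


def gdp_variation_pct(gdp_prev: float, gdp_curr: float) -> float:
    """Percentage variation of GDP between two consecutive years."""
    return (gdp_curr - gdp_prev) / gdp_prev * 100.0


def gdp_log_variation_pct(gdp_prev: float, gdp_curr: float) -> float:
    """Logarithmic-difference approximation of the variation, in percent."""
    return math.log(gdp_curr / gdp_prev) * 100.0


# Hypothetical GDP levels (arbitrary monetary units), not from the paper's data.
scenarios = {
    "small change": (100.0, 101.0),
    "large change": (100.0, 80.0),
}

for label, (m, n) in scenarios.items():
    exact = gdp_variation_pct(m, n)
    approx = gdp_log_variation_pct(m, n)
    # The gap between both measures grows with the size of the change.
    print(label, round(exact, 3), round(approx, 3), round(abs(exact - approx), 3))
```

For the small change both measures nearly coincide, whereas for the large change they differ by more than two percentage points, which is in line with the choice of the ∆GDP% for annual data.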
Thus, in this research, our hypothesis is that the COVID-19 pandemic has had a negative effect on the evolution of a country's economy, measured through the variation in its GDP. Therefore, the contribution of this research is to evaluate, through econometric models, the effect generated by the health crisis derived from the COVID-19 pandemic in the economic performance of the countries belonging or not to the OECD. In summary, a statistical analysis is conducted, considering the ∆GDP% as response variable, and the number of COVID-19 cases per million inhabitants, the GHSI, the belongingness or not to the OECD group, and country risk measures, as the covariates.
The rest of the paper is organized as follows: in Section 2, the statistical model is formulated together with the methodology to be used. Furthermore, in this section, the specification of models and variables is stated. Section 3 reports the results obtained from the statistical analysis when applying the econometric models with and without considering heterogeneity. In this section, some indications and recommendations are also included. In Section 4, we provide the conclusions, discussions, limitations, and ideas for further research of this study.

Material and Methods
This section provides details about the data, the specification of the variables, an exploratory data analysis, and the statistical models.
The number of COVID-19 infected patients (in short "COVID-19 cases" or "cases") was obtained from: https://ourworldindata.org/coronavirus-source-data (accessed on 26 June 2021). The data for each country were collected through the link mentioned above. We considered cases with strong COVID-19 impacts, which was useful for our research. The countries analyzed and their respective ∆GDP%, which corresponds to the annual variation of GDP in 2020 according to the International Monetary Fund, are presented in Table 2. For the CDS on government bonds, from execution to five years, our study focused on the period between July 2019 and May 2020. These CDS data were provided by the Bloomberg database and were used to measure the state of the economy before the high levels of contagion derived from the COVID-19 pandemic. Although CDS premiums do not capture the exact default risk, the literature has documented that they are considered reliable and among the best default risk measures available [34]. Additionally, the CDS is used in [35] for risk analysis and bond spreads because they are positively correlated with premium risk measures. We also considered data on the health conditions of the countries in the sample from the GHSI for the year 2019 and the number of patients who died from COVID-19 during the peak of the pandemic between March 2020 and 31 December 2020.
In Table 2, NZ is New Zealand, SA is Saudi Arabia, SK is South Korea, SL is Sri Lanka, and UK is United Kingdom.

Specification of Variables and Data Exploratory Analysis
To conduct the modeling, we used a set of variables and their functional forms described in the recent economic literature [14]. The notation and specification for the variables (namely response Y and covariates X) to be considered in the modeling for the country i are the following:
• $y_i = \Delta\mathrm{GDP\%}_i$ is the value of $Y$, the GDP variation percentage in 2020;
• $x_{1i} = \mathrm{Cases}_i$ is the value of $X_1$ related to the disease rate measured by the number of COVID-19 cases at the peak of the pandemic per million inhabitants from its start in March 2020 until 31 December 2020;
• $x_{2i} = \mathrm{Health}_i$ is the value of $X_2$ associated with the GHSI for 2019;
• $x_{3i} = \log(\mathrm{RiskAve}_i)$ is the value of $X_3$, the logarithm of the risk average measured through CDS;
• $x_{4i} = \log(\mathrm{RiskStdDev}_i)$ is the value of $X_4$, the logarithm of the risk StdDev;
• $x_{5i} = \mathrm{OECD}_i$ is the value of $X_5$, a dichotomous variable for OECD belongingness;
• $x_{6i} = \mathrm{GDPpc}_i$ is the value of $X_6$, which is a control variable linked to the GDPpc.
A descriptive summary of the data considered in this research is presented in Table 3, and the linear relationships among the variables are shown in Figure 1. Pearson correlations of the indicated variables are in the upper triangle of this figure; the corresponding scatter plots are in its lower triangle; and the histograms of each variable are on its diagonal. As displayed in Figure 1, we detected possible correlations between the response ∆GDP% and the covariates Cases, Health, RiskAve, and RiskStdDev, which justified the use of multiple linear regression models. Nevertheless, when analyzing the existing correlations between the covariates of the model, we suspected possible multicollinearity. Thus, to assess collinearity based on Model 3 with constant variance specified in Section 2.4, we calculated the variance inflation factors (VIF) given by: VIF(Cases) = 1.21; VIF(Health) = 3.05; VIF(log(RiskAve)) = 8.24; VIF(log(RiskStdDev)) = 6.60; and VIF(GDPpc) = 2.92. Note that the variance of the estimated coefficient of log(RiskAve) is inflated by a factor of 8.24, making it the largest one.
This indicates that log(RiskAve) is correlated with some of the covariates of the model. However, only large VIF values, that is, values greater than 10, suggest a multicollinearity problem, as mentioned on p. 200 of [36] and in [37]. Since all the VIF values reported are below 10, no collinearity problems were detected in our case study. The analysis of outliers provided information regarding unusual values of our response variable conditioned on the set of covariates. In contrast, the analysis of leverages, based on the diagonal elements of the projection matrix, included information on atypical values of the covariates. Figure 2a does not show evident leverage points, based on Model 3 considering heterogeneity, which is supported by the scatterplots. However, a deeper leverage analysis should be further studied with full diagnostic tools, which is beyond the objective of the present investigation. Figure 2b shows residual values that are slightly outside the range (−2, 2) and the possible presence of a non-constant error variance. Hence, we detected the presence of outliers that require the use of more sophisticated models and/or robust statistical methods [38,39], which is also beyond the objective of the present investigation. In addition, we detected heterogeneity of variances, which was modeled in our study.
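The collinearity and leverage diagnostics discussed above can be sketched in Python as follows. This is only an illustrative sketch with synthetic covariates (not the data of this study); the function names `vif` and `leverages` are hypothetical helpers, with the VIF computed as 1/(1 − R²) from the regression of each covariate on the others, and the leverages taken as the diagonal of the projection matrix.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF of each column of X (covariates only, without intercept column)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress covariate j on an intercept and the remaining covariates.
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(Z, target, rcond=None)[0]
        resid = target - Z @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def leverages(X: np.ndarray) -> np.ndarray:
    """Diagonal of the projection matrix H = D (D'D)^{-1} D', with D = [1, X]."""
    D = np.column_stack([np.ones(X.shape[0]), X])
    Q, _ = np.linalg.qr(D)  # reduced QR decomposition, so that H = Q Q'
    return np.sum(Q ** 2, axis=1)


# Synthetic covariates (assumption): x3 is built to be correlated with x2.
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x2 + 0.3 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

print("VIF:", np.round(vif(X), 2))
h = leverages(X)
print("sum of leverages:", round(float(h.sum()), 6))  # equals the number of columns of D
```

In such a sketch, VIF values above 10 would be flagged under the rule of thumb mentioned above, and leverages that are large relative to their average would point to atypical covariate values.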

The Statistical Models
Based on [40–45], we formulated a statistical model to describe the relationship between the response variable and covariates mentioned above, stated as:
$$Y_i = \boldsymbol{x}_i^\top \boldsymbol{\beta} + \varepsilon_i, \quad \varepsilon_i \sim \mathrm{N}(0, \sigma_i^2), \quad \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0, \quad i \neq j, \qquad (1)$$
where $\boldsymbol{x}_i = (1, x_{1i}, \ldots, x_{6i})^\top$ corresponds to the vector of observed values of the covariates X above defined (with a leading one for the intercept); and $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6)^\top$ is a vector of regression parameters to be estimated. In addition, in the model formulated in Equation (1), $\varepsilon_i$ is the error term, which is assumed to be Gaussian distributed, centered on zero, and independent of the other errors. Similarly, $\varepsilon_j$ is the error term for the country j, which is different from the country i. Note that the variance of the error, represented by $\sigma_i^2$ in Equation (1), is assumed to depend on covariates X, with $\boldsymbol{x}_i$ being the vector of observed covariates for country i and $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6)^\top$ being a vector of regression parameters associated with these covariates.
We used maximum likelihood (ML) and generalized least squares (GLS) methods to estimate the model parameters, namely β and α, following the methodology described in [46] and compared the robustness of the methods with the ordinary least squares (OLS) estimator. The ML and GLS estimators were more efficient compared to the OLS estimator when the heteroscedasticity was correctly specified. In addition, it is possible to utilize the generalized method of moments (GMM) to estimate the parameters. Nevertheless, for small samples, GMM estimators are biased [47–49]. In [48], the author showed that the ML estimator had the highest efficiency in comparison to the unweighted GMM and optimally weighted GMM estimators (based on an unrestricted simple estimator). Note that, only in the case of restricted normality, the GMM estimator performs better. Therefore, due to its good properties, we decided to employ the ML estimator to perform the statistical modeling. Specifically, the ML estimation is more efficient when assuming a suitable model specification and in cases of small samples [50].
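The efficiency gain of GLS over OLS under correctly specified heteroscedasticity can be illustrated with a small Monte Carlo sketch in Python. All settings below (sample size, coefficients, and the exponential variance function, which is treated as known) are assumptions made for illustration only and do not correspond to the data or estimates of this study.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 2000

# Hypothetical design: one covariate on a grid plus an intercept.
x = np.linspace(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, -0.5])

# Assumed (known) variance function used to generate heteroscedastic errors.
sigma2 = np.exp(-1.0 + 2.0 * x)
sqrt_w = 1.0 / np.sqrt(sigma2)

ols_slopes = np.empty(reps)
gls_slopes = np.empty(reps)
for r in range(reps):
    y = X @ beta_true + rng.normal(size=n) * np.sqrt(sigma2)
    # OLS ignores the non-constant variance.
    ols_slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    # GLS with a diagonal covariance matrix reduces to weighted least squares.
    gls_slopes[r] = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)[0][1]

print("variance of OLS slope estimates:", ols_slopes.var())
print("variance of GLS slope estimates:", gls_slopes.var())
```

Both estimators are centered around the true slope in this setting, but the GLS estimates show a smaller sampling variance, which is the sense in which GLS (and ML) are more efficient than OLS when the variance structure is correctly specified.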

Specification of Models
Based on the specification for the response formulated in Equation (1), we propose the following three models assuming (a) normally distributed errors, (b) independency of the model errors, and (c) heterogeneity or homogeneity of variances for these errors:

[Model 1]
$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \varepsilon_i,$$
where $\varepsilon_i \sim \mathrm{N}(0, \sigma_i^2)$, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$, with $i, j$ as indicated below Equation (1), and (i) $Y_i$ is the GDP variation percentage in the country i; (ii) $x_{1i}$ is the number of COVID-19 cases per million inhabitants in the country i; (iii) $x_{2i}$ is the GHSI in the country i; (iv) $x_{3i}$ is the logarithm of the risk average measured through CDS in the country i; (v) $x_{4i}$ is the logarithm of the risk StdDev measured through CDS in the country i.

[Model 2]
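$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + \varepsilon_i,$$
where $\varepsilon_i \sim \mathrm{N}(0, \sigma_i^2)$, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$, with $i, j$ as indicated above, and (i) $Y_i$ is the GDP variation percentage in the country i; (ii) $x_{1i}$ is the number of COVID-19 cases per million inhabitants in the country i; (iii) $x_{2i}$ is the GHSI in the country i; (iv) $x_{3i}$ is the logarithm of the risk average measured through CDS in the country i; (v) $x_{4i}$ is the logarithm of the risk StdDev in the country i; (vi) $x_{5i}$ is an indicator of OECD belongingness in the country i.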

[Model 3]
$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + \beta_6 x_{6i} + \varepsilon_i,$$
where $\varepsilon_i \sim \mathrm{N}(0, \sigma_i^2)$, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$, with $i, j$ as indicated above, and (i) $Y_i$ is the GDP variation percentage in the country i; (ii) $x_{1i}$ is the number of COVID-19 cases per million inhabitants in the country i; (iii) $x_{2i}$ is the GHSI in the country i; (iv) $x_{3i}$ is the logarithm of the risk average measured through CDS in the country i; (v) $x_{4i}$ is the logarithm of the risk StdDev in the country i; (vi) $x_{5i}$ is an indicator of OECD belongingness in the country i; (vii) $x_{6i}$ is the GDP per capita in the country i.
As mentioned, we assumed: (a) Normally distributed errors, that is, $\varepsilon_i \sim \mathrm{N}(0, \sigma_i^2)$; (b) Independency of the model errors, that is, under normality, we have $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$ and $i, j = 1, \ldots, n$, with $i, j$ as indicated below Equation (1); (c) Homogeneity or heterogeneity of variances for these errors. The homogeneity of variances assumption indicates that $\sigma_i^2 = \sigma^2$, for all $i = 1, \ldots, n$, that is, we are only modeling the mean of $Y_i$ because $\sigma_i^2$ is assumed as constant (homogeneous). However, if $\sigma_i^2$ is not constant (heterogeneous) across $i = 1, \ldots, n$, we must also model it as a function of the covariates through the linear predictor $\alpha_0 + \sum_{l=1}^{6} \alpha_l x_{li}$, where $x_{li}$ is the covariate $l$ associated with the country i, with $l = 1, 2, 3, 4, 5, 6$.
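To make the joint modeling of mean and variance concrete, the following Python sketch fits a heteroscedastic Gaussian regression by ML using scoring iterations. It assumes, purely for illustration, an exponential variance function σ_i² = exp(x_i'α), which is one common choice and may differ from the specification used in this study. The data are synthetic, and the names `fit_hetero_ml` and `loglik` are hypothetical.

```python
import numpy as np


def fit_hetero_ml(X, y, n_iter=50):
    """ML fit of y_i = x_i'beta + eps_i with Var(eps_i) = exp(x_i'alpha).

    X must include a leading column of ones. The variance link is an
    assumption of this sketch, not necessarily the one used in the paper.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS starting value
    alpha = np.zeros(X.shape[1])
    alpha[0] = np.log(np.mean((y - X @ beta) ** 2))    # homoscedastic start
    XtX_inv = np.linalg.inv(X.T @ X)
    for _ in range(n_iter):
        sigma2 = np.exp(X @ alpha)
        sw = 1.0 / np.sqrt(sigma2)
        # Mean parameters: weighted least squares with weights 1 / sigma_i^2.
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        # Variance parameters: scoring step on standardized squared residuals.
        e2 = (y - X @ beta) ** 2
        alpha = alpha + XtX_inv @ (X.T @ (e2 / sigma2 - 1.0))
    return beta, alpha


def loglik(X, y, beta, alpha):
    """Gaussian log-likelihood under the assumed exponential variance function."""
    s2 = np.exp(X @ alpha)
    return -0.5 * np.sum(np.log(2.0 * np.pi * s2) + (y - X @ beta) ** 2 / s2)


# Synthetic data (assumption): one covariate driving both mean and variance.
rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, -0.5])
alpha_true = np.array([-1.0, 1.0])
y = X @ beta_true + rng.normal(size=n) * np.exp(0.5 * (X @ alpha_true))

beta_hat, alpha_hat = fit_hetero_ml(X, y)
print("beta_hat:", np.round(beta_hat, 3))
print("alpha_hat:", np.round(alpha_hat, 3))
```

Setting all components of α except the intercept to zero recovers the homogeneous case, so the constant-variance model is nested in this formulation.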

Results
In this section, we describe the econometric models and provide the numerical results of the statistical analysis when applying these models with and without considering heterogeneity. Table 4 reports the results of regression analyses for the three models without considering heterogeneity. From this table, for all models, we can observe that:

• The COVID-19 cases per million inhabitants are statistically significant, indicating the effect that this variable has on each country's economy in this study;
• In general, an increase of one case per million inhabitants, under the condition that all the remaining covariates remain fixed, reduces the GDP variation by 0.0026% on average, that is, by 2.6% per 1000 infected per million inhabitants;
• The results of the proposed models show an R² of 12.6% on average, implying a low goodness of fit and suggesting that another type of model specification is required for better characterization.

Table 4. Parameter estimates (with t-statistic, p-value, and significance in parentheses) and indicators of the listed models with COVID-19 economic data without considering heterogeneity.
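The interpretation of the coefficient of the cases and of the R² can be sketched in Python as follows. The data are synthetic and generated under an assumed slope of −0.0026, taken from the text above only as an illustrative value; the covariate ranges, the noise level, and the function name `ols_fit` are also assumptions of this sketch and do not reproduce Table 4.

```python
import numpy as np


def ols_fit(X, y):
    """OLS with intercept; returns the coefficient vector and the R^2."""
    D = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(D, y, rcond=None)[0]
    resid = y - D @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2


# Synthetic data (assumption): cases per million and a health index.
rng = np.random.default_rng(3)
n = 300
cases = rng.uniform(0.0, 2000.0, size=n)
health = rng.uniform(20.0, 85.0, size=n)
# Response generated with an assumed slope of -0.0026 per case per million.
y = -1.0 - 0.0026 * cases + rng.normal(scale=3.0, size=n)

beta, r2 = ols_fit(np.column_stack([cases, health]), y)

# Effect of 1000 additional cases per million, other covariates held fixed.
effect_per_1000 = 1000.0 * beta[1]
print("slope for cases:", beta[1])
print("effect per 1000 cases per million:", effect_per_1000)
print("R^2:", r2)
```

A statistically relevant slope can coexist with a low R² in this kind of setting, because the R² measures the share of the variability explained and not the size of the marginal effect.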

Statistical Analysis under Heterogeneity
Next, we studied the heteroscedasticity of the regression model by using the Breusch-Pagan test, which indicated that:

• The null hypothesis of homoscedasticity was rejected for Models 1 and 2 at a 10% significance level;
• In the case of Model 3, the homoscedasticity hypothesis was not rejected;
• OECD belongingness showed a negative effect, under the condition that all the remaining covariates remain fixed, implying that the economy of OECD countries was more strongly impacted when facing the pandemic phenomenon. However, it is not significant for our model;
• Nevertheless, countries with a higher logarithm of the risk StdDev were negatively impacted, suggesting the existence of other mechanisms affecting the economy of countries when facing problems due to the global pandemic;
• For both models, the variables that were not significant are log(RiskAve) and Health.
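For reference, a Breusch-Pagan type test can be sketched in Python as follows. This sketch uses the studentized (n·R²) version of the statistic, which is an assumption, since the variant employed in the analysis is not stated here. The data are synthetic, and the names `chi2_sf` and `breusch_pagan` are hypothetical helpers.

```python
import math

import numpy as np


def chi2_sf(x: float, df: int) -> float:
    """Upper tail probability of the chi-square distribution (integer df >= 1)."""
    z = x / 2.0
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(z)) if df % 2 else math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def breusch_pagan(X, y):
    """Studentized Breusch-Pagan test: LM = n * R^2 of squared residuals on X."""
    n = len(y)
    D = np.column_stack([np.ones(n), X])
    resid = y - D @ np.linalg.lstsq(D, y, rcond=None)[0]
    u = resid ** 2
    fitted = D @ np.linalg.lstsq(D, u, rcond=None)[0]
    r2 = 1.0 - np.sum((u - fitted) ** 2) / np.sum((u - u.mean()) ** 2)
    lm = max(n * r2, 0.0)
    df = D.shape[1] - 1  # number of covariates in the auxiliary regression
    return lm, chi2_sf(lm, df)


# Synthetic data (assumption): error standard deviation grows with x.
rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0.0, 2.0, size=n)
y_het = 1.0 - 0.5 * x + rng.normal(size=n) * np.exp(x)

lm, p_value = breusch_pagan(x, y_het)
print("LM statistic:", lm, "p-value:", p_value)
```

A p-value below the chosen significance level (10% in the analysis above) leads to rejecting the null hypothesis of homoscedasticity.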

Conclusions and Future Research
In this research, the impact that the COVID-19 pandemic has had on countries' economies was evaluated, showing an average negative result of 3% in the GDP for every 1000 cases per million inhabitants. In addition, we were able to identify that the more developed countries, considered as those belonging to the OECD, did not suffer a higher impact on their GDP. Therefore, there is no statistically significant effect that connects belongingness to the OECD and the pandemic phenomenon on the economies of the countries. In addition, it seems that information about the pandemic is artificially added to the global story presented and affects the model only through the number of positive cases. However, in principle, developed countries have a higher capacity to respond to a sanitary emergency. This suggests to us the existence of other mechanisms that affect the economic performance of these countries, which are not directly attributable to the health impact derived from the COVID-19 pandemic. In contrast, countries with a higher logarithm of the risk standard deviation are negatively affected.
We should mention some limitations of this study related to data collection and outlying observations. The databases only provided information for the period between July 2019 and May 2020. For future research, it is necessary to update the data set. In addition, the utilization of methods robust to outliers [38] for the estimation of model parameters is also of empirical interest.
An issue to be further studied is the efficiency of the variables in the securities markets [51,52]. Moreover, generalizations to multivariate models [53], incorporation of temporal [54], spatial [55] and quantile [55,56] regression structures in the modeling, as well as errors in variables [57] and PLS regression [37], should also be considered using the variables: GHSI, country risk, and OECD membership as relevant in relation to countries' economies.
The use of global and local influence methods [58], as well as leverage, to diagnose atypical observations is needed. These methods are an appropriate step to employ in all statistical/econometric modeling. Furthermore, there exists a potential use in machine, deep, and statistical learning models [39,59–61]. Future research will consider other variables and relationships, such as production linkages and employment effects of the COVID-19 pandemic phenomena. Moreover, it would be interesting to consider models with interaction terms connecting the number of COVID-19 cases with other covariates. Another approach would be to compare the model proposed in the present investigation with a similar model estimated with data from a pre-pandemic period.
Consequently, the methodology introduced in this study poses challenges and offers open theoretical and numerical issues that deserve to be further analyzed. Research on these issues is in progress, and the results will be reported in future publications.

Data Availability Statement:
The data used to support the findings of this study can be obtained from the websites indicated in Section 2.1 or upon request from the authors.