BCG Vaccination and Mortality of COVID-19 across 173 Countries: An Ecological Study

Ecological studies have suggested fewer COVID-19 morbidities and mortalities in Bacillus Calmette–Guérin (BCG)-vaccinated countries than BCG-non-vaccinated countries. However, these studies obtained data during the early phase of the pandemic and did not adjust for potential confounders, including PCR-test numbers per population (PCR-tests). Currently—more than four months after declaration of the pandemic—the BCG-hypothesis needs reexamining. An ecological study was conducted by obtaining data of 61 factors in 173 countries, including BCG vaccine coverage (%), using morbidity and mortality as outcomes, obtained from open resources. ‘Urban population (%)’ and ‘insufficient physical activity (%)’ in each country was positively associated with morbidity, but not mortality, after adjustment for PCR-tests. On the other hand, recent BCG vaccine coverage (%) was negatively associated with mortality, but not morbidity, even with adjustment for percentage of the population ≥ 60 years of age, morbidity, PCR-tests and other factors. The results of this study generated a hypothesis that a national BCG vaccination program seems to be associated with reduced mortality of COVID-19, although this needs to be further examined and proved by randomized clinical trials.


Introduction
Currently, more than four months since declaration of the coronavirus disease 2019 (COVID- 19) as a pandemic by the World Health Organization (WHO) on March 11th, 2020, more than 14 million people have been infected with the virus and more than half a million have died worldwide. Marked differences in COVID-19 mortalities have been observed in different countries. For example, the mortality per million population is till now several tens of times or even higher in Western countries, e.g., Belgium (845), the United Kingdom (UK, 664), the United States of America (USA, 426) and Germany (109), than in Asian countries, e.g., India (19), Japan (8) and China (3), as of 17 July 2020. This is quite the opposite of what was reported during the 1918-20 influenza pandemic, the so called Spanish flu, in which the population mortality was over 30-fold higher, with excess death rates observed in low-income countries, such as India, than in high-income countries, such as those in the West [1]. Higher morbidities and mortalities may be explained by easy access to diagnostic polymerase chain reaction tests (PCR-tests) for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in these Western countries. In contrast, they may be underestimated in middle-and low-income countries due to low capacities of PCR-testing and poor access to health care. Alternatively, there is growing concern that racial and ethnic minority, as well as socioeconomic and biological factors may influence the high morbidity and mortality. A retrospective study of integrated health care systems suggested that Black race compared with White race, increasing age, male sex, having a greater burden of comorbidities, type of public insurance, residence in a low-income area and obesity were associated with increased odds of hospital admission [2]. On the other hand risk of in-hospital mortality increased only in aged patients and was not associated with the Black race, sex, comorbidities, obesity and other factors after multivariate adjustment [2]; this phenomenon was also confirmed in other races, i.e., Asians and Hispanics, compared with the White race [3,4]. Moreover, since COVID-19 is an infectious disease that spreads mainly through the droplet route by close contact in dense human societies, metropolitan areas, such as New York City in the USA [5] and Lombardy in Italy [6], Paris in France [7], Sao Paulo in Brazil [8], and so on, have tended to be regional epicenters. However, associations between population dynamics, e.g., population size, density, migrants and urbanization and the morbidity/mortality of COVID- 19 have not yet been well examined.
In addition to the factors listed above, several scientists proposed a hypothesis [9][10][11][12] that Bacillus Calmette-Guérin (BCG) vaccination has preventive effects not only against tuberculosis, i.e., the target disease of the vaccine, but also other non-specific infectious diseases, i.e., off-target diseases such as COVID-19, through epigenetic mechanisms [13][14][15]. In fact, ecological studies have suggested that both COVID-19 morbidities and mortalities were less in BCG-vaccinated countries than in BCG-non-vaccinated countries [16][17][18]. However, because these studies only obtained data during the early phase of the pandemic, by which time the disease load had escalated in Western countries, but not yet escalated in low-and middle-income countries where BCG is given at birth and did not adjust for any potential confounders, including PCR-test number per million population, the BCG-hypothesis needs to be reexamined now, more than four months after declaration of the pandemic, since the number of PCR-confirmed COVID-19 cases are still growing in many countries. We therefore aimed to explore whether recent BCG vaccine coverage is associated with COVID-19 morbidity and/or mortality rates, using linear regression models to explore associations between the two continuous random variables adjusted for a variety of potential confounders, such as median age and body mass index (BMI) in individual countries through this ecological study.

Ethics Statement
Institutional review board approval for this work was not sought due to the use of publicly available data obtained from open resources.

Study Design
This ecological study compared population data of each country.

Data Resources
All data were obtained from open resources: Information on total number of cases, total number of deaths and number of PCR-tests performed per million population were obtained from 'Coronavirus Update' [19], data regarding population dynamics were obtained from 'World Population Clock' [20], socioeconomic covariates were obtained from 'the United Nations database' [21], BCG vaccine coverage data were obtained from 'The BCG World Atlas' [22] and 'The global health observatory' on the WHO homepage [18] and other health related data were from the same WHO site [23]. Definitions of each of the covariates are described in the Supplement. Only countries that had data of both total deaths and BCG vaccine coverage were included for analyses in this study.

Outcomes and PCR-Tests
The outcomes evaluated in this study were COVID-19 morbidity, i.e., total COVID-19 cases per million population and COVID-19 mortality, i.e., total COVID-19-related deaths per million population, in each country, obtained from 'Worldometer COVID-19 Data' on July 17th, 2020 [19]. Data on the total number of PCR-tests performed per million population were simultaneously obtained from the same 'Worldometer COVID-19 Data' website. PCR-test positivity was simply calculated as the total COVID-19 case number divided by the total number of PCR-tests performed in each country.

BCG Vaccine Coverage
Recent BCG vaccine coverage was defined as the percentage of the vaccinated population among one-year-olds in each country (World Health Data Platform, the World Health Observatory, BCG-Immunization coverage estimates) [23]. For countries that have already stopped a national BCG vaccine immunization program [22], which includes a significant number of countries, their BCG vaccine coverage rate was counted as zero percent in this study.

Linear Regression Models
For preprocessed data, outcomes, i.e., morbidity and mortality per million population were transformed to the common logarithm (log10) to adjust for normality of the distribution, which was verified by means of kurtosis tests. When the number of total deaths was zero, these were changed to 0.01 per million population, because zero cannot be transformed to the common logarithm. Variance inflation factor (VIF) was used to detect the presence of multicollinearity. Only one variable among biologically similar variables, e.g., 'median age (years)', '≥ 60 years of age (%)' and '≥ 70 years of age', was selected to maximize adjusted R 2 in the multi-linear regression models to avoid the influence of collinearity. If the variance inflation factor (VIF) of certain covariates was more than 10, then the covariates were avoided in multivariate analyses because of a collinearity issue. Multiple linear regression models were used to screen potential risk or preventive factors associated with morbidity by adjusting for PCR-test numbers transformed to the common logarithm (log10) and those associated with mortality were screened by adjusting for PCR-test numbers and morbidity per million population. Considering type I error due to a multiple testing, the significance level of alfa was set as p < 0.001. Then, all the screened factors were assessed in a multi-linear regression model to determine significant factors with p < 0.05 as the cutoff point. Each model was evaluated by adjusted R 2 as a coefficient of determination. Pearson's correlation coefficient for variables with normal distributions or Spearman's rank correlation for variables with non-normal distributions, represented as rho, was used to quantify the strengths of associations between morbidity, mortality and significant factors determined by the final models, as: absolute value of rho ≥ 0.5: very strong; rho ≥ 0.4: strong; 0.4 > rho ≥ 0.2: moderate; and rho < 0.2: weak associations. Data were analyzed using Stata version 14.0 software (StataCorp LP, College Station, TX, USA).

Variability of COVID-19 Morbidity and Mortality across 173 Countries
A total of 173 countries that had data of both total COVID-19 deaths and BCG vaccine coverage were included for analyses in this study. The 20 countries with highest and lowest COVID-19 morbidities (Table 1) and mortalities (Table 2), as well as their PCR-test rate per million population are shown below. Marked differences in morbidities and mortalities were observed among these countries, ranging from 1 (Papua New Guinea) to 37,566 (Qatar) and from 0 (Vietnam, etc.) to 845 (Belgium), respectively. Six and thirteen countries that do not have a BCG national vaccine program at present (indicated with bold and mark " ") were included in the 20 countries with the highest COVID-19 morbidity and mortality, respectively.  Histograms of morbidities and mortalities were drawn as normal density plots ( Figure 1). Although the histograms of morbidity and mortality were skewed to the right, they followed a normal distribution by transformation with the common logarithm (log10).

Associations between Morbidity, Mortality and PCR-Tests per Million Population
First, associations represented by rho and VIF between morbidity (n = 173), mortality (n = 173) and PCR-tests (n = 155) are shown ( Figure 2A). These three variables were predicted to have very strong and positive associations with each other. However, VIFs were less than 2 among these three factors. Multicollinearity is considered to be present when the VIF is higher than 5 to 10 [24]. Thus, any variable with a VIF < 5.0 was considered for inclusion in multiple linear regression analyses. Considering the number of PCR-tests per million population may exhibit associations with morbidity, adjustment was performed for the PCR-test number in every analysis when screening for the risk factors of COVID-19 morbidity per million population ( Figure 2B). Considering morbidities and the number of PCR-tests per million population may exhibit associations with mortality, adjustment was performed for the morbidity and PCR-test number in every analysis when screening for the risk factors of COVID-19 mortality per million population ( Figure 2C).

Associations between Morbidity, Mortality and PCR-Tests per Million Population
First, associations represented by rho and VIF between morbidity (n = 173), mortality (n = 173) and PCR-tests (n = 155) are shown (Figure 2A). These three variables were predicted to have very strong and positive associations with each other. However, VIFs were less than 2 among these three factors. Multicollinearity is considered to be present when the VIF is higher than 5 to 10 [24]. Thus, any variable with a VIF < 5.0 was considered for inclusion in multiple linear regression analyses. Considering the number of PCR-tests per million population may exhibit associations with morbidity, adjustment was performed for the PCR-test number in every analysis when screening for the risk factors of COVID-19 morbidity per million population ( Figure 2B). Considering morbidities and the number of PCR-tests per million population may exhibit associations with mortality, adjustment was performed for the morbidity and PCR-test number in every analysis when screening for the risk factors of COVID-19 mortality per million population ( Figure 2C). Associations between morbidity, mortality and PCR-tests. Either Pearson's correlation coefficient or Spearman's rank correlation was applied to calculate rho; (B) associations between morbidity and risk factors were adjusted for PCR-tests per million population (log10); (C) associations between mortality and risk factors were adjusted for morbidity (log10) and PCR-tests (log10) per million population.
Since fewer PCR-tests may underestimate morbidity and mortality, the association between mortality as the outcome and morbidity as the exposure was adjusted for number of PCR-tests performed (Table 3). In this multiple regression analysis, higher morbidity was associated with higher mortality, whereas more PCR-tests were associated with lower mortality. Evaluation of the association between PCR-test positivity and mortality, shown as a scatter plot, indicated a very strong association between them (rho = 0.54) (Figure 3). Countries with higher PCR- Associations between morbidity, mortality and PCR-tests. Either Pearson's correlation coefficient or Spearman's rank correlation was applied to calculate rho; (B) associations between morbidity and risk factors were adjusted for PCR-tests per million population (log10); (C) associations between mortality and risk factors were adjusted for morbidity (log10) and PCR-tests (log10) per million population.
Since fewer PCR-tests may underestimate morbidity and mortality, the association between mortality as the outcome and morbidity as the exposure was adjusted for number of PCR-tests performed (Table 3). In this multiple regression analysis, higher morbidity was associated with higher mortality, whereas more PCR-tests were associated with lower mortality. Evaluation of the association between PCR-test positivity and mortality, shown as a scatter plot, indicated a very strong association between them (rho = 0.54) (Figure 3). Countries with higher PCR-positivity rates tended to have higher mortality rates. PCR-test positivity rates of countries where no deaths due to COVID-19 were observed were less than 3.5%. Minimum positivity and no deaths were observed in Vietnam.

Screening Factors Associated with Morbidity
Among the 59 covariates, plus BCG vaccine coverage and PCR-testing rate, i.e., a total of 61 factors, 'urban population' and 'insufficient physical activity' were significantly (p < 0.001) associated with morbidity after adjustment for PCR-test rate (Table 4). Next, these two significant factors were used in a multi-linear regression model to eliminate confounding ( Table 5). As a result, 'urban population' (p = 0.02) and 'insufficient physical activity' (p = 0.01) remained significant factors associated with COVID-19 morbidity, even after adjustment for PCR-tests. The adjusted R 2 was 0.5037.

Screening Factors Associated with Morbidity
Among the 59 covariates, plus BCG vaccine coverage and PCR-testing rate, i.e., a total of 61 factors, 'urban population' and 'insufficient physical activity' were significantly (p < 0.001) associated with morbidity after adjustment for PCR-test rate (Table 4). Next, these two significant factors were used in a multi-linear regression model to eliminate confounding ( Table 5). As a result, 'urban population' (p = 0.02) and 'insufficient physical activity' (p = 0.01) remained significant factors associated with COVID-19 morbidity, even after adjustment for PCR-tests. The adjusted R 2 was 0.5037.  The association between 'urban population' and morbidity, demonstrated below as a scatter plot, showed a very strong association (rho = 0.55) (Figure 4).  The association between 'urban population' and morbidity, demonstrated below as a scatter plot, showed a very strong association (rho = 0.55) (Figure 4). rates. COVID-19-related morbidity rates per million population on July 17, 2020 were transformed to the common logarithm (log10) in the graph. Since the variable of 'urban population' showed a nonnormal distribution, Spearman's rank correlation was applied to calculate rho, to quantify the strength of the association. Countries that had never had or that had stopped a national program of BCG vaccination are indicated in red, while countries that currently follow a national BCG vaccine program are indicated in black. Selected country names are shown using three-letter country codes.
The association between 'insufficient physical activity' and morbidity, demonstrated below as a scatter plot, also showed a very strong association (rho = 0.52) ( Figure 5). rates. COVID-19-related morbidity rates per million population on 17 July 2020 were transformed to the common logarithm (log10) in the graph. Since the variable of 'urban population' showed a non-normal distribution, Spearman's rank correlation was applied to calculate rho, to quantify the strength of the association. Countries that had never had or that had stopped a national program of BCG vaccination are indicated in red, while countries that currently follow a national BCG vaccine program are indicated in black. Selected country names are shown using three-letter country codes.
The association between 'insufficient physical activity' and morbidity, demonstrated below as a scatter plot, also showed a very strong association (rho = 0.52) ( Figure 5).
COVID-19-related morbidity rates per million population on July 17th were transformed to the common logarithm (log10) in the graph. Countries that had never had or that had stopped a national program of BCG vaccination are indicated in red, while countries that currently follow a national BCG vaccine program are indicated in black. Selected country's name was shown using three-letter country codes. COVID-19-related morbidity rates per million population on July 17th were transformed to the common logarithm (log10) in the graph. Countries that had never had or that had stopped a national program of BCG vaccination are indicated in red, while countries that currently follow a national BCG vaccine program are indicated in black. Selected country's name was shown using three-letter country codes.

Screening factors Associated with Mortality
Among the 58 covariates evaluated, plus BCG vaccine coverage, adjusted for morbidity and PCR-tests, age-related factors, i.e., median age, ≥60 years of age (%) and ≥70 years of age (%), were significantly (p < 0.001) associated with mortality (Table 4). Since these three age-related factors had collinearity for mortality, '≥ 60 years of age' was selected to maximize adjusted R 2 of the multi-linear regression models. Moreover, 'BCG vaccine coverage', 'Elevated total cholesterol levels' and 'Life expectancy at 60 years of age', were also significant (p < 0.001) factors associated with mortality. Next, these significant factors were used in a multi-linear regression model to eliminate confounding ( Table 6). As a result, '≥60 years of age' (p < 0.001) and 'BCG vaccine coverage' (p = 0.002) remained significant factors associated with COVID-19 mortality, even after adjustment for morbidity and PCR-tests. The adjusted R 2 was 0.8254.

Screening Factors Associated with Mortality
Among the 58 covariates evaluated, plus BCG vaccine coverage, adjusted for morbidity and PCR-tests, age-related factors, i.e., median age, ≥60 years of age (%) and ≥70 years of age (%), were significantly (p < 0.001) associated with mortality (Table 4). Since these three age-related factors had collinearity for mortality, '≥ 60 years of age' was selected to maximize adjusted R 2 of the multi-linear regression models. Moreover, 'BCG vaccine coverage', 'Elevated total cholesterol levels' and 'Life expectancy at 60 years of age', were also significant (p < 0.001) factors associated with mortality. Next, these significant factors were used in a multi-linear regression model to eliminate confounding ( Table 6). As a result, '≥60 years of age' (p < 0.001) and 'BCG vaccine coverage' (p = 0.002) remained significant factors associated with COVID-19 mortality, even after adjustment for morbidity and PCR-tests. The adjusted R 2 was 0.8254. Evaluation of the association between 'median age' and mortality, demonstrated as a scatter plot, showed a very strong association between these two variables (rho = 0.54) ( Figure 6). Countries with a larger population ≥ 60 years of age (%) showed a tendency toward a higher mortality rate.
Finally, evaluation of the association between 'BCG vaccine coverage' and mortality, demonstrated as a scatter plot, indicated a moderately negative association (rho = −0.29) (Figure 7). Countries with higher BCG vaccine coverage showed a tendency toward lower mortality. Additionally, COVID-19 mortality rates did not have significant associations with BCG strain, e.g., Tokyo 172. Moreover, there were no significant associations between mortality and either the year of stopping or introducing a national BCG vaccine program (data not shown).
Evaluation of the association between 'median age' and mortality, demonstrated as a scatter Figure 6. Scatter plot showing the association between ≥60 years of age (%) of the population and mortality rate. Mortalities per million population on July 17, 2020 transformed to the common logarithm (log10) are presented in the graph. Since the variable of 'percentage of population ≥ 60 years of age (%)' showed a non-normal distribution, Spearman's rank correlation was applied to calculate rho, to quantify the strength of the association. Countries that had never had or that had stopped their national BCG vaccine program are indicated in red, while countries that currently follow a national BCG vaccine program are indicated in black. Selected country names are shown using three-letter country codes.
Finally, evaluation of the association between 'BCG vaccine coverage' and mortality, demonstrated as a scatter plot, indicated a moderately negative association (rho = -0.29) (Figure 7). Countries with higher BCG vaccine coverage showed a tendency toward lower mortality. Additionally, COVID-19 mortality rates did not have significant associations with BCG strain, e.g., Tokyo 172. Moreover, there were no significant associations between mortality and either the year of stopping or introducing a national BCG vaccine program (data not shown). Figure 6. Scatter plot showing the association between ≥60 years of age (%) of the population and mortality rate. Mortalities per million population on 17 July 2020 transformed to the common logarithm (log10) are presented in the graph. Since the variable of 'percentage of population ≥ 60 years of age (%)' showed a non-normal distribution, Spearman's rank correlation was applied to calculate rho, to quantify the strength of the association. Countries that had never had or that had stopped their national BCG vaccine program are indicated in red, while countries that currently follow a national BCG vaccine program are indicated in black. Selected country names are shown using three-letter country codes.

Discussion
Among the variety of parameters abstracted from open resources, 'BCG vaccine coverage' had a significant association with COVID-19 mortality, even after adjusting for morbidity, PCR-tests, age, universal health coverage, numbers of medical doctors, elevated total cholesterol and healthy life expectancy. On the other hand, BCG vaccination was not associated with COVID-19 morbidity. The Mortalities per million population on 17 July 2020 are transformed to the common logarithm (log10) in the graph. Since the variable of 'BCG vaccine coverage' showed a non-normal distribution, Spearman's rank correlation was applied to calculate rho, to quantify the strength of the association. Selected country names are shown using three-letter country codes.

Discussion
Among the variety of parameters abstracted from open resources, 'BCG vaccine coverage' had a significant association with COVID-19 mortality, even after adjusting for morbidity, PCR-tests, age, universal health coverage, numbers of medical doctors, elevated total cholesterol and healthy life expectancy. On the other hand, BCG vaccination was not associated with COVID-19 morbidity.
The main results of this study are consistent with a very recent article demonstrating that every 10% increase in the BCG index was associated with a 10% reduction in COVID-19 mortality [25]. Moreover, a retrospective cohort study suggested that BCG-vaccinated individuals were less likely to require hospital admission during the disease course [26]. In contrast to BCG, coverage of the measles vaccine, which is also considered to induce heterologous protection against infections through long-term boosting of innate immune responses [9], showed no association with the morbidity and mortality of COVID-19, which was also consistent with a very recent article showing a significant low risk of COVID-19 mortality in countries with higher BCG vaccine coverage, but not with measles vaccine coverage [27].
Moreover, SARS-CoV-2 is a single-stranded positive-sense RNA virus and the BCG vaccine has been shown in controlled trials to reduce the severity of infections by other viruses with such a structure [9]. For example, the BCG vaccine reduced yellow fever vaccine viremia by 71% in volunteers in the Netherlands [28]. However, some countries with a current national BCG vaccination policy have high mortality rates. Plausible reasons for this discrepancy may be: (1) low coverage of BCG vaccination in these countries (% coverage-mortality per million population), e.g., Ireland (18%-354); Portugal (32%-165); and Greece (50%-19): (2) late introduction of BCG vaccine program (year of introduction-mortality per million population), e.g., Iran, (1984-162): (3) oral delivery of the BCG vaccine, e.g., Brazil retained oral delivery of the vaccine until 1977: (4) UK administered the vaccine to older children (12 to 13 years of age) and (5) connection with endemic country by land, e.g., Mexico, Panama, Peru and Chile.
Higher morbidity, but fewer PCR-tests, were associated with higher mortality. Moreover, higher PCR-test positivity was associated with higher mortality, which was consistent with the report by Hisaka et al. [29]. Expanding application of PCR-tests not only to typical symptomatic cases, but also to mild or asymptomatic cases and to those who had close contact with patients, may decrease the PCR-test positivity rate. Thus, enhancing the capacity of PCR-testing may enable identification of cases, so that appropriate measures can be taken to prevent them spreading SARS-CoV-2 to others at home, at their work place or at events of mass gatherings.
In this study, the covariate of 'urban population' and 'insufficient physical inactivity' had a strong and positive association with morbidity, but not with mortality. People living in urban areas tend to have close contact with a greater number of people per day than those in rural areas, independent of age. A report on 4103 patients with COVID-19 in New York City found that obesity, which may strongly depend on the balance between physical activity and diet, was one of the clinical features leading to hospital admission [30]. On the other hand, older age was associated with higher mortality, which is consistent with previous articles [2][3][4]. From this study, the risk factors for morbidity seem to be different from those associated with mortality, suggesting that factors related to susceptibility may be different from those related to disease severity.
There are several limitations to this study. First, although we selected 61 covariates in this study, we did not evaluate range and timing of non-pharmaceutical interventions, e.g., school closures, workplace closures, cancellation of public events, restrictions on public gatherings, stay-at-home restrictions, restrictions on internal movement, international travel controls, etc., all of which would also have had significant effects on COVID-19-related morbidity and mortality. Therefore, the present study results are burdened with an extreme error. Second, the study design was ecological. Therefore, the outcome of this work should be considered highly limited, with a potential risk of high bias. Consequently, only the hypothesis that BCG vaccination mitigates COVID-19 mortality can be proposed here; cohort or case-control studies and randomized clinical trials, similar to the BCG-CORONA study [31], are required to test this hypothesis. Third, the COVID-19 pandemic is still ongoing, although we have confirmed the results using the latest data. However, the results may be different a few months from now. Fourth, it is clear that an extremely large number of covariates, i.e., 61, were selected for the limited number of 173 countries. This sample size could allow use of a maximum of 10 covariates in multiple regression analysis. Therefore, significant (p < 0.001) variables were at first screened after adjustment for PCR-tests and morbidity (Table 4). Then, multivariable linear regression using the screened variables were performed after adjustment for PCR-tests and morbidity (Table 5 for morbidity  and Table 6 for mortality). Fifth, mortality in each country should be compared with excess deaths whether these two do not make a big difference. Sixth, the definition of COVID-19 cases may differ by country, e.g., including PCR confirmed, but asymptomatic cases and pneumonia cases with negative PCR-tests. Seventh, there were still residual confounders even after adjustment for PCR-tests.

Conclusions
Our results suggest the hypothesis that greater BCG vaccine coverage may reduce the risk of deaths due to COVID-19, which needs to be further studied by observational studies and confirmed by randomized clinical trials.