This section presents the main results of our data analysis and discusses the most important findings.
3.1. Descriptive Analysis
Figure 4 shows that besides the expected high positive correlation between
people_vaccinated_per_hundred and
people_fully_vaccinated_per_hundred,
aged_70_older and
median_age,
life_expectancy, and
median_age, there were other interesting patterns in terms of bivariate correlations. In addition to the high positive correlations between
human_development_index and variables used to create that index (
gdp_per_capita,
life_expectancy, and others), there was a highly positive correlation between
human_development_index and
total_cases_per_million (0.69). Given that more developed GPEs had a higher median age and higher income [
16], this positive correlation suggests that more developed GPEs tend to report more cases than other GPEs. However, the correlations between
human_development_index and
total_deaths_per_million (0.50) and between
human_development_index and
people_vaccinated_per_hundred (0.82) indicate that although developed GPEs tend to report more cases, they also have proportionally fewer deaths, possibly due to their higher vaccination rates. The positive correlation between
total_cases_per_million and both
people_vaccinated_per_hundred and
people_fully_vaccinated_per_hundred (around 0.50) seems to indicate that more cases are associated with higher vaccination rates. However, this correlation reinforces the idea that GPEs with fewer vaccinated people are also less likely to report cases correctly and accurately, since fewer COVID-19 tests are conducted there.
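The bivariate correlations discussed above can be illustrated with a minimal sketch. It assumes the Pearson coefficient (the text does not name the coefficient used), and the values below are synthetic placeholders rather than data from the study; the function name `pearson` is ours.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Synthetic, illustrative values only (not taken from the study's dataset).
human_development_index = [0.45, 0.55, 0.70, 0.80, 0.90, 0.95]
people_vaccinated_per_hundred = [5.0, 12.0, 40.0, 62.0, 75.0, 82.0]

r = pearson(human_development_index, people_vaccinated_per_hundred)
print(f"correlation = {r:.2f}")
```

Applying the same function to every pair of variables would give a correlation matrix of the kind summarized in Figure 4.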
The relatively high correlation between
people_vaccinated_per_hundred and
people_fully_vaccinated_per_hundred with
median_age,
aged_70_older,
gdp_per_capita, and
human_development_index, with values between 0.56 and 0.84, also points to a direct relationship between the vaccination rate and the development level of the GPE. A visualization of the percentage of the vaccinated population and percentage of people over 70 years old versus the total of deaths per million of the population can be seen in
Figure 5. As shown, there is a clear contrast between the top, middle, and bottom of this visual representation. At the top, we find primarily developed GPEs, as can be inferred from the percentage of the population over 70 years old in those GPEs. These GPEs, at the top of
Figure 5, present the highest vaccination rates but, in general, fewer deaths than the GPEs in the middle of the figure, which have lower vaccination rates and tend to present more deaths. Vaccine efficacy may explain this tendency. The bottom of the figure is composed mainly of less developed GPEs with very low vaccination rates. Yet these were also GPEs with smaller numbers of deaths. This may be one more indication that these GPEs are not enforcing an adequate COVID-19 monitoring and reporting policy.
The gap in the percentage of the population vaccinated across GPEs can be confirmed in
Figure 6. While over 30 GPEs had vaccinated less than 20% of their population, over 35 GPEs had vaccinated more than 70%.
Another demonstration of the vaccination effect can be seen in
Figure 7, which illustrates the daily evolution of the pandemic by plotting the seven-day moving average of daily deaths versus the seven-day moving average of the percentage of the vaccinated population. Since plotting this information for all the GPEs under study would not produce an interpretable visualization, we show only six GPEs: Israel, Great Britain, Portugal, Russia, Spain, and the USA. These GPEs were chosen because of their similar development levels and vaccination start dates. Over time, as a higher percentage of each GPE's population is vaccinated, the number of deaths tends to decrease or stabilize, particularly when the vaccination rate exceeds 60% of the population.
As illustrated in
Figure 7, the relationship between vaccination rates and lives saved is not linear and can also depend on the vaccines being provided to the population. There seems to be a minimum threshold of around 20% for the vaccination rates to be converted into significant death decreases, followed by a rapid decrease of deaths per capita and then a relatively stable situation below five daily deaths per million people.
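The seven-day smoothing used for Figure 7 can be sketched as follows. This is a minimal illustration that assumes a trailing window (the text does not say whether the window is trailing or centered); the function name and the daily values are hypothetical.

```python
def moving_average(values, window=7):
    """Trailing moving average; positions without a full window are None."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough history yet to fill the window.
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i + 1 - window : i + 1]) / window)
    return smoothed


# Hypothetical daily death counts, for illustration only.
daily_deaths = [10, 12, 9, 14, 11, 13, 15, 8, 10]
smoothed_deaths = moving_average(daily_deaths)
print(smoothed_deaths)
```

The same function would be applied to the daily percentage of the vaccinated population before plotting one series against the other.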
The resulting clustering model was not balanced in terms of the number of GPEs in each cluster. While cluster A was composed of 37 GPEs, cluster B was composed of 69 and cluster C of 53 GPEs.
The analysis of the mean values of the different variables per cluster, as detailed in
Table 2, shows that there may be three distinct clusters of GPEs. In cluster A, we find the GPEs where COVID-19 had a higher reported health impact, with more deaths per case (higher
death_ratio). This cluster comprises mainly less developed GPEs, as seen in the variable
human_development_index. These were the GPEs that implemented less restrictive healthcare measures (
stringency_index_med). This application of less restrictive measures could also be related to stronger economic needs. This lack of economic capacity could also explain the lower vaccination percentage in this cluster (9.6%). As seen in
Figure 8, the GPEs of cluster A were primarily from Africa and the Middle East.
In contrast, we have cluster B, where COVID-19 had a smaller impact in terms of deaths per case. Cluster B is formed by the more highly developed GPEs. As shown in
Figure 8, cluster B is composed predominantly of European, North and South American, richer Asian, and Oceanian GPEs. Lastly, in cluster C, we find the “not-so-developed” GPEs. These GPEs had a higher number of deaths per case, although the impact was still much lower than that found in cluster A. Geographically, as shown in
Figure 8, these are primarily GPEs from Latin America, North Africa, and Asia. Contrary to other indicators, the stringency index in cluster C is higher than in clusters A and B, suggesting that since GPEs in this cluster did not have the vaccination capability of GPEs of cluster B, they may have opted for higher levels of public health restrictions.
Before the rollout of the vaccination programs (30 November 2020), the probability of dying after contracting the virus, as presented in
Table 3, was between 1.8% and 2.8% across the clusters. That probability was substantially reduced after vaccination. Before vaccination, people from the GPEs of cluster B, the more highly developed GPEs and, as such, those with an older population less capable of surviving the disease, had a probability of dying of 2.72%. In cluster A that probability was 2.04%, and in cluster C it was 1.74%. After vaccination, however, cluster B went from being the cluster with the highest probability of dying to being the one with the lowest (1.36%). This decrease means that in the one year of vaccination, the probability of dying after testing positive decreased by 0.32 percentage points (pp) in cluster A, 1.36 pp in cluster B, and 0.18 pp in cluster C. The odds ratio shows that in cluster B, the cluster with the highest vaccination rates, there was a 50.7% decrease in the odds of dying compared to the same day in the previous year. In cluster C, the cluster with the second-highest vaccination rate, the decrease was only 11.1%, and in cluster A, the cluster of GPEs with the lowest vaccination rates, it was only 16.3%. These results emphasize the impact of vaccination in reducing the number of deaths.
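The step from probabilities to the odds decrease quoted for cluster B can be sketched as follows. This is a minimal illustration assuming the usual definition odds = p / (1 - p); the function names are ours, and only the rounded cluster B percentages from the text are used, so the result agrees with the reported 50.7% only up to rounding.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_decrease_pct(p_before, p_after):
    """Percentage decrease in the odds when moving from p_before to p_after."""
    odds_ratio = odds(p_after) / odds(p_before)
    return (1.0 - odds_ratio) * 100.0


# Cluster B probabilities quoted in the text: 2.72% before, 1.36% after.
decrease_b = odds_decrease_pct(0.0272, 0.0136)
print(f"Decrease in the odds of dying: {decrease_b:.1f}%")
```

Because the probabilities are small, the odds decrease is close to, but slightly larger than, the relative decrease in probability (50%).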
The difference between clusters is even more evident when analyzing the average deaths by cases (
death_ratio) by the average vaccinated percentage of the population per week (
Figure 9). In cluster B, an increase in vaccination was associated with a decrease in the
death_ratio, whereas the opposite happened in cluster C: as vaccination increased,
death_ratio also increased. In cluster A, there seemed to also be some sort of discontinuity in the
death_ratio profile over time. These observations again suggest that a stable pattern of saving lives emerged only once more than 20% of the population was vaccinated, leading to values below 1.75 deaths per 100 cases of COVID-19.
One possible explanation for the different trends in the two clusters with a higher percentage of vaccinated people (clusters B and C) could be the types of vaccines that were mainly administered in each country. However, as shown in
Figure 10, due to the limitations and types of available data, it is hard to draw inferences about the effectiveness of the different types of vaccines, which makes this topic a candidate for more detailed study in future work. First, the data now available only includes the number of doses administered. Since some vaccines required a single dose, such vaccines are expected to be underrepresented relative to others. Second, most GPEs that provided data by vaccine manufacturer were from cluster B, many of them from the European Union, and therefore followed broadly similar vaccination policies.
Notwithstanding these limitations, it is possible to see in
Figure 10 that Bulgaria and Romania, two European GPEs from cluster C, are among the GPEs with higher death ratios after vaccination started and also have low vaccination rates (compared to the other GPEs). Since the distribution of vaccines by manufacturer in Bulgaria and Romania was not much different from that of other European Union members, the higher death ratio seems to be related to the lower vaccination rate of these GPEs. Among the GPEs represented in
Figure 10, the ones that show a clearly distinct pattern of vaccination by manufacturer are Chile, Ecuador, Hungary, and Peru. All of these GPEs are from cluster B. Except for Chile, these GPEs are among the five with the highest death ratios. A lower vaccination rate could explain this high death ratio; however, that is not the case. Despite having higher death ratios, Ecuador, Hungary, and Peru rank 19th, 9th, and 10th, respectively, among these 33 GPEs in terms of lower vaccination rates. The development level of these GPEs may also have contributed to the higher death ratio values that were found. Nevertheless, since Ecuador, Hungary, and Peru administered some types of vaccines that the remaining GPEs did not, this raises the question of possible differences in the real-world effectiveness of some of the vaccines, which may deserve additional research when more public domain data becomes available.
3.2. Regression Model
The global fitted regression model of the ratio of deaths by cases was the following:
This overall regression did not yield statistically significant results of much interest. However, as expected, the regression models built for each cluster were significantly better, as shown below:
As depicted in
Figure 9, the dissimilarity among clusters explains why it was not possible to build a good single global regression model for deaths per case (1). However, as also suggested by
Figure 9, statistically significant models can help understand the power of vaccination in reducing the number of cases by cluster, particularly in clusters B and C.
To further study the impact of vaccination on saving lives, we ran a simulation with the above models: we applied the regression models (3) to the respective clusters in the week of 28 November 2021, considering a scenario in which the vaccination variable increased by 5%, that is, simulating that it would have been possible to increase vaccination rates by 5% in the week between 21 and 28 November. As shown in
Table 4, it would then have been possible to save around half a million lives in cluster B GPEs, the ones where significant vaccination rates had already been achieved.
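The logic of this scenario can be illustrated with a minimal sketch. It assumes a simple linear model of death_ratio on the vaccination percentage and reads the 5% increase as a relative change; both are assumptions, and the coefficients, case count, and function names below are hypothetical placeholders, not the fitted values of the models in this study.

```python
def predict_death_ratio(intercept, slope, vaccinated_pct):
    """Deaths per case predicted by a hypothetical linear model."""
    return intercept + slope * vaccinated_pct


def deaths_avoided(intercept, slope, vaccinated_pct, cases, increase=0.05):
    """Deaths avoided if the vaccination variable were `increase` higher."""
    baseline = predict_death_ratio(intercept, slope, vaccinated_pct)
    scenario = predict_death_ratio(
        intercept, slope, vaccinated_pct * (1.0 + increase)
    )
    # The difference in predicted deaths per case, scaled by the case count.
    return (baseline - scenario) * cases


# Placeholder inputs for illustration only (not estimates from this study).
avoided = deaths_avoided(
    intercept=0.03, slope=-0.0002, vaccinated_pct=60.0, cases=1_000_000
)
print(f"Deaths avoided in the scenario: {avoided:.0f}")
```

With a negative slope, a higher vaccination value lowers the predicted deaths per case, and the number of lives saved scales with the number of cases in the cluster.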
Vaccination does have a significant impact and potential for saving lives, as illustrated above, but it is not the only factor that increases the probability of not dying from COVID-19. When we analyze examples of GPEs from the different clusters (
Table 5,
Table 6 and
Table 7,
Figure 11,
Figure 12 and
Figure 13), distinctive behaviors in the three clusters can be found. There was a high variance in the death ratio in cluster A, independently of HDI and vaccination rate. In cluster A, the weekly profiles of death ratio by vaccination rate were very erratic and GPE specific, as shown in
Figure 11. In cluster B, even though the cluster was composed of GPEs with a wide range of HDI, there seems to be a pattern over time of decrease of the death ratio as the vaccination rate increases (see
Figure 9 and
Figure 12). This pattern, as previously noted in the discussion of
Figure 7, seems to be more robust when vaccination rates over 60% are reached. However, there are also exceptions, such as Bhutan and Cambodia, two of the less developed GPEs in cluster B. The weekly results for cluster C are the strangest, even when looking at some examples with different HDI and vaccination rates (
Figure 12). Most GPEs in cluster C did not show a decrease in the death ratio, despite the increase in vaccination rates. This suggests that vaccination rates must reach a minimum threshold before their statistical impact on deaths and on saving lives becomes visible.