1. Introduction
The coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and first reported in December 2019 in Wuhan, China, has become a global health concern [1,2]. Declared a pandemic on 11 March 2020, the disease has severely hit the world [2]. For example, as of 2 August 2021, there had been 199,051,292 total cases, with 4,241,236 deaths (https://www.worldometers.info/coronavirus/, accessed on 8 August 2022). The incidence of the disease is, however, non-uniform across the globe. In Africa, the incidence of COVID-19 is relatively low, with 6,799,806 cases, including 171,445 deaths, as of 2 August 2021, i.e., 3.4% and 4.04% of total cases and deaths, respectively, even though the continent accounts for 17.2% of the world population. The reasons for such a contrast have interested scientists, especially considering the weak health care systems in African countries [1,3,4]. Furthermore, patterns of COVID-19 cases and deaths across African countries show high geographical disparities. For example, while about 180,000 cases per million people and about 800 deaths per million people were reported in Seychelles, fewer than 100 cases and 10 deaths per million people were noted in countries such as Benin, Angola, and Guinea. What explains such spatial disparities, and which factors drive these patterns? Understanding the factors behind the spatial heterogeneity in COVID-19 incidence in Africa is essential to inform public health policymakers who aim to better control the pandemic [2] and to ensure effective preparedness for future epidemics.
There is evidence that environmental and socio-demographic factors may act in synergy or antagonistically with climate factors to exacerbate or lessen the severity of infectious disease transmission and fatality [5]. For example, migrants, whether national or international, especially those in low-income jobs, are among the most vulnerable to infection and death from SARS-CoV-2 [6]. Population age structure has also been suggested as a determinant of COVID-19 spread and deaths [7]; the high numbers of COVID-19 cases and deaths in Italy were linked to the demographic structure of the country (median age = 46 years) [6]. In Europe, more than 95 percent of people who died of COVID-19 were aged 60 years or older (WHO, 2020), which has been suggested to explain the low number of cases and deaths due to COVID-19 in Africa, where the median age is 19 years. Income factors (e.g., median household income, median household income percent, Gini coefficient) were significantly associated with COVID-19 cases and deaths [6,7]. Similarly, several studies have highlighted the potential role of weather and climate in COVID-19 morbidity and mortality, some reporting a negative correlation between ambient temperature and humidity and the number of COVID-19 cases/deaths and others reporting no correlation or even a positive one [8].
Studies have been carried out to model and predict the dynamics of the pandemic [9]. Several sought to understand environmental (climate and pollution), socio-demographic, and socio-economic correlates of the spatial heterogeneity in COVID-19 incidence and deaths in Europe, the United States of America (USA), and Asia at the country, prefecture, or county scale [2,6,10,11]. In such studies, geography, which includes spatial locations and the characteristics of spatial determinants, was shown to play a crucial role in the early outbreak and transmission of the virus across scales [2,12]. For instance, the spatial variability and clustered patterns of COVID-19 cases and deaths in many countries showed a strong spatial dependency on confounding factors [13]. This indicates the need to account for spatial effects such as spatial autocorrelation, spatial stationarity, and heterogeneity when modelling COVID-19 morbidity and mortality and their correlates.
Comparatively, only a few studies have been carried out in Africa on these issues. The few attempts to obtain such insights in Africa (see [7,14]) have not explicitly considered spatial autocorrelation and spatial stationarity in the modelling, and their conclusions are thus potentially misleading. For example, Bouba et al. [7] used OLS regression on COVID-19 cases and deaths from 14 February 2020 to 4 February 2021 (first waves) to explore their relationship with 34 covariates (epidemiological, socio-demographic, climatic, environmental, and economic-financial) across 54 African countries. Similarly, Tamasiga et al. [15] used multivariate linear regression with a few predictors (seven demographic and income predictors) across 40 sub-Saharan countries to understand factors affecting COVID-19 cases and deaths based on data from January 2020 to March 2023. The authors obtained models with an R² equal to 69% and 63%, indicating that substantial variation in COVID-19 cases and deaths remains unexplained. Furthermore, Su et al. [14] conducted a global analysis (178 countries, including African countries) of the influence of socio-ecological factors on COVID-19 risk, considering 28 socio-ecological and demographic variables. All these studies used OLS regression or simple generalised Poisson models. One shortcoming of using OLS or simple generalised models to model COVID-19 incidence and deaths across countries is that they ignore the spatial patterns that may exist in these data [6,14]. For example, neighbouring countries may show similar patterns, which may be confounded with other factors when this spatial relationship is not explicitly tested and accounted for [6,14]. One of the rare studies that used spatial regression is [16], in which the authors used data from the first and second waves of COVID-19 (until May 2021) from 47 countries. The authors explored three linear spatial regression models, namely the spatial lag, spatial error, and spatial autoregressive condition (SAC) models, and found that COVID-19 prevalence in an African country was highly dependent on that of neighbouring African countries as well as on its economic wealth, transparency, and proportion of the population aged 65 or older. However, the authors ignored countries' COVID-19 testing capacity and used only a few predictors (six), excluding, for example, climate and population migration, which are also important correlates of COVID-19 dynamics.
Our study aims to improve the assessment of the impact of socio-economic, environmental, and demographic parameters on the spread of COVID-19 cases and deaths across African countries by adopting spatial-regression-based approaches and using the most up-to-date statistics on COVID-19 cases and deaths. The objective was to assess the socio-ecological patterns of the COVID-19 spatial dynamics in Africa. Specifically, the study sought to (i) map the spatial heterogeneity of the number of COVID-19 cases and deaths across African countries, (ii) test the existence of spatial autocorrelation and heterogeneity in the patterns of COVID-19 cases and deaths, and (iii) determine socio-ecological factors affecting the spatial heterogeneity of COVID-19 morbidity and mortality across African countries.
2. Materials and Methods
2.1. Study Area
The study considered all 54 African countries, including Madagascar. Based on the latest United Nations estimates, the population of Africa in 2023 was 1,460,481,772, i.e., about 18.2% of the world population (https://www.worldometers.info/world-population/#region, accessed on 5 January 2024). However, Africa carries 25% of the world's disease burden, while its share of global health expenditure is less than 1%. Worse still, it manufactures less than 2% of the medicines consumed on the continent. Most Africans, mainly the poor and those in the middle-income bracket, rely on underfunded public health facilities, while a small minority have access to well-funded, quality private health care. The top three challenges identified were inadequate human resources, inadequate budgetary allocation to health, and poor leadership and management [17]. At the onset of COVID-19, and because most countries on the continent rank low on the United Nations Development Programme's Human Development Index, experts predicted millions of COVID-19 deaths on the continent, a prediction that, several years into the pandemic, has proven wrong [1]. For example, as of 3 April 2024, only 1.82% and 3.69% of the total cases and deaths, respectively, were reported in Africa.
2.2. Data Acquisition
We sought to model COVID-19 cases and deaths. To this end, we compiled data on the cumulative cases and deaths of COVID-19 for each African country as of 8 August 2022. The numbers of cases and deaths were then expressed per million people to make them comparable across countries. The number of cases detected depends strongly on the number of tests carried out. Therefore, the total number of tests was added as a covariate to account for its effect on the number of reported cases and deaths. Data on the number of cases, deaths, tests, and population were obtained from the Worldometer database (https://www.worldometer.org/, accessed on 8 August 2022). In total, 43 explanatory variables grouped into 8 categories were considered: demography (10 variables), migration (3 variables), economic (6 variables), health care systems (3), clinical or diseases (7), pollution (4), climate (8), and others (2). Demography variables included population density, annual change in population, fertility rate, median age, proportion of total population aged 65 and above, proportion of total population aged 15–64, dependence ratio as % of working-age population, dependence ratio for old people as % of total population, and median life expectancy at birth (years). Migration variables included the net migrants, number of airports in the country, and number of air transport passengers carried per capita. The economic variables included the adjusted savings: particulate emission damage (% of GNI), a component of adjusted net savings, which is equal to net national savings plus education expenditure minus energy depletion, mineral depletion, net forest depletion, and carbon dioxide and particulate emissions damage; the GDP per capita; the Human Development Index; and the urbanisation rate. The health care systems variables included the number of nurses and midwives per 1000 population, the number of physicians per 1000 population, and the Global Health Security Detection Index (weighted sum of all the Global Health Security (GHS) data normalised to a scale of 0 to 100, where 100 = best health security condition).
The clinical variables included the prevalence of diabetes (% of population aged 20 to 79), the incidence rate of tuberculosis (TB) per 100,000 people, the Bacillus Calmette–Guérin (BCG) vaccination coverage in %, the prevalence of HIV (total % of population aged 15–49), the reported cases of malaria per 100,000 population, the raised total cholesterol (≥5.0 mmol/L) as an age-standardised estimate, and the burden of communicable diseases and maternal, prenatal, and nutrition conditions (including infectious and parasitic diseases, respiratory infections) per 100,000 people. Variables on pollution were PM2.5 air pollution (population exposed to levels exceeding the World Health Organization (WHO) guideline value, % of total), methane emissions in the energy sector (thousand metric tons of carbon dioxide (CO2) equivalent), nitrous oxide emissions (thousand metric tons of CO2 equivalent), and the proportion of people practising open defecation (% of population). Climate variables were annual mean temperature, temperature seasonality, annual precipitation, precipitation of the driest quarter, moisture index, moisture index of the most arid quarter, moisture index of the moistest quarter, and potential evapotranspiration. The other variables were the total land area of the country and the proportion of the total land area covered by forests. Further details and sources of the data are provided in the Supplementary File, Table S1.
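The per-million scaling of cases, deaths, and tests described above is a single ratio per country. The following is a minimal Python sketch of that arithmetic; the `CountryRecord` layout and the figures for "Country A" are hypothetical and only illustrate the computation.

```python
from dataclasses import dataclass


@dataclass
class CountryRecord:
    """Hypothetical raw counts for one country (illustrative layout only)."""
    name: str
    population: int
    cases: int
    deaths: int
    tests: int


def per_million(count: float, population: float) -> float:
    """Express a raw count as a rate per 1 million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that exact inputs give exact results.
    return count * 1_000_000 / population


def rates(record: CountryRecord) -> dict:
    """Return cases, deaths, and tests per million for one country."""
    return {
        "cases_per_million": per_million(record.cases, record.population),
        "deaths_per_million": per_million(record.deaths, record.population),
        "tests_per_million": per_million(record.tests, record.population),
    }


example = CountryRecord("Country A", population=2_000_000,
                        cases=500, deaths=10, tests=40_000)
print(rates(example))
```

Expressing tests on the same per-million scale as the responses keeps the testing covariate comparable across countries of very different sizes.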
2.3. Data Analysis
COVID-19 cases and deaths per million across African countries were considered response variables and mapped in a geographical information system to explore the spatial heterogeneity of disease incidence and fatality across the continent. The correlation between the two response variables was moderate and positive (r = 0.672, p-value ). The correlation between the total number of cases per million and the total number of tests per million was high and positive (r = 0.834, p-value ). The correlation between the total number of deaths per million and the total number of tests per million was moderate and positive (r = 0.662, p-value ).
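The coefficients above are Pearson correlations. A plain-Python sketch of r and of the t statistic conventionally used to test it is given below; it assumes complete, paired observations per country, and because the text does not state how the p-values were obtained, the test shown is the standard one rather than necessarily the authors' procedure.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length (n >= 3)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: no correlation, with n - 2 degrees of freedom.

    The two-sided p-value is the tail probability of Student's t with
    n - 2 df (e.g., 2 * scipy.stats.t.sf(abs(t), n - 2)); the standard
    library has no t distribution. Undefined when |r| == 1.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With r fixed, the t statistic grows with the square root of n − 2, so the same r is more significant in a larger sample of countries.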
Before the statistical analyses, all explanatory variables were standardised using min–max normalisation, which rescaled every variable to the range 0 to 1. Modelling the relationship between the explanatory variables and each of the two response variables involved three steps. As the number of explanatory variables was high, the first step was a collinearity analysis using the variance inflation factor (VIF). This consisted of regressing each explanatory variable on the remaining explanatory variables and iteratively eliminating variables with a VIF greater than or equal to 5, which left 23 of the 43 variables. The 23 variables selected were the country's total area, excluding area under inland water bodies, national claims to the continental shelf, and exclusive economic zones (land_Area); methane emissions in the energy sector (thousand metric tons of CO2 equivalent) in 2018 (Meth_em); nitrous oxide emissions (thousand metric tons of CO2 equivalent) in 2018 (Nitro_oxide); number of airports in the country (Nb_Airport); dependence ratio for old people (% of total population) in 2020 (DepR_old); number of tests per 1 million people (Tests_1Mpop); urbanisation rate (Urban_Rate); net migrants (Migrants); population density (Density_2020); precipitation of the driest quarter (bio17); moisture index of the most moist quarter (mimq); adjusted savings in % of GNI (AdjSav); annual mean temperature (bio1); annual change in the population (Yearly_change); dependence ratio (% of working-age population) in 2020 (DepR); proportion of the total land area covered by forests (Forest_area); BCG vaccination coverage, in % (BCG.19); prevalence of HIV as total % of population aged 15–49 (HIV.19); raised total cholesterol (≥5.0 mmol/L) as an age-standardised estimate (Raised_Choleste_2018); GDP per capita (current US$) (GDP.19); reported cases of malaria per 100,000 population (Malaria.19); the burden of communicable diseases and maternal, prenatal, and nutrition conditions (including infectious and parasitic diseases, respiratory infections) per 100,000 people (Commun_DiseasePrevalence2019); and the Global Health Security Detection Index (weighted sum of all the GHS data normalised to a scale of 0 to 100, where 100 = best health security condition) (GHS.index.19).
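The two preprocessing steps, min–max scaling and iterative VIF elimination, can be sketched in dependency-free Python as follows. The sketch assumes that the single variable with the highest VIF is dropped at each iteration until all VIFs fall below 5 (the text states only that variables with VIF ≥ 5 were eliminated iteratively), and the toy columns `x1`–`x3` are made up. In R, VIFs are commonly obtained from a fitted model with `car::vif`; the hand-written solver here simply keeps the example self-contained, and it expects constant or exactly collinear columns to be removed beforehand.

```python
from typing import Dict, List

Columns = Dict[str, List[float]]


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: exact collinearity")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y: List[float], predictors: List[List[float]]) -> float:
    """R^2 of an OLS fit of y on the predictors (intercept included)."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(data: Columns, name: str) -> float:
    """VIF_j = 1 / (1 - R_j^2), regressing column j on all other columns."""
    others = [col for key, col in data.items() if key != name]
    r2 = r_squared(data[name], others)
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)


def vif_select(data: Columns, threshold: float = 5.0) -> List[str]:
    """Drop the highest-VIF variable until every VIF is below the threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = {key: vif(kept, key) for key in kept}
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return list(kept)


# Hypothetical toy data: x2 is a near copy of x1, x3 is unrelated.
raw = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9, 9.1, 9.9],
    "x3": [3, 7, 1, 8, 2, 9, 4, 6, 10, 5],
}
scaled = {k: min_max([float(v) for v in vals]) for k, vals in raw.items()}
print(vif_select(scaled))  # one of x1/x2 is dropped; x3 is kept
```

Because min–max scaling is an affine transformation, it leaves every VIF unchanged; the order of the two steps therefore does not affect which variables are retained.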
Then, an OLS regression was fitted to each response variable (total cases per 1 million population and total deaths per 1 million population) using the 23 pre-selected explanatory variables, and the parsimonious model was identified by backward selection from this initial model. Next, global Moran's I was used to explore the spatial autocorrelation of COVID-19 cases and deaths across African countries. The index was calculated from the parsimonious OLS model fitted to each response variable, using several weight matrices. Three row-standardised weight matrices were considered for testing global spatial autocorrelation: the maximum distance matrix, the 4-nearest neighbours matrix, and the 10-nearest neighbours matrix. The maximum distance was set to the largest of the countries' nearest-neighbour distances, which ensured that each country had at least one neighbour. The average number of neighbours within this distance was 6.03; on this basis, the 4-nearest neighbours and 10-nearest neighbours matrices were considered in addition to the maximum distance matrix. This analysis allowed a prior assessment of the relevance of spatial models.
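The weight matrices and the global Moran's I described above can be sketched in plain Python as follows. This is an illustration under stated assumptions: it uses planar Euclidean distances on made-up coordinates (real country centroids would normally call for great-circle distances), and the six-unit example is hypothetical. In R, the corresponding steps are provided by spdep, e.g., `knearneigh()`/`knn2nb()`, `dnearneigh()`, `nb2listw(style = "W")` for row standardisation, and `moran.test()` or, for OLS residuals, `lm.morantest()`, which also supplies the appropriate inference.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def distance_matrix(coords: Sequence[Point]) -> Matrix:
    """Pairwise Euclidean distances (planar coordinates assumed)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def row_standardise(w: Matrix) -> Matrix:
    """Scale each row to sum to 1 (rows without neighbours stay zero)."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else row[:])
    return out


def knn_weights(coords: Sequence[Point], k: int) -> Matrix:
    """Row-standardised k-nearest-neighbour weights (ties broken by index)."""
    d = distance_matrix(coords)
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for j in others[:k]:
            w[i][j] = 1.0
    return row_standardise(w)


def max_min_distance_weights(coords: Sequence[Point]) -> Tuple[Matrix, float]:
    """Distance-band weights whose threshold is the largest nearest-neighbour
    distance, so that every unit has at least one neighbour."""
    d = distance_matrix(coords)
    n = len(coords)
    threshold = max(min(d[i][j] for j in range(n) if j != i) for i in range(n))
    w = [[1.0 if i != j and d[i][j] <= threshold else 0.0 for j in range(n)]
         for i in range(n)]
    return row_standardise(w), threshold


def morans_i(values: Sequence[float], w: Matrix) -> float:
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2.

    Pass OLS residuals as `values` to check residual autocorrelation.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


# Six hypothetical units on a line, one unit apart.
coords = [(float(x), 0.0) for x in range(6)]
w_band, band = max_min_distance_weights(coords)
print(band)                                   # 1.0
print(morans_i([1, 0, 1, 0, 1, 0], w_band))   # -1.0: perfect alternation
print(morans_i([1, 2, 3, 4, 5, 6], w_band))   # positive: smooth trend
```

Values near +1 indicate that similar values cluster in space, values near −1 indicate that neighbours tend to differ, and values close to the null expectation of −1/(n − 1) indicate no spatial pattern.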
Four global spatial models (GSMs) [2] were considered: OLS regression (OLS), the spatial lag model (SLM), the spatial error model (SEM), and the conditional autoregressive model (CAR) [18]. OLS was included because it is one of the most widely used regression techniques, although its restrictive assumptions limit its applicability to data with special features such as spatial data; it served as a reference for comparison. The SLM, SEM, and CAR were considered because (i) we were primarily interested in global (not local) spatial modelling and (ii) they are the most widely used global spatial regression techniques in epidemiological modelling [2,6,19].
The SLM and SEM are also known as simultaneous autoregressive (SAR) models. OLS regression assumes spatial stationarity and therefore hypothesises that a model conceptualised for a particular area can be applied effectively to other areas of interest [20]. According to Anselin and Arribas-Bel [21], the global OLS rests on fundamental assumptions: the observations do not vary with location and should therefore be independent, and the model residuals should not be correlated [22]. The OLS model is formulated as follows:

$$Y = X\beta + \varepsilon,$$

where Y is the vector of the response variable, β is the vector of slopes associated with the predictor matrix X, and ε is the error term.
The spatial lag model (SLM) assumes that the response in one location depends on the responses in neighbouring locations and incorporates this spatial dependence into the global regression. It includes a spatially lagged dependent variable, whose relevance can be checked with spatial dependence tests on the OLS residuals [2]. This spatially lagged variable, generated from a spatial weights matrix, quantifies the level of interaction of an observation with its neighbours' values. If the Moran's I (error), Lagrange Multiplier (lag), and Robust LM (lag) tests are statistically significant at a defined probability level, the model selection should be reconsidered and the SLM (i.e., the unrestricted model) preferred over OLS (the restricted model without a spatial term). The SLM is formulated as follows:

$$Y = \rho W Y + X\beta + \varepsilon,$$

where ρ is the autoregression parameter, W is a matrix of weights, and the remaining parameters are defined as above.
The spatial error model (SEM) extends the global model by assuming spatial dependence in the OLS residual errors [20], i.e., spatial autocorrelation among the regression residuals. Two standard spatial dependence tests, the Lagrange Multiplier (error) and Robust LM (error), were performed to assess the statistical significance of spatial dependence in the error terms. The SEM can be written as follows:

$$Y = X\beta + u, \qquad u = \lambda W u + \varepsilon,$$

where λ is the autoregression parameter, u is the spatial error term, and the rest is as above.
The conditional autoregressive (CAR) model specifies the response at each location conditionally on the values at neighbouring locations, through a symmetric weights matrix. The model can be written as follows [18]:

$$E(Y_i \mid Y_{j \neq i}) = x_i^{\top}\beta + \rho \sum_{j \neq i} w_{ij}\,(Y_j - x_j^{\top}\beta),$$

with $\operatorname{Var}(Y_i \mid Y_{j \neq i}) = \sigma_i^2$. If the error variance $\sigma_i^2$ is constant for all locations $i$ (i.e., $\sigma_i^2 = \sigma^2$), the covariance matrix is $\Sigma = \sigma^2 (I - \rho W)^{-1}$, where W is a matrix of weights that must be symmetric. Though the CAR and SAR models are related, the terms ρ and W used in the CAR and SAR models are not identical, because the matrix W does not need to be symmetric in the SAR models.
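To make the difference between the two SAR specifications concrete, the sketch below generates responses from the SLM and SEM equations above by solving their reduced forms, (I − ρW)Y = Xβ + ε for the SLM and u = (I − λW)⁻¹ε for the SEM. It only simulates the data-generating processes; it does not estimate ρ or λ, which spatialreg does by maximum likelihood (e.g., `lagsarlm()`, `errorsarlm()`, and `spautolm()` with `family = "CAR"`). The weights matrix `W`, design matrix `X`, coefficients, and errors are made-up values.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("I - scale * W is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def i_minus(scale: float, w: Matrix) -> Matrix:
    """Return I - scale * W."""
    n = len(w)
    return [[(1.0 if i == j else 0.0) - scale * w[i][j] for j in range(n)]
            for i in range(n)]


def simulate_slm(x: Matrix, beta: Vector, w: Matrix, rho: float,
                 eps: Vector) -> Vector:
    """SLM: Y = rho W Y + X beta + eps, i.e. (I - rho W) Y = X beta + eps."""
    rhs = [m + e for m, e in zip(matvec(x, beta), eps)]
    return solve(i_minus(rho, w), rhs)


def simulate_sem(x: Matrix, beta: Vector, w: Matrix, lam: float,
                 eps: Vector) -> Vector:
    """SEM: Y = X beta + u with u = lam W u + eps, i.e. u = (I - lam W)^-1 eps."""
    u = solve(i_minus(lam, w), eps)
    return [m + ui for m, ui in zip(matvec(x, beta), u)]


# Three hypothetical units, each neighbouring the other two (row-standardised).
W = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
X = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9]]  # intercept + one scaled predictor
beta = [1.0, 2.0]
eps = [0.1, -0.2, 0.05]
y_slm = simulate_slm(X, beta, W, rho=0.4, eps=eps)
y_sem = simulate_sem(X, beta, W, lam=0.4, eps=eps)
print(y_slm, y_sem)
```

The sketch shows why the two models answer different questions: in the SLM, the expected response is (I − ρW)⁻¹Xβ, so a change in one country's predictors spills over to its neighbours' responses, whereas in the SEM the expected response stays Xβ and only the unexplained part is spatially correlated.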
All statistical analyses were implemented in R software version 4.1.0 [23]. The spatial regression models were implemented in the package “spatialreg” [24] and the LM tests in the package “spdep” [25]. The best model was selected based on the AIC, BIC, R², root mean squared error (RMSE), and the significance of the spatial autocorrelation tests. The coefficient of determination (R²) indicates the overall strength of the model. The AIC and BIC measure model fit while penalising model complexity and hence reflect parsimony. The RMSE measures the precision of the model fit to the observed data. The residuals of the models were plotted for further diagnostic purposes (see Figure S1 in the Supplementary File).
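The selection criteria listed above can be illustrated with a short Python sketch for a Gaussian model with independent errors. It is an assumption-laden illustration: the log-likelihood below is the OLS one (σ² = RSS/n), whereas the likelihoods of the SLM, SEM, and CAR also contain a log-determinant term, so their AIC and BIC should be taken from the fitting software; `k` counts all estimated parameters (coefficients plus the error variance, plus ρ or λ for spatial models); and the numbers in the example are made up.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Root mean squared error of the fitted values."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / n)


def r_squared(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """1 - RSS/TSS (only a pseudo-R^2 when applied to spatial models)."""
    mean = sum(observed) / len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    tss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - rss / tss


def gaussian_loglik(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Maximised Gaussian log-likelihood with independent errors, sigma^2 = RSS/n."""
    n = len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1.0)


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return k * math.log(n) - 2 * loglik


# Hypothetical observed values and fitted values from one candidate model.
obs = [1.0, 2.0, 3.0, 5.0]
fit = [1.1, 1.9, 3.2, 4.8]
ll = gaussian_loglik(obs, fit)
print(rmse(obs, fit), r_squared(obs, fit), aic(ll, 3), bic(ll, 3, len(obs)))
```

Lower AIC, BIC, and RMSE and higher R² favour a model; because BIC's penalty grows with ln(n), it penalises extra parameters more heavily than AIC once n exceeds about 7.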
The relative importance of the selected explanatory variables for both response variables was assessed using the Random Forest model [26], which identifies the key explanatory factors in the models [2]. This was implemented in the “randomForest” package in the R software [27]. Because the importance ranking can vary between runs due to the random selection of training data and of the variables considered at each split [28], the model was run with 1000 trees [29], and the mean decrease in accuracy (%IncMSE) was used to measure predictor influence. %IncMSE is the average increase in the mean squared error of the out-of-bag predictions when the values of a variable are randomly permuted; it indicates how much the variable contributes to predicting the response. This measure was calculated for each tree in the forest and then averaged over all trees.
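%IncMSE rests on permutation: shuffle one predictor, predict again, and measure how much the MSE grows. The sketch below applies that idea to any fitted prediction function on a single dataset. It is a simplification of what randomForest does, which computes the increase per tree on that tree's out-of-bag observations, averages it over trees, and, with `importance(..., scale = TRUE)` (the default), divides it by its standard deviation. The `toy_model` and the data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(model: Callable[[Row], float], rows: Sequence[Row],
        y: Sequence[float]) -> float:
    """Mean squared error of the model's predictions."""
    return sum((model(r) - yi) ** 2 for r, yi in zip(rows, y)) / len(y)


def permutation_importance(model: Callable[[Row], float], rows: Sequence[Row],
                           y: Sequence[float], n_repeats: int = 50,
                           seed: int = 0) -> List[float]:
    """Mean increase in MSE when each predictor column is randomly permuted."""
    rng = random.Random(seed)
    base = mse(model, rows, y)
    importance = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between predictor j and y
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, permuted, y) - base)
        importance.append(sum(increases) / n_repeats)
    return importance


def toy_model(row: Row) -> float:
    """Hypothetical fitted model that uses only the first predictor."""
    return 3.0 * row[0]


rows = [[float(i), float(10 - i)] for i in range(6)]
y = [toy_model(r) for r in rows]
print(permutation_importance(toy_model, rows, y))  # second entry is 0.0
```

A predictor the model ignores gets an importance of zero, while shuffling an influential predictor inflates the error; averaging over many permutations (and, in a forest, over many trees) stabilises the ranking.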
4. Discussion
Although the African continent has not been severely affected by the COVID-19 pandemic [1], a better understanding of how socio-economic and climate factors have shaped the pandemic dynamics is crucial for informing policymakers at both the country and continental levels. Among the many existing models, choosing the most appropriate ones is important to avoid misleading conclusions, especially when dealing with multi-location data where spatial autocorrelation matters. In this study, we showed that the distribution of COVID-19 cases and deaths was heterogeneous across the 54 African countries and sought to understand the underlying socio-economic and climate factors. To do so, we compared the performance of OLS, SLM, SEM, and CAR models on the number of COVID-19 cases and deaths per million population.
Consistent with previous findings (e.g., [2,6]), we found that models incorporating spatial autocorrelation (SLM and SEM) outperformed OLS for both the number of COVID-19 cases and deaths. This finding highlights the importance of exploring the potential effect of spatial autocorrelation when fitting models to multi-location data. We found an increase of 6% in R² for the number of cases and 4.5% for the number of deaths. These are, respectively, larger than the 4% increase for the number of cases and lower than the 33% increase for the number of deaths reported by Maiti et al. [2] in a study of socio-economic and ecological drivers of COVID-19 dynamics at the county level in the United States of America. This suggests that the magnitude of model improvement when accounting for spatial autocorrelation likely depends on the context and the variables studied. Our models also have better explanatory power than previously established models for Africa (e.g., adjusted R² = 70% for the number of cases and adjusted R² = 50% for the number of deaths in Bouba et al. [7]).
Population demographic patterns and structure, migration, socio-economic conditions, health care systems, pollution, and climate have been shown to modulate the dynamics of COVID-19 and hence may be epidemiologically informative in several settings [2,6,30]. For example, Su et al. [14] found that controlling national and international migration, restricting population flows, modernising the health care system by improving diagnosis and treatment capacity, and upgrading the public welfare system to make it fully functional in crisis situations could be key to fighting COVID-19 effectively. Our results showed that variables such as the number of tests per million population, age dependency ratio, old dependency ratio, urbanisation rate, bioclimatic variables, and pollution metrics are important drivers of COVID-19 incidence in Africa.
The low burden of COVID-19 in most African countries has been suggested to be partly explained by flaws in the detection and reporting systems [31], which appears to be supported by the positive and significant association we found between the number of tests and both the number of cases and deaths per million population. Bouba et al. [7] found similar results, suggesting that the statistics reported in African countries might be substantially underestimated, at least for the number of cases. This also indicates the relevance of including the number of tests as a covariate to properly estimate the effects of other variables. Indeed, bias, especially underreporting and reporting delays, is a major issue in African COVID-19 case and death data, which some studies have shown to be underestimated by a factor of 8.5 on average owing to weak health systems at the country level [32,33,34]. To account for this potential bias, we included the number of tests [33] as an explanatory variable, which turned out to be significant for both the number of cases and the number of deaths. Nevertheless, this might not have entirely addressed the issue of underreporting, as it is heterogeneous across countries [33], which introduces uncertainty into our modelling.
Children and older people are often more vulnerable to respiratory diseases [35,36], which makes it relevant to examine the potential role of the age dependency ratio (DepR) and the old dependency ratio (DepR_old) in COVID-19 morbidity and mortality. DepR is the sum of the young population (under the age of 15) and the elderly population (aged 65 and over) relative to the working-age population (aged 15 to 64). DepR_old is the number of people aged 65 and older per 100 people aged 15 to 64. Our data and models indicate significant negative relationships between DepR and both the number of cases and the number of deaths and a marginally significant negative relationship between DepR_old and the number of deaths, consistent with the findings of Varkey et al. [35] for Asian countries. This result is consistent with the observation that, although statistics from earlier waves indicated that older adults are more prone to COVID-19, subsequent waves provided evidence that young adults are also affected by the disease.
Urbanisation level was one of the first confirmed positive drivers of COVID-19 transmission and subsequent deaths. This led to the first non-pharmaceutical interventions to curb the disease dynamics, such as bans on gatherings, social distancing, airport closures, travel restrictions, and sanitary cordons [37], all of which aimed to limit crowding and hence the propagation of the virus through, e.g., aerosols, droplets, and bioaerosols. Using data from 184 countries, Upadhyaya et al. [38] found a positive and statistically significant association between urbanisation level and COVID-19 mortality. Similarly, Fan et al. [39] reported positive associations of urbanisation with regional health vulnerability and with the severity of COVID-19 case and death rates.
In addition to the social, viral, and human dimensions regulating COVID-19 case and death patterns, climate may also play a pivotal role as a co-factor in the disease dynamics [40]. For example, the survival and transmission of SARS-CoV-2 through aerosols, droplets, and bioaerosols decrease with increasing temperature [41]. The negative relationship we found between annual mean temperature (bio1) and COVID-19 cases agrees with several previous findings, supporting the conclusion that temperature is negatively related to the incidence of COVID-19 [42,43]. In particular, a 1 °C rise in temperature was associated with a decrease of 1.92 cases per million. Our findings also indicated that the precipitation of the driest quarter (bio17) was negatively associated with the number of cases, corroborating previous evidence that bioclimatic variables are important factors shaping the incidence distribution of COVID-19 [40]. In addition to temperature- and precipitation-related variables, moisture has also been suggested as a significant correlate of the number of COVID-19 cases and deaths [44]. There is evidence that humidity is an important risk factor for respiratory diseases, as infection is enhanced in low-humidity conditions [45], resulting in a negative relationship between humidity and the incidence of respiratory disease. In this regard, Ma et al. [44] reported a negative association between humidity and the daily death counts of COVID-19 in Wuhan. Consistent with these findings, our results also indicate that the number of COVID-19 deaths decreases with the moisture index of the most moist quarter (mimq), highlighting the importance of this factor, particularly for the number of deaths. This negative relationship might partly explain the low number of deaths in the arid countries of Africa (e.g., Niger, Mali, and Burkina Faso), where the mimq is often high. Contrary to Bouba et al. [7], who did not find any association between climate variables and COVID-19 cases or deaths, our findings provide evidence of a significant role of climate variables in the patterns of the disease in Africa. These differences could be linked either to collinearity among predictors, diluting the effect of some variables, or to the fact that we explicitly considered spatial autocorrelation in our models, which was not the case in [7].
Changes in air pollution levels affect urban environmental health and are often associated with an increased likelihood of viral infection [46], including COVID-19. Our findings suggest that high methane emissions and a high proportion of people practising open defecation are negatively associated with the number of cases and deaths, respectively. These findings are counter-intuitive, as increased pollution is expected to increase the likelihood of infection and mortality. They could reflect indirect effects of confounding factors that our models might not capture. They may also be linked to the fact that the data used for these two variables are too old (1 to 2 years before the start of the pandemic) to reflect the current patterns of the disease. Unfortunately, these are the most recent data we found, which reveals a critical issue with public data in African countries.
Among the significant variables, the number of tests per 1 million people (Tests_1Mpop), adjusted savings (AdjSav), and GDP per capita were identified as the most important for both the number of COVID-19 cases and the number of deaths, illustrating their importance in driving the overall pattern of COVID-19 on the continent. The number of tests per 1 million people varied from 5073 (Algeria) to 878,731 (Eswatini), with a coefficient of variation of 129.9%. The adjusted savings varied from 0.14 (Mauritius) to 3.64 (Chad), with a coefficient of variation of 59.3%, and the GDP per capita varied from 217 (Burundi) to 16,850 (Seychelles), with a coefficient of variation of 121.3%. These figures show that the three variables varied greatly across countries. In addition to the variables common to both the number of cases and deaths, the dependence ratio (DepR) and annual mean temperature (bio1) were identified as important for the number of cases, while the prevalence of malaria (Malaria.19) and the prevalence of communicable diseases (Commun_DiseasePrevalence2019) were identified as important for the number of deaths. These variables have coefficients of variation of 39.96%, 14.2%, 130%, and 20.62%, respectively. For the prevalence of malaria, which showed the greatest variation across countries, the negative effect we found is supported by previous findings. For example, Anyanwu [47] reported a reduced number of COVID-19 deaths in malaria-endemic countries, although they called for further clinical trials. The prevalence of malaria in our dataset varied from 0 in countries such as Algeria, Cabo Verde, Egypt, Lesotho, Libya, Mauritius, Morocco, Seychelles, and Tunisia, where the numbers of cases and deaths were high, to more than 300 in countries such as Sierra Leone, Mozambique, the Central African Republic, and Burundi, where the reported numbers of cases and deaths were relatively low. Concerning the prevalence of communicable diseases, previous evidence also showed a strong association between the COVID-19 pandemic and the control and prevention programmes, diagnostic capacity, and treatment adherence for major infectious diseases (e.g., HIV, TB, and malaria), including neglected diseases and non-communicable diseases [48].
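The coefficients of variation quoted above compare dispersion across variables measured in very different units. A minimal sketch of the computation follows; it assumes the sample standard deviation (n − 1 denominator) by default, since the text does not state which variant was used, and the example values are made up.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float], sample: bool = True) -> float:
    """CV in percent: standard deviation divided by the mean, times 100.

    Uses the sample SD (n - 1 denominator) by default; pass sample=False
    for the population SD (n denominator).
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return 100.0 * sd / mean


# Hypothetical values for one variable across eight countries.
vals = [2, 4, 4, 4, 5, 5, 7, 9]
print(coefficient_of_variation(vals, sample=False))  # 40.0
print(coefficient_of_variation(vals))                # slightly larger (n - 1)
```

Because the CV is unit-free, it allows tests per million, GDP per capita, and malaria prevalence to be compared on one scale; it is only meaningful for variables with a positive mean, such as the counts and rates discussed here.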
Other variables, some of which were not considered in this study, have been suggested to drive the patterns of COVID-19. For example, the Gini inequality index, the Global Health Security (GHS) index, and the mean body mass index (BMI) have been identified as significant correlates of the number of COVID-19 cases in Africa [7]. Similarly, the prevalence of diabetes, the number of nurses per 1000 population, and the GHS index were identified as determinants of COVID-19 mortality in Africa [7]. These variables might be correlated with some of the variables included in our models. Nevertheless, this indicates that multi-dimensional perspectives should be considered to better understand the drivers of COVID-19 and, consequently, to design appropriate actions and public health policies.