Early Spread of COVID-19 in the Air-Polluted Regions of Eight Severely Affected Countries

Abstract: COVID-19 escalated into a pandemic posing several humanitarian as well as scientific challenges. We here investigated the geographical character of the early spread of the infection and correlated it with several annual satellite and ground indexes of air quality in China, the United States, Italy, Iran, France, Spain, Germany, and the United Kingdom. The time of the analysis corresponded with the end of the first wave of infection in China, namely June 2020. We found more viral infections in those areas afflicted by high PM 2.5 and nitrogen dioxide values. Higher mortality was also correlated with relatively poor air quality. In Italy, the correspondence between the Po Valley pollution and SARS-CoV-2 infections and induced mortality was the starkest, originating right in the most polluted European area. Spain and Germany did not present a noticeable gradient of pollution levels, resulting in non-significant correlations. Densely populated areas were often hotspots of lower air quality levels but were not always correlated with a higher viral incidence. Air pollution has long been recognised as a high risk factor for several respiratory-related diseases and conditions, and it now appears to be a risk factor for COVID-19 as well. As such, air pollution should always be included as a factor for the study of airborne epidemics and further included in public health policies.


Introduction
From the first detected outbreak of a new member of the coronavirus (CoV) family [1] in Wuhan, Hubei Province, China [2][3][4], SARS-CoV-2 [5] has rapidly spread around the world [6], with governments and institutions showing mixed results in its effective containment [7,8]. Certain regions have been much more adversely impacted in terms of infections and mortality rates than others, and the full reasons for this are not yet clear. This paper shows compelling evidence of a correlation between air pollution and incidence of COVID-19 in eight of the first countries known to have experienced an initial fast spread of the virus.
It must be noted that from the time that these hypotheses and related results were presented to the public with a preprint dated early June 2020 [83,84], several studies with similar hypotheses have been published. These studies investigated different regions and used a variety of approaches. They are cited in the Introduction and Discussion to complement and support our hypotheses.

Materials and Methods
As briefly introduced above, the present work was performed from March until June 2020, when the first wave of the global pandemic was considered under control in China. We added analyses for seven other countries particularly affected by the virus at that time. Italy was the second country to experience a rapid spread of contagion, especially in its highly industrialised northern region. The third country investigated was the conterminous US, which had the highest number of infections worldwide, yet was still behind in the pandemic curve because the virus arrived there later than in Asia and Europe. Among the countries where the virus spread earlier, we included Iran, which suffers from severe air pollution due to the ubiquitous use of methane gas, refineries, and heavy traffic. France and Spain were selected because of their high COVID-19 figures but less severe air pollution issues than Italy. Lastly, Germany and the UK were suitable candidates for the analysis because of the relatively reduced lockdown measures adopted [85].
Owing to the differences in the stage of virus advancement in each country and the different methodologies employed to record COVID-19 infections and deaths as well as testing policies, the data for each country were analysed separately. We evaluated the potential correlation between air quality metrics and infections at the finest granularity available, controlling both COVID-19 and air pollution variables for potential relationships with population densities as well as the presence of bivariate virus/pollution spatial clusters. Then, the results and differences in the pattern between countries are discussed.

Data Collection and Processing
The COVID-19 datasets were compiled at the second-order administrative subdivision level (US counties equivalent), using the last available information at the time of the analysis (beginning of June); however, a few geographical and time adaptations were required for some contentious administrations that do not make public all the data. In particular, for Iran, we were able to find data of infections only, and at the first-order administrative level only. The Chinese dataset includes the 17 April update with a 50% increase in deaths in Wuhan city [86]. Deaths in Italy were available only at the regional level; therefore, two different datasets were compiled. The autonomous communities of Catalonia, Galicia, and Pais Vasco in Spain provided figures at the first administration level only [87], so we considered them at the same level as provinces. For France, COVID-19 deaths were available only at the department level. Finally, in the UK, the data for Scotland were organised following the National Health Service (NHS) subdivisions rather than the second-order administration scale.
Both infections and deaths due to COVID-19 were collected and normalised by population size per administrative unit (per 100,000 residents), and mortality rates (number of deaths/number of infections × 100) were calculated. Population densities for each unit's area were extracted at 1 square km resolution.
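The normalisation described above can be illustrated with a minimal Python sketch. The class name, field names, and numbers are hypothetical and chosen only for illustration; the paper does not specify how units without recorded infections were handled, so returning NaN in that case is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Raw counts for one administrative unit (field names are illustrative)."""
    name: str
    population: int
    infections: int
    deaths: int


def per_100k(count: int, population: int) -> float:
    """Infections (or deaths) per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def mortality_rate(deaths: int, infections: int) -> float:
    """Deaths as a percentage of recorded infections.

    Assumption: a unit with no recorded infections yields NaN.
    """
    if infections == 0:
        return float("nan")
    return deaths * 100 / infections


# Invented numbers, for illustration only.
unit = UnitCounts(name="Example unit", population=250_000, infections=500, deaths=25)
infections_100k = per_100k(unit.infections, unit.population)
deaths_100k = per_100k(unit.deaths, unit.population)
rate = mortality_rate(unit.deaths, unit.infections)
print(unit.name, infections_100k, deaths_100k, rate)
```

Keeping the two normalisations separate matters for interpretation: the per-100,000 figures depend on population size, whereas the mortality rate depends only on the ratio of deaths to detected infections and is therefore sensitive to testing policy.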
Air quality information was retrieved from long-term satellite observations and averaged at the administrative unit level for each country. The first observations were the Global Annual PM 2.5 Grids from MODIS, MISR, and SeaWiFS Aerosol Optical Depth (AOD) with GWR, v1 (1998-2016), obtained from NASA's Socioeconomic Data and Applications Center [88,89]. From the same repository, we retrieved a second dataset consisting of the Global 3-Year Running Mean Ground-Level NO2 Grids from GOME, SCIAMACHY, and GOME-2, v1 (1996-2012) [90,91]. For both products, the annual grids were first reduced to an average multi-year image and, afterwards, the mean of all grid cells covering every administrative unit was calculated.
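The two-step reduction (multi-year average, then mean per administrative unit) can be sketched as follows. This is a simplified illustration, not the processing chain actually used: it assumes the grids are small nested lists, that missing cells are marked with None, and that each cell has already been assigned to exactly one unit through a hypothetical `zones` grid. A real workflow would use a GIS or raster library and might weight cells that only partly overlap a unit.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Grid = List[List[Optional[float]]]


def multi_year_mean(annual_grids: List[Grid]) -> Grid:
    """Cell-wise mean over all years, skipping missing (None) values."""
    rows, cols = len(annual_grids[0]), len(annual_grids[0][0])
    out: Grid = []
    for r in range(rows):
        row: List[Optional[float]] = []
        for c in range(cols):
            vals = [g[r][c] for g in annual_grids if g[r][c] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        out.append(row)
    return out


def zonal_mean(grid: Grid, zones: List[List[str]]) -> Dict[str, float]:
    """Mean of all valid cells assigned to each administrative unit."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for grid_row, zone_row in zip(grid, zones):
        for value, zone in zip(grid_row, zone_row):
            if value is not None:
                totals[zone] += value
                counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in counts}


# Toy 2x2 grids with invented PM 2.5 values; "A" and "B" are made-up unit IDs.
pm25_year1 = [[10.0, 20.0], [30.0, None]]
pm25_year2 = [[14.0, 24.0], [34.0, 40.0]]
zones = [["A", "A"], ["B", "B"]]

mean_grid = multi_year_mean([pm25_year1, pm25_year2])
unit_means = zonal_mean(mean_grid, zones)
print(unit_means)
```

Averaging over years before averaging over space means that a cell missing in some years still contributes its mean over the years in which it was observed.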
Additionally, ground measures for the US, China, and Italy were collected from various sources (Table 1). To every administrative unit, we assigned the air quality value from its related station. If more than one point fell within a given unit, the mean was calculated. No ground measures for the other countries were included in our study.
Combining all these measures poses compilation challenges [92]. Satellite data hold several advantages over ground station data, such as regular and continuous data acquisition, quasi-global coverage, and spatially consistent measurement methodologies [93]. On the other hand, ground stations offer actual measures of single pollutants instead of deriving them from spectral information; however, they require more or less arbitrary estimations (such as interpolation) to fill spatial gaps.

Statistical Analysis
Exploratory analysis of the variables was conducted with a focus on evaluating the air pollution distributions within each country. Due to the highly skewed distributions of the population-adjusted dependent variables, namely COVID-19 infections/100,000 inhabitants, COVID-19 deaths/100,000 inhabitants, and mortality rates (deaths/infections × 100), we opted for a non-parametric correlation metric. Kendall tau correlation coefficients were employed for all statistical tests.
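To make the choice of a rank-based metric concrete, the following minimal sketch computes Kendall's tau on invented data. It assumes the tau-b variant (which corrects for ties); the text does not state which variant was used, and the significance test is omitted here. The variable names and values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b from pairwise comparisons (O(n^2), fine for small n)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_in_x = concordant + discordant + only_y_tied
    untied_in_y = concordant + discordant + only_x_tied
    denom = sqrt(untied_in_x * untied_in_y)
    if denom == 0:
        raise ValueError("tau is undefined when a variable is constant")
    return (concordant - discordant) / denom


# Invented values: PM 2.5 per unit and infections per 100,000 residents.
pm25 = [8.0, 12.0, 15.0, 20.0, 35.0]
cases = [40.0, 35.0, 90.0, 120.0, 300.0]
tau = kendall_tau_b(pm25, cases)

# Inflating the largest value changes no ranks, so tau is unchanged.
cases_with_outlier = [40.0, 35.0, 90.0, 120.0, 30000.0]
tau_outlier = kendall_tau_b(pm25, cases_with_outlier)
print(tau, tau_outlier)
```

Because only the ordering of pairs enters the statistic, a single extreme hotspot cannot dominate the coefficient, which is the property that matters for highly skewed incidence data.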
Since both virus spread and air pollution are visibly spatially dependent, we identified potential clusters of adjacent administrations using the bivariate Local Moran's I statistic [94,95]. This metric also shows which regions mostly explain the resulting correlations by excluding non-significant regions.
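A bivariate local Moran statistic relates the value of one variable in a unit to the average of a second variable in its neighbours. The sketch below is an illustrative reconstruction under stated assumptions, not the software used in the study: it takes pollution as the first variable and COVID-19 incidence as the spatially lagged one, uses row-standardised contiguity weights supplied as a hypothetical neighbour list, and estimates pseudo p-values by conditional permutation. All data are invented.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def zscores(values: Sequence[float]) -> List[float]:
    """Standardise a variable to zero mean and unit (population) variance."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("variable has zero variance")
    return [(v - m) / s for v in values]


def bivariate_local_moran(
    x: Sequence[float],
    y: Sequence[float],
    neighbours: Sequence[Sequence[int]],
    permutations: int = 999,
    seed: int = 0,
) -> List[Tuple[float, str, float]]:
    """Return (statistic, quadrant label, pseudo p-value) for every unit.

    The label combines High/Low for x in the unit with High/Low for the
    neighbourhood average of y, giving HH, LL, HL, or LH.
    """
    zx, zy = zscores(x), zscores(y)
    rng = random.Random(seed)
    results = []
    for i, nbrs in enumerate(neighbours):
        k = len(nbrs)
        if k == 0:
            raise ValueError(f"unit {i} has no neighbours")
        lag = sum(zy[j] for j in nbrs) / k  # row-standardised spatial lag
        stat = zx[i] * lag
        # Conditional permutation: keep unit i fixed, reshuffle the others.
        others = [zy[j] for j in range(len(zy)) if j != i]
        extreme = 0
        for _ in range(permutations):
            fake_lag = sum(rng.sample(others, k)) / k
            if abs(zx[i] * fake_lag) >= abs(stat):
                extreme += 1
        p_value = (extreme + 1) / (permutations + 1)
        label = ("H" if zx[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((stat, label, p_value))
    return results


# Six invented units arranged in a line; each borders the adjacent ones.
pollution = [30.0, 28.0, 25.0, 8.0, 6.0, 5.0]
incidence = [200.0, 180.0, 150.0, 20.0, 15.0, 10.0]
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

clusters = bivariate_local_moran(pollution, incidence, neighbours)
labels = [label for _, label, _ in clusters]
print(labels)
```

In practice, a unit is reported as part of a cluster or as an outlier only when its pseudo p-value falls below a chosen threshold; all other units are mapped as non-significant. With only six units the permutation test has little power, so the toy example is meant to show the quadrant logic rather than significance.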
These results are illustrated with thematic maps that better highlight the overlap between air quality and COVID-19 distributions within the eight assessed countries.

Correlation between Air Pollution Variables and COVID-19 Infections, Deaths, and Mortality Rates
Significant positive correlations between air quality variables and COVID-19 infections, deaths, and mortality rates were found in China, the US, Italy, Iran, France, and the UK, but not entirely in Spain and Germany (Tables 2-4). The strongest correlations were found in Italy, both for infections and deaths, while population size and densities did not explain COVID-19 incidence. In China, population densities showed a positive correlation with virus infections and deaths similar to that of air pollution, while in the US and UK, population density had a stronger correlation than the air pollution variables. In the UK, air pollution showed a fair degree of correlation with deaths and mortality but not with infections. Despite its small sample size (df = 29), Iran showed a significant correlation with NO 2 distribution and no influence from population variables. The results for Spain and Germany showed different patterns. Differences in air pollution could not explain the spread of COVID-19 and its related deaths in Spain; however, the mortality rate varied with NO 2 concentration. Moreover, population size and density were negatively correlated with the virus. By contrast, population density weakly explained COVID-19 infections in Germany, while the distribution of fine particulate matter was in some cases weakly negatively correlated. Among the different pollutants analysed, O 3 and SO 2 measures from ground stations in China and the United States either did not show significant correlations with COVID-19 or were negatively correlated, in contrast with the overall results from the other pollutants.
[...] Table A1). While the PM 2.5 maps are continuous surfaces drawn following the same classification scheme across countries, the COVID-19 infections and deaths maps required ad hoc classification adaptations due to different population profiles and infection dynamics.
In China, due to the vast population and an apparently effective policy for the containment of the virus, the number of infections per 100,000 residents was relatively low and highly concentrated in the epicentre of the outbreak (Wuhan and the Hubei province). A visual correlation between the two maps can be perceived, especially between the eastern and western parts of the country, which are also highlighted in the cluster map (Figure 2).
Figure 1. [...] (see Table 1 for information about dates and data sources). Maps of different countries may not be compared directly due to different classification schemes and spatial scales. Some administrative units' boundaries were adapted according to the COVID-19 data available (e.g., merged districts of Galicia, Catalonia, and Pais Vasco in Spain).
Figure 2. [...] Note that the COVID-19 variables used in each country are those of Figure 1. Data sources can be found in Table 1.

COVID-19 Distribution, Clusters, and Air Quality Maps
The presence of outlier clusters (HL and LH) and large non-significant areas in most of the countries partly explains the limited significance and strength of the country-level correlations shown in the tables. The highly developed and polluted areas in the east of China represent outlier clusters due to the low COVID-19 infections compared to the Hubei province. In the US, the virus appears to have spread noticeably over several areas. PM 2.5 differences are not large, but their distribution coincides reasonably well with that of the deaths. Similar to China, a longitudinal pattern is visible, with low-death/low-pollution clusters (LL) concentrated in the mid-western part of the country, while high-death/high-pollution clusters (HH) are found in the east, along the Mississippi River and in the states surrounding New York. However, a high number of outliers of both types (HL and LH) exist. The high correlations found for Italy are clearly visible. The polluted areas of the Po Valley are those heavily affected by COVID-19 infections. The clusters are clear, and the number of outliers is minimal. In Iran and France, the correlations are only faintly perceptible, while the cluster maps show a north-south regionalisation pattern similar to Italy. The maps of Spain confirm the absent or weak correlations shown in Tables 2-4, apparently going against our general hypotheses. Nevertheless, PM 2.5 levels in Spain are minimal, as is their variation, as indicated by the low range and interquartile range (Table A1, Appendix A). The UK map of PM 2.5 clearly shows the higher concentrations around urban areas and across the southeastern area, where COVID-19 mortality is higher, too. However, a few counties/NHS subdivisions in the north of Scotland are particularly affected by virus infections, becoming outliers in the cluster map. Finally, COVID-19 mortality in Germany is low and quite evenly spread, with no apparent distribution pattern. Similarly, PM 2.5 concentrations are fairly high all over the country, with peaks in the eastern districts, where a few HH clusters and LH outliers are found. The high number of non-significant clusters and of both types of outliers confirms this tendency towards a homogeneous distribution of COVID-19 and air pollution.

Previous Literature Account
Given the delay between the last of our preprints [84] and the present publication, we include a list of 10 recent studies that support our correlational findings (Table 5). These and other research works are discussed in the next section. It is worth noting that a study by Ogen [96] found a positive correlation between NO 2 levels and COVID-19 fatalities in the administrative regions of Spain, Germany, Italy, and France when considered together as a cluster. However, our results showed that within the second-order administrative regions of Germany and Spain, the correlations were not always significant and were sometimes negative. Except for our study and these two countries, we were not able to find other works contradicting our initial hypotheses.
Table 5. A list of correlational studies between long-term exposure to air pollution and incidence of COVID-19 at the level of a country or a cluster of countries. Studies covering smaller geographical areas are not listed, nor are those considering the short-term hypothesis (pollution particles acting as virus carriers). Pollutants are marked as collected from ground stations (G) or satellite sensors (S).

Discussion and Conclusions
As a preprint [84], this study was the first to investigate the correlation between COVID-19 and air pollution during the early stage of the pandemic. Specifically, we assessed long-term air pollution exposure, measured by satellite and ground sensors, in eight countries as a potential and highly likely risk factor for the incidence of and mortality rates due to SARS-CoV-2. The study provides some evidence that the new coronavirus infections are most often found in highly polluted and densely populated areas. In Italy and Iran, air pollution, independently of population density, explained the distribution pattern of the virus. In addition, in these areas affected by a mixture of air pollutants, the virus killed more frequently than elsewhere. In the questionable case that the figures provided by these eight nations concerning the number of infections and deaths are inaccurate [105], our analyses and conclusions would not need to be reframed. If that were the case, this error would most likely be concentrated in just one or very few administrations, or it would be evenly spread across administrations, not affecting the general significance of the correlations.
In Chinese cities [106] and, more in detail, in the Hubei province, time analyses give preliminary evidence of a correlation between high levels of NO 2 and 12-day delayed virus outbreaks [107] and other PM covariates [108,109]. With our paper, we therefore add the long-term exposure effects for China as we did in greater detail before [46]. As shown in the maps, China bears extremely high rates of air pollution, as concentrated in the east. However, COVID-19 infections occurred mainly in the constrained area of Hubei. All evidence suggests that the enforced lockdown was the major factor controlling the virus spread. Nevertheless, it is peculiar that the onset of the pandemic still appeared in one of the most polluted areas of the globe.
In the US, an increase of a mere 1 µg/m 3 in PM 2.5 was recently found to be associated with an 8% higher COVID-19 mortality rate than the baseline from previous years. This effect is relatively larger than that of the other 11 demographic co-variables tested [97,98]. Ozone and diesel particulate matter were recently confirmed to be a source of concern there [110]. With our study, we add PM 10, NO 2, and CO measures from ground stations to those analyses.
We found that in Italy, the correspondence between poor air quality and SARS-CoV-2 appearance as well as its induced mortality was the starkest. The area with the largest number of infections and deaths in Italy is the Po Valley, which is also the most polluted area in Europe [111]. This result was first hypothesised [112] and later confirmed by another study [76] and a remarkable further one [99] that controlled for five demographic co-variables. The fact that population density does not play a role in the incidence of COVID-19 in Italy and Iran is a result of our investigation that strongly supports the common hypotheses of these other studies. It also challenges the widespread sceptical view that air pollution usually overlaps with areas of high population density and that the contribution of each to the virus incidence cannot be discerned (tested in Appendix B, Table A2). Other factors that may contribute to such a severe virus incidence in Italy include its very large ageing population, which may have been exposed to air pollutants for the longest time. In turn, such pollutants cause other comorbidities [77] and COVID-19 vulnerabilities such as cardiocirculatory diseases.
Unfortunately, information on COVID-19 infections was available for Iran only at the first administration level until 22 March 2020. Nonetheless, the results there are very similar to the Italian ones, with the pollution gradient explaining most of the virus incidence. France provides up-to-date information about deaths only. Despite this limitation, France as well shows highly clustered COVID-19/pollution distributions, resulting in significant positive correlations that confirm our hypotheses.
The absence of correlation found in Spain may be attributable to the high levels of air quality throughout its national territory, which are within the green Air Quality Index standard range, resulting in minimal differences among the provinces. Moreover, the regions most affected by the virus seem to be those less densely populated, a peculiarity not yet thoroughly explained in the literature that requires future investigation. In Germany, too, a clear correlation could not be detected because, conversely, pollution is widely spread across its districts. For these two countries, we therefore back up another report [96] that analysed them together with Italy and France at the first-order administrative level. Higher levels of NO 2 associated with COVID-19 mortality were found in this super-region.
Finally, in the UK, where containment measures were implemented late compared to other countries, deaths and mortality rates, but not infections alone, are correlated with air pollution, suggesting that when affected by the disease, a weakened respiratory system due to prolonged stress by air pollution increases the risk of mortality in those polluted areas of the southeast.
Despite the significant and consistent correlations that we collected over three time periods, in March [83], April [84] and, as reported here, at the end of May, their interpretation needs to be cautious. The virus spread in most countries is still ongoing [113] and is being contained [47]. Causation should not be inferred from correlational data alone. Air pollution is just one of the risk factors for increased COVID-19 incidence. It is partly comforting that we find outliers or non-significances via clustering analysis. In fact, the regions flagged as such and prevalent in most countries may become sites for the virus due to factors other than air pollution. An area will not necessarily have a higher frequency of COVID-19 just because it is polluted. The other external factors involved in SARS-CoV-2 infection include age, pathological comorbidities, access to health care, socioeconomic status, multigenerational housing, travel in crowded transportation hubs, attendance at super-spreading events, etc. In addition, other factors include policies for prevention and containment as well as compliance with measures such as wearing face masks, social distancing, contact tracing, lockdowns, etc.
There are confounding factors, such as how the virus infection was determined in patients by different countries. However, the larger the geographical areas affected by the pandemic, the smaller the role these elements play. Finally, it should be noted that by using yearly averaged air quality indexes, we accounted for long-term exposure to these pollutants, therefore staying on the conservative side. In fact, these correlations would be expected to become even more robust when limiting the analysis to the more polluted winter months, which invariably have lower air quality.
We ran these analyses for eight countries at the second-order administrative level. While controlling for several other predictors, such as demographic variables, is advisable at a single-country level to cross-check for interdependence, including them at an international scale poses an apparent technical limitation [114]. National or federal health systems have different capacities and provide care in distinct ways. In turn, this influences case detections, intensive care capacity, and mortality rates. Cofactors such as the earliest location of the pathogen, population mobility, and patient socioeconomic status or ethnicity may not be reliably accounted for across such diverse countries, spanning from Asia to the western world, because they are interdependent and only in part nested within countries or administrations, even when included as random factors in a comprehensive generalised mixed model. Yet, the epidemic, which has turned into a pandemic, might have compensated for this limitation; the wider its extent, the more prominent a common factor such as air pollution has become, while other secondary predictors will level out across places.
On its own, ambient outdoor air pollution, which causes an estimated 4.2 million deaths yearly worldwide [81], is a risk cofactor to be hypothesised in connection with a new respiratory disease, without necessarily having to analyse some of the other cofactors. This holds especially in the temperate climate zones, where new countries keep reporting similar correlations between PM 2.5 and the virus: the Netherlands [100,101], controlling for some other medical risk factors; Japan [102], finding a positive correlation in the elderly; and also India, which shows a similar trend in relation to the long-term hypothesis [103] and in relation to the short-term one, too [115,116]. Lastly, Canada [104], Peru [117], other Latin American countries plus the Caribbean [118], and Malaysia [119] were reported to show positive associations. Notably, in the most comprehensive analysis performed during the first infection wave in 126 countries, CO 2 and SO emissions correlated with COVID-19 when analysed using the "Our World in Data" database [120].
Since there is now some first evidence that the crossover of the virus from animals to humans may have happened earlier than the end of 2019 [79] and further south than the Chinese city of Wuhan [121], we can speculate that air pollution could have allowed the new epidemic to become recognised due to an influx of patients with weak respiratory systems showing higher morbidity and mortality than with influenza. The same seems to have happened in Europe, as the virus, in February 2020, quickly moved from central Europe to the most polluted region of the continent, in northern Italy [47,122].
Further research in the field of genetics will ascertain whether virulence has evolved in the areas of those countries where a gradient of air pollution is present. The initial location of the pathogen, long-distance travelling, and super-spreader events are deemed to be the foremost factors governing the epidemics. Later on, other factors such as hospital capacities, population confinement, and possibly also indoor air pollution may become major predictors for severe infections to keep on manifesting. In relation to short-term exposure to peaks of low air quality levels, the capacity of pollutants to act as viral vectors should be investigated further [123,124]. In fact, particulate matter does act as a medium for the aerial transport of SARS-CoV-2 [40,125,126]. Aggregates of particulate matter with this virus have been collected in the worst affected northern Italian city of Bergamo [127].
If the viral load carried by the aggregates is enough to cause morbidity, pollution would directly act as a vector, broadening the harm done by the human-to-human contagions.
To conclude, these findings are sufficiently significant to prompt researchers studying the public health of industrialised countries to always consider air pollution as a contributing risk factor for COVID-19 and for any other airborne viral epidemics [128]. To overcome the limitations of our study, longitudinal screenings performed on patients from retrospective cohorts may further point to air pollution as a cofactor [129]. These results inform epidemiologists and policymakers on how to prevent future, more frequent and lethal viral outbreaks by curbing air pollution and, ultimately, meeting climate goals [130]. Can the fossil fuel economy carry on unabated once we resume the lockdowns? Institutions need to endorse these interventions and speed up reforms more seriously [131,132], together with endorsing collateral and more comprehensive measures [133] that play a role in epidemics and zoonoses [134,135], such as impeding biodiversity loss and land use change [136][137][138][139][140], decreasing intensive livestock farming, and alleviating poverty [141,142]. This new coronavirus should be an opportunity for governments to forcefully revive the sustainable development goals.