Geographical Pattern of COVID-19-Related Outcomes over the Pandemic Period in France: A Nationwide Socio-Environmental Study

Background: Several studies have investigated the implication of air pollution and some social determinants on COVID-19-related outcomes, but none of them assessed the implication of spatial repartition of the socio-environmental determinants on geographic variations of COVID-19 related outcomes. Understanding spatial heterogeneity in relation to the socio-environmental determinant and COVID-19-related outcomes is central to target interventions toward a vulnerable population. Objectives: To determine the spatial variability of COVID-19 related outcomes among the elderly in France at the department level. We also aimed to assess whether a geographic pattern of Covid-19 may be partially explained by spatial distribution of both long-term exposure to air pollution and deprived living conditions. Methods: This study considered four health events related to COVID-19 infection over the period of 18 March and 02 December 2020: (i) hospitalization, (ii) cases in intensive health care in the hospital, (iii) death in the hospital, and (iv) hospitalized patients recovered and returned back home. We used the percentage of household living in an overcrowding housing to characterize the living conditions and long-term exposure to NO2 to analyse the implication of air pollution. Using a spatial scan statistic approach, a Poisson cluster analysis method based on a likelihood ratio test and Monte Carlo replications was applied to identify high-risk clusters of a COVID-19-related outcome. Result: our results revealed that all the outcomes related to COVID-19 infection investigated were not randomly distributed in France with a statistically significant cluster of high risk located in Eastern France of the hospitalization, cases in the intensive health care at the hospital, death in the hospital, and recovered and returned back home compared to the rest of France (relative risk, RR = 1.28, p-value = 0.001, RR = 3.05, p = 0.001, RR = 2.94, p = 0.001, RR = 2.51, p = 0.001, respectively). After adjustments for socio-environmental determinants, the crude cluster shifts according to different scenarios suggested that both the overcrowding housing level and long-term exposure to largely NO2 explain the spatial distribution of COVID-19-related outcomes. Conclusions: Our findings suggest that the geographic pattern of COVID-19-related outcomes is largely explained by socio-spatial distribution of long-term exposure to NO2. However, to better understand spatial variations of COVID-19-related outcomes, it would be necessary to investigate and adjust it for other determinants. Thus, the current sanitary crisis reminds us of how unequal we all are in facing this disease.


Introduction
Over the last several months, most countries in the world have been impacted by the severe pandemic. In December 2019, in Wuhan (China), several pneumonia cases were ob-the rate of health events related to COVID-19 higher among peoples living in a more polluted and more deprived living condition? In other words, is there a socio-environmental disproportionate distribution of COVID-19-related outcomes? In this context, the aim of the present study is to investigate whether a geographic pattern of Covid-19 may be explained by spatial distribution of both long-term exposure to air pollution and a deprived living condition by assessing the spatial implication of neighbourhood characteristics on a spatial repartition of a Covid-19-related outcome.

Study Area
The study area is the national territory of the France metropolitan, which host about 64 million inhabitants (INSEE, 2020). France is subdivided into 95 departments with a mean population of about 680,000 inhabitants (varying from a minimum of 76,000 inhabitants in Lozère and more than 2 million inhabitants in Nord, Bouches-du-Rhône, and Nord departments).

Health Data
This study considered four health events related to a COVID-19 infection. All indicators were estimated at the French department scale. We used health data collected in the hospital available from the French government website [18]. The French Public Health Agency (Santé Publique France) collected all health data of confirmed cases of COVID-19 from all health care systems in order to monitor the pandemic. All anonymized databases at the department level were available on open access from the health minister website.
Four health indicators were considered, over the period of 18 March and 02 December 2020.
i. the total number of hospitalized persons due to COVID-19 infection, ii. the total number of severe COVID-19 cases in the intensive health care in the hospital, iii. the total number of deaths at the hospital caused by COVID-19 infection, and iv. the total number of hospitalized patients recovered and returned back home.
All number cases as well as the number of deaths have been divided by the department number of inhabitants to standardize the health indicators by population size. The number of inhabitants per department are available from the National Institute of Statistics and Economic Studies (INSEE website [19]).

Air Pollution Data
Several recent epidemiological studies and scientific evidence suggest the role of air pollution in COVID-19 spread and lethality [15] including NO 2 and PM. The Nitrogen dioxide (NO 2 ) is a good indicator for traffic-related air pollution and correlated with other traffic-related air pollution: particulate matter. In addition, the modelled NO 2 presents more spatial variations than other modelled outdoor pollutants [20]. Today, NO 2 may be considered a good proxy indicator of outdoor air pollution. Therefore, Nitrogen dioxide (NO 2 ) was the pollutant included in this study. This pollutant was known to be a good marker of the pollution due to road traffic. It was chosen because of the largest spatial variabilities observed compared to particle matters. The dataset of annual mean of NO 2 for each monitoring station was taken from 12 Associations for Surveillance of Air Quality (AASQA) through the National Association for the Surveillance of Air Pollution (ATMO France) [21] and from the European Environmental Agency, accessed on 22/05/2020, EEA, 2020).
We estimated the average of the annual mean of NO 2 between 2014 and 2018 for all monitoring stations (Scenario 0) for background and traffic monitoring stations (Scenario 1) and for background monitoring stations only (Scenario 2) located within each department in France. Then, we obtained tree measures of NO 2 exposure per department, which reflect the long-term NO 2 exposure of the population.

Neighbourhood Deprivation Context
To characterize the deprived condition level, in this study, we used indicators to characterize the living conditions during the containment: The percentage of household living in an overcrowding housing. Data are collected by the National Institute of Statistics and Economic Studies (INSEE) and available on their website [22].

Descriptive Analysis
First, the Pearson correlation coefficient "r" was produced in order to quantify the intensity of the relation between each health event and the deprived living condition, as well as the long-term NO 2 exposure separately.
Second, we completed the descriptive analysis by implementing a simple linear regression to give an order of magnitude of the health rate increase according to the increase of the covariates (living condition and long-term exposure to NO 2 ).
Finally, in order to explore the existence of a potential interaction between the socioeconomic variable and NO 2 exposure, we stratified our analysis in two groups based on the median of the distribution of the percentage of household living in an overcrowding housing. Interactions were formally tested by introducing an interaction term in the linear regression to produce the p-value and conclude the statistical significance of the modifier effect of overcrowding in relation to air pollution and health events related to a COVID-19 infection.

Spatial Analysis
To investigate the geographic pattern of the incidence of a Covid-19 related-outcome at the department level in France, we used the most appropriate spatial scan statistic approach [23] implemented in the SaTScan software [24]. This approach is used in an increasing number of applications in the field of spatial epidemiology and public health [25][26][27][28][29]. In our study, its allows the statistical and significant investigation of the presence of a high-risk Covid-19-related outcome among elderly people during pandemic periods and their spatial approximate location [30][31][32].

Methodological Approach
In this approach, the null hypothesis (H 0 ) tested is that the risk of a Covid-19 healthrelated outcome is the same throughout the study area. In other words, the expected incidence of a COVID-19-related outcome would be randomly distributed in space [28,33].
The alternative hypothesis (H 1 ) is that there is an elevated risk of incidence of a COVID-19-related outcome within the cluster in comparison with census blocks outside the cluster. The Poisson probability model was chosen as a cluster analysis method to detect the presence of cluster of high risk of a COVID-19-related outcome. More precisely, in our approach, we used the Poisson-based model where the number of COVID-19 health events in a geographical area is Poisson-distributed.
The procedure works as follows: a circle or window of a variable radius (from 0% up to 50% of the population size [34]) is placed through an iterative process on one centroid of the department and moves across the whole study area, to compare the incidence of a COVID-19-related outcome in the window with incidence expected under a random distribution.
Step 1: Estimate the relative risk (RR) within each circle based on a Poisson model We, therefore, computed a relative risk (RR) in each department weighted by the population size living in each department. This RR represents the expected risk of a COVID-19-related outcome within an area or window divided by the expected risk outside of the department or window.
We conducted separate analysis for each incidence of a COVID-19 related outcome (i.e., hospitalization, death, severe form, return back home). This is the estimated risk within the cluster divided by the estimated risk outside the cluster. It is calculated as the observed divide by the expectation within the cluster, which is divided by the observation expected outside the cluster.
The mathematical notation is described below with Equation (1). where:  C is the total number of cases in the data set.  Note that since the analysis is conditioned on the total number of cases observed, E[C] = C.
Step 2: Identification of the most likelihood cluster An increasing number of circles with different sets of departments within them are created. Thus, each circle is a possible candidate for a cluster with the likelihood estimate. The scan statistic approach is likelihood based. The most likely cluster can be selected and tested for statistical significance.
Under the Poisson assumption, the likelihood function is described by Equation (2).
where:  C is the total number of cases,  c is the observed number of cases within the window, and  E[c] is the covariate adjusted and expected number of cases within the window under the null-hypothesis.  Note that, since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window.  I() is equal to 1 when the window has more cases than expected under the nullhypothesis, and 0 otherwise. Since this analysis is only interested in detecting clusters with higher than expected rates, I() was equal to 1.
In the first crude model, which is the unadjusted analysis, E[c] is equal to an expected number of cases within the window under the null-hypothesis. In the second and third adjusted model, E[c] is a covariate adjusted and expected number of cases.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. The identification of the most likely clusters is based on a likelihood ratio test [34] with an associated p-value obtained using Monte Carlo replications [33]. We considered a 0.05 level of significance. Mapping of the clusters was carried out using ArcGIS [35].

Methodological Strategy
c is the number of observed cases within the cluster and  c is the number of observed cases within the cluster and  C is the total number of cases in the data set.  Note that since the analysis is conditioned on the total number of cases observed, E[C] = C.
Step 2: Identification of the most likelihood cluster An increasing number of circles with different sets of departments within them are created. Thus, each circle is a possible candidate for a cluster with the likelihood estimate. The scan statistic approach is likelihood based. The most likely cluster can be selected and tested for statistical significance.
Under the Poisson assumption, the likelihood function is described by Equation (2).
where:  C is the total number of cases,  c is the observed number of cases within the window, and  E[c] is the covariate adjusted and expected number of cases within the window under the null-hypothesis.  Note that, since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window.  I() is equal to 1 when the window has more cases than expected under the nullhypothesis, and 0 otherwise. Since this analysis is only interested in detecting clusters with higher than expected rates, I() was equal to 1.
In the first crude model, which is the unadjusted analysis, E[c] is equal to an expected number of cases within the window under the null-hypothesis. In the second and third adjusted model, E[c] is a covariate adjusted and expected number of cases.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. The identification of the most likely clusters is based on a likelihood ratio test [34] with an associated p-value obtained using Monte Carlo replications [33]. We considered a 0.05 level of significance. Mapping of the clusters was carried out using ArcGIS [35].

Methodological Strategy
C is the total number of cases in the data set.


c is the number of observed cases within the cluster and  C is the total number of cases in the data set.  Note that since the analysis is conditioned on the total number of cases observed, E[C] = C.
Step 2: Identification of the most likelihood cluster An increasing number of circles with different sets of departments within them are created. Thus, each circle is a possible candidate for a cluster with the likelihood estimate. The scan statistic approach is likelihood based. The most likely cluster can be selected and tested for statistical significance.
Under the Poisson assumption, the likelihood function is described by Equation (2).
where:  C is the total number of cases,  c is the observed number of cases within the window, and  E[c] is the covariate adjusted and expected number of cases within the window under the null-hypothesis.  Note that, since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window.  I() is equal to 1 when the window has more cases than expected under the nullhypothesis, and 0 otherwise. Since this analysis is only interested in detecting clusters with higher than expected rates, I() was equal to 1.
In the first crude model, which is the unadjusted analysis, E[c] is equal to an expected number of cases within the window under the null-hypothesis. In the second and third adjusted model, E[c] is a covariate adjusted and expected number of cases.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. The identification of the most likely clusters is based on a likelihood ratio test [34] with an associated p-value obtained using Monte Carlo replications [33]. We considered a 0.05 level of significance. Mapping of the clusters was carried out using ArcGIS [35].

Methodological Strategy
Note that since the analysis is conditioned on the total number of cases observed, E[C] = C.
Step 2: Identification of the most likelihood cluster An increasing number of circles with different sets of departments within them are created. Thus, each circle is a possible candidate for a cluster with the likelihood estimate. The scan statistic approach is likelihood based. The most likely cluster can be selected and tested for statistical significance.
Under the Poisson assumption, the likelihood function is described by Equation (2). where: COVID-19-related outcome within an area or window divided by the expected risk outside of the department or window. We conducted separate analysis for each incidence of a COVID-19 related outcome (i.e., hospitalization, death, severe form, return back home). This is the estimated risk within the cluster divided by the estimated risk outside the cluster. It is calculated as the observed divide by the expectation within the cluster, which is divided by the observation expected outside the cluster.
The mathematical notation is described below with Equation (1).
c is the number of observed cases within the cluster and  C is the total number of cases in the data set.  Note that since the analysis is conditioned on the total number of cases observed, E[C] = C.
Step 2: Identification of the most likelihood cluster An increasing number of circles with different sets of departments within them are created. Thus, each circle is a possible candidate for a cluster with the likelihood estimate. The scan statistic approach is likelihood based. The most likely cluster can be selected and tested for statistical significance.
Under the Poisson assumption, the likelihood function is described by Equation (2).
where:  C is the total number of cases,  c is the observed number of cases within the window, and  E[c] is the covariate adjusted and expected number of cases within the window under the null-hypothesis.  Note that, since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window.  I() is equal to 1 when the window has more cases than expected under the nullhypothesis, and 0 otherwise. Since this analysis is only interested in detecting clusters with higher than expected rates, I() was equal to 1.
In the first crude model, which is the unadjusted analysis, E[c] is equal to an expected number of cases within the window under the null-hypothesis. In the second and third adjusted model, E[c] is a covariate adjusted and expected number of cases.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. The identification of the most likely clusters is based on a likelihood ratio test [34] with an associated p-value obtained using Monte Carlo replications [33]. We considered a 0.05 level of significance. Mapping of the clusters was carried out using ArcGIS [35].

Methodological Strategy
C is the total number of cases, population size living in each department. This RR represents the expected risk of a COVID-19-related outcome within an area or window divided by the expected risk outside of the department or window.
We conducted separate analysis for each incidence of a COVID-19 related outcome (i.e., hospitalization, death, severe form, return back home). This is the estimated risk within the cluster divided by the estimated risk outside the cluster. It is calculated as the observed divide by the expectation within the cluster, which is divided by the observation expected outside the cluster.
The mathematical notation is described below with Equation (1).
where:  c is the number of observed cases within the cluster and  C is the total number of cases in the data set.  Note that since the analysis is conditioned on the total number of cases observed, Step 2: Identification of the most likelihood cluster An increasing number of circles with different sets of departments within them are created. Thus, each circle is a possible candidate for a cluster with the likelihood estimate. The scan statistic approach is likelihood based. The most likely cluster can be selected and tested for statistical significance.
Under the Poisson assumption, the likelihood function is described by Equation (2).
where:  C is the total number of cases,  c is the observed number of cases within the window, and  E[c] is the covariate adjusted and expected number of cases within the window under the null-hypothesis.  Note that, since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window.  I() is equal to 1 when the window has more cases than expected under the nullhypothesis, and 0 otherwise. Since this analysis is only interested in detecting clusters with higher than expected rates, I() was equal to 1.
In the first crude model, which is the unadjusted analysis, E[c] is equal to an expected number of cases within the window under the null-hypothesis. In the second and third adjusted model, E[c] is a covariate adjusted and expected number of cases.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. The identification of the most likely clusters is based on a likelihood ratio test [34] with an associated p-value obtained using Monte Carlo replications [33]. We considered a 0.05 level of significance. Mapping of the clusters was carried out using ArcGIS [35].

Methodological Strategy
c is the observed number of cases within the window, and We, therefore, computed a relative risk (RR) in each department weighted by the population size living in each department. This RR represents the expected risk of a COVID-19-related outcome within an area or window divided by the expected risk outside of the department or window.
We conducted separate analysis for each incidence of a COVID-19 related outcome (i.e., hospitalization, death, severe form, return back home). This is the estimated risk within the cluster divided by the estimated risk outside the cluster. It is calculated as the observed divide by the expectation within the cluster, which is divided by the observation expected outside the cluster.
The mathematical notation is described below with Equation (1).
where:  c is the number of observed cases within the cluster and  C is the total number of cases in the data set.  Note that since the analysis is conditioned on the total number of cases observed, E[C] = C.
Step 2: Identification of the most likelihood cluster An increasing number of circles with different sets of departments within them are created. Thus, each circle is a possible candidate for a cluster with the likelihood estimate. The scan statistic approach is likelihood based. The most likely cluster can be selected and tested for statistical significance.
Under the Poisson assumption, the likelihood function is described by Equation (2).
where:  C is the total number of cases,  c is the observed number of cases within the window, and  E[c] is the covariate adjusted and expected number of cases within the window under the null-hypothesis.  Note that, since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window.  I() is equal to 1 when the window has more cases than expected under the nullhypothesis, and 0 otherwise. Since this analysis is only interested in detecting clusters with higher than expected rates, I() was equal to 1.
In the first crude model, which is the unadjusted analysis, E[c] is equal to an expected number of cases within the window under the null-hypothesis. In the second and third adjusted model, E[c] is a covariate adjusted and expected number of cases.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. The identification of the most likely clusters is based on a likelihood ratio test [34] with an associated p-value obtained using Monte Carlo replications [33]. We considered a 0.05 level of significance. Mapping of the clusters was carried out using ArcGIS [35].

Methodological Strategy
is the covariate adjusted and expected number of cases within the window under the null-hypothesis.
Step 1: Estimate the relative risk (RR) within each circle based on a Poisson model We, therefore, computed a relative risk (RR) in each department weighted by the population size living in each department. This RR represents the expected risk of a COVID-19-related outcome within an area or window divided by the expected risk outside of the department or window.
We conducted separate analysis for each incidence of a COVID-19 related outcome (i.e., hospitalization, death, severe form, return back home). This is the estimated risk within the cluster divided by the estimated risk outside the cluster. It is calculated as the observed divide by the expectation within the cluster, which is divided by the observation expected outside the cluster.
The mathematical notation is described below with Equation (1).
where:  c is the number of observed cases within the cluster and  C is the total number of cases in the data set.  Note that since the analysis is conditioned on the total number of cases observed, E[C] = C.
Step 2: Identification of the most likelihood cluster An increasing number of circles with different sets of departments within them are created. Thus, each circle is a possible candidate for a cluster with the likelihood estimate. The scan statistic approach is likelihood based. The most likely cluster can be selected and tested for statistical significance.
Under the Poisson assumption, the likelihood function is described by Equation (2).
where:  C is the total number of cases,  c is the observed number of cases within the window, and  E[c] is the covariate adjusted and expected number of cases within the window under the null-hypothesis.  Note that, since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window.  I() is equal to 1 when the window has more cases than expected under the nullhypothesis, and 0 otherwise. Since this analysis is only interested in detecting clusters with higher than expected rates, I() was equal to 1.
In the first crude model, which is the unadjusted analysis, E[c] is equal to an expected number of cases within the window under the null-hypothesis. In the second and third adjusted model, E[c] is a covariate adjusted and expected number of cases.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. The identification of the most likely clusters is based on a likelihood ratio test [34] with an associated p-value obtained using Monte Carlo replications [33]. We considered a 0.05 level of significance. Mapping of the clusters was carried out using ArcGIS [35].

Methodological Strategy
Note that, since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window. Step 1: Estimate the relative risk (RR) within each circle based on a Poisson model We, therefore, computed a relative risk (RR) in each department weighted by the population size living in each department. This RR represents the expected risk of a COVID-19-related outcome within an area or window divided by the expected risk outside of the department or window.
We conducted separate analysis for each incidence of a COVID-19 related outcome (i.e., hospitalization, death, severe form, return back home). This is the estimated risk within the cluster divided by the estimated risk outside the cluster. It is calculated as the observed divide by the expectation within the cluster, which is divided by the observation expected outside the cluster.
The mathematical notation is described below with Equation (1).
where:  c is the number of observed cases within the cluster and  C is the total number of cases in the data set.  Note that since the analysis is conditioned on the total number of cases observed, E[C] = C.
Step 2: Identification of the most likelihood cluster An increasing number of circles with different sets of departments within them are created. Thus, each circle is a possible candidate for a cluster with the likelihood estimate. The scan statistic approach is likelihood based. The most likely cluster can be selected and tested for statistical significance.
Under the Poisson assumption, the likelihood function is described by Equation (2).
where:  C is the total number of cases,  c is the observed number of cases within the window, and  E[c] is the covariate adjusted and expected number of cases within the window under the null-hypothesis.  Note that, since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window.  I() is equal to 1 when the window has more cases than expected under the nullhypothesis, and 0 otherwise. Since this analysis is only interested in detecting clusters with higher than expected rates, I() was equal to 1.
In the first crude model, which is the unadjusted analysis, E[c] is equal to an expected number of cases within the window under the null-hypothesis. In the second and third adjusted model, E[c] is a covariate adjusted and expected number of cases.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. The identification of the most likely clusters is based on a likelihood ratio test [34] with an associated p-value obtained using Monte Carlo replications [33]. We considered a 0.05 level of significance. Mapping of the clusters was carried out using ArcGIS [35].

Methodological Strategy
I() is equal to 1 when the window has more cases than expected under the nullhypothesis, and 0 otherwise. Since this analysis is only interested in detecting clusters with higher than expected rates, I() was equal to 1.
In the first crude model, which is the unadjusted analysis, E[c] is equal to an expected number of cases within the window under the null-hypothesis. In the second and third adjusted model, E[c] is a covariate adjusted and expected number of cases.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. The identification of the most likely clusters is based on a likelihood ratio test [34] with an associated p-value obtained using Monte Carlo replications [33]. We considered a 0.05 level of significance. Mapping of the clusters was carried out using ArcGIS [35].

Methodological Strategy
If we detect a significant cluster of high-risk using this method, the next step will be to explore whether the significant cluster can be explained by suspected risk factors. Thus, spatial analyses were performed in four stages (step-by-step). i.
Crude analysis (unadjusted) to identify and spatially localize the most likely cluster of high incidence of a COVID-19 related outcome. ii. Adjusted analysis for a living deprivation condition.
iii. Adjusted analysis for long-term exposure to NO 2 . iv. Adjusted analysis for both the deprivation context and long-term exposure to NO 2 .
The models were adjusted on one or more co-variables, and, according to the Kulldorff studies [34], several criteria were used to reject, or not, the H 0 hypothesis according to the cluster's localization and statistical significance, as well as the likelihood ratio value of each model (for more detail, see References [25][26][27]36]). Figure 1a-d presents the spatial distribution of the four health events related to COVID-19 infection, which reveal a strong spatial pattern with higher health events counted in Northeast France. The annual mean of NO 2 is higher in the north and east departments of France whatever the scenario ( Figure 2). As expected, from the background monitoring stations (scenario 2), less variabilities of NO 2 are observed (Figure 2c). Spatial clusters of a high percentage of households living in overcrowded housing are visible in the Ile de France region and in the departments near the Mediterranean Sea (Figure 3).

Descriptive Data
The fours graphics in Figure 4a revealed significant linear associations between the percentage of households living in overcrowded housing and each of the four health indicators: the correlation coefficients vary between 0.75 and 0.78. All are statistically significant. In the graphics of Figure 4b, similar results were found with long-term NO 2 exposure regardless of the monitoring stations included in the calculation of the air pollution indicators (Figure 4b). The NO 2 exposure is higher over the long-term, and the rate of health events is higher. The correlation coefficients vary between 0.68 and 0.735.

Spatial Distribution
Figure 5 details the department containing the most likely clusters of high risk of a COVID-related health incidence (hospitalized, intensive health care, death, recovered cases, returned back home), their spatial location, and the spatial shift of centroid from an unadjusted cluster to a covariate-adjusted cluster. Table 3 presents the most likely clusters, the number of departments, the radius of the circle of the cluster, and the relative risk (RR, the ratio of the observed-to-expected number of new patients in each department estimated by SaTScan) for each cluster. Figure 5a revealed the location of the most likely cluster. Regardless of the COVID-19-related outcome, the most likely cluster had a statistically significant high risk of a  Table 1 presents the association between the overcrowding housing and COVID-19 health events. All measures of associations are highly significant (p < 0.0001), between overcrowded housing and four COVID-19 related outcomes. The regression coefficients estimated by a linear varied from 71.8 into 439.4 for cases hospitalized and cases in intensive healthcare, respectively.  Table 2 presents the association between long-term NO 2 exposure and health events stratified, according to tertile distribution of the percentage of overcrowded housing. All measures of associations are highly significant among the department with a high percentage of households living in overcrowded housing. These findings are in favour of the existence of an interaction between a deprived living condition and long-term NO 2 exposure on intensive healthcare risk, as confirmed by the interaction tests (p = 0.008).

Spatial Distribution
Figure 5 details the department containing the most likely clusters of high risk of a COVID-related health incidence (hospitalized, intensive health care, death, recovered cases, returned back home), their spatial location, and the spatial shift of centroid from an unadjusted cluster to a covariate-adjusted cluster. Table 3 presents the most likely clusters, the number of departments, the radius of the circle of the cluster, and the relative risk (RR, the ratio of the observed-to-expected number of new patients in each department estimated by SaTScan) for each cluster. Our results indicated that the excess risk of different COVID-19-related outcomes among a population aged over 60 years old observed in the unadjusted analysis was explained in a great part, but not entirely, by long-term exposure to NO2 and the percentage of households living in overcrowded housing.  Table 3. Summary statistics of the most likely clusters of COVID-19-related outcomes and spatial relocation resulting from the adjustment analysis.   RR: relative risk. LLr: log likelihood ratio. a Unadjusted analysis, to identify and localize the most likely cluster(s) of high risk of COVID-19-related outcomes. b Cluster radius is the radius of the spatial circle. c Number of Department: is the number of departments composed of the most likely cluster. d No. of observed cases: the number of the observed cases within the most likelihood cluster. e No. of expected cases: the number of the expected cases within the most likely cluster. f Shift: if the cluster is in the same location of the crude, the most likely cluster (in unadjusted analysis). g p value: the statistically significant one of the most likely cluster. Figure 5a revealed the location of the most likely cluster. Regardless of the COVID-19-related outcome, the most likely cluster had a statistically significant high risk of a COVID-19-related outcome incidence. More precisely, the result reveled that the risk of a COVID-19 hospitalized incidence was 2.49 greater in the eastern part than the rest of France (p-value = 0.001, Table 3). The spatial analysis of COVID-19-related death, which was recovered and returned back home revealed the same most likely clusters located in Northeast France and composed of 43 departments. The clusters had a risk of death and, of those recovered and returned back home, 2.94 and 2.51 (respectively) were greater than the rest of France. A small cluster of intensive healthcare in the hospital was also identified in Northeast France (p-value = 0.039). The cluster composed of six departments had a risk of intensive healthcare in a hospital of 3.05 greater than the rest of France (p-value = 0.0001, Table 3).

Unadjusted Analysis
Adjusted scan statistical analysis is detailed below according to the variables for which the model was adjusted.
After adjustment for the percentage of households living in overcrowded housing. The most likely cluster remained statistically significant and located in the same zone. However, the likelihood ratio decreasing from 22,089 to 11,756 indicates that the spatial distribution of overcrowded housing only slightly explains the spatial distribution of a COVID-19-related outcome (data not shown).
After adjustment for long-term exposure to NO 2 (Figure 5b), the cluster was reduced, the centroid of the cluster was shifted to Northeast France and the likelihood ratio decreased from 22,089 to 3074 (Table 3). It indicates that the spatial distribution of long-term exposure to NO 2 partially explained the excess risk of COVID hospitalization observed in the unadjusted analysis.
The cluster of excess risk of death identified in crude analysis in Eastern France was also reduced from 43 departments to 22 departments, after an adjustment for long-term exposure to NO 2 and the likelihood ratio largely decreased from 4642.46 to 855.92. In the same pattern, the cluster of excess risk of recovered values and those returned back home identified in crude analysis in Eastern France was also reduced from 43 to 25 departments and the likelihood ratio largely decreased from 15,673.73 to 2247.95 (Table 3).
In addition, the cluster of excess risk of intensive health care in a hospital identified in crude analysis in Eastern France was also reduced: the number of departments decreased from 6 to 4, after adjustment for long-term exposure to NO 2 . The likelihood ratio decreased from 3906.31 to 825.26 (Table 3).
After adjustment for long-term exposure to NO 2 and percentage of households living in overcrowded housing (Figure 5b)-The most likely significant cluster shifted in Northeastern France (RR = 1.32) with a relatively larger decrease in the likelihood ratio from 22,089 to 2189 (Table 3). These results indicated that long-term exposure to NO 2 and percentage of households living in overcrowded housing explained a great part of the excess risk of COVID-19 hospitalization observed in the unadjusted analysis.
The cluster of excess risk of death identified in crude analysis in East France was also reduced to 22 departments, after an adjustment for long-term exposure to NO 2 and a deprived conditional level. The Llr value largely decreased from 4642.46 to 601.95. Similarly, the cluster of excess risk of recovered and returned back home identified in crude analysis in Eastern France was also reduced to 22 departments and the Llr largely decreased from 15,673.73 to 1605.65 (Table 3).
In addition, the cluster of excess risk of intensive healthcare in a hospital identified in crude analysis in Eastern France was also reduced to a cluster of 4 departments. The likelihood ratio largely decreased from 3906.31 to 555.34 (Table 3).
Taking in to account the long-term exposure to NO 2 and the percentage of households living in overcrowded housing reduced the LLr to a larger degree than long-term exposure to NO 2 alone. These variables also explain a large amount of the excess of a COVID-19related outcome.
Our results indicated that the excess risk of different COVID-19-related outcomes among a population aged over 60 years old observed in the unadjusted analysis was explained in a great part, but not entirely, by long-term exposure to NO 2 and the percentage of households living in overcrowded housing.

Discussion
Our nationwide study in France revealed a significant correlation between long-term exposure to NO 2 and incidence of a COVID-19-related outcome. To our knowledge, such a work, exploring spatial implication of long-term exposure to NO 2 on geographical variations of a COVID-19 related outcome, has never been performed. It is one reason why it is difficult to compare our findings with those of others.
Our study revealed that all the outcomes related to COVID-19 infection investigated (including: hospitalization, death at hospital, recovered and returned back home, intensive health care in a hospital) were not randomly distributed in France. The increased COVID-19-related outcome incidence in Eastern France was statistically significant. The spatial distribution of both long-term exposure to NO 2 and overcrowded housing may be taken into account to fully interpret the spatial distribution of incidence of COVID-19related outcomes.
These findings are in line with recent published studies investigating the role of air pollution in COVID-19 infection, including mortality and morbidity outcomes.
More precisely, our results are consistent with studies carried out in European countries as well as in US and China, which suggest that people living in a polluted area are more predisposed to develop severe COVID-19-related events [6,9,10]. In Europe, Ogen et al., 2020, revealed that 78% of death related to COVID-19 identified in Europe are mainly concentrated in five polluted regions located in North Italy and Central Spain. More precisely, the authors found that 83% of all fatalities occurred in the European region where the maximum NO 2 concentrations was above 100 µmol/m 2 [9]. In the United States, Wu et al. suggested that even a small increase in long-term exposure to PM 2.5 increase the COVID-19 death rate: 1µg/m 3 in PM 2.5 is associated with an 8% increase in the COVID-19 death rate (95CI [2%, 15%]) [10]. All these results enrich the debate of the potential effects of chronic exposure to air pollution on the COVID-19 severity.
Therefore, we may hypothesize that chronic exposure to a high level of air pollution and living condition (approximated here by the percentage of households living in overcrowded housing) could contribute to spatial, disproportionate, severe forms in COVID-19-related outcomes observed following the different pathways.
(i) the first one acts indirectly by increasing the risk of cardiopulmonary and respiratory diseases [37,38] as well as the hypertension and the diabetes [39], which were identified as one main comorbidity risk factor of COVID-19, (ii) the second one has a more direct effect by increasing susceptibility of people to COVID-19 infection.
In addition, our finding suggests that both an overcrowded housing level and chronic exposure to NO 2 could largely explain the spatial distribution of COVID-19-related observations in a French metropolitan area.
Our finding suggests an unequal impact of COVID-19 including a death, COVID-19 hospitalization, recovered and returned back home, and intensive healthcare in a hospital. We argue that this unequal geographic pattern is due to a complex interaction between the outdoor and indoor condition: air pollution and housing condition. Thus, beyond the air pollution effect, the living condition may contribute to an unequal impact of COVID-19.
As suggested by Patel et al., 2020: "To date, policymakers have targeted people with multiple comorbidities after identifying them as the most vulnerable. However, this medical model of disease risks ignoring social factors, which can increase exposure to and mortality from coronavirus disease 2019 (COVID-19)" [40].
Based on a conceptual model of Singer [41], and the model of the main determinants of health described by Dahlgren and Whitehead [42], Bambra et al. proposed a theo-retical framework of "the syndemic of COVID-19, non-communicable diseases (NCDs) and the social determinants of health". His framework aims to explain how inequalities in COVID-19 are related to existing inequalities in chronic diseases and the social determinants of health" [43]. Thus, the authors argue that, for the most disadvantaged communities, COVID-19 is experienced as a syndemic-a co-occurring, synergistic pandemic that interacts with and exacerbates their existing NCDs and social conditions [43].
Among the living condition, housing characteristics play an important role in unequal impact of COVID-19 by following some pathways: (i) Some studies suggest that people in lower income households are more likely experience overcrowding and live in overcrowded conditions. Therefore, deprived living conditions may constitute itself as an additional risk factor of the known underlying clinical risk factors that increase the severity and mortality of COVID-19 (including cardiovascular disease, obesity, diabetes, and hypertension [44]). It suggests that people living in deprived conditions have an increased susceptibility to COVID-19 mortality. (ii) The overcrowding combined with poor quality of housing conditions may increase the vulnerability of people for COVID-19 and the severity of its consequences. These deprived living conditions including damp housing and overcrowding may induce some health outcome respiratory disorders, such as asthma and other viral infections.
Our hypothesis highlights the complexity of the mechanisms, which link chronic exposure to NO 2 and living conditions to COVID-19-related outcomes. It suggests that the combination of outdoor and indoor conditions may interact to explain the geographical pattern of COVID-19.
Lastly, our findings showed that it remained a significant cluster of excess risk of COVID-19-related outcomes, not entirely explained by spatial distribution of age, NO 2 , and overcrowded housing. This observed cluster cannot be explained by healthcare access. In France, the access to diagnosis and treatment should not be limited by socioeconomic status. Medical and hospital costs for patients with COVID-19-related outcomes are completely covered (100%), and the reimbursement is regulated by uniform rates regardless of whether the patient is treated in the public or private facility.
This observed cluster may be partly explained by the population health status such as the pre-existing chronic diseases related to the COVID-19-related outcome known as a risk factor [45].
Interpretation of our findings must consider weaknesses that could affect the strength of the associations, yield limitations in comparison with other studies, or impede the formulation of accurate conclusions.
-First, we performed an ecological study with data available at the French department level. Therefore, our results should be interpreted only in this design context and should not be interpreted at the individual level. -Second, our approach based on ecological data, has several limitations. One is the non-inclusion of gender and presence of pre-existing and background diseases and comorbidities in the analysis, which is known to be risk factors for COVID-19-related outcomes. Focusing in further studies on larger risk factors is recommended, and might produce clearer results to explain the cluster of the excess risk. -Third, to characterize the chronic exposure to NO 2 , as recent studies investigated this issue, we used mean values over five years (2014-2018) from all monitoring stations located in each French department, including several background, traffic, and industrial stations. The mean values from each station may vary according to the type: Background, traffic, and industrial station, and to the area: rural, urban, and sub-urban. In our study, we used all data available at the French department level and carried out sensitivity analysis through three scenarios. However, in our study, using data from a different type of monitoring stations may misclassify the level of exposure of several French departments. At this time, these were the only available data for the study period, which covered all departments of France. However, the results revealed that the measure of associations as well as the statistical significance did not vary as much, according to the scenario used (data not shown). Next, studies need to assess the impact of air pollution on COVID-19 outcomes using modelled measures of NO 2 exposure at a finer spatial scale.

Conclusions
Our study quantified the unequal spatial distribution of the COVID-19 related outcome. The clustering analysis confirmed the higher risk of COVID-19-related outcome located in Eastern France. In addition, our findings suggest that the geographic pattern of a COVID-19-related outcome is largely explained by socio-spatial distribution of long-term exposure to NO 2 . However, to better understand spatial variations of a COVID-19-related outcome, it would be necessary to research and adjust for other determinants.
Thus, the current sanitary crisis reminds us how unequal we all are in facing this disease. Building a healthy environment for all, and especially for the most vulnerable population, is a crucial issue if we want to design and implement measures for a greener and more equitable territory.