A Transnational and Transregional Study of the Impact and Effectiveness of Social Distancing for COVID-19 Mitigation

We present an analysis of the relationship between SARS-CoV-2 infection rates and a social distancing metric, using data for all the states and the most populous cities in the United States and Brazil, all 22 European Economic Community countries, and the United Kingdom. We discuss why the infection rate, rather than the effective reproduction number or the growth rate of cases, is the appropriate choice for this analysis when a wide span of time is considered. We obtain a strong Spearman's rank-order correlation between the social distancing metric and the infection rate in each locality. We show that mask mandates increase the values of Spearman's correlation in the United States, where a mandate was adopted. We also obtain an explicit numerical relation between the infection rate and the social distancing metric defined in the present work.


Introduction
The current COVID-19 pandemic is the main global health crisis in a century, with over 220 million cases and 4.5 million deaths [1]. It began in China at the end of 2019 and has since spread to every country in the world, with waves occurring at different times in each location. A number of interventions were implemented in most countries, such as travel bans, social distancing and mandatory mask use [2,3], and their effects have been discussed in different works, which generally concluded that they were effective in reducing the growth of cases and deaths [4][5][6][7][8][9]. The more effective measures are possibly lockdowns and the closing of workplaces, businesses and schools, i.e., the social distancing policies [10], with travel restrictions expected to have modest effects in reducing transmission when there is a high circulation of the virus [11].
In order to quantify and qualify the degree of social distancing and its effects, different approaches have been proposed: survey questionnaires that assess adherence to social distancing in the population and compare it to the growth of cases or deaths [12], or mobility data from different sources [13][14][15][16][17][18][19]. In the latter case, a mobility or social distancing metric is compared to the growth rate of cases (or deaths) of COVID-19, or to the effective reproduction number $R_t$. As we discuss below, this limits the analysis, because the interpretation of both the growth rate and $R_t$ at the beginning of the pandemic, when most of the population is still susceptible to the virus, differs from that at later stages, when a non-negligible proportion of the population has already been infected or vaccinated. A more informative parameter, which better represents the circulation of the SARS-CoV-2 virus, is the average infection rate $\beta$, which is proportional to $R_t$ divided by the proportion of the susceptible population (see Equation (3) below). This explains particularly the result by

Effective Reproduction Number
The effective reproduction number $R_t$ at day $t$, estimated from the generation time distribution $w(t)$, with $t$ the number of days between infections, is given by [21]:

$$R_t = \frac{I(t)}{\sum_{s=1}^{t} w(s)\, I(t-s)}, \qquad (1)$$

with $I(t)$ the number (or proportion) of infected individuals at day $t$. The effective reproduction number can also be estimated from the series of deaths by first determining the number of infected individuals as:

$$I(t) = \frac{1}{\theta} \sum_{s} u(s)\, N_{\mathrm{deaths}}(t+s), \qquad (2)$$

where $u(t)$ is the distribution of the number $t$ of days (taken as discrete) between first symptom and death [22], $N_{\mathrm{deaths}}(t)$ is the number of deaths at day $t$, and $\theta$ is the average infection fatality ratio [23], computed from the demographic structure in each locality. We then use Equation (1) to determine $R_t$ at a given day.
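As an illustration of how $R_t$ can be obtained from a time series of infections, the sketch below assumes the standard renewal-equation form $R_t = I(t) / \sum_{s \geq 1} w(s) I(t-s)$. The function name, the list-based inputs and the convention that `w[s - 1]` holds the generation-time probability for a lag of `s` days are choices made for this example only, not the implementation used in the study.

```python
from typing import List, Sequence


def effective_reproduction_number(
    incidence: Sequence[float], w: Sequence[float]
) -> List[float]:
    """Renewal-equation estimate of R_t for each day t.

    incidence[t] is I(t); w[s - 1] is the generation-time probability
    for a lag of s days (s = 1, 2, ...). Days for which the weighted sum
    of past infections is zero yield NaN.
    """
    r_values = []
    for t in range(len(incidence)):
        max_lag = min(t, len(w))
        # Infection pressure generated by individuals infected on earlier days.
        pressure = sum(w[s - 1] * incidence[t - s] for s in range(1, max_lag + 1))
        r_values.append(incidence[t] / pressure if pressure > 0 else float("nan"))
    return r_values


# Toy series: infections double every day and the generation time is one day.
r_toy = effective_reproduction_number([1, 2, 4, 8], [1.0])
```

In this toy series each infected individual generates two new infections one day later, so the estimate is 2 on every day that has a past to compare with, and undefined on the first day.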

Infection Rate
The infection rate can be estimated as [24]:

$$\beta(t) = \frac{\gamma R_t}{S(t)}, \qquad (3)$$

with $S(t)$ the proportion of susceptible individuals in the population at day $t$, $R_t$ the time-dependent effective reproduction number, and $\gamma$ the recovery rate from infection, with the value reported in the literature [25]. We can also write

$$\beta = C P_c, \qquad (4)$$

where $C$ is the average number of contacts of one individual per day, and $P_c$ is the probability of contagion of a susceptible individual from a single contact with an infected individual. Social distancing acts by reducing the number of contacts $C$, while other non-pharmaceutical interventions reduce the value of $P_c$.
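The dependence of the infection rate on the susceptible fraction can be made concrete with a short sketch. It assumes the relation $\beta = \gamma R_t / S(t)$, consistent with the statement that $\beta$ is proportional to $R_t$ divided by the proportion of susceptible individuals; the function name and the value `gamma = 1.0` (which expresses $\beta$ in units of $\gamma$) are illustrative assumptions.

```python
def infection_rate(r_t: float, susceptible: float, gamma: float) -> float:
    """Infection rate beta = gamma * R_t / S(t).

    susceptible is the proportion S(t) of the population still susceptible,
    so it must lie in the interval (0, 1].
    """
    if not 0.0 < susceptible <= 1.0:
        raise ValueError("susceptible must be a proportion in (0, 1]")
    return gamma * r_t / susceptible


# Same R_t = 1 at two stages of the epidemic, with gamma = 1 (assumed units).
beta_early = infection_rate(r_t=1.0, susceptible=1.0, gamma=1.0)
beta_late = infection_rate(r_t=1.0, susceptible=0.5, gamma=1.0)
```

With everyone susceptible the infection rate equals $\gamma$, while with half of the population susceptible the same $R_t$ corresponds to an infection rate twice as large.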

Social Distancing Metric
As a proxy for the "amount" of social distancing, we define a metric quantifying the deviation from a baseline representing the pre-pandemic normality. Many possibilities exist, and different mobility data are available from different sources [26][27][28][29]. We require that the data are freely available, with coverage down to the city level. Among these sources, only Google mobility trends satisfies these two criteria, providing data on the following six categories of locations: retail and recreation ($D_1$); grocery and pharmacy ($D_2$); parks ($D_3$); transit stations ($D_4$); workplaces ($D_5$) and residential ($D_6$), as percentages of variation of the time spent in each type of place, with respect to a baseline defined for the period of 3 January to 6 February 2020. The symbols in parentheses represent the numeric value of the time series for each type of data. An increase in the time spent at residence is expected to decrease the value of the infection rate $\beta$, and is considered as a negative contribution to the metric, while an increase in any of the remaining five categories is expected to increase $\beta$ and thus contributes with a positive sign. The social distancing metric is then defined as a weighted average of the data for each category, with the specified sign, with weights $p_k$ given by an (arbitrarily) estimated average proportion of the duration of a day spent in each type of location:

$$M(t) = 100 + \sum_{k=1}^{5} p_k D_k(t) - p_6 D_6(t), \qquad (5)$$

where the value of 100 is added such that the baseline is close to this value, and has no effect on the value of Spearman's correlation. The resulting metric $M$ for each Brazilian and American state is shown in Figure 1A,B, respectively, with a similar behavior for the other localities considered here (not shown). This definition is such that a smaller value of $M$ represents a more beneficial situation.
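The construction of the metric can be sketched as follows. The weights below are hypothetical placeholders chosen only so that the example runs; they are not the values used in the study. The dictionary keys are likewise illustrative names for the six Google mobility categories, and dividing by the total weight is an assumption that makes the result a weighted average whether or not the weights sum to one.

```python
# HYPOTHETICAL weights: assumed fractions of a day spent in each type of place.
WEIGHTS = {
    "retail_recreation": 0.05,
    "grocery_pharmacy": 0.05,
    "parks": 0.05,
    "transit_stations": 0.05,
    "workplaces": 0.30,
    "residential": 0.50,
}

# Time at home enters with a negative sign, every other category with a positive sign.
SIGNS = {name: (-1.0 if name == "residential" else 1.0) for name in WEIGHTS}


def social_distancing_metric(mobility: dict) -> float:
    """Signed weighted average of the six mobility changes (in percent), plus 100."""
    total_weight = sum(WEIGHTS.values())
    signed_sum = sum(SIGNS[k] * WEIGHTS[k] * mobility[k] for k in WEIGHTS)
    return 100.0 + signed_sum / total_weight


baseline_day = {name: 0.0 for name in WEIGHTS}
lockdown_day = {
    "retail_recreation": -50.0,
    "grocery_pharmacy": -50.0,
    "parks": -50.0,
    "transit_stations": -50.0,
    "workplaces": -40.0,
    "residential": 20.0,
}
m_baseline = social_distancing_metric(baseline_day)
m_lockdown = social_distancing_metric(lockdown_day)
```

A day identical to the baseline gives a value of 100, and a day with more time at home and less time elsewhere gives a smaller value, in line with the convention that a smaller metric represents a more beneficial situation.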

Spearman's Rank-Order Correlation
Spearman's rank-order correlation $r_s(A, B)$ between two time series $A = (A_1, \ldots, A_{N_{data}})$ and $B = (B_1, \ldots, B_{N_{data}})$, of length $N_{data}$, with $A_i$ the value of the series at the $i$-th data point, is defined as [30]:

$$r_s = 1 - \frac{6 \sum_{i=1}^{N_{data}} d_i^2}{N_{data}\left(N_{data}^2 - 1\right)}, \qquad (6)$$

such that $-1 \leq r_s \leq 1$, and $d_i$ is the difference in paired ranks of the two series $A$ and $B$, i.e., the difference in position of the $i$-th data point in the two datasets when ordered in ascending order. The coefficient $r_s$ measures how strongly two variables are monotonically related, by an increasing or decreasing relation if $r_s > 0$ or $r_s < 0$, respectively.

In order to show the importance of accounting for the decreasing number of susceptible individuals with time, we show in Figure 2A the time evolution of $R_t$ and $\beta$ for Los Angeles County in the United States. As the proportion of susceptible individuals decreases over time, $R_t$ and $\beta$ diverge slowly. By computing Spearman's correlation between $M$ and $R_t$ and between $M$ and $\beta$, for a period of $N_{data} = 150$ days for the same data, we see from Figure 2B that a small difference between $R_t$ and $\beta$ has a significant effect on the value of $r_s$. Spearman's correlation between $M$ and $R_t$ is close to zero at later times, while it is clearly positive for $M$ and $\beta$. This is explained by the fact that, from Equation (3), the same value of $R_t$ can correspond to different values of the infection rate $\beta$, which is directly related to the circulation of the virus, as it measures the rate at which susceptible individuals are infected, and is thus more closely related to the different mitigation policies implemented. We conclude that using $R_t$ to represent the stage of the pandemic can lead to misleading results when assessing the effectiveness of social distancing at later stages, as the number of susceptible individuals decreases and that of vaccinated individuals increases.
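A minimal sketch of the rank-difference computation is given below. It uses the textbook formula $r_s = 1 - 6 \sum_i d_i^2 / [N (N^2 - 1)]$, which is exact only when neither series contains ties; that restriction, and the helper names, are assumptions of this example rather than details taken from the analysis.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank of each element (1 = smallest), assuming there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rank-order correlation from the squared rank differences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# A nonlinear but strictly increasing relation still gives r_s = 1.
r_monotonic = spearman([1, 2, 3, 4], [1, 8, 27, 64])
# A strictly decreasing relation gives r_s = -1.
r_reversed = spearman([1, 2, 3, 4], [40, 30, 20, 10])
```

Because only the ordering of the values enters the computation, any strictly increasing relation between the two series yields the maximum value, which is why the coefficient is suited to detecting monotonic rather than strictly linear dependence.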

Data Sources
The following data sources were employed in the present work:

Results and Discussion
The localities analyzed here are:
• All 50 US states, from the first reported case up to 20 December 2020;
• The 24 US counties with a population of at least one million and at least 1000 deaths in 2020 (Nassau was not considered due to inconsistent data for the number of deaths), from the first reported case in each county up to 20 December 2020.
The span of time of the data was chosen to avoid the effect of vaccination in the United States and Europe, while for Brazil detailed and publicly available anonymized data on each vaccine shot delivered allow modeling the time evolution of the pandemic for a longer period. To estimate the susceptible population in Equation (3), we use the epidemiological model of [31], described in Appendix A, to determine the attack rate in each locality. Serological surveys also provide such estimates, but they are not available for every locality and for the required time window and, where available, the data do not have the required time resolution.
The results of Spearman's rank-order correlation between the social distancing metric $M$ and the infection rate $\beta$ for each locality are shown in Figure 3. In order to assess the effect of mandatory mask use in each US county and state, we compute $r_s$ for two periods. The first is the whole period, with the percentage of time under a mask mandate indicated in the corresponding graphic. The second is the period with a mask mandate, for those counties with a mandate for at least 50% of the days since the beginning of the pandemic; for the remaining counties, we consider the whole period and display the corresponding histogram in black. We also computed Spearman's correlation separately for each of the six mobility data series reported by Google, with results shown in Figures 4 and 5. The average of $\beta/\gamma$ over the time period considered for each locality, versus the total number of deaths at the end of each period, is shown in Figure 6, where an approximately linear relation is clearly visible, with the exception of a few cases in Brazil.
In order to establish a numeric relationship between $\beta$ and $M$, let us assume the linear relation

$$\beta(t) = \alpha M(t), \qquad (7)$$

with $\alpha$ a constant, and consider only the time window that allows an accurate estimation of $R_t$. The distributions of values of the ratio $\alpha/\gamma = \beta(t)/[\gamma M(t)]$ for the Brazilian states, Brazilian municipalities, European countries, US states and US counties are shown in the corresponding figure.

While vaccination reduces the proportion of susceptible individuals in the population, it does not alter the relationship of the infection rate $\beta$ with social distancing policies, with $M$ as a proxy, and this was explicitly taken into account in our analysis by using an epidemiological model with vaccination compartments. The approach presented here made it possible to demonstrate a monotonic relationship between the infection rate in each locality and the social distancing metric $M$, and to obtain an explicit numeric relationship between $\beta$ and a metric for social distancing. Behavioral changes can also have a significant impact on the evolution of any epidemic, and are difficult to include in the current analysis. Nevertheless, the significant values obtained for Spearman's correlation indicate the important role that social distancing has played up to now. This is particularly clear in Belgium ($r_s = 0.75$), Spain ($r_s = 0.8$) and the United Kingdom ($r_s = 0.88$), three countries with a high attack rate. The correlation is somewhat smaller for other localities, but still takes significant positive values, clearly indicating an approximately monotonic relationship between the two variables.

For Brazil and the European countries, the results for Spearman's correlation are quite similar: the variation in time spent at residence is negatively correlated with the infection rate, i.e., the more time spent at home the smaller the value of $\beta$, while the other categories are positively correlated.
For the United States, due to a much greater variety of mitigation policies implemented [13], we see a slightly different picture. In general, time at residence is negatively correlated with $\beta$, while time at the workplace is positively correlated with the transmission rate, as expected. For the remaining categories (grocery and pharmacy, parks, retail and recreation, and transit stations), we observe both negative and positive correlations depending on the locality, indicating that the most relevant categories are those related to the increase of time spent at home and the decrease of time spent at workplaces. For the United States, there is a significant increase in the value of $r_s$ when only the time period with a mask mandate is considered, which supports the effectiveness of the mandate.
The values of the proportionality constant $\alpha/\gamma$ between $\beta(t)/\gamma$ and $M(t)$ are surprisingly close to one another, despite the great differences in the history of the COVID-19 pandemic and in the policies implemented to mitigate it. We obtain a log-normal distribution for the value of $\alpha/\gamma$ (and consequently for $\alpha$) for all types of localities considered here, with average values very close to each other, despite all the differences between countries, implemented mitigation policies, and timings. This points to a universal efficacy of social distancing, enhanced by mandatory mask use. The explicit linear relation in Equation (7), with the value obtained for the proportionality constant $\alpha$, can be used, for instance, in modeling studies with different scenarios for social isolation.
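The estimation of the proportionality constant can be illustrated with a short sketch, assuming the linear relation $\beta(t) = \alpha M(t)$ so that each day provides one value of the ratio $\alpha/\gamma = \beta(t)/[\gamma M(t)]$. Summarizing the ratios by the mean and standard deviation of their logarithms is one simple way to characterize a log-normal distribution; the function names and the toy numbers are assumptions made for this example.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def alpha_over_gamma(
    beta: Sequence[float], metric: Sequence[float], gamma: float
) -> List[float]:
    """Daily ratios beta(t) / (gamma * M(t)), skipping non-positive entries."""
    return [b / (gamma * m) for b, m in zip(beta, metric) if b > 0 and m > 0]


def lognormal_parameters(ratios: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of log(ratio)."""
    logs = [math.log(r) for r in ratios]
    return mean(logs), stdev(logs)


# Toy data in which beta is exactly proportional to M (alpha = 0.0025).
toy_metric = [80.0, 100.0, 120.0]
toy_beta = [0.0025 * m for m in toy_metric]
toy_ratios = alpha_over_gamma(toy_beta, toy_metric, gamma=0.25)
mu, sigma = lognormal_parameters(toy_ratios)
```

For data that follow the linear relation exactly, every ratio coincides and the spread of the logarithms vanishes; with real data the spread measures how far a locality departs from a single proportionality constant.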
Of course, social distancing is not the only factor affecting the evolution of the infection rate, which explains the variation observed in Spearman's correlation across the different localities. We note that even a small increase in $\beta$, and thus a small increase in $M$, sustained for a long period of time results in a significant increase in mortality, as can be seen from Figure 6. Our analysis does not capture the impact of large gatherings of individuals and the possible effect of the so-called superspreading events [20], or the implications of contact tracing.

Conclusions
A proper choice of a variable to represent the current circulation of the virus is central to assessing the effects of mitigation policies. The infection rate as expressed in Equation (4) is affected by the reduction of social contacts, through the average number of contacts $C$, and by other implemented protocols, such as mask wearing, that reduce the probability of contagion per contact $P_c$. On the other hand, the effective reproduction number $R_t$, or any other measure of the growth rate of the pandemic, also depends on the current attack rate, which confounds the analysis. The value of $R_t$ depends on two factors: the amount of virus in circulation and the proportion of susceptible individuals in the population $S(t)$. For instance, the same value $R_t = 1$ occurring at two different times $t_1$ and $t_2$, such that $S(t_1) = 1$ (beginning of the pandemic) and $S(t_2) = 0.5$ (half of the population already infected), would imply $\beta(t_1) = 1$ and $\beta(t_2) = 2$ in units of $\gamma$, i.e., the probability of being infected per unit of time at $t = t_2$ is double that at $t = t_1$. A smaller infection probability is what the mitigation measures seek. We see that the same value of $R_t$ can correspond to different situations depending on the attack rate of the virus, which blurs the analysis when a wider time span is considered such that $S(t)$ varies significantly, as is the case in the data analyzed here. At a given moment of time, social isolation acts on $\beta$ but not on $S(t)$. While we expect a monotonic relation between $\beta$ and $M$, a monotonic relation between $R_t$ and $M$ is only evident over a shorter time interval, such that the proportion of susceptible individuals does not vary significantly. We performed the same analysis for all the localities considered (not shown) by computing Spearman's correlation between $R_t$ and $M$, and obtained much less significant results, as exemplified in Figure 2.
This is an important point to consider, as a more detailed analysis requires a large dataset and, consequently, involves a significant variation in the proportion of susceptible individuals. Computing Spearman's correlation from the whole time series for each locality, rather than Pearson's correlation, for instance, allows us to clearly demonstrate a monotonic relationship between the social distancing metric as defined here and the infection rate.
One limitation of the present work is that Spearman's correlation measures how closely one variable is a monotonic function of another, so that a time lag between social isolation and its effect on the evolution of the disease may result in a smaller value of the correlation $r_s$. Nevertheless, the approximate (inverse) monotonic relation between social isolation and infection rate is clearly evidenced in our results. We also obtained a strong indication of the positive effect of mask use on controlling the spread of the virus. For localities where a mask mandate was in place, the value of Spearman's correlation is usually larger, as it is when only the time period with a mask mandate is considered. Further and more detailed studies should be performed to establish a more direct relation between mask use and the values of the infection rate.
Future research considering socioeconomic and demographic data would certainly provide valuable information on mitigation strategies targeted at specific groups, such as the elderly and individuals with comorbidities, as well as on the impact of school closures, each considered separately from other factors [32]. We hope that the present work will contribute to a better assessment of the effects of social distancing, and at least partially of mask mandates, among the still ongoing mitigation interventions against the COVID-19 pandemic.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Epidemiological Model
In order to determine the proportion of susceptible individuals in a given locality, we use the approach described in [31], based on the SEIAHRV epidemiological model with the variables described in Table A1.

Table A1. Variables in the SEIAHRV model reported in [31]. All variables are proportions with respect to the initial population and the index i refers to the age group.

$S_i$    Susceptible individuals
$E_i$    Exposed individuals (non-contagious)

This is a nonlinear delayed set of ODEs due to the time delay between infection, hospitalization and death. The different parameter values used in the model are given in Table 1 in [31]. The force of infection in Equation (A1) is defined in terms of $\beta_{i,j}$, the infection rate from an infected individual of age group $j$ to an individual of age group $i$. The epidemiological model is calibrated using the time series of deaths in order to avoid the significant under-notification of cases [33]. The value of $\beta$ in Equation (3) is an age-independent estimate obtained from the total proportion of susceptible individuals, given by

$$S(t) = \frac{1}{P_{tot}} \sum_i P_i S_i(t),$$

where $P_i$ is the population in age group $i$ and $P_{tot}$ the total population for the given locality. The model is fitted from the time series of deaths as described in [31].
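The aggregation of the age-structured susceptible proportions into a single value can be sketched as a population-weighted average. This assumes that each $S_i$ is expressed as a proportion of its own age group and that the total is $S(t) = \sum_i P_i S_i(t) / P_{tot}$; both the assumed form and the function name are illustrative and should be checked against [31].

```python
from typing import Sequence


def total_susceptible(
    s_by_age: Sequence[float], population_by_age: Sequence[float]
) -> float:
    """Population-weighted average of the susceptible proportion per age group."""
    if len(s_by_age) != len(population_by_age):
        raise ValueError("one susceptible proportion is needed per age group")
    p_tot = sum(population_by_age)
    return sum(p * s for p, s in zip(population_by_age, s_by_age)) / p_tot


# Two hypothetical age groups: 300 people with 90% susceptible, 100 with 50%.
s_total = total_susceptible([0.9, 0.5], [300.0, 100.0])
```

In this toy case 270 of the 300 individuals in the first group and 50 of the 100 in the second remain susceptible, giving an overall proportion of 0.8.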