Unpredictable, Counter-Intuitive Geoclimatic and Demographic Correlations of COVID-19 Spread Rates

Simple Summary
Rates of viral spread during first and second waves of the COVID-19 pandemic for USA states, and for consecutive nonoverlapping periods of 20 days for the USA and 51 countries across the globe, associate with mean temperature, elevation, population density and age. Some associations switch directions when comparing different periods. Even population density, which presumably should always increase viral spread, at some periods seems to decrease spread rates. We also observed systematic inversions between spread rates estimated at 80–100 day intervals. These patterns remain unexplained and suggest difficulties in managing and predicting the pandemic, in particular, negative correlations between population density and spread rates, which were observed in independent samples and at different periods. Putatively, confinements could produce these patterns, by selecting viral strains with longer contagiousness and/or latent periods.

Abstract
We present spread parameters for first and second waves of the COVID-19 pandemic for USA states, and for consecutive nonoverlapping periods of 20 days for the USA and 51 countries across the globe. We studied spread rates in the USA states and 51 countries, and analyzed associations between spread rates at different periods, and with temperature, elevation, population density and age. USA first/second wave spread rates increase/decrease with population density, and are uncorrelated with temperature and median population age. Spread rates are systematically inversely proportional to those estimated 80–100 days later. Ascending/descending phases of the same wave only partially explain this. Directions of correlations with factors such as temperature and median age flip. Changes in environmental trends of the COVID-19 pandemic remain unpredictable; predictions based on classical epidemiological knowledge are highly uncertain. Negative associations between population density and spread rates, observed in independent samples and at different periods, are most surprising. We suggest that systematic negative associations between spread rates 80–100 days apart could result from confinements selecting for greater contagiousness, a potential double-edged sword effect of confinements.


Introduction
Spread parameters of daily new confirmed COVID-19 cases (calculated as the slope of their logarithmic regression curve) estimate viral contagiousness. Previously, it was shown that first wave spread, when comparing different countries, decreases with mean annual temperature [1], and that the opposite trend with temperature occurs for second wave spread parameters [2]. This is in line with observations on variation in spread across different regions of Italy, in March 2020 (first wave period, negative correlation with temperature) and in May 2020 (second wave period, positive correlation with temperature) [3].
First and second wave spreads differ also in terms of other factors: The spread of the first wave increases with the median population age and decreases the later the date of its onset. On the contrary, the spread of the second wave decreases with the median population age and increases the later the date of its onset. First and second waves also differ because we detected no associations between second wave slopes and mean country elevation, while first wave slopes increase with elevation up to 900 m and decrease beyond that approximate altitude [2].
Inversion of trends for these independent covariates is difficult to explain. One could invoke different explanations for each factor. A common, most parsimonious explanation could involve deterministic mutation dynamics, resulting in parallel evolution of distinct virus populations [4,5].
This working hypothesis expects cycles in the epidemiological behavior of the virus, where pattern inversions occur each time a sufficient number of mutations cumulated to cause a switch in secondary structure of the single-stranded RNA coronavirus. This would explain the inversion in patterns between first and second waves. Because mutations cumulate on average proportionally to time, one expects that epidemiological pattern inversions occur after a fixed time interval.
Here, we calculate spread rates for daily new confirmed COVID-19 cases for consecutive, nonoverlapping time windows of 20 days since the start of the pandemic in 51 different countries, and in each of the 50 states of the USA (and the District of Columbia), until the end of 2020. We examine these spread rates for inversions in patterns of correlations with some environmental factors (temperature, elevation) and population properties (density, median age), and across spread rates for different time windows. This approach, which calculates spread rates for each consecutive period of 20 days, avoids difficulties inherent to the objective definition of the start and end of different pandemic waves.

Materials and Methods
We used the same methods as in previous analyses [1,2]. The coefficients (slopes) of regression analyses are considered as estimates of viral spread rates. We fit the exponential model y = a × exp(b × x), where y is the daily number of new confirmed COVID-19 cases, x is the number of days since wave onset, a is a constant and b is the slope. The log-transformed version of this is ln y = ln a + b × x. Daily numbers of new cases per country are from www.worldometers.info/coronavirus/ (accessed on 25 December 2020).
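The log-transformed model can be fitted by ordinary least squares. The following is a minimal sketch, not the authors' actual procedure: the function name `spread_rate` and the synthetic series are hypothetical, and skipping days with zero counts is an assumption made here because ln 0 is undefined.

```python
import math

def spread_rate(daily_cases):
    """Least-squares slope b of ln(y) = ln(a) + b*x, with x = 0, 1, 2, ... days."""
    # Assumption of this sketch: days without positive counts are skipped,
    # since their logarithm is undefined.
    points = [(x, math.log(y)) for x, y in enumerate(daily_cases) if y > 0]
    if len(points) < 2:
        raise ValueError("need at least two days with positive counts")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_ly = sum(ly for _, ly in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in points)
    return sxy / sxx

# Hypothetical synthetic series following y = 5 * exp(0.19 * x) exactly.
synthetic = [5 * math.exp(0.19 * x) for x in range(20)]
slope = spread_rate(synthetic)  # recovers b = 0.19 up to rounding error
```

On real data the slope is negative whenever daily case numbers decline over the fitted period, which is how descending phases of a wave appear in this framework.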
Spread rates are determined for windows of 20 consecutive days, where consecutive time windows are nonoverlapping. Hence, if the first new case in a country occurred 280 days ago, 14 spread rates were calculated, one for each of the nonoverlapping 20-day periods. Spread rates are estimated for these windows as described above, as the slope of the exponential model of new daily cases with time.
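The windowing step can be sketched as follows. This is an illustration under stated assumptions: the helper names `log_slope` and `window_rates` are hypothetical, a trailing incomplete window is dropped, and the 280-day series is synthetic (growth followed by decline), not data from the study.

```python
import math

WINDOW = 20  # days per window, as in the text

def log_slope(counts):
    """Least-squares slope of ln(count) against day index within one window."""
    pts = [(x, math.log(y)) for x, y in enumerate(counts) if y > 0]
    if len(pts) < 2:
        return None  # too few positive counts to fit a slope
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(v for _, v in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (v - my) for x, v in pts)
    return sxy / sxx

def window_rates(daily_cases, window=WINDOW):
    """One slope per complete, nonoverlapping window; a partial last window is dropped."""
    n_windows = len(daily_cases) // window
    return [log_slope(daily_cases[i * window:(i + 1) * window])
            for i in range(n_windows)]

# Synthetic 280-day series: growth at 0.1/day for 140 days, then decline at 0.05/day.
series = [10 * math.exp(0.1 * d) for d in range(140)]
peak = series[-1]
series += [peak * math.exp(-0.05 * d) for d in range(1, 141)]
rates = window_rates(series)  # 280 days give 14 windows, hence 14 spread rates
```

Because windows are fixed by the calendar rather than by the shape of the curve, no decision about where a wave starts or ends is needed.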
For countries, sources for mean elevation, mean temperature, population density, median population age and numbers of genomic variants are as previously described [2]. For the 50 states of the USA, we used the sources listed in Table 1. We also analyze daily new confirmed case data for each of the 50 states of the USA and the District of Columbia, estimating first and second wave spread rates with exactly the same methods as used previously for countries across the globe [1,2].
Note that the slopes we evaluate from daily data on new cases estimate the acceleration in the increase of new cases: in the first differential model proposed for the COVID-19 outbreak [6], the variable is the number of cumulative cases at time t, denoted X_t, the velocity is the number of daily new cases, denoted Y_t = (X_t − X_{t−1})/(t − (t − 1)), and the acceleration is the slope of the curve of these daily new cases, (Y_t − Y_{t−1})/(t − (t − 1)). This acceleration, as observed in simulations of the differential model, has a first phase where daily new cases grow exponentially, until an inflection point is reached with acceleration zero before the saturation phase.
We tested for normality of the distribution of variables used in correlation tests with the Kolmogorov-Smirnov test, as available at Kolmogorov-Smirnov Calculator (Test of Normality) (socscistatistics.com, accessed on 20 December 2020). For variables statistically significantly diverging from normality (p < 0.05), we calculated both parametric Pearson and nonparametric Spearman rank correlation coefficients at Spearman's Rho Calculator (Correlation Coefficient) (socscistatistics.com, accessed on 20 December 2020).

Figure 1 plots the total number of genomic variants described for a country since the start of the pandemic, as of 20 December 2020, as a function of the number of days since 1 January 2020 when the cumulated number of confirmed cases in that country reached 100. This method for estimating the duration of the pandemic is not adequate for states/countries where the pandemic presumably stopped before the end of 2020; these are excluded from analyses. Overall, the longer the pandemic has been ongoing in a country (fewer days since 1 January), the more mutants have been described in that country. The negative correlation confirms that numbers of mutations are proportional to time. A similar result was found on 31 May 2020 [2]. Hence, this phenomenon is stable over time. It is probable that numbers of variants reported from a country also reflect the sequencing efforts in that country, which vary among countries. However, the association in Figure 1 is too strong to be due to this factor alone, and probably also reflects, perhaps to a greater extent, an actual natural cumulation of mutant variants. Indeed, along that rationale, population size should also increase variant numbers, but no such effect is observed, as small and large countries fit the curve equally well.

Table 2 presents, for each of the 50 states of the USA, onset dates and slopes of first and second waves, as detected by visual inspection as previously [2]. By mid-August, 42 of the 50 states had a second wave. Second wave slopes are lower than first wave slopes in all but three states: Oklahoma (slopes basically identical), and Kansas and Ohio (second wave spread greater than first wave spread). Hence, overall, spread decreased from the first to the second wave, from mean first wave slope = 0.1902 to mean second wave slope = 0.0585.

Spread Parameters for First and Second Waves in the USA
In the USA, first wave slopes decrease with time since first wave onset (r = −0.5306, two-tailed p = 0.000074, Spearman rank correlation coefficient rs = 0.415, p = 0.003). This is similar to the decrease in slopes previously reported for comparisons across countries [2]. Comparing countries, second wave slopes increase with time since second wave onset [2]. However, for the USA, no such increase could be detected; the trend instead resembles that observed for the first wave (r = −0.29, p = 0.06, Spearman rank correlation coefficient rs = −0.365, p = 0.019). Also contrasting with previous analyses of variation between countries, there is no association in the USA between first and second wave slopes and mean annual temperature. A positive association exists between first wave slope and time since second wave onset (r = 0.4469, two-tailed p = 0.003), which could indicate that high initial spread rates contribute to early onset of later waves. Note that this effect does not explain the absence of second waves in several states with particularly high first waves, such as Illinois, New Jersey and New York.
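The two kinds of coefficients reported above, Pearson r and Spearman rs, can be computed as in the sketch below. The study used online calculators; this standard-library version is only an illustration, and the input values are hypothetical rather than taken from Table 2.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical values: slopes fall monotonically but nonlinearly with time.
days_since_onset = [10, 20, 30, 40, 50, 60]
slopes = [0.40, 0.20, 0.10, 0.05, 0.03, 0.02]
r = pearson(days_since_onset, slopes)
rs = spearman(days_since_onset, slopes)
```

For a monotonic but curved relation such as this one, rs reaches its extreme value while r does not, which is why both are worth reporting when a variable departs from normality.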
First wave slopes increase with population density (r = 0.3777, one-tailed p = 0.0034, Figure 2A). This association is expected, but was not observed when comparing different countries [1,2]. Six New England states have relatively low slopes considering their densities. Excluding them from calculations increases the strength of this correlation with density (r = 0.5538, one-tailed p = 0.00005).
Density should in principle increase spread rate. This was not observed for first and second wave slopes when comparing countries [1,2]. However, this density principle is confirmed for first wave slopes from the USA (Figure 2A). Unexpectedly, second wave slopes decrease with population density (r = −0.459, two-tailed p = 0.0022, Figure 2B). Unexplained pattern inversions between first and second waves have been reported for covariates such as temperature, time since wave onset and median population age when comparing different countries [2]. The decrease of slopes with density in the USA is the most astounding of these results because it does not fit current knowledge on disease spread.

Visual Inspection vs. Objective Statistical Analysis
Arguably, determining onsets of waves from visual examinations of graphs is subjective. This issue has been raised in the past [7,8], but has no obvious simple solutions and requires extensive simulation analyses tailored to each specific dataset, meaning in this case to each state [9,10]. We therefore use a simplified method. Statistical analyses based on calculating Pearson correlation coefficients r for a running window of 20 days determine a local maximum for r within five days of the second wave onset date determined visually for 80 percent of the countries examined [2]. We applied this method for the 42 second waves detected for the USA (Dis, Table 2). Most of the onset dates determined by running windows (62%) are within five days of the date determined visually, in line with similar previous analyses for other countries [2]. The spread rates calculated for nonoverlapping, consecutive windows of 20 days presented below do not suffer from the difficulties inherent to defining wave onset.

Spread Rates for Consecutive Nonoverlapping 20-Day Windows
We estimated spread rates for 14 and 15 consecutive, nonoverlapping 20-day periods for the 50 states of the USA and the District of Columbia (Table 3), and 51 countries across the globe (Table 4). Note that because the start of the pandemic varies among states and countries, spread rates are not exactly for the same period when comparing states/countries. Numbers of states and countries where rates are positive are highest for the earliest period (approximately end of February to beginning of March, day 20). This number is lowest during the warmer spring and summer months. This is in line with an overall decrease of spread with high temperatures.

Figure 2. Spread parameters (slope of exponential regression) of first (A) and second (B) waves in USA states as a function of population density. The trend is inverted between first and second waves. (A) The New England states with high densities (filled circles) follow a different pattern than the rest of the states, with a slower increase in spread with increasing density. (B) The second wave slopes (as determined in August 2020) are overall lower than first wave slopes, and counter-intuitively decrease with population density. Patterns remain statistically significant with nonparametric Spearman rank correlation tests, see text.

Table 5 presents the pairwise Pearson correlation coefficients for the spread rate data from Tables 3 and 4 (USA, above diagonal, data from Table 3; countries across the globe, below diagonal, data from Table 4). These 91 and 105 r's include 16 cases (eight cases for the USA) where the correlation between spread rates for two periods is positive and statistically significant (p < 0.05, two-tailed tests), and 14 and 19 cases, respectively, where r is negative and statistically significant (p < 0.05, two-tailed tests). Among the 16 statistically significant positive cases, a statistically significant majority, 13, are between consecutive periods (two-tailed sign test, p = 0.011). This indicates that spread rates are relatively similar across two consecutive periods of 20 days, hence for about 40 days.

Table 5. Pearson correlation coefficients r (×100) between spread rates. Above the diagonal, consecutive nonoverlapping 20-day periods from the USA (Table 3), and, below the diagonal, from various countries (Table 4).

The distribution of statistically significant negative correlations among spread rates from Tables 3 and 4 in relation to time periods is wider than for positive correlations, but 13 among 14 cases are for periods separated by 40–140 days, on average 88 days with a standard deviation of 30 days (Figure 3). Two statistically significant negative correlations between spread rates are separated by 160–240 days.

Figure 3. Pearson correlation coefficients r between spread rates (Table 5) as a function of the number of days between the time frames when spread rates were estimated. Blue lines indicate critical values of r for p < 0.05 according to two-tailed tests. Most positive r's with p < 0.05 are for pairs of consecutive periods (20 days on the x axis, green box); most negative r's with p < 0.05 are for periods separated by approximately 80 days (orange box).

Table 6 presents Pearson correlation coefficients r between spread rates in Tables 3 and 4 and environmental variables from Tables 2 and 7. Results include inversions of directions of correlations, as also shown in Table 5 and Figure 3. For mean annual temperatures, around May (day 100), spread rates increase with temperature in the USA and in the world, as previously reported for countries across the world [2], and for the period of December (day 280, USA only). Temperature decreases spread rates at the end of summer and beginning of autumn in the USA and across the world. Results for mean altitudes show similar pattern inversions, including inversion between consecutive periods at days 180 vs. 200 (August vs. start of September, USA only). For the USA, an inversion in correlation directions also occurs with median age, with positive vs. negative statistically significant correlations at days 180 vs. 200. The negative correlation of spread rates with age corresponds to the time when schools open (end of August to beginning of September). Spread rates increase again with median age at the end of autumn (day 280, first part of December). For countries across the world, spread rates increase with median age at pandemic onset, and in summer-autumn, for the period spanning from day 160 to 240. In contrast, for the spring periods from days 60 to 100, spread rates decrease with median age when comparing countries across the world. This might result from seasonal differences between age-related behaviors, but other explanations are also possible.

Table 6. Pearson correlation coefficients r (×100) between spread rates for consecutive nonoverlapping 20-day periods from Tables 3 and 4 with environmental variables from Tables 2 and 6. Rows starting with USA give r between spread rates and environmental factors for the US states, rows starting with world are for countries. Rows starting with rs present nonparametric Spearman rank correlation coefficients rs for the two variables, altitude and density, that differ at p < 0.05 from normality according to Kolmogorov-Smirnov tests. Bold and underlined values indicate correlations that are statistically significant at p < 0.05 according to two- and one-tailed tests, respectively.

The only statistically significant correlations with population density when comparing countries across the world are detected using nonparametric Spearman rank correlation analyses, with the only positive associations fitting trivial expectation on consecutive periods spanning days 180 and 200 (p < 0.05, one-tailed tests). Density is expected to increase spread rates. Such a positive association is observed at pandemic onset in the USA. This pattern is inverted for three consecutive periods in the spring in the USA and at days 80 and 280 for countries across the world.
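The pairwise comparison of periods summarized in Table 5 and Figure 3 can be sketched as follows: each pair of periods yields one Pearson r across regions, indexed by the lag in days between the two periods. This is an illustration only; `lagged_correlations` is a hypothetical helper, the toy rates are generated deterministically and are not the values of Tables 3 and 4, and missing values are not handled.

```python
import math
from itertools import combinations

PERIOD_DAYS = 20  # length of each nonoverlapping window

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(rates_by_region):
    """Map {region: [rate per period]} to a list of (lag_in_days, r),
    one entry for every pair of distinct periods."""
    regions = list(rates_by_region)
    n_periods = len(rates_by_region[regions[0]])
    out = []
    for i, j in combinations(range(n_periods), 2):
        xs = [rates_by_region[reg][i] for reg in regions]
        ys = [rates_by_region[reg][j] for reg in regions]
        out.append(((j - i) * PERIOD_DAYS, pearson(xs, ys)))
    return out

# Hypothetical toy data: 6 regions, 14 periods each.
toy = {f"region_{k}": [math.sin(1.3 * k + 0.7 * p) for p in range(14)]
       for k in range(6)}
pairs = lagged_correlations(toy)
n_pairs = len(pairs)                                       # 14 * 13 / 2 = 91
n_consecutive = sum(1 for lag, _ in pairs if lag == PERIOD_DAYS)  # 13
```

With 14 periods there are 91 pairs and with 15 periods 105, matching the numbers of coefficients quoted for Table 5. Plotting r against lag for such a list gives the layout described for Figure 3.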

Environmental Correlates of Spread Rates across Time
Negative associations of viral spread rate with density are unexpected and make no sense in terms of classical epidemiological understanding, as these imply greater spread rates in populations with low density.

A Potential Epidemiological Explanation for Spread Rate Inversions
The inversion of patterns with time and other variables (altitude, temperature, median population age) for slopes of pandemic waves renders predictions particularly dubious. In addition, the negative correlation between second wave slopes and USA density (Figure 2B) contradicts a proven and accepted epidemiological principle. The inversion of directions of correlations between spread rates and several environmental factors (Table 6) confirms previous observations regarding temperature [2]. Periods until inversions vary among environmental factors. For example, associations of spread rates with age flip directions from increasing to decreasing with median age at the transition between summer vacations and the beginning of the school year, at least in the USA. One could find for each of the inversions in directions in statistically significant r values in Table 6 a specific explanation, such as that stated above about school openings. However, the almost systematic inversion of correlation directions for most environmental factors suggests a common cause for most of these pattern inversions.
The case for a more general cause for pattern inversions is strengthened by correlations between spread rates at different periods, for USA states and for countries across the world. The spread rate hierarchy among US states and among countries is systematically inverted after 80–90 days (Table 4 and Figure 3). The cause for these inversions is unclear, but this inversion pattern is systematic, as spread rates at any period tend to be inversely proportional to those at a later time, typically 80–90 days later. Note that 80–90-day intervals between slopes are too long for these slopes to be part of the ascending and descending parts of the same wave, as shown in the next section.
A first potential explanation is that spread rates vary according to cycles of maxima, intermediate values and minima in daily case numbers, and that countries with the highest maxima have the lowest minima and those with the lowest maxima have the highest minima, about 80–100 days later. This hypothesis assumes that the same rules also apply to periods with intermediate spread rates, and that ups and downs in spread rates are synchronized between most countries. The latter assumption is adequate for the first wave in late winter 2020, but could not hold throughout the complete period analyzed, also because countries and states vary in confinement periods, lengths and efficiencies, which causes decreases in spread rates. Hence, additional analyses are required to understand the causes of these pattern inversions; however, the systematic aspect of these inversions in spread rates is such that they should be accounted for in predictions and policymaking. This is because results of correlation analyses hint at the possibility that steep decreases in spread rates cause steep later increases. In that respect, it might be optimal to mitigate variations in spread rates by avoiding drastic policies that suddenly increase or decrease spread rates.

Negative Correlations between Spread Rates Are Not between Ascending and Descending Parts of the Same Wave
We test here the hypothesis that negative correlations between spread rates from different periods reflect negative associations between the steepness of the rate during the ascending and the descending parts of the same wave. This is easily tested using data in Tables 3 and 4, because spread rates in the ascending phase of a wave are positive, and those in the descending phase of a wave are by definition negative. So, if the hypothesis is correct, for statistically significant negative correlations in Table 5, we should systematically see opposite signs for rates in the two periods considered. If this inversion of signs is not observed, the ascending/descending phase hypothesis is incorrect.
We tested this for each of the negative correlations with p < 0.05 in Table 5, as presented below for a specific example. We considered the negative association between spread rates for the USA between periods of 20 days starting at days 100 and 240 after the start of the outbreak. For these periods, among the 50 states and the District of Columbia, there are 26 and 48 positive spread rates, respectively. Hence, random assortment of signs between these two periods predicts 26 × 48/51 + 25 × 3/51 = 25.94 cases where spread rates in the two periods have the same sign, and 51 − 25.94 = 25.06 cases where signs are inverted between the two periods. The observed number of inverted signs between these periods is 28. This slight increase as compared to the expected number has p = 0.41 according to a chi-square test.
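The arithmetic of this example can be retraced with the sketch below. It assumes a chi-square goodness-of-fit test on two categories (same sign, inverted sign) with one degree of freedom and no continuity correction; the text does not specify these details, but this reading reproduces the reported figures. The function name is hypothetical.

```python
import math

def sign_inversion_test(n_total, n_pos_a, n_pos_b, observed_inverted):
    """Expected same-sign and inverted-sign counts under random assortment,
    plus a chi-square statistic and p-value (assumed: 1 degree of freedom)."""
    n_neg_a = n_total - n_pos_a
    n_neg_b = n_total - n_pos_b
    expected_same = (n_pos_a * n_pos_b + n_neg_a * n_neg_b) / n_total
    expected_inverted = n_total - expected_same
    observed_same = n_total - observed_inverted
    chi2 = ((observed_inverted - expected_inverted) ** 2 / expected_inverted
            + (observed_same - expected_same) ** 2 / expected_same)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected_same, expected_inverted, chi2, p_value

# Values from the example: 51 units, 26 and 48 positive rates, 28 inversions.
expected_same, expected_inverted, chi2, p = sign_inversion_test(51, 26, 48, 28)
# expected_same rounds to 25.94, expected_inverted to 25.06, and p to 0.41.
```

The same function can be applied to each statistically significant negative correlation by substituting the counts of positive rates for the two periods concerned.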
This analysis was done for all negative correlations with p < 0.05 in Table 5. There are 10 (USA, 5; countries across the world, 5) negative associations between spread rates where sign inversion between periods is significantly greater than random predictions, and 23 cases where this effect is not statistically significant as in the above example. This means that the hypothesis of ascending/descending phases of a wave could contribute in some cases to the phenomenon of negative associations observed between spread rates from different periods, but is not the main cause of these negative associations.

Does Confinement Increase Ulterior Spread Rates?
One mechanism that could be invoked in this context is that strict confinement policies select for highly contagious viral strains and/or long contagion periods that are more likely to overcome confined conditions. As case numbers decrease during confinements, reopening societies to normal activity unleashes these more contagious viral strains while potentially competing strains with lower infection abilities have disappeared, resulting in higher spread rates at the level of the whole population. This dynamic would presumably be visible at the level of analyses such as those done here at time lags of 80–90 days, and would presumably cause the systematic inversions in spread rates among countries and regions described here. Note that the natural trajectory of viral evolution typically increases contagiousness while at the same time decreasing pathogenicity, also in the absence of confinement. Confinements would hasten the process of evolving greater contagiousness. Steep increases in recent months (autumn of 2020) in the UK, apparently associated with a new highly contagious viral variant, seem in line with this working hypothesis (Figure 4). A coolheaded analysis and evaluation of the above and future results is crucial for better managing the pandemic.

Intrinsic vs. Extrinsic Constraints
We discuss below additional potential, more speculative and genetics-oriented explanations for inversions of directions of correlations with spread rates.
One hypothesis is that self-hybridization of the virus' single-stranded RNA genome protects nucleotides forming stems against mutations, while favoring mutations in loops [11]. Mutation cumulation presumably causes deterministic switches between few optimal structures which differ in their properties in relation to temperature, etc. Such switches between secondary structures have been suggested for COVID-19 after small insertion/deletion mutations [12].
These patterns are reminiscent of the little-known phenomenon of negative heritability. Usually, heritabilities of traits are such that offspring resemble their parents: if a parent is above average for a given trait, his/her offspring will on average also be above average, and parents below average for that trait have on average offspring which are below average for that trait. In short, heritabilities are correlations between traits of parents and offspring. These correlations are usually positive. Surprisingly, in some cases, negative correlations were observed, meaning negative heritabilities, where above average parents produce below average offspring for given traits, and vice versa for below average parents [13,14]. These phenomena are not statistical artefacts [15] but are difficult to reproduce empirically [16]. This is expected when assuming negative heritability results from selection under changing environmental conditions [17].
Inversion of trends, such as for negative heritabilities, could presumably result from drastic environmental changes. For example, levels of channeling of Sorghum bicolor plant populations towards developmental trajectories better adapted to NaCl salinity after early low level NaCl exposure increased with population variability [18] in the first generation exposed to salinity, but decreased with populational variability in their offspring [19,20]. This is in line with the concept that COVID-19 adapts to its new human host and to various environments inhabited by that host.
It is possible that a large part of the variation in slopes during the pandemic is due to factors intrinsic to evolutionary trajectories of the virus, genetic ones included. This does not exclude environmental effects due to host population age structure, density, temperature and altitude, among others. However, inversions of trends with these cofactors at different periods of the pandemic suggest that unknown intrinsic factors have major effects on the evolution of the pandemic.
Another approach stipulates that variations across different locations reflect to large extents the dynamics at a single location, at different times. This principle where spatial variation is tantamount to temporal variation has been observed in many different contexts, such as in astrophysics where larger distances are interpreted as reflecting more ancient times [21], and in forest stage succession [22–24], ecological communities [25] and biomolecules such as those involved in ribosomes [26–31]. The principle is also applied in the renowned Haeckelian statement "ontogeny recapitulates phylogeny" [32,33]. Variation among individuals in directional asymmetry is interpreted as reflecting different termination timings of development [34]. The principle exists within the relationship of the genetic code and the ribosome [35]. Similarly, the average order of translation of amino acids in single proteins reflects the order of integration of amino acids in the genetic code [36]. At the level of the pandemic, this would suggest that the macroscopic spread rates correspond to viral replication cycles and/or rates of production of viral particles. This hypothesis should be considered as suggestive at best, but could prove useful as a working hypothesis for the long-term dynamics of this still unknown disease at the level of individuals.
Pattern inversions add to the problem of uncertainty in the data and in predictions [37,38]. Analyses of spread rates estimated for consecutive nonoverlapping periods of 20 days confirm results obtained for visually determined waves. They show systematic inversions in spread rates between countries/regions, which occur approximately every 80–90 days, across the whole period studied. The causes for this are not well understood but might indicate compensation mechanisms for reduced spread rates during confinement in periods following the confinement. This reduction has been shown and analyzed in many countries that imposed a confinement, such as England, France, Germany, Iran, Italy, Netherlands, Spain, United States and China, compared to countries such as Sweden and South Korea, which did not implement mandatory stay-at-home confinements [39–41]. Stochastic modeling of the reducing effect of confinement is possible and shows a control depending on the characteristics of the countries concerned and on the early or late nature of this mitigation policy [42].
However, confinements might select for more contagious viral strains, and/or viral strains that are contagious for longer periods. The increased contagiousness and possibly pathogenicity [43–45] of these viruses would cause higher spread rates after deconfinement and therefore reduce the positive impact of restrictive measures such as the lockdown. This working hypothesis on effects of lockdowns arises from the results presented above and could be tested by later analyses specifically designed to examine it.

Conclusions
In the USA, first and second wave slopes are not correlated with temperature, median age or time since wave onset, but are correlated with population density. The principle of inverted trends between first and second waves holds; density effects are positive/negative for first/second wave slopes. Negative associations between population density and viral spread rates are also observed when examining countries across the world. Such negative associations of viral spread with population density are not compatible with our present understanding of the epidemiology of infectious agents. These inversions of directions of associations between viral spread rates and environmental and populational variables are confirmed by analyses of consecutive nonoverlapping periods of 20 days. We also observe that viral spread rates at any time during the study period are inversely proportional to rates 80–90 days later. These results were observed in two independent samples, US states and 51 countries across the globe. They stress that pandemic dynamics are misunderstood and probably mismanaged.

Data Availability Statement: All the data are obtained from readily available public databases that are cited in the text.