COVID-19: A Comparative Study of Contagions Peaks in Cities from Europe and the Americas

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a coronavirus, one of a group of viruses that provoke illnesses ranging from the common cold to more serious illnesses such as pneumonia. COVID-19 started in China and spread rapidly from a single city to an entire country in just 30 days and to the rest of the world in no more than 3 months. Several studies have tried to model the behavior of COVID-19 in diverse regions, based on differential equations of the SIR and stochastic SIR type, and their extensions. In this article, a statistical analysis of daily confirmed COVID-19 cases reported in eleven different cities in Europe and the Americas is conducted. Log-linear models are proposed to model the rise or drop in the number of positive cases reported daily. A classification analysis of the estimated slopes is performed, allowing a comparison of the eleven cities at different epidemic peaks. By rescaling the curves, similar behaviors among rises and drops in different cities are found, independent of socioeconomic conditions and of the type of quarantine measures taken, whether more or less restrictive. The log-linear model appears to be suitable for modeling the incidence of COVID-19 both in rises and drops.


Introduction
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by one of a group of viruses called coronaviruses, which provoke illnesses ranging from the common cold to more serious illnesses such as pneumonia, Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS). The coronavirus strain that caused the outbreak in China is novel and was not previously known; it has been rapidly exported to other countries and has severely affected different aspects of human health worldwide. On 11 March 2020, the World Health Organization declared the COVID-19 outbreak a pandemic. The virus spread rapidly from a single city to an entire country in just 30 days and to the rest of the world in no more than 3 months. The rapid geographic spread and the sudden increase in the number of cases surprised and quickly overwhelmed the health and public health services of several countries. Hence, governments in different countries promoted border closures, social distancing, school closures and other mandatory control measures in order to reduce the spread of the virus.
Several studies have tried to model the behavior of COVID-19 in diverse regions. Different models based on differential equations of the SIR and stochastic SIR type (and their extensions) have been applied (see [1]), and statistical comparisons of the COVID-19 dynamics in different countries, especially regarding the effect of governmental measures, have been carried out (see [2][3][4][5]).
Classical susceptible-infectious-removed (SIR) epidemic models have been widely used in the literature to model infectious diseases. These mathematical models are structured according to the number of susceptible (S), infectious (I) and removed (recovered or deceased) individuals (R). The dynamic behavior of these models depends mainly on the basic reproduction number. The disease is controlled and subsequently eliminated if the basic reproduction number is less than one and continues to spread otherwise [6]. However, in practice, infectious diseases may exhibit periodic oscillations and other nonlinear phenomena during the outbreak. Therefore, several studies have attempted to model this nonlinear behavior.
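For reference, the deterministic SIR model described above is usually written as follows. This is the standard textbook form, not a formula taken from this article; the symbols $\beta$ (transmission rate), $\gamma$ (removal rate) and $N$ (total population, with $N = S + I + R$) are assumptions introduced here for illustration.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \qquad R_0 = \frac{\beta}{\gamma}.
\end{aligned}
```

At the start of an outbreak $S \approx N$, so $dI/dt \approx \gamma (R_0 - 1) I$: the number of infectious individuals grows when $R_0 > 1$ and decays when $R_0 < 1$, which is the threshold behavior stated in the paragraph above.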
In this article, another direction was taken: comparing, across cities, the speed at which different contagion peaks were reached and the speed of the subsequent declines. The cities were chosen because their data were easily accessible and because of the type of containment measures adopted by their governments. These cities were Bogota, Lima, Lyon, London, Madrid, Mexico City, Marseille, Paris, New York, Rome and the Chile Metropolitan Region.
Several articles compare the impact of COVID-19 infections and mortality in different countries. For example, Ref. [7] studied the early impact from February to July 2020 in the Nordic nations of Sweden, Norway, Denmark and Finland using publicly available case data sources. Ref. [8] studied countries from Asia and Europe in terms of cases, death and case fatality rate. Ref. [9] investigated the spreading rate of COVID-19 comparing 10 European countries. In relation to the comparison of cities by COVID-19, Ref. [10] examined the impacts and outcomes of COVID-19 in three cities, New York, London and Tokyo, taking COVID-19-related deaths in these cities and their respective countries as a variable. Some differences were found, possibly explained in part by political, socioeconomic and cultural differences.
In this paper, we compared 11 cities with each other in terms of their upward and downward slopes at different contagion peaks. Since these cities had different population sizes, the numbers of new daily cases were not on the same scale. Moreover, these data were highly nonlinear. Therefore, the data were transformed by applying the natural logarithm, and a log-linear regression model was fitted.

Material and Methods
This study used different data sources. In Section 2.1, information about the chosen cities and the data studied is presented, while Section 2.2 explains how the data were modeled.

Dataset
A dataset was created containing daily infected cases of COVID-19 between 1 February 2020 and 30 August 2021 for eleven cities. Note that the data did not cover exactly the same dates for all cities. Table 1 shows the eleven cities and the abbreviations used, as well as the respective repositories.
In the sequel, for each city $i = 1, \dots, 11$, $N_i(t)$ denotes the daily number of infected cases of COVID-19 detected in city $i$ on day $t$, where the series of detected cases has been smoothed with a 7-day moving average to eliminate weekend variations. Figure 1 shows the daily number of infected cases for these cities up to 24 April 2021, which corresponds to the last day for which all cities had information.
As can be observed in Figure 1 and in the figures of Appendix A, depending on the city, the number of infected cases had one, two or three peaks. Table 2 summarizes the number of observed rises and drops for each city. Most of the cities had at least 2 peaks (2 upward and 2 downward movements), whereas London (LO) and Mexico City (ME) had only 1 peak.
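The 7-day smoothing step can be sketched as follows. The text does not say whether the moving average is trailing or centered, so the trailing window used here is an assumption, and the function name `moving_average` and the input series are hypothetical.

```python
def moving_average(cases, window=7):
    """Trailing moving average over `window` days.

    Assumption: a trailing window is used; the first window-1 days
    produce no output because their window is incomplete.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for day, value in enumerate(cases):
        running += value
        if day >= window:
            running -= cases[day - window]  # drop the day that left the window
        if day >= window - 1:
            smoothed.append(running / window)
    return smoothed


# Hypothetical two-week series with a weekend reporting dip (not real data).
raw = [100, 100, 100, 100, 100, 30, 30] * 2
smooth = moving_average(raw)
```

Because every 7-day window of this periodic series contains exactly one weekend, the smoothed series is flat, which is the weekly artifact the smoothing is meant to remove.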
To compare the development of the epidemic in different cities, which applied diverse control measures, it was necessary to normalize the curves by the maximum observed at each peak. More specifically, for $i = 1, \dots, 11$ and $j = 1, 2, 3$, $A_{ij}$ denotes the maximal value of $N_i$ for city $i$ at peak $j$, and $T_{ij}$ the day on which this maximum was reached, so that $N_i(T_{ij}) = A_{ij}$. For each city $i$, peak $j$ and time shift $t$, denote by $(CN)_{ij}(t) = N_i(T_{ij} + t)/A_{ij}$ the centered and normalized data. Figure 2 shows the normalized and centered data for each city and the corresponding peaks. In this way, the case curves of the different cities and peaks were overlaid on the same graph, so that all peaks coincided at time 0 (time translation). Thus, $t = -50$ represented, for each curve, 50 days prior to the corresponding peak. The curves were also normalized in height, by dividing each curve by its own maximum, so that the peak of each curve was 1. Curves from different locations and of different sizes could then be compared to see whether they responded to a similar mechanism. The normalized curves had similar behaviors even when distinct containment measures were taken.
In Section 2.2, two regression models for the logarithm of these normalized data, centered around their peaks, are proposed. In these models, increasing and decreasing slopes are estimated. Our goal is then to compare the cities in terms of these estimated slopes (see the cluster analysis in Sections 2.3 and 2.4).
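The centering and normalization described above can be illustrated with a short sketch, taking time 0 at the peak as the text describes. The function name `center_and_normalize`, the search window arguments and the two toy series are hypothetical; they are not the article's data.

```python
def center_and_normalize(cases, start, end):
    """Center a wave on its peak and scale its height to 1.

    The peak is searched in cases[start:end]. Returns (T, A, curve) where
    T is the peak day, A the peak value, and curve maps the shift t
    (days relative to the peak) to cases[T + t] / A.
    """
    window = cases[start:end]
    peak_value = max(window)                      # A_ij
    peak_day = start + window.index(peak_value)   # T_ij (first day attaining the maximum)
    curve = {day - peak_day: cases[day] / peak_value for day in range(start, end)}
    return peak_day, peak_value, curve


# Two toy waves of very different sizes but with the same shape.
small_city = [10, 20, 40, 80, 40, 20]
large_city = [500, 1000, 2000, 4000, 8000, 4000, 2000]

t_small, a_small, cn_small = center_and_normalize(small_city, 0, len(small_city))
t_large, a_large, cn_large = center_and_normalize(large_city, 0, len(large_city))
```

After rescaling, the two curves coincide on their common range of shifts even though the raw counts differ by two orders of magnitude, which is the kind of comparison the normalization is meant to allow.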

Regression Models
In this section, two regression models are proposed. The first one models each rise and each drop in full. The second one models the rises and drops in a neighborhood of the peak, that is, between the peak and the days on which the city crosses half of the corresponding peak value.

The Total Regression Model
Before defining the total model, the beginning of each rise and the end of each drop must be defined. The starting point of a rise was taken as the point at which a significant change in the slope was observed with respect to the pseudo-steady state. The same reasoning was applied to the end of a drop. For example, for the city of Bogota, Figure 3 plots the cases between February 2020 and August 2021. Rises are identified with circles and drops with asterisks; sometimes an asterisk is superimposed on a circle. Three rises and drops can be isolated. Figures 4 and 5 show the rises and drops of the logarithm of daily reported cases. For the eleven cities, the daily reported cases and the rises and drops can be found in Appendix A.
Denote by $R$ a rise and by $D$ a drop for the total model (full rise and drop). For a city $i$ and a peak $j$, $T^R_{ij}$ denotes the starting point of a rise and $T^D_{ij}$ the end point of a drop. Recall that $T_{ij}$, defined in Section 2.1, is the day on which the maximum value of the peak is reached. The following model is then considered:
$$\log N_i(T^R_{ij} + s) = \alpha^R_{ij} + \beta^R_{ij}\, s + \varepsilon^R_{i,j}(s), \quad s = 0, \dots, T_{ij} - T^R_{ij}, \tag{1}$$
$$\log N_i(T_{ij} + s) = \alpha^D_{ij} + \beta^D_{ij}\, s + \varepsilon^D_{i,j}(s), \quad s = 0, \dots, T^D_{ij} - T_{ij}, \tag{2}$$
where $\varepsilon^R_{i,j}(s)$ and $\varepsilon^D_{i,j}(s)$ are independent white noises. Models (1) and (2) are log-linear regression models. Estimates for the slope parameters $\beta^R_{ij}$, $\beta^D_{ij}$ and the intercept parameters $\alpha^R_{ij}$, $\alpha^D_{ij}$ were obtained using the classical least squares estimators. The estimated upward and downward slopes are given in Table 3 together with their goodness of fit.

Table 3. Estimation in the total regression model.

Note that the starting and end points of the rises and the drops were not entirely well-defined, as can be seen in the figures of Appendix A. Therefore, the fitted model could vary significantly depending on the choice of these points. This was one of the reasons for considering the 50% model defined in Section 2.2.2, which does not require these points.
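A log-linear least squares fit of the kind used in the total model can be sketched in a few lines. This is a minimal illustration on a synthetic, noise-free rise, not the authors' code or data; the function name `fit_log_linear` and the synthetic parameters (50 initial cases, daily log-growth of 0.05) are assumptions made for the example.

```python
import math


def fit_log_linear(cases):
    """Ordinary least squares fit of log(cases[s]) = alpha + beta * s.

    Requires at least two strictly positive values.
    Returns (alpha, beta, r_squared).
    """
    n = len(cases)
    xs = list(range(n))
    ys = [math.log(c) for c in cases]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                    # slope: daily change in log cases
    alpha = y_mean - beta * x_mean      # intercept: log cases at s = 0
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return alpha, beta, r_squared


# Synthetic rise (hypothetical): 50 cases growing by a factor exp(0.05) per day.
synthetic_rise = [50 * math.exp(0.05 * s) for s in range(30)]
alpha, beta, r_squared = fit_log_linear(synthetic_rise)
```

On this noise-free series the fit recovers the slope 0.05 and the intercept log(50); with real data the choice of the start and end days of the rise changes which points enter the fit, which is the sensitivity noted above.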

The 50% Regression Model
Denote by $r$ a rise and by $d$ a drop for the 50% model. Moreover, $T^r_{ij} = \max\{t < T_{ij} : N_i(t) \le A_{ij}/2\}$ is defined as the day on which city $i$ exceeds half of the corresponding peak $j$. In a similar way, $T^d_{ij} = \min\{t > T_{ij} : N_i(t) < A_{ij}/2\}$ is defined as the first day after peak $j$ on which $N_i$ falls below half of the peak.
Our goal was to study the relationship between time and the series $N_i(T^r_{ij} + s)$ and $N_i(T_{ij} + s)$:
$$\log N_i(T^r_{ij} + s) = \alpha^r_{ij} + \beta^r_{ij}\, s + \varepsilon^r_{i,j}(s), \quad s = 0, \dots, T_{ij} - T^r_{ij},$$
$$\log N_i(T_{ij} + s) = \alpha^d_{ij} + \beta^d_{ij}\, s + \varepsilon^d_{i,j}(s), \quad s = 0, \dots, T^d_{ij} - T_{ij},$$
where $\varepsilon^r_{i,j}$ and $\varepsilon^d_{i,j}$ are independent white noise processes. As before, least squares estimators were calculated for the intercepts and the slopes (see Table 4). For the rises, the highest slope was observed for the first rise of Madrid, 0.11 over $|T_{ij} - T^r_{ij}| = 6$ days. The lowest slope was for the second rise of Marseille, 0.00066 over $|T_{ij} - T^r_{ij}| = 82$ days, but with an $R^2$ close to zero. Most of the models had a good $R^2$, several of them close to or greater than 0.9. Only 4 had values of $R^2$ lower than 0.7: the second rise of RM, the third rise of MD, the second rise of MR and the first rise of ME. With respect to the drops, the slopes took values between −0.068 (first drop of LY) and −0.0068 (second drop of NY). Only one case had an $R^2 < 0.6$, the second drop of MR. The limiting case corresponded to the first drop of Lima with $R^2 = 0.56$.
There were some drops in which the number of reported cases failed to fall below 50% of the peak before a new rise began, a phenomenon that we called the wadding effect. In the figures given in Appendix A, this can be seen in the following cases: 3 BO goes up (Figure A2); 1 LI goes down (Figure A6); 1 and 2 MR go up (Figure A16); 1 ME goes up (Figure A19); 2 NY goes down (Figure A22); 2 RM goes up (Figure A27); 1 RO goes down (Figure A31).
In the second Marseille descent, two types of descents were observed: initially a slower one and then a steep one. Moreover, the descents were in general more regular than the rises.
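The half-peak days used by the 50% model can be located as sketched below. The treatment of the boundary (at most half before the peak, strictly below half after it) is an assumption, as are the function name `half_peak_days` and the two toy series. The second series illustrates a drop that never falls below half of the peak before a new rise starts, in which case no end day exists.

```python
def half_peak_days(cases, peak_day):
    """Return (rise_start, drop_end) for the wave peaking at peak_day.

    rise_start: last day before the peak with cases <= half the peak value.
    drop_end:   first day after the peak with cases < half the peak value.
    Either is None when the series never crosses half of the peak.
    """
    half = cases[peak_day] / 2
    rise_start = next(
        (t for t in range(peak_day - 1, -1, -1) if cases[t] <= half), None
    )
    drop_end = next(
        (t for t in range(peak_day + 1, len(cases)) if cases[t] < half), None
    )
    return rise_start, drop_end


# Toy series (hypothetical): a complete wave, and a wave whose drop is
# interrupted by a new rise before reaching half of the peak.
complete_wave = [10, 20, 40, 80, 60, 30, 15]
interrupted_wave = [10, 20, 40, 80, 60, 50, 45, 70]

complete_window = half_peak_days(complete_wave, peak_day=3)
interrupted_window = half_peak_days(interrupted_wave, peak_day=3)
```

Returning `None` rather than a fallback day keeps the interrupted case explicit, so that a slope is not silently fitted over a window that does not satisfy the 50% definition.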

Upward Slopes in the Total and 50% Models
In this section, the upward slopes are studied in both regression models. Cities were classified according to the upward slopes estimated in the log-linear regression models. Table 5 gives the estimates for the upward slopes and the number of days $|T_{ij} - T^q_{ij}|$ with $q \in \{r, R\}$. From Table 5, it can be seen that, in general, the second ascents in the 50% model were slower than or at least equal to the first ones, with the exception of MD and PA. The cases of RM and NY stood out because the second peak had a smaller slope over a greater number of days, which could be interpreted as a lower number of infected per day during the second peak than during the first one. Additionally, the slopes in the total model were in general larger than the slopes in the 50% model. The 50th percentile was considered to be a central point where quarantine or other confinement measures began to result in a slope change.
Finally, one way to check that the 50% model had a good fit was whether the product Days × Slope was close to $\log(2) \approx 0.69$, since this quantity represents the logarithmic increase from 50% of the peak to the peak. In Figure 6, the slopes of the rises in the total and 50% models are plotted. The first rises of Madrid (MD) in the total model (0.277) and in the 50% model (0.115) were clearly atypical points (outliers) in the graph, so the cluster analysis of the upward slopes was performed without them.
Cluster analysis (CA) is a data mining technique that groups the sample observations into classes according to the similarities within a class and the dissimilarities between the different classes found in the dataset. Cluster analysis has been used in the context of the COVID-19 epidemic for comparing countries in terms of mobility and cases (see [11]). In this work, a cluster analysis based on the K-means method with the squared Euclidean distance was performed using MATLAB software. Figure 7 suggests that three was the optimum number of clusters for each model (total and 50%), and Table 6 shows the three clusters and the corresponding means.
The classes were obtained by ordering the different rises by their means from lowest to highest, indicating the speed of growth of the number of infected. A difference was evident between the slopes of the total model and those of the 50% model. The means of the classes with the totality of the data were higher than the means of the classes with the 50% data, which could be attributed to a saturation phenomenon. Several rises from Cluster 1 with the full data moved into Clusters 2 or 3 with the 50th percentile data. NY remained in Cluster 3, with the highest growth rate. With the exception of LO, the Cluster 2 rises with the entire data remained in Cluster 2 of the 50th percentile data. The rises of the 50th percentile model showed three behaviors or classes, namely slow (Cluster 1), medium (Cluster 2) and fast (Cluster 3). However, in the rises of the total model there were essentially two behaviors and an outlier rise in NY, probably due to a faulty measurement, which was also evident in the differences of the slopes (see Figure 7).
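The Days × Slope check can be worked through with the two extreme rises quoted in the text for the 50% model: the first rise of Madrid (slope 0.115 over 6 days) and the second rise of Marseille (slope 0.00066 over 82 days). The helper name `half_to_peak_product` is hypothetical; the reasoning is that a curve following the model exactly gains log(A) − log(A/2) = log(2) between the half-peak day and the peak.

```python
import math


def half_to_peak_product(days, slope):
    """Fitted change in log cases between the half-peak day and the peak."""
    return days * abs(slope)


LOG_TWO = math.log(2)  # about 0.693, the exact log increase from A/2 to A

# Values quoted in the text for the 50% model.
madrid_first_rise = half_to_peak_product(days=6, slope=0.115)
marseille_second_rise = half_to_peak_product(days=82, slope=0.00066)

madrid_gap = abs(madrid_first_rise - LOG_TWO)
marseille_gap = abs(marseille_second_rise - LOG_TWO)
```

The Madrid product lies within about 0.01 of log(2), while the Marseille product is far below it, in line with the near-zero R² reported for that rise.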
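The authors ran K-means in Matlab; the sketch below is an independent, pure-Python illustration of one-dimensional K-means with the squared Euclidean distance, not a reproduction of their analysis. The six input values are the endpoints of the three slope ranges reported in the Discussion for the rises of the 50% model, used here only as illustrative input in place of the full table of slopes. The function name `kmeans_1d` and the deterministic initialization are assumptions.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalar data with squared Euclidean distance.

    Assumption: centers are initialized at evenly spaced positions of the
    sorted data (requires k >= 2), so the result is deterministic.
    Returns (labels, centers), with clusters ordered from low to high.
    """
    n = len(values)
    ordered = sorted(values)
    centers = [ordered[(j * (n - 1)) // (k - 1)] for j in range(k)]
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: (v - centers[c]) ** 2) for v in values
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return labels, centers


# Endpoints of the three slope ranges quoted in the Discussion (50% model, rises).
slopes = [0.001, 0.007, 0.017, 0.026, 0.029, 0.036]
labels, centers = kmeans_1d(slopes, k=3)
```

On this input the algorithm separates the values into slow, medium and fast groups matching the three quoted ranges. Standard K-means implementations typically use random initialization with several restarts, so results on real data may depend on the starting centers.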

Downward Slopes in the Total and 50% Models
In this section, downward slopes are studied in both regression models. Cities were classified according to the downward slopes estimated in the log-linear regression models. Table 7 contains the results for the downward slopes. Observe that, in general, the slopes of the first drops were larger in absolute value than those of the second drops. RM exhibited a similar behavior in the first and second drops, both in slope and in number of days. In BO and LI, the slope of the second drop increased and the number of days decreased. In the rest of the cities, the slope decreased and the number of days increased. This increase in the duration of the second drop could be interpreted as the measures taken in the second phase not having much effect, or being weaker or less respected. The product −Days × Slope was close to log(2) in several cases, which indicated that the model fitted the data well. Figure 8 also suggests that three was the optimum number of clusters for both models (total and 50%), and Table 8 shows the three clusters and the corresponding means.
As with the rises, drops were classified (by the K-means method) into three groups with both total and 50% models. Several drops moved from being in Cluster 2 with the total data to Cluster 1 with 50% of the data. Only MD and LI moved from Cluster 1 with the total data to Cluster 2 with 50% of the data.

Discussion
In this article, a simple statistical analysis of the coronavirus disease 2019 (COVID-19) outbreak in 11 cities from Europe and the Americas was provided. Using daily case data for the months after the first cases were confirmed in each city, upward and downward slopes for each wave were analyzed using a log-linear model approach. These models were consistent with the rises and drops of SIR-type models when each pandemic wave is considered separately.
This study was triggered by the question of whether southern hemisphere cities had anticipated measures learned from the experience of northern hemisphere cities that had their first peaks earlier.
We found that the rises and drops were almost independent of the type of quarantine measures taken, whether more or less restrictive. By rescaling the curves, similar behaviors between the rises and drops of the different cities were obtained. For example, there were similarities between cities such as NY and RM, independently of their socioeconomic conditions. The question that remains here is whether this has self-similarity within each city. Of course, the height of the peaks strongly depended on several factors, namely climatic conditions, confinement measures and testing intensity, among many others. However, the evidence provided by the data showed that when the curves of detected cases were rescaled, they had an unexpected regularity and similarity between different cities of the world with disparate socioeconomic conditions, with very dissimilar health systems and where very distinct containment measures were applied.
In this direction, Figures 9 and 10 show the social distancing and the use of masks in the nine countries considered (http://www.healthdata.org/covid/data-downloads, accessed on 3 February 2022; data from the corresponding cities were used if available; see the explanation of how the data were collected on the web page). In both figures the horizontal axis is the time measured in days (day one corresponds to 2 April 2020). Figure 9 shows the evolution of the social distancing, measured through cell phone data, in different geographical areas, showing a higher variability (except for the first peak) than our normalized curves. The zero on the vertical scale represents the social distancing before the pandemic. A negative number represents a reduction in the social distancing index; for example, an index of −80% means a reduction of 80%. It is interesting to note how this distancing evolved upwards after the first wave, with smaller declines in the periods of the successive peaks. This disparity was also observed in the use of face masks in public places for the different geographical areas (see Figure 10), where the vertical axis represents the percentage of mask use in different countries. To this must be added the high unpredictability of the evolution of COVID-19, as shown in Canals et al. [12], where the main Lyapunov exponent was compared on the same data series for different countries; the exponents varied between 5 and 15, much higher than in other epidemics such as H1N1, where the main exponent was of the order of 2.
When applying the log-linear regression model to the daily infected cases, the results for the rises in the 11 chosen cities revealed three groups differentiated by the corresponding slopes. For the 50% model, the growth rate was lowest for group one and highest for group three: the slopes were between 0.001 and 0.007 for the first group, 0.017 and 0.026 for the second and 0.029 and 0.036 for the third.
When considering the total data, the results showed three groups also differentiated by a higher growth rate for group three and a lower one for group one, between 0.004 and 0.036 for the first one, 0.045 and 0.061 for the second one and the third one containing a single point or outlier with a slope of 0.122.
For the downslope phases of the outbreaks, the results in the 11 chosen cities again revealed three groups differentiated by their slopes. For the 50% model, the slopes were between −0.007 and −0.016 for the third group, −0.03 and −0.041 for the second and −0.058 and −0.068 for the first. When considering the total data, the results also showed three groups, with slopes between −0.008 and −0.035 for the third, −0.042 and −0.059 for the second, and the first one containing a single point or outlier with a slope of −0.078, which corresponded to 1 LY.

Conclusions
The log-linear regression models studied for the rises and drops showed that different cities had a similar behavior, independent of their socioeconomic situations. On the other hand, for the same city, very different behaviors were observed from one peak to another.
These results show that estimates can only give reasonable indications in the short term and that other variables such as health interventions, the level of distancing and other public policies are necessary in any model that wants to predict in the medium term.
Despite the simplicity of these models, they provide an interesting insight into the COVID-19 statistics in 11 cities in Europe and the Americas, allowing comparisons to be made between them. The log-linear model appears to be suitable for modeling the incidence of COVID-19 both in rises and drops. In addition, the results could be useful for supporting health policy decisions or government interventions. However, they should be used in conjunction with other more complex mathematical and epidemiological models.

Data Availability Statement:
Data and models that support the findings of this study are available from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.