COVID-19: Worldwide Profiles during the First 250 Days

Abstract: The present COVID-19 pandemic is happening in a strongly interconnected world. This interconnection explains why it became universal in such a short period of time and why it stimulated the creation of a large amount of relevant open data. In this paper, we use data science tools to explore this open data from the moment the pandemic began and across the first 250 days of prevalence, before vaccination started. The use of unsupervised machine learning techniques allowed us to identify three clusters of countries and territories with similar profiles of standardized COVID-19 time dynamics. Although countries and territories in the three clusters share some characteristics, their composition is not homogeneous. All these clusters contain countries from different geographies and with different development levels. The use of descriptive statistics and data visualization techniques enabled the description and understanding of where and how COVID-19 was having an impact. Some interesting extracted features are discussed, and suggestions for future research in this area are also presented.


Introduction
COVID-19 has posed tremendous health challenges worldwide due to its high level of contagion and quick geographical spread. The interconnected world assisted in disseminating the virus at such a speed, achieving coverage of countries which led the World Health Organization (WHO) to declare COVID-19 as a pandemic in early 2020. In just 12 months, as of 31 December 2020, there were 82.8 million confirmed infections and 1.8 million deaths across the world [1,2].
The lessons learned from previous epidemics and pandemics, such as the 1918 influenza pandemic, severe acute respiratory syndrome (SARS) during 2002-2003, or H1N1 influenza virus (swine flu) during 2008-2010, showed that public health measures had a significant influence on the impact of the disease, in particular in terms of overall mortality. Voluntary and mandated quarantine, bans on mass gatherings and large events, closure of schools and workplaces, and isolation of households/regions were some of the measures applied by governments to reduce mortality [1-5]. Pandemics and epidemics also showed that such diseases can take a significant toll on economies [1,3-7]. For this reason, countries and regions have to decide which mitigation measures to implement and when to apply them, in order to avoid peaks that would overwhelm healthcare services while also moderating the negative effects of the disease on the economy, a balance that is not easy to reach [1]. The imposition of the above-mentioned restrictions and lockdowns generated heavy economic consequences in many countries, triggering a dramatic increase in unemployment rates as well as company closures. These were followed by social repercussions, reinforcing the need to understand the evolution of this pandemic, namely in terms of the different country profiles that may be evolving across the planet [2-5].
From a comprehensive diagnosis perspective, the use of data science and machine learning methods and techniques constitutes an opportunity for research to achieve this.
Considering the motivations presented above, together with the lack of previous studies that have examined the temporal evolution of the pandemic on a global scale as a way to characterize and understand the similarities among countries, this work aims to test the following hypotheses with regard to the country-wise time profile evolutions of COVID-19:
In terms of research methodologies, we used data science techniques, such as data visualizations and statistical tests, to do what in data mining is often called data characterization and data description, i.e., summarizing data by class and comparing classes [14]. We also employed time series and unsupervised machine learning techniques, namely to group countries by their similarity in terms of COVID-19 cases and deaths time profiles. Additionally, we carried out some preliminary analysis of the relationships between cases and deaths caused by COVID-19 and some countries' development indicators.
The structure of this paper reflects the methodology employed during the corresponding research process, known as CRISP-DM (CRoss-Industry Standard Process for Data Mining) [15]. Therefore, Section 2 describes the data used, including data sources, data transformation, and data analysis techniques, which under the CRISP-DM framework correspond to the data understanding, data preparation, and modeling phases. Results are presented and discussed in Section 3 (equivalent to the CRISP-DM evaluation phase). Section 4 presents the study's main conclusions. Finally, limitations of the study and recommendations for future research are presented in Section 5.

Materials and Methods
This section discusses data sources, data quality, data preparation, and employed analysis techniques.
All analyses were performed in Python, using packages that are typically applied in data science, namely NumPy [16], Pandas [17], Matplotlib [18], and Seaborn [19], as well as others detailed in a later section.

Data Understanding
Two public datasets were used in this study. The European Centre for Disease Prevention and Control (ECDC) historical data on the daily number of new reported COVID-19 cases and deaths worldwide [20] were used for COVID-19 data. The 2019 values of the Human Development Index (HDI) dataset provided by the United Nations Development Programme (UNDP) [21] were used to assess countries' development indicators. The HDI is the geometric mean of indices for three key dimensions of human development: a long and healthy life; being knowledgeable; and having a decent standard of living. Information on the structure and the original collection of data for both datasets can be found on the respective websites.
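The geometric-mean definition of the HDI can be illustrated with a minimal sketch. The function name, the argument names, and the example index values are assumptions made for illustration; they are not taken from the UNDP dataset.

```python
def human_development_index(health: float, education: float, income: float) -> float:
    """Geometric mean of three dimension indices, each assumed to lie in (0, 1]."""
    for name, value in (("health", health), ("education", education), ("income", income)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} index must be in (0, 1], got {value}")
    return (health * education * income) ** (1.0 / 3.0)


# Illustrative (made-up) dimension indices for a single country.
example_hdi = human_development_index(0.9, 0.8, 0.7)
```

Because the mean is geometric rather than arithmetic, a low value in one dimension cannot be fully offset by high values in the others.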
As shown in Table 2, the ECDC dataset presented some data quality issues, as follows:
• The variables geoId, countryterritoryCode, popData2019, and Cumulative_number_for_14_days_of_COVID-19_cases_per_100,000 have missing values.
• The minimum values of the variables cases and deaths are negative, something that by definition is not possible.
Note (Table 2): Count is the number of observations; Type is the type of variable (numerical or categorical); Mean is the mean of the variable (for numeric variables); Standard deviation is the standard deviation of the variable (for numeric variables); Min. is the minimum value (for numeric variables); 25% is the value of the first quartile (for numeric variables); 50% is the value of the second quartile or median (for numeric variables); 75% is the value of the third quartile (for numeric variables); and Max. is the maximum value (for numeric variables). For more details on the ECDC dataset, please check the corresponding link in the references.
Table 2 also shows that COVID-19 data are available for 336 different dates (variable dateRep) and for 214 countries and territories (variable countriesAndTerritories). Nevertheless, as shown in Figure 1, the number of observations (dates with data) varied significantly across countries, reflecting the different times it took for the COVID-19 pandemic to reach each particular country, although the clear majority of them had over 200 days of accumulated time evolution for registered COVID-19 cases.
The ECDC database also had some incorrect ISO 3166-1 alpha-3 country and territory codes [22] (variable countryterritoryCode), namely Namibia's and Taiwan's codes. While the first was missing, the second was coded as "CNG1925" instead of "TWN".
As shown in Table 3, apart from some outliers in the Gross National Income Per Capita variable (for instance, the standard deviation is larger than the mean, which underlines significant levels of inequality among countries), no major data quality issues were found in the UNDP dataset used.
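The two kinds of ECDC quality issues described above (missing values and negative counts) can be detected with a check along the following lines. This is a dependency-free sketch rather than the Pandas-based code used in the study; the function name and the three sample rows are hypothetical and only mimic the ECDC column names.

```python
def quality_report(rows, required_fields, count_fields):
    """Count missing values and negative counts per field."""
    missing = {
        field: sum(1 for row in rows if row.get(field) in (None, ""))
        for field in required_fields
    }
    negative = {
        field: sum(
            1
            for row in rows
            if isinstance(row.get(field), (int, float)) and row[field] < 0
        )
        for field in count_fields
    }
    return {"missing": missing, "negative": negative}


# Made-up rows: one has no country code, one has a negative case count.
rows = [
    {"countriesAndTerritories": "Country_X", "countryterritoryCode": "XXX", "cases": 12, "deaths": 1},
    {"countriesAndTerritories": "Country_Y", "countryterritoryCode": None, "cases": 7, "deaths": 0},
    {"countriesAndTerritories": "Country_Z", "countryterritoryCode": "ZZZ", "cases": -3, "deaths": 0},
]
report = quality_report(rows, ["countryterritoryCode"], ["cases", "deaths"])
```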

Data Preparation
Several transformations were applied to the original data in order to correct the identified quality issues and prepare the modeling data. In terms of the ECDC dataset, the following transformations were initially applied:

• Correction of the ISO 3166-1 alpha-3 codes in the Namibia and Taiwan observations
• Removal of observations with missing values in the countriesAndTerritories, which were related to two small territories (Wallis and Futuna and "Cases on an international conveyance Japan")
• Replacement of the character "_" by a space in countries' names (variable countriesAndTerritories)
After these initial transformations, the dataset was sorted by country and date. Since the dataset did not include cumulative sums per day or measures normalized by the population, additional variables were created: cumulative sums of cases and deaths, and their counterparts per 100,000 of the population (the value divided by the population and multiplied by 100,000).
As presented in Figure 1 and detailed in the previous subsection, since the virus did not affect all countries at the same time, it was also decided to follow the approach of Alvarez et al. (2020) to synchronize the scaled data with respect to time. Nonetheless, to have a more panoramic view, and since more data were available, it was decided this time to study a longer period than the first 100 days of the pandemic. As detailed in Table 4, even after removing observations with missing values, the number of daily observations available in the country profile time series ranged from 19 to 335. A more detailed analysis revealed that a cut-off point equal to the second quartile (261 days) would remove 96 countries from the study, whereas a cut-off point equal to the first quartile (255 days) would remove 49 countries. A cut-off point at 250 days would remove just 24 countries. Thus, it was decided to use 250 days as the cut-off point, i.e., we focused on the countries that had at least 250 days of pandemic prevalence, and therefore enough pandemic time maturity for the corresponding time profiles.
Based on this criterion, the following 24 countries were removed from further analysis: Anguilla; Bonaire, Saint Eustatius, and Saba; Botswana; British Virgin Islands; Burundi; Comoros; Falkland Islands (Malvinas); Guinea Bissau; Lesotho; Malawi; Mali; Marshall Islands; Northern Mariana Islands; Puerto Rico; Saint Kitts and Nevis; Sao Tome and Principe; Sierra Leone; Sint Maarten; Solomon Islands; South Sudan; Tajikistan; Vanuatu; Western Sahara; and Yemen. Hence, out of the initial list of 214 countries and territories, 188 were kept for further analysis (the two territories removed earlier for missing values and these 24 account for the difference).
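The preparation steps described in this subsection (scaling per 100,000 of the population, cumulative sums, time synchronization, and the 250-day cut-off) can be sketched as follows. The sketch assumes that synchronization means counting days from the first day with reported cases, which is an interpretation; the details of the Alvarez et al. approach are not reproduced here. Function names and the sample numbers are hypothetical.

```python
from itertools import accumulate


def synchronize(daily_counts):
    """Drop leading days without reported cases so that index 0 is t = 0."""
    for start, value in enumerate(daily_counts):
        if value > 0:
            return daily_counts[start:]
    return []


def prepare_profile(daily_counts, population, min_days=250):
    """Return scaled daily and cumulative profiles, or None if too short."""
    synced = synchronize(daily_counts)
    if len(synced) < min_days:
        return None  # country excluded by the cut-off
    window = synced[:min_days]
    daily = [value * 100_000 / population for value in window]
    return {"daily_per_100k": daily, "cumulative_per_100k": list(accumulate(daily))}


# Toy example with a 3-day cut-off instead of 250 days.
profile = prepare_profile([0, 0, 5, 10, 0], population=1_000_000, min_days=3)
```

With this indexing, Day 250 of a profile sits at position t = 249.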

Modeling

Analysis of Temporal Sequences
To understand the temporal differences between countries in terms of cases and deaths, the Dynamic Time Warping (DTW) implementation of the package DTAIDistance [23] was applied. DTW is an algorithm that measures the similarity between time series, even when they are shifted or stretched in time relative to each other. Because countries differ in population, similarities were measured on the time-synchronized profiles of the scaled and standardized daily cases and deaths per 100,000 of the population.
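For readers unfamiliar with DTW, the following is a minimal sketch of the classic dynamic-programming formulation, written in plain Python. It is not the DTAIDistance implementation used in the study: the absolute-difference local cost and the function names are assumptions, and the package's own defaults may differ.

```python
import math


def dtw_distance(a, b):
    """Classic DTW with absolute difference as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def dtw_matrix(series):
    """Square matrix of pairwise DTW distances between all series."""
    return [[dtw_distance(s, t) for t in series] for s in series]


# A wave and a delayed copy of it align perfectly under DTW.
wave = [0, 1, 2, 1, 0]
delayed = [0, 0, 1, 2, 1, 0]
matrix = dtw_matrix([wave, delayed, [5, 5, 5, 5, 5]])
```

The square matrix produced by `dtw_matrix` has the same shape as the country-by-country distance matrices on which the clustering operates.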
Although hierarchical clustering is a valuable type of algorithm to visualize hierarchical groups in small datasets, it rarely provides good results in larger datasets. Thus, it was decided to use the K-medoids implementation of the package PyClustering [24] to cluster countries based on square matrices in which both rows and columns were the countries and the values were the DTW distances. K-medoids is a variation of the K-means algorithm in which each cluster center is an actual data point (the medoid) rather than the mean of the cluster's points. Whereas K-means minimizes the sum of squared Euclidean distances between data points and their cluster centers, K-medoids minimizes the sum of dissimilarities between data points and their medoids. As such, K-medoids can work directly with a precomputed dissimilarity measure such as DTW, and it is more robust to noise and outliers [14,25,26].
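The idea of K-medoids on a precomputed distance matrix can be sketched with a simple alternating scheme (assign each point to its nearest medoid, then pick the member that minimizes the within-cluster sum of distances). This is an illustrative variant, not the PyClustering implementation used in the study; the function names and the one-dimensional toy data are assumptions.

```python
def assign(dist, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternating K-medoids on a precomputed square distance matrix."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c, current in enumerate(medoids):
            members = [i for i, label in enumerate(labels) if label == c]
            if not members:
                updated.append(current)  # keep the medoid of an empty cluster
                continue
            # The new medoid is the member with the smallest total distance.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


# Toy data: two well-separated groups on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
medoids, labels = k_medoids(dist, initial_medoids=[0, 1])
```

Because only the distance matrix is used, the same routine applies unchanged to a matrix of DTW distances between country profiles.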
The silhouette method was used to identify the number of relevant clusters (k). The silhouette value measures how similar a data point is to its cluster as compared with other clusters. The silhouette value ranges from −1 to +1, where +1 indicates that the data point is well matched to its cluster and −1 indicates the opposite. The average silhouette value, also known as the silhouette score, provides an indication of the cluster validity [27].
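The silhouette score described above can be computed directly from a distance matrix and a vector of cluster labels. The sketch below follows the standard definition, s(i) = (b − a) / max(a, b), where a is the mean distance to the other members of the point's own cluster and b is the smallest mean distance to any other cluster. The function name, the convention of scoring singletons as 0, and the toy data are assumptions for illustration.

```python
def silhouette_score(dist, labels):
    """Average silhouette value for a precomputed distance matrix."""
    n = len(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: silhouette value taken as 0
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
good = silhouette_score(dist, [0, 0, 0, 1, 1, 1])
poor = silhouette_score(dist, [0, 1, 0, 1, 0, 1])
```

Running the score for several candidate values of k and comparing the averages is the kind of comparison that underlies a choice between, for example, k = 2 and k = 3.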
In the silhouette score analysis, as depicted in Figure 2, k = 2 presented the best results concerning the clustering of COVID-19 cases time profiles. However, the score for k = 3 was very close to that for k = 2. While for k = 2 the two clusters were composed, respectively, of 123 and 65 countries, for k = 3 the three clusters were composed, respectively, of 51, 57, and 80 countries. Based on this observation, it was decided to classify the countries into three clusters in terms of their standardized and synchronized time profiles of COVID-19 cases.
The Kruskal-Wallis test was conducted using the module "stats" from the package SciPy [28] to examine the difference in total cases per 100,000 of the population per cluster. The results (statistic = 124.695, p < 0.001) show that the total number of cases mean values differed between the clusters. The post-hoc analysis, conducted with the package scikit-posthocs [29], showed that the total number of cases mean values also differed for each pair of clusters. The p-value between Clusters A and B was 0.015, between Clusters A and C was less than 0.001, and it was also less than 0.001 between Clusters B and C.
Regarding the clustering of deaths scaled and synchronized time profiles, in the silhouette score analysis, as depicted in Figure 3, k = 3 presented the best results. Thus, we grouped the countries in terms of standardized and synchronized deaths time profiles into three clusters, with 52, 61, and 75 countries, respectively.
As was also done for the cases time profiles clustering, the Kruskal-Wallis test was conducted to examine the difference in total deaths per 100,000 of the population per cluster. The results (statistic = 124.462, p < 0.001) reveal that the total number of deaths mean values differed across the clusters. The post-hoc analysis showed that the total number of deaths mean values also differed for each pair of clusters. The p-value between Clusters A and B was 0.005. As happened with the clustering of the time profiles for COVID-19 cases, the p-value between Clusters A and C, as well as between Clusters B and C, was found to be less than 0.001.
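To make the Kruskal-Wallis statistic reported above more concrete, the following sketch computes the H statistic from pooled ranks. It is a plain-Python illustration, not the SciPy routine used in the study: it omits the tie correction and the p-value, and the function names and toy groups are assumptions.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """H statistic without tie correction."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)


# Three toy groups with no overlap between their values.
h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

The test operates on ranks rather than raw values, which is why it is suitable for skewed per-capita totals; a larger H indicates stronger separation between the groups.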

Analysis at Day 250
To analyze the relationship between cases and deaths caused by COVID-19 and countries' development metrics, UNDP indicators were merged with Day 250 (t = 249) of the ECDC dataset. However, the UNDP dataset does not include data for all countries and territories available in the ECDC dataset. For that reason, the resulting dataset did not include data for another 24

Results and Discussion
This section presents the data visualizations, tables, and other results of the conducted analyses together with a discussion of obtained results. The first subsection presents results related to COVID-19 cases. The second subsection addresses results related to COVID-19 deaths. The third subsection undertakes a discussion confronting cases vs. deaths. The fourth subsection considers results related to the analysis of the cases and deaths vs. the countries' development indicators.

Cases-Temporal Sequence
As presented in Figure 4, the coronavirus did not spread at the same time to all countries. It took from 31 December 2019 to 25 March 2020 (86 days) for cases to be identified in all 188 countries under study. Although the first case was reported on 29 December 2019, it was not until the end of February 2020 that its spread appears to have accelerated across the world. Before that time, more than 50% of the countries reporting cases were from Asia. Interestingly, whereas on all other continents only 8-15% of countries reported prior cases, 35% of Asian countries had already identified cases in their population. Although the first cases were mostly reported in Asia, they quickly spread across other continents. This dissemination across continents can be noticeably seen in Figure 5, categorized by cluster, as discussed below.

As clearly shown in Figures 6 and 7 and Table 5, the three clusters of country profiles revealed distinct patterns, in terms of both values and shapes, as illustrated by the average profile computed for each cluster:
• Cluster B's average profile indicates incidences that seem to have peaked only at around Day 240, at values above 35 cases per 100,000 of the population. This profile shows a different time constant and a slower temporal dynamic, with small-slope linear growth until around Day 220, followed only after that by what seems to be exponential growth that has only recently reached a peak. The daily average number of cases per 100,000 of the population was 45.808. Except for Argentina, this cluster is composed mostly of small countries, many of them from Europe. The average population of these countries is seven million people.
• Cluster C's average time profile corresponds to countries where new cases have always been below 6 per 100,000 of the population, clearly showing smaller numbers of people with confirmed infection (either less testing, less incidence, or both). The first small peak was reached around Days 30-40, a second small peak seems to take place at around Day 115, and an apparent third peak started more recently, at Day 240. The daily average number of cases per 100,000 of the population was 0.923. Similar to Cluster A, this cluster is also composed of countries from a large variety of geographies and sizes. However, this is the cluster with the highest average population, 66 million people.
Figure 7 shows, for each cluster, the temporal sequence of cases of the countries with the lowest and the highest number of cases per 100,000 of the population at Day 250. Although countries in the same cluster share common features, there is still a perceptible difference among them. For example, in Cluster A it is possible to see that Equatorial Guinea did not report cases on all days and that, when it did, the reports created some spikes. However, it is also possible to note the difference in amplitude and shape per cluster. Of the three clusters, at Day 250, Cluster C countries had the lowest average number of reported cases. Conversely, Cluster B countries present the highest average number of reported cases.
The contrast mentioned above between cases' clusters at Day 250 is also visible in Figure 8. Only five countries from Cluster A are included among the top 20 countries with the highest number of cases per 100,000 of the population at Day 250. The remaining 15 countries are all from Cluster B. Except for the last four countries in this top 20, which are South American countries with a considerable population, the remaining 16 countries are mostly tiny and not highly populated.
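The per-cluster average profile and the daily average discussed in this subsection can be computed with a day-wise mean over the synchronized profiles of a cluster's countries. The sketch below is illustrative only: the function names are hypothetical and the two short profiles are made-up numbers, not values from the ECDC data.

```python
from statistics import fmean


def average_profile(profiles):
    """Day-wise mean across equally long, time-synchronized profiles."""
    lengths = {len(profile) for profile in profiles}
    if len(lengths) != 1:
        raise ValueError("profiles must be synchronized to the same length")
    return [fmean(day_values) for day_values in zip(*profiles)]


def daily_average(profiles):
    """Mean of the average profile over the whole period."""
    return fmean(average_profile(profiles))


# Two made-up country profiles (cases per 100,000) over four days.
cluster_profiles = [[1.0, 2.0, 3.0, 2.0], [3.0, 4.0, 5.0, 4.0]]
mean_profile = average_profile(cluster_profiles)
mean_daily = daily_average(cluster_profiles)
```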

Deaths-Temporal Sequence
Although the clustering of the temporal sequence of reported standardized and synchronized deaths also identified three clusters, countries were not grouped in the same way as in the time profiles of registered cases. As presented in Figure 9, the clusters' geographic dispersion changed when compared with the one presented earlier for the number of cases.
As happened with the clusters of time profiles for COVID-19 cases, the three clusters of standardized and synchronized death time profiles revealed distinctive patterns, in terms of both values and shapes (Figures 10 and 11 and Table 6):
• Cluster A presents two waves in its average time profile. The first peak happens at around Day 50 and the second about 100 days later. However, another 100 days later, a third wave seems to be forming. This cluster shows a slight upward trend, with the average number of deaths increasing over time. The daily average number of deaths per 100,000 of the population was 0.190. This cluster is composed of countries from a diversity of geographies and sizes. The average population of the countries in this cluster is about 21 million people.
• Cluster B is the cluster with the highest daily average deaths per 100,000 of the population, 0.562. Its average time profile presents a first wave before Day 50 and a second one around Day 70, followed by a period of irregularity and then, after Day 210, a rapid increase in deaths that did not slow down up to Day 250. This cluster includes several small countries and larger countries such as the United States of America, United Kingdom, Spain, Italy, and Sweden, as well as other countries known publicly for being highly impacted by the pandemic. The countries' average population in this cluster is very similar to that of Cluster A, at around 20 million people.
• Cluster C is the cluster with the lowest daily average deaths per 100,000 of the population, 0.029, which is roughly six times less than Cluster A and 19 times less than Cluster B. This cluster presents a very flat average profile with no significant waves. Similar to Equatorial Guinea in Cluster A of cases, the country with the lowest number of deaths per 100,000 of the population, Aruba, seems to report data intermittently, thus causing spikes. Apart from some exceptions, this cluster mostly comprises countries from Africa, Asia, and Oceania. The countries' average population in Cluster C is almost three times greater than in Clusters A and B, reaching 61 million people.
The top 20 countries with the highest cumulative number of deaths per 100,000 of the population at Day 250 (Figure 12) differ substantially from the top 20 countries regarding cumulative registered cases, also at Day 250. Although only countries from Clusters A and B are present, only three are from Cluster A: Brazil, Panama, and Colombia. Nine of the top 20 countries by deaths did not appear in the top 20 countries by cases: Belgium, Bolivia, Spain, Colombia, United Kingdom, United States of America, Bosnia and Herzegovina, Italy, and Sweden. This difference could be explained by different levels of mortality, testing, or both. In contrast with the top 20 countries by cases, the top 20 countries by deaths do not include many small countries. In fact, this set includes larger and more populated countries, most of them from the Americas and Europe.
Appl. Sci. 2021, 11, 3400

Cases vs. Deaths
Despite the differences found in the top 20 countries for cases and deaths per capita at Day 250, as mentioned in the previous section, the Sankey diagram of Figure 13 highlights an association between the clusters of scaled and synchronized time profiles of cases and of deaths per 100,000 of the population. The deaths cluster with the highest number of deaths, Cluster B, is mostly composed of countries that also belonged to Cluster B of cases. The same relation exists between Cluster A of cases and Cluster A of deaths. Nevertheless, there are also several examples of countries that moved from "bad" clusters to "good" clusters, and vice versa. Once again, countries' different cluster positions in terms of cases and deaths suggest a possible relation with factors such as the capacity to fight the pandemic (namely testing capacity), ageing, and health conditions.
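The flows that a Sankey diagram of this kind displays are simply counts of countries for each pair of (cases cluster, deaths cluster). A minimal sketch of that cross-tabulation follows; the function name is hypothetical, and the five placeholder countries and their labels are invented for illustration and do not reproduce the paper's results.

```python
from collections import Counter


def cluster_flows(cases_cluster, deaths_cluster):
    """Count countries for each (cases cluster, deaths cluster) pair."""
    shared = cases_cluster.keys() & deaths_cluster.keys()
    return Counter((cases_cluster[c], deaths_cluster[c]) for c in shared)


# Invented labels for five placeholder countries.
cases_cluster = {"P": "A", "Q": "B", "R": "B", "S": "C", "T": "C"}
deaths_cluster = {"P": "A", "Q": "B", "R": "A", "S": "C", "T": "C"}
flows = cluster_flows(cases_cluster, deaths_cluster)
```

Pairs with matching letters correspond to countries that stayed in the analogous cluster, while off-diagonal pairs correspond to countries that moved between clusters.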

Despite the differences found in the top 20 countries for cases and deaths per capita at Day 250, as mentioned in the previous section and highlighted in the Sankey diagram of Figure 13, there is an association between the scaled and synchronized time profiles of the clusters of cases and the clusters of deaths per 100,000 of the population. The deaths cluster with a high number of deaths, Cluster B, is mostly composed of countries that also belong to Cluster B of cases. The same relation exists between Cluster A of cases and Cluster A of deaths. Nevertheless, there are also several examples of countries that moved from "bad" clusters to "good" clusters, and vice versa. Once again, the different cluster positions of countries in terms of cases and deaths suggest a possible relation with factors that shape the capacity to fight the pandemic, namely testing capacity, ageing, and health conditions.
The histograms of accumulated cases and deaths per 100,000 of the population show similarly shaped distributions (Figure 14), and the plot of cumulative scaled deaths versus cases at Day 250 (Figure 15) confirms that there is some relationship between them. However, there is also considerable variability: for the same number of scaled cases, the number of scaled deaths can differ by a factor of four from one country to another.
As illustrated in Figure 16, the Pearson correlation coefficient between scaled cases and deaths was found to be positive (0.67), as expected.
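As a minimal sketch of how such a coefficient can be computed, the following Python snippet evaluates the Pearson correlation between two per-country series. The function name `pearson_r` and the input values are hypothetical and serve only as an illustration; they are not the data or the code used in this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations (unnormalized covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical cumulative values per 100,000 at Day 250 (NOT the study's data)
cases_per_100k = [120.0, 850.0, 2300.0, 2400.0, 3100.0]
deaths_per_100k = [2.0, 30.0, 25.0, 95.0, 60.0]

r = pearson_r(cases_per_100k, deaths_per_100k)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 would indicate a nearly linear increase of deaths with cases, whereas a moderate positive value is compatible with the considerable country-to-country variability described above.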

Cases and Deaths vs. Development Indicators
Tables 5 and 6 and Figure 16 also show the differences found between the clusters' development indicators and their relationship with the accumulated numbers of cases and deaths at Day 250. Figure 16 reveals positive correlations of cases and deaths with the HDI of 0.44 and 0.40, respectively. Correlations in the same range were found between scaled cases and deaths and the other variables that compose the HDI (life expectancy, expected years of education, average years of education, and GNI per capita). These correlations suggest that, to a certain extent, there is an association between countries' development and COVID-19 cases and deaths. This association could be related to the fact that in developed countries the population lives longer and is older. However, it could also reveal that less developed countries do not have the means to conduct a high number of tests, resulting in unidentified cases and deaths.
The associations mentioned above are also visible in the boxplots of the variables' per-country averages for each cluster, as shown in Figures 17 and 18. Overall, both sets of boxplots show a statistically significant difference between the clusters of cases and of deaths for all variables. These boxplots also show that Cluster A, for both cases and deaths, comprises countries with a good HDI and a high life expectancy. Moreover, the boxplots show that Cluster B is the one with the worst performance in terms of cases and deaths. This cluster comprises countries with higher HDI, which in turn means higher life expectancy, expected and average years of education, and GNI per capita. Finally, the boxplots show that Cluster C, for both cases and deaths, consists mostly of countries with low HDI and correspondingly lower development indicators.
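To make explicit what such a boxplot comparison summarizes, the sketch below computes, for each cluster, the five values a boxplot draws (minimum, first quartile, median, third quartile, and maximum). The helper names, the cluster assignments, and the HDI values are hypothetical and chosen only for illustration; the quartile convention (`method="inclusive"`) is also an assumption and may differ from the one used for Figures 17 and 18.

```python
from collections import defaultdict
from statistics import quantiles


def boxplot_summary(values):
    """Five-number summary (min, Q1, median, Q3, max) of a sample."""
    data = sorted(values)
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # sample minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


def summarize_by_cluster(records):
    """Group (cluster, value) pairs and summarize each cluster separately."""
    groups = defaultdict(list)
    for cluster, value in records:
        groups[cluster].append(value)
    return {cluster: boxplot_summary(vals) for cluster, vals in sorted(groups.items())}


# Hypothetical (cluster, HDI) pairs, NOT the study's data
records = [
    ("A", 0.78), ("A", 0.81), ("A", 0.74), ("A", 0.85),
    ("B", 0.88), ("B", 0.91), ("B", 0.83), ("B", 0.93),
    ("C", 0.52), ("C", 0.61), ("C", 0.47), ("C", 0.66),
]

for cluster, summary in summarize_by_cluster(records).items():
    print(cluster, summary)
```

Non-overlapping interquartile ranges between clusters are only a visual indication of a difference; establishing statistical significance requires a formal test, which this sketch does not perform.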

Conclusions
Despite COVID-19 being a worldwide pandemic, very significant differences were found in the time profiles of both registered cases and deaths for each country when scaled and synchronized data were used, leading to the identification of three distinctive clusters in the corresponding country time series.
These findings seem to validate our initial Hypothesis H1: there are different types of country/territory behavior with regard to the corresponding scaled and synchronized COVID-19 time evolution and profiles.
Moreover, the clusters found by unsupervised learning were explored, and no geographical basis or obvious grouping was identified. In fact, one can see countries that show quite different patterns within the same continent or region. Such findings also assist in answering our initial research Hypotheses H1a and H1b: three clusters were identified for the time profiles of scaled COVID-19 cases, and another three clusters were found for the time profiles of scaled COVID-19 deaths. Some features, such as the number or intensity of peaks or when they take place, seem to be associated with the different clusters that were identified.
Finally, regarding our initial research Hypothesis H1c, which deals with the characteristics of the countries/territories placed in each cluster, interesting relations were found, but so was wide variability in the scaled cases versus deaths values across countries. Although the clusters' mean results seem to validate Hypothesis H1c, wide variability is present within each cluster.
The countries presenting the highest numbers of cases per 100,000 of population only partially coincide with those that have the largest numbers of deaths per 100,000 of population. Usually, more developed countries have been able to step up the number of tests as compared to less developed ones. The latter also suffer from comparatively worse sanitary conditions and weaker public health response mechanisms, but they have younger populations as well, and therefore some effects can compensate for the others. These offsetting effects may explain the non-trivial connections found between variables and countries, as well as the corresponding COVID-19 time profiles. Even so, some interesting partial correlations and findings were extracted from the analysis conducted.

Limitations and Future Research Directions
This study faced a number of challenges, which imposed some limitations on it. Specifically, some countries reported data intermittently, causing spikes and affecting the averages of both cases and deaths. Although infrequently, some countries erroneously reported excess values on some dates and later declared negative values on other dates to correct for them. This situation also affected the daily averages of cases and deaths.
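The effect of such reporting artifacts on daily averages can be illustrated with a short sketch. The snippet below flags days with negative reported values and computes a trailing moving average, which is one common way to dampen reporting spikes. It is not necessarily the procedure applied in this study, and the function names, the 7-day window, and the daily series are hypothetical.

```python
def flag_negative_days(daily):
    """Return the indices of days whose reported value is negative."""
    return [i for i, value in enumerate(daily) if value < 0]


def trailing_mean(daily, window=7):
    """Trailing moving average; early days use the shorter available window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running = 0.0
    for i, value in enumerate(daily):
        running += value
        if i >= window:
            # Drop the value that has just left the window
            running -= daily[i - window]
        averages.append(running / min(i + 1, window))
    return averages


# Hypothetical daily counts with a backlog spike (day 3) and a later
# negative correction (day 6); NOT the study's data
daily_cases = [40, 42, 0, 160, 45, 47, -30, 50, 52]

print("negative days:", flag_negative_days(daily_cases))
print("7-day trailing mean:", [round(v, 1) for v in trailing_mean(daily_cases)])
```

Smoothing reduces the visual impact of a spike but does not remove the underlying error, so a correction reported many days after the excess value still shifts the averages of the days in between.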
Future studies should keep updating the information and extract further observations on the evolution of the time profiles. Additionally, subsequent research could model, for instance, deaths per capita as a function of a number of country variables and examine what can be concluded from such a modeling analysis. The ratio of population to area (population density) may also bring another interesting and enlightening perspective, since COVID-19 contagion is driven by proximity between humans.
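A minimal sketch of the kind of modeling suggested here is given below: it computes population density and fits a one-variable ordinary least squares line of deaths per 100,000 against density. This only illustrates the proposed direction under simplifying assumptions (a single predictor and a linear relation). The function names and all numbers are hypothetical, and no result of this kind is reported in the present study.

```python
def population_density(population, area_km2):
    """Inhabitants per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical countries: (population, area in km^2, deaths per 100,000)
countries = [
    (5_000_000, 50_000, 20.0),
    (10_000_000, 40_000, 55.0),
    (60_000_000, 300_000, 48.0),
    (2_000_000, 400_000, 6.0),
]

density = [population_density(p, a) for p, a, _ in countries]
deaths = [d for _, _, d in countries]
intercept, slope = fit_line(density, deaths)
print(f"deaths per 100k ~ {intercept:.2f} + {slope:.3f} * density")
```

A fuller analysis would include several country variables at once (for example, age structure and testing capacity) and would need to account for the confounding between them.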
Finally, future studies should investigate the impact of the number of people per capita who received vaccines, together with the corresponding scaled and synchronized vaccination time profiles.

Data Availability Statement: The data used in this paper are available in the references presented in Section 2.1.

Conflicts of Interest: The authors declare no conflict of interest.