1. Introduction
In December 2019, cases of acute pneumonia in the city of Wuhan, China, drew attention because of the speed of contagion [1]. The virus responsible is SARS-CoV-2, the causative agent of coronavirus disease 2019 (COVID-19). The number of cases quickly evolved into a pandemic, declared by the World Health Organization (WHO) on 11 March 2020. Now, a few months after the first cases, the disease continues to plague the world [2].
At the moment, the pandemic epicenter is in Latin America, with Brazil, Argentina, Colombia, Peru, Mexico, and Chile being the most representative countries, together accounting for more than 10 million cases. Brazil already has more than 5 million cases and 150,000 deaths, ranking third in the world in the number of cases and deaths from COVID-19, behind only the United States and India [3].
Intense efforts by the scientific community are focused on finding drugs to treat the disease [4] and developing a vaccine [5]. Other research tries to understand how the disease has spread around the world. Several factors can influence the spread of the virus, including the population's disregard of preventive measures adopted by the government [6], climatic conditions [7,8,9], comorbidities [10,11], hospital structure, prepared professionals, personal protective equipment, and financial resources [4].
To monitor the pandemic, it is necessary to collect information on as many factors as possible that can influence how the disease spreads and, from this, establish relationships that indicate which measures delay the spread of COVID-19. Statistical tools have been used to study the behavior of epidemics for years; this area treats, analyzes, and obtains information from large datasets that cannot be analyzed by traditional systems [12].
Artificial Neural Networks (ANNs) are tools that have been gaining scientific relevance for pattern recognition, classification, and prediction tasks. An ANN is a non-linear computational model that attempts to simulate the structure and decision-making of the human brain [13]. Its architecture is inspired by biological neural networks and consists of simple processing units that store empirical knowledge through a learning process [14].
Among the most common types of neural networks are the Feed-Forward Neural Network (FFNN), the Convolutional Neural Network (CNN), and the Recursive Neural Network (RNN). FFNNs are the most common type of ANN in practical applications; they consist of one or more hidden layers of perceptrons (neurons) and require supervised training. The input data of the sample dataset and the desired output results are presented to the network repeatedly until the error in the output is minimized [13].
Traditionally, several forecasting techniques exist, such as Autoregression (AR) [15,16], Moving Average (MA) [17,18], Exponential Smoothing (ES) [19], Hybrid Methods (HM) [20,21,22], and Autoregressive Integrated Moving Average (ARIMA) [23]. They have been used to predict the dependent variable in a time series [15,16,17,18,19,20,21,22,23]. Compared with these techniques, unsupervised neural networks of the Self-Organizing Map type have mostly achieved better precision and accuracy [14].
The Self-Organizing Map (SOM), or Kohonen map, is a method for the analysis of multivariate data used for pattern recognition and classification, and may be regarded as a non-linear generalization of principal component analysis [24]. A SOM consists of input nodes and a grid of connected computational nodes (neurons) that compete among themselves for activation; the winner is the neuron that most closely resembles the input vector. If the input data exhibit some similarity across the input classes, the neurons will organize themselves, showing similarity patterns in a grid [13,25].
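The competitive learning described above can be illustrated with a minimal sketch in plain Python. This is not the implementation used in this study: the grid size, the linearly decaying learning rate, the Gaussian neighborhood function, and the names `best_matching_unit`, `train_som`, and `toy` are all assumptions made for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the (row, col) of the neuron whose weight vector is closest to x."""
    best, best_d = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, x)  # Euclidean distance in input space
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small rectangular SOM and return weights[row][col] -> vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighborhood radius
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay               # learning rate shrinks over time
        sigma = sigma0 * decay + 1e-3  # neighborhood radius shrinks over time
        for x in rng.sample(data, len(data)):  # shuffled pass over the samples
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        # Pull the winner and its grid neighbors toward x.
                        w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical toy data: two well-separated groups of two-dimensional samples.
toy = [[0.0, 0.0], [0.05, 0.0], [1.0, 1.0], [0.95, 1.0]]
som = train_som(toy, rows=3, cols=3)
```

In this sketch the winner is chosen by Euclidean distance, and the neighborhood term `h` is what makes nearby neurons on the grid end up representing similar inputs.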
SOM has already been used successfully for data mining in several areas of knowledge [24,26,27,28,29,30], such as food science [26,28,29], fuels [27], monitoring chemical reactions [24,30], and other applications [25,31,32]. Recently, SOM was used to verify the spatial relationship of the COVID-19 spread in countries and states in Mexico [31]. Most published papers focus on modelling or predicting the spread of COVID-19 [2,9,33,34,35,36]. In our literature searches, we did not find reports that simultaneously consider the historical data on the number of cases/deaths from COVID-19 and possible external factors that affect the spread of the virus through a spatial analysis by pattern recognition.
In a previous study, we demonstrated that the spread of COVID-19 varies according to Brazilian regions, states, and cities. However, it was not possible to determine the reason for this behavior. In this study, 14 possible factors that can affect the spread of COVID-19 were analyzed jointly by SOM. This analysis verifies which variables may be essential and points out the variables that had the most significant effect on each Brazilian state.
4. Tests and Results
Figure 1 shows the number of cases and deaths accumulated by Brazilian states and regions per 100,000 inhabitants. All states belonging to the South (S) and Central-West (CW) regions of the country recorded the lowest rates of cases and deaths from COVID-19, with an average of 180 cases and 4 deaths per 100,000 inhabitants. Most of the Brazilian states with the highest rates of cases and deaths belong to the North (N) region, which averaged 954 cases and 43 deaths per 100,000 inhabitants and is mainly represented by Acre (AC), Amapá (AP), Amazonas (AM), Pará (PA), and Roraima (RR).
Some states in the Brazilian Northeast (NE) and Southeast (SE) also had high rates of cases and deaths when compared to the states in the South (S) and Central-West (CW) regions. The Northeast (NE) is the region with the second-highest rate of cases and deaths, with an average of 527 cases and 24 deaths per 100,000 inhabitants; the states in this region most affected by the disease were Ceará (CE), Maranhão (MA), and Pernambuco (PE). The Southeast (SE) is the region with the third-highest rate, with an average of 338 cases and 23 deaths per 100,000 inhabitants, with Espírito Santo (ES), Rio de Janeiro (RJ), and São Paulo (SP) being the most representative states in the region.
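For readers who wish to reproduce this kind of normalization, the arithmetic can be sketched as follows. The function names and the numbers are hypothetical, and the sketch assumes that a regional average is the simple mean of the state rates, which the text does not specify.

```python
def rate_per_100k(count, population):
    """Cases or deaths expressed per 100,000 inhabitants."""
    return count * 100_000 / population

def regional_average(state_rates):
    """Simple mean of the state-level rates (assumed definition of 'average')."""
    return sum(state_rates) / len(state_rates)

# Hypothetical counts and populations, not data from this study.
example_rates = [
    rate_per_100k(50, 1_000_000),   # 5 per 100,000
    rate_per_100k(300, 2_000_000),  # 15 per 100,000
]
example_region = regional_average(example_rates)
```

A population-weighted average (total cases divided by total population) would be an equally reasonable definition and can give a different value when state populations differ widely.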
The neural network demonstrated that the spread of COVID-19 in Brazil is heterogeneous; see Figure 1. It is essential to understand this behavior. The North of the country was more affected by COVID-19 than the South. The Northeast and Southeast differ in case rates but have similar death rates. We therefore evaluated possible factors that may explain this behavior and point out which measures are more relevant in the fight against COVID-19 in each Brazilian state.
After the training phase of the SOM network, we generated the topological map of the 14 variables evaluated together, which represents the distribution of each federative unit in Brazil according to its winning neuron; see Figure 2. In the topological map, each federative unit is associated with its winning neuron, that is, the neuron that best represents it in the analysis. The SOM network groups the input data into clusters that can be formed by one or more neurons. Clusters are delimited by the presence of empty neurons between the groups. Nearby clusters share some similarity; the greater the Euclidean distance between them, the greater the difference in behavior.
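The assignment of each unit to its winning neuron, and the identification of the empty neurons that separate clusters, can be sketched as below. The 2 x 2 weight grid, the sample vectors, and the labels "A", "B", and "C" are hypothetical placeholders, not values from the trained network.

```python
import math

def winning_neuron(weights, x):
    """Return the grid position of the neuron closest to x (Euclidean distance)."""
    best, best_d = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, x)
            if d < best_d:
                best, best_d = (r, c), d
    return best

def map_units(weights, samples):
    """Group sample labels by the neuron that wins for each of them."""
    hits = {}
    for label, x in samples.items():
        hits.setdefault(winning_neuron(weights, x), []).append(label)
    return hits

def empty_neurons(weights, hits):
    """Neurons that won for no sample; these delimit the clusters on the map."""
    return [(r, c)
            for r in range(len(weights))
            for c in range(len(weights[0]))
            if (r, c) not in hits]

# Hypothetical trained weights (2 x 2 grid, two variables) and three units.
grid = [[[0.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [1.0, 1.0]]]
units = {"A": [0.1, 0.1], "B": [0.2, 0.0], "C": [0.9, 0.95]}
hits = map_units(grid, units)
gaps = empty_neurons(grid, hits)
```

Here units "A" and "B" share a winning neuron and so fall in the same cluster, while "C" is separated from them by the two neurons that remain empty.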
We identified the formation of several clusters by evaluating the topological map. Among them, we highlight the one in the lower-left corner of the map in Figure 2, formed by the states of the Southern region of the country: Paraná (PR), Rio Grande do Sul (RS), and Santa Catarina (SC). This cluster contains the federative units and the region of the country with the lowest rates of deaths and cases from COVID-19, as shown in Figure 1. Another cluster worth highlighting is composed of some states in the North region of the country, namely Acre (AC), Amapá (AP), and Roraima (RR), and appears in the upper-right corner of the map. This cluster represents the region and some of the federative units with the highest rates of deaths and cases due to COVID-19 in the country. The same analysis can be done for the other clusters.
To verify the applicability of the SOM analysis, we compared its results with those of another unsupervised method, Hierarchical Cluster Analysis (HCA). The results obtained with HCA were very similar to those obtained with the SOM algorithm. The dataset presents different patterns according to the region in which the Brazilian states are located: most states in the North and Northeast regions formed one cluster, while most federative units in the Central-West, South, and Southeast formed another. The graphic output generated by the HCA method is shown in Figure S15 of the Supplementary Material.
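As an illustration of how such a hierarchical grouping works, the following sketch implements a basic agglomerative procedure. The linkage criterion (average linkage) and the distance metric (Euclidean) are assumptions, since the text does not state which were used, and the points are hypothetical.

```python
import math

def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = avg_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i stays valid
    return clusters

# Hypothetical two-variable observations forming two obvious groups.
observations = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
groups = agglomerate(observations, n_clusters=2)
```

In practice the variables would normally be standardized first so that no single variable dominates the distance; that step is omitted here for brevity.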
In general, the topological map of the SOM network shows that, according to the socioeconomic, health, and safety data, the spread of COVID-19 in the country varies among the Brazilian federative units. However, the topological map alone does not indicate which of the 14 variables evaluated is mainly responsible for explaining the spread of COVID-19 in the country. Thus, we used weight maps in the following discussions. To obtain a better representation, the values collected from the weight maps were transposed through the color scale to the Brazilian cartographic map for each variable. This transposition, however, loses the neighborhood relationship. The original outputs of the weight maps for each variable in the SOM network are shown in Figures S1–S14 of the Supplementary Material.
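The relationship between a weight map and the cartographic map can be made concrete with a short sketch. It assumes that a weight map is the slice of the neuron weight vectors for one variable and that each federative unit receives the value held by its winning neuron; the weights, variable order, and unit labels below are hypothetical.

```python
def weight_map(weights, k):
    """Slice of the SOM weights for variable k, one value per neuron."""
    return [[w[k] for w in row] for row in weights]

def transpose_to_units(plane, winners):
    """Give each unit the weight-map value of its winning neuron."""
    return {unit: plane[r][c] for unit, (r, c) in winners.items()}

# Hypothetical 2 x 2 SOM with two variables per neuron: index 0 and index 1.
weights = [[[0.1, 0.9], [0.2, 0.8]],
           [[0.3, 0.7], [0.4, 0.6]]]
winners = {"X": (0, 0), "Y": (1, 1)}  # hypothetical units and winning neurons

plane = weight_map(weights, k=1)
unit_values = transpose_to_units(plane, winners)
```

The dictionary returned by `transpose_to_units` keeps the value for each unit but no longer records which neurons were adjacent on the grid, which is the loss of neighborhood information noted above.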
The 14 variables were evaluated jointly by the ANN. However, for better visualization and discussion of the results, the weight maps were divided into four blocks and grouped according to similarities: (i) available hospital infrastructure; (ii) inputs and tests available; (iii) drugs available; and (iv) financial resources.
The first set of variables, shown in Figure 3, is composed of the rates of ICU beds, ventilators, physicians, and nurses per 100,000 inhabitants. In general, the states in the South, Southeast, and Central-West regions of Brazil had higher rates than those in the North and Northeast. There is evidence that these variables influence the rates of cases and deaths in each Brazilian state: the states with the lowest rates of ICU beds, ventilators, physicians, and nurses tend to have the highest rates of cases and deaths from COVID-19.
It is also evident that these were not the only relevant factors. The state of Rio de Janeiro (RJ), for example, has high rates of ICU beds, ventilators, physicians, and nurses and, despite this, has a higher death rate than states in the North region such as Acre (AC), Amapá (AP), and Roraima (RR).
The second set of variables, presented in Figure 4, represents the inputs allocated to each state by the Federal Government to combat and control the spread of COVID-19: PPE, hand sanitizer, rapid tests, and PCR tests per 100,000 inhabitants. In general, the PPE and hand sanitizer rates distributed to each state did not have a direct relationship with the rates of cases or deaths from COVID-19 according to the SOM analysis.
The states of Amapá (AP) and Rio de Janeiro (RJ), for example, had the highest rates of PPE distributed, and Amazonas (AM) had the highest rates of liters of hand sanitizer distributed per 100,000 inhabitants; however, these three states present higher rates of cases and deaths when compared to other Brazilian states.
It is important to note that the data refer to the quantity of inputs distributed; it is not possible to estimate which portion was used by hospitals or the population, or whether they were used correctly. Cultural factors also affect these variables, mainly regarding the education and awareness of the local population about the use of these inputs [45].
Testing for the disease has been carried out homogeneously across the country, as shown in Figure 4. This homogeneity is important because it indicates that differences in how the disease is confirmed do not bias the data collected on the number of cases, which would otherwise call the reliability of the data into question. The states with the highest rates of available tests were the Distrito Federal (DF), Rio Grande do Sul (RS), and Roraima (RR), which belong to the Central-West, South, and North regions, respectively, i.e., both the least and the most affected regions in the country.
Rapid tests are cheaper but less reliable and can produce false positives or negatives, while the PCR test is reliable but expensive and time-consuming [46]. A balance between rapid tests and PCR is necessary: if PCR testing predominates, it will not be possible to follow the spread over short periods, while rapid tests, when misused, can generate misleading information.
Prior measures such as vaccines and drugs against diseases with similar symptoms, such as influenza H1N1 and H3N2, can facilitate the diagnosis of COVID-19, since they are relevant information during the patient's anamnesis. These procedures allow faster and more efficient identification of the disease, which makes the adopted treatments earlier and better targeted [47]. Another measure that is the focus of significant discussion in the scientific community is the use of drugs against the coronavirus [48]. Among several drugs, the Brazilian Federal Government has invested in chloroquine tablets and distributed them to the states.
The third set of variables, shown in Figure 5, consists of influenza vaccines distributed and applied, and drugs distributed by state per 100,000 inhabitants. The states of the North region, such as Amazonas (AM), Roraima (RR), Pará (PA), and Acre (AC), showed a larger disparity than the other states between the rates of vaccines distributed and applied. At first, this behavior could be attributed to the health factors shown in Figure 3. However, if that were the case, the same disparity would also be observed in the Northeast region. Other intrinsic factors may be involved, including geographic and logistical factors that make vaccine distribution and the population's access to it a challenge.
As far as we know, there is no drug with proven efficacy against SARS-CoV-2; some drugs have been evaluated to combat or alleviate the disease's symptoms [48]. Recent studies report preliminary evidence supporting the use of dexamethasone in the treatment of critically ill patients with COVID-19 [49]. Recently, the Food and Drug Administration (FDA) revoked the emergency use authorization for chloroquine and hydroxychloroquine in patients with COVID-19 [50].
Figure 5 shows the chloroquine rates distributed by the Federal Government per 100,000 inhabitants. Chloroquine tablets were more widely distributed in the states of the North region, which has the highest rates of deaths and cases of COVID-19 in Brazil. The greater distribution of chloroquine in these states may have been an attempt to treat the disease symptoms when there were no restrictions on its use.
The oseltamivir rates distributed by the Federal Government per 100,000 inhabitants are shown in Figure 5. This drug is commonly used to treat symptoms caused by the influenza virus and is administered when there is still no confirmed diagnosis of COVID-19 [51]. Oseltamivir was more widely distributed in the Southern and Northern states of Brazil; however, it is not possible to establish a direct relationship with the rates of cases and deaths from COVID-19 using SOM analysis.
The last set of variables evaluated comprises the HDI and the distribution of federal funds allocated to each state. According to Figure 6, the allocation of Federal funds was larger for some states in the North, the region with the highest rates of cases and deaths due to COVID-19. The larger allocation of resources to this region may have been guided by the number of cases and deaths registered.
The highest HDI values are in the Central-West, South, and Southeast regions, with the Federal District (DF), Santa Catarina (SC), and São Paulo (SP) having the highest values in the country. It is also worth mentioning that the states most affected by COVID-19 have some of the lowest HDIs in the country, reflecting the problems that some of these states face in fundamental areas of service to the population.
Figure 7 shows the Spearman correlation test used to determine the correlation among variables. The test showed that influenza vaccines applied, ICU beds, ventilators, physicians, nurses, and HDI were the most significantly correlated variables, all with positive correlations, while chloroquine had a negative correlation with all of them. The HDI is directly related to essential aspects of the population, such as family income, education level, and health, among other factors; therefore, the correlation with the HDI was expected, since its calculation takes important aspects of health into account. The other variables evaluated showed minor correlations, while PPE did not correlate with any other variable.
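For reference, Spearman's coefficient is the Pearson correlation computed on the ranks of the observations. A minimal sketch in plain Python is given below, using average ranks for ties; the function names are illustrative only, and statistical packages provide equivalent routines along with significance tests.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks are used, any monotonic relationship yields a coefficient of magnitude 1, which suits variables measured on very different scales such as HDI and rates per 100,000 inhabitants.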
5. Discussion
Considering the importance of understanding the spread of the virus, many studies have been presented in recent months related to different aspects of the pandemic, with the application of computational intelligence tools to model and predict the spread of the disease [
34].
In our study, we took a different approach from the reported works. It is uncommon to find studies that simultaneously relate historical data on the number of COVID-19 cases to external factors that affect the spread of the virus through spatial analysis [31]. For this, we used unsupervised pattern recognition to analyze some of the many variables that may be related to the spread of the disease.
Brazil is one of the countries most affected by the pandemic and is marked by tremendous socioeconomic, territorial, and climatic differences among its federative units. We therefore consider it worthwhile to try to establish spread patterns of the virus in the country. Once the relationship of these factors with the spread and lethality of the disease has been identified, we hope that our study can assist in the analysis of the data generated by institutions and, consequently, in directing decision-making on practical combat actions.
It is important to note that our analysis does not aim to point out or compare which were the best actions adopted to contain the spread of COVID-19, so we approached each variable according to the significance indicated by the SOM analysis.
Based on the clustering ability of SOM, we were able to spatially group similar federative units in terms of the number of cases and deaths from COVID-19 together with data on the distribution of financial resources, equipment, health professionals, and HDI, represented by a color scale. This made it possible to group the federative units that are behaving similarly and that can therefore benefit from similar strategies to deal with the spread of the virus.
The SOM analysis allowed us to raise some hypotheses about the spread of the virus in Brazil. In general, the analysis indicated that the spread of the disease is directly related to socioeconomic, health, and safety conditions, which are quite heterogeneous across the national territory; see the topological map in Figure 2.
In Figure 2, we can see that the spread of the disease in Brazilian states has been differentiated and regionalized. Interestingly, the network separated the most and least affected regions. We found a separation pattern in which all the Southern federative units (PR, RS, and SC) are located at the bottom left of the map, while the Southeast states (SP, RJ, MG, and ES) and the Central-West units (MS, MT, DF, and GO) are in the upper-left corner. The federative units of the North and Northeast regions were placed on the right side of the map.
For better visualization, we invite readers to check the original output weight maps of the variables in the Supplementary Material. The weight maps overlay the topological map (Figure 2) and allow us to evaluate the behavior of each variable in the segmentation of the federative units. By analyzing the weight maps for the 14 variables extracted from the SOM network, it was possible to identify which socioeconomic, health, and safety variables were mainly responsible for differentiating the states with regard to the number of cases and deaths from COVID-19 in Brazil.
According to the 14 weight maps, the HDI was one of the essential variables for the distribution obtained (Figure 6 and Figure S13). The analysis indicated that the federative units with the lowest HDI were considerably more affected by the pandemic and had greater difficulty in combating the spread of the virus. Meanwhile, the better-prepared federative units, with higher HDI values, were more capable of dealing with the pandemic.
Possibly, the HDI was the main variable responsible for the separation because of its high correlation with the other variables. This index combines health (life expectancy), education (adult literacy index and levels of education), and income (gross domestic product, GDP, per capita). The correlation of the HDI with other variables, such as ICU beds (Figure S1), ventilators (Figure S2), physicians (Figure S3), and nurses (Figure S4), can be observed in Figure 7.
Another important variable that allowed us to assess the behavior of the spread of COVID-19 in Brazil was the rate of influenza vaccine doses distributed and applied, which are highly correlated according to the Spearman correlation test (Figure 7). According to Figure 6 and Figures S9 and S10, the federative units with the highest rates of vaccines applied had lower rates of cases and deaths from COVID-19, as pointed out by the SOM analysis. Although influenza vaccination is not effective against SARS-CoV-2, it may have facilitated clinical diagnosis and made the treatment of patients affected by COVID-19 faster and more accurate, which may be reflected in the number of cases and deaths in these regions and federative units.
Other measures adopted in the country, such as the use of drugs like chloroquine and oseltamivir, did not help explain the spread of COVID-19, as shown in Figure 5 and the weight maps of the variables in Figures S11 and S12. The SOM analysis indicated that the federative units in the North and Northeast regions had higher rates of chloroquine tablet distribution yet registered more cases and deaths from the disease. Oseltamivir capsules, on the other hand, had higher distribution rates in the Southern and Northern states, which represent the regions least and most affected by the disease; with respect to this variable, therefore, the spread of COVID-19 was similar.
We found that the PPE (Figure S5) and hand sanitizer (Figure S6) rates distributed to each state did not have a direct relationship with the rates of cases or deaths from COVID-19 in the SOM analysis (Figure 4). These measures are known to directly influence the control of COVID-19 dissemination [45]; in this sense, the SOM analysis suggests that, although these items were distributed to the federative units, the population may not have followed the measures adopted or used these items properly. Figure 7 reinforces this hypothesis, since PPE did not correlate with any other variable. Government agencies should therefore further encourage the use of the available items.
Among the less expressive variables evaluated, the rapid test (Figure S8), PCR test (Figure S9), and distribution of Federal funds (Figure S14) did not show a clear behavior that would explain the spread of COVID-19 in terms of the numbers of cases and deaths in the country, according to the SOM analysis.