Graph Theory Approach to COVID-19 Transmission by Municipalities and Age Groups

The COVID-19 pandemic remains a global problem that affects the health of millions of people and the world economy. Understanding how the movement of people between regions of the world, countries, and municipalities, and how close contact between individuals of different age groups, promote the spread of infectious diseases is a pressing concern for society during epidemic outbreaks and pandemics such as COVID-19. Networks and Graph Theory provide adequate and powerful tools to study the spread of communicable diseases. In this work, we use Graph Theory to analyze COVID-19 transmission dynamics between the municipalities of the Aveiro district, in Portugal, and between different age groups, considering data from 2020 and 2021, in order to better understand the spread of this disease and to help prepare actions for possible future pandemics. We used a digraph structure that models the transmission of the SARS-CoV-2 virus between Aveiro’s municipalities and between age groups. To understand how a node fits within the contact digraphs, we studied centrality measures, namely eigencentrality, closeness, degree, and betweenness. Transmission ratios were also considered to determine whether certain age groups or municipalities were more responsible for the virus’s spread. According to the results of this research, transmissions mostly occur within the same social groupings, that is, within the same municipalities and age groups. However, the study of centrality measures, after eliminating loops, reveals that municipalities such as Aveiro, Estarreja and Ovar are relevant nodes in the transmission network of municipalities, as is the 40–49 age group in the transmission network of age groups. Furthermore, we conclude that vaccination is effective in reducing the spread of the virus.
Author Contributions: Conceptualization, P.M., S.J.P., V.A. and C.J.S.; methodology, P.M., S.J.P., V.A. and C.J.S.; validation, P.M., S.J.P., V.A. and C.J.S.; formal analysis, P.M., S.J.P., and C.J.S.; investigation, P.M., S.J.P., V.A. and C.J.S.; data curation, R.L.; writing—original writing—review


Introduction
The pandemic caused by the SARS-CoV-2 virus, known as COVID-19, remains a major problem on a global scale. The virus not only affected the health of millions of people, but it also hindered the growth of economies around the world. Overpopulation, globalization, and hyper-connectivity have been identified as important factors that have accelerated the transmission of this infectious disease, turning the epidemic into a pandemic [1].
Concerning the SARS-CoV-2 virus, the main transmission mode was recognized as the dissemination of aerosol droplets through normal breathing and speech [2]. Accordingly, aerosol exposure related to SARS-CoV-2 and the associated risk of infection have prompted further studies [3]. However, the spread of infectious diseases is mainly driven by two factors: the virus's physical and chemical properties, and the way people interact with each other through their social networks [4]. Humans are not passive hosts for viruses; they actively interact with one another and, as a consequence, transmit diseases to socially predictable subjects and locations [4]. The way people build social networks influences the overall pattern of virus spread. In fact, a common research question is which network characteristics predict the importance of a node in disease spreading [5–7]. More precisely, researchers seek structural network measures, such as centrality measures, that rank nodes in the same order as some quantity describing their importance for the spread of the disease [8,9]. Some studies aim to investigate the predictive power of such centrality measures [10,11].
In Portugal, several restrictive measures were taken in 2020 and 2021. To prevent virus transmission, extraordinary and urgent measures restricting freedom of movement and economic freedoms were applied on 18 March 2020, with the declaration of the first state of emergency. This state of emergency ended on 2 May and was followed by a decrease in the severity of the epidemic. From the summer of 2020 until the next state of emergency, on 6 November 2020, the Portuguese government declared states of contingency or alert depending on the national or regional epidemic situation. In September 2020, cases began to rise, marking the beginning of a second wave that peaked in November with over 6000 daily new cases, despite the reintroduction of some restrictive measures in late October and early November 2020 [12].
This paper focuses on social networks, in particular on graph centrality measures that identify important nodes. We use centrality measures from graph theory to better interpret the COVID-19 transmission network in Aveiro, Portugal, considering both its municipalities and the role of the different age groups in the spread of the virus.
Graph theory is a powerful tool for measuring and describing social interactions and is commonly used to describe social networks. Graphs are mathematical models that represent relationships between entities: the entities are the vertices of the graph, and the relationships between them are represented by links, or edges. They can be used to model anything from chemical structures [13] to city drainage systems [14] or human brain networks [15]. If the edges are directed from one vertex to another, the graph is called a directed graph or digraph; in this case, the edges are called arcs and the vertices are called nodes. We use a digraph to represent the transmission of the SARS-CoV-2 virus between Aveiro's municipalities and between age groups, and we study several centrality measures to explain it. In [16], centrality measures were studied as a way to control the spread of the virus through vaccination.
SARS-CoV-2 transmission was previously interpreted using these methods in countries such as Italy [17], India [1], and Turkey [18], with varying results. Several research studies on the spread of diseases are based on graph models and show how their use can significantly help control dissemination [16,19–21].

Materials and Methods
In this study, all COVID-19-related test results in the Baixo Vouga Primary Care Cluster (ACES BV) reported to the Public Health Unit (PHU) between 8 March 2020 and 14 January 2022 (N = 17,568) were considered. However, due to missing values and/or insufficient information in the 2022 data, only the data for 2020 and 2021 were used. Since the study focused on the dynamics between municipalities and age groups, the database was filtered to retain only the data relevant for this purpose. The dataset was also filtered to remove missing and duplicate entries.
The resulting dataset was then used to generate contact matrices for age groups and municipalities for 2020 and 2021, as well as for the January-June, July-September, and October-December time intervals of 2021 (the year for which the data are more complete), allowing a comparison between the two complete years as well as an analysis that takes the children's school year into account. First, to understand the overall dynamics of COVID-19 in the period 2020-2021, we studied the contact matrices built from the age group and municipality data for this period. Then, to detect the importance/influence of a node in the transmission of the virus, we studied several centrality measures. For this, the contact matrices were used to generate digraphs in which the nodes represent the municipalities or age groups and a weighted arc links two nodes if there is transmission between them. The weight of an arc quantifies the level of transmission. Loops, which are arcs that start and end at the same node, were removed, because the centrality measures were meant to focus on the relationships between different nodes; thus, only the arcs representing transmissions between distinct nodes were kept. A path in a digraph is a sequence of nodes in which there is an arc pointing from each node in the sequence to its successor, with no repeated arcs. The length of a path is the sum of the weights of its arcs; if the digraph is unweighted, each arc is assumed to have weight one. A shortest path between two nodes of a digraph is a path whose arc weights have the minimum possible sum [22].
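As a minimal sketch of this construction (an illustration in Python, not the authors' igraph R code), the hypothetical helpers below turn a contact matrix into a weighted digraph stored as an adjacency dictionary, dropping loops and zero entries, and then compute shortest path lengths with Dijkstra's algorithm, taking a path's length as the sum of its arc weights. The matrix orientation (row = infecting group, column = infected group) and the toy numbers are assumptions.

```python
import heapq
import math

def contact_matrix_to_digraph(matrix, labels):
    """Hypothetical helper: weighted digraph {source: {target: weight}} from a contact matrix.

    Assumes matrix[i][j] counts transmissions from group i to group j.
    Loops (diagonal entries) and zero entries do not become arcs.
    """
    digraph = {label: {} for label in labels}
    for i, source in enumerate(labels):
        for j, target in enumerate(labels):
            if i != j and matrix[i][j] > 0:
                digraph[source][target] = float(matrix[i][j])
    return digraph

def shortest_path_lengths(digraph, source):
    """Dijkstra's algorithm: minimum sum of arc weights from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry: a shorter path to node was already found
        for successor, weight in digraph[node].items():
            candidate = d + weight
            if candidate < dist.get(successor, math.inf):
                dist[successor] = candidate
                heapq.heappush(heap, (candidate, successor))
    return dist

# Toy 3-group contact matrix (invented numbers, for illustration only).
labels = ["A", "B", "C"]
matrix = [[5, 2, 0],
          [0, 7, 1],
          [3, 0, 4]]
digraph = contact_matrix_to_digraph(matrix, labels)
print(digraph)                              # {'A': {'B': 2.0}, 'B': {'C': 1.0}, 'C': {'A': 3.0}}
print(shortest_path_lengths(digraph, "A"))  # {'A': 0.0, 'B': 2.0, 'C': 3.0}
```

The diagonal entries (5, 7, 4) vanish, mirroring the removal of loops; the raw weights are used as lengths here only to match the path-length definition above.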
The following centrality measures were applied: closeness centrality, betweenness centrality, eigencentrality, and degree centrality. The closeness centrality of a node represents its proximity to all other nodes in the network. It is the inverse of the sum (or, in normalized form, of the average) of the shortest path lengths from the node to all other nodes in the network, and so it represents the transmission strength. The closeness centrality of a node $v$, $C(v)$, is defined by
$$C(v) = \frac{1}{\sum_{u \neq v} d(v,u)},$$
where $d(v,u)$ is the distance (length of the shortest path) between nodes $v$ and $u$.
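The closeness formula can be sketched directly on top of Dijkstra's algorithm. In this hypothetical Python illustration (not the igraph implementation), the sum runs only over the nodes reachable from $v$, which is an assumption, since the formula above presumes every node is reachable; a node with no outgoing paths gets 0.

```python
import heapq
import math

def shortest_path_lengths(digraph, source):
    """Dijkstra's algorithm on a digraph stored as {node: {successor: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for successor, weight in digraph[node].items():
            candidate = d + weight
            if candidate < dist.get(successor, math.inf):
                dist[successor] = candidate
                heapq.heappush(heap, (candidate, successor))
    return dist

def closeness(digraph):
    """C(v) = 1 / sum of d(v, u) over the nodes u != v reachable from v."""
    scores = {}
    for v in digraph:
        dist = shortest_path_lengths(digraph, v)
        total = sum(d for u, d in dist.items() if u != v)
        scores[v] = 1.0 / total if total > 0 else 0.0  # no outgoing paths: closeness 0
    return scores

# Toy digraph (invented weights): a directed 3-cycle A -> B -> C -> A.
digraph = {"A": {"B": 2.0}, "B": {"C": 1.0}, "C": {"A": 3.0}}
print(closeness(digraph))  # {'A': 0.2, 'B': 0.2, 'C': 0.125}
```

For example, from C the distances are 3 (to A) and 5 (to B), so $C(\mathrm{C}) = 1/8$.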
The betweenness centrality measures a node's influence on the flow of information through the network, and thus it represents the power of a node as a bridge between other nodes [15]. The betweenness centrality of a node $v$, $B(v)$, is defined by
$$B(v) = \sum_{u \neq v \neq w} \frac{\sigma_{uw}(v)}{\sigma_{uw}},$$
where $\sigma_{uw}$ is the total number of shortest paths from $u$ to $w$ and $\sigma_{uw}(v)$ is the number of those paths that pass through $v$.
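The sum over pairs $(u, w)$ is usually computed with Brandes' dependency-accumulation idea rather than by enumerating paths. The sketch below is a simplified Python variant for small weighted digraphs: it scans all nodes to find shortest-path predecessors, which is fine for a handful of nodes but not for large networks. The function names and toy digraphs are hypothetical, arc weights are assumed positive, and every node must appear as a key of the dictionary.

```python
import heapq
import math

def shortest_path_lengths(digraph, source):
    """Dijkstra's algorithm on a digraph stored as {node: {successor: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for successor, weight in digraph[node].items():
            candidate = d + weight
            if candidate < dist.get(successor, math.inf):
                dist[successor] = candidate
                heapq.heappush(heap, (candidate, successor))
    return dist

def betweenness(digraph):
    """B(v) = sum over ordered pairs (u, w), u != v != w, of sigma_uw(v) / sigma_uw."""
    scores = dict.fromkeys(digraph, 0.0)
    for s in digraph:
        dist = shortest_path_lengths(digraph, s)
        order = sorted(dist, key=dist.get)   # reachable nodes by increasing distance from s
        sigma = dict.fromkeys(order, 0.0)    # number of shortest paths from s to each node
        sigma[s] = 1.0
        preds = {v: [] for v in order}       # predecessors on shortest paths from s
        for v in order:
            if v == s:
                continue
            for u in order:
                # u precedes v on a shortest path if the arc u -> v closes the gap exactly
                if v in digraph[u] and math.isclose(dist[u] + digraph[u][v], dist[v]):
                    preds[v].append(u)
                    sigma[v] += sigma[u]
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                scores[w] += delta[w]
    return scores

# Toy digraph: the detour A -> B -> C (length 2) beats the direct arc A -> C (length 5).
print(betweenness({"A": {"B": 1.0, "C": 5.0}, "B": {"C": 1.0}, "C": {}}))
# {'A': 0.0, 'B': 1.0, 'C': 0.0}
```

When two shortest paths tie (for instance A -> B -> D and A -> C -> D), B and C each receive half of the credit for the pair (A, D), as the ratio $\sigma_{uw}(v)/\sigma_{uw}$ prescribes.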
The eigencentrality is another way to assess how important a node is in a network; it considers how strong or relevant each node's neighbors are in the proximity network [23]. Thus, a node with a few relevant neighbors can have a larger eigenvector centrality than a node with many neighbors of limited relevance. This measure is computed by assuming that the centrality of node v is proportional to the sum of the centralities of v's neighbors.
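The proportionality assumption leads to an eigenvector problem that power iteration can solve. In this hypothetical Python sketch (not igraph's implementation), each node's score is the weighted sum of the scores of the nodes with arcs pointing into it. Treating in-neighbours as the "neighbors" and using the arc weights as given are assumptions, since the text does not fix either choice. Scores are rescaled so that the largest is 1.

```python
def eigencentrality(digraph, max_iter=1000, tol=1e-12):
    """Power iteration for x_v proportional to the sum of w(u, v) * x_u over arcs u -> v.

    Iterates with (A^T + I) instead of A^T: for a strongly connected digraph the
    shift keeps the same dominant eigenvector but prevents oscillation on periodic
    graphs (e.g. a directed cycle). Scores are scaled so that the maximum is 1.
    """
    x = dict.fromkeys(digraph, 1.0)
    for _ in range(max_iter):
        new = dict(x)  # the "+ I" term: each node keeps its current score
        for u, successors in digraph.items():
            for v, weight in successors.items():
                new[v] += weight * x[u]  # u passes its centrality to the node it points to
        top = max(new.values())
        new = {v: score / top for v, score in new.items()}
        if max(abs(new[v] - x[v]) for v in x) < tol:
            return new
        x = new
    return x

# Toy example: a path A - B - C with arcs in both directions (invented weights).
path = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0}, "C": {"B": 1.0}}
ec = eigencentrality(path)
print({v: round(s, 4) for v, s in ec.items()})  # {'A': 0.7071, 'B': 1.0, 'C': 0.7071}
```

B scores highest because it is the only node fed by two neighbours; A and C converge to $1/\sqrt{2}$ of B's score.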
The degree centrality of a node is the number of arcs incident on it [24], that is, the sum of its indegree and outdegree.
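Counting incident arcs is straightforward on the adjacency-dictionary representation. This hypothetical Python helper counts each arc once for its source (outdegree) and once for its target (indegree), assuming loops have already been removed.

```python
def degree_centrality(digraph):
    """Degree = indegree + outdegree (number of arcs incident on each node; no loops assumed)."""
    degree = {v: len(successors) for v, successors in digraph.items()}  # start with outdegree
    for successors in digraph.values():
        for v in successors:
            degree[v] += 1  # each arc u -> v adds one to the indegree of v
    return degree

digraph = {"A": {"B": 1.0, "C": 1.0}, "B": {"C": 1.0}, "C": {"A": 1.0}}
print(degree_centrality(digraph))  # {'A': 3, 'B': 2, 'C': 3}
```

In a complete digraph on $n$ nodes without loops, every node has degree $2(n-1)$, so this measure cannot distinguish the nodes when every group transmits to every other group.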
Closeness and betweenness centrality measures are based on the shortest paths from each node to every other node in the network. In our case, the arc weights are the numbers of transmission cases between two nodes, so a higher weight means more transmissions and, consequently, nodes that should be closer. We therefore use the inverses of the arc weights as the entries of the contact matrices when computing the centrality measures.
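The effect of inverting the weights is easiest to see on a small case. In this hypothetical Python sketch (invented counts), two strong links A -> B and B -> C, with 10 transmissions each, become arcs of length 0.1, whereas a weak direct link A -> C, with 2 transmissions, becomes an arc of length 0.5; the two-step route is therefore "shorter", as intended.

```python
def to_distance_weights(counts):
    """Hypothetical helper: replace each transmission count w by the arc length 1/w."""
    return {u: {v: 1.0 / w for v, w in targets.items() if w > 0}
            for u, targets in counts.items()}

# Invented counts: many transmissions A->B and B->C, few directly A->C.
counts = {"A": {"B": 10, "C": 2}, "B": {"C": 10}, "C": {}}
lengths = to_distance_weights(counts)
via_b = lengths["A"]["B"] + lengths["B"]["C"]  # 0.1 + 0.1
direct = lengths["A"]["C"]                     # 0.5
print(via_b < direct)  # True: the heavy two-step route is closer than the weak direct arc
```

Without the inversion, shortest-path measures would favour the routes with the fewest transmissions, which is the opposite of what closeness and betweenness are meant to capture here.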
All digraphs were built, and all centrality measures computed, using the igraph R package [25].
The contact matrices were also used to compute transmission ratios, defined as the ratio between the number of people infected by each group and the number of people infected in that group. This allowed us to identify which groups spread the virus the most per person. A ratio greater than one means that, on average, each infected individual in the group infected more than one person, so that group contributed to increasing the prevalence of COVID-19 by disseminating the virus.
Tables 1 and 2 show the contact matrices for the overall time span of the data (2020 to 2021), for age groups and for municipalities, respectively. In these tables, color coding helps to discern the level of transmission, with darker colors corresponding to higher values. The municipality matrix displays a clear pattern: the highest numbers always lie on the principal diagonal, which means that most infections occurred within the municipality itself. In the age group matrix, this pattern is less evident: it occurs in some groups, but always less markedly than in the municipality matrix, and transmission is more dispersed along the rows. This trend suggests that, for both municipalities and age groups, transmissions preferentially occur within the same group. Furthermore, in Table 1, the transmission numbers show a two-decade gap, which may be explained by the relationship between parents and children. In Table 2, excluding the principal diagonal, the highest values coincide with geographically close municipalities.
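The transmission ratio defined at the start of this passage reduces to row and column sums of a contact matrix. In this hypothetical Python sketch, the orientation (row = infecting group, column = infected group) is an assumption, within-group transmissions on the diagonal count on both sides, and the toy numbers are invented.

```python
def transmission_ratios(matrix, labels):
    """Hypothetical helper: infections caused by each group / infections occurring in it.

    Assumes matrix[i][j] = number of people in group j infected by someone in group i,
    so row sums count infections caused and column sums count infections received.
    """
    ratios = {}
    for k, label in enumerate(labels):
        caused = sum(matrix[k])                    # row sum
        received = sum(row[k] for row in matrix)   # column sum
        ratios[label] = caused / received if received else None  # undefined if no cases
    return ratios

labels = ["X", "Y"]
matrix = [[4, 6],
          [2, 3]]
print(transmission_ratios(matrix, labels))  # X: 10/6 (> 1, spreads), Y: 5/9 (< 1)
```

Group X caused 10 infections but received only 6, so its ratio exceeds one and it acts as a net spreader; group Y is the reverse.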

Centrality Measures and Transmission Ratios
Since, in the study of centrality measures, the main goal is to study the level of transmission between different classes, the loops were eliminated from the digraphs; that is, the entries of the principal diagonal of the contact matrices were set to zero. The digraphs corresponding to the contact matrices for municipalities and age groups are presented in Figure 1. The network structure reflects the distances between the nodes (see the definition of distance in Section 2) in both the municipality and the age group digraphs. For example, the left digraph shows the isolation of the S. Vouga municipality, and the right digraph shows that the 90+ age group is isolated. To facilitate the interpretation of the results, and to place them on the same scale, we normalize the closeness and eigencentrality values using min-max scaling:

$$x_{\mathrm{norm}} = \frac{x - \min}{\max - \min} \times 100,$$
where $x$ is the calculated value and $\min$ and $\max$ are, respectively, the minimum and maximum of the calculated values over all municipalities or age groups.
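As a small sketch of this scaling, the hypothetical Python helper below maps a dictionary of centrality values to the 0-100 range. Returning 0 for every node when all values are equal is an assumption, since the formula is undefined in that case.

```python
def min_max_scale(values):
    """Rescale a {node: value} mapping to 0-100 via (x - min) / (max - min) * 100."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: all values equal makes the scaling undefined; map everything to 0.
        return dict.fromkeys(values, 0.0)
    return {node: (x - lo) / (hi - lo) * 100 for node, x in values.items()}

print(min_max_scale({"a": 2.0, "b": 4.0, "c": 6.0}))  # {'a': 0.0, 'b': 50.0, 'c': 100.0}
```

Multiplying all inputs by the same positive constant leaves the output unchanged, so it makes no difference after scaling whether closeness is computed from the sum or from the average of the distances.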

Municipalities Analysis for 2020 and 2021
Table 3 presents the centrality measures by municipality and by year (2020 and 2021). Aveiro, the district capital, has the highest closeness, degree, and betweenness centrality values in both years, which means that Aveiro is the municipality with the highest transmission speed, the one with the most connections, and the main transmission bridge between municipalities. In terms of closeness centrality, Ílhavo and Águeda also stand out, with high mean values. In terms of eigencentrality, Ílhavo and Vagos have the highest mean values, which means that the populations of Ílhavo and Vagos become easily infected because they are connected to nodes whose populations are easily infected. Therefore, we may conclude that their neighbors are, in general, responsible for the higher transmission. Geographically, Ílhavo borders Aveiro and Vagos, and Vagos borders Aveiro, Ílhavo and O. Bairro.
The transmission ratios for both years are shown in Figure 2. No municipality stands out from the others, and the values are practically similar in both years. However, A. Velha and O. Bairro present transmission ratios with opposite signs in 2020 and 2021. It is also observed that Murtosa, O. Bairro, and Vagos are above the other municipalities in 2020.

Municipalities Analysis in 2021
When subdividing the 2021 data (Jan-Jun, Jul-Sept and Oct-Dec), it is possible to better understand the dynamics of the disease throughout the year. This subdivision is intended to highlight the changes that occur in the summer in particular (second interval, Jul-Sept).
Thus, regarding closeness centrality (Table 4), Estarreja and Ovar were the municipalities with the highest transmission speed in the first interval, suggesting that these municipalities are able to disseminate the virus efficiently by occupying a central position in the network, that is, they require few intermediates to reach others. In the second interval (summer and vacation period), most municipalities show values indicating a higher transmission speed. In the third interval, Aveiro and Ílhavo stand out as the municipalities with the highest transmission speed, and S. Vouga, a geographically isolated municipality with an ageing population, as the one with the lowest.
Table 4. Centrality measures of the digraph related to the municipality data for the Jan-Jun, Jul-Sept and Oct-Dec time intervals of 2021.

The betweenness centrality reveals that Aveiro was the most relevant municipality, since it has the highest value in all intervals, in agreement with the full-year analysis. Aveiro is the municipality that most serves as a link between municipalities in the transmission of COVID-19 throughout the year, regardless of the season. The same applies to degree centrality, since Aveiro has the highest values for this measure in all time intervals. This would be expected, since Aveiro is the district capital and is therefore connected to almost all other municipalities throughout the year.

Analogously to the previous study, the transmission ratio regarding the municipality data for Jan-Jun, Jul-Sept and Oct-Dec time intervals (Figure 3

Age Groups Analysis in 2020-2021
Table 5 shows the centrality measures for the age group data for 2020 and 2021. Globally, the centrality measures behave similarly along the age groups. The 40-49 age group stands out, since it has the highest closeness and betweenness centrality values. The highest closeness value indicates that this age group transmits the virus most effectively, and the highest betweenness value means that the 40-49 node lies on the shortest paths between other nodes, showing that this age group acts as a 'bridge' between nodes in the network.
Considering the degree centrality measure, the values are identical, given that all age groups interact with each other.
In the eigencentrality measure, the 00-09 and 90+ age groups showed the highest values. Although they are not super-spreader age groups, they have contact with the age groups most responsible for disease transmission, such as those highlighted by the closeness centrality measure. We may associate this fact with the extra attention and care required by children and elderly people.
The transmission ratios for both years are depicted in Figure 4. In the first year, the age groups with values greater than 1 are those between 20 and 59 years old, showing that each infected person in these groups, on average, infected more than one person; thus, these ages are the most responsible for the spread of the disease. In the second year, however, the age range with the highest values narrows to 20 to 49 years old.

Age Group Analysis in 2021
Table 6 presents the centrality measures for the age groups in 2021. In terms of closeness centrality, the Jan-Jun and Oct-Dec periods present consistent values, in contrast with the Jul-Sept period. Note that in the Jul-Sept period the 20-29 age group has the highest value, as opposed to 40-59 in the other two periods. In terms of betweenness centrality, the age groups with the highest values are the same for the first and third time intervals but differ for the second interval, which corresponds to the summer months. This shows that, during the summer, the age group most likely to spread the disease from one age group to another changed from 40-59 to 20-29. Regarding eigencentrality, the same holds as for the full-year analysis: older age groups have high values in all intervals.
Table 6. Centrality measures of the digraph related to the age groups for the Jan-Jun, Jul-Sept and Oct-Dec time intervals of 2021.
In Figure 5, we observe peaks in the transmission ratios: the age groups that transmit the most per person shift toward younger ages, and the range of age groups with a ratio greater than 1 narrows. The age group with the highest transmission ratio was 40-49 in Jan-Jun and 20-29 in Jul-Dec. Furthermore, the transmission ratio of the 70+ age groups is lower than one in all time periods.


Discussion
Considering the results, we can first say that, based on the contact matrices, transmissions tend to happen most often within the most similar social groups, whether those groups are based on where they live or on their age.
Regarding the transmission ratios, the results indicate the effectiveness of vaccination: for the municipalities, the ratios are lower in 2021 than in 2020. This conclusion is even clearer for the age group transmission ratios in the subdivided year 2021: the age groups that transmit the most shift to the left (toward younger ages), and the range of age groups that were most dangerous narrows. This is in line with the 2021 vaccination campaign, which started with older people at the beginning of the year and then moved on to younger people.
Regarding the centrality measures, the municipality results showed that Aveiro, as expected for the district capital, is the municipality with the most connections and the highest transmission speed, and the one that serves as a transmission bridge between municipalities. The results for the subdivided year show that, in the summer, practically all municipalities increase their transmission speed, possibly due to the holidays. In addition, the municipality results also suggest possible reasons for the prophylactic isolation of Ovar in 2020, since Ovar is a municipality with a high transmission speed that is connected to municipalities which themselves have high transmission speeds.
As for the centrality measure results for the age groups, with the subdivision of the year and the arrival of summer, the metrics for the 20-29 age group worsened. In fact, we observed a change in the betweenness and closeness centrality values of this age group.