Protection Strategy against an Epidemic Disease on Edge-Weighted Graphs Applied to a COVID-19 Case

Simple Summary
Infectious diseases have been part of human history. Countless epidemics have produced high mortality rates in vulnerable populations. With a better understanding of how these diseases spread, population groups have been able to adapt and cope better with infections. During the COVID-19 pandemic, one of the strategies used has been the modeling of infectious diseases, with the aim of establishing protection measures for people and stopping the spread of the epidemic. Our study evaluates protection strategies through infectious disease modeling with COVID-19 data from a commune in Chile. The results of the simulations indicate that the model provides important protection for the population by recognizing super-spreaders (bridge nodes). This type of protection can be key in the fight against COVID-19.

Abstract
Among the diverse and important applications that networks currently have is the modeling of infectious diseases. Immunization, or the process of protecting nodes in the network, plays a key role in stopping diseases from spreading; hence the importance of having tools or strategies that address this challenge. In this paper, we evaluate the effectiveness of the DIL-W$^{\alpha}$ ranking in immunizing nodes in an edge-weighted network with 3866 nodes and 6,841,470 edges. The network is obtained from a real database, and the spread of COVID-19 is modeled with the classic SIR model. We apply the protection to the network according to the importance ranking list produced by DIL-W$^{\alpha}$, considering different protection budgets. Furthermore, we consider three different values for $\alpha$; in this way, we compare how the protection performs according to the value of $\alpha$.


Introduction
Infectious diseases have been the focus of multiple fields of research. In public health and epidemiology, efforts are directed at establishing transmission dynamics, the characteristics of infectious agents, and the populations most affected by pathogens, among others, which are of high importance for science [1]. In recent decades, research on infectious diseases has involved the application of complex theories from mathematics and engineering. In particular, the use of network models has allowed explanations of the spread of diseases from infected people (nodes) and their links with others (edges) [2].
Network models establish the connection between population groups, which is useful not only in public health and epidemiology, but also in engineering and the social sciences [3] (see also [4][5][6][7][8]). In the health field, where network models serve as a theoretical approach, the importance of recognizing the complexity of community structures has been discussed as a way to understand the social dynamics behind the spread of infectious diseases [9].
This paper is organized as follows: Section 2 contains generalities about graph theory, describes how a graph is obtained from a database, and explains the DIL-W$^{\alpha}$ ranking. In Section 3, we obtain the graph from a real database of a city in Chile (Olmué) and set the protection strategy. In Section 4, the results of the study are presented. Section 5 provides a discussion of the results and the potential of the method used. Finally, Section 6 provides the conclusions.

Basic Definitions
In this section, we establish the definitions and elements used throughout this paper. We summarize the symbols and notations in Table 1.

Notation                 Definition and description
$G$                      Graph or network.
$(G, w)$                 Edge-weighted graph.
$v_i$                    Vertex or node.
$e_{ij}$                 Edge between vertex $v_i$ and vertex $v_j$.
$w_{ij}$                 Weight of the edge $e_{ij}$.
$\alpha$                 Real number. Tuning parameter.
$C_D^{w\alpha}(v_i)$     Degree centrality of $v_i \in V$ of an edge-weighted graph $(G, w)$.
DIL-W$^{\alpha}$         Ranking based on degree and importance of line.
$I^{\alpha}(e_{ij})$     Importance of edge $e_{ij}$.
$W^{\alpha}(e_{ij})$     Contribution that $v_i$ makes to the importance of the edge $e_{ij}$.
$X_k$                    Variable of a database.
$p_k$                    Weight of the variable $X_k$.
$k$                      Protection budget (the number of nodes in graph $G$ that can be protected).
$\sigma$                 Ratio of surviving nodes.

Definition 1.
A graph G is a finite nonempty set V of objects called vertices, together with a possibly empty set E of 2-element subsets of V called edges.
To indicate that a graph $G$ has vertex set $V$ and edge set $E$, we write $G = (V, E)$. If the set of vertices is $V = \{v_1, v_2, \dots, v_n\}$, then the edge between vertex $v_i$ and vertex $v_j$ is denoted by $e_{ij}$.
If $e_{ij}$ is an edge of $G$, then $v_i$ and $v_j$ are adjacent vertices. Two adjacent vertices are referred to as neighbors of each other. The set of neighbors of a vertex $v$ is called the open neighborhood of $v$ (or simply the neighborhood of $v$) and is denoted by $N(v)$. If $e_{ij}$ and $e_{jk}$ are distinct edges in $G$, then $e_{ij}$ and $e_{jk}$ are adjacent edges.
Definition 2. The number of vertices in a graph $G$ is the order of $G$, and the number of edges is the size of $G$.
Definition 3. The degree of a vertex $v$ in a graph $G$, denoted by $\deg(v)$, is the number of vertices in $G$ that are adjacent to $v$. Thus, the degree of $v$ is the number of vertices in its neighborhood $N(v)$.

Definition 4.
Let $G$ be a graph of order $n$, where $V(G) = \{v_1, v_2, \dots, v_n\}$. The adjacency matrix of $G$ is the $n \times n$ zero-one matrix $A(G) = [a_{ij}]$, where
$$a_{ij} = \begin{cases} 1 & \text{if } e_{ij} \in E, \\ 0 & \text{otherwise.} \end{cases}$$
On the other hand, an important generalization of the simple graph is the weighted graph, more specifically the edge-weighted graph. Informally, an edge-weighted graph is a graph whose edges have been assigned a weight.
Definition 5. An edge-weighted graph is a pair $(G, w)$, where $G = (V, E)$ is a graph and $w : E \to \mathbb{R}$ is a weight function. If $e_{ij} \in E$, then $w(e_{ij}) = w_{ij}$.
Definition 6. The strength of a vertex $v_i$, denoted by $S(v_i)$, is defined as the sum of the weights of all edges incident to it, that is,
$$S(v_i) = \sum_{v_j \in N(v_i)} w_{ij}.$$
The following definition comes from [32].
Definition 7. The degree centrality of a vertex $v_i$, denoted by $C_D^{w\alpha}(v_i)$, is defined as
$$C_D^{w\alpha}(v_i) = \deg(v_i)^{(1-\alpha)} \cdot S(v_i)^{\alpha}.$$
The parameter $\alpha$ is called the tuning parameter. Notice that, when $\alpha = 0$, then $C_D^{w\alpha}(v_i) = \deg(v_i)$ and, when $\alpha = 1$, then $C_D^{w\alpha}(v_i) = S(v_i)$.
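As a minimal sketch of the degree, strength and degree centrality definitions (not the authors' code), the following Python snippet uses the product form deg(v)^(1−α) · S(v)^α, which is consistent with the two limiting cases α = 0 and α = 1 stated above. The edge dictionary, the toy weights and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def degree_and_strength(edges):
    """Return (deg, strength) for an undirected graph given as {(u, v): weight}."""
    deg = defaultdict(int)
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        deg[u] += 1
        deg[v] += 1
        strength[u] += w
        strength[v] += w
    return dict(deg), dict(strength)

def degree_centrality(edges, alpha):
    """Degree centrality deg(v)^(1 - alpha) * S(v)^alpha (assumes positive weights)."""
    deg, strength = degree_and_strength(edges)
    return {v: deg[v] ** (1 - alpha) * strength[v] ** alpha for v in deg}

# Hypothetical toy graph: a triangle 1-2-3 with a pendant vertex 4.
toy_edges = {(1, 2): 2.0, (1, 3): 1.0, (2, 3): 4.0, (3, 4): 3.0}
c0 = degree_centrality(toy_edges, 0.0)      # reduces to the degree
c1 = degree_centrality(toy_edges, 1.0)      # reduces to the strength
c_half = degree_centrality(toy_edges, 0.5)  # geometric mean of degree and strength
```

For vertex 3, which has three neighbors and incident weights 1, 4 and 3, the α = 0 value is its degree and the α = 1 value is its strength; intermediate values of α interpolate between the two.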

DIL-W α Ranking
We briefly describe the DIL-W$^{\alpha}$ ranking in this section. The DIL ranking is a tool for evaluating node importance based on degree and the importance of lines (DIL), proposed by Liu et al. in [33] for undirected and unweighted networks. Recently, Manríquez et al. [14] proposed the DIL-W$^{\alpha}$ ranking. This method of ranking node importance in undirected, edge-weighted graphs generalizes DIL by means of the degree centrality (Definition 7) proposed by Opsahl in [32].
The following comes from [14]. Let us consider an undirected weighted graph $(G, w)$ with $G = (V, E)$ and $V = \{v_1, v_2, \dots, v_n\}$.
Definition 8 (Importance of an edge [14]). The importance of an edge $e_{ij} \in E$, denoted by $I^{\alpha}(e_{ij})$, is defined as in [14]. It involves, for $k \in \{i, j\}$, the quantity $p_k^{\alpha} = (p + 1)^{(1-\alpha)} \cdot t_k^{\alpha}$, where $p$ is the number of triangles that have $e_{ij}$ as one of their edges and $t_k$ is the sum of the weights of the edges incident to $v_k$ that form a triangle with $e_{ij}$.
To illustrate Definition 8, let us consider the edge-weighted graph in Figure 1 and the edges $e_{78}$ and $e_{75}$. Notice that they both have the same weight (three). For this example, we set $\alpha = 1$. Applying Definitions 7 and 8, we obtain $I^{1}(e_{78}) = 96$, which is larger than $I^{1}(e_{75})$.
In conclusion, edge $e_{78}$ is more important than edge $e_{75}$. This is reasonable because the edge $e_{78}$ is a bridging edge of the graph.
Definition 9 (Contribution [14]). The contribution that $v_i \in V$ makes to the importance of the edge $e_{ij}$, denoted by $W^{\alpha}(e_{ij})$, is defined as in [14], where $w_{ij}$ is the weight of $e_{ij}$.
We have calculated the importance of the edge $e_{78}$ of the graph in Figure 1. By Definition 9, the contribution that $v_7$ makes to $I^{1}(e_{78})$ is larger than the contribution that $v_8$ makes to it; that is, node $v_7$ contributes more to the edge $e_{78}$ than node $v_8$.
Remark 1. From the definition of degree centrality (Definition 7) proposed by Opsahl in [32], we can see that, when the tuning parameter $\alpha$ is 0, Definitions 8-10 coincide with those proposed by Liu et al. [33].
To illustrate the importance of a vertex (Definition 10), we compute the importance of $v_7$ and $v_8$ in the graph of Figure 1. The importance of $v_7$ is larger than that of $v_8$; hence, node $v_7$ is more important than node $v_8$ (according to the DIL-W$^{1}$ ranking).
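By Remark 1, the case α = 0 coincides with the unweighted DIL ranking of Liu et al. [33]. The sketch below covers that special case only. It uses the DIL formulas as they are commonly stated, which is an assumption to be checked against [33] and [14] rather than something reproduced from this text: I(e_ij) = U/λ with U = (k_i − p − 1)(k_j − p − 1) and λ = p/2 + 1; W(e_ij) = I(e_ij)(k_i − 1)/(k_i + k_j − 2); and L(v_i) = k_i + Σ_j W(e_ij), where k denotes the degree and p the number of triangles containing the edge. The toy graph and all names are hypothetical.

```python
def dil_ranking(adj):
    """Unweighted DIL sketch; adj maps each node to the set of its neighbours.

    Returns (nodes sorted by decreasing importance, importance dict).
    """
    importance = {}
    for i in adj:
        k_i = len(adj[i])
        total = float(k_i)
        for j in adj[i]:
            k_j = len(adj[j])
            p = len(adj[i] & adj[j])            # triangles that contain edge (i, j)
            u = (k_i - p - 1) * (k_j - p - 1)   # connectivity term (assumed form)
            lam = p / 2 + 1                     # alternative-path term (assumed form)
            edge_importance = u / lam
            denom = k_i + k_j - 2
            if denom > 0:                       # skip an isolated edge to avoid 0/0
                total += edge_importance * (k_i - 1) / denom
        importance[i] = total
    order = sorted(importance, key=lambda v: (-importance[v], v))
    return order, importance

# Hypothetical toy graph: two triangles joined by the bridge 3-4.
toy_adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
ranking, importance = dil_ranking(toy_adj)
```

In this toy graph the two endpoints of the bridge come first in the ranking, which illustrates the bridge-recognition property discussed in the text.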

Graph from a Database
The authors of [13] provide a way to obtain an edge-weighted graph from a database, which we briefly detail.
Let $V = \{v_1, v_2, \dots, v_N\}$ be a set of people registered in a database, denoted by $\mathcal{E}$, with $K$ different variables, denoted by $X_k$. These variables are separated into two categories: the characteristic variables (CHAR) and the relationship variables (REL), which are those that allow us to assume that one person meets another. Let us denote by $K_1$ the number of relationship variables and by $EPI(i, k)$ the response of the person $v_i$ to the variable $X_k$.
Definition 11. We will say that a person $v_i$ is related to a person $v_j$ if and only if there exists $X_k \in$ REL, for $k \in \{1, 2, \dots, K_1\}$, such that $EPI(i, k) = EPI(j, k)$ and $i \neq j$.
To define the weight of each link between two persons, we assume that each $X \in$ REL has an associated inherent weight; that is, it is possible to establish a hierarchical order among the variables. Let $p_k$ be the weight associated with the variable $X_k \in$ REL for $k = 1, \dots, K_1$.

Definition 12.
We will say that, for $X_j, X_t \in$ REL, $X_j$ is related to $X_t$, denoted by $X_j R X_t$, if and only if $p_j = p_t$.
Definition 13. Let $A_1, A_2, \dots, A_c$ be the different classes defined by the different weights $p_1, p_2, \dots, p_c$, and let $\alpha_1, \alpha_2, \dots, \alpha_c$ be their respective cardinalities. The weight $p_j$ of each class is then fixed as in [13], for all $j \in \{1, 2, \dots, c\}$.
We denote by $h_{i,j}$ the number of times that one person is related to another (that is, the number of variables on which they match).

Definition 14.
Let $v_i, v_j \in V$ be such that $v_i$ is related to $v_j$, and let $p_{k_r}$ be the weight of the $r$-th variable on which $v_i$ and $v_j$ match, for $r = 1, \dots, h_{i,j}$. We will say that
$$w_{ij} = \sum_{r=1}^{h_{i,j}} p_{k_r} \tag{6}$$
is the weight of the link between $v_i$ and $v_j$.
Finally, the weighted adjacency matrix, which defines the graph obtained from the database, is the $N \times N$ matrix $A = [a_{ij}]$, where $a_{ij} = w_{ij}$ if $v_i$ is related to $v_j$ and $a_{ij} = 0$ otherwise.
Example 1. In the following example, Table 2 simulates a database with 20 registered people. The data correspond to the city in which they live (City), the workplace (considering school and university as workplaces), gender (Gen.), age, extracurricular activity (EC activity), address, whether they drink alcohol (Drin.), whether they smoke (Sm.) and marital status (MS). Let us consider A and B as two different cities, and x, y, z, w, u, v, r, s, q, t, p, k, d, g and h as different people's addresses. Moreover, in the table, Y = Yes, N = No, IC = in couple, M = married, S = single, W = widower. From Table 2, we have that $EPI = \{X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8, X_9\}$, where $X_1$ = City, $X_2$ = Workplace, $X_3$ = EC activity, $X_4$ = Address, $X_5$ = Sm., $X_6$ = Drin., $X_7$ = Gen., $X_8$ = MS and $X_9$ = Age. Then, we obtain the sets:
1. REL = $\{X_1, X_2, X_3, X_4\}$ and
2. CHAR = $\{X_5, X_6, X_7, X_8, X_9\}$.
In our criteria, the hierarchical order of the variables $X_1, X_2, X_3, X_4$ in descending form is $X_4$, $X_2$, $X_3$ and $X_1$. Moreover, we consider that the variables $X_4$ and $X_2$ have the same weight. Hence, $A_1 = \{X_2, X_4\}$, $A_2 = \{X_3\}$ and $A_3 = \{X_1\}$ are the different classes defined by the different weights, and the corresponding weights are obtained from Definition 13.
To construct the graph, we resort to Definition 11. For instance, person 17 is related to all the people who live in city A, or who work at Workplace 8, or who have music as an extracurricular activity, or whose address is k. With respect to the weights of the edges, Equation (6) in Definition 14 gives us the answer. For instance, person 6 matches person 11 in the answers to the variables $X_1$ and $X_2$; that is, both people live in city A and have the same workplace. Then, the edge $v_6 v_{11}$ has weight $w_{6,11} = 0.5 + 0.25 = 0.75$. Figure 2 shows the obtained graph.
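The construction described by Definitions 11 and 14 can be sketched in Python as follows. This is an illustrative version, not the authors' implementation. The records, the variable names and the values in `rel_weights` are hypothetical; the weights 0.5 for workplace and 0.25 for city are assumed only so that a match on both gives the value 0.75 of the example. Treating a missing answer (`None`) as never matching is an added assumption.

```python
from itertools import combinations

def build_weighted_edges(records, rel_weights):
    """Return {(i, j): weight} for each pair matching on at least one REL variable.

    The weight of a link is the sum of the weights of the matching variables.
    """
    edges = {}
    for i, j in combinations(sorted(records), 2):
        matched = [
            p for var, p in rel_weights.items()
            if records[i].get(var) is not None
            and records[i].get(var) == records[j].get(var)
        ]
        if matched:
            edges[(i, j)] = sum(matched)
    return edges

# Hypothetical weights per relationship variable (not taken from the paper).
rel_weights = {"address": 0.5, "workplace": 0.5, "activity": 0.3, "city": 0.25}

# Hypothetical records: characteristic variables are simply left out.
records = {
    1: {"city": "A", "workplace": "W1", "activity": None, "address": "x"},
    2: {"city": "A", "workplace": "W1", "activity": "music", "address": "y"},
    3: {"city": "B", "workplace": "W2", "activity": "music", "address": "z"},
}
edges = build_weighted_edges(records, rel_weights)
```

Persons 1 and 2 share city and workplace, persons 2 and 3 share only the activity, and persons 1 and 3 share nothing, so no edge is created between them.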

Method
The data that are modeled correspond to the city of Olmué (Valparaíso region, Chile) and were obtained from the database of the Epidemiological Surveillance System of the Ministry of Health of Chile, which included the notified cases (positive or negative) and their contacts from 3 March 2020 to 15 January 2021 with a total of 3866 registered persons.
We denote by $\mathcal{E}_{pi}$ the database of the Epidemiological Surveillance System of the Ministry of Health of Chile. Of the variables included in $\mathcal{E}_{pi}$ ($K = 279$), seven are relationship variables ($K_1 = 7$): full address ($X_1$); the street where the people live ($X_2$); town ($X_3$); place of work ($X_4$); workplace section ($X_5$); health facility where they were treated ($X_6$); and the region of the country where the test was taken to confirm, or rule out, the contagion ($X_7$).
In our criteria, the hierarchical order of the seven variables in descending form is $X_1, X_2, X_3, X_4, X_5, X_6, X_7$. Moreover, we consider that the variables $X_1$, $X_2$ and $X_3$ have the same weight. In the same way, we consider the variables $X_4$ and $X_5$ to have equal weight. Hence, $A_1 = \{X_1, X_2, X_3\}$, $A_2 = \{X_4, X_5\}$, $A_3 = \{X_6\}$ and $A_4 = \{X_7\}$ are the different classes defined by the different weights, and the corresponding weights are obtained from Definition 13. Let us denote by $G_{\mathcal{E}}$ the graph obtained from the database $\mathcal{E}_{pi}$. Figure 3 shows the obtained graph.

Protection Strategy
In this section, we provide definitions of the protection of a graph when disease spreads on it. Moreover, we state the protection strategy used in the graph G E obtained in the previous Section. The following definitions come from [28].
Definition 15. Protecting a vertex means removing all of its corresponding edges (see Figure 4).
It is also possible to find in the literature that protecting a vertex means removing the vertex from the graph; see, for instance, [20].
Definition 16. The number of vertices that may be protected is called the protection budget, denoted by $k$.
Definition 17. The survival rate, denoted by $\sigma$, is the ratio of the number of vertices that remain uninfected at the end of the disease to the total number of vertices.
Therefore, our problem is: given a graph $G = (V, E)$, the SIR model, and a protection budget $k$, the goal is to find a set of vertices $S \subseteq V$ with $|S| = k$ whose protection maximizes the survival rate $\sigma$. However, this problem is NP-hard (see [34]).
Our chosen protection strategy corresponds to the DIL-W$^{\alpha}$ ranking (see [14]). A well-known index for measuring the connectivity of a graph is the efficiency of the network (see [35]): high connectivity of the graph indicates high efficiency. In [14], the authors show that the DIL-W$^{\alpha}$ ranking provides good results regarding the rate of decline in network efficiency (for more detail, see [36]) when the nodes best positioned by this ranking are eliminated. One of the good qualities of the DIL-W$^{\alpha}$ ranking is that it recognizes the importance of bridge nodes (see more in [37]). This quality is inherited from the version of the DIL ranking for unweighted graphs (see [33]). Furthermore, [28] evaluated the effectiveness of the DIL-W$^{\alpha}$ ranking in the immunization of nodes that are attacked by an infectious disease that spreads on an edge-weighted graph, using a graph-based SIR model.
Finally, in order to illustrate in a simple way why the DIL-W$^{\alpha}$ ranking has been chosen, let us consider the graph of Figure 5, with 16 edges, 15 nodes, and the respective weights on the edges. When we apply the DIL-W$^{1}$ ranking, the first three places are occupied by nodes 3, 5 and 1, respectively. These nodes are precisely bridge nodes and, when they are protected according to Definition 15, the graph loses connectivity (see Figure 6). If we apply the Strength ranking, the first three places are occupied by nodes 3, 1 and 4, respectively. Note that the order in which it positions the nodes, and the importance it gives to node 4, makes the loss of network connectivity lower than the loss obtained with DIL-W$^{1}$ (see Figure 6).
In summary, we apply the protection to the $G_{\mathcal{E}}$ network according to the importance ranking list produced by DIL-W$^{\alpha}$, considering different protection budgets. For the DIL-W$^{\alpha}$ ranking, we consider three different values of $\alpha$: 0, 0.5 and 1. In this way, we compare how the protection performs according to the value of $\alpha$.
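Protecting a vertex by removing its edges, with a budget k, can be sketched in a few lines of Python. The function below takes any ordered list of nodes as the ranking (for example, one produced by DIL-W with a chosen α), protects the first k nodes by deleting every edge incident to them, and leaves the nodes themselves in the graph. The function name, the edge dictionary and the toy ranking are hypothetical.

```python
def protect(edges, ranking, budget):
    """Remove all edges incident to the `budget` top-ranked nodes.

    edges: {(u, v): weight} for an undirected graph.
    Returns (remaining edges, set of protected nodes).
    """
    protected = set(ranking[:budget])
    remaining = {
        (u, v): w
        for (u, v), w in edges.items()
        if u not in protected and v not in protected
    }
    return remaining, protected

# Hypothetical path graph 1-2-3-4-5 and a hypothetical ranking.
toy_edges = {(1, 2): 1.0, (2, 3): 2.0, (3, 4): 1.5, (4, 5): 0.5}
toy_ranking = [3, 2, 4, 1, 5]
remaining, protected = protect(toy_edges, toy_ranking, budget=1)
```

With a budget of one, the middle node of the path is protected and the graph splits into two components, which is the loss of connectivity that the strategy aims for.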

Results
In this paper, we use a graph-based SIR model in the same way as in [13,28]; namely, each individual is represented by a vertex in $G_{\mathcal{E}}$. At time $t$, each vertex $v_i$ is in a state $v_i^t$ belonging to $S = \{0, 1, -1\}$, where 0, 1 and −1 represent the three discrete states: Susceptible (S), Infected (I) and Recovered or Removed (R). At time $t + \Delta t$, the vertex $v_i$ changes state according to the following probabilistic rules, whose explicit expressions are given in [13,28]:
1. A susceptible vertex $v_i$ is infected by one of its neighbors with probability $P_I(v_i)$, which depends on $\rho$, a purely biological factor representative of the disease, and on $w_{ij}$, the weight of the edge $e_{ij}$.
2. An infected vertex $v_i$ at time $t$ recovers with probability $P_R(v_i)$, which depends on the recovery rate $\delta$.
Moreover, we assume that the disease is present for a certain period of time and that, when individuals recover, they are immune, that is, reinfection is not considered.
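A simulation of such a graph-based SIR model can be sketched as below. The explicit expressions for the infection and recovery probabilities are not reproduced here, so the sketch assumes a common form: a susceptible vertex escapes infection from each infected neighbor independently with probability 1 − ρ·w_ij, which gives P_I = 1 − Π(1 − ρ·w_ij), and an infected vertex recovers with probability δ per time step. Both forms, the unit time step, and all names are assumptions for illustration and may differ from [13,28].

```python
import random

S, I, R = 0, 1, -1  # state encoding used in the text

def sir_step(state, neighbours, rho, delta, rng):
    """Advance every vertex by one time step (synchronous update)."""
    new_state = dict(state)
    for v, s in state.items():
        if s == S:
            escape = 1.0
            for u, w in neighbours.get(v, {}).items():
                if state[u] == I:
                    # Assumed form: independent transmission with prob. rho * w.
                    escape *= 1.0 - min(1.0, rho * w)
            if rng.random() < 1.0 - escape:
                new_state[v] = I
        elif s == I:
            if rng.random() < delta:  # assumed form: recovery with prob. delta
                new_state[v] = R
    return new_state

def simulate(neighbours, nodes, seed_node, rho, delta, steps, seed=0):
    """Run the dynamics from one infected node and return the survival rate."""
    rng = random.Random(seed)
    state = {v: S for v in nodes}
    state[seed_node] = I
    for _ in range(steps):
        state = sir_step(state, neighbours, rho, delta, rng)
        if I not in state.values():
            break
    # Survival rate: never-infected vertices over all vertices.
    return sum(1 for s in state.values() if s == S) / len(nodes)

# Hypothetical path 1-2-3 plus vertex 4 with no edges (as if protected).
toy_neighbours = {1: {2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
toy_nodes = [1, 2, 3, 4]
rate = simulate(toy_neighbours, toy_nodes, seed_node=1, rho=1.0, delta=0.0, steps=5)
```

In the toy run, transmission is certain and nobody recovers, so the infection sweeps the path while the edgeless vertex 4 survives. In practice, the survival rate would be averaged over many runs with different seeds.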
The initial population contains one infected node, and all simulations use $\delta = 1/15$. Five hundred simulations were performed on $G_{\mathcal{E}}$ with $\rho = 0.00121$. Figure 7 shows the average infected curve and the real infected data in $\mathcal{E}_{pi}$. Moreover, it shows a curve fitted to the data following the SIR model; for this, we used the classic method of least squares to compare with our proposal.
The graph $G_{\mathcal{E}}$ was protected with different protection budgets according to the importance given by the DIL-W$^{0}$, DIL-W$^{0.5}$ and DIL-W$^{1}$ rankings. Protection is carried out in week 1, that is, at the beginning of the spread of the disease. Figure 8 shows the results.
We can see the survival rate in Figure 9. Figure 10 shows the relationship between the real infected (450 people) and those immunized according to our proposal.
We can see that 80% of the real infected are located within the top 60% of the DIL-W$^{\alpha}$ ranking. We think that this is a way to recognize those who will get sick; however, it is not the solution.
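A statement of this kind, namely the share of the real infected that falls within a top fraction of the ranking, can be checked with a short computation. The sketch below is illustrative only: `toy_ranking` and `toy_infected` are hypothetical inputs, not the Olmué data, and rounding the cutoff up with `math.ceil` is an assumed convention.

```python
import math

def share_of_infected_in_top(ranking, infected, top_fraction):
    """Fraction of `infected` nodes found in the first `top_fraction` of `ranking`."""
    cutoff = math.ceil(top_fraction * len(ranking))
    top = set(ranking[:cutoff])
    infected = set(infected)
    return len(infected & top) / len(infected)

# Hypothetical ranking of ten nodes and five hypothetical infected nodes.
toy_ranking = list(range(10))
toy_infected = [0, 2, 4, 5, 9]
share = share_of_infected_in_top(toy_ranking, toy_infected, 0.5)
```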
Another element that we have considered investigating is the time at which the protection takes place. We modified the protection in the graph as the weeks advanced. In Figure 11, we can see the different infected curves, considering the 10% protection according to the DIL-W α ranking. Figure 12 shows the relationship between the survival rate and the week in which the protection is carried out with our proposal.
The survival rate clearly decreases as the protection is carried out later.

Discussion
The results of the present investigation are directed towards the analysis of the effectiveness of immunization using the DIL-W$^{\alpha}$ ranking with real COVID-19 data from the city of Olmué in Chile. Across the rankings considered, the immunization results were similar, regardless of the percentage of protection proposed in the simulations. Our method, therefore, goes in the direction of finding new optimization algorithms in network protection strategies [38].
At the level of protection, it is evident that when the percentage of initial coverage is higher, the epidemic ends with a smaller fraction of people affected by COVID-19. This event is related not only to the random increase in immunization, but also to the possibility, in the model, of recognizing the bridge nodes to increase the effectiveness of vaccination. This is consistent with other investigations that indicate that the recognition of central nodes or high-risk individuals improves the efficiency of immunization strategies in real networks, a situation that favors the protection of the network and the best use of vaccine doses [39,40]. The best use of doses is a challenge for the current scenario of vaccine shortages worldwide, mainly in poor nations [41].
On the other hand, the effectiveness of the DIL-W$^{\alpha}$ ranking, for a given percentage of protection, rests on the recognition of the bridging nodes in a regular vaccination process. The results for the different $\alpha$ parameters of DIL-W$^{\alpha}$ indicate a high survival rate. DIL-W$^{1}$ achieves better results with 70% protection and is positioned with the best survival rate, but DIL-W$^{0}$ and DIL-W$^{0.5}$ also show good results. The difference between the different values of $\alpha$ is marginal and can be explained by the adequate representation that the DIL-W$^{\alpha}$ model has and by the values of $\alpha$, which do not generate excessive differences in the ranking. This is similar to the results of the research by Opsahl et al., who, using Freeman's EIES network, report that the degree centrality (Definition 7) is relatively stable across the different $\alpha$ parameters [32].
In a real and regular immunization strategy, such as the administration of vaccines, determining the population that infects most frequently is relevant, since it allows these processes to be optimized. Among our findings, it stands out that 80% of the real infected in the commune of Olmué, Chile, were located within the top 60% of the DIL-W$^{\alpha}$ ranking. Consequently, our proposal recognizes the heterogeneity of the network, approaching the reality of human interactions and achieving results similar to those obtained in complex homogeneous networks [40].
Regarding immunization with 10% protection, a decrease in the survival rate is established by 4% from weeks 5 to 45 of protection. Likewise, with the same percentage of protection, the effectiveness of the immunity strategy tends to be important until week 20.
After 20 weeks, the fraction of infected is similar with or without protection. Consequently, our model is strongly effective as a measure of rapid recognition of the epidemic outbreak in a given territory.
Therefore, according to the findings of our research, there are two important variables for the success and effectiveness of immunity strategies against COVID-19: (1) Recognition of bridging nodes (people with the highest probability of contagion) to apply measures of protection; and (2) the development time of this strategy.
Regarding the recognition of bridging nodes, there is evidence to support that targeted immunization schemes significantly reduce epidemic outbreaks [42]. This opens the possibility of changing the traditional perspective of immunization by protecting a small proportion of the population over a long period of time [43]. It is important, therefore, to direct COVID-19 immunity efforts not only towards the population most affected by mortality, but also towards those population groups that tend to infect with greater force.
The time of development of the immunization strategy continues to be a variable under discussion in the scientific community regarding the slowness worldwide of the vaccination process, which risks not achieving herd immunity [44]. In summary, both at a theoretical and empirical level, the execution time of immunization strategies is important in overcoming the COVID-19 pandemic. Finally, our model helps to establish a ranking of bridge nodes in a non-homogeneous network, so it is highly replicable with real COVID-19 dissemination data and it is useful to establish more focused strategies given the reduced number of vaccines available.

Conclusions
In this paper, we evaluate the effectiveness of the DIL-W$^{\alpha}$ ranking in the immunization of nodes that are attacked by an infectious disease (COVID-19) that spreads on an edge-weighted graph obtained from the database of the Epidemiological Surveillance System of the Chilean Ministry of Health, using a graph-based SIR model.
Considering survival rates, the DIL-W 1 ranking performs better (by a small margin) than DIL-W 0.5 and DIL-W 0 rankings, subject to the protection budget being equal to 10% of the network nodes.
The period in which immunization or protection is given plays a key role in stopping the spread of the disease (see Figure 11), since around week 25 immunization no longer has a great impact and, as time progresses, the survival rate decreases almost linearly.
An interesting and complex task is to determine which value of $\alpha$ to choose for a given network so that the ranking generated is the optimal one, since the same value does not always give the best performance. One way to explore this is to continue with the ideas proposed in [45], where a selection standard for the optimal tuning parameter is proposed for the degree centrality, but not for the DIL-W$^{\alpha}$ ranking. However, when considering this method, there are as many rankings as there are numbers between 0 and 1.

Data Availability Statement:
The data presented in this study were available after being requested by research project COVID-ANID to the Chilean Ministry of Health. The data are not publicly available due to legal restrictions. The source codes are accessible through the Github link: https://github.com/RonaldManriquez/Prot-stra-aga-epi-ewgraphs.git, from 8 June 2021.

Conflicts of Interest:
The authors declare that they have no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.