Protection Strategy for Edge-Weighted Graphs in Disease Spread

Fake news, viruses on computer systems and infectious diseases in communities are some of the problems addressed by researchers who study complex networks. Immunization is the solution to these challenges, hence the importance of obtaining immunization strategies that control these spreads. In this paper, we evaluate the effectiveness of the DIL-W α ranking in the immunization of nodes that are attacked by an infectious disease that spreads on an edge-weighted graph, using a graph-based SIR model. The experiments were carried out on real and scale-free networks, and the results illustrate the benefits of this ranking.


Introduction
For more than two decades, complex networks have been the focus of many researchers due to their many applications in economics, traffic problems, the spread of diseases, electrical networks, biological systems and social networks (see for instance [1][2][3][4][5][6][7]). Furthermore, the graph is a central object in discrete mathematics.
Modeling the spread of disease over networks has many real-life applications. Examples include rumors or spam that spread on large social networks such as Facebook or Twitter [8], viruses that spread through computer systems [9], and diseases that spread over a population, as the current SARS-CoV-2 does (see for instance [10]). This raises the problem of totally or partially controlling these spreads.
It is well known that a set of nodes can spread a disease over the entire network. The nodes that have this characteristic are called influencing nodes or spreader nodes. Consequently, identifying this type of node is crucial for controlling spreads over networks. That is, we can prevent the spread of contagious diseases over a network by immunizing the influencing nodes. Many investigations have been directed at this goal in recent years. Among them, Pastor-Satorras and Vespignani [11] concluded that uniform random immunization of nodes does not lead to the eradication of infections in all networks, while targeted immunization drastically reduces the vulnerability of the network to epidemic attacks. In [12] the authors provided a purely local strategy, which requires minimal information about the randomly selected nodes. Tong et al. [13] proposed NetShield, an effective immunization strategy which uses the properties of matrix perturbation. A few years later, in 2016, its NetShield+ variant appeared, balancing optimization quality and speed [14]. The authors in [15] use different features of the organization of the network to identify influential spreaders. In 2014 and 2015 Zhang and Prakash developed DAVA and DAVA-fast, two methods in which all infected nodes are merged into a supernode by constructing a weighted dominator tree of the input network [16,17]. Song et al. [18] provided NIIP, which selects k nodes to immunize over a period of time. At each time point, the NIIP algorithm decides which nodes to immunize given the estimated value of k for that time point. In [19], Gupta et al. also agree that community structure is important for understanding the spread of an epidemic. Wang et al. proposed a dynamic rumor influence minimization model with user experience (DRIMUX), considering both the global popularity and the individual attraction of the rumor [20].
The GraphShield method was developed by Wijayanto and Murata in [21], taking into account the infection flow function, graph connectivity and high-degree centrality. Saxena et al. designed a method to identify the best ranked nodes in order to control an epidemic through immunization [22]. Ghalmane et al. analyzed an immunization strategy based on the non-overlapping community structure of the network, recognizing the relevance of a node given the characteristics of the communities [23], while the same authors assert that modular centrality must include the influence of the nodes both in their own communities and in others [24]. In [25] the authors introduced the ReProtect and ReProtect-p methods, which divide the protection budget into several turns and protect nodes according to the currently observed temporal snapshot of a dynamic network. Tang et al. [26] proposed the weighted K-order propagation number algorithm, which extracts the disease propagation from the network topology in order to evaluate node importance.
In [27], the authors show that the DIL-W α ranking provides good results regarding the rate of decline in network efficiency (more details in [28]). Furthermore, one of the good qualities of the DIL-W α ranking is that it recognizes the importance of bridge nodes, that is, the nodes that connect the peripheral nodes and the peripheral groups with the rest of the network (see more details in [29]). This attribute is inherited from the version of the DIL ranking for graphs without edge weights (see [30]). For this reason, we have chosen this ranking to evaluate its effectiveness in the immunization of nodes that are attacked by an infectious disease that spreads on an edge-weighted graph. The immunization is done according to the importance ranking list produced by the DIL-W α ranking. Effectiveness is measured by the ratio of the number of vertices that remain uninfected at the end of the disease to the total number of vertices, subject to a protection budget. The experiments were carried out on real and scale-free networks.
The paper is organized as follows. Section 2 contains generalities about graph theory and the DIL-W α ranking. In Section 3, we provide definitions of the protection of a graph when a disease spreads on it and state the protection strategy. Section 4 is devoted to running simulations on test networks (Zachary's karate club network, Wild birds network, Sandy authors network and CAG-mat72 network) and on scale-free networks, and it also provides a discussion of the results. Moreover, we address the variation in the survival rate when the protection budget is modified. Finally, we give the conclusions in Section 5.

Definition 1.
A graph G is a finite nonempty set V of objects called vertices together with a possibly empty set E of 2-element subsets of V called edges.
To indicate that a graph G has vertex set V and edge set E, we write G = (V, E). If the set of vertices is $V = \{v_1, v_2, \ldots, v_n\}$, then the edge between vertex $v_i$ and vertex $v_j$ is denoted by $e_{ij}$.
If $e_{ij}$ is an edge of G, then $v_i$ and $v_j$ are adjacent vertices. Two adjacent vertices are referred to as neighbors of each other. The set of neighbors of a vertex v is called the open neighborhood of v (or simply the neighborhood of v) and is denoted by N(v). If $e_{ij}$ and $e_{jk}$ are distinct edges in G, then $e_{ij}$ and $e_{jk}$ are adjacent edges.

Definition 2.
The number of vertices in a graph G is the order of G and the number of edges is the size of G.

Definition 3.
The degree of a vertex v in a graph G, denoted by deg(v), is the number of vertices in G that are adjacent to v. Thus, the degree of v is the number of vertices in its neighborhood N(v).
On the other hand, an important generalization of the simple graph is the weighted graph, more specifically the edge-weighted graph. Informally, an edge-weighted graph is a graph whose edges have been assigned a weight.

Definition 4.
An edge-weighted graph is a pair (G, w) where G = (V, E) is a graph and $w : E \to \mathbb{R}$ is a weight function. If $e_{ij} \in E$ then $w(e_{ij}) = w_{ij}$.

Definition 5.
The strength of a vertex $v_i$, denoted by $S(v_i)$, is defined as the sum of the weights of all edges incident to it, that is,
$$S(v_i) = \sum_{v_j \in N(v_i)} w_{ij}.$$
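The notions of neighborhood, degree and strength defined above can be illustrated with a short Python sketch. The toy graph, its weights, and the function names are assumptions made only for illustration; they do not come from the test networks.

```python
# Edge-weighted graph stored as {frozenset({u, v}): weight}.
# The vertices and weights below are made up for illustration.
weights = {
    frozenset({"a", "b"}): 2.0,
    frozenset({"a", "c"}): 1.5,
    frozenset({"b", "c"}): 4.0,
    frozenset({"c", "d"}): 0.5,
}


def neighbors(v, weights):
    """Open neighborhood N(v): the vertices that share an edge with v."""
    return {u for edge in weights if v in edge for u in edge if u != v}


def degree(v, weights):
    """deg(v): the number of vertices adjacent to v."""
    return len(neighbors(v, weights))


def strength(v, weights):
    """S(v): the sum of the weights of all edges incident to v."""
    return sum(w for edge, w in weights.items() if v in edge)
```

For vertex "c", which is adjacent to "a", "b" and "d", the degree is 3 and the strength is 1.5 + 4.0 + 0.5 = 6.0.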

DIL-W α Ranking
The following definitions come from [27,33]. Let us consider an undirected weighted graph (G, w) with G = (V, E) and $V = \{v_1, v_2, \ldots, v_n\}$.

Definition 6 (Degree centrality [33]).
The degree centrality of $v_i \in V$ in an edge-weighted graph (G, w), denoted by $C^{w\alpha}_D(v_i)$, is defined as
$$C^{w\alpha}_D(v_i) = \deg(v_i)^{(1-\alpha)} \cdot S(v_i)^{\alpha}.$$
The parameter α is called the tuning parameter. Notice that when α = 0 the degree centrality reduces to $\deg(v_i)$, and when α = 1 it reduces to $S(v_i)$.

Definition 7 (Importance edge [27]).
The importance of an edge $e_{ij} \in E$, denoted by $I^{\alpha}(e_{ij})$, is defined as in [27]. Its expression involves, for $k \in \{i, j\}$, the quantity $p^{\alpha}_k = (p + 1)^{(1-\alpha)} \cdot t^{\alpha}_k$, where p is the number of triangles that have $e_{ij}$ as one edge and $t^{\alpha}_k$ is the weight of the sum of the edges incident to $v_k$ that form a triangle with $e_{ij}$.

Definition 8 (Contribution [27]).
The contribution that $v_i \in V$ makes to the importance of the edge $e_{ij}$, denoted by $W^{\alpha}(e_{ij})$, is defined as in [27]. Its expression involves $w_{ij}$, the weight of $e_{ij}$.
From the definition of degree centrality (Definition 6) proposed by Opsahl in [33], we can see that when the tuning parameter α is 0, Definitions 7, 8 and 9 (importance of an edge, contribution, and importance of a vertex, respectively) coincide with those proposed by Liu et al. in [30] for an undirected and unweighted network.
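The role of the tuning parameter can be checked numerically. The sketch below assumes the usual product form of Opsahl's degree centrality [33], deg(v)^(1−α) · S(v)^α; the function name and the sample values are illustrative assumptions.

```python
def degree_centrality(deg, strength, alpha):
    """Opsahl-style centrality deg^(1 - alpha) * strength^alpha.

    ASSUMPTION: the product form of the measure in [33] is used here.
    An isolated vertex (deg == 0) is given centrality 0 by convention.
    """
    if deg == 0:
        return 0.0
    return deg ** (1.0 - alpha) * strength ** alpha


# A vertex with 3 neighbors and total incident weight 6.0:
for alpha in (0.0, 0.5, 1.0):
    print(alpha, degree_centrality(3, 6.0, alpha))
```

With α = 0 only the number of neighbors counts, with α = 1 only the total weight counts, and intermediate values interpolate between the two.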

Strategy Protection
In this section, we provide definitions of the protection of a graph when a disease spreads on it. Moreover, we state the protection strategy. In the literature, protecting a vertex is also described as removing the vertex from the graph; see for instance [13].

Definition 11.
The number of vertices that may be protected is called the protection budget, denoted by k.
It is clear that $k \in \mathbb{Z}^+$.

Definition 12.
The survival rate, denoted by σ, is the ratio of the number of vertices that remain uninfected at the end of the disease to the total number of vertices.
Therefore, our problem is the following: given a graph G = (V, E), an SIR model, and a protection budget k, find a set of vertices $S \subseteq V$, with |S| = k, whose protection maximizes the survival rate σ. However, this problem is NP-hard (see [34]).
It is well known that network efficiency is an index of the connectivity of a graph (see [35]). High connectivity of the graph indicates high efficiency. In [27] the authors show that the DIL-W α ranking provides good results regarding the rate of decline in network efficiency (more details in [28]) when the nodes ranked highest by this ranking are eliminated. One of the good qualities of the DIL-W α ranking is that it recognizes the importance of bridge nodes (see more in [29]). This quality is inherited from the version of the DIL ranking for graphs without edge weights (see [30]). For this reason, we have chosen this ranking to evaluate its effectiveness in the immunization of nodes that are attacked by an infectious disease that spreads on an edge-weighted graph.
From the results in [36], we have that 5% to 10% of important nodes can cause the entire network to fail. Accordingly, our protection budget k is 10% of the network nodes.
In summary, we protect ten percent of the network nodes according to the importance ranking list produced by DIL-W α.
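The strategy summarized above can be sketched as follows. The sketch treats protection as removal of the vertex, takes any precomputed importance score as input, and rounds the budget down with a minimum of one vertex; the rounding rule, the function name `protect`, and the sample scores are assumptions, since the text does not fix them.

```python
import math


def protect(adjacency, scores, fraction=0.10):
    """Protect the top-ranked vertices and return (protected, remaining graph).

    adjacency: {v: {u: weight}} for an undirected edge-weighted graph.
    scores:    {v: importance}, e.g. produced by some ranking method.
    ASSUMPTION: the budget is floor(fraction * n), with at least one vertex,
    and protecting a vertex means removing it from the graph.
    """
    k = max(1, math.floor(fraction * len(adjacency)))
    ranked = sorted(adjacency, key=lambda v: scores[v], reverse=True)
    protected = set(ranked[:k])
    remaining = {
        v: {u: w for u, w in nbrs.items() if u not in protected}
        for v, nbrs in adjacency.items()
        if v not in protected
    }
    return protected, remaining


# Toy example with made-up scores and a 40% budget (2 of 5 vertices).
toy = {
    "a": {"b": 1.0, "c": 2.0},
    "b": {"a": 1.0},
    "c": {"a": 2.0, "d": 1.0},
    "d": {"c": 1.0},
    "e": {},
}
toy_scores = {"a": 5, "b": 1, "c": 4, "d": 2, "e": 0}
protected, remaining = protect(toy, toy_scores, fraction=0.4)
```

Any ranking compared in this paper could supply `scores`; only the ordering of the values matters.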

Data and Methodology
The protection tests are carried out on four networks: Zachary's karate club network [37] (with 34 nodes and 78 edges), the Wild bird network [38] (with 131 nodes and 1444 edges), the Sandy authors network [39] (with 86 nodes and 124 edges), and CAG-mat72 (with 72 nodes and 678 edges). In each network, the 10% of nodes ranked highest are immunized, according to each of the following rankings: the DIL-W α ranking [27], strength of the node [40], weighted betweenness centrality [41], weighted closeness centrality [42], and Laplacian centrality for undirected and edge-weighted graphs [43]. For the DIL-W α ranking, we consider three different values of α: 0, 0.5 and 1. In this way, we compare how the protection performs according to the value of α. Finally, the databases of our test networks can be found in [44].

Simulation of Disease
In this work, we use a graph-based SIR model in the same way as in [10], that is, each individual is represented by a vertex in an edge-weighted graph. At time t, each vertex $v_i$ is in a state $v_i^t$ belonging to S = {0, 1, −1}, where 0, 1 and −1 represent the three discrete states: susceptible (S), infected (I) and recovered (R). Let G be an edge-weighted graph. At time $t + \Delta t$, the vertex $v_i$ changes state according to the following probabilistic rules:

1. The probability $P_I(v_i)$ that a susceptible vertex $v_i$ is infected by one of its neighbors is given by expression (4), which depends on ρ, a purely biological factor representative of the disease, and on $w_{ij}$, the weight of the edge $e_{ij}$. Expression (4) can be deduced from the infection model called q-influence, assuming q = ρ (see [45,46]).

2. The probability $P_R(v_i)$ that an infected vertex $v_i$ at time t recovers is determined by the recovery rate δ.
Moreover, we assume that the disease is present for a certain period of time and when individuals recover, they are immune.
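A minimal simulation sketch of such a graph-based SIR process is given below. It is not the exact model of [10]: the infection probability is assumed to take the q-influence-like form 1 − (1 − ρ)^(sum of the weights of edges to infected neighbors), the recovery probability per step is assumed to be δ, and weights are assumed non-negative. All function names are hypothetical.

```python
import random

SUSCEPTIBLE, INFECTED, RECOVERED = 0, 1, -1


def step(adjacency, state, rho, delta, rng):
    """One synchronous update from time t to t + dt."""
    new_state = dict(state)
    for v, s in state.items():
        if s == SUSCEPTIBLE:
            # Total weight of the edges joining v to infected neighbors.
            exposure = sum(w for u, w in adjacency[v].items()
                           if state[u] == INFECTED)
            # ASSUMPTION: q-influence-like infection probability with q = rho.
            p_infect = 1.0 - (1.0 - rho) ** exposure
            if rng.random() < p_infect:
                new_state[v] = INFECTED
        elif s == INFECTED:
            # ASSUMPTION: recovery with probability delta at every step.
            if rng.random() < delta:
                new_state[v] = RECOVERED
    return new_state


def simulate(adjacency, patient_zero, rho, delta, seed=0):
    """Run until no infected vertex remains; return the final states."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1] for the run to terminate")
    rng = random.Random(seed)
    state = {v: SUSCEPTIBLE for v in adjacency}
    state[patient_zero] = INFECTED
    while INFECTED in state.values():
        state = step(adjacency, state, rho, delta, rng)
    return state


def survival_rate(state):
    """Fraction of vertices that were never infected."""
    return sum(1 for s in state.values() if s == SUSCEPTIBLE) / len(state)
```

Recovered vertices never return to the susceptible state, which matches the immunity assumption above. How protected vertices enter the survival ratio is left open here; the function simply counts over the vertices of the graph it is given.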
The initial population contains one infected node and all the simulations considering . For each network, a different ρ was considered, so that in each case the percentage of the population affected is similar. Table 1 shows the different ρ for each test network. Figure 2 shows the results.
In the Sandy authors and CAG-mat72 networks, the DIL-W 0 ranking performs best with respect to the minimum peak of the infected curve, and the DIL-W 0.5 ranking comes second in both networks. In the Zachary karate club network, the Strength and Laplacian rankings perform best on this criterion, while the DIL-W 0.5 and DIL-W 1 rankings are in fourth and fifth place respectively. However, the computational complexity of Laplacian and Strength is $O(n \cdot \max_{v \in V} \deg(v))$ and $O(m)$ respectively, while DIL-W α has a computational complexity of $O(n \cdot \langle k \rangle)$ (see [30] or [27]), where n is the total number of vertices of the graph, m is the total number of edges of the graph, and $\langle k \rangle$ is the average degree of the vertices in the graph. In the Wild bird network, DIL-W 1 performs best on the same criterion.
On the other hand, regarding the decrease of the infected curve, DIL-W α performs the best in Wild bird, Sandy authors, and CAG-mat72 networks. The Laplacian is better on the Zachary network. The above is summarized in the survival rate. In Table 2, we can see the survival rates obtained according to the protections applied to each network.

Survival Rate
In this section, we address the variation in the survival rate (σ) when the protection budget (k) is modified. In the previous section, k was 10% of the network nodes; here k varies between 5 and 50 percent of the network nodes. For each test network and each protection budget, 2000 simulations were run. Figure 3 shows the results.
We can see that, at the beginning (that is, with k = 5%), DIL-W α is always among the top three rankings by survival rate in the Wild bird, Zachary karate club, and Sandy authors networks. Even in the CAG-mat72 network it ranks fourth, very close to the top rankings.
With the 10% protection budget, DIL-W 0, DIL-W 0.5, and DIL-W 1 perform best, taking the top three spots on the Wild birds and CAG-mat72 networks. Something similar happens in the Sandy authors network, since DIL-W 1 has a rate very close to that generated by the betweenness ranking (0.9613 and 0.9616 respectively). In the Zachary karate club network, the strength ranking performs best with this protection budget. The DIL-W 0 ranking occupies the third position, behind the Laplacian ranking, which comes second. However, DIL-W 0 has lower computational complexity than Laplacian.
Finally, with 50% of the network nodes as the protection budget, the DIL-W 0 and DIL-W 1 rankings perform best in the CAG-mat72 network. In the Wild birds network, the betweenness and strength rankings finish first and second respectively, followed by the DIL-W 1 ranking. In the Sandy authors network, DIL-W 0.5 performs best, while in the Zachary network, betweenness and strength displace DIL-W 1 to third place. Tables A1-A4 show in detail the survival rates according to the protection budget in each test network (see Appendix A).

Scale-Free Network
Scale-free networks are networks whose degree distribution follows a power law with an exponent between 2 and 3. The study of epidemics and disease dynamics on scale-free networks is a relevant theoretical issue [47], because these networks are a model for the spread of sexually transmitted diseases (see for instance [48]) and a model to explain the early spread of COVID-19 in China (see [49]). Many works that address this type of network in the spread of disease can be found in the literature; see for instance [50][51][52][53].
Following the previous structure, we analyze the protection on a scale-free network using the model proposed by Albert-László Barabási and Réka Albert in [54]. The size of the graph (N) and the average degree of the vertices (d) were studied with respect to the survival rate. The weights are uniformly distributed between 1 and 10. Finally, 5000 simulations were run on each network, considering ρ = 0.011, δ = 1/14, and an initial population containing one infected vertex.
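A weighted scale-free test graph of this kind can be generated with a short preferential-attachment sketch. It follows the general growth idea of the Barabási-Albert model but is a simplified stand-in, not the authors' generator: the parameter `m` (edges added per new vertex), the continuous uniform weights, and the function name are assumptions, and the text does not say how the average degree d maps to `m`.

```python
import random


def weighted_scale_free(n, m, low=1.0, high=10.0, seed=0):
    """Grow a graph by preferential attachment; return {v: {u: weight}}.

    ASSUMPTION: each new vertex attaches to m distinct existing vertices,
    chosen with probability proportional to their current degree, and each
    edge weight is drawn uniformly from [low, high].
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adjacency = {v: {} for v in range(n)}
    targets = list(range(m))   # the first new vertex joins the m seed vertices
    repeated = []              # every vertex appears once per incident edge
    for new in range(m, n):
        for t in targets:
            w = rng.uniform(low, high)
            adjacency[new][t] = w
            adjacency[t][new] = w
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Degree-proportional sampling of m distinct targets for the next vertex.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adjacency


graph = weighted_scale_free(100, 2, seed=1)
```

The construction adds exactly m·(n − m) edges, so for large n the average degree approaches 2m.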
In the same direction as before, we analyze the variation of σ according to the change of k. For that, we fix d = 5 and N = 100. The results agree with those obtained in the previous section, that is, the DIL-W α ranking performs best. Indeed, with k = 5% the best is DIL-W 1. For the larger protection budgets, DIL-W 0 performs best, with the Strength ranking very close (see Figure 4). In Appendix A, Table A5 shows in detail σ according to k.
For the variation of the network size, we fix d = 5 and k = 10% of the network nodes. The DIL-W α ranking with α = 1 or α = 0 performs best, together with the Strength ranking. The survival rate clearly increases with the size of the network when we consider DIL-W α. Figure 5 shows σ as a function of the size of the network.
When σ is studied as a function of the average number of contacts, we fix N = 100 and k = 10% of the network nodes. The best performance is obtained by the DIL-W 1 ranking up to an average of 30 contacts, that is, 30% of the total network nodes. The survival rate decreases because the higher the average number of contacts, the greater the probability of contagion. Figure 6 shows the results. (See Appendix A, Table A6, for σ according to d in detail.)

Remark 1.
If the weights are uniformly distributed between ε and 1, with ε close to 0, then the results are the same as those above.
Finally, in order to illustrate in a simple way why the DIL-W α ranking performs better on the test networks and on scale-free networks, let us consider a scale-free network with N = 15, d = 3 and weights on the edges uniformly distributed between 1 and 10. When we apply the DIL-W 1 ranking, the first three places are occupied by nodes 3, 5 and 1 respectively. These nodes are precisely bridge nodes and, when they are protected according to Definition 10, the graph loses connectivity. If we apply the Strength ranking, the first three places are occupied by nodes 3, 1 and 4 respectively. Note that the order in which Strength positions the nodes, and the importance it gives to node 4, make the loss of network connectivity smaller than the loss when applying DIL-W 1 (see Figure 7).

Conclusions
In this paper, we evaluate the effectiveness of the DIL-W α ranking in the immunization of nodes that are attacked by an infectious disease that spreads on an edge-weighted graph, using a graph-based SIR model. This protection strategy shows better survival rates in three of the four networks that we have considered, given a protection budget equal to 10% of the network nodes. In the case where DIL-W α does not perform better than the other strategies, it is still a good alternative due to its lower computational complexity.
When we modify the protection budget, the DIL-W α ranking continues to be one of the three immunization strategies with the best survival rate, and this holds for more than one value of α. Again, when it is not in the top places with respect to performance, its computational complexity makes DIL-W α a good alternative to the others.
In the case of scale-free networks, the DIL-W α ranking maintains the trend of obtaining better survival rates than the other compared rankings when the protection budget is modified. The size of the network does not affect this result, because DIL-W α reaches first place with α = 0 or α = 1. Something similar happens when the average number of contacts is modified: DIL-W 1 has the best performance up to 30% of the total nodes. An interesting and complex open task is to determine which value of α should be chosen for a given network so that the ranking generated is the optimal one. The same value does not always give the best performance. Moreover, with this method there are as many rankings as there are numbers between 0 and 1.

Conflicts of Interest:
The authors declare that they have no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A
Survival rates (σ), according to different protection budgets in each test network.