Variance of the Infection Number of Heterogeneous Malware Spread in Network

The Susceptible–Infected–Susceptible (SIS) model in complex networks is one of the critical models employed in the modeling of virus spread. The heterogeneous SIS model with non-homogeneous nodal infection rates in finite-size networks has attracted little attention, which makes the statistical properties of heterogeneous SIS epidemic dynamics in finite networks worth investigating. In this paper, we focus on the variability of the number of infected nodes for the heterogeneous SIS epidemic dynamics in finite-size bipartite graphs and star graphs. Specifically, we investigate the metastable-state variance of the number of infected nodes for the SIS epidemic process in finite-size bipartite graphs and star graphs with heterogeneous nodal infection rates. We employ an extended individual-based mean-field approximation to analyze this process and derive approximate solutions for the variance of the number of infected nodes. We verify the proposed theory by simulations. The proposed theory has the potential to help us better understand the fluctuations of SIS models like epidemic dynamics with a


Introduction
Research on the modeling and analysis of infectious diffusion processes in networks, e.g., epidemics, rumors and computer viruses, has attracted ample attention over the past decades [1,2]. Compartmental models are used to describe this sort of diffusion process. The commonly used basic compartmental models are the Susceptible-Infected-Susceptible (SIS) model and the Susceptible-Infected-Recovered (SIR) model. Several theoretical frameworks, such as Markov theory, message passing and mean-field approximation, are employed to model the dynamics of these diffusion processes. For the sake of convenience, most conventional methods for modeling epidemic dynamics implicitly assume that, for example, the underlying network size is infinite, all individuals are fully mixed and the infection and recovery rates are homogeneous. These assumptions reduce the complexity of modeling and analysis. However, real networks are usually finite in size and have heterogeneous nodal infection rates. It is therefore intriguing to investigate the impact of a given finite-size network topology on the SIS dynamics.
In this work, we focus on the heterogeneous SIS epidemic dynamics in networks with a given finite-size topology characterized by the adjacency matrix. Specifically, we investigate the SIS epidemic dynamics in which each individual may have a different infection rate or recovery rate. We derive the variance of the number of infected nodes for the heterogeneous SIS epidemic dynamics in a bipartite network and a star graph with a finite size. We experimentally verify the proposed theories.
The paper is organized as follows. Section 2 introduces several related works. In Section 3, we elaborate on the theoretical framework, derive the approximate solutions for the variance of the infection fraction in bipartite graphs and star graphs, and describe the simulation method employed in this paper. In Section 4, we first compare the theory and the simulation for the change in infection over time. Then, we verify the accuracy of the proposed theoretical formulas of the variance for the heterogeneous SIS process in bipartite graphs and star graphs. In Section 5, we analyze the accuracy of the proposed theoretical approximation formulas and provide several explanations. Finally, we conclude the paper in Section 6.

Literature Review
Much effort has been devoted to the modeling and analysis of the dynamics of the homogeneous SIS process in static networks, where the infection and recovery rates are fixed and identical for each node. Pastor-Satorras et al. proposed a heterogeneous mean-field approximation (HMFA) method to characterize the impact of the heterogeneous network topology on SIS virus spread [3]. Chakrabarti et al. proposed a discrete-time model to investigate the SIS epidemic process in networks [4]. Van Mieghem et al. [5] proposed a continuous-time theoretical framework named the N-intertwined mean-field approximation (NIMFA) to model SIS epidemic dynamics in a static single finite-size network, where the infection rate as well as the recovery rate are fixed for each node in a given configuration. In [4,5], the metastable-state fraction of infected nodes and the epidemic threshold are derived for the discrete-time and the continuous-time SIS processes in a finite-size network, respectively. Specifically, the epidemic threshold is shown to be bounded in terms of the largest eigenvalue of the adjacency matrix of the network topology.
The NIMFA has been extended and adapted to model SIS epidemic dynamics in which each node may have a different infection rate or recovery rate [6]. In the framework of this heterogeneous NIMFA, several general solutions for the metastable-state average number of infected nodes and the epidemic threshold are derived. Jiao et al. studied the scenario where the nodes of a complete bipartite graph are divided into two groups and the infection rates of the nodes within each group are identical [7]. When one group contains a single node, this scenario reduces to a star graph in which the outer nodes share an infection rate that differs from that of the center node. Jiao et al. employed the heterogeneous NIMFA and derived the exact solution to the average steady-state number of infected nodes.
The heterogeneous dynamic process model has various applications. Wang et al. applied the heterogeneous NIMFA framework to analyze and assess the effect of interest Negative ACKnowledgments on mitigating interest flooding attacks in Named Data Networking (NDN) [8]. Cui et al. studied a susceptible-infected-water-susceptible (SIWS) model which takes into account disease diffusion in various mediums with different infection rates [9]. Pagliara and Leonard studied the susceptible-infected-recovered-infected (SIRI) model, which assumes that recovery changes an individual's susceptibility to reinfection [10]. The incomplete immunity to reinfection leads to a different infection rate for each node.
Compared with other statistics of epidemic dynamics, such as the metastable-state number of infected nodes, the variance of the number of infected nodes has received little attention. The lifetime and the survival time of an epidemic are closely related to its variance [11]. Van Mieghem and Omic investigated the variance under the framework of NIMFA with a homogeneous infection rate and curing rate [12]. With the approximation that individual viral states are independent of each other, the variance of the nodal viral state and of the system viral state, together with its upper and lower bounds, is derived. These general solutions are exemplified with complete graphs and bipartite graphs.

Theory Framework
We consider a Markov process representing the SIS process in a network with a finite size, where the topology of the network is specified by an undirected and unweighted graph $G$. The size $N$ of a graph $G$ denotes the number of nodes belonging to $G$. We denote by $A$ the adjacency matrix of the topology, where an element $a_{ij}$ of the matrix is equal to 1 if the connection between the nodal pair $i$ and $j$ exists and 0 otherwise. The degree $d_i$ of node $i$ is defined as $d_i = \sum_{j=1}^{N} a_{ij}$. Each node $i$ is in one of two states at time $t$: the infected state, meaning that the node is capable of infecting others, or the susceptible state, meaning that the node is healthy but susceptible to infection. The infected state and the susceptible state of node $i$ are denoted by $I$ and $S$, respectively.
The heterogeneous NIMFA is reviewed in the following. At time $t$, the viral state of a node $i$ is represented by a Bernoulli random variable $X_i(t) \in \{0, 1\}$. Node $i$ is either infected with probability $v_i(t) = \Pr[X_i(t) = 1]$ or susceptible with probability $1 - v_i(t)$. Given a link between an infected node $i$ and a susceptible node $j$, the infection process over that link is a Poisson process with rate $\beta_i$, as shown in Figure 1a. The infection processes over different links are independent. Any infected node $i$ recovers with rate $\delta_i$; upon recovery, the viral state of node $i$ changes from $X_i = 1$ to $X_i = 0$. This is called the curing process for the infected node $i$, which is a Poisson process with rate $\delta_i$, as shown in Figure 1b. The curing processes of different nodes are independent. The heterogeneous SIS epidemic process is composed of all curing processes and infection processes in a given finite-size network. It is useful to introduce the effective infection rate $\tau_i$, defined as $\tau_i = \beta_i / \delta_i$.
At time $t$, the number of infected nodes and the expected fraction of infected nodes are defined as
$$I(t) = \sum_{i=1}^{N} X_i(t), \qquad y(t) = \frac{1}{N} E[I(t)] = \frac{1}{N} \sum_{i=1}^{N} v_i(t).$$
The steady state of the system composed of the SIS process in a given finite-size network actually corresponds to the metastable state [5]. The steady-state infection probability of node $i$ is denoted by $v_{i\infty} = \lim_{t \to \infty} v_i(t)$, or simply $v_i$. We also denote by $y$ or $y_\infty$, and $I$ or $I_\infty$, the metastable-state fraction of infected nodes and the metastable-state number of infected nodes.
The only approximation employed by the NIMFA framework is
$$E[X_i X_j] = E[X_i]\, E[X_j], \tag{3}$$
which means that the random variables $X_i$ and $X_j$ are assumed to be independent [5]. This independence approximation further implies $\mathrm{Cov}[X_i, X_j] = 0$ for $i \neq j$. The exact Markov differential equation for the viral state of node $i$ can be expressed as
$$\frac{dE[X_i(t)]}{dt} = E\left[ -\delta_i X_i(t) + \big(1 - X_i(t)\big) \sum_{j=1}^{N} a_{ij} \beta_j X_j(t) \right]. \tag{4}$$
Substituting (3) into (4) leads to
$$\frac{dv_i(t)}{dt} = -\delta_i v_i(t) + \big(1 - v_i(t)\big) \sum_{j=1}^{N} a_{ij} \beta_j v_j(t). \tag{5}$$
The Markov differential Equations (4) and (5) were proposed and derived previously in [6] on p. 2. Similarly, we obtain the differential equations for all nodal viral states, which in matrix form read
$$\frac{dV(t)}{dt} = A\, \mathrm{diag}(\beta_j)\, V(t) - \mathrm{diag}\big(v_i(t)\big) \big( A\, \mathrm{diag}(\beta_j)\, V(t) + C \big), \tag{6}$$
where $V(t) = [v_1(t), \ldots, v_N(t)]^T$ and $C = [\delta_1, \ldots, \delta_N]^T$ is the vector of curing rates.
According to the properties of the Bernoulli random variable, the expectation and variance of the viral state $X_i$ are expressed as
$$E[X_i(t)] = v_i(t), \tag{7}$$
$$\mathrm{Var}[X_i(t)] = v_i(t)\big(1 - v_i(t)\big). \tag{8}$$
According to the linearity of expectation, the expectation of $I(t)$ can be expressed as
$$E[I(t)] = \sum_{i=1}^{N} v_i(t). \tag{9}$$
The variance of $I(t)$ can be expressed as
$$\mathrm{Var}[I(t)] = \sum_{i=1}^{N} \mathrm{Var}[X_i(t)] + 2 \sum_{i=1}^{N} \sum_{j=i+1}^{N} \mathrm{Cov}[X_i(t), X_j(t)], \tag{10}$$
which was derived and shown in ([13] p. 30).
Using the independence approximation (3) and (8), the variance (10) reduces to
$$\mathrm{Var}[I(t)] = \sum_{i=1}^{N} v_i(t)\big(1 - v_i(t)\big), \tag{11}$$
since the covariance $\mathrm{Cov}[X_i, X_j]$ is zero for any nodal pair.
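The mean-field quantities above can be evaluated numerically for any finite graph. The following is a minimal sketch, not the authors' implementation: it integrates the nodal equations (5) with a forward-Euler scheme until they settle and then evaluates the mean and the variance (11) of the number of infected nodes. It assumes the convention stated above that $\beta_j$ is the rate at which an infected node $j$ infects a neighbor; the function names, the step size `dt` and the number of steps are illustrative choices of ours.

```python
import numpy as np

def nimfa_steady_state(adj, beta, delta, v0=0.5, dt=0.01, steps=20000):
    """Forward-Euler integration of the heterogeneous NIMFA equations.

    adj   : (N, N) symmetric 0/1 adjacency matrix
    beta  : (N,) infection rate of each node when it is the infecting node
    delta : (N,) curing rate of each node
    Returns the approximate steady-state infection probabilities v_i.
    """
    adj = np.asarray(adj, dtype=float)
    beta = np.asarray(beta, dtype=float)
    delta = np.asarray(delta, dtype=float)
    v = np.full(adj.shape[0], float(v0))
    for _ in range(steps):
        pressure = adj @ (beta * v)              # sum_j a_ij * beta_j * v_j
        v = v + dt * (-delta * v + (1.0 - v) * pressure)
        v = np.clip(v, 0.0, 1.0)                 # keep probabilities valid
    return v

def infected_mean_and_variance(v):
    """Mean sum_i v_i and independence-based variance sum_i v_i (1 - v_i)."""
    v = np.asarray(v, dtype=float)
    return float(v.sum()), float((v * (1.0 - v)).sum())

def complete_bipartite(m, n):
    """Adjacency matrix of K_{m,n}; nodes 0..m-1 form the first group."""
    adj = np.zeros((m + n, m + n))
    adj[:m, m:] = 1.0
    adj[m:, :m] = 1.0
    return adj

# Example: K_{15,25} with beta = 0.2 and delta = 1 for every node.
m, n = 15, 25
adj = complete_bipartite(m, n)
v = nimfa_steady_state(adj, np.full(m + n, 0.2), np.ones(m + n))
mean_I, var_I = infected_mean_and_variance(v)
print(mean_I / (m + n), var_I)
```

Forward Euler is used only for brevity; any standard ODE solver could replace the loop, and `dt` must be small relative to the largest total rate acting on a node for the iteration to remain stable.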

Variance of Infection Fraction for Bipartite Graph
The main topologies concerned are the complete bipartite graph and the star graph. A complete bipartite graph $K_{m,n}$ is composed of two groups of nodes, where no two nodes within the same group are connected and every node in one group is connected to all nodes in the other group. As shown in Figure 2a, the nodes on the left side belong to a group of $m = 2$ nodes, while the nodes on the right side belong to another group of $n = 3$ nodes. As shown in Figure 2b, a star graph of $n + 1$ nodes is a tree composed of one special node of degree $n$ and $n$ other nodes of degree 1. The special node, referred to as the center or the root, is connected to each of the other nodes. A star graph is isomorphic to a complete bipartite graph $K_{m,n}$ with $m = 1$.
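Lemma 1. The metastable-state average number $E[I]$ of infected nodes for the heterogeneous SIS process in a complete bipartite network $K_{m,n}$, with the setting where all nodes in the group of $m$ nodes share the rates $(\beta_m, \delta_m)$ and all nodes in the group of $n$ nodes share the rates $(\beta_n, \delta_n)$, can be expressed as
$$E[I] = m\,\omega_m + n\,\omega_n, \tag{12}$$
where $\omega_m$ and $\omega_n$, given in (13) and (14), are the metastable-state infection probabilities of a node in the group of $m$ nodes and of a node in the group of $n$ nodes, respectively.
Proof of Lemma 1. The expression for the metastable-state fraction $y$ of infected nodes was previously proved and shown as Theorem 1 in [7]. Substituting $N = m + n$ and using $E[I(t)] = y \cdot N$ leads to the expression for $E[I(t)]$.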
Proposition 1. The variance $\mathrm{Var}[I]$ of the metastable-state number of infected nodes for the heterogeneous SIS process in a complete bipartite network $K_{m,n}$, with the setting in which the nodes within each group share the same infection and curing rates, can be expressed as
$$\mathrm{Var}[I] = m\,\omega_m (1 - \omega_m) + n\,\omega_n (1 - \omega_n), \tag{15}$$
where $\omega_m$ and $\omega_n$ are shown in (13) and (14).
Proof of Proposition 1. The nodes in a complete bipartite graph are divided into two sets. Within each set, the nodes are identical for the configuration concerned, which was proven in [7]. For the node $i$ of the complete bipartite graph with $i = 1, \ldots, m$, the metastable-state infection probability $v_i$ is identical, i.e., $v_i = \omega_m$ as shown in (13). Similarly, for the node $j$ with $j = m + 1, \ldots, m + n$, $v_j = \omega_n$ as shown in (14). Substituting $\omega_m$ in (13) and $\omega_n$ in (14) into (11) leads to (15).
A complete bipartite graph $K_{m,n}$ is isomorphic to a star graph with the setting $m = 1$. A star graph of size $1 + n$ is a tree with one root node as the center, having degree $n$, and $n$ other nodes of degree 1, each connected to the center.
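Corollary 1. The variance $\mathrm{Var}[I]$ of the metastable-state number of infected nodes for the heterogeneous SIS process in a star graph of $n + 1$ nodes, with the setting where the center node has the rates $(\beta_m, \delta_m)$ and the $n$ outer nodes share the rates $(\beta_n, \delta_n)$, can be expressed as
$$\mathrm{Var}[I] = \omega_m (1 - \omega_m) + n\,\omega_n (1 - \omega_n), \tag{16}$$
where $\omega_m$ and $\omega_n$ are shown in (13) and (14) with $m = 1$.
Proof of Corollary 1. The star graph of $n + 1$ nodes is isomorphic to the complete bipartite graph $K_{m,n}$ with $m = 1$. Substituting $m = 1$ into (15) and conducting some manipulations (see Appendix A for details), we can derive the corollary (16).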
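As a worked illustration of the two-group result, the sketch below evaluates closed forms for the group infection probabilities obtained by setting the time derivative in the NIMFA equation (5) to zero for each group, namely $\omega_m = \frac{mn\beta_m\beta_n - \delta_m\delta_n}{m\beta_m(\delta_m + n\beta_n)}$ and $\omega_n = \frac{mn\beta_m\beta_n - \delta_m\delta_n}{n\beta_n(\delta_n + m\beta_m)}$ when $mn\beta_m\beta_n > \delta_m\delta_n$, and zero otherwise. These expressions are our own derivation under the rate convention stated earlier (an infected node $j$ infects at rate $\beta_j$) and are only assumed to agree with (13) and (14); the function names and the star-graph rates are hypothetical.

```python
def bipartite_omegas(m, n, beta_m, beta_n, delta_m, delta_n):
    """Steady-state infection probabilities of the two groups of K_{m,n}.

    Derived by zeroing the two-group NIMFA equations (an assumption of this
    sketch, not a formula quoted from the paper). Returns (0, 0) at or below
    the mean-field threshold m*n*beta_m*beta_n <= delta_m*delta_n.
    """
    gap = m * n * beta_m * beta_n - delta_m * delta_n
    if gap <= 0:
        return 0.0, 0.0
    omega_m = gap / (m * beta_m * (delta_m + n * beta_n))
    omega_n = gap / (n * beta_n * (delta_n + m * beta_m))
    return omega_m, omega_n

def bipartite_mean_variance(m, n, beta_m, beta_n, delta_m, delta_n):
    """Mean and independence-based variance of the number of infected nodes."""
    w_m, w_n = bipartite_omegas(m, n, beta_m, beta_n, delta_m, delta_n)
    mean = m * w_m + n * w_n
    var = m * w_m * (1.0 - w_m) + n * w_n * (1.0 - w_n)
    return mean, var

# K_{15,25} with beta_m = beta_n = 0.2 and delta_m = delta_n = 1.
omega_m, omega_n = bipartite_omegas(15, 25, 0.2, 0.2, 1.0, 1.0)
mean_I, var_I = bipartite_mean_variance(15, 25, 0.2, 0.2, 1.0, 1.0)

# Star graph with 40 nodes (m = 1, n = 39); these rates are hypothetical.
star_mean, star_var = bipartite_mean_variance(1, 39, 0.5, 0.3, 1.0, 1.0)
print(omega_m, omega_n, mean_I, var_I, star_mean, star_var)
```

For the $K_{15,25}$ setting this gives $\omega_m = 7/9$ and $\omega_n = 0.7$, which can be checked by substituting both values back into the two steady-state equations.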

Simulation Method
It is straightforward to consider the SIS process in a given finite-size network as a Markov chain where each system state is denoted by a vector $[X_1, X_2, \ldots, X_N]$ of nodal viral states. The all-healthy state $[0, 0, \ldots, 0]$ is the absorbing state, which means that the system stays unchanged once it enters this state. Notably, every other system state reaches the absorbing state with a non-zero probability. The final steady state of the exact SIS process in the finite-size network is therefore the absorbing state. The steady state of the NIMFA model corresponds to the metastable state of the exact SIS process in the finite-size network [5]. However, it is relatively hard to define the metastable state theoretically and to detect it empirically.
Van Mieghem et al. proposed the ε-SIS model and employed the steady state of the ε-SIS model to approximate the metastable state in the exact SIS process [14]. Through a simulation of the ε-SIS model, Li et al. obtained extensive empirical results with sound accuracy for various network topologies [15]. One merit of the ε-SIS model is that there is no absorbing state and it is unnecessary to detect the metastable state.
In this work, we implement the ε-SIS model and employ it as the simulation method. A self-infection Poisson process with rate $\varepsilon_i$ is added to each node $i$ for $i = 1, \ldots, N$. Even if the system enters the state $[0, 0, \ldots, 0]$, the probability that the system leaves this state is non-zero. In other words, there is no absorbing state in the ε-SIS model. At the beginning of a simulation instance, a portion of the nodes is selected to be infected while the others are susceptible. After a warm-up time, one continues to run the simulation for a long time and records the change in the system state. After stopping the simulation, one can calculate any statistics of the dynamics, such as the time-average number of infected nodes, from the records.
The simulation environment setup is described in the following. We take the simulation time to be $10^3$ time units and fix the self-infection rate $\varepsilon$ for all nodes. For a given setup $(\beta_m, \beta_n, \delta_m, \delta_n, \varepsilon)$ and a graph, we start one simulation and record the number of infected nodes in the graph at each time step until the end of the simulation. Finally, we calculate the statistical properties, such as the time average of the number of infected nodes in the steady state. For a given setup, we only need to run one single simulation instance to calculate all related statistical properties. We can then show the relation between one of the statistical properties and the simulation time in a figure. In this work, we also show the statistical properties as a function of the infection rate. We fix the setup $(\beta_n, \delta_m, \delta_n, \varepsilon)$. Then, we vary $\beta_m$ from 0 to a large enough number. For each value of $\beta_m$, we calculate the corresponding statistical properties. Finally, we can show the relation between the metastable-state number of infected nodes and the infection rate $\beta_m$ in a figure.
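The procedure described above can be sketched as a simple discrete-time simulator. This is an illustrative approximation under our own assumptions, not the simulator used in the paper: each step of length `dt` lets a susceptible node become infected with probability $1 - e^{-(\sum_j a_{ij}\beta_j X_j + \varepsilon)\,dt}$ and lets an infected node recover with probability $1 - e^{-\delta_i\,dt}$. The function name, the value `eps=1e-3`, the step size and the warm-up fraction are all hypothetical choices.

```python
import numpy as np

def simulate_eps_sis(adj, beta, delta, eps, t_end=200.0, dt=0.02,
                     init_fraction=0.1, seed=0):
    """Discrete-time sketch of the epsilon-SIS process on a fixed graph.

    Returns the number of infected nodes recorded after every time step.
    """
    rng = np.random.default_rng(seed)
    adj = np.asarray(adj, dtype=float)
    n_nodes = adj.shape[0]
    beta = np.broadcast_to(np.asarray(beta, dtype=float), (n_nodes,))
    delta = np.broadcast_to(np.asarray(delta, dtype=float), (n_nodes,))

    # Start with a fixed portion of infected nodes chosen at random.
    x = np.zeros(n_nodes, dtype=bool)
    n_init = max(1, int(round(init_fraction * n_nodes)))
    x[rng.choice(n_nodes, size=n_init, replace=False)] = True

    p_cure = 1.0 - np.exp(-delta * dt)
    n_steps = int(round(t_end / dt))
    counts = np.empty(n_steps, dtype=int)
    for k in range(n_steps):
        # Total infection rate felt by each node, plus self-infection.
        rate = adj @ (beta * x) + eps
        p_infect = 1.0 - np.exp(-rate * dt)
        u = rng.random(n_nodes)
        # Infected nodes stay infected unless cured; susceptible ones may flip.
        x = np.where(x, u >= p_cure, u < p_infect)
        counts[k] = int(x.sum())
    return counts

# Example: K_{15,25} with beta = 0.2, delta = 1 and a hypothetical eps.
m, n = 15, 25
adj = np.zeros((m + n, m + n))
adj[:m, m:] = 1.0
adj[m:, :m] = 1.0
counts = simulate_eps_sis(adj, beta=0.2, delta=1.0, eps=1e-3)
warm_up = len(counts) // 4                  # discard the initial transient
mean_I = counts[warm_up:].mean()
var_I = counts[warm_up:].var()
print(mean_I, var_I)
```

An event-driven (Gillespie-type) simulation would avoid the time-discretization error at the cost of slightly more bookkeeping; the fixed-step version is shown because it matches the per-time-step recording described in the text.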

Time Evolution
We implement a discrete-time simulator of the ε-SIS model, which was elaborated in Section 3.3. We first conduct experiments on the time evolution of infection and verify the existence of the metastable state of the system.
Figure 3 demonstrates a single run of the simulation, showing the fraction $y(t)$ of infected nodes of a heterogeneous SIS process as a function of time in a complete bipartite graph $K_{15,25}$ with the model settings $\beta_m = \beta_n = 0.2$ and $\delta_m = \delta_n = 1$. The fraction $y(t)$ starts at a small value, which is fixed at the very beginning of the simulation as a parameter. Then, the fraction increases rapidly to a plateau where it fluctuates within a relatively fixed range. The thick curve and the dashed thin curves denote the theoretical mean and standard deviation of the fraction of infected nodes, respectively. The theoretical mean value is calculated by (12), while the standard deviation is obtained as the square root of the variance (15), shown in Proposition 1. It appears that there exists a metastable-state mean value of the infection number, which can be approximated by the steady-state average number of infected nodes calculated by the above-mentioned NIMFA model.

Variance of Infection Fraction in Bipartite Graph
We conducted numerous simulations for the heterogeneous SIS process in the bipartite graph in order to verify the proposed theory (15) proven in Proposition 1. The number of infected nodes versus time for each run of a simulation with a specific configuration is recorded, as demonstrated in Figure 3. The records with various configurations are collected and then used to calculate several statistical properties, such as the steady-state mean fraction $y_\infty$ of the infected nodes and the steady-state variance $\mathrm{Var}[I]$ of the fraction of the infected nodes.
Take the results shown in Figure 4 as an example for the heterogeneous SIS process in a bipartite graph. The metastable-state mean and variance are compared with the corresponding steady-state theoretical results in each diagram. The markers and curves denote the metastable-state simulation results and the steady-state theoretical results, respectively. Without loss of generality, the fraction of infected nodes is shown as a function of the infection rate $\beta_m$ while the other parameters $(\beta_n, \delta_m, \delta_n, \varepsilon)$ are fixed. The variance of the number of infected nodes corresponding to the mean infection fraction is also shown for the same set of parameters.
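The theoretical curves for such a comparison can be produced by sweeping $\beta_m$ while holding the other rates fixed. The sketch below is a hedged illustration rather than the authors' plotting code: it reuses closed-form group infection probabilities that we derived from the steady state of the two-group NIMFA equations (assumed, not confirmed, to match (13) and (14)), and the fixed values $\beta_n = 0.2$, $\delta_m = \delta_n = 1$ and the sweep range are hypothetical.

```python
import numpy as np

def sweep_beta_m(beta_m_values, m, n, beta_n, delta_m, delta_n):
    """Mean infected fraction and variance of the infected number in K_{m,n}
    for an array of beta_m values, with the other rates held fixed."""
    beta_m = np.asarray(beta_m_values, dtype=float)
    # Distance above the mean-field threshold, clipped at zero below it.
    gap = np.maximum(m * n * beta_m * beta_n - delta_m * delta_n, 0.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        omega_m = np.where(gap > 0,
                           gap / (m * beta_m * (delta_m + n * beta_n)), 0.0)
    omega_n = gap / (n * beta_n * (delta_n + m * beta_m))
    mean_fraction = (m * omega_m + n * omega_n) / (m + n)
    variance = m * omega_m * (1 - omega_m) + n * omega_n * (1 - omega_n)
    return mean_fraction, variance

# Sweep beta_m from 0 to 1 in steps of 0.02 for K_{15,25}.
betas = np.linspace(0.0, 1.0, 51)
mean_fraction, variance = sweep_beta_m(betas, m=15, n=25,
                                       beta_n=0.2, delta_m=1.0, delta_n=1.0)
for b, y, var in zip(betas[::10], mean_fraction[::10], variance[::10]):
    print(f"beta_m={b:.2f}  mean fraction={y:.4f}  variance={var:.4f}")
```

Below the mean-field threshold $mn\beta_m\beta_n \le \delta_m\delta_n$ both curves are identically zero in this approximation, so any fluctuations observed in simulation there are not captured by the sweep.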

Variance of Infection Fraction in Star Graph
We conducted numerous simulations for the heterogeneous SIS process in a star graph in order to verify the proposed theory (16) proved in Corollary 1. Take the results shown in Figure 5 as an example of a heterogeneous SIS process in a star graph. The center node has an infection rate different from that of the other nodes. The steady-state fraction of infected nodes and its variance are compared with the metastable-state fraction of infected nodes and its variance. Although the noise is relatively large in simulations for a finite-size star graph, the trend of the simulation results is still apparent, as shown in Figure 5.

Discussion
The existence of the metastable state of the heterogeneous SIS process in bipartite graphs and star graphs is verified, as shown in Figure 3 for a non-trivial case. The statistical mean value of the fraction $y(t)$ of infected nodes tends to stay unchanged, although the simulation result fluctuates as time goes on. It appears that the metastable-state mean value of the fraction of infected nodes for the heterogeneous SIS process corresponds to the steady-state expectation value of the proposed theory. The theoretical standard deviation, as the square root of the proposed variance (15), provides a relatively accurate approximation of the statistical fluctuation, as shown in Figure 3. Moreover, the deviation of the fraction of infected nodes in the simulation results is restricted to a limited range, which can be predicted by the theoretical variance to some extent.
According to the comparisons shown in Figure 4, the proposed theories are capable of approximating the real heterogeneous SIS process in bipartite graphs over a range of infection rates. Specifically, the proposed approximations of the steady-state number of infected nodes and the variance can be used to predict the metastable-state number of infected nodes and the variance. Although the inaccuracy of the approximation is not apparent for the mean fraction of infected nodes, it is relatively large for the variance over a small range of infection rates. The obvious discrepancy between the theory and the simulation lies around the phase transition point, i.e., the epidemic threshold. The main reason is that the NIMFA framework employed in this work has its own limitations in approximation. The homogeneous NIMFA framework cannot predict a variance larger than $0.25N$, where $N$ is the network size [12], because each term $v_i(1 - v_i)$ of the independence-based variance is at most $1/4$. As the infection rate grows, the accuracy of the approximations increases dramatically. For relatively large infection rates, the discrepancy between the theory and the simulation becomes negligible.
It appears that the steady-state variance of the number of infected nodes calculated by (16) is capable of approximating the simulation results for a relatively large range of infection rates, as shown in Figure 5. The discrepancy between the theory and the simulation appears near the epidemic threshold. On the other hand, compared to the scenario of the bipartite graph, the theory for the star graph performs better and more accurately predicts the variance near the epidemic threshold.

Conclusions
The NIMFA framework, as a sort of individual-based mean-field approximation, is extended to model the heterogeneous SIS process in bipartite graphs and star graphs. Specifically, several approximations to the variance of the metastable-state fraction of infected nodes are proposed. To the best of our knowledge, this is the first attempt to analyze the statistical fluctuations of the heterogeneous SIS process in specific graphs, a topic that has attracted little attention. The empirical results in this paper demonstrate the effectiveness of the proposed theory to some extent. In particular, the proposed variance approximations provide good accuracy for a relatively large range of infection rates, although they perform poorly near the epidemic threshold.
One limitation of the approximation lies in the discrepancy between the theory and the statistical properties of real dynamics near the phase transition point. The main reason is that the NIMFA framework employed and extended in this work is a first-order approximation. In future studies, the accuracy of the approximation of the variance should be improved. Selecting more accurate mean-field approximation frameworks with relatively low computational complexity [16][17][18] would help to improve the accuracy of the expectation and the variance approximation. Moreover, more attention should be paid to the bounds on the variance.
A basic assumption on which most of the related works are based is the choice of the Poisson process. The Poisson process can be analyzed theoretically in the framework of Markov theory. However, many real contact and infection activities are non-Markovian processes in networks with temporal topologies. A general theoretical framework to analyze the above-mentioned non-Markovian processes is still lacking [19,20]. One could extend some propositions proposed in this work to these complex scenarios.
and J.J.; project administration, D.G.; funding acquisition, D.G. and L.J. All authors have read and agreed to the published version of the manuscript.

Figure 1. Schematic diagram of two independent Poisson processes of SIS epidemic dynamics between a nodal pair in a network. The independent processes are (a) the infection process, describing an infected node infecting its susceptible neighboring node, and (b) the curing process, describing an infected node recovering and becoming susceptible again.

Figure 2. Topology of networks concerned in this work. Two sorts of networks are chosen to investigate the heterogeneous SIS epidemic dynamics: (a) a complete bipartite graph $K_{m,n}$ with two disjoint node sets, $m = 2$ and $n = 3$; (b) a star graph.

Figure 3. Time evolution of the fraction $y(t)$ of the infected nodes of a single run of a SIS process in a complete bipartite graph $K_{15,25}$ with two disjoint node sets. The thin blue curve denotes the simulation results recorded while running the ε-SIS model, while the thick red straight line denotes the theoretical steady-state mean value (12). The dashed lines denote the standard deviation obtained as the square root of the variance (15).

Figure 4. Statistical properties of the heterogeneous SIS process in a bipartite graph $K_{15,25}$. The scattered circles denote the simulation results, while the dashed curves denote the theory. (a) The mean value of the steady-state fraction of infected nodes, where the theory is calculated by (12). (b) The variance of the steady-state fraction of infected nodes, where the theory is calculated by (15).

Figure 5. Statistical properties of the heterogeneous SIS process in a star graph with $N = 40$. The scattered circles denote the simulation results, while the dashed curves denote the theory. (a) The mean value of the steady-state fraction of infected nodes, where the theory is calculated by (12). (b) The variance of the steady-state fraction of infected nodes, where the theory is calculated by (16).