Estimating the Information Source under Decaying Diffusion Rates

Woo, Jiin; Choi, Jaeyoung

doi:10.3390/electronics8121384


by Jiin Woo ¹ and Jaeyoung Choi ²,*

¹ NAVER, Green Factory, 6, Buljeong-ro, Bundang-gu, Seongnam-si, Gyeonggi-do 13561, Korea

² Department of Automotive Engineering, Honam University, 417 Eodeung-daero, Gwangsan-gu, Gwangju 62399, Korea

* Author to whom correspondence should be addressed.

Electronics 2019, 8(12), 1384; https://doi.org/10.3390/electronics8121384

Submission received: 11 October 2019 / Revised: 8 November 2019 / Accepted: 12 November 2019 / Published: 21 November 2019

(This article belongs to the Section Computer Science & Engineering)


Abstract

With the rise of online social network services such as Facebook and Twitter, people increasingly use social networks to exchange new information. Finding the information source has therefore become an indispensable task for detecting malicious agents and influential people in such networks. A seminal work by Shah and Zaman in 2010 showed that, even for regular trees, the detection probability cannot exceed 31% as time goes to infinity. Since then, extensive research has been done on this problem, with major interest in constructing efficient estimators and analyzing their detection performance theoretically. However, most of these works assume a homogeneous diffusion rate, i.e., a rate that never changes over time or across the network. In practice, information is reported to have a lifetime and to become less attractive over time. In this paper, we study the problem of detecting the information source when the diffusion rate decreases with the distance from the source. We derive the detection performance of the Maximum Likelihood Estimator (MLE) analytically and validate our theoretical findings on regular trees as well as random and real-world networks.

Keywords:

information source detection; epidemic diffusion model; maximum likelihood estimator; decaying diffusion rate

1. Introduction

In the era of big data brought by the rapid development of the Internet, information spreading is ubiquitous: examples include the propagation of infectious diseases, technology diffusion, computer virus and spam infection on the Internet, and the sharing of popular topics on Facebook. Information source estimation is the problem of identifying the initial seed node of the diffused information in the network. This is of clear practical importance, because harmful diffusion can be mitigated or even blocked, e.g., by vaccinating people or installing security updates. In the seminal work by Shah and Zaman [1], it is shown that on regular tree topologies the detection probability of the Maximum Likelihood Estimator (MLE) cannot exceed 31%; even worse, on more realistic topologies such as Facebook graphs, scale-free graphs and Internet Autonomous System (AS) graphs, the detection probability of an MLE-based heuristic is below 5% due to the complex structure of these networks. Since then, the problem has attracted extensive research attention for various network topologies and diffusion models [1,2,3,4,5,6,7,8], with major interest in constructing efficient estimators and analyzing their detection performance theoretically. Prior work concludes, directly or indirectly, that finding the source of information (We use the terms “information” and “rumor” interchangeably throughout the paper.) is a challenging task unless sufficient side information or multiple diffusion snapshots are available. Several studies exploit multiple snapshots [6] or side information about a restricted suspect set to which the true source belongs [7], and thereby improve the detection performance significantly. Another type of side information is obtained by querying, i.e., asking questions to a subset of infected nodes to gather more hints about the identity of the true source [9,10].

However, most of these works focus on a homogeneous diffusion rate, i.e., a rate that is the same at all times and everywhere in the network. This is unrealistic, because information is reported to have a lifetime and to become less attractive over time [11]. In other words, the diffusion power decreases over time, or with the distance (number of hops) from the source, as in Figure 1. In this paper, we consider a diffusion rate that decays with the distance from the source (We do not consider a rate that decays over time, because of the mathematical intractability of analyzing its performance.). Under this scenario, we aim to find the source with an appropriate estimator.

To the best of our knowledge, this is the first work to consider the decaying diffusion rate on the information source detection problem.

Our main contributions are summarized in what follows.

First, we use the MLE to find the source under decaying diffusion and show that, when the diffusion rate decays with the distance from the source, the MLE is the same as that for the homogeneous diffusion rate. Hence the MLE of our model has the same graph-centrality property, called the rumor center in [1], which enables us to analyze the detection performance under decaying rates.
Second, we define two exponential decaying models: (a) simple exponential decay and (b) generalized exponential decay. In terms of the decaying pattern, the simple exponential decay is light-tailed, whereas the generalized exponential decay covers both light-tailed and heavy-tailed patterns. For both models, we obtain the detection probability of the MLE in closed form when the underlying graph is a regular tree. Unlike the prior result in [1], the detection probability is positive on the line graph, and detection improves non-negligibly for every degree of the regular tree.
Third, we consider the case in which the decaying model parameter is hidden, for both decaying models above. This is a more realistic scenario, because knowing the exact model parameter is difficult in practice. We first derive the MLE of this parameter and show that computing it takes exponential time. We therefore design a heuristic algorithm that estimates the true parameter from the diffusion snapshot.
Finally, we validate our theoretical results using the MLE on regular trees, and using a heuristic Breadth-First-Search (BFS) estimator on popular random graphs (Erdős-Rényi, scale-free and small-world graphs) and real-world networks (US power grid, Facebook and Wiki vote). The detection probability can exceed 80% on the regular tree, and on the Facebook graph it exceeds 30% when the diffusion rate decays, compared with about 20% without decay.

The remainder of this paper is organized as follows. Section 2 discusses related literature. Section 3 introduces the information diffusion model and the estimator. The theoretical results on detection performance under decaying rates are presented in Section 4, and their proofs are given in Section 5. Section 6 presents the simulation results, and Section 8 concludes the paper. Detailed proofs of the lemmas are given in Appendix A.

2. Related Work

Research on information source detection has recently received significant attention. We divide prior work into three categories: (i) single source estimation, (ii) multiple sources estimation, and (iii) hiding and seeking the sources.

(i) Single source estimation. The first theoretical approach was taken by Shah and Zaman [1,2,3], who introduced rumor centrality, a simple topology-dependent metric for a given diffusion snapshot. They called the node with the maximum rumor centrality the rumor center and used it as the MLE. They proved that rumor centrality describes the likelihood function when the underlying network is a regular tree and the diffusion follows the Susceptible-Infected (SI) model; this result was extended to random graphs in [2]. Zhu and Ying [4] solved the source detection problem under the Susceptible-Infected-Removed (SIR) model with a sample path approach based on the Jordan center, which was later extended to the case of sparse observations [5]. Several attempts have been made to boost the detection probability. Wang et al. [6] showed that observing multiple different epidemic instances can significantly increase the detection probability. Dong et al. [7] assumed a restricted set of source candidates and showed an increased detection probability with the Maximum a Posteriori Estimator (MAPE). Choi et al. [8] showed that anti-rumor spreading, under some distribution of the distance between the rumor and anti-rumor sources, helps to find the rumor source with the MAPE. Choi et al. [9,10] studied the effect of querying on finding the source and showed how many queries suffice to achieve a target detection probability. The authors of [12,13] introduced the notion of set estimation and provided analytical results on its detection performance. Luo et al. [14] considered estimating an infection source under the SI model when not all infected nodes can be observed.

(ii) Multiple sources estimation. Unlike single source estimation, multi-source estimation requires inferring a set of source nodes. Despite the difficulty of the problem, several studies have addressed it with appropriate set estimation methods. Prakash et al. [15] proposed to use the Minimum Description Length (MDL) principle to identify the best set of seed nodes and the virus propagation ripple that describe the infected graph most succinctly. They proposed a highly efficient algorithm that identifies likely sets of seed nodes from a snapshot and showed that it optimizes the virus propagation ripple in a principled way by maximizing the likelihood. Zhu et al. [16] proposed a source localization algorithm named Optimal-Jordan-Cover (OJC). The algorithm first extracts a subgraph using a candidate selection algorithm that selects source candidates based on the number of observed infected nodes in their neighborhoods. Then, in the extracted subgraph, OJC finds a set of nodes that covers all observed infected nodes with the minimum radius. Considering heterogeneous SIR diffusion in the Erdős-Rényi (ER) random graph, they proved that OJC locates all sources with probability one asymptotically under partial observations. Ji et al. [17] developed a theoretical framework to estimate rumor sources given an observation of the infection graph and the number of rumor sources.

(iii) Hiding and seeking the source. In contrast to finding the information source from a given snapshot of the epidemic, approaches for hiding the source have also been studied. Fanti et al. [18] first considered this problem and proposed Adaptive Diffusion (AD) as an information spreading protocol. They showed that AD is near-optimal both for hiding the source and for maximizing information spreading on regular trees. Luo et al. [19] considered a problem in which an information source tries to hide while maximizing the spread of its information, whereas a network adversary simultaneously seeks the source. They formulated the problem as a game and showed that a Nash Equilibrium (NE) exists under some mild conditions.

To the best of our knowledge, this paper is the first to quantitatively consider the decay of diffusion power, which is a more realistic scenario than the homogeneous diffusion rate. We show that the source can be found more easily when the diffusion rate decays with the distance from the source, both on tree structures and on real-world graphs.

3. Model and Estimator

We consider an undirected graph $G = (V, E)$, where $V$ is a countably infinite set of nodes and $E$ is the set of edges of the form $(i, j)$ for $i, j \in V$. Each node represents an individual in a human social network or a host on the Internet, and each edge corresponds to a social relationship between two individuals or a physical connection between two Internet hosts [9]. As the information spreading model, we consider the SI model as in [1,2,4], where each node is in one of two states: susceptible or infected. In this model, once a node has the information, it keeps it forever, i.e., no node recovers. All nodes are initially susceptible except the information source, and once a node $i$ has the information, it can spread it to another node $j$ if and only if there is an edge between them. We denote by $\tau_{ij}$ the random time it takes node $j$ to receive the information from node $i$ once $i$ has it. We denote by $v^{*} \in V$ the information source, which initiates the diffusion, and by $V_N \subset V$ the set of $N$ infected nodes in the observed snapshot $G_N \subset G$. In this paper, we consider the case where $G$ is a regular tree and focus on the regime in which sufficient time has passed, as in much prior work [1,2,6,7]. Although real networks are rarely regular trees, many random graphs can be approximated by regular trees owing to their locally tree-like structure [20].

Most prior works assume that every edge $e = (u, v)$ has the same diffusion rate (or probability), say $\lambda > 0$. In this work, by contrast, we assume that the diffusion rate of every edge $e = (u, v)$ is a function of its distance from the information source $v^{*} \in V$. Specifically, the $\tau_{ij}$ are independent and exponentially distributed with parameter $\lambda_h$, where $h$ is the number of hops between node $i$ and the source. Hence, the diffusion rate depends only on the distance to the source in the graph. We further assume that the diffused information carries a counter of how many hops ($h$) it has traveled from the source (Using this counter, each infected node spreads the information to its neighbors at the diffusion rate $\lambda_h$.).

Maximum Likelihood Estimator. As an estimator of the source, we consider the MLE for the observed graph (snapshot) $G_N$ with $N$ infected nodes:

$$\hat{v}_{ml} = \arg\max_{v \in G_N} P(G_N \mid v), \tag{1}$$

i.e., the MLE is the node that maximizes the likelihood $P(G_N \mid v)$ of the diffusion snapshot $G_N$. Because the heterogeneous diffusion rate makes computing the likelihood directly quite difficult, we instead use the following proposition, which guarantees a useful graphical characterization of the MLE.

Proposition 1. Let $V_{ml}$ be the set of MLEs (We consider a set of MLEs because the maximizer need not be unique.) for the homogeneous diffusion rate, and let $V_{ml}(h)$ be the set of MLEs of (1) on the regular tree under the decaying rate $\lambda_h$. Then $V_{ml} = V_{ml}(h)$.

The proof of Proposition 1 is presented in Section 5. The proposition indicates that, on a given regular tree, the MLE of our decaying diffusion model coincides with that of the homogeneous diffusion rate. Consequently, it has the same graphical property, the “rumor center” described in [1], which is one of the graph centrality concepts. To state this precisely, let $T_u^v$ be the number of nodes in the subtree rooted at node $u$ when $v$ is taken as the root of the tree $G_N$ (see [1] for details). Then the rumor center has the following property.

Corollary 1. On a $d$-regular tree $G$, for a given observation $G_N$ with $N$ infected nodes, node $v$ is an MLE of (1) if and only if $T_u^v \leq N/2$ for every neighbor $u$ of $v$.
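Corollary 1 turns the MLE into a purely combinatorial test on the snapshot. The Python sketch below is our own illustration, not the authors' code: the adjacency-dict input and the names `subtree_sizes`, `rumor_centrality` and `rumor_centers` are hypothetical. It checks the condition $T_u^v \leq N/2$ at every node and, for reference, evaluates the rumor centrality $R(v, G_N) = N!/\prod_{w \in G_N} T_w^v$ of [1].

```python
from math import factorial, prod

def subtree_sizes(adj, root):
    """Return {w: T_w^root}, the subtree sizes when the snapshot tree is rooted at root."""
    parent, order, stack = {root: None}, [], [root]
    while stack:                          # iterative DFS gives a parent-before-child order
        w = stack.pop()
        order.append(w)
        for x in adj[w]:
            if x != parent[w]:
                parent[x] = w
                stack.append(x)
    size = {w: 1 for w in order}
    for w in reversed(order):             # children are folded into parents bottom-up
        if parent[w] is not None:
            size[parent[w]] += size[w]
    return size

def rumor_centrality(adj, v):
    """R(v, G_N) = N! / prod_w T_w^v (always an integer)."""
    return factorial(len(adj)) // prod(subtree_sizes(adj, v).values())

def rumor_centers(adj):
    """All nodes v with T_u^v <= N/2 for every neighbour u (Corollary 1)."""
    n = len(adj)
    return [v for v in adj
            if all(subtree_sizes(adj, v)[u] <= n / 2 for u in adj[v])]

# Snapshot: node 1 joined to the leaves 0, 2 and 3
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(rumor_centers(adj))                          # [1]
print([rumor_centrality(adj, v) for v in adj])     # [2, 6, 2, 2]
```

On a path with an even number of nodes the test returns both middle nodes, which is the tie case used later in the proof of Theorem 1.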

For comparison, we first present the detection probability of the MLE for the homogeneous diffusion rate [2]:

Lemma 1 ([1]). On a $d$-regular tree $G$, let $P(\hat{v}_{ml} = v^{*})$ be the detection probability of the MLE. Then $\lim_{t \to \infty} P(\hat{v}_{ml} = v^{*}) = 0$ for $d = 2$, and for $d \geq 3$

$$\lim_{t \to \infty} P(\hat{v}_{ml} = v^{*}) = 1 - d\left(1 - I_{1/2}\left(\frac{1}{d-2}, \frac{d-1}{d-2}\right)\right), \tag{2}$$

where $I_x(\alpha, \beta)$ is the regularized incomplete Beta function with parameters $\alpha$ and $\beta$ (The function $I_x(\alpha, \beta)$ is the probability that a Beta random variable with parameters $\alpha$ and $\beta$ is less than $x \in [0, 1]$; its form is $I_x(\alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^x t^{\alpha - 1}(1 - t)^{\beta - 1}\, dt$, where $\Gamma(\cdot)$ is the standard Gamma function [1].).

Using Lemma 1, one can check that the asymptotic detection probability of the MLE under the homogeneous diffusion rate is at most $0.307$ on $d$-regular trees. In the following section, we define the decaying models of interest and obtain the detection probabilities under them.
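The value $0.307$ can be reproduced numerically from (2). The sketch below is an assumption-laden illustration in plain Python: it evaluates the regularized incomplete Beta function with the standard continued-fraction expansion (modified Lentz method), which is our choice of numerical method and not part of the paper; the names `reg_inc_beta` and `lemma1_detection` are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-15):
    """Continued fraction for the incomplete Beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(x, a, b):
    """Regularized incomplete Beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):                           # expansion converges fast here
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b             # use I_x(a,b) = 1 - I_{1-x}(b,a)

def lemma1_detection(d):
    """Limit (2) of the MLE detection probability on a d-regular tree, homogeneous rate."""
    if d == 2:
        return 0.0
    return 1.0 - d * (1.0 - reg_inc_beta(0.5, 1.0 / (d - 2), (d - 1) / (d - 2)))

for d in (3, 4, 5, 10, 100, 1000):
    print(d, round(lemma1_detection(d), 4))
```

For $d = 3$ the formula gives exactly $0.25$ and for $d = 4$ it gives $4/\pi - 1$; as $d$ grows the values approach $1 - \ln 2 \approx 0.307$ from below.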

4. Main Results

In this section, we first obtain the asymptotic detection probability (as $t \to \infty$) for two decaying models with known model parameters. Next, we consider the case in which the model parameter is hidden (unknown) and must be estimated, for which we propose a simple and efficient parameter estimation algorithm.

4.1. Probability of Correct Detection of MLE

As the first decaying model, we define a simple decay function. To do this, we introduce a decaying parameter $p > 1$, which indicates how much the diffusion rate decays with the number of hops from the source.

Definition 1 (Simple Exponential Decay). Let $\lambda_0$ be the initial diffusion rate from the information source $v^{*}$. We call the rate function $\lambda_h(p) = \lambda_0 / p^h$ the simple exponential decay with respect to the source, with decaying parameter $p > 1$.

Using this definition, we first obtain the detection probability of the MLE when the underlying graph is a line. Let $P(\hat{v}_{ml}(p) = v^{*})$ be the detection probability of the information source by the MLE under the simple exponential decay. Then we have the following result.

Theorem 1. For the line graph with decaying parameter $p > 1$, the detection probability of the information source by the MLE as $t$ goes to infinity is

$$\lim_{t \to \infty} P(\hat{v}_{ml}(p) = v^{*}) = \frac{1}{M(p)}\left(1 + \frac{1 + p}{2p}\right), \tag{3}$$

where $M(p) := 2\sum_{k=1}^{\infty} p^{-k(k+1)/2} + 2$. Furthermore, if $p \leq 1$, then $\lim_{t \to \infty} P(\hat{v}_{ml}(p) = v^{*}) = 0$.

For $p = 2$, an integral approximation of $M(p)$ shows that this detection probability is larger than $7/16$. Moreover, the detection probability increases with $p$. This is a significant improvement over the homogeneous diffusion rate, for which the detection probability on the line graph is zero [2].
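The closed form (3) is easy to evaluate by truncating the series in $M(p)$ once its terms fall below machine precision. The following Python sketch (hypothetical names `M` and `line_detection`; the truncation tolerance is our assumption) checks the $7/16$ claim and the monotonicity in $p$ on a few sample values.

```python
def M(p, tol=1e-17, max_terms=100_000):
    """M(p) = 2 * sum_{k>=1} p^{-k(k+1)/2} + 2 from Theorem 1 (requires p > 1)."""
    s = 0.0
    for k in range(1, max_terms + 1):
        term = p ** (-k * (k + 1) / 2)   # underflows harmlessly to 0.0 for large k
        s += term
        if term < tol:
            break
    return 2.0 * s + 2.0

def line_detection(p):
    """Right-hand side of (3): limiting MLE detection probability on the line graph."""
    if p <= 1.0:
        return 0.0                        # second part of Theorem 1
    return (1.0 + (1.0 + p) / (2.0 * p)) / M(p)

for p in (1.1, 1.5, 2.0, 3.0, 5.0, 10.0, 100.0):
    print(p, round(line_detection(p), 4))
```

With $p = 2$ the value is about $0.533$, comfortably above $7/16 = 0.4375$; as $p \to \infty$, $M(p) \to 2$ and the probability approaches $3/4$.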

Next, we obtain the source detection probability for the $d$-regular tree ($d \geq 3$).

Theorem 2. On a $d$-regular tree $G$ ($d \geq 3$), if $p \geq d - 1$, the MLE detects the source with probability one as $t$ goes to infinity, i.e.,

$$\lim_{t \to \infty} P(\hat{v}_{ml}(p) = v^{*}) = 1. \tag{4}$$

This result indicates that on the $d$-regular tree ($d \geq 3$), if the decaying parameter of the information is at least $d - 1$, the MLE detects the source almost surely even as time goes to infinity. This is because strong decay keeps the diffusion snapshot dense around the source, so the MLE can find it easily (Since the MLE is a rumor center under the decaying model, the estimator is the node with the largest centrality in the graph.). Next, we introduce a more general exponential decaying model in which a single decay level $r$ interpolates between the simple exponential decay and the homogeneous rate:

Definition 2 (Generalized Exponential Decay). Let $\lambda_0$ be the initial diffusion rate of the information source $v^{*}$. We call the rate function $\lambda_h(p, r)$ the generalized exponential decay with respect to the source if

$$\lambda_h(p, r) = \frac{\lambda_0}{p^h}\left(1 + r\sum_{l=1}^{h} p^{h-l}\right), \tag{5}$$

where $0 \leq r \leq p - 1$ is the decay level.
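To make the definition concrete, the short Python sketch below evaluates (5) directly (the function names are hypothetical and $\lambda_0 = 1$ is an arbitrary choice). It lets one confirm the two special cases discussed next, and the recurrence $(d-1)\lambda_{h+1}(p, r) = \lambda_h(p, r) + r\lambda_0$ for $p = d - 1$ used in the proof of Theorem 3.

```python
def rate_simple(h, p, lam0=1.0):
    """Simple exponential decay (Definition 1): lam0 / p**h."""
    return lam0 / p ** h

def rate_generalized(h, p, r, lam0=1.0):
    """Generalized exponential decay (5): lam0 / p**h * (1 + r * sum_{l=1}^{h} p**(h-l))."""
    return lam0 / p ** h * (1.0 + r * sum(p ** (h - l) for l in range(1, h + 1)))

p = 2.0
for r in (0.0, 0.5, p - 1.0):   # r = 0: simple decay; r = p - 1: homogeneous rate lam0
    print(r, [round(rate_generalized(h, p, r), 4) for h in range(6)])
```

For $r = p - 1$ the bracket telescopes to $p^h$, which is why the rate stays at $\lambda_0$ for every hop.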

This form is parameterized by $r$. For example, if $r = 0$, it reduces to the simple exponential decay of Definition 1, and if $r = p - 1$, it becomes the constant $\lambda_0$, i.e., the homogeneous diffusion rate. Figure 2 plots the rate function $\lambda_h(p, r)$ for various values of the parameters $(p, r)$. Using this, we obtain the following result.

Theorem 3. On a $d$-regular tree $G$ ($d \geq 3$), let $P(\hat{v}_{ml}(p, r) = v^{*})$ be the detection probability of the information source by the MLE under the generalized exponential decay $\lambda_h(p, r)$ in (5). If $p = d - 1$, then

$$\lim_{t \to \infty} P(\hat{v}_{ml}(p, r) = v^{*}) = 1 - d\left(1 - I_{1/2}\left(\frac{1}{r}, \frac{d-1}{r}\right)\right). \tag{6}$$

The result gives the asymptotic detection probability (6) of the MLE for any given decay level $r$. For example, for $r = 0$ (in the limit $r \to 0$), $\lim_{t \to \infty} P(\hat{v}_{ml}(p, r) = v^{*}) = 1$, which is the result of Theorem 2, and for $r = d - 2$ the detection probability reduces to the result in [1] for the homogeneous diffusion rate.
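Theorem 3 can be sanity-checked by simulation. Our reading of recurrence (19) in Section 5.4 is as follows: with $p = d - 1$, every infection in the subtree of a neighbor $u_i$ of the source removes one boundary edge of rate $\lambda_h$ and opens $d - 1$ edges of rate $\lambda_{h+1}$, so the subtree's total boundary rate grows by exactly $r\lambda_0$. The choice of subtree then behaves like a Pólya urn whose limiting fractions are Beta$(1/r, (d-1)/r)$, which is where (6) comes from. The Monte Carlo sketch below (hypothetical function `urn_detection`, Python standard library only) estimates the probability that no subtree holds more than half of the infected nodes.

```python
import random

def urn_detection(d, r, n_steps, trials, seed=0):
    """Fraction of runs in which no subtree of the source holds more than half the nodes.

    Subtree i is chosen with probability proportional to S_i / lam0 = 1 + r * T_i,
    our reading of (19) for p = d - 1 (each infection adds r * lam0 to S_i)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = [0] * d                        # T_i: infected nodes in subtree i
        for _ in range(n_steps):
            weights = [1.0 + r * c for c in counts]
            counts[rng.choices(range(d), weights=weights)[0]] += 1
        n_total = n_steps + 1                   # plus the source itself (odd, so no ties)
        if max(counts) <= n_total / 2:          # Corollary 1 applied at the source
            hits += 1
    return hits / trials

for r in (0.0, 0.5, 1.0):                       # d = 3, so r = d - 2 = 1 is homogeneous
    print(r, urn_detection(3, r, 400, 500))
```

For $d = 3$ and $r = 1$ the estimate should sit near $0.25$, the value that both (2) and (6) give, while $r = 0$ recovers the detection probability close to one of Theorem 2.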

4.2. Decaying Parameter Estimation

In this subsection, we consider the scenario in which the parameter $p$ of the decaying model is hidden, because in practice the exact parameter is often hard to know even when the model is given. In this case, one can estimate $p$ by the following MLE, using only the current snapshot $G_N$. To formalize this, let $\sigma = (v_1 = v, v_2, \dots, v_N)$ be an infection sequence that generates $G_N$, and let $\Omega(v, G_N)$ be the set of such sequences when $v$ is the source. Then

$$\begin{aligned} p_{ml} &= \arg\max_{p} P(G_N \mid p) \\ &\overset{(a)}{=} \arg\max_{p} \sum_{v \in G_N} P(G_N \mid p, v) \\ &= \arg\max_{p} \sum_{v \in G_N} \Big(\sum_{\sigma \in \Omega(v, G_N)} P(\sigma \mid p, v)\Big), \end{aligned} \tag{7}$$

where equality (a) follows from the uniform prior on the source. Computing this MLE has combinatorial complexity, because $\Omega(v, G_N)$ contains exponentially many infection sequences in the number $N$ of infected nodes. Hence, instead of using it directly, we design a simple and efficient approximation algorithm to estimate the true parameter, as follows.
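The exponential cost of (7) is easy to see in a brute-force implementation that enumerates $\Omega(v, G_N)$ for a tiny snapshot. The Python sketch below is our own illustration with hypothetical names; it assumes the snapshot is a subtree of an infinite $d$-regular tree, so every infected node $u$ has $d$ minus (number of infected neighbors) susceptible neighbors on the boundary, each reached at rate $\lambda_{h(u)}$, as in the product form used in (10). It is only practical for a handful of nodes.

```python
def infection_orders(adj, source):
    """Yield every permitted infection order sigma in Omega(source, G_N)."""
    n, seq, infected = len(adj), [source], {source}

    def extend():
        if len(seq) == n:
            yield list(seq)
            return
        boundary = sorted({w for u in seq for w in adj[u] if w not in infected})
        for w in boundary:
            seq.append(w)
            infected.add(w)
            yield from extend()
            seq.pop()
            infected.remove(w)

    yield from extend()

def order_prob(order, adj, d, rate):
    """P(sigma | p, v): at each step the next node is reached through one boundary edge,
    chosen with probability proportional to rate(h) of its infected endpoint."""
    hop, infected, prob = {order[0]: 0}, {order[0]}, 1.0
    for w in order[1:]:
        # all infected neighbours lie inside the snapshot, the rest of the d are susceptible
        total = sum(rate(hop[u]) * (d - sum(x in infected for x in adj[u])) for u in infected)
        u = next(x for x in adj[w] if x in infected)    # unique infected neighbour (tree)
        prob *= rate(hop[u]) / total
        hop[w] = hop[u] + 1
        infected.add(w)
    return prob

def likelihood(adj, d, rate, v):
    """P(G_N | v) as the sum of P(sigma | v) over Omega(v, G_N); exponential in N."""
    return sum(order_prob(o, adj, d, rate) for o in infection_orders(adj, v))

def p_ml(adj, d, grid, lam0=1.0):
    """Grid-search version of (7) under simple exponential decay and a uniform source prior."""
    scores = {p: sum(likelihood(adj, d, lambda h: lam0 / p ** h, v) for v in adj)
              for p in grid}
    return max(scores, key=scores.get)

star = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print([likelihood(star, 3, lambda h: 1.0, v) for v in star])   # proportional to R(v)
print(p_ml(star, 3, [1.0, 1.5, 2.0, 3.0, 4.0]))
```

With a constant rate every sequence has the same probability on a regular tree, so $P(G_N \mid v)$ is proportional to the rumor centrality, which is consistent with Proposition 1.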

Algorithm. We now describe our decaying parameter estimation algorithm, named DPE(K), where $K$ is the sampling cost of the algorithm (Algorithm 1). Since we have no prior information on the parameter $p$, we set an estimation range $[p_{min}, p_{max}]$, where $p_{min}$ and $p_{max}$ are the minimum and maximum values of $p$, respectively; we set $p_{min} = 1$ (i.e., no decay). The algorithm first sets $p = p_{min}$, as described in the first line. Next, for each infected node $v \in G_N$, it calculates the rumor centrality $R(v, G_N)$ from the given snapshot $G_N$ as in [1] (Step 1). To approximate the term $\sum_{\sigma \in \Omega(v, G_N)} P(\sigma \mid p, v)$ in (7), the algorithm samples $K \geq 1$ infection sequences uniformly at random (Uniform sampling gives a simple approximation of the mean of the infection probabilities.), where $K$ is a fixed constant, and computes

$$\bar{P}_K(\sigma \mid p, v) = \frac{1}{K}\sum_{i=1}^{K} P(\sigma_i \mid p, v), \tag{8}$$

where $\sigma_i$ ($1 \leq i \leq K$) is an infection sequence sampled at random from $\Omega(v, G_N)$ (Step 2). This is the average of the values $P(\sigma_i \mid p, v)$ appearing in (7); in general, more samples (larger $K$) give higher accuracy. Then, we multiply the rumor centrality $R(v, G_N)$ by $\bar{P}_K(\sigma \mid p, v)$ and store the result in $\bar{R}(v, G_N)$ (Step 3). Because $|\Omega(v, G_N)| = R(v, G_N)$ [1], this product is an estimate of the inner sum in (7). We next sum these values over all infected nodes and save the sum as $f(p)$. We repeat this procedure, increasing $p$ by a step $\delta > 0$ each time. Finally, the algorithm compares all values of $f(p)$ within the range $[p_{min}, p_{max}]$ and takes the value of $p$ that maximizes $f(p)$ as the estimate of the decaying parameter, denoted by $\hat{p}$. The complexity of the algorithm is $O(\max\{N, K\})$, where $N$ is the number of infected nodes in the graph. We show how the accuracy of this algorithm varies with $K$ in the simulation section.

Algorithm 1 Decaying Parameter Estimation (DPE(K))

Input: Diffusion snapshot $G_N$, sampling cost $K$, $p_{min}$, $p_{max}$, step size $\delta$
Output: Estimated parameter $\hat{p}$

Set the initial decaying parameter $p \leftarrow p_{min}$;
while $p \leq p_{max}$ do
    for each $v \in G_N$ do
        Step 1: Compute the rumor centrality $R(v, G_N)$ by a message passing algorithm [1];
        Step 2: Choose $K$ random samples $\sigma_i \in \Omega(v, G_N)$ and compute their mean
            $\bar{P}_K(\sigma \mid p, v) = \frac{1}{K}\sum_{i=1}^{K} P(\sigma_i \mid p, v)$;   (9)
        Step 3: Set $\bar{R}(v, G_N) \leftarrow R(v, G_N) \cdot \bar{P}_K(\sigma \mid p, v)$;
    end for
    Set $f(p) \leftarrow \sum_{v} \bar{R}(v, G_N)$;
    $p \leftarrow p + \delta$;
end while
Compute $\hat{p} = \arg\max_{p} f(p)$;
Return $\hat{p}$;
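Algorithm 1 can be sketched in Python as below. Several details are our assumptions rather than the paper's: (i) the simple exponential decay $\lambda_h = \lambda_0/p^h$ is used as the model; (ii) sequences are sampled uniformly from $\Omega(v, G_N)$ by repeatedly drawing the next boundary node with probability proportional to its subtree size, a standard hook-length property of rooted trees that the paper does not specify; (iii) the $K$ samples are drawn once per node and reused for every grid value of $p$ (common random numbers), a small departure from Algorithm 1 that makes $f(p)$ smoother across the grid. All function names are hypothetical.

```python
import math
import random

def subtree_sizes(adj, root):
    """Subtree sizes T_w^root and parent pointers with the snapshot rooted at root."""
    parent, order, stack = {root: None}, [], [root]
    while stack:
        w = stack.pop()
        order.append(w)
        for x in adj[w]:
            if x != parent[w]:
                parent[x] = w
                stack.append(x)
    size = {w: 1 for w in order}
    for w in reversed(order):
        if parent[w] is not None:
            size[parent[w]] += size[w]
    return size, parent

def rumor_centrality(adj, v):
    """R(v, G_N) = N! / prod_w T_w^v, which equals |Omega(v, G_N)| [1]."""
    size, _ = subtree_sizes(adj, v)
    return math.factorial(len(adj)) // math.prod(size.values())

def sample_order(adj, v, rng):
    """Uniform sample from Omega(v, G_N): draw the next boundary node with probability
    proportional to its subtree size (assumption (ii) above)."""
    size, parent = subtree_sizes(adj, v)
    order, boundary = [v], list(adj[v])
    while boundary:
        i = rng.choices(range(len(boundary)), weights=[size[x] for x in boundary])[0]
        w = boundary.pop(i)
        order.append(w)
        boundary.extend(x for x in adj[w] if x != parent[w])
    return order

def order_prob(order, adj, d, rate):
    """P(sigma | p, v) on an infinite d-regular tree, following the product form of (10)."""
    hop, infected, prob = {order[0]: 0}, {order[0]}, 1.0
    for w in order[1:]:
        total = sum(rate(hop[u]) * (d - sum(x in infected for x in adj[u])) for u in infected)
        u = next(x for x in adj[w] if x in infected)
        prob *= rate(hop[u]) / total
        hop[w] = hop[u] + 1
        infected.add(w)
    return prob

def dpe(adj, d, K, p_min=1.0, p_max=4.0, delta=0.25, lam0=1.0, seed=0):
    """Sketch of DPE(K): returns (p_hat, {p: f(p)})."""
    rng = random.Random(seed)
    R = {v: rumor_centrality(adj, v) for v in adj}                          # Step 1
    samples = {v: [sample_order(adj, v, rng) for _ in range(K)] for v in adj}
    f, p = {}, p_min
    while p <= p_max + 1e-9:
        rate = lambda h, p=p: lam0 / p ** h
        f[p] = sum(R[v] * sum(order_prob(s, adj, d, rate) for s in samples[v]) / K
                   for v in adj)                                            # Steps 2 and 3
        p += delta
    return max(f, key=f.get), f

snapshot = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1], 4: [1], 5: [2]}
p_hat, f = dpe(snapshot, d=3, K=50)
print(p_hat)
```

Reusing the same samples for every $p$ means the only source of variation in $f(p)$ across the grid is $p$ itself, which is usually what one wants when taking an argmax.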

5. Proof of Results

In this section, we provide the proofs of Proposition 1 and Theorems 1, 2 and 3. The proofs of the lemmas are given in Appendix A.

5.1. Proof of Proposition 1

To prove the proposition, by the result in [1] it suffices to show that on a $d$-regular tree $G$, for a given observation $G_N$ with $N$ infected nodes, $v$ is an MLE of (1) if and only if $|T_u^v| \leq N/2$ for every neighbor $u$ of $v$. Let $v \in G_N$ be a node that satisfies this condition, and let $u \in G_N$ be a node with $|T_w^u| > N/2$ for some neighbor $w$ of $u$. We prove $P(G_N \mid v) > P(G_N \mid u)$ for all such $u \in G_N$ by contradiction. Suppose there exists a node $u$ such that $P(G_N \mid v) \leq P(G_N \mid u)$.

To determine $P(\sigma \mid v)$ for an infection sequence $\sigma = \{v_1 = v, v_2, \dots, v_N\} \in \Omega(v, G_N)$ (a permitted permutation of $G_N$) when $v$ is the source, let $G_k(\sigma)$ be the subgraph of $G_N$ containing the nodes $\{v_1, \dots, v_k\}$ for $1 \leq k \leq N$. Then, for every $\sigma \in \Omega(v, G_N)$ and $\sigma \in \Omega(u, G_N)$, we have

$$\begin{aligned} P(\sigma \mid v) &= \prod_{k=2}^{N} P\left(k\text{th infected node} = v_k \mid G_{k-1}(\sigma), v\right) \\ &= \prod_{k=2}^{N} P\left(\arg\min_{1 \leq j \leq n_{k-1}(\sigma)} \tau_j = v_k \,\middle|\, G_{k-1}(\sigma), v\right) \\ &= \prod_{k=2}^{N} \frac{\lambda_j(\sigma(v))}{\sum_{j=1}^{n_{k-1}} \lambda_j(\sigma(v))} \\ &\overset{(a)}{>} \prod_{k=2}^{N} \frac{\lambda_j(\sigma(u))}{\sum_{j=1}^{n_{k-1}} \lambda_j(\sigma(u))} = P(\sigma \mid u), \end{aligned} \tag{10}$$

where $n_{k-1}(\sigma)$ is the number of nodes at the boundary of $G_{k-1}(\sigma)$, $\tau_j$ is the exponential random variable of the $j$th node at the boundary of the information spread, and the numerator $\lambda_j$ is the rate of the boundary node $j = v_k$. The inequality (a) follows from the fact that, if we let $L_u := \max_{s \in G_N} d(u, s)$, the decreasing diffusion rate gives $P(L_u > L_v) > P(L_u \leq L_v)$, which makes $P(\sigma \mid v) > P(\sigma \mid u)$ for any permutation $\sigma$.

To see this more rigorously, note that for any permutation $\sigma \in \Omega(u, G_N)$ there exists a distinct permutation $\sigma \in \Omega(v, G_N)$ such that (a) holds term by term,

$$\frac{\lambda_j(\sigma(v))}{\sum_{j=1}^{n_{k-1}} \lambda_j(\sigma(v))} > \frac{\lambda_j(\sigma(u))}{\sum_{j=1}^{n_{k-1}} \lambda_j(\sigma(u))},$$

due to the fact that $a/(a+b) > c/(b+c)$ for $a > b > c$.

Using the result in [1] that $|\Omega(v, G_N)| := R(v, G_N) > R(u, G_N) = |\Omega(u, G_N)|$, we have

$$P(G_N \mid v) = \sum_{\sigma \in \Omega(v, G_N)} P(\sigma \mid v) > \sum_{\sigma \in \Omega(u, G_N)} P(\sigma \mid u) = P(G_N \mid u), \tag{11}$$

which contradicts our hypothesis. This completes the proof of Proposition 1.

5.2. Proof of Theorem 1

For the line graph, note that there are two independent diffusion processes from the source. Let $N_i(t)$ ($i = 1, 2$) be these processes, i.e., the total number of infections on each side of the source until time $t$. Let $C_t^k := \{N_1(t) - N_2(t) = k\}$ be the event that the difference between the numbers of infected nodes $N_1(t)$ and $N_2(t)$ is $k$ at time $t$, and let $C_t := \{\hat{v}_{ml}(p) = v^{*}\}$ be the detection event of the MLE at time $t$. Then, by Corollary 1, the source is detected in two cases: (i) the MLE is unique at time $t$, which is the event $C_t^0$; and (ii) there are two MLEs at time $t$ as in [1], which corresponds to the events $C_t^1$ and $C_t^{-1}$, in which case the source is picked with probability $1/2$. Hence, the detection probability $P(C_t)$ is

$$P(C_t) = P(C_t^0) + \frac{1}{2}\left(P(C_t^1) + P(C_t^{-1})\right). \tag{12}$$

To evaluate (12), we first obtain the probability $P(C_t^k)$. By the Markov property of the diffusion process, $C_t^k$ depends only on $C_{t-1}^{k-1}$ or $C_{t-1}^{k+1}$. We then obtain the following lemma.

Lemma 2. If $N_1(t) = m + k$ and $N_2(t) = m$ (integers $m \geq 0$, $k \geq -m$), then

$$P(N_1(t+1) - N_2(t+1) = k + 1) = \frac{1}{1 + p^k}, \tag{13}$$

and

$$P(N_1(t+1) - N_2(t+1) = k - 1) = \frac{p^k}{1 + p^k}. \tag{14}$$

Using this result, $P(C_t^k)$ can be expressed in the recursive form

$$P(C_t^k) = \frac{1}{1 + p^{k-1}} P(C_{t-1}^{k-1}) + \frac{p^{k+1}}{1 + p^{k+1}} P(C_{t-1}^{k+1}). \tag{15}$$

Since $P(C_t^k)$ converges as $t \to \infty$ for every integer $k$, we write $C^k := C_{\infty}^k$ and drop the index $t$ from now on. To compute each probability in (12), we use the following lemma.

Lemma 3. For any integer $k \geq 0$,

$$P(C^k) = \frac{1 + p^k}{2p^{k(k+1)/2}} P(C^0), \tag{16}$$

where $P(C^0) = \left(2\sum_{k=1}^{\infty} p^{-k(k+1)/2} + 2\right)^{-1} = 1/M(p)$ for $p > 1$, and $P(C^0) = 0$ for $p \leq 1$.

Using this lemma, we obtain $P(C_t^1)$ and $P(C_t^{-1})$ $(= P(C_t^1))$ from $P(C_t^0)$ (The two probabilities are equal because the two random processes $N_1(t)$ and $N_2(t)$ are i.i.d.). Then, by (12) and taking the limit as $t$ goes to infinity, we obtain the result of Theorem 1, which completes the proof.
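Lemma 3 and the normalization $P(C^0) = 1/M(p)$ can be checked numerically. Because the difference $k$ changes by $\pm 1$ at every step, the sketch below does not iterate (15) in time; instead it verifies that the closed form (16), extended to $k < 0$ by symmetry, satisfies the stationary version of (15) with the transition probabilities of Lemma 2, and that the probabilities sum to one. The truncation at $|k| \leq 40$ and the function names are our assumptions.

```python
def stationary(p, K=40):
    """Closed form (16) for |k| <= K, extended to k < 0 by symmetry, with P(C^0) = 1/M(p)."""
    S = sum(p ** (-k * (k + 1) / 2) for k in range(1, K + 1))
    pi0 = 1.0 / (2.0 * S + 2.0)
    # (1 + p^k) / (2 p^{k(k+1)/2}) rewritten with negative exponents to avoid overflow
    return {k: pi0 * (p ** (-abs(k) * (abs(k) + 1) / 2) + p ** (-abs(k) * (abs(k) - 1) / 2)) / 2
            for k in range(-K, K + 1)}

def max_residual(p, K=40):
    """Largest violation of the stationary form of (15) for |k| <= K - 1."""
    pi = stationary(p, K)
    up = lambda k: 1.0 / (1.0 + p ** k)    # k -> k + 1, Lemma 2 (13)
    down = lambda k: 1.0 - up(k)           # k -> k - 1, Lemma 2 (14)
    return max(abs(pi[k] - (pi[k - 1] * up(k - 1) + pi[k + 1] * down(k + 1)))
               for k in range(-K + 1, K))

for p in (1.1, 2.0, 5.0):
    pi = stationary(p)
    print(p, max_residual(p), sum(pi.values()), pi[0] + pi[1])   # last: detection, cf. (3)
```

The last printed column is $P(C^0) + \frac{1}{2}(P(C^1) + P(C^{-1})) = P(C^0) + P(C^1)$, which matches the right-hand side of (3).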

5.3. Proof of Theorem 2

We first prove the theorem for $p = d - 1$. Let $T_i(t)$ be the set of infected nodes at time $t$ in the subtree rooted at $u_i$, where $u_i$ is the $i$th neighbor of the information source $v^{*}$ and $1 \leq i \leq d$ (we omit the superscript $v^{*}$ in $T_i^{v^{*}}(t)$ for notational simplicity). Then we have the following lemma.

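Lemma 4. Let $E_i(t)$ be the set of boundary edges of $T_i(t)$, i.e., the edges through which a new node of the $i$th subtree can be infected, and let $S_i(t) = \sum_{e \in E_i(t)} \lambda_e$. If $p = d - 1$, then $S_i(t) = \lambda_0$ for any $1 \leq i \leq d$.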

This lemma implies that, with the decaying parameter $p = d - 1$, the total rate at which a new node is infected in $T_i(t)$ stays constant at $\lambda_0$ at any time $t$.
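Lemma 4 can be observed directly by growing a single subtree. In the Python sketch below (hypothetical names; each boundary edge is stored as the hop count $h$ of its infected endpoint, so its rate is $\lambda_0/p^h$), infecting the endpoint of an edge of rate $\lambda_h$ removes that edge and opens $d - 1$ edges of rate $\lambda_{h+1}$; with $p = d - 1$ the total $S_i(t)$ therefore never changes, while for the homogeneous rate it grows by $d - 2$ per infection.

```python
import random

def boundary_rate_trace(d, steps, p=None, lam0=1.0, seed=0):
    """Grow one subtree T_i under simple decay lam0 / p**h and record S_i(t) after each step.

    p defaults to d - 1, the case of Lemma 4. The first boundary edge is (v*, u_i),
    whose infected endpoint is the source at hop 0."""
    p = d - 1 if p is None else p
    rng = random.Random(seed)
    boundary = [0]                                   # hop count of each edge's infected end
    trace = [sum(lam0 / p ** hb for hb in boundary)]
    for _ in range(steps):
        weights = [lam0 / p ** hb for hb in boundary]
        i = rng.choices(range(len(boundary)), weights=weights)[0]
        h = boundary.pop(i)                          # new node sits at hop h + 1 ...
        boundary.extend([h + 1] * (d - 1))           # ... and opens d - 1 edges of rate lam0/p**(h+1)
        trace.append(sum(lam0 / p ** hb for hb in boundary))
    return trace

print(boundary_rate_trace(4, 10))                    # constant lam0 = 1 (up to rounding)
print(boundary_rate_trace(3, 10, p=1.0))             # homogeneous: 1, 2, 3, ...
```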

Hence, using the Lemma 4, the probability that a node in

T_{i} (t)

is infected is always equal to

1 / d

for all

i .

Indeed, if we let

τ_{i} (t)

be a random variable of infection time for a node in

T_{i} (t)

then

P (m i n {τ_{1} (t), τ_{2} (t), \dots, τ_{d} (t)} = τ_{i} (t)) = S_{i} (t) / \sum_{i}^{d} S_{i} (t) = 1 / d .

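Let $N(t)$ be the total number of infections in the network until time $t$. With a slight abuse of notation, let $T_i(t)$ also denote the number of infected nodes in the subtree. Since each infection falls in $T_i(t)$ independently with probability $1/d$, we have

P \left(T_{i} (t) > \frac{N (t)}{2}\right) = \sum_{k = \lfloor N (t) / 2 \rfloor + 1}^{N (t)} \binom{N (t)}{k} \left(\frac{1}{d}\right)^{k} \left(\frac{d - 1}{d}\right)^{N (t) - k} .

(17)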
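Equation (17) is a plain binomial tail, so it can be compared directly with the exponential Hoeffding bound used in (18). The Python sketch below evaluates (17) in log space and the resulting detection probability $1 - d \cdot P(T_i(t) > N(t)/2)$. That expression relies on the $1/d$ independence from Lemma 4, so it applies only to the case $p = d - 1$. The function names are illustrative.

```python
import math


def subtree_majority_prob(n: int, d: int) -> float:
    """Eq. (17): P(T_i > n/2) when each of n infections lands in T_i independently w.p. 1/d."""
    q = 1.0 / d
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        # binomial pmf in log space, so large n does not overflow a float
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(q) + (n - k) * math.log1p(-q))
        total += math.exp(log_pmf)
    return total


def hoeffding_bound(n: int, d: int) -> float:
    """Right-hand side of (18): exp(-2 n ((d - 2) / (2 d))^2)."""
    return math.exp(-2.0 * n * ((d - 2) / (2.0 * d)) ** 2)


def detection_probability(n: int, d: int) -> float:
    """1 - d * P(T_i > n/2): at most one subtree can hold a strict majority."""
    return 1.0 - d * subtree_majority_prob(n, d)


for n in (10, 100, 1000):
    print(n, detection_probability(n, 3), 1.0 - 3 * hoeffding_bound(n, 3))
```

For small $N(t)$ the Hoeffding-based lower bound is vacuous (it can even be negative), while the exact tail is already small. Only the limit $N(t) \to \infty$ matters for the argument.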

Let $A_i(t) := \{T_i(t)/N(t) \leq 1/2\}$. Then the detection probability is

P (C_{t}) = P \left(\bigcap_{i = 1}^{d} A_{i} (t)\right) \overset{(a)}{=} 1 - d P (A_{i}^{c} (t)) \overset{(b)}{\geq} 1 - d e^{- 2 N (t) \left(\frac{d - 2}{2 d}\right)^{2}},

where (a) holds for two reasons. First, the events $A_1^c(t), \dots, A_d^c(t)$ are mutually exclusive, because at most one subtree can contain more than half of the infected nodes. Second, they are equally likely by symmetry. The inequality (b) comes from Hoeffding's bound with the expectation $E(T_i(t)/N(t)) = 1/d$, i.e.,

P \left(\frac{T_{i} (t)}{N (t)} > \frac{1}{2}\right) = P \left(\frac{T_{i} (t)}{N (t)} > \frac{1}{d} + \frac{d - 2}{2 d}\right) \leq e^{- 2 N (t) \left(\frac{d - 2}{2 d}\right)^{2}},

(18)

which converges to 0 as $t \to \infty$ (since $N(t) \to \infty$). Thus, for a fixed $d \geq 3$, $P(C_t)$ converges to 1 as $t \to \infty$. Next, to obtain the result for $p > d - 1$, consider the following lemma.

Lemma 5.

Let $\lambda = [\lambda_1, \lambda_2, \dots]$ be the diffusion rate vector, where $\lambda_i$ is the diffusion rate of a node at $i$-hop distance from the source. Let $P(C_t \mid \lambda)$ be the detection probability of the MLE under $\lambda$. For two diffusion rate vectors $\lambda^1$ and $\lambda^2$, if $\lambda^1 \geq \lambda^2$ (i.e., $\lambda_i^1 \geq \lambda_i^2$ for all $i \geq 1$), then

\lim_{t \to \infty} P (C_{t} | \lambda^{1}) \leq \lim_{t \to \infty} P (C_{t} | \lambda^{2}) .

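For any $p \geq d - 1$, the rates $\lambda_h(p) = \lambda_0/p^h$ are elementwise no larger than those for $p = d - 1$. By Lemma 5, the limiting detection probability is therefore at least that for $p = d - 1$, which is one. Hence, for any $p \geq d - 1$, the detection probability goes to one as time goes to infinity. This completes the proof of Theorem 2.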
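To see Theorem 2 at a finite number of infected nodes, the Monte Carlo sketch below spreads information over a $d$-regular tree under simple exponential decay ($\lambda_0 = 1$, rate $p^{-h}$ for a node at hop $h$). It counts how often every subtree of the source holds at most half of the infected nodes, which is the event $\bigcap_i A_i(t)$ from the proof. Two assumptions apply: this event is treated as the MLE's correct detection (ties are ignored), and the small sizes here do not reproduce the 1000-node, 500-run setting of Section 6. The function name is hypothetical.

```python
import random


def simulate_detection(d: int, p: float, n_infected: int, runs: int, seed: int = 0) -> float:
    """Fraction of runs in which every subtree of the source holds at most half the infections.

    Simple exponential decay with lambda_0 = 1: an infected node at hop h spreads along
    each of its susceptible edges at rate p ** (-h).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        boundary = [(0, i) for i in range(d)]  # (hop of spreading node, source subtree)
        sizes = [0] * d
        for _ in range(n_infected - 1):  # the source itself is the first infected node
            weights = [p ** (-h) for h, _ in boundary]
            idx = rng.choices(range(len(boundary)), weights=weights)[0]
            hop, sub = boundary.pop(idx)
            sizes[sub] += 1
            boundary.extend([(hop + 1, sub)] * (d - 1))
        if max(sizes) <= n_infected / 2:  # all events A_i(t) hold
            hits += 1
    return hits / runs


for p in (1.0, 2.0, 4.0):  # homogeneous, p = d - 1, and p > d - 1 on a 3-regular tree
    print(p, simulate_detection(d=3, p=p, n_infected=61, runs=150, seed=1))
```

On the 3-regular tree, the homogeneous case ($p = 1$) should land near the known 1/4 regime. The estimates for $p = 2 = d - 1$ and $p = 4 > d - 1$ should be close to one, consistent with Theorem 2 and Lemma 5.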

5.4. Proof of Theorem 3

Let $p = d - 1$. Then one can easily check that $\lambda_h(p, r)$ in (5) satisfies the following recurrence:

(d - 1) \lambda_{h + 1} (p, r) = \lambda_{h} (p, r) + r \lambda_{0} .

(19)

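This relation means that each infection of a node in a d-regular tree increases the total rate of the boundary edges by exactly $r\lambda_0$. The infection removes one boundary edge of rate $\lambda_h(p, r)$ and adds $d - 1$ boundary edges of rate $\lambda_{h+1}(p, r)$, a net change of $(d - 1)\lambda_{h+1}(p, r) - \lambda_h(p, r) = r\lambda_0$. At the initial time, $S_1(0) = \lambda_0$ and $\sum_{i \neq 1} S_i(0) = (d - 1)\lambda_0$. If an infection occurs in $T_1$ or in $G \setminus T_1$, then $S_1$ or $\sum_{i \neq 1} S_i$, respectively, increases by $r\lambda_0$. This process can be mapped to Pólya's urn process [2], and the probability $\lim_{t \to \infty} P(T_i > N(t)/2)$ becomes $1 - I_{1/2}(1/r, (d - 1)/r)$, where $I_x(a, b)$ denotes the regularized incomplete beta function.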

By (19), $\lambda_{h+1}(p, r) - \frac{r\lambda_0}{d - 2} = \frac{1}{d - 1}\left(\lambda_h(p, r) - \frac{r\lambda_0}{d - 2}\right)$, so under the condition $r < d - 2$ the rate $\lambda_h(p, r)$ decreases monotonically in $h$ toward $r\lambda_0/(d - 2)$. If $r = 0$, no infection changes the sums of rates $S_i$, and the model reduces to the simple exponential decay with $p = d - 1$. Its detection probability converges to 1 as $t$ goes to infinity (Theorem 2). Furthermore, if $r = d - 2$, the infection rate becomes homogeneous, and the detection probability is the same as the result in [2]. This completes the proof of Theorem 3.
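Combining the Pólya urn limit with identity (a) in the proof of Theorem 2, $P(C_t) = 1 - d\,P(A_i^c(t))$, gives an asymptotic detection probability of $1 - d\left(1 - I_{1/2}(1/r, (d - 1)/r)\right)$. The statement of Theorem 3 lies outside this section, so treat the Python sketch below as a reconstruction from the proof rather than the authors' code. It evaluates the upper Beta tail $1 - I_{1/2}(a, b) = I_{1/2}(b, a)$ with a power series. This series is our own choice; SciPy's `scipy.special.betainc` could replace it if SciPy is available.

```python
import math


def beta_upper_tail_at_half(a: float, b: float, terms: int = 400) -> float:
    """P(X > 1/2) for X ~ Beta(a, b), i.e. 1 - I_{1/2}(a, b) = I_{1/2}(b, a).

    Uses the series I_x(b, a) = x^b / B(a, b) * sum_n [(1 - a)_n / n!] x^n / (b + n).
    """
    x = 0.5
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    coef, total = 1.0, 0.0  # coef holds (1 - a)_n / n!
    for n in range(terms):
        total += coef * x ** n / (b + n)
        coef *= (n + 1 - a) / (n + 1)
    return math.exp(b * math.log(x) - log_beta) * total


def asymptotic_detection(d: int, r: float) -> float:
    """Limit of P(C_t) on a d-regular tree with p = d - 1 and decay level r."""
    if r == 0:
        return 1.0  # no urn reinforcement: simple decay with p = d - 1 (Theorem 2)
    tail = beta_upper_tail_at_half(1.0 / r, (d - 1) / r)  # lim P(T_i > N/2)
    return 1.0 - d * tail


for r in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(r, round(asymptotic_detection(5, r), 4))
```

For $r = d - 2$ the expression takes the homogeneous-rate form of [2], and for $d = 3$, $r = 1$ it gives exactly 1/4. As $r$ grows from 0, the value decreases.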

6. Numerical and Simulation Results

In this section, we provide simulation results for the detection probabilities over three types of graph topologies: (i) regular trees, (ii) three random graphs, and (iii) three real-world networks. We propagate the information from a randomly chosen source until 1000 nodes are infected and compute the detection probability over 500 iterations. We vary the decaying parameter $p$ in the generalized exponential decay, which includes the simple exponential decay ($r = 0$).

6.1. Regular Trees

For the regular tree, we first numerically plot the theoretical result of Theorem 3 (the asymptotic detection probability) with respect to the decay level $r$ of the generalized decaying model in Figure 3. The detection probabilities are almost one for $r = 0$ (i.e., simple exponential decay) and decrease as $r$ increases for all three degrees ($d = 5$, $d = 10$, and $d = 15$). As expected, for $d = 5$ and $r = d - 2$ $(= 3)$, the detection probability equals that of the homogeneous rate. We further validate our theoretical result by simulation. We also observe that a higher degree of the regular tree gives more chances to detect the source, because the source more frequently has the highest rumor centrality in the infection graph.

In Figure 4, we show the error of the parameter estimation for $p$ with respect to the sampling cost $K$ of the algorithm DPE(K). We consider the case $d = 5$ and measure the error between the true parameter $p$ and the estimated parameter $\hat{p}$ by the distance $|p - \hat{p}|$. We plot the error for different values of $p$ ($p = 1, 2, 3, 4$), averaged over 100 runs of the algorithm. We use the step size $\delta = 0.1$ and $p_{max} = 5$. The results show that DPE(K) estimates the true parameter more accurately when $p$ is large. This is because a large decaying parameter makes the snapshot more balanced around the source, which enables the algorithm to estimate the true parameter with less fluctuation.

In Figure 5, we show the detection probabilities for degree $d = 3$ with respect to the decaying parameter $p$. In this simulation, we compare two settings: (i) known parameter (true) and (ii) unknown parameter (estimated). In the known-parameter setting, the true decaying parameter is given a priori and used directly in the simulation. In the unknown-parameter setting, the true parameter is not available, so we first estimate it using the algorithm DPE(K) and then compute the detection probability. The results show that the detection probability increases as $p$ increases. They also show that using the parameter estimated by DPE(K) with $K = 500$ costs little detection performance.

6.2. Random Graphs

We consider Erdős–Rényi (ER), Scale-Free (SF), and Small World (SW) graphs. For the ER graph, we choose its parameter so that the average degree is 4 for 2000 nodes. For the SF and SW graphs, we choose the parameters so that the average ratio of edges to nodes is 1.5 for 2000 nodes. Computing the MLE on graphs with cycles is known to be hard (#P-complete). For this reason, we first construct a diffusion tree by Breadth-First Search (BFS), as in [1]. Let $\sigma_v$ be the infection sequence given by the BFS ordering of the nodes in the given graph. We then estimate the source $v_{bfs}$ that solves

v_{bfs} = \arg\max_{v \in G_{N}} P (\sigma_{v} | v) R (v, T_{b} (v)),

(20)

where $T_b(v)$ is a BFS tree rooted at $v$ along which the information is assumed to spread. The BFS tree is a good approximation for our model for two reasons: nodes at the same distance from the source share the same decaying rate, and nodes closer to the source have a higher diffusion rate. We then obtain the detection probabilities for each graph by increasing the parameter $p$, as in Figure 6. We first simulate the case in which the true parameter is given a priori. Next, we consider the case in which the parameter is hidden and must be estimated. In this case, we replace the rumor centrality $R(v, G_N)$ of a chosen node $v$ by $R(v, T_b(v))$ in the algorithm DPE(K). The results show that, for all three random graphs, the detection probabilities increase as $p$ increases and decrease as $r$ increases. We also see that when the decay level is $r = p - 1$, the detection probabilities are the same as those of the homogeneous diffusion rate. Further, detection is better on the ER random graph than on the other two networks. The ER graph has a symmetric, locally tree-like structure, which lets the decaying effect yield a large detection probability. Finally, for an unknown parameter $p$, we estimate the parameter using DPE(K) with 100 iterations, as in Table 1. Using the estimated parameter degrades detection performance only slightly when the sampling cost is sufficient.
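The rumor-centrality factor $R(v, T_b(v))$ in (20) can be computed with one BFS and one pass over subtree sizes, using Shah and Zaman's formula $R(v, T) = N! / \prod_{u} |T_u^v|$. Here $|T_u^v|$ is the size of the subtree rooted at $u$ when $T$ is rooted at $v$. The Python sketch below assumes that the infected subgraph is connected and is given as an adjacency dictionary. The sequence likelihood $P(\sigma_v \mid v)$ depends on the decaying model and is not reproduced here. It enters only through a hypothetical `log_seq_likelihood` hook; without the hook, nodes are ranked by BFS rumor centrality alone.

```python
import math
from collections import deque


def bfs_tree(adj, root, infected):
    """BFS tree of the infected subgraph rooted at `root`: (children map, BFS order)."""
    children = {u: [] for u in infected}
    seen, order, queue = {root}, [root], deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in infected and w not in seen:
                seen.add(w)
                children[u].append(w)
                order.append(w)
                queue.append(w)
    return children, order


def log_rumor_centrality(children, order):
    """log R(v, T) = log N! - sum_u log |T_u^v| for the tree T rooted at order[0]."""
    size = {}
    for u in reversed(order):  # every child appears after its parent in BFS order
        size[u] = 1 + sum(size[c] for c in children[u])
    return math.lgamma(len(order) + 1) - sum(math.log(s) for s in size.values())


def bfs_source_estimate(adj, infected, log_seq_likelihood=None):
    """argmax_v [log P(sigma_v | v) + log R(v, T_b(v))] over infected nodes, as in (20).

    `log_seq_likelihood(v, order)` is a hypothetical hook for the model-dependent term;
    when it is omitted, the estimate ranks nodes by BFS rumor centrality only.
    """
    best, best_score = None, -math.inf
    for v in sorted(infected):
        children, order = bfs_tree(adj, v, infected)
        score = log_rumor_centrality(children, order)
        if log_seq_likelihood is not None:
            score += log_seq_likelihood(v, order)
        if score > best_score:
            best, best_score = v, score
    return best


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(bfs_source_estimate(path, set(path)))
```

Working in log space avoids overflow of $N!$ for snapshots with thousands of infected nodes. Running one BFS per candidate costs $O(N \cdot |E|)$ in total.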

6.3. Real World Graphs

For the Facebook network in Figure 7b, we use the data in [21], which form an undirected graph with 4039 nodes and 88,234 edges. Each edge corresponds to a social relationship (called FriendList), and the diameter is 8 hops. The US power grid network consists of 4941 nodes and 6594 edges, and its diameter is 46 hops. For the Wiki-vote network, we use the data in [22], which contain 7115 nodes and 103,689 edges, and the diameter is 7 hops. On these networks, we also use the BFS approach in (20) to estimate the source. We also use the BFS tree to estimate the parameter $p$ in DPE(K) with 100 iterations, as in Table 1. The detection probabilities for the three networks increase as $p$ increases, and when the decay level is $r = p - 1$ they coincide with the results for the homogeneous rate in Figure 7. Detection is also better on the US power grid network than on the other two networks, because its large diameter gives more chances to detect the source from the diffusion snapshot.

7. Discussion

Our results show that the decay of information diffusion with distance from the source has, in most cases, a positive effect on inferring the source of information. In particular, for the regular tree structure, the results show that under mild conditions the source can be found with probability one no matter how much time passes. This is because, when the diffusion rate decays, the information spreads more evenly around the center of the infection graph. A similar phenomenon occurs in general graphs, making it easier to find the source.

The first limitation of this result is that the diffusion model under consideration does not cover other decaying patterns, owing to technical difficulties in the analysis. For example, one can also consider decay by an exponential function such as $\lambda_h(p) = c e^{-ph}$ for some constant $c > 0$, or by a power-law function such as $\lambda_h(p) = \lambda_0/h^p$, among others. However, obtaining analytical detection probabilities for these models appears mathematically intractable. The second limitation is that we did not consider decay with respect to time, again because tracking the infection times of the diffusion is hard.

As future work, we will consider more general decaying models that can explain both heavy-tailed and light-tailed decaying patterns. We hope that this direction will reveal tractable cases for the analysis of various decaying models. Further, we will seek analytical results for detecting the source in Erdős–Rényi random graphs, since it is the simplest random graph model.

8. Conclusions

In this paper, we consider the information source finding problem when the diffusion rate of information decays with the distance from the source. We first show that, in this setting, the MLE is the same as that for the homogeneous diffusion rate. We then obtain the closed form of the detection probability of the MLE when the underlying graph is a regular tree, for two exponential decaying models. Unlike the prior result, the detection probability is larger than zero in the line graph, and there is a non-negligible improvement in detection for any degree of the regular tree. Next, we consider the case in which the parameter of these two decaying models is hidden. To handle this case, we design a heuristic algorithm that estimates the true parameter from the diffusion snapshot. Finally, we validate our theoretical results for the regular tree using the MLE, and for popular random graphs and real-world networks using the heuristic Breadth-First Search (BFS) estimator. The detection probability can be above 80% for the regular tree. In the real-world graphs, it can be above 30% if the diffusion rate decays, whereas it is about 20% without decay.

Author Contributions

Conceptualization, J.C. and J.W.; methodology, J.C.; software, J.W.; validation, J.W. and J.C.; formal analysis, J.C.; investigation, J.C.; resources, J.W.; data curation, J.W.; writing—original draft preparation, J.C.; writing—review and editing, J.C.; visualization, J.W.; supervision, J.C.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No.2019R1G1A1099466) and by research fund from Honam University, 2019.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemmas

In this appendix, we provide the proofs of the lemmas used in the proofs of the theorems.

Appendix A.1. Proof of Lemma 2

Suppose that at time $t$, $N_1(t)$ has rate $\lambda_{m+k}(p) = \lambda_0/p^{m+k}$ and $N_2(t)$ has rate $\lambda_m(p) = \lambda_0/p^m$. Let $t + 1$ denote the embedded (discrete) time of the next infection. At time $t + 1$ there are two possible events: the infection occurs in $N_1$ or in $N_2$. Since the infection times are exponentially distributed, the next infection occurs in $N_1$ with probability equal to its rate divided by the total rate, which gives (13):

P (\text{infection occurs in } N_{1} \text{ at } t + 1) = P (| N_{1} (t + 1) - N_{2} (t + 1) | = k + 1) = \frac{\lambda_{m + k} (p)}{\lambda_{m + k} (p) + \lambda_{m} (p)} = \frac{\lambda_{0} / p^{m + k}}{\lambda_{0} / p^{m + k} + \lambda_{0} / p^{m}} = \frac{1}{1 + p^{k}} .

(A1)

One can easily obtain (14) by a similar calculation, which completes the proof of Lemma 2.
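The identity (A1) is the standard race between two independent exponential clocks, and the short Monte Carlo sketch below checks it empirically. The common factor $\lambda_0$ cancels, so it is set to 1. The helper name and trial count are illustrative.

```python
import random


def race_probability(p: float, m: int, k: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(next infection happens in N_1).

    N_1 currently spreads at rate lambda_0 / p^(m + k) and N_2 at rate lambda_0 / p^m;
    lambda_0 cancels in the ratio, so it is set to 1 here.
    """
    rng = random.Random(seed)
    rate1, rate2 = p ** -(m + k), p ** -m
    wins = sum(rng.expovariate(rate1) < rng.expovariate(rate2) for _ in range(trials))
    return wins / trials


for p, m, k in [(2.0, 0, 1), (3.0, 2, 1), (2.0, 1, 2)]:
    print(p, m, k, race_probability(p, m, k), 1 / (1 + p ** k))
```

The estimate does not depend on $m$, which reflects the fact that only the hop difference $k$ enters $1/(1 + p^k)$.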

Appendix A.2. Proof of Lemma 3

We prove this by induction on $k$. First, consider the case $k = 0$. Using the symmetry $P(C^{-1}) = P(C^1)$, we have

P (C^{0}) = \frac{1}{1 + p^{- 1}} P (C^{- 1}) + \frac{p}{1 + p} P (C^{1}) = \frac{2 p}{1 + p} P (C^{1}),

(A2)

which implies that $P(C^1) = \frac{1 + p}{2p} P(C^0)$.

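Next, for $k = 1$, we have $P(C^1) = \frac{1}{1 + p^0} P(C^0) + \frac{p^2}{1 + p^2} P(C^2)$, and hence

P (C^{2}) = \frac{1 + p^{2}}{p^{2}} \left(\frac{1 + p}{2 p} P (C^{0}) - \frac{1}{2} P (C^{0})\right) = \frac{1 + p^{2}}{2 p^{3}} P (C^{0}) .

For general $k \geq 2$, rearranging the recursion (15) at index $k - 1$ gives

P (C^{k}) = \frac{1 + p^{k}}{p^{k}} \left(P (C^{k - 1}) - \frac{1}{1 + p^{k - 2}} P (C^{k - 2})\right) .

Substituting the induction hypothesis for $P(C^{k-1})$ and $P(C^{k-2})$, the term in parentheses equals $\frac{1}{2p^{k(k-1)/2}} P(C^0)$, and we obtain

P (C^{k}) = \frac{1}{2 p^{k (k - 1) / 2}} \frac{1 + p^{k}}{p^{k}} P (C^{0}) = \frac{1 + p^{k}}{2 p^{k (k + 1) / 2}} P (C^{0}) .

(A3)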
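Because the recursion is linear, the induction in (A2) and (A3) can be checked exactly with rational arithmetic. The Python sketch below fixes $P(C^0) = 1$ and uses integer $p$, so that `fractions.Fraction` keeps every quantity exact. It then verifies that the closed form satisfies the limiting version of (15) for negative, zero, and positive $k$. The function names are illustrative.

```python
from fractions import Fraction


def closed_form(p: int, k: int) -> Fraction:
    """(16)/(A3) with P(C^0) normalised to 1; symmetric in k."""
    a = abs(k)
    return Fraction(1 + p ** a, 2 * p ** (a * (a + 1) // 2))


def recursion_rhs(p: int, k: int) -> Fraction:
    """Right-hand side of the limiting recursion (15) evaluated with the closed form."""
    q = Fraction(p)
    from_below = 1 / (1 + q ** (k - 1))             # step k - 1 -> k
    from_above = q ** (k + 1) / (1 + q ** (k + 1))  # step k + 1 -> k
    return from_below * closed_form(p, k - 1) + from_above * closed_form(p, k + 1)


print(all(recursion_rhs(p, k) == closed_form(p, k) for p in (2, 3, 5) for k in range(-10, 11)))
print(closed_form(2, 1), closed_form(2, 2))
```

The second line reproduces $P(C^1) = \frac{1+p}{2p}$ and $P(C^2) = \frac{1+p^2}{2p^3}$ for $p = 2$, relative to $P(C^0)$.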

By the symmetry $P(C^k) = P(C^{-k})$, and since the probabilities $P(C^k)$ over all integers $k$ must sum to one, we obtain

\sum_{k = - \infty}^{\infty} P (C^{k}) = 2 \sum_{k = 1}^{\infty} P (C^{k}) + P (C^{0}) = \left(2 \sum_{k = 1}^{\infty} \frac{1 + p^{k}}{2 p^{k (k + 1) / 2}} + 1\right) P (C^{0}) = 1 .

(A4)

Now let $E(p) = 2\sum_{k=1}^{\infty} \frac{1 + p^k}{2p^{k(k+1)/2}} + 1$. Then $P(C^0) = 1/E(p)$, so the bound on $P(C^0)$ is determined by $E(p)$. Since $k$ changes by $\pm 1$ at each infection, even and odd values of $k$ alternate, and each parity class carries probability $1/2$. Hence $P(C^0 \mid \text{even}) = 2/E(p)$, which leads to the same conclusion on the bound of $P(C^0)$. If $p \leq 1$, $E(p)$ diverges, which means that $P(C^0)$ and $P(C^1)$ go to 0 as time goes to infinity. If $p > 1$, we use $p^k/p^{k(k+1)/2} = p^{-k(k-1)/2}$ and shift the index of the second sum (its $k = 1$ term equals 1). This gives

\begin{aligned} E (p) &= 2 \sum_{k = 1}^{\infty} \frac{1 + p^{k}}{2 p^{k (k + 1) / 2}} + 1 = 2 \left(\sum_{k = 1}^{\infty} \frac{1}{2 p^{k (k + 1) / 2}} + \sum_{k = 1}^{\infty} \frac{p^{k}}{2 p^{k (k + 1) / 2}}\right) + 1 \\ &= 2 \left(\sum_{k = 1}^{\infty} \frac{1}{2 p^{k (k + 1) / 2}} + \sum_{k = 1}^{\infty} \frac{1}{2 p^{k (k + 1) / 2}} + \frac{1}{2}\right) + 1 \\ &= 2 \sum_{k = 1}^{\infty} \frac{1}{p^{k (k + 1) / 2}} + 2 . \end{aligned}

(A5)

Then $E(p)$ has the lower and upper bounds $2 < E(p) < \frac{2p}{p^2 - 1} + 2$. The bounds follow from $0 < 2\sum_{k=1}^{\infty} p^{-k(k+1)/2} < 2\sum_{k=1}^{\infty} p^{-(2k-1)} = \frac{2p}{p^2 - 1}$, where we use $k(k+1)/2 \geq 2k - 1$, with strict inequality for $k \geq 3$. This completes the proof of Lemma 3.
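The bounds on $E(p)$ are easy to confirm numerically. The Python sketch below truncates the series (the terms vanish like $p^{-k^2/2}$, so the truncation level `kmax` is an illustrative choice) and checks $2 < E(p) < \frac{2p}{p^2 - 1} + 2$ for a few values of $p > 1$.

```python
def e_of_p(p: float, kmax: int = 200) -> float:
    """E(p) = 2 * sum_{k>=1} p^{-k(k+1)/2} + 2, truncated at kmax."""
    return 2.0 * sum(p ** (-k * (k + 1) / 2) for k in range(1, kmax + 1)) + 2.0


def e_upper_bound(p: float) -> float:
    """Upper bound 2p/(p^2 - 1) + 2 from the geometric series sum_k p^{-(2k-1)}."""
    return 2.0 * p / (p * p - 1.0) + 2.0


for p in (1.05, 1.5, 2.0, 3.0, 10.0):
    e = e_of_p(p)
    print(p, round(e, 6), round(e_upper_bound(p), 6), round(1.0 / e, 6))
```

The last column is $P(C^0) = 1/E(p)$; for $p = 2$ it is about 0.3046. As $p \to 1^{+}$ the upper bound loosens quickly, and $E(p)$ itself diverges in that limit.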

Appendix A.3. Proof of Lemma 4

We prove this by induction on the number of infected nodes $|T_i(t)|$. If $|T_i(t)| = 1$, then $T_i(t) = \{u_i\}$, where $u_i$ has $d - 1$ boundary edges, each with diffusion rate $\frac{\lambda_0}{d - 1}$, so $S_i(t) = (d - 1)\frac{\lambda_0}{d - 1} = \lambda_0$. If $|T_i(t)| = k \geq 2$, let $v$ be the last infected node in $T_i(t)$, and let $w$ be the infected node in $T_i(t)$ that spread the rumor to $v$. Before $v$ is infected, each of $w$'s boundary edges has rate $\lambda_{d(w)}(p)$, where $d(w)$ is the distance of node $w$ from the source. Since $v$ is one hop farther from the rumor source than $w$, each of $v$'s boundary edges has rate $\frac{\lambda_{d(w)}(p)}{d - 1}$. When $v$ is infected, one of $w$'s boundary edges is deleted from $E_i(t)$ and the $d - 1$ boundary edges of $v$ are added to $E_i(t)$. The net change in the total rate is therefore $(d - 1)\frac{\lambda_{d(w)}(p)}{d - 1} - \lambda_{d(w)}(p) = 0$. Thus, the total rate of $T_i(t)$ is the same as that of $T_i(t) \setminus \{v\}$, which equals $\lambda_0$ by the induction hypothesis. This completes the proof of Lemma 4.

Appendix A.4. Proof of Lemma 5

To prove this, we write $N(t) = N$ for notational simplicity and use mathematical induction on $N$, the number of infected nodes in the graph. For $N = 1$, the claim is trivial because the only infected node is the rumor source. Hence, for any diffusion rate, we have

P (T_{i} (t) > 1 / 2 | \lambda^{1}) = P (T_{i} (t) > 1 / 2 | \lambda^{2}) = 0 .

Now, suppose that the claim holds for $N - 1$, and define three events as

\begin{cases} A_{1, s} := \{T_{i} (t) > (N - 1) / 2 \mid \lambda^{s}\}, \\ A_{2, s} := \{T_{i} (t) = (N - 1) / 2 \mid \lambda^{s}\}, \\ A_{3, s} := \{T_{i} (t) < (N - 1) / 2 \mid \lambda^{s}\}, \end{cases}

(A6)

where $s = 1, 2$. Let $B_s := \{T_i(t + 1) > N/2 \mid \lambda^s\}$. Then, by the law of total probability, we have

\begin{aligned} P (B_{s}) &= \sum_{j} P (B_{s} | A_{j, s}) P (A_{j, s}) \overset{(a)}{=} \frac{1}{2} \underbrace{\left(1 \cdot P (A_{1, s}) + P (B_{s} | A_{2, s}) P (A_{2, s})\right)}_{N = \text{odd}} + \frac{1}{2} \underbrace{P (B_{s} | A_{1, s}) P (A_{1, s})}_{N = \text{even}} \\ &= \frac{1}{2} \left\{\left(1 + P (B_{s} | A_{1, s})\right) P (A_{1, s}) + P (B_{s} | A_{2, s}) P (A_{2, s})\right\}, \end{aligned}

(A7)

where the equality (a) follows from three facts: $P(N = \text{odd}) = P(N = \text{even}) = 1/2$; $P(B_s \mid A_{3,s}) = 0$ when $N$ is odd; and $P(A_{2,s}) = P(B_s \mid A_{3,s}) = 0$ when $N$ is even. Hence, to obtain $P(B_1) \geq P(B_2)$, it remains to compare the four terms in (A7).

First, $P(A_{1,1}) \geq P(A_{1,2})$ by the induction hypothesis. Next, we show that $P(B_1 \mid A_{1,1}) \geq P(B_2 \mid A_{1,2})$ (this is the case in which $N$ is even). In this case, all the transition probabilities are one except when there are $N/2$ infected nodes in $T_i(t)$ and $(N - 1) - N/2$ infected nodes in one of the remaining subtrees. Let $\beta_s$ be the probability of the transition from this state to the state in which the infected nodes are equally distributed ($N/2$ each) between those two subtrees. Then one can check that $\beta_1 \leq \beta_2$, because

\begin{aligned} \beta_{1} = P (\min \{t_{1}, \dots, t_{d}\} = t_{2} | \lambda^{1}) &\overset{(a)}{=} \sum_{k = N / 2} \frac{\lambda_{\sigma (k - 1)}^{1}}{\lambda_{\sigma (k)}^{1} + \lambda_{\sigma (k - 1)}^{1} + \sum_{i = 3}^{d} \lambda_{1}^{1}} = \sum_{k = N / 2} \frac{\lambda_{\sigma (k - 1)}^{1}}{\lambda_{\sigma (k)}^{1} + \lambda_{\sigma (k - 1)}^{1} + (d - 2) \lambda_{1}^{1}} \\ &\overset{(b)}{\leq} \sum_{k = N / 2} \frac{\lambda_{\sigma (k - 1)}^{2}}{\lambda_{\sigma (k)}^{2} + \lambda_{\sigma (k - 1)}^{2} + (d - 2) \lambda_{1}^{2}} = \beta_{2}, \end{aligned}

(A8)

where $t_i$ denotes the random time at which the next infection occurs in the subtree $T_i$ when there are $N - 1$ infected nodes in the graph. Here, $\lambda_{\sigma(k)}^s$ is the rate at which this infection occurs in $T_1$ under $\lambda^s$ when $T_1$ has $k$ nodes. The equality (a) is due to the exponential distribution of the infection times in the diffusion process, and the inequality (b) follows from

\lambda_{\sigma (k - 1)}^{1} (\lambda_{\sigma (k)}^{1} + \lambda_{\sigma (k - 1)}^{1}) \leq \lambda_{\sigma (k - 1)}^{2} (\lambda_{\sigma (k)}^{2} + \lambda_{\sigma (k - 1)}^{2}),

(A9)

for all $k$, since the remaining $d - 2$ subtrees contain no infections and $\lambda_i^1 \geq \lambda_i^2$ for all $i \geq 1$. Then we have $P(B_1 \mid A_{1,1}) = 1 - \beta_1 \geq 1 - \beta_2 = P(B_2 \mid A_{1,2})$.

Next, we consider the term $P(B_s \mid A_{2,s}) P(A_{2,s})$ (the case in which $N$ is odd), and first focus on $P(B_s \mid A_{2,s})$. In this case, $P(B_1 \mid A_{2,1}) \geq P(B_2 \mid A_{2,2})$ follows by steps similar to those above. Finally, we need to show that $P(A_{2,1}) \geq P(A_{2,2})$. To show that this holds for all $N \geq 1$, we show that it holds for all $t > 0$ as follows.

\begin{aligned} P (A_{2, 1}) &= \sum_{k_{2} + \dots + k_{d} = (N - 1) / 2} P ((N - 1) / 2, k_{2}, \dots, k_{d} | \lambda^{1}), \quad \forall N, \\ &= P \left(T_{1} (t) = \sum_{i = 2}^{d} T_{i} (t) \mid \lambda^{1}\right) \\ &\overset{(a)}{=} P (T_{1} (t) = (d - 1) T_{2} (t) | \lambda^{1}) \\ &\overset{(b)}{\geq} P (T_{1} (t) = (d - 1) T_{2} (t) | \lambda^{2}) \\ &= P \left(T_{1} (t) = \sum_{i = 2}^{d} T_{i} (t) \mid \lambda^{2}\right) \\ &= \sum_{k_{2} + \dots + k_{d} = (N - 1) / 2} P ((N - 1) / 2, k_{2}, \dots, k_{d} | \lambda^{2}) \\ &= P (A_{2, 2}), \end{aligned}

(A10)

where the equality (a) holds because all $T_i(t)$ follow identical random processes, and the inequality (b) follows from the independence of the random variables. Indeed, we see that

\begin{aligned} P (T_{1} (t) = (d - 1) T_{2} (t) | \lambda^{1}) &= \sum_{k = 1}^{\infty} P (T_{1} (t) = k | \lambda^{1}) P \left(T_{2} (t) = \frac{k}{d - 1} \mid \lambda^{1}\right) \\ &\overset{(a)}{=} \sum_{m = 1}^{\infty} P (T_{1} (t) = m (d - 1) | \lambda^{1}) P (T_{2} (t) = m | \lambda^{1}) \\ &\overset{(b)}{\geq} \sum_{m = 1}^{\infty} P (T_{1} (t) = m (d - 1) | \lambda^{2}) P (T_{2} (t) = m | \lambda^{2}) \\ &= P (T_{1} (t) = (d - 1) T_{2} (t) | \lambda^{2}), \end{aligned}

(A11)

where the equality (a) holds because $T_2(t)$ takes integer values for all $t > 0$. Hence the probability $P(T_2(t) = k/(d - 1) \mid \lambda^1)$ is zero whenever $k$ is not a multiple of $d - 1$. (In a tree, there is a unique path between any two nodes, and the diffusion time from the rumor source to any node at distance $l > 0$ follows a hypoexponential distribution with rates $(\lambda_1, \dots, \lambda_l)$.)

The inequality (b) follows from

\frac{P (T_{1} (t) = m (d - 1) | \lambda^{1})}{P (T_{2} (t) = m | \lambda^{2})} \geq \frac{P (T_{1} (t) = m (d - 1) | \lambda^{2})}{P (T_{2} (t) = m | \lambda^{1})},

(A12)

since the random processes $T_1(t)$ and $T_2(t)$ have the same distribution, with exponential edge rates satisfying $\lambda_i^1 \geq \lambda_i^2$ for all $i \geq 1$. (We use the fact that $\frac{x}{a} \geq \frac{y}{c}$ if $x \geq y \geq 0$ and $0 < a \leq c$.) Next, we consider

\begin{aligned} P (B_{s}) &= \frac{1}{2} \left\{\left(1 + P (B_{s} | A_{1, s})\right) P (A_{1, s}) + P (B_{s} | A_{2, s}) P (A_{2, s})\right\} \\ &\geq \frac{1}{2} \left\{(2 - \varepsilon) P (A_{1, s}) + \varepsilon P (B_{s} | A_{2, s})\right\} \\ &= P (A_{1, s}), \quad \text{as } N \to \infty, \end{aligned}

(A13)

for an arbitrarily small number $\varepsilon > 0$, together with

P (T_{i} (t) > N / 2 | \lambda^{s}) = \begin{cases} P (T_{i} (t) > (N - 1) / 2 | \lambda^{s}) & \text{if } N \text{ is odd}, \\ P (T_{i} (t) > (N - 1) / 2 | \lambda^{s}) - P (T_{i} (t) = N / 2 | \lambda^{s}) & \text{if } N \text{ is even}, \end{cases}

where $s = 1, 2$.

Then, from this relation and the fact that $P(N = \text{odd}) = P(N = \text{even}) = 1/2$, we have

\begin{aligned} P (T_{i} (t) > N / 2 | \lambda^{1}) &= \frac{1}{2} \left(2 P (T_{i} (t) > (N - 1) / 2 | \lambda^{1}) - P (T_{i} (t) = N / 2 | \lambda^{1})\right) \\ &= P (T_{i} (t) > (N - 1) / 2 | \lambda^{1}) - \frac{P (T_{i} (t) = N / 2 | \lambda^{1})}{2} \\ &\overset{(a)}{\geq} P (T_{i} (t) > (N - 1) / 2 | \lambda^{2}) - \frac{P (T_{i} (t) = N / 2 | \lambda^{1})}{2} \\ &= P (T_{i} (t) > N / 2 | \lambda^{2}), \end{aligned}

(A14)

where (a) is due to the induction hypothesis. Then, using $\lim_{N \to \infty} P(T_i(t) = N/2 \mid \lambda^s) = 0$ and the induction hypothesis, we have

\lim_{N \to \infty} P (T_{i} (t) > N / 2 | \lambda^{1}) \geq \lim_{N \to \infty} P (T_{i} (t) > N / 2 | \lambda^{2}),

which completes the proof of Lemma 5.

References

1. Shah, D.; Zaman, T. Detecting Sources of Computer Viruses in Networks: Theory and Experiment. In Proceedings of the ACM SIGMETRICS, New York, NY, USA, 14–18 June 2010.
2. Shah, D.; Zaman, T. Rumor Centrality: A Universal Source Detector. In Proceedings of the ACM SIGMETRICS, London, UK, 11–15 June 2012.
3. Shah, D.; Zaman, T. Rumors in a Network: Who’s the Culprit? IEEE Trans. Inf. Theory 2011, 57, 5163–5181.
4. Zhu, K.; Ying, L. Information Source Detection in the SIR Model: A Sample Path Based Approach. In Proceedings of the IEEE Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 10–15 February 2013.
5. Zhu, K.; Ying, L. A Robust Information Source Estimator with Sparse Observations. In Proceedings of the IEEE INFOCOM, Toronto, ON, Canada, 27 April–2 May 2014.
6. Wang, Z.; Dong, W.; Zhang, W.; Tan, C.W. Rumor Source Detection with Multiple Observations: Fundamental Limits and Algorithms. In Proceedings of the ACM SIGMETRICS, Austin, TX, USA, 16–20 June 2014.
7. Dong, W.; Zhang, W.; Tan, C.W. Rooting Out the Rumor Culprit from Suspects. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Istanbul, Turkey, 7–12 July 2013.
8. Choi, J.; Shin, J.; Yi, Y. Information Source Localization with Protector Diffusion in Networks. IEEE/KICS J. Commun. Netw. 2017, 21, 136–147.
9. Choi, J.; Moon, S.; Woo, J.; Son, K.; Shin, J.; Yi, Y. Rumor Source Detection under Querying with Untruthful Answers. In Proceedings of the IEEE INFOCOM 2017, IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017.
10. Choi, J.; Yi, Y. Necessary and Sufficient Budgets in Information Source Finding with Querying: Adaptivity Gap. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018.
11. Ohsaka, N.; Yamaguchi, Y.; Kakimura, N.; Kawarabayashi, K. Maximizing Time-Decaying Influence in Social Networks. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Riva del Garda, Italy, 19–23 September 2016; pp. 132–147.
12. Bubeck, S.; Devroye, L.; Lugosi, G. Finding Adam in Random Growing Trees. Random Struct. Algorithms 2017, 50, 158–172.
13. Khim, J.; Loh, P.-L. Confidence Sets for Source of a Diffusion in Regular Trees. IEEE Trans. Netw. Sci. Eng. 2015, 4, 27–40.
14. Luo, W.; Tay, W.P.; Leng, M. How to Identify an Infection Source With Limited Observations. IEEE J. Sel. Top. Signal Process. 2014, 8, 586–597.
15. Prakash, B.A.; Vreeken, J.; Faloutsos, C. Efficiently Spotting the Starting Points of an Epidemic in a Large Graph. In Proceedings of the IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA, 18–21 November 2017.
16. Zhu, K.; Chen, Z.; Ying, L. Catch’Em All: Locating Multiple Diffusion Sources in Networks with Partial Observations. In Proceedings of the AAAI, San Francisco, CA, USA, 4–9 February 2017.
17. Ji, F.; Tay, W.P. An Algorithmic Framework for Estimating Rumor Sources With Different Start Times. IEEE Trans. Signal Process. 2017, 65, 2517–2530.
18. Fanti, G.; Kairouz, P.; Oh, S.; Viswanath, P. Spy vs. Spy: Rumor Source Obfuscation. In Proceedings of the ACM SIGMETRICS, Portland, OR, USA, 15–19 June 2015.
19. Luo, W.; Tay, W.P.; Leng, M. Infection Spreading and Source Identification: A Hide and Seek Game. IEEE Trans. Signal Process. 2016, 64, 4228–4243.
20. Fanti, G.; Viswanath, P. Anonymity Properties of the Bitcoin P2P Network. arXiv 2017, arXiv:1703.08761.
21. Leskovec, J.; McAuley, J. Learning to Discover Social Circles in Ego Networks. In Proceedings of the Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–8 December 2012.
22. Leskovec, J.; Huttenlocher, D.; Kleinberg, J. Predicting Positive and Negative Links in Online Social Networks. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010.

Figure 1. Examples of decaying diffusion rate of information. (Initial diffusion rate p and decayed rate q: (a) over time and (b) over distance.)

Figure 2. Example of generalized exponential decay. (If $r = p - 1$, it becomes a homogeneous rate, and if $r = 0$, it is a simple exponential decay.)

Figure 3. Theoretical and simulation result for Theorem 3.

Figure 4. Simulation result for the estimation error $|p - \hat{p}|$ with respect to the sampling cost K for the algorithm DPE(K) ($N = 1000$).

Figure 5. Detection probability by varying the parameter p for the 3-regular tree. We compare the results for the true parameter p (true) and for the parameter p estimated using the algorithm DPE(K) (estimated) ($K = 500$).

Figure 6. Detection probabilities (true and estimated parameter) for synthetic graphs: (a) ER random graph, (b) Scale-Free graph, and (c) Small World graph ($K = 500$).

Figure 7. Detection probabilities (true and estimated parameter) for real-world networks: (a) US power grid network, (b) Facebook network [21], (c) Facebook network, and (d) Wiki-vote network ($K = 500$).

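Table 1. Averaged value of the estimates of parameter p using DPE(K) for general graphs ($K = 500$). We use the step size $\delta = 0.1$ and $p_{max} = 10$. (ER: Erdős–Rényi; SF: Scale-Free; SW: Small World; PG: US power grid; FB: Facebook; WiKi: Wiki-vote.)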

True p	ER	SF	SW	PG	FB	WiKi
2	1.82	2.23	2.08	1.93	1.81	1.93
3	3.26	3.72	3.35	2.83	2.84	3.11
4	3.71	3.83	4.23	4.32	3.81	4.13
5	4.84	4.72	4.73	5.28	4.93	4.91
6	5.84	5.82	5.88	6.28	5.83	5.91

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Woo, J.; Choi, J. Estimating the Information Source under Decaying Diffusion Rates. Electronics 2019, 8, 1384. https://doi.org/10.3390/electronics8121384

AMA Style

Woo J, Choi J. Estimating the Information Source under Decaying Diffusion Rates. Electronics. 2019; 8(12):1384. https://doi.org/10.3390/electronics8121384

Chicago/Turabian Style

Woo, Jiin, and Jaeyoung Choi. 2019. "Estimating the Information Source under Decaying Diffusion Rates" Electronics 8, no. 12: 1384. https://doi.org/10.3390/electronics8121384

APA Style

Woo, J., & Choi, J. (2019). Estimating the Information Source under Decaying Diffusion Rates. Electronics, 8(12), 1384. https://doi.org/10.3390/electronics8121384
