Influence Maximization with Priority in Online Social Networks

Pham, Canh V.; Ha, Dung K. T.; Vu, Quang C.; Su, Anh N.; Hoang, Huan X.

doi:10.3390/a13080183


by Canh V. Pham ¹, Dung K. T. Ha ¹, Quang C. Vu ¹,*, Anh N. Su ¹ and Huan X. Hoang ²

¹ Faculty of Information and Security Technology, People’s Security Academy, Hanoi 100000, Vietnam

² Vietnam National University, Hanoi 100000, Vietnam

* Author to whom correspondence should be addressed.

Algorithms 2020, 13(8), 183; https://doi.org/10.3390/a13080183

Submission received: 16 June 2020 / Revised: 19 July 2020 / Accepted: 21 July 2020 / Published: 29 July 2020

(This article belongs to the Section Combinatorial Optimization, Graph, and Network Algorithms)


Abstract: The Influence Maximization ($IM$) problem, which finds a set of k nodes (called the seed set) in a social network to initiate the influence spread so that the number of influenced nodes after the propagation process is maximized, is an important problem in information propagation and social network analysis. However, previous studies ignored the priority constraint, which leads to inefficient seed selection. In some real situations, companies or organizations prioritize influencing potential users during their influence diffusion campaigns. In this paper, we propose a new problem called Influence Maximization with Priority ($IMP$), which finds a seed set of k nodes in a social network that influences the largest number of nodes, subject to the influence spread on a specific set of nodes U (called the priority set) being at least a given threshold T. We show that the problem is NP-hard under the well-known $IC$ model. To solve it, we propose two efficient algorithms, called Integrated Greedy ($IG$) and Integrated Greedy Sampling ($IGS$), with provable theoretical guarantees. $IG$ provides a $(1 - {(1 - \frac{1}{k})}^{t})$-approximation solution, where $t \geq 1$ is an output of the algorithm. The worst-case approximation ratio is obtained when $t = 1$ and is equal to $1 / k$. In addition, $IGS$ is an efficient randomized approximation algorithm based on a sampling method that provides a $(1 - {(1 - \frac{1}{k})}^{t} - ϵ)$-approximation solution with probability at least $1 - δ$, where $ϵ > 0$ and $δ \in (0, 1)$ are input parameters of the problem. We conduct extensive experiments on various real networks to compare our $IGS$ algorithm with state-of-the-art algorithms for the $IM$ problem. The results indicate that our algorithm provides better solutions in terms of influence on the priority set, reaching approximately two to ten times the threshold T, while its running time, memory usage and influence spread remain comparable to those of the other algorithms.

Keywords:

social networks; influence maximization with priority; optimization; approximation algorithm

1. Introduction

Online Social Networks (OSNs) have become an important platform for communication as well as e-commerce. Companies and businesses leverage the rapid spread of information through the “word of mouth” effect among friends in social networks as a powerful tool for viral marketing. For instance, a company can give free samples to some users of an OSN so that many more people learn about its products, which increases its chances of selling them. The Influence Maximization ($IM$) problem [1], a key problem in viral marketing, has been extensively studied over the past decade due to its tremendous value in business, viral marketing and influence propagation. Basically, $IM$ aims to find a set of nodes (called the seed set) in a social network at which to inject an opinion, innovation or influence so that it affects the largest number of nodes. Kempe et al. [1] first studied $IM$ as an optimization problem under two well-known models, Independent Cascade ($IC$) and Linear Threshold ($LT$). Since $IM$ is NP-hard, they designed a natural greedy algorithm that returns a $(1 - 1 / e)$-approximation solution. Research shows that $IM$ not only plays a commercial role in viral marketing [2,3] but is also the foundation of applications in many fields, such as epidemic control in social networks [4,5,6,7,8], social network monitoring [9,10], recommendation systems [11], etc. Hence, $IM$ has been extensively studied recently [2,4,12,13,14,15,16,17,18,19].

Although $IM$ has many applications in viral marketing, previous studies did not consider the impact on priority users, who can play an important role in the effectiveness of viral marketing campaigns. In fact, companies often prioritize specific potential customers who are financially able or well suited to their products. For example, a company that produces baby diapers tends to introduce the product to married women aged 20 to 45. Supposing it has some data about user accounts on a social network, it launches a promotion with a suitable number of gifts to married female users via this social network. If we only care about the number of influenced individuals, as in the case of $IM$, we do not evaluate the impact on the potential users, which leads to a wrong selection of the seed set. Figure 1 shows an example. This network contains 8 nodes and 9 edges, the priority set is $\{b, d\}$, and the weight of each edge (the influence probability) is set to 1. When the budget (number of seed nodes) is $k = 1$, the optimal solution of $IM$ is $\{f\}$, which influences 6 nodes, $\{f, d, g, c, e, h\}$, but not b. Hence, the $IM$ solution does not reach all priority nodes. To cover both priority nodes, the solution must be $\{a\}$, whose total influence is only 5.

Motivated by such scenarios, in this paper we investigate the Influence Maximization with Priority ($IMP$) problem, which adds a priority constraint to the influence process. Given a social network $G = (V, E)$, a priority set $U \subset V$, a budget k and a priority threshold T (with $T \leq k$), the goal is to find a seed set S of size k whose influence on U is at least T and whose overall influence spread is maximized. $IMP$ is better suited to such campaigns than $IM$ and, moreover, it generalizes the $IM$ problem. Nevertheless, the priority constraint makes the problem considerably more challenging. To address this problem, we propose two approximation algorithms, Integrated Greedy ($IG$) and Integrated Greedy-based Sampling ($IGS$), with provable theoretical guarantees. $IG$ obtains its guarantee through a modification of the natural greedy algorithm, while $IGS$ is an efficient randomized approximation algorithm based on a sampling method [13,14,15,20]. $IGS$ combines two novel techniques. Firstly, we propose the Targeted Reverse Reachable (TRR) concept, which modifies the Reverse Reachable (RR) sampling technique [13,14,15,20] to estimate the influence of a seed set on a given priority set. Secondly, we develop a new strategy that selects a set of seeds in accordance with the priority constraint and sets the number of samples so as to give a theoretical guarantee. Because $IMP$ generalizes $IM$, we conduct extensive experiments on various real networks to compare our $IGS$ algorithm with state-of-the-art algorithms for the $IM$ problem, such as $DSSA$ [15], $BCT$ [2] and $OPIM$, in terms of the influence on a given priority set, running time and memory usage, while the influence spread approximation is ensured as in $IM$.

Our contributions are summarized as follows:

We propose the Influence Maximization with Priority ( $IMP$ ) problem, which adds a priority constraint to the Influence Maximization ( $IM$ ) problem. That is, we extend $IM$ by adding a constraint on the influence on a given set of users. $IMP$ aims to find a seed set S of size k such that the total influence on priority users is at least a given threshold T (with $k \geq T$ ) while the influence spread of the cascade is still maximized.
We propose two approximation algorithms, $IG$ and $IGS$ , for the $IMP$ problem. The $IG$ algorithm provides an approximation ratio of $(1 - {(1 - \frac{1}{k})}^{t})$ , where $t \geq k - T$ is an output of the algorithm. In addition, $IGS$ is a randomized approximation algorithm providing an approximation ratio of $(1 - {(1 - \frac{1}{k})}^{t} - ϵ)$ with probability at least $1 - δ$ , where $ϵ > 0, δ \in (0, 1)$ are input parameters and t is an output of the algorithm.
We conduct extensive experiments on various real networks such as netHEPT, netPHY, Email-Enron, DBLP, and Twitter ReTweet. The results indicate that our algorithm, $IGS$ , often outperforms state-of-the-art $IM$ algorithms in terms of influence, running time and memory usage. In particular, $IGS$ provides solutions whose influence on the priority set is approximately two to ten times greater than the threshold T while still maintaining the influence spread approximation of $IM$ algorithms. Further, we demonstrate that $IGS$ is faster and uses less memory than the other algorithms in many cases. On the whole, although $IGS$ must additionally account for the influence on a given set of target users, it still achieves a fast running time, low memory usage and a high influence spread over all nodes, comparable to state-of-the-art algorithms such as DSSA, BCT and OPIM-C. This suggests that the design of $IGS$ is effective.

Related work. Kempe et al. [1] first studied the Influence Maximization ($IM$) problem, inspired by the idea of exploiting the influence among users in social networks for viral marketing [21]. They formulated $IM$ as a discrete optimization problem under two classical information diffusion models, Independent Cascade ($IC$) and Linear Threshold ($LT$). They showed that $IM$ cannot be approximated within a ratio of $1 - 1 / e + ϵ$ for any $ϵ > 0$ unless P = NP, and proposed a greedy algorithm with an approximation ratio of $1 - 1 / e - ϵ$ for $ϵ > 0$. Later, Chen et al. [12,16] continued to study $IM$ and proved that computing the exact influence spread of a seed set is #P-hard. Many heuristic algorithms have been proposed to solve the problem on large networks, but they fail to retain the approximation ratio of $1 - 1 / e - ϵ$ and may return low-quality solutions. Examples include the cost-effective lazy-forward (CELF) heuristic proposed by Leskovec et al. [22], which improves the greedy algorithm to run 700 times faster than the greedy algorithm with Monte Carlo simulation; PMIA, a fast heuristic proposed by Chen et al. [12], which constructs a directed acyclic graph to estimate the influence under the $IC$ model; and the algorithm in [16], which uses local directed acyclic graphs (LDAG) to calculate the local influence of nodes under the $LT$ model. To keep the $1 - 1 / e - ϵ$ ratio, research on approximation approaches has continued. Borgs et al. [13] first presented a $(1 - 1 / e - ϵ)$-approximation algorithm that succeeds with probability at least $1 - δ$ in $O (k l^{2} (m + n) \log^{2} n / ϵ^{3})$ time by introducing the Reverse Influence Sampling (RIS) model. This model has formed the foundation for further algorithm development [14,15,20,23].

Since then, many works have extended $IM$ to other viral marketing contexts. Nguyen et al. [24] investigated the Budgeted Influence Maximization (BIM) problem, which considers the cost of selecting a node, and proposed a $(1 - 1 / \sqrt{e} - ϵ)$-approximation algorithm. The authors in [2] studied a generalization of the $IM$ and BIM problems called Cost-aware Targeted Viral Marketing (CTVM). In that work, each node u has an arbitrary cost $c (u)$ and a benefit $b (u)$, and the goal of CTVM is to select a seed set within a given budget so that the total benefit is maximized. We believe that this is the closest problem to our work. In the CTVM problem, we can set the parameters so as to maximize the influence on a given target set of users, but we cannot simultaneously maximize the influence on the other users as in our problem. Later, several works improved the approximation as well as the scalability of CTVM algorithms [25,26].

Moreover, many other variants of the $IM$ problem have been studied. Some works, such as [17,18,27], studied constrained versions of $IM$ in which edges are associated with a topic influence weight. These problems aim to find a set of k users that maximizes the number of influenced users according to a topic query. However, the proposed algorithms do not provide any theoretical guarantee. Li et al. [28] proposed the Location-aware Influence Maximization (LIM) problem, whose goal is to select a k-seed set so that the number of influenced nodes in a given query region is maximized. The authors in [29] investigated the Distance-aware Influence Maximization (DAIM) problem, which considers the role of the distance between users and the promoted location in seed selection. They extended the RIS model and provided an unbiased estimator for the DAIM problem.

Besides, some works investigated the problem of Competitive Influence Maximization (CIM), which considered the context of

IM

under the competition of many rivals. Bharathi et al. [30] first formulated the CIM problem under a new competitive propagation model which was an extension of

IC

model. Chen et al. [12] investigated CIM in the setting of combating negative opinions, based on the assumption that negative information is often more attractive than official information. Some authors considered the problem under different scenarios in viral marketing, such as proposing a distance-aware problem [31], expanding the

LT

model to reflect competition [13,32,33,34], proposing a heuristic algorithm [35], etc.

Recently, some authors studied the selection of seed nodes in a social network to influence groups of users or communities instead of individuals [36,37,38,39]. They argue that in real-world scenarios, creating an impact on groups is more beneficial than on individuals in a network. Tsang et al. [36] investigated the Fairness Group Maximization problem with two fairness criteria, maximin fairness and diversity. While maximin fairness aims to maximize the minimum fraction of influenced nodes within any group relative to its population, the diversity criterion is an alternative fairness concept that extends the notion of individual rationality to group rationality. They proposed an approximation algorithm based on techniques for optimizing multiple sub-modular objective functions. More recently, the authors in [37] proposed exact algorithms for fair group influence with multiple criteria, based on a mixed integer linear programming formulation on a specific set of sample graphs under

IC

model. In [38], the authors characterize the intricate relationship between diversity and efficiency, which may sometimes be at odds but may also reinforce each other. Nguyen et al. [39] considered the problem of Influence Maximization at the Community level, which finds a seed set of k nodes that influences the largest number of communities. They showed that the objective function is neither sub-modular nor super-modular and proposed approximation algorithms with provable guarantees. Unlike the problem studied in this paper, these studies do not address a priority set in the influence maximization context. Hence, the proposed algorithms cannot be applied to the

IMP

problem.

Organization. The rest of the paper is organized as follows: Section 2 presents the information diffusion model and the problem definition. Section 3 and Section 4 present our proposed Integrated Greedy and Integrated Greedy-based Sampling algorithms for the

IMP

problem with the theoretical analysis. Experimental results are shown in Section 5. In Section 6 we discuss the future work and conclude this paper.

2. Model and Problem Definition

In this section, we introduce the network model and the well-known Independent Cascade ($IC$) information diffusion model [1]. Under the $IC$ model, we formally define the Influence Maximization with Priority ($IMP$) problem.

2.1. Graph Notation and Independent Cascade Model

Let $G = (V, E)$ be a directed graph representing a social network with node set V and directed edge set E, where $| V | = n$ and $| E | = m$. Let $N_{in} (v)$ and $N_{out} (v)$ be the sets of in-neighbors and out-neighbors of a node v, respectively. We use S and $S^{*}$ to denote a seed set that is a solution and an optimal solution of $IMP$, respectively. We also write $OPT = σ (S^{*})$ for the influence of an optimal solution.

In the Independent Cascade ($IC$) model, each edge $e = (u, v) \in E$ has an influence probability $p (u, v) \in (0, 1]$ that represents the probability of information transmission from u to v. Each node $v \in V$ has two possible states, active and inactive. Given a seed set $S \subseteq V$, the diffusion process from S happens in discrete steps $t = 0, 1, \dots$, as follows:

At step $t = 0$ , all nodes in S are activated.
At step $t \geq 1$ , each node u activated in the previous step has a single chance to activate each inactive out-neighbour v, succeeding with probability $p (u, v)$ . An activated node remains active until the end of the diffusion process.
The propagation process ends when no more nodes are activated.
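The three steps above can be illustrated with a small Monte Carlo simulation. This is a minimal sketch, not code from the paper: the adjacency format `out_edges` and the function names `simulate_ic` and `estimate_spread` are assumptions made for illustration. Nodes are processed from a queue rather than in synchronized steps, which gives the same distribution over final active sets because each edge is tried at most once.

```python
import random
from collections import deque


def simulate_ic(out_edges, seeds, rng):
    """Run one IC cascade and return the set of activated nodes.

    out_edges maps a node u to a list of (v, p_uv) pairs (assumed format).
    """
    active = set(seeds)          # step t = 0: all seeds are active
    frontier = deque(seeds)
    while frontier:              # ends when no new node is activated
        u = frontier.popleft()
        for v, p in out_edges.get(u, []):
            # u gets a single chance to activate each inactive out-neighbour
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_spread(out_edges, seeds, priority=None, runs=1000, seed=0):
    """Average cascade size over `runs` simulations.

    If `priority` is given, only activated priority nodes are counted.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        activated = simulate_ic(out_edges, seeds, rng)
        if priority is not None:
            activated &= set(priority)
        total += len(activated)
    return total / runs
```

Passing `priority=U` restricts the count to priority nodes, which is the quantity that the priority constraint of the problem bounds from below.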

Kempe et al. [1] show that the $IC$ model is equivalent to the live-edge model, so the expected number of influenced nodes can be computed as follows. We first generate a sample graph g from the original graph G by keeping each edge $e = (u, v) \in E$ independently with probability $p (u, v)$ and discarding it with probability $1 - p (u, v)$. The probability that a realization g is generated from G (denoted as $g \sim G$) is

\begin{matrix} Pr [g \sim G] = \prod_{(u, v) \in E (g)} p (u, v) \prod_{(u, v) \in E \setminus E (g)} (1 - p (u, v)) \end{matrix}

(1)

In this equation, $E (g)$ is the edge set of g. The number of possible sample graphs is $2^{| E |}$. The influence spread of a seed set S in G is calculated as follows:

\begin{matrix} σ (S) = \sum_{g \sim G} Pr [g \sim G] | R (g, S) | \end{matrix}

(2)

where $R (g, S)$ denotes the set of nodes reachable from S in g. For a set of priority nodes U, the influence spread of S on U is calculated as follows:

\begin{matrix} σ_{U} (S) = \sum_{g \sim G} Pr [g \sim G] | R (g, S \to U) | \end{matrix}

(3)

where $R (g, S \to U)$ denotes the set of nodes in U that are reachable from S in g. Kempe et al. [1] also show that $σ (\cdot)$ is a monotone and sub-modular function, i.e., for any $A \subset V$ and $v \in V \setminus A$, we have:

\begin{matrix} σ (A \cup \{v\}) \geq σ (A) \end{matrix}

(4)

and for any $A \subseteq B \subset V$ and $v \in V \setminus B$, we have:

\begin{matrix} σ (A \cup \{v\}) - σ (A) \geq σ (B \cup \{v\}) - σ (B) \end{matrix}

(5)

It is also easy to see that $σ_{U} (\cdot)$ is a monotone and sub-modular function.
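For very small graphs, Equations (1)–(3) can be evaluated exactly by enumerating all $2^{| E |}$ live-edge graphs. The sketch below is only illustrative; the edge format (a dictionary from `(u, v)` to `p(u, v)`) and the function names are assumptions, and the enumeration is exponential in the number of edges, so it is useful only for checking intuition on toy instances.

```python
from itertools import product


def reachable(live_edges, seeds):
    """Nodes reachable from `seeds` using only the live edges."""
    adj = {}
    for u, v in live_edges:
        adj.setdefault(u, []).append(v)
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def exact_spread(edges, seeds, priority=None):
    """Exact sigma(S), or sigma_U(S) when `priority` is given.

    edges: dict mapping (u, v) -> p(u, v). Enumerates all 2^|E| sample graphs.
    """
    edge_list = list(edges.items())
    total = 0.0
    for mask in product([False, True], repeat=len(edge_list)):
        prob = 1.0
        live = []
        for keep, ((u, v), p) in zip(mask, edge_list):
            prob *= p if keep else (1.0 - p)   # Equation (1)
            if keep:
                live.append((u, v))
        if prob == 0.0:
            continue
        reached = reachable(live, seeds)
        if priority is not None:
            reached &= set(priority)           # R(g, S -> U)
        total += prob * len(reached)           # Equations (2) and (3)
    return total
```

On the chain a → b → c with both probabilities equal to 0.5, `exact_spread` gives σ({a}) = 1 + 0.5 + 0.25 = 1.75, and the marginal gain of b is smaller on top of {a} than on top of the empty set, as sub-modularity requires.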

2.2. Problem Definition

We investigate Influence Maximization with Priority (

IMP

) defined as follows:

Definition 1

(

IMP

problem). Given a graph

G = (V, E)

under

IC

model, a positive integer k (budget), the priority set

U \subset V

, and the threshold T with

T \leq k, T \leq | U |

.

IMP

problem asks to find the seed set

S \subset V

, with

| S | \leq k

and

σ_{U} (S) \geq T

so that influence spread,

σ (S)

, is maximized, i.e, find S that is the solution to the following optimization problem:

\begin{matrix} maximize : & σ (S) \end{matrix}

(6)

\begin{matrix} subject to : & | S | \leq k \end{matrix}

(7)

\begin{matrix} σ_{U} (S) \geq T \end{matrix}

(8)

IMP

becomes

IM

problem when

U = \emptyset

. Therefore,

IM

is a special case of

IMP

and

IMP

is also NP-hard. In addition, the calculation of the influence function from the seed set is proven to be #P-hard [12]. Thus finding the solution to the problem within the time allowed is very challenging.

3. Integrated Greedy Algorithm

In this section, we first propose the Integrated Greedy ($IG$) algorithm, which builds on the greedy strategy that is well known for monotone and sub-modular problems and ensures a lower bound on the quality of the solution. The details of the algorithm are described in Algorithm 1.

Algorithm 1: Integrated Greedy (

IG

) algorithm

Assume $S_{1}$ is a solution of the problem of finding the minimum number of seed nodes such that the influence on the priority set is at least the threshold T, and $S_{2}$ is a solution of the $IM$ problem. The main idea of this algorithm is to modify the natural greedy algorithm [1] by combining these two solutions.

The algorithm is divided into two main phases. In the first phase, it finds a solution $S_{1}$ by a greedy strategy (lines 2–4). In each iteration, the algorithm adds to $S_{1}$ a node u with the largest incremental influence on the set U (lines 3–4), until $σ_{U} (S_{1}) \geq T$. Since $T < k$, $| S_{1} | \leq T < k$. Denote $t = k - T$ as the remaining budget (line 6). In the second phase, the algorithm finds the candidate solution $S_{2}$ for $IM$ with the remaining budget t by using a greedy method (lines 6–10). In each iteration i, it selects a node u with the largest incremental influence (line 7). If u already belongs to $S_{1}$, the algorithm increases t by 1 (lines 8–9). This phase ends when the remaining budget t is exhausted (line 6). Finally, the algorithm returns the solution S, the union of $S_{1}$ and $S_{2}$. It is easy to confirm that $| S | \leq k$ and $t \geq k - T \geq 1$ since $k > T$. Theorem 1 shows the approximation guarantee of the $IG$ algorithm.
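The two phases described above can be sketched as follows. This is a reading of the prose description only, since the listing of Algorithm 1 is not reproduced in the text; the function name `integrated_greedy` and the oracle arguments `sigma` and `sigma_u` (callables returning $σ (S)$ and $σ_{U} (S)$ for a list of nodes) are assumptions. With exact oracles this is impractical on real networks, because evaluating the influence function is #P-hard.

```python
def integrated_greedy(nodes, sigma, sigma_u, k, T):
    """Sketch of the two-phase IG strategy; returns (S, t).

    nodes:   list of candidate nodes
    sigma:   hypothetical oracle, sigma(list_of_nodes) -> influence spread
    sigma_u: hypothetical oracle, sigma_u(list_of_nodes) -> spread on U
    """
    # Phase 1: greedily add nodes until the priority constraint holds.
    s1 = []
    while sigma_u(s1) < T:
        candidates = [u for u in nodes if u not in s1]
        if not candidates:
            raise ValueError("priority threshold T cannot be reached")
        base = sigma_u(s1)
        best = max(candidates, key=lambda u: sigma_u(s1 + [u]) - base)
        s1.append(best)

    # Phase 2: greedy for plain influence with remaining budget t = k - T.
    t = k - T
    s2 = []
    i = 0
    while i < t:
        candidates = [u for u in nodes if u not in s2]
        if not candidates:
            break
        base = sigma(s2)
        best = max(candidates, key=lambda u: sigma(s2 + [u]) - base)
        s2.append(best)
        if best in s1:
            t += 1   # the node is already paid for by phase 1
        i += 1
    return set(s1) | set(s2), t
```

The returned t counts the greedy iterations of the second phase, which is the quantity that appears in the exponent of the approximation ratio.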

Theorem 1. The $IG$ algorithm returns $(S, t)$, where S is a feasible solution and $t \geq 1$, satisfying:

σ (S) \geq (1 - {(1 - \frac{1}{k})}^{t}) σ (S^{*})

The worst-case approximation ratio is obtained when t = 1 and is equal to 1/k.

Proof.

Let $S_{IM}^{*} = \{s_{1}, s_{2}, \dots, s_{k}\}$ be an optimal solution of the $IM$ problem for the input of Algorithm 1 (the graph G and budget k). Obviously, we have $σ (S_{IM}^{*}) \geq σ (S^{*})$. After the second phase ends, assume that $S_{2} = \{u_{2}^{1}, u_{2}^{2}, \dots, u_{2}^{t}\}$, $S_{2}^{i} = \{u_{2}^{1}, u_{2}^{2}, \dots, u_{2}^{i}\}$, and $S_{2}^{0} = \emptyset$. In the second phase, the algorithm repeatedly selects a node u whose incremental influence gain is largest. Since the function $σ (\cdot)$ is monotone and sub-modular [1], we have:

\begin{matrix} σ (S_{IM}^{*}) - σ (S_{2}^{i}) & \leq σ (S_{IM}^{*} \cup S_{2}^{i}) - σ (S_{2}^{i}) \end{matrix}

(9)

\begin{matrix} \leq \sum_{j = 1}^{k} (σ (S_{2}^{i} \cup {s_{1}, s_{2}, \dots, s_{j}}) - σ (S_{2}^{i} \cup {s_{1}, s_{2}, \dots, s_{j - 1}})) \end{matrix}

(10)

\begin{matrix} \leq \sum_{j = 1}^{k} (σ (S_{2}^{i} \cup \{s_{j}\}) - σ (S_{2}^{i})) \quad \text{(since } σ \text{ is sub-modular)} \end{matrix}

(11)

\begin{matrix} \leq k \cdot max_{s \in S_{IM}^{*}} (σ (S_{2}^{i} \cup {s}) - σ (S_{2}^{i})) \end{matrix}

(12)

\begin{matrix} \leq k \cdot (σ (S_{2}^{i + 1}) - σ (S_{2}^{i})) \end{matrix}

(13)

Therefore, for any

i = 0, \dots, t - 1

, we have

\begin{matrix} σ (S_{2}^{i + 1}) - σ (S_{2}^{i}) \geq \frac{1}{k} (σ (S_{IM}^{*}) - σ (S_{2}^{i})) \end{matrix}

(14)

Subtracting $σ (S_{IM}^{*})$ from both sides, we have:

\begin{matrix} σ (S_{2}^{i + 1}) - σ (S_{2}^{i}) - σ (S_{IM}^{*}) \geq \frac{1}{k} σ (S_{IM}^{*}) - σ (S_{IM}^{*}) - \frac{1}{k} σ (S_{2}^{i}) \end{matrix}

(15)

Rearranging the terms of the above inequality, we have

\begin{matrix} σ (S_{2}^{i + 1}) - σ (S_{IM}^{*}) \geq (1 - \frac{1}{k}) (σ (S_{2}^{i}) - σ (S_{IM}^{*})) \end{matrix}

(16)

Applying (16) repeatedly for $i = t - 1, \dots, 0$ gives

\begin{matrix} σ (S_{2}^{t}) - σ (S_{IM}^{*}) \geq {(1 - \frac{1}{k})}^{t} (σ (S_{2}^{0}) - σ (S_{IM}^{*})) \end{matrix}

(17)

Together with the fact that

S_{2}^{0} = \emptyset

and

σ (\emptyset) = 0

, the above inequality implies

\begin{matrix} σ (S_{2}) = σ (S_{2}^{t}) \geq (1 - {(1 - \frac{1}{k})}^{t}) σ (S_{IM}^{*}) \end{matrix}

(18)

Since

σ_{U} (S_{1}) \geq T

and

S = S_{1} \cup S_{2}

, S is feasible solution of

IMP

, and

\begin{matrix} σ (S) \geq σ (S_{2}^{t}) & \geq (1 - {(1 - \frac{1}{k})}^{t}) σ (S_{IM}^{*}) \geq (1 - {(1 - \frac{1}{k})}^{t}) σ (S^{*}) \end{matrix}

(19)

which proves the theorem. ☐
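To get a feel for the guarantee of Theorem 1, the ratio $1 - {(1 - \frac{1}{k})}^{t}$ can be tabulated for a few values of t. The helper name `ig_ratio` is an assumption for illustration; the formula itself is the one in the theorem.

```python
import math


def ig_ratio(k, t):
    """Approximation ratio 1 - (1 - 1/k)^t from Theorem 1."""
    return 1.0 - (1.0 - 1.0 / k) ** t


if __name__ == "__main__":
    k = 10
    for t in (1, 2, 5, 10):
        print(f"k={k}, t={t}: ratio={ig_ratio(k, t):.4f}")
    # For comparison, the classical greedy bound for IM is 1 - 1/e.
    print(f"1 - 1/e = {1 - 1 / math.e:.4f}")
```

For t = 1 the ratio equals 1/k, and it increases with t; for t = k it is at least $1 - 1 / e$, since ${(1 - \frac{1}{k})}^{k} \leq 1 / e$.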

Although Algorithm 1 provides an approximation guarantee, it cannot be applied directly to real social networks because computing the influence function $σ (S)$ is #P-hard under the $IC$ model [12]. To overcome this challenge, we propose a randomized algorithm with a provable approximation guarantee that combines $IG$ with a sampling technique.

4. Sampling Algorithm with Provable Guarantees

In this section, we present an efficient algorithm for the $IMP$ problem, called the Integrated Greedy Sampling ($IGS$) algorithm, that provides a theoretical guarantee. In addition, we show in the experiments that our algorithm can be applied to large networks.

4.1. Estimator of Influence Functions

Firstly, we recap the concept of a Reverse Reachable (RR) set [40], which is used to estimate the influence function $σ (\cdot)$. Based on that, we propose the concept of a Targeted Reverse Reachable (TRR) set to estimate the influence function $σ_{U} (S)$. Then we propose the $IGS$ algorithm and provide a theoretical analysis based on statistical evidence.

Definition 2 (Reverse Reachable (RR) set [40]). Given a graph $G = (V, E)$ under the $IC$ model, a random RR set $R_{j}$ is generated from G by:

1. Picking a source node u uniformly at random, i.e., with probability $\frac{1}{n}$ .
2. Generating a sample graph g from G, and returning $R_{j}$ (also written $R_{g}$ to emphasize the sample graph g) as the set of nodes that can reach u in g.

For a random RR set $R_{g}$, define a random variable $X_{g} (S) = \min \{1, | R_{g} \cap S |\}$. Borgs et al. [40] show that RR samples can be used to estimate the influence function via the following lemma.

Lemma 1.

For any set of nodes

S \subseteq V

, we have

σ (S) = n \cdot E [X_{g} (S)]

.

Given a collection R of RR sets and a node set S, we can approximate the value of $σ (S)$ by $\hat{σ} (S)$, defined as follows:

\begin{matrix} \hat{σ} (S) = \frac{n}{| R |} \sum_{R_{g} \in R} X_{g} (S) \end{matrix}

(20)

RR sets can be generated by the procedures used in the $IM$ algorithms of [13,14,15,20,23]. The common algorithm for generating an RR set $R_{j}$ is described in Algorithm 2. This algorithm first selects a source node u with probability $\frac{1}{n}$ and adds it to $R_{j}$. The algorithm uses a queue Q to store the visited nodes that still have to be processed. Initially, u is also added to Q. The algorithm then removes each node v from Q and selects each in-neighbor x of v that is not yet in $R_{j}$ with probability $p (x, v)$ (line 6). If successful, it adds x to Q and $R_{j}$. This process continues until Q is empty.

Algorithm 2: Generating RR sample under

IC

model

We now introduce the definition of the Targeted Reverse Reachable (TRR) set, which modifies the RR concept.

Definition 3 (Targeted Reverse Reachable (TRR) set). Given a graph $G = (V, E)$ under the $IC$ model, a random TRR set $R_{j}^{U}$ is generated from G by:

1. Picking a source node $u \in U$ with probability $\frac{1}{| U |}$ .
2. Generating a sample graph g from G, and returning $R_{j}^{U}$ (also written $R_{g}^{U}$ to emphasize the sample graph g) as the set of nodes that can reach u in g.

We define a random variable $Y_{g} (S) = \min \{1, | R_{g}^{U} \cap S |\}$. Similar to Lemma 1, Lemma 2 shows that we can use the value of $Y_{g} (S)$ to estimate the function $σ_{U} (S)$.

Lemma 2.

For any set of nodes

S \subseteq V

, we have

σ_{U} (S) = | U | \cdot E [Y_{g} (S)]

Proof.

Let $R_{g}^{U} (u)$ denote the TRR sample with source node u for the sample graph g. We have:

\begin{matrix} σ_{U} (S) & = \sum_{g \sim G} Pr [g \sim G] | R (g, S \to U) | \\ & = \sum_{u \in U} \sum_{g \sim G} Pr [g \sim G] [\exists v \in S : u \text{ is reachable from } v \text{ in } g] \\ & = \sum_{u \in U} \sum_{g \sim G} Pr [g \sim G] [\exists v \in S : v \in R_{g}^{U} (u)] \\ & = | U | \sum_{u \in U} \frac{1}{| U |} \sum_{g \sim G} Pr [g \sim G] Y_{g} (S) \\ & = | U | \sum_{u \in U} \sum_{g \sim G} Pr [u \text{ is the source node}] Pr [g \sim G] Y_{g} (S) \\ & = | U | \cdot E [Y_{g} (S)] \end{matrix}

Here $[\cdot]$ around a condition equals 1 if the condition holds and 0 otherwise. The transition from the second equality to the third comes from the definition of $R_{g}^{U} (u)$, and the transitions from the third to the fourth and then to the fifth follow from the uniform distribution used to choose the source node u. ☐

Given a collection R of TRR samples and a node set S, we define an approximation of $σ_{U} (S)$ as follows:

\begin{matrix} {\hat{σ}}_{U} (S) = \frac{| U |}{| R |} \sum_{R_{g}^{U} \in R} Y_{g} (S) \end{matrix}

(21)

From Lemma 2, we can give a good approximation of

σ_{U} (\cdot)

when the number of TRR samples is large enough. We can re-use Algorithm 2 to generate a TRR set

R_{j}^{U}

by a modification. We replace line 1 in the algorithm by picking source node

u \in U

with probability

\frac{1}{| U |}

and leave the rest as is.
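The sampling procedure and the two estimators (20) and (21) can be sketched with one routine that differs only in where the source node is drawn from. This is an illustrative sketch based on the textual description of Algorithm 2, not the authors' implementation; the names `sample_rr_set` and `estimate_influence` and the `in_edges` adjacency format are assumptions.

```python
import random
from collections import deque


def sample_rr_set(in_edges, sources, rng):
    """Generate one reverse reachable set.

    in_edges maps v -> list of (x, p_xv) for every edge (x, v) (assumed format).
    sources is the list the source node is drawn from uniformly:
    all nodes V for an RR set, the priority set U for a TRR set.
    """
    u = rng.choice(sources)
    rr = {u}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for x, p in in_edges.get(v, []):
            # Edge (x, v) is kept with probability p(x, v).
            if x not in rr and rng.random() < p:
                rr.add(x)
                queue.append(x)
    return rr


def estimate_influence(samples, seeds, scale):
    """Fraction of samples that intersect `seeds`, times `scale`.

    Use scale = n with RR sets (Equation (20)) and
    scale = |U| with TRR sets (Equation (21)).
    """
    seeds = set(seeds)
    hits = sum(1 for rr in samples if rr & seeds)
    return scale * hits / len(samples)
```

For example, `[sample_rr_set(in_edges, list_of_U, rng) for _ in range(N)]` produces N TRR samples, and `estimate_influence(samples, S, len(list_of_U))` then estimates $σ_{U} (S)$.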

4.2. Algorithm Description and Theoretical Analysis

Algorithm description. The algorithm is detailed in Algorithm 3. It first generates a collection $R_{1}$ of $N_{U}$ TRR sets and initializes the two candidate solutions $S_{1}$ and $S_{2}$ as empty sets. The body of the algorithm is then divided into two phases. In phase 1, it finds a candidate solution $S_{1}$ of minimum size such that ${\hat{σ}}_{U} (S_{1}) \geq (1 + α) T$ by using a greedy strategy with the potential function ${\hat{σ}}_{U}$ over $R_{1}$. In each iteration, it selects a node u with the maximal incremental value of the potential function (line 4) until ${\hat{σ}}_{U} (S_{1}) \geq (1 + α) T$. The candidate solution $S_{1}$ obtained in this phase satisfies the priority constraint $σ_{U} (S_{1}) \geq T$ with probability at least $1 - δ$ (Lemma 4).

Phase 2 selects a candidate solution $S_{2}$ with the remaining budget ($t = k - | S_{1} |$) so that the influence spread $σ (\cdot)$ is maximized. In this phase, it first sets the parameters $ϵ_{1}$, $t_{max}$ and $N_{max}$ and generates a collection $R_{2}$ of $N_{1}$ RR samples. The main part of this phase runs for several iterations (lines 12–27) until the stopping condition is met (line 22). In each iteration, it finds a candidate solution $S_{2}$ by a greedy strategy: it picks a node u with the maximal increment of the estimated influence $\hat{σ} (\cdot)$ over $R_{2}$ (line 12) until t nodes are selected. Similar to the $IG$ algorithm, if u already belongs to $S_{1}$, the algorithm increases t by 1. After that, the algorithm checks the quality of the candidate solution $S_{2}$ (line 17). It calculates $F_{l} (S_{2}, R_{2}, δ)$, a lower bound on $σ (S_{2})$, and $F_{u} (S_{2}, R_{2}, δ)$, an upper bound on the value of an optimal solution of the $IMP$ problem. The statistical guarantees of these functions are stated in Lemmas 5 and 6. If the solution $S_{2}$ meets the approximation condition (line 19), the algorithm returns $S_{2}$. If not, it moves to the next iteration, and it stops when the number of samples reaches $N_{max}$ (line 21).

Algorithm 3: Integrated Greedy -based Sampling (

IGS

) algorithm
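Phase 1 of the description above amounts to a greedy coverage procedure over the TRR sets. The sketch below covers only that phase and follows the prose, not the (unreproduced) listing of Algorithm 3; the function name `phase1_priority_seeds` and its argument names are assumptions, and the number of samples $N_{U}$ is taken as given rather than computed.

```python
def phase1_priority_seeds(trr_sets, priority_size, T, alpha):
    """Greedy selection of S1 until the estimate of sigma_U reaches (1 + alpha) * T.

    trr_sets:      list of TRR sets (each a set of nodes)
    priority_size: |U|
    Returns (S1, estimate), where estimate = |U| * covered / len(trr_sets).
    """
    total = len(trr_sets)
    covered = [False] * total
    n_covered = 0
    s1 = []
    target = (1.0 + alpha) * T
    while priority_size * n_covered / total < target:
        # Marginal gain of a node = number of still-uncovered sets containing it.
        gain = {}
        for idx, rr in enumerate(trr_sets):
            if not covered[idx]:
                for u in rr:
                    gain[u] = gain.get(u, 0) + 1
        if not gain:
            raise ValueError("target not reachable with these samples")
        best = max(gain, key=gain.get)
        s1.append(best)
        for idx, rr in enumerate(trr_sets):
            if not covered[idx] and best in rr:
                covered[idx] = True
                n_covered += 1
    return s1, priority_size * n_covered / total
```

Recomputing all gains in every iteration keeps the sketch short; an implementation aimed at large networks would more likely maintain an index from nodes to the samples that contain them.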

Theoretical analysis. Fortunately, the sequences of random variables $X_{g} (S)$ and $Y_{g} (S)$ constructed from the RR and TRR samples can be shown to form martingales. For random variables $X_{g}^{j} (S) \in [0, 1]$, define $M_{i} = \sum_{j = 1}^{i} (X_{g}^{j} (S) - μ)$ for all $i \geq 1$, where $μ = E [X_{g} (S)]$. For the sequence $M_{1}, M_{2}, \dots$ we have $E [M_{i} | M_{1}, \dots, M_{i - 1}] = M_{i - 1} + E [X_{g}^{i} (S) - μ] = M_{i - 1}$. Hence, $M_{1}, M_{2}, \dots$ form a martingale [41]. Similarly, the corresponding sequence defined from $Y_{g}$ is also a martingale. Therefore, the following concentration inequality [41] applies:

Lemma 3. If $M_{1}, M_{2}, \dots$ form a martingale, $| M_{1} | \leq a$, $| M_{j} - M_{j - 1} | \leq a$ for $j \in [2, i]$, and

\begin{matrix} Var [M_{1}] + \sum_{j = 2}^{i} Var [M_{j} | M_{1}, M_{2}, \dots, M_{j - 1}] = b \end{matrix}

(22)

where $Var [\cdot]$ denotes the variance of a random variable, then, for any $λ > 0$, we have:

\begin{matrix} Pr [M_{i} - E [M_{i}] \geq λ] \leq exp (- \frac{λ^{2}}{\frac{2}{3} a λ + 2 b}) \end{matrix}

(23)

Applying this lemma with $| M_{1} | = | X_{g}^{1} (S) - μ | \leq 1$, $| M_{j} - M_{j - 1} | = | X_{g}^{j} (S) - μ | \leq 1$, $Var [M_{1}] = Var [X_{g}^{1} (S) - μ] = Var [X_{g} (S)]$, $Var [M_{j} | M_{1}, M_{2}, \dots, M_{j - 1}] = Var [X_{g}^{j} (S) - μ] = Var [X_{g} (S)]$, and $Var [X_{g} (S)] \leq μ (1 - μ) \leq μ$, so that $b \leq μ | R |$ for $i = | R |$, we have:

\begin{matrix} Pr [\sum_{i = 1}^{| R |} X_{g}^{i} (S) - | R | \cdot μ \geq λ] & \leq exp (- \frac{λ^{2}}{\frac{2}{3} λ + 2 μ | R |}) \end{matrix}

(24)

Similarly,

- M_{1}, \dots, - M_{i}, \dots

also form a martingale, so applying Lemma 3, we have:

\begin{matrix} Pr [\sum_{i = 1}^{| R |} X_{g}^{i} (S) - | R | \cdot μ \leq - λ] & \leq exp (- \frac{λ^{2}}{2 μ | R |}) \end{matrix}

(25)

Let

λ = ϵ μ | R |

and substituting it into the two inequalities above, we have:

\begin{matrix} Pr [\sum_{i = 1}^{| R |} X_{g}^{i} - | R | \cdot μ \geq ϵ | R | μ] \leq exp (- \frac{ϵ^{2} | R | μ}{2 + \frac{2}{3} ϵ}) \end{matrix}

(26)

\begin{matrix} Pr [\sum_{i = 1}^{| R |} X_{g}^{i} - | R | \cdot μ \leq - ϵ | R | μ] \leq exp (- \frac{ϵ^{2} | R | μ}{2}) \end{matrix}

(27)
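To get a feel for how these two tail bounds behave, the following sketch evaluates the right-hand sides of (26) and (27) and inverts (27) to obtain the number of samples that drives the lower-tail bound below a target δ. The function names are illustrative and do not come from the paper; μ is assumed to be known here, which is not the case inside the algorithm.

```python
import math


def upper_tail_bound(eps, num_samples, mu):
    """Right-hand side of (26): exp(-eps^2 |R| mu / (2 + 2 eps / 3))."""
    return math.exp(-(eps ** 2) * num_samples * mu / (2.0 + 2.0 * eps / 3.0))


def lower_tail_bound(eps, num_samples, mu):
    """Right-hand side of (27): exp(-eps^2 |R| mu / 2)."""
    return math.exp(-(eps ** 2) * num_samples * mu / 2.0)


def samples_for_lower_tail(eps, mu, delta):
    """Smallest |R| for which the bound in (27) is at most delta."""
    return math.ceil(2.0 * math.log(1.0 / delta) / (eps ** 2 * mu))


if __name__ == "__main__":
    eps, mu, delta = 0.1, 0.05, 0.01
    r = samples_for_lower_tail(eps, mu, delta)
    print(r, lower_tail_bound(eps, r, mu), upper_tail_bound(eps, r, mu))
```

For the same ε, |R| and μ, the bound in (26) is always the weaker of the two because of the extra 2ε/3 term in its denominator.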

The following lemma gives a lower bound on the influence of the candidate solution

S_{1}

.

Lemma 4.

The candidate solution

S_{1}

obtained by phase 1 of Algorithm 3 satisfies

Pr [σ_{U} (S_{1}) \geq T] \geq 1 - δ

.

Proof.

Denote

μ_{Y} = E [Y_{g}] = \frac{σ_{U} (S_{1})}{| U |}

, and

{\hat{μ}}_{Y} = \frac{1}{N_{U}} \sum_{i = 1}^{N_{U}} Y_{g}^{i} = \frac{{\hat{σ}}_{U} (S_{1})}{| U |} \geq \frac{(T + α T)}{| U |}

. Applying (27) to the set

R_{1}

, we have:

\begin{matrix} Pr [{\hat{μ}}_{Y} \leq (1 - α) μ_{Y}] & = Pr (\sum_{i = 1}^{N_{U}} Y_{g}^{i} - N_{U} μ_{Y} \leq - α N_{U} μ_{Y}) \\ & \leq exp (\frac{- α^{2} N_{U} μ_{Y}}{2}) \\ & \leq exp (\frac{- α^{2} {\hat{μ}}_{Y} N_{U}}{2 (1 - α)}) \\ & \leq exp (\frac{- α^{2} (T + α T)}{2 (1 - α) | U |} N_{U}) \\ & \leq exp (\frac{- (2 + \frac{2}{3} α) ln ((\binom{| U |}{⌊ | U | / 2 ⌋}) / δ)}{2 (1 - α)}) \\ & \leq exp (- ln ((\binom{| U |}{⌊ | U | / 2 ⌋}) / δ)) = \frac{δ}{(\binom{| U |}{⌊ | U | / 2 ⌋})} \end{matrix}

Assuming that the event

{\hat{μ}}_{Y} \leq (1 - α) μ_{Y}

happens and applying (26) to the set

R_{1}

, we have:

\begin{matrix} Pr [σ_{U} (S_{1}) \leq T] & \leq Pr (σ_{U} (S_{1}) \leq \frac{{\hat{σ}}_{U} (S_{1})}{1 + α}) \end{matrix}

(28)

\begin{matrix} = Pr ({\hat{σ}}_{U} (S_{1}) \geq (1 + α) σ_{U} (S_{1})) \end{matrix}

(29)

\begin{matrix} = Pr (\frac{| U |}{N_{U}} \sum_{i = 1}^{N_{U}} Y_{g}^{i} - | U | μ_{Y} \geq | U | α μ_{Y}) \end{matrix}

(30)

\begin{matrix} = Pr (\sum_{i = 1}^{N_{U}} Y_{g}^{i} - N_{U} μ_{Y} \geq N_{U} α μ_{Y}) \end{matrix}

(31)

\begin{matrix} \leq exp (- \frac{α^{2} μ_{Y}}{2 + \frac{2}{3} α} N_{U}) \end{matrix}

(32)

\begin{matrix} \leq exp (- \frac{α^{2} {\hat{μ}}_{Y}}{(2 + \frac{2}{3} α) (1 - α)} N_{U}) \end{matrix}

(33)

\begin{matrix} \leq exp (- \frac{α^{2} {\hat{σ}}_{U} (S_{1})}{(2 + \frac{2}{3} α) (1 - α) (T + α T)} N_{U}) \end{matrix}

(34)

\begin{matrix} \leq exp (- ln ((\binom{| U |}{⌊ | U | / 2 ⌋}) / δ)) \end{matrix}

(35)

\begin{matrix} \leq \frac{δ}{(\binom{| U |}{⌊ | U | / 2 ⌋})} \end{matrix}

(36)

Assuming that

| S_{1} | = k_{1}

, there are at most

(\binom{n}{k_{1}})

possibilities for the candidate solutions

S_{1}

. Therefore,

\begin{matrix} Pr [\exists S_{1} : σ_{U} (S_{1}) \leq T] \leq (\binom{n}{k_{1}}) \frac{δ}{(\binom{| U |}{⌊ | U | / 2 ⌋})} \leq δ \end{matrix}

(37)

☐

Lemma 5

(Lower-bound). For any

δ \in (0, 1)

, a set of RR samples

R

, let

c = ln (\frac{1}{δ})

, and

\begin{matrix} F_{l} (R, S, δ) = min \{\hat{σ} (S) - \frac{n c}{3 | R |}, \hat{σ} (S) + \frac{n}{| R |} (\frac{2 c}{3} - \sqrt{\frac{4 c^{2}}{9} + 2 | R | c \frac{\hat{σ} (S)}{n}})\} \end{matrix}

(38)

We have

Pr [σ (S) \geq F_{l} (R, S, δ)] \geq 1 - δ

.

Proof.

Denote

μ = E [X_{g} (S)] = \frac{σ (S)}{n}

and

\hat{μ} = \frac{1}{| R |} \sum_{R_{g} \in R} X_{g} (S) = \frac{\hat{σ} (S)}{n}

. Applying (24) with

λ = \frac{c}{3} + \sqrt{\frac{c^{2}}{9} + 2 c μ | R |}

, we have:

\begin{matrix} Pr [\sum_{j = 1}^{| R |} X_{g}^{j} (S) - | R | \cdot μ \geq λ] & \leq δ \end{matrix}

(39)

Therefore, the following event happens with probability at least

1 - δ

\begin{matrix} \sum_{j = 1}^{| R |} X_{g}^{j} (S) - | R | \cdot μ \leq λ \Leftrightarrow | R | \hat{μ} - | R | μ - \frac{c}{3} \leq \sqrt{\frac{c^{2}}{9} + 2 c μ | R |} \end{matrix}

(40)

We consider the following two cases:

Case 1: If $| R | \hat{μ} - | R | μ - \frac{c}{3} \leq 0$, then $μ \geq \hat{μ} - \frac{c}{3 | R |}$.
Case 2: If $| R | \hat{μ} - | R | μ - \frac{c}{3} > 0$, squaring both sides of (40) gives:

\begin{matrix} {(| R | \hat{μ} - | R | μ - \frac{c}{3})}^{2} \leq \frac{c^{2}}{9} + 2 c μ | R | \end{matrix}

(41)

\begin{matrix} \Leftrightarrow & {(\hat{μ} - μ)}^{2} | R | + \frac{4 c}{3} (\hat{μ} - μ) - 2 c \hat{μ} \leq 0 \end{matrix}

(42)

Solving the above inequality for

μ

, we obtain:

\begin{matrix} μ \geq \hat{μ} + \frac{1}{| R |} (\frac{2 c}{3} - \sqrt{\frac{4 c^{2}}{9} + 2 | R | c \hat{μ}}) \end{matrix}

(43)

Combining the two cases above and substituting

μ = \frac{σ (S)}{n}, \hat{μ} = \frac{\hat{σ} (S)}{n}

, we obtain the proof. ☐
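As a numerical illustration of Lemma 5, the sketch below evaluates the lower bound in (38) from an influence estimate, the number of RR samples, the number of nodes and δ. The function name `lower_bound_influence` and its argument names are assumptions made for this example only; the formula itself is transcribed from (38).

```python
import math


def lower_bound_influence(sigma_hat, num_samples, n, delta):
    """Evaluate F_l from (38).

    sigma_hat: estimated influence of S over the RR samples.
    num_samples: |R|, the number of RR samples.
    n: number of nodes in the graph.
    delta: failure probability, 0 < delta < 1.
    """
    c = math.log(1.0 / delta)
    # First term of the min in (38), corresponding to Case 1 of the proof.
    first = sigma_hat - n * c / (3.0 * num_samples)
    # Second term of the min in (38), corresponding to Case 2 of the proof.
    second = sigma_hat + (n / num_samples) * (
        2.0 * c / 3.0
        - math.sqrt(4.0 * c * c / 9.0 + 2.0 * num_samples * c * sigma_hat / n)
    )
    return min(first, second)


if __name__ == "__main__":
    for r in (10_000, 100_000):
        print(r, lower_bound_influence(500.0, r, n=10_000, delta=0.01))
```

With the estimate held fixed, the bound tightens towards the estimate as the number of samples grows, which matches the role it plays in the stopping check of the algorithm.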

Lemma 6

(Upper-bound). For any

δ \in (0, 1)

, in iteration t of Algorithm 3, let

R_{2}^{t}

be the set of RR samples with

N_{t} = | R_{2}^{t} |

,

S_{2}^{t}

be the candidate solution of phase 2,

c = ln (\frac{1}{δ})

as in Lemma 5, and

\begin{matrix} F_{u} (R_{2}^{t}, S_{2}^{t}, δ) = \frac{\hat{σ} (S_{2}^{t})}{1 - {(1 - \frac{1}{k})}^{t}} + \frac{n}{N_{t}} (\sqrt{c^{2} + 2 N_{t} c \frac{\hat{σ} (S_{2}^{t})}{(1 - {(1 - \frac{1}{k})}^{t}) n}} - c) \end{matrix}

We have

Pr [OPT \leq F_{u} (R_{2}^{t}, S_{2}^{t}, δ)] \geq 1 - δ

.

Proof.

Let

λ = \sqrt{2 c μ N_{t}}

. Applying inequality (25), we have:

\begin{matrix} Pr [\sum_{i = 1}^{N_{t}} X_{g}^{i} (S) - N_{t} \cdot μ \leq - λ] & \leq exp (- \frac{λ^{2}}{2 μ N_{t}}) = δ \end{matrix}

(44)

Therefore, the following event happens with the probability at least

1 - δ

:

\begin{matrix} N_{t} \hat{μ} - N_{t} μ \geq - \sqrt{2 c μ N_{t}} \Leftrightarrow N_{t} (μ - \hat{μ}) \leq \sqrt{2 c μ N_{t}} \end{matrix}

(45)

Solving the above quadratic inequality for

μ

, we obtain the following upper bound on

μ

:

\begin{matrix} μ & \leq max \{\hat{μ}, \hat{μ} + \frac{1}{N_{t}} (\sqrt{c^{2} + 2 N_{t} c \hat{μ}} - c)\} \end{matrix}

(46)

\begin{matrix} = \hat{μ} + \frac{1}{N_{t}} (\sqrt{c^{2} + 2 N_{t} c \hat{μ}} - c) \end{matrix}

(47)

Denote

S^{0} = arg {max}_{S, | S | \leq k} \hat{σ} (S)

, where

\hat{σ}

is calculated over

R_{2}^{t}

. Since phase 2 of Algorithm 3 selects the candidate solution

S_{2}^{t}

by a greedy strategy, similarly to Theorem 1, we have:

\begin{matrix} \hat{σ} (S_{2}^{t}) \geq (1 - {(1 - \frac{1}{k})}^{t}) \hat{σ} (S^{0}) \geq (1 - {(1 - \frac{1}{k})}^{t}) \hat{σ} (S^{*}) \end{matrix}

(48)

Substituting

μ = \frac{σ (S_{2}^{t})}{n}, \hat{μ} = \frac{\hat{σ} (S_{2}^{t})}{n}

into (47) and combining it with (48), we have:

\begin{matrix} OPT = σ (S^{*}) & \leq \hat{σ} (S^{*}) + \frac{n}{N_{t}} (\sqrt{c^{2} + 2 N_{t} c \frac{\hat{σ} (S^{*})}{n}} - c) \end{matrix}

(49)

\begin{matrix} \leq \frac{\hat{σ} (S_{2}^{t})}{1 - {(1 - \frac{1}{k})}^{t}} + \frac{n}{N_{t}} (\sqrt{c^{2} + 2 N_{t} c \frac{\hat{σ} (S_{2}^{t})}{n (1 - {(1 - \frac{1}{k})}^{t})}} - c) \end{matrix}

(50)

which completes the proof. ☐

Based on the above theoretical analysis, the following theorem states the approximation guarantee of the

IGS

algorithm.

Theorem 2.

Algorithm 3 returns a solution S and an integer t that satisfy:

$Pr [σ_{U} (S) \geq T] \geq 1 - δ$
$Pr [σ (S) \geq (1 - {(1 - \frac{1}{k})}^{t} - ϵ) OPT] \geq 1 - δ$

Proof.

Since

S = S_{1} \cup S_{2}

, by Lemma 4 and the monotonicity of the influence function, we have:

\begin{matrix} Pr [σ_{U} (S) \geq T] \geq Pr [σ_{U} (S_{1}) \geq T] \geq 1 - δ \end{matrix}

We consider the following two cases:

Case 1: If the algorithm stops with the condition $| R_{2}^{t} | \geq N_{max}$, applying (27) to the set $S^{*}$ and $R_{2}$, we have:

$\begin{matrix} Pr [\hat{σ} (S^{*}) \leq (1 - ϵ_{1}) σ (S^{*})] & \leq exp \{\frac{- ϵ_{1}^{2} N_{max} OPT}{2 n}\} \end{matrix}$

(51)

$\begin{matrix} \leq exp \{\frac{- ϵ_{1}^{2} N_{max} k}{2 n}\} (Due to OPT \geq k) \end{matrix}$

(52)

$\begin{matrix} \leq exp \{\frac{- ϵ_{1}^{2} N_{max} t_{0}}{2 n}\} (Due to k \geq t_{0}) \end{matrix}$

(53)

$\begin{matrix} \leq δ_{2} / (\binom{n}{t_{max}}) \leq δ_{2} \end{matrix}$

(54)

From (26), we have:

$\begin{matrix} Pr [σ (S_{2}^{t}) \leq \frac{\hat{σ} (S_{2}^{t})}{(1 + ϵ_{1})}] & \leq exp \{\frac{- ϵ_{1}^{2} N_{max} σ (S_{2}^{t})}{(2 + \frac{2}{3} ϵ_{1}) n}\} \end{matrix}$

(55)

$\begin{matrix} \leq exp \{\frac{- ϵ_{1}^{2} N_{max} t}{(2 + \frac{2}{3} ϵ_{1}) n}\} (Due to σ (S_{2}^{t}) \geq t) \end{matrix}$

(56)

$\begin{matrix} \leq δ_{2} / (\binom{n}{t_{max}}) \leq δ_{2} \end{matrix}$

(57)

By the union bound, the events in (54) and (57) happen with probability at most $δ_{1} + δ_{1} = δ / 3$. Assuming that they do not happen, we have:

$\begin{matrix} σ (S_{2}^{t}) & \geq \frac{\hat{σ} (S_{2}^{t})}{1 + ϵ_{1}} \geq \frac{(1 - {(1 - \frac{1}{k})}^{t}) \hat{σ} (S^{0})}{1 + ϵ_{1}} \end{matrix}$

(58)

$\begin{matrix} \geq \frac{(1 - {(1 - \frac{1}{k})}^{t}) \hat{σ} (S^{*})}{1 + ϵ_{1}} \end{matrix}$

(59)

$\begin{matrix} \geq \frac{1 - ϵ_{1}}{1 + ϵ_{1}} (1 - {(1 - \frac{1}{k})}^{t}) σ (S^{*}) \end{matrix}$

(60)

$\begin{matrix} = ((1 - {(1 - \frac{1}{k})}^{t}) - \frac{2 ϵ_{1}}{1 + ϵ_{1}} (1 - {(1 - \frac{1}{k})}^{t})) σ (S^{*}) \end{matrix}$

(61)

$\begin{matrix} \geq ((1 - {(1 - \frac{1}{k})}^{t}) - \frac{2 ϵ_{1}}{1 + ϵ_{1}} (1 - \frac{1}{e})) σ (S^{*}) \end{matrix}$

(62)

$\begin{matrix} \geq (1 - {(1 - \frac{1}{k})}^{t} - ϵ) σ (S^{*}) \end{matrix}$

(63)

Hence, in this case the algorithm satisfies the approximation guarantee with probability at least $1 - \frac{δ}{3}$.
Case 2: If the algorithm stops at some iteration $i$, $i = 1, 2, \dots, i_{max}$, the condition in line 19 is satisfied at this iteration. Applying Lemmas 5 and 6, the following holds with probability at least $1 - 2 i_{max} δ_{2} = 1 - 2 δ / 3$:

$\begin{matrix} \frac{σ (S_{2}^{t})}{OPT} \geq \frac{F_{l} (S_{2}^{t}, R_{2}, δ_{2})}{F_{u} (S_{2}^{t}, R_{2}, δ_{2})} \geq 1 - {(1 - \frac{1}{k})}^{t} - ϵ \end{matrix}$

(64)

Combining the two cases above, the algorithm meets the approximation ratio condition with probability at least

1 - δ / 3 - 2 δ / 3 = 1 - δ

. ☐

5. Experiments

In this section, we implement and compare our algorithm

IGS

with other influence maximization methods in terms of overall influence, influence on priority nodes, running time and memory usage. The datasets are several networks with thousands to millions of nodes and edges (Table 1).

5.1. Experimental Settings

All experiments run on a Linux machine with 2× Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30 GHz and 4 × 16 GB DIMM ECC DDR4 @ 2400 MHz.

Algorithm comparisons. Since

IMP

is an extension of

IM

, we compare

IGS

algorithm with several state-of-the-art

IM

algorithms including:

DSSA

[15],

BCT

[2],

OPIM

-C [23]. In addition, we use the basic algorithm, Max degree (

Degree

), which is the common baseline for information diffusion problems. In

IMP

, two factors affect the solution in practice: the budget k for selecting seed nodes and the priority set of nodes U. As a result, these two factors also affect the algorithms. Based on this observation, we conduct experiments under two settings: varying k with T fixed, and varying T with k fixed.

The dataset. For the experiments, we choose five datasets from various sources: NetHept, NetPhy and DBLP are citation networks, Email-Enron is a communication network [15] and Twitter Retweet is an online social network [42]. They are summarized in Table 1. We use these datasets because they are popular in information diffusion problems and, in particular, are used to evaluate the state-of-the-art algorithms we compare against.

Parameter Settings. In each graph, every edge

e = (u, v) \in E

has a weight

w (u, v)

defined as

w (u, v) = \frac{1}{d_{i n} (v)}

where

d_{i n} (v)

is the in-degree of node v [14,15,20].
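The weighting rule above can be sketched in a few lines. The function name `weighted_cascade` and the edge-list representation are assumptions made for this example; the sketch also assumes the edge list contains no duplicate edges, since each listed edge counts towards the in-degree of its head.

```python
from collections import Counter


def weighted_cascade(edges):
    """Assign each directed edge (u, v) the weight 1 / in-degree(v).

    edges: iterable of (u, v) pairs without duplicates (assumption).
    Returns a dict mapping (u, v) to its weight.
    """
    edges = list(edges)
    in_degree = Counter(v for _, v in edges)  # number of edges entering v
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}


if __name__ == "__main__":
    print(weighted_cascade([(1, 3), (2, 3), (1, 2)]))
```

Under this rule the weights of the edges entering any node sum to 1.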

In the first setting, k takes the values 150, 160, 170, 180, 190 and 200, while T is fixed at 100 and the set U is generated with 200 nodes. In the second setting, k is fixed at 500, the set U contains about 1000 nodes, and T increases from 100 to 500. In all experiments, we keep

ϵ = 0.1

,

δ = 1 / n

following the settings used for

IM

algorithms [14,15,20] and

α = 0.01

.

5.2. Experimental Results

We run

IGS

and compare it with state-of-the-art algorithms such as

BCT

,

DSSA

,

OPIM-C

and

Degree

, then measure the influence spread on all nodes and on the priority set U,

U \subset V

. Results are shown in the following tables and figures.

The Influence. Figure 2 and Table 2 indicate that

IGS

outperforms the others in influencing the priority nodes beyond a given threshold T.

Figure 2 reports the influence values when k varies from 150 to 200, U contains 200 nodes and the threshold T is 100. The terms “infU” and “inf” denote the influence on the set U (

σ_{U} (S)

) and on all nodes (

σ (S)

), respectively. The algorithms behave differently across datasets. The red bars show that

IGS

influences U at roughly twice the threshold T on most datasets; the exception is RETWEET, where its influence on U is still higher than T. In contrast, the influence on U of the remaining algorithms fluctuates sharply across datasets. While

DSSA

and

BCT

exceed T on netHEPT and ENRON, their influence on U is considerably lower on the other datasets.

OPIM-C

and

Degree

often influence U far less than T. In addition, the

σ (S)

of

BCT

is the highest on netHEPT, whereas that of

IGS

stays at the top in all other cases. In general, the values of

σ (S)

of

DSSA

,

OPIM-C

and

Degree

are similar to each other.

Table 2 reports the experiment in which T increases from 100 to 500, k = 500 and U is enlarged to 1K nodes. This setting tests the case where U is large and the threshold T grows. The condition

k \geq T

has to be maintained, so we fix k = 500. The bold values show that, although U and S both become large and T increases gradually, the influence on U of

IGS

is always significantly higher than T, in some cases by more than a factor of ten.

DSSA

,

BCT

and

OPIM-C

also exceed the threshold T in many cases, but their values remain lower than T = 500 on netPHY, DBLP and RETWEET. The

σ_{U} (S)

of

Degree

is the lowest; in particular, it is only 22.77 on RETWEET.

From Figure 2 and Table 2, we can see

σ_{U} (S)

of

IGS

is significantly higher than T and generally better than that of the state-of-the-art algorithms. This is because

IGS

always prioritizes influencing U until the threshold T is exceeded and only then targets other nodes, even for large values of k, |U| and T. The other algorithms cannot always influence U beyond the desired threshold. On the whole, the state-of-the-art

IM

algorithms cannot influence the given priority set as well as

IGS

can.

Running time. Figure 3 compares the running time of these algorithms. It indicates that the running time of

IGS

is the lowest on the netHEPT, ENRON and netPHY datasets. However,

IGS

ranks in the top 3 on DBLP, while on the remaining dataset it has the highest running time for k = 150 and k = 160 but returns to the top 3 for the other values of the budget k.

IGS

takes only about 0.1 s to find the seed set in most cases except on RETWEET. The figure also gives information about the other algorithms. First,

BCT

runs significantly slower on netHEPT than the others. This method often ranks third or fourth on ENRON, DBLP and RETWEET. Second, the running times of

DSSA

and

IGS

look similar, while that of

OPIM

-C and

Degree

is usually higher than that of the above two algorithms. On the whole,

IGS

’s running time is the most stable and usually stays around the 0.1 s mark.

The running time of

IGS

is low and stable because of parallel programming, and because the algorithm spends most of its time finding

S_{1}

while the loop that computes

S_{2}

usually stops after 1–2 rounds. The TRR sampling technique also helps to quickly identify which seeds influence the priority set U.

Memory Usage. Table 3 reports the memory consumption of

IGS

and state-of-the-art methods including

DSSA

,

BCT

,

OPIM-C

and

Degree

. The smallest numbers are highlighted in bold while the largest ones are in red. The output shows that

IGS

outperforms the others, especially on small datasets with tens of thousands of nodes and tens to hundreds of thousands of edges, such as netHEPT, ENRON and netPHY.

IGS

also consumes far less memory than

OPIM-C

and

Degree

on larger datasets such as DBLP and RETWEET. While

IGS

uses only slightly more than 130 MB and 200 MB,

OPIM-C

and

Degree

use roughly three to four times as much on DBLP and RETWEET, respectively. In addition,

DSSA

also uses comparatively little memory in all cases.

BCT

is less stable than

IGS

and

DSSA

because it uses about as much memory as

DSSA

does on ENRON, netPHY, DBLP and RETWEET but uses by far the most memory on netHEPT.

The TRR sampling technique first focuses on finding the seeds that influence the priority set U; Algorithm 3 then explores other seeds to add to the seed set. Hence, Algorithm 3 uses less memory in its loop than the other algorithms, because it does not need to check whether a seed node influences the set U. Moreover, the condition

\frac{F_{l} (S_{2}, R_{2}, δ)}{F_{u} (S_{2}, R_{2}, δ)} \geq 1 - {(1 - \frac{1}{k})}^{t} - ϵ

allows

S_{2}

to be produced early, without waiting for the stopping condition of the loop.
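The early-stopping test on the ratio of the lower and upper bounds can be sketched as a small predicate. The function name `meets_stop_condition` and its argument names are hypothetical; the two bound values are assumed to have been computed elsewhere, as in Lemmas 5 and 6.

```python
def meets_stop_condition(f_lower, f_upper, k, t, eps):
    """Check F_l / F_u >= 1 - (1 - 1/k)^t - eps.

    f_lower, f_upper: precomputed lower and upper bound values (assumed inputs).
    k: seed budget, t: number of greedily selected nodes, eps: error parameter.
    """
    if f_upper <= 0:
        return False  # guard against a degenerate upper bound
    target = 1.0 - (1.0 - 1.0 / k) ** t - eps
    return f_lower / f_upper >= target


if __name__ == "__main__":
    print(meets_stop_condition(60.0, 100.0, k=10, t=10, eps=0.1))
    print(meets_stop_condition(50.0, 100.0, k=10, t=10, eps=0.1))
```

For k = t = 10 and ϵ = 0.1 the target ratio is about 0.551, so a ratio of 0.6 passes the check and a ratio of 0.5 does not.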

Finally, our algorithm,

IGS

, is designed to balance the goal of influencing the given priority set with the goal of spreading influence to the largest number of nodes. Hence, the running time, memory usage and influence of

IGS

are, in general, better and more stable than those of the other algorithms.

6. Conclusions

In this paper, we investigate the

IMP

problem, which is a variant of the

IM

problem with priority constraint that arises in a realistic scenario in which companies or organizations often prioritize influencing potential users during their viral marketing campaigns. The goal of the

IMP

problem is to select a seed set of k nodes that maximizes the influence spread while influencing a given priority set U by at least a threshold T. Although the objective function (the influence spread function) is still monotone and submodular, the state-of-the-art

IM

algorithms cannot be applied once the priority constraint is taken into account.

To address this challenge, we propose two algorithms with provable theoretical guarantees, called

IG

and

IGS

. We show that

IG

provides a

(1 - {(1 - \frac{1}{k})}^{t})

-approximation solution;

IGS

is an efficient randomized approximation algorithm based on sampling that returns a

(1 - {(1 - \frac{1}{k})}^{t} - ϵ)

-approximation solution with probability at least

1 - δ

with

ϵ > 0, δ \in (0, 1)

as input parameters of the problem. Experiments on real-world social networks show that our algorithm outperforms state-of-the-art

IM

algorithms including

DSSA

[15],

BCT

[2] and

OPIM

[23] in terms of influences, running time, and memory used.

In the future, we plan to scale our algorithm to billion-scale networks within acceptable running time. In addition, we will consider the problem with multiple priority user sets and thresholds.

Author Contributions

Methodology and writing—original draft preparation, C.V.P. and D.K.T.H.; investigation Q.C.V., A.N.S.; Conceptualization, H.X.H.; Data curation, Q.C.V. and A.N.S.; Investigation, C.V.P. and D.K.T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

There is no conflict of interest.

References

1. Kempe, D.; Kleinberg, J.M.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
2. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. A Billion-Scale Approximation Algorithm for Maximizing Benefit in Viral Marketing. IEEE/ACM Trans. Netw. 2017, 25, 2419–2429.
3. Li, Y.; Zhang, D.; Tan, K. Real-time Targeted Influence Maximization for Online Advertisements. PVLDB 2015, 8, 1070–1081.
4. Pham, C.V.; Thai, M.T.; Duong, H.V.; Bui, B.Q.; Hoang, H.X. Maximizing misinformation restriction within time and budget constraints. J. Comb. Optim. 2018, 35, 1202–1240.
5. Tong, G.A.; Wu, W.; Guo, L.; Li, D.; Liu, C.; Liu, B.; Du, D. An efficient randomized algorithm for rumor blocking in online social networks. In Proceedings of the 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9.
6. Budak, C.; Agrawal, D.; El Abbadi, A. Limiting the spread of misinformation in social networks. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, 28 March–1 April 2011; pp. 665–674.
7. Nguyen, H.T.; Cano, A.; Tam, V.; Dinh, T.N. Blocking Self-avoiding Walks Stops Cyber-epidemics: A Scalable GPU-based Approach. IEEE Trans. Knowl. Data Eng. 2020, 32, 1263–1275.
8. Nguyen, N.P.; Yan, G.; Thai, M.T. Analysis of misinformation containment in online social networks. Comput. Netw. 2013, 57, 2133–2146.
9. Zhang, H.; Alim, M.A.; Li, X.; Thai, M.T.; Nguyen, H.T. Misinformation in Online Social Networks: Detect Them All with a Limited Budget. ACM Trans. Inf. Syst. 2016, 34, 18:1–18:24.
10. Zhang, H.; Kuhnle, A.; Zhang, H.; Thai, M.T. Detecting misinformation in online social networks before it is too late. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016, San Francisco, CA, USA, 18–21 August 2016; pp. 541–548.
11. Ye, M.; Liu, X.; Lee, W. Exploring social influence for recommendation: A generative model approach. In Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, SIGIR ’12, Portland, OR, USA, 12–16 August 2012; pp. 671–680.
12. Chen, W.; Collins, A.; Cummings, R.; Ke, T.; Liu, Z.; Rincón, D.; Sun, X.; Wang, Y.; Wei, W.; Yuan, Y. Influence Maximization in Social Networks When Negative Opinions May Emerge and Propagate. In Proceedings of the Eleventh SIAM International Conference on Data Mining, SDM 2011, Mesa, AZ, USA, 28–30 April 2011; pp. 379–390.
13. Borodin, A.; Filmus, Y.; Oren, J. Threshold Models for Competitive Influence in Social Networks. In Proceedings of the Internet and Network Economics—6th International Workshop, WINE 2010, Stanford, CA, USA, 13–17 December 2010; pp. 539–550.
14. Tang, Y.; Shi, Y.; Xiao, X. Influence Maximization in Near-Linear Time: A Martingale Approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, 31 May–4 June 2015; pp. 1539–1554.
15. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–1 July 2016; pp. 695–710.
16. Chen, W.; Yuan, Y.; Zhang, L. Scalable Influence Maximization in Social Networks under the Linear Threshold Model. In Proceedings of the ICDM 2010, The 10th IEEE International Conference on Data Mining, Sydney, Australia, 14–17 December 2010; pp. 88–97.
17. Chen, S.; Fan, J.; Li, G.; Feng, J.; Tan, K.; Tang, J. Online Topic-Aware Influence Maximization. PVLDB 2015, 8, 666–677.
18. Aslay, Ç.; Barbieri, N.; Bonchi, F.; Baeza-Yates, R.A. Online Topic-aware Influence Maximization Queries. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, 24–28 March 2014; pp. 295–306.
19. Pham, C.V.; Duong, H.V.; Hoang, H.X.; Thai, M.T. Competitive Influence Maximization within Time and Budget Constraints in Online Social Networks: An Algorithmic Approach. Appl. Sci. 2019, 9, 2274.
20. Tang, Y.; Xiao, X.; Shi, Y. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, 22–27 June 2014; pp. 75–86.
21. Domingos, P.M.; Richardson, M. Mining the network value of customers. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, San Francisco, CA, USA, 26–29 August 2001; pp. 57–66.
22. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.M.; Glance, N.S. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429.
23. Tang, J.; Tang, X.; Xiao, X.; Yuan, J. Online Processing Algorithms for Influence Maximization. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD’ 18), Houston, TX, USA, 10–15 June 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 991–1005.
24. Nguyen, H.; Zheng, R. On Budgeted Influence Maximization in Social Networks. IEEE J. Sel. Areas Commun. 2013, 31, 1084–1094.
25. Pham, C.V.; Duong, H.V.; Thai, M.T. Importance Sample-Based Approximation Algorithm for Cost-Aware Targeted Viral Marketing. In Proceedings of the Computational Data and Social Networks—8th International Conference, CSoNet 2019, Ho Chi Minh City, Vietnam, 18–20 November 2019; pp. 120–132.
26. Li, X.; Smith, J.D.; Dinh, T.N.; Thai, M.T. TipTop: (Almost) Exact Solutions for Influence Maximization in Billion-Scale Networks. IEEE/ACM Trans. Netw. 2019, 27, 649–661.
27. Barbieri, N.; Bonchi, F.; Manco, G. Topic-aware social influence propagation models. Knowl. Inf. Syst. 2013, 37, 555–584.
28. Li, G.; Chen, S.; Feng, J.; Tan, K.-L.; Li, W.-S. Efficient Location-Aware Influence Maximization. In Proceedings of the 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, 16–19 April 2018; pp. 1569–1572.
29. Wang, X.; Zhang, Y.; Zhang, W.; Lin, X. Efficient Distance-Aware Influence Maximization in Geo-Social Networks. IEEE Trans. Knowl. Data Eng. 2017, 29, 599–612.
30. Bharathi, S.; Kempe, D.; Salek, M. Competitive Influence Maximization in Social Networks. In Proceedings of the Internet and Network Economics, Third International Workshop, WINE 2007, San Diego, CA, USA, 12–14 December 2007; pp. 306–311.
31. Liu, W.; Yue, K.; Wu, H.; Li, J.; Liu, D.; Tang, D. Containment of competitive influence spread in social networks. Knowl.-Based Syst. 2016, 109, 266–275.
32. He, X.; Song, G.; Chen, W.; Jiang, Q. Influence Blocking Maximization in Social Networks under the Competitive Linear Threshold Model. In Proceedings of the Twelfth SIAM International Conference on Data Mining, Anaheim, CA, USA, 26–28 April 2012; pp. 463–474.
33. Lu, W.; Bonchi, F.; Goyal, A.; Lakshmanan, L.V.S. The bang for the buck: Fair competitive viral marketing from the host perspective. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, 11–14 August 2013; pp. 928–936.
34. Chen, W.; Lakshmanan, L.V.S.; Castillo, C. Information and Influence Propagation in Social Networks; Synthesis Lectures on Data Management; Morgan & Claypool Publishers: San Rafael, CA, USA, 2013.
35. Bozorgi, A.; Samet, S.; Kwisthout, J.; Wareham, T. Community-based influence maximization in social networks under a competitive linear threshold model. Knowl.-Based Syst. 2017, 134, 149–158.
36. Tsang, A.; Wilder, B.; Rice, E.; Tambe, M.; Zick, Y. Group-Fairness in Influence Maximization. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, 10–16 August 2019; pp. 5997–6005.
37. Farnadi, G.; Babaki, B.; Gendreau, M. A Unifying Framework for Fairness-Aware Influence Maximization. In Proceedings of the Companion of The 2020 Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 714–722.
38. Stoica, A.; Han, J.X.; Chaintreau, A. Seeding Network Influence in Biased Networks and the Benefits of Diversity. In Proceedings of the WWW ’20: The Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2089–2098.
39. Nguyen, L.N.; Zhou, K.; Thai, M.T. Influence Maximization at Community Level: A New Challenge with Non-submodularity. In Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019, Dallas, TX, USA, 7–10 July 2019; pp. 327–337.
40. Borgs, C.; Brautbar, M.; Chayes, J.T.; Lucier, B. Maximizing Social Influence in Nearly Optimal Time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, OR, USA, 5–7 January 2014; pp. 946–957.
41. Chung, F.R.K.; Lu, L. Concentration Inequalities and Martingale Inequalities: A Survey. Internet Math. 2006, 3, 79–127.
42. Rossi, R.A.; Ahmed, N.K. The Network Data Repository with Interactive Graph Analytics and Visualization; AAAI: Palo Alto, CA, USA, 2015.

Figure 1. A toy example shows the difference between the influence maximization and our proposed problem.

Figure 2. Comparisons of Influence Spreading with

k = 100 \to 500

, T = 100 and U size = 200

Figure 3. Comparisons about Runtime (s) with k varies from 150 to 200 between

IGS

and the others.

Table 1. Dataset’s statistics.

Database	#Nodes	#Edges	Types	Avg. Degree
netHEPT [15]	15 K	59 K	directed	4.1
ENRON [15]	37 K	184 K	directed	5
netPHY [15]	37 K	181 K	directed	13.4
DBLP [15]	655 K	2 M	directed	6.1
TWITTER RETWEET [42]	1 M	2 M	directed	4

Table 2. Comparisons about

σ (S)

and $σ_{U} (S)$ between $IGS$ and the others with k = 500, U size = 1 K and

T = 100 \to 500

.

	Dataset
	T		NetHept	Enron	netPHY	DBLP	RETWEET
$IGS$	100	$σ (S)$	5666.16	14,267.40	1865.92	54,033.50	17,307.70
	100	$σ_{U} (S)$	1482.04	1075.77	1192.84	1271.62	511.08
	200	$σ (S)$	5581.34	14,162.20	1805.26	53,553.90	18,581.50
	200	$σ_{U} (S)$	1478.93	1079.74	1175.32	1267.52	491.35
	300	$σ (S)$	5645.40	14,284.80	1773.33	53,240.50	19,459.10
	300	$σ_{U} (S)$	1476.08	1074.30	1153.32	1264.79	492.39
	400	$σ (S)$	5640.21	14,196.50	1688.53	52,918.80	18,832.20
	400	$σ_{U} (S)$	1468.48	1075.68	1125.69	1260.31	490.46
	500	$σ (S)$	5039.45	14,245.50	1593.66	52,130.90	228,801.00
	500	$σ_{U} (S)$	1238.54	1079.28	1104.20	1252.70	994.40
DSSA		$σ (S)$	4098.63	9960.35	3230.27	58,197.7	38,253.7
DSSA		$σ_{U} (S)$	1093.7	857.608	174.479	474.635	168.087
BCT		$σ (S)$	11,088.10	19,901.70	6675.95	117,197.00	77,316.90
BCT		$σ_{U} (S)$	1280.54	1701.60	386.49	474.635	159.77
OPIM-C		$σ (S)$	3779.09	19,326.3	6262.5	112,334	72,026.1
OPIM-C		$σ_{U} (S)$	600.93	894.18	194.04	459.801	173.41
Degree		$σ (S)$	3824.44	19,349.10	6345.86	114,249	73,936
Degree		$σ_{U} (S)$	292.82	779.84	164	260.94	22.77

Table 3. Comparison of memory usage (MB) between $IGS$ and the other algorithms for budgets k from 150 to 200.

Dataset	Algorithm	k = 150	k = 160	k = 170	k = 180	k = 190	k = 200
NetHEPT	IGS	9.90	9.90	9.90	9.89	9.89	9.95
	DSSA	22.84	22.84	22.84	22.84	22.84	22.84
	BCT	1023.79	1017.52	1021.60	1012.21	1020.18	1020.74
	OPIM-C	47.76	47.91	48.03	48.11	48.30	48.46
	Degree	49.14	49.18	49.48	49.68	49.86	50.13
ENRON	IGS	16.82	16.79	16.81	16.81	16.82	16.82
	DSSA	30.48	28.07	28.07	28.07	28.07	30.48
	BCT	30.35	30.35	30.39	30.39	30.39	30.39
	OPIM-C	27.16	27.20	42.00	27.22	27.25	27.30
	Degree	27.98	28.08	43.77	28.19	28.27	28.41
NetPHY	IGS	15.18	15.18	15.18	15.18	15.18	15.04
	DSSA	52.12	52.12	52.12	52.12	38.50	52.14
	BCT	34.82	34.82	34.82	34.82	34.82	34.80
	OPIM-C	87.88	88.39	88.92	89.31	90.26	90.51
	Degree	92.26	92.71	93.33	93.88	94.68	94.98
DBLP	IGS	138.66	138.66	138.66	138.66	138.66	138.66
	DSSA	152.90	152.87	152.87	152.91	152.91	152.83
	BCT	162.88	162.87	162.87	162.88	162.88	162.89
	OPIM-C	475.05	373.72	373.78	373.95	477.18	477.51
	Degree	500.87	395.00	394.26	395.35	504.52	505.26
RETWEET	IGS	214.67	214.67	214.67	214.67	214.67	214.67
	DSSA	253.14	253.14	253.14	253.14	253.14	253.14
	BCT	282.50	282.50	282.50	282.47	282.50	282.48
	OPIM-C	877.31	874.20	722.91	876.99	886.78	877.80
	Degree	918.53	916.23	756.93	920.00	930.33	921.95

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
