Inﬂuence Maximization with Priority in Online Social Networks

: The Inﬂuence Maximization ( IM ) problem, which ﬁnds a set of k nodes (called seedset ) in a social network to initiate the inﬂuence spread so that the number of inﬂuenced nodes after propagation process is maximized, is an important problem in information propagation and social network analysis. However, previous studies ignored the constraint of priority that led to inefﬁcient seed collections. In some real situations, companies or organizations often prioritize inﬂuencing potential users during their inﬂuence diffusion campaigns. With a new approach to these existing works, we propose a new problem called Inﬂuence Maximization with Priority ( IMP ) which ﬁnds out a set seed of k nodes in a social network to be able to inﬂuence the largest number of nodes subject to the inﬂuence spread to a speciﬁc set of nodes U (called priority set ) at least a given threshold T in this paper. We show that the problem is NP-hard under well-known IC model. To ﬁnd the solution, we propose two efﬁcient algorithms, called Integrated Greedy ( IG ) and Integrated Greedy Sampling ( IGS ) with provable theoretical guarantees. IG provides a (cid:16) 1 − ( 1 − 1 k ) t (cid:17) -approximation solution with t is an outcome of algorithm and t ≥ 1. The worst-case approximation ratio is obtained when t = 1 and it is equal to 1/ k . In addition, IGS is an efﬁcient randomized approximation algorithm based on sampling method that provides a (cid:16) 1 − ( 1 − 1 k ) t − (cid:101) (cid:17) -approximation solution with probability at least 1 − δ with (cid:101) > 0, δ ∈ ( 0, 1 ) as input parameters of the problem. We conduct extensive experiments on various real networks to compare our IGS algorithm to the state-of-the-art algorithms in IM problem. The results indicate that our algorithm provides better solutions interns of inﬂuence on the priority sets when approximately give twice to ten times higher than threshold T while running time, memory usage and the inﬂuence spread also give considerable results compared to the others.


Introduction
Presently, Online Social Networks (OSNs) have become an important platform in communication as well as e-commerce.Companies and businesses have leveraged a rapid spread of information thanks to the "word of mouth" effect among friends in social networks as a powerful tool for viral marketing.For instance, companies can provide some ones with free samples over an OSN so that much more people may know about their products and they have more chances to sell them.Influence Maximization (IM) problem [1], a key problem in viral marketing, has been extensively studied for this decade due to its tremendous value in business, viral marketing and influence propagation.Basically, IM aims to find some nodes (called seedset) in a social network to inject opinion, innovation or influence that can effect the largest the number of nodes.Kempe et al. [1] first studied IM as an optimization problem combined with two well-known models, Independent Cascade (IC) and Linear Threshold (LT).Since IM is NP-hard, they designed a native greedy algorithm that returned an (1 − 1/e)-approximation solution.The research shows that IM is not only a potential commercial role in viral marketing [2,3] but also a foundation of various applications in many fields such as epidemics control in social network [4][5][6][7][8], social network monitoring [9,10], recommendation system [11], etc.Hence, IM has been extensively studied recently [2,4,[12][13][14][15][16][17][18][19].
Although IM has a lot of great applications in viral marketing, previous studies ignored considering the impact on priority users who could play an important role for effectiveness of viral marketing campaigns.In fact, companies often prioritize specific potential customers, who are financially competent or suitable for their products.For examples, if a company produces baby diapers, they tend to introduce the product to married women aged 20 to 45. Supposing that they have some data about user accounts on a social network, hence they launch a promotion with suitable amount of gifts to married female users via this social network.If we only care about the number of influenced individuals, as in the case of IM, we will not evaluate the impact to the potential users and lead to wrong selection of a seed set. Figure 1 shows an example.This network contains 8 nodes and 9 edges, the priority set is {b, d} and the weight of each edge (or influence probability) is assigned to 1. Considering the case when the budget k = 1 (number of seed nodes), the optimal solution of IM is { f } influences to 6 nodes including { f , d, g, c, e, h} except b.Hence, IM cannot take effect to all priority nodes.The solution must be {a} that has the total influence is only 5. Motivated by such interesting scenarios, in this paper we investigate the Influence Maximization with Priority (IMP) problem, which takes into account the priority constraint for influence process.Given a social network G = (V, E), a priority set U ⊂ V, a budget k and a priority threshold T, (T ≤ k), the goal is to find the seed set S sized at k so that it influences to U at least T and the influence of the cascade is maximized.In fact, IMP is more suitable than IM.Besides, it generalizes IM problem.Nevertheless this problem faces with complicated challenges caused by the constraint of priority.To address this problem, we propose two approximation algorithms, Integrated Greedy (IG) and Integrated Greedy-based Sampling (IGS), with provable theoretical guarantees.IG meets the theoretical guarantee based on a modification of the natural greedy algorithm while IGS is an efficient randomized approximation algorithm based on sampling method [13][14][15]20].This algorithm combines two novel techniques.Firstly, we propose Targeted Reverse Reachable (TRR) concept by modifying the Reverse Reachable Sampling (RR) technique [13][14][15]20] to estimate influence from a seed set to a given priority set.Secondly, we develop a new strategy to select a set of seeds in accordance with the priority constraint and set the number of samples to give a theoretical guarantees.Because IMP is a separate case of IM, we have built extensive experiments on various real networks to compare our IGS algorithm to the state-of-the-art algorithms for IM problem such as DSSA [15], BCT [2], OPIM about the influence on a given priority set, running time and memory used while the influence spread approximations are ensures as in IM.
Our contributions are summarized as follows:

•
We propose the Influence Maximization with Priority (IMP) problem that considers priority constraint in Influence Maximization (IM) problem.It means we expand the IM by adding a constraint to influence on a given set of users.IMP aims to find the seed set S with size k so that total influence of priority users is at least a given threshold T, (k ≥ T) and still maintain the influence of cascade maximized.

•
We propose two approximation algorithms, IG and IGS, for the IMP problem.IG algorithm provides an approximation ratio of 1 − (1 − 1 k ) t , where t ≥ k − T is an output of the algorithm.In addition, IGS is a randomized approximation algorithm providing an approximation ratio of 1 − (1 − 1 k ) t − with probability at least 1 − δ, where > 0, δ ∈ (0, 1) are input parameters and t is an output of algorithm.

•
We conduct extensive experiments on various real networks such as netHEPT, netPHY, Email-Enron, DBLP, and Twitter ReTweet.The results indicate that our algorithm, IGS, often outperforms state-of-the-art IM algorithms in terms of influence, running time and memory used.In particular, IGS provides the solution which ensures that the influence on the priority set is approximately from twice to 10 times greater than its threshold T while still maintains influence spread approximations as in IM algorithms.Further, we also demonstrate that IGS is faster and uses lower memory than the others in a lot of cases.On the whole, although IGS has to care about how influences to a target given users, IGS still gives considerable fast runtime, low memory used and high maximized influence on all nodes such as state-of-the-art algorithms such as DSSA, BCT, OPIM-C.It proves that IGS has been very well designed.
Related work.Kempe et al. [1] first studied the Influence Maximization (IM) problem inspired by exploiting the influence among users in social networks for viral marketing [21].They formulated IM as a discrete optimization problem under two classical information diffusion models, Independent Cascade (IC) and Linear Threshold (LT).They proved that IM could be approximated within a ratio of 1 − 1/e + for any ∈ (0, 1) and proposed a greedy algorithm that provided an approximation ratio of 1 − 1/e − for > 0. Later, Chen et al. [12,16] continued to study IM and proved that to calculate exactly the influence spread of a seeding set was #P-Hard.Hence although many heuristics algorithms have been proposed to solve this problem in large networks, they still have failed to retain the approximation ratio of 1 − 1/e − and have provided a low quality solutions such as the cost-effective lazy-forward heuristic (CELF) proposed by Leskovec et al. [22] which is based on improving greedy algorithm to get 700 times faster than the greedy algorithm with Mote-Carlo simulation; a fast heuristics algorithm called PMIA proposed by Chen et al. [12] which constructs a directed acyclic graph to estimate the influence under IC model or the algorithm proposed by the authors in [16] which uses a local directed acyclic graphs (LDAG) to calculate the local influence of nodes under LT model.To keep the 1 − 1/e − ratio, research on the approximation approach continues to be explored.Borgs et al. [13] first presented an (1 − 1/e − )-approximation algorithm with probability at least 1 − δ in O(kl 2 (m + n) log 2 n/ 3 ) time complexity by introducing Reverse Influence Sampling (RIS) model.This model has formed the foundation for further algorithm development.[14,15,20,23].
From then on, many works expanded IM in contexts of viral marketing.Nguyen et al. [24] investigated the Budged Influence Maximization (BIM) problem which considered the cost of selecting a node and proposed a (1 − 1/ √ e − ) approximation algorithm.The authors in [2] studied the a generalization of IM and BIM problems, called Cost-aware Targeted Viral Marketing (CTVM).In this work, each node u had an arbitrary cost c(u) and a benefit b(u) and the goal of CTVM was to select a seed set within a given budget so that the total benefit was maximized.We believe that this is the closest problem to our work.In CTVM problem, we can set parameters that maximize the influence on a given target set of users but cannot simultaneously maximize the influence of the others as in our problem.Later, several works improve the approximation as well as the scalability of CTVM algorithms [25,26].
Moreover, there are also many variants of IM problem that were studied.Some works studied the constraints of IM such as [17,18,27], in which edges were associated with a topic influence weight.These problems aimed to find a set of k users that maximized influenced users according to a topic query.However, the proposed algorithms did not provide any theoretical guarantee.Li et al. [28] proposed the Location-aware Influence Maximization (LIM) problem with the goal was to select the k-seed set so that the number of influenced nodes in the given query region was maximized.[29] investigated the Distance-aware Influence Maximization (DAIM) problem which considered the role of distance between users and the promoted location in seed selection.They extended a RIS process model and provided an unbiased estimator for the DAIM problem.
Besides, some works investigated the problem of Competitive Influence Maximization (CIM), which considered the context of IM under the competition of many rivals.Bharathi et al. [30] first formulated the CIM problem under a new competitive propagation model which was an extension of IC model.Chen et al. [12] investigated CIM under the combating with negative opinions based on an assumption that negative information was often more attractive than official information.Some authors considered the problem under many different cases in viral marketing, such as proposing a distance-aware problem [31], expanding the LT model to reflect competition [13,[32][33][34], proposing a heuristic algorithm [35], etc.
Recently, some authors studied the selection of seed nodes in a social network to influence groups of users or communities instead of individuals [36][37][38][39].They argue that in real-world scenarios, creating impact on groups is more beneficial than the individuals in a network.Tsang et al. [36] investigated the Fairness Group Maximization problem with two fairness criteria including maximin fairness and diversity.While the maximin fairness aimed to maximize the minimum influence nodes of any per their population, the criterion of diversity was an alternate fairness concept by extending the notion of individual rationality to group rationality.They proposed an approximation algorithm based on multi-submodular objective function processing techniques.More recent, the authors in [37] proposed exact algorithms for fairness group influence with multiple criteria based on mix integer linear programming formulation on a specific set of sample graphs under IC model.In [38], the authors characterize the intricate relationship between diversity and efficiency, which sometimes may be at odds but may also reinforce each other.Nguyen et al. [39] considered the Influence Maximization problem at the Community level problem, which found seed set of k nodes that influenced to largest number of communities.They showed that the objective function was neither sub-modular nor super-modular and proposed some approximation algorithms with provable guarantees.Different to our studied problem in this paper, these studies did not address the priority set in influence maximization context.Hence the proposed algorithms cannot be applied to the IMP problem.
Organization.The rest of the paper is organized as follows: Section 2 presents information diffusion model and problem definitions.Sections 3 and 4 present our proposed Integrated Greedy and Integrated Greedy-based Sampling algorithms for IMP problem with the theoretical analysis.Experimental results are shown in Section 5.In Section 6 we discuss the future work and conclude this paper.

Model and Problem Definition
In this section, we introduce about network model and the well-known Independent Cascade (IC) diffusion information model [1].Under IC model, we formally define the Influence Maximization with Priority (IMP) problem.

Graph Notation and Independent Cascade Model
Let G = (V, E) be a directed graph representing a social network with a node set V and a directed edge set E, |V| = n and |E| = m.Let N in (v) and N out (v) be two sets of in-neighbors and out-neighbor of a node v, respectively.The notations of S and S * represent to a seed set that is a solution and an optimal solution of IMP, respectively.We also note OPT = σ(S * ) is the influence of an optimal solution.
In Independent Cascade (IC) model, each edge e = (u, v) ∈ E has an influence probability p(u, v) ∈ (0, 1) that represents the information transmission from u to v.Each node v ∈ V has two possible states, active and inactive.Given a seed set S ⊆ V, the diffusion process from S happens in discrete steps t = 0, 1, . .., as follow:

•
At step t = 0, all nodes in S is activated.

•
At step t ≥ 1, for an activated node u in previous steps, it has a single chance to activate each inactive neighbour v with the successful probability p(u, v).An activated node remains active till the end of the diffusion process.

•
The propagation process ends when no more node is activated.
Kempe et al. [1] show that IC model is equivalent to live-edge model and estimating the quantity of influence nodes can be done as follows.We first generate a sample graph g from original graph G by selecting each edge e = (u, v) ∈ E, independently, with probability p(u, v), and no select edge (u, v) with probability 1 − p(u, v).The probability that a realization g can be generated from G (denoted as In this equation, E(g) is the set edge of g.The number of sample graphs is 2 |E| .The influence spread of a seed set S in G is calculated as follows: where R(g, S) denotes the set of reachable nodes from S in g.For a set of priority nodes U, the influence spread of S to U is calculated as follows: where R g (S → P) denotes the set of nodes in U that can reach from S in g.Kempe et al. [1] also show that, σ(•) is a monotone and sub-modular function, i.e, for any A ⊂ V, and v / ∈ V \ B, we have: and for any A ⊆ B ⊂ V, and v / ∈ V \ B, we have: We also easy to see that σ U (•) is a monotone and sub-modular function.

Problem Definition
We investigate Influence Maximization with Priority (IMP) defined as follows: Definition 1 (IMP problem).Given a graph G = (V, E) under IC model, a positive integer k (budget), the priority set U ⊂ V, and the threshold T with T ≤ k, T ≤ |U|.IMP problem asks to find the seed set S ⊂ V, with |S| ≤ k and σ U (S) ≥ T so that influence spread, σ(S), is maximized, i.e, find S that is the solution to the following optimization problem: IMP becomes IM problem when U = ∅.Therefore, IM is a special case of IMP and IMP is also NP-hard.In addition, the calculation of the influence function from the seed set is proven to be #P-hard [12].Thus finding the solution to the problem within the time allowed is very challenging.

Integrated Greedy Algorithm
In this section, we first propose Integrated Greedy (IG) Algorithm which is well-known to resolve monotone and sub-modular problems that ensures an lower-bounded of optimization solution.The details of algorithm is described in Algorithm 1. Assume S 1 is the solution of the problem that finds the minimum seed nodes such that the influence on the priority set is greater than threshold T, and S 2 is a solution of IM problem.The main idea of this algorithm is to modify the native greedy algorithm [1] by combining two above solutions.

Algorithm 1: Integrated Greedy (IG) algorithm
The algorithm is divided into two main phases.In the first phase, it tries to find a solution S 1 by a greedy strategy (line 2-4).In each iterator, the algorithm chooses a node u with largest influence incremental to set U into S 1 (line 3-4) until the σ U (S 1 ) ≥ T. Since T < k, |S 1 | ≤ T < k.Denote t = k − T as the remaining budget (line 6).The algorithm next finds the candidate solution S 2 for IM with the remaining budget t by using a greedy method in the second phase (line 6-10).In each iterator i, it selects a node u with largest influence incremental (line 7).If u already belongs to S 1 , the algorithm increases t by 1 (line 8-9).This phase ends when the remaining budget t is exhausted (line 6).Finally, the algorithm returns the solution S which unites S 1 and S 2 .It is easy to confirm that |S| = k, and t > T − k ≥ 1 since k > T. Theorem 1 shows the approximation guarantee of IG algorithm.Theorem 1. IG algorithm returns (S, t), where S is a feasible solution and t ≥ 1, satisfies: The worst-case approximation ratio is obtained when t = 1 and it is equal to 1/k.
Proof.Denote S * IM = {s 1 , s 2 , . . ., s k } is an optimal solution of IM problem for input data of Algorithm 1 (the graph G and budget k).Obviously, we have σ(S * IM ) ≥ σ(S * ).After ending the second phase, assume that S 2 = {u 1 2 , u 2 2 , . . ., u t 2 }, S i 2 = {u 1 2 , u 2 2 , . . ., u i 2 }, and S 0 2 = ∅.In the second phase, the algorithm repeatedly selects a node u of which incremental influence gain is largest and due to the function σ(•) is monotone and sub-modular [1], so we have: Therefore, for any i = 0, . . ., t − 1, we have Minus two inequality terms to σ(S * IM ), we have: Rearrange the terms of the above inequality, we have Together with the fact that S 0 2 = ∅ and σ(∅) = 0, the above inequality implies Since σ U (S 1 ) ≥ T and S = S 1 ∪ S 2 , S is feasible solution of IMP, and which proves the theorem!Although Algorithm 1 can provide an approximation guarantee, but it cannot work with real-social networks because the calculation of the influence function σ(S) is #P-hard under IC model [12].To overcome this challenge, we propose a randomize algorithm with provable approximation guarantee based on combining IG with a sampling technique.

Sampling Algorithm with Provable Guarantees
In this section, we present an efficient algorithm for IMP problem called Integrated Greedy Sampling (IGS) algorithm that can provide an guarantee theoretical.In addition, we show that our algorithm can also be applied to large networks in experiments.

Estimator of Influence Functions
Firstly, we recap the concept of Reachable Reverse (RR) set [40] to estimate influence function σ(•).Base on that, we propose the concept of Targeted Reachable Reverse (TRR) set to estimate influence function σ U (S).Then we propose IGS algorithm and provide theoretical analysis based on statistical evidence.
Definition 2 (Reachable Reverse (RR) set [40]).Given a graph G = (V, E) under IC model.A random RR set R j is generated from G by: 1.
Picking a source node u with probability 1 n .

2.
Generating a sample graph g from G, and returning R j as nodes which can be reached from u in g.
For a random RR set R j , define a random variable X g (S) = min{1, |R g ∩ S|}.Borgs et al. [40] show that RR samples can be used to estimate the influence function by applying the following Lemma.
Lemma 1.For any set of nodes S ⊆ V, we have Given a set of RR set R, and a set node S, we can approximate the value of σ(S) by σ(S) defined as follow: Generating RR sets can be accomplished by using IM algorithms in [13][14][15]20,23].The common algorithm for generating RR set R j is described in Algorithm 2. This algorithm first selects a source node u with a probability 1  n to add into R j .The algorithm uses a queue Q to store the visited nodes.Initially, u is also added to Q.The algorithm next retrieves each node v in Q and picks an incoming node x with probability p(x, v) (line 6).If successful, it adds x in to Q and R j .This process takes place until the set Q is empty.

Algorithm 2: Generating RR sample under IC model
With probability p(x, v): Q.push(x) and R j ← R j ∪ {u} 1.
Picking a source node u ∈ U with probability 1 |U| .

2.
Generating a sample graph g from G, and returning R U g as nodes which can be reached from u in g.
We define a random variable Y g (A) = min{1, |R U g ∩ S|}.Similar to Lemma 1, Lemma 2 shows that we can use the value of Y g (S) to estimate function σ U (S).
Lemma 2. For any set of nodes S ⊆ V, we have σ is a TRR sample with a source node u for the sample graph g, we have: The transition from the second equality to the third equality comes from the definition of R U g (u) and from the third to the fourth then to the fifth is caused by the distribution of choosing a node u as a source node.
Given a set of TRR samples R and a set node S, we define and an approximation value of σ U (S) as follow: From Lemma 2, we can give a good approximation of σ U (•) when the number of TRR samples is large enough.We can re-use Algorithm 2 to generate a TRR set R U j by a modification.We replace line 1 in the algorithm by picking source node u ∈ U with probability 1  |U| and leave the rest as is.

Algorithm Description and Theoretical Analysis
Algorithm description.The algorithm is detailed in Algorithm 3. It generates the set of N U TRR sets R 1 , and set two candidate solutions S 1 , S 2 empty at first.Then the body of the algorithm divides into two phases.In phase 1, it finds a candidate solution S 1 with minimum-size so that σ(S) ≥ (1 + α)T by using a greedy strategy with potential function σ over R 1 .In each iterator, it selects a node u with maximal incremental value of the potential function (line 4) until σ(S) ≥ (1 + α)T.The candidate solution S 1 obtained by this phase satisfies the priority constraint, σ U (S 1 ) ≥ T with probability at least 1 − δ (Lemma 4).
The phase 2 selects a candidate solution S 2 with the remaining budget (t = k − |S 1 |) so that the influence spread σ(•) is maximized.In this phase, it first sets the parameters 1 , t max , N max and generates N 1 set of RR samples R 2 .The main of this phase operates in several iterators (line [12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27] until meeting the stopping condition (line 22).In each iterator, it finds a candidate solution S 2 by a greedy strategy.It picks a node u with maximal incremental of approximation influence σ(•) over R 2 (line 12) until t nodes are selected.Similar to IG algorithm, if u already belongs to S 1 , the algorithm increases t by 1.After that, the algorithm checks the quality of candidate solution S 2 (line 17).It calculates F l (S 2 , R 2 , δ)-a lower-bounded of σ(S 2 ), and F u (S 2 , R 2 , δ)-an upper-bounded of an optimal solution respect to IMP problem.These functions ensure the statistical criterion, which are claimed in the Lemmas 5 and 6.If solution S 2 meets the approximation condition (line 19), the algorithm returns S 2 .If not, it moves to the next iterator and stops when the number of TRR samples is at least N max (line 21).

Algorithm 3: Integrated Greedy -based Sampling (IGS) algorithm
, N max ← return S 2 Fortunately, the sequence of random variables X g (S) and Y g (S) constructed from the RR and TRR samples can be shown to form a martingale.For any random variable X g (S) ∈ [0, 1], let a random variable . .be a form of martingale [41].Similarly, Y g is also a form of martingale.Therefore, the following concentration inequality [41] applies: and where Var[•] denotes the variance of a random variable.Then, for any λ, we have: Apply this Lemma with Similarly, −M 1 , . . ., −M i , . . .also form a Martingale, so apply Lemma 3, we have: Let λ = µ|R| and put it in two above inequalities, we have: The following Lemma shows the lower-bound of the influence of candidate solution S 1 .
Lemma 6 (Upper-bound).For any δ ∈ (0, 1), in an iterator t of Algorithm 3, denote R t 2 is a set of RR samples with N t = |R t |, S t 2 is a candidate solution of phase 2, and Let λ = 2cµN t , apply inequality (25), we have: Therefore, the following event happens with the probability at least 1 − δ: Solve the above quadratic inequality for µ, we obtain upper-bound for µ is, Denote S 0 = arg max S,|S|≤k σ(S), where σ is calculated over R 2 t .Since the phase of Algorithm 3 selects a candidate solution S t 2 by a greedy strategy.Similar to Theorem 1, we have: n into (47) and combine it with (48), we have: which completes the proof.
Based on above theoretical analysis, the following Theorem Approximation guarantee of IGS algorithm.
Theorem 2. The Algorithm 3 provides a solution S and an integer t, satisfies: and Lemma 4, we have: We consider two following cases: Case 1: If the algorithm stops with the condition |R t 2 | ≥ N max , apply (26) with set S * and R 2 , we have: From ( 27), we have: Apply an union probability that the events (54) and (57) happen with the probability at most Assume that they do not happen, we have: Hence, in this case the algorithm satisfies approximation guarantee with probability at least 1 − δ 3 .

Experiments
In this section, we implement and compare our algorithm IGS to other influence maximization methods about the influence in general, the influence on priority nodes, running time and memory usage.The dataset includes several network databases with thousands or even millions nodes and edges (Table 1).37 K 184 K directed 5 netPHY [15] 37 K 181 K directed 13.4 DBLP [15] 655 K 2 M directed 6.1 TWITTER RETWEET [42] 1 M 2 M directed 4
Algorithm comparisons.Since IMP is an expansion of IM, we compare IGS algorithm with several state-of-the-art IM algorithms including: DSSA [15], BCT [2], OPIM-C [23].In addition, we use the basic algorithm, Max degree (Degree), which is the common baseline for information diffusion problems.In IMP, there are two factors that impact the solution in practice: the budget (k) of selecting seed node and the priority set of nodes (U).As a result, these two factors also affect the algorithms.From the above observation, we conduct experiments under two settings: varies k and fixed T; varies T and fixed k.
The dataset.For experimental purpose, we choose 5 types of databases from various resources: NetHept, NetPhy, DBLP are citation networks, Email-Enron is communication network [15] and Twitter Retweet is online social networks [42].The brief of these ones are described on Table 1.These databases are experimented because they are popular in information diffusion problems, especially used in the state-of-the-art algorithms what we are comparing.
Parameter Settings.Graphs are formatted as each edge e = (u, v) ∈ E has the weight w(u, v) where d in (v) is the in-degree of node v [14,15,20].
For the first case, k is assigned with 150, 160, 170, 180, 190 and 200, respectively, while T is fixed at 100.In addition, set U is generated with 200 nodes.With the second case, the value of k is fixed at 500.U set includes about 1000 nodes.We change the value of T increasing from 100 to 500.In all experiments, we keep = 0.1, δ = 1/n according setting for IM algorithms [14,15,20] and α = 0.01.

Experimental Results
We install IGS to compare with state-of-the-art algorithms such as BCT, DSSA, OPIM − C and Degree then calculate the spread of influence on all nodes and to U, the priority set, U ⊂ V. Results are shown in following tables and figures.
The Influence.The Figure 2 and the Table 2 indicate IGS outperforms the others when influencing to priority nodes by a given threshold T.
The above figure gives information about the influence values in case k changes from 150 to 200, U includes 200 nodes and the threshold T is 100.The terms "infU", "inf " mean the influences to set U (σ U (S) ) and to all nodes (σ(S)), respectively.These algorithms output differently on various databases.Looking at red bars, we can see IGS approximately affects the set U twice the value of the threshold T on most databases except Re-Tweet but still higher than T .Conversely, the influence on U of the remaining sharply fluctuate according to the databases.While DSSA and BCT influence on U over T with netHEPT and ENRON, they work quite low with the others.OPIM − C and Degree often affect U much lower than T. Besides, the σ(S) of BCT is highest on netHEPT whereas the one of IGS keeps at top in all other cases.In general, the values of σ(S) of DSSA, OPIM − C and Degree have similarities with each others.Besides, Table 2 describes the experiment while T comes from 100 to 500, k = 500 and enlarge U up to 1K nodes.This setting is to check the case when U is large and when the threshold T is incremental.Certainly, the condition that k ≥ T has to be maintained so we fixed k = 500.Looking at bold values, we can see although U and S both become large and T increments gradually, the influence on U of IGS is always significantly higher than T, even up to more than ten times.DSSA, BCT and OPIM − C also give the outputs over threshold T in many cases, they still have values lower than T = 500 on netPHY, DBLP and RETWEET however.The σ U (S) of Degree is lowest, especially, is only 22.77 on Re-Tweet.
From Figure 2 and Table 2, we can see σ U (S) of IGS is significantly higher than T and produces better results than the state-of-the-art algorithms.This is because IGS always prioritizes affecting U until over the threshold T then affects other nodes as well even with large values of k, U size and T. The other algorithms show that they are not always possible to influence U to exceed the desired threshold.On the whole, the state-of-the-art IM algorithms cannot influence the given priority set as well as IGS can.running time on the remaining of the dataset to find 150 and 160 seed sets but return to top 3 at the other values of budget k.IGS only takes about 0.1 s to find out the seed set in most cases except RETWEET.Besides, the figures also give information about the other algorithms.First, BCT runs significantly slow on netHEPT than the others.This method often stays at top 3 or top 4 on ENRON, DBLP and RETWEET.Second, running time of DSSA and IGS look similiar, while that of OPIM-C and Degree is usually higher than the above two algorithms.As the whole, IGS's running time gives the most stable results and usually runs around the 0.1-s mark.
The time of IGS is fast and stable because of parallel programming and this algorithm costs most of time to find out S1 while the loop to calculate S2 usually stops at 1-2 rounds.The TRR sampling technique also helps to quickly identify which seeds will affect to the priority U.
Memory Usage.The Table 3 illustrates the memory consumption of IGS and state-of-the-art methods including DSSA, BCT, OPIM − C and Degree.The smallest numbers are highlighted in bold while the largest ones are in red.The output shows that IGS outperforms the others, especially on small databases with tens of thousands of nodes and from tens to hundreds of thousands of edges such as netHEPT, ENRON, and netPHY.IGS also consumes sharply less memory than OPIM − C and Degree when testing with larger databases such as DBLP and RETWEET.When IGS spends only more than 130 MB and more than 200 MB, OPIM − C and Degree spend about four times higher with DBLP and RETWEET, respectively.Besides, DSSA also results less expensive memory usage in all cases.BCT is less stable than IGS and DSSA because it works as DSSA does on ENRON, netPHY, DBLPB and RETWEET but suddenly costs the most memory in NetHEPT.TRR sampling technique focuses on finding the seeds that influence the priority U first then Algorithm 3 explores another seeds to push on the seed set.Hence the algorithm 3 saves memory to run loop more than the others because of must not check whether a seed node influences to U set or not.Moreover, the condition of F l (S 2 ,R 2 ,δ) F u (S 2 ,R 2 ,δ) ≥ 1 − (1 − 1 k ) t − helps S 2 generated soon without waiting for the stop condition of the repeat.
Finally, our algorithm, IGS, was designed very well to get a balance between the target to influence on the given priority set and the influence that has to propagate to the largest number of nodes.Hence, running time, memory used and the influence of IGS give significantly high results and even more steadily rather than the others in general.

Conclusions
In this paper, we investigate the IMP problem, which is a variant of the IM problem with priority constraint that arises in a realistic scenario in which companies or organizations often prioritize influencing potential users during their viral marketing campaigns.The goal of the IMP problem is to select a seed set with k nodes can influence of a given priority set U greater than a threshold T which adjusts the influence of the seed set to the priority set.Although the objective function (influence spread function) is still a monotone and sub-modular function, but when considering the priority constraint the state-of-the-art IM algorithms cannot be applied.
To address this challenge, we propose two algorithms with provable theoretical guarantees, called IG and IGS.
We show that IG provides a 1 − (1 − 1 k ) t -approximation solution; IGS is an efficient randomized approximation algorithm based on sampling method that returns a 1 − (1 − 1 k ) t − -approximation solution with probability at least 1 − δ with > 0, δ ∈ (0, 1) as input parameters of the problem.Experiments on real world social networks show our algorithm outperforms state-of-the-art IM algorithms including DSSA [15], BCT [2] and OPIM [23] in terms of influences, running time, and memory used.
In the future, we are going to improve our algorithm to expand it with large networks to billions scale with acceptable time.In addition, the problem with multiple priority user sets and thresholds is going to be considered.

Figure 1 .
Figure 1.A toy example shows the difference between the influence maximization and our proposed problem.

9 .
return R jWe now introduce the definition of Targeted Reachable Reverse (TRR) Set on the basis of modifying RR concept.Definition 3 (Targeted Reachable Reverse (TRR) Set).Given a graph G = (V, E) under IC model.A random TRR set R U j is generated from G by:

Figure 3 .
Figure 3. Comparisons about Runtime (s) with k varies from 150 to 200 between IGS and the others.

Table 2 .
Comparisons about σ(S) and σ U (S) between IGS and the others with k = 500, U size = 1 K and T = 100 → 500.Figure3compares running time of these algorithms.They indicate time of IGS gives lowest values on netHEPT, ENRON and netPHY databases.Nevertheless, IGS stays at top 3 on DBLP while it costs highest

Table 3 .
Memory usage (MB) comparisons between IGS and the others