Abstract
The Influence Maximization (IM) problem, which seeks a set of k nodes (called the seed set) in a social network from which to initiate influence spread so that the number of influenced nodes after propagation is maximized, is an important problem in information propagation and social network analysis. However, previous studies ignored priority constraints, which can lead to inefficient seed selection. In real situations, companies or organizations often prioritize influencing potential users during their influence diffusion campaigns. In this paper, we propose a new problem called Influence Maximization with Priority (IMP), which seeks a seed set of k nodes in a social network that influences the largest number of nodes, subject to the constraint that the influence spread to a specific set of nodes U (called the priority set) is at least a given threshold T. We show that the problem is NP-hard under the well-known Independent Cascade (IC) model. To solve it, we propose two efficient algorithms, Integrated Greedy (IG) and Integrated Greedy Sampling (IGS), with provable theoretical guarantees. IG provides a $(1-(1-1/k)^t)$-approximation, where t is an outcome of the algorithm and $t \ge 1$. The worst-case approximation ratio is obtained when $t = 1$ and equals $1/k$. IGS is an efficient randomized approximation algorithm based on sampling that provides a $(1-(1-1/k)^t-\epsilon)$-approximation with probability at least $1-\delta$, where $\epsilon$ and $\delta$ are input parameters of the problem. We conduct extensive experiments on various real networks to compare IGS with state-of-the-art algorithms for the IM problem. The results indicate that our algorithm provides better solutions in terms of influence on the priority set, reaching approximately two to ten times the threshold T, while its running time, memory usage, and overall influence spread remain comparable to those of the other algorithms.
1. Introduction
Online Social Networks (OSNs) have become an important platform for communication as well as e-commerce. Companies exploit the rapid spread of information through the “word of mouth” effect among friends in social networks as a powerful tool for viral marketing. For instance, a company can give free samples to a few users of an OSN so that many more people learn about its products, which increases its chances of selling them. The Influence Maximization (IM) problem [1], a key problem in viral marketing, has been extensively studied over the past decade due to its value in business, viral marketing, and influence propagation. IM aims to find a set of nodes (called the seed set) in a social network in which to inject an opinion, innovation, or influence so that the largest number of nodes is affected. Kempe et al. [1] first studied IM as an optimization problem under two well-known models, Independent Cascade (IC) and Linear Threshold (LT). Since IM is NP-hard, they designed a natural greedy algorithm that returns a $(1-1/e)$-approximate solution. Research shows that IM not only plays a commercial role in viral marketing [2,3] but also underlies applications in many other fields, such as epidemic control in social networks [4,5,6,7,8], social network monitoring [9,10], and recommendation systems [11]. Hence, IM has been extensively studied recently [2,4,12,13,14,15,16,17,18,19].
Although IM has many applications in viral marketing, previous studies did not consider the impact on priority users, who can play an important role in the effectiveness of viral marketing campaigns. In practice, companies often prioritize specific potential customers who are financially capable or well suited to their products. For example, a company that produces baby diapers tends to introduce the product to married women aged 20 to 45. Suppose the company has some data about user accounts on a social network; it may then launch a promotion with a suitable number of gifts for married female users of this network. If we only care about the number of influenced individuals, as in IM, we do not evaluate the impact on these potential users, which can lead to a poor choice of seed set. Figure 1 shows an example. This network contains 8 nodes and 9 edges, the priority set is and the weight of each edge (or influence probability) is assigned to 1. Considering the case when the budget (number of seed nodes), the optimal solution of is influences to 6 nodes including except b. Hence, cannot take effect to all priority nodes. The solution must be that has the total influence is only 5.
Figure 1.
A toy example shows the difference between the influence maximization and our proposed problem.
Motivated by such scenarios, in this paper we investigate the Influence Maximization with Priority (IMP) problem, which takes a priority constraint into account in the influence process. Given a social network $G = (V, E)$, a priority set $U \subseteq V$, a budget k, and a priority threshold T, the goal is to find a seed set S of size k such that its influence on U is at least T and the overall influence of the cascade is maximized. IMP fits such campaigns better than IM and also generalizes the IM problem. However, the priority constraint makes the problem considerably more challenging. To address it, we propose two approximation algorithms, Integrated Greedy (IG) and Integrated Greedy-based Sampling (IGS), with provable theoretical guarantees. IG obtains its guarantee by modifying the natural greedy algorithm, while IGS is an efficient randomized approximation algorithm based on sampling [13,14,15,20]. IGS combines two novel techniques. First, we propose the Targeted Reverse Reachable (TRR) concept, which modifies the Reverse Reachable (RR) sampling technique [13,14,15,20] to estimate the influence of a seed set on a given priority set. Second, we develop a new strategy that selects seeds in accordance with the priority constraint and sets the number of samples so as to give a theoretical guarantee. Because IM is a special case of IMP, we run extensive experiments on various real networks to compare our algorithm with state-of-the-art algorithms for the IM problem, such as DSSA [15] and BCT [2], in terms of the influence on a given priority set, running time, and memory usage, while the influence spread approximation is ensured as in IM.
Our contributions are summarized as follows:
- We propose the Influence Maximization with Priority (IMP) problem, which adds a priority constraint to the Influence Maximization (IM) problem. That is, we extend IM with a constraint on the influence on a given set of users. IMP aims to find a seed set S of size k such that the total influence on the priority users is at least a given threshold while the influence of the cascade is still maximized.
- We propose two approximation algorithms, IG and IGS, for the IMP problem. IG provides an approximation ratio of $1-(1-1/k)^t$, where t is an output of the algorithm. IGS is a randomized approximation algorithm that provides an approximation ratio of $1-(1-1/k)^t-\epsilon$ with probability at least $1-\delta$, where $\epsilon$ and $\delta$ are input parameters and t is an output of the algorithm.
- We conduct extensive experiments on various real networks, including netHEPT, netPHY, Email-Enron, DBLP, and Twitter ReTweet. The results indicate that our algorithm, IGS, often outperforms state-of-the-art algorithms in terms of influence, running time, and memory usage. In particular, IGS returns solutions whose influence on the priority set is approximately two to ten times its threshold T while still maintaining the influence spread approximation of IM algorithms. We also show that IGS is faster and uses less memory than the others in many cases. Overall, although IGS must also account for the influence on a given target set of users, it still achieves running time, memory usage, and overall influence comparable to state-of-the-art algorithms such as DSSA, BCT, and OPIM-C, which suggests that its design is effective.
Related work. Kempe et al. [1] first studied the Influence Maximization (IM) problem, inspired by the use of influence among users in social networks for viral marketing [21]. They formulated IM as a discrete optimization problem under two classical information diffusion models, Independent Cascade (IC) and Linear Threshold (LT). They proved that could be approximated within a ratio of for any and proposed a greedy algorithm that provided an approximation ratio of for . Later, Chen et al. [12,16] continued to study IM and proved that computing the exact influence spread of a seed set is #P-hard. Many heuristic algorithms have therefore been proposed to solve the problem on large networks, but they fail to retain the approximation ratio of and provide low-quality solutions. Examples include the cost-effective lazy-forward heuristic (CELF) proposed by Leskovec et al. [22], which improves the greedy algorithm to run 700 times faster than the greedy algorithm with Monte Carlo simulation; a fast heuristic called PMIA proposed by Chen et al. [12], which constructs a directed acyclic graph to estimate the influence under the IC model; and the algorithm proposed in [16], which uses local directed acyclic graphs (LDAG) to calculate the local influence of nodes under the LT model. To keep the ratio, research on approximation approaches has continued. Borgs et al. [13] first presented an -approximation algorithm with probability at least in time complexity by introducing the Reverse Influence Sampling (RIS) model. This model has formed the foundation for further algorithm development [14,15,20,23].
Since then, many works have extended IM to other viral marketing contexts. Nguyen et al. [24] investigated the Budgeted Influence Maximization (BIM) problem, which considers the cost of selecting a node, and proposed a approximation algorithm. The authors of [2] studied a generalization of the IM and BIM problems called Cost-aware Targeted Viral Marketing (CTVM). In that work, each node u has an arbitrary cost and a benefit, and the goal of CTVM is to select a seed set within a given budget so that the total benefit is maximized. We believe this is the problem closest to our work. In the CTVM problem, we can set parameters that maximize the influence on a given target set of users, but we cannot simultaneously maximize the influence on the other users as in our problem. Later, several works improved the approximation as well as the scalability of CTVM algorithms [25,26].
Moreover, many other variants of the IM problem have been studied. Some works studied topic-related constraints on IM [17,18,27], in which edges are associated with a topic influence weight. These problems aim to find a set of k users that maximizes the number of influenced users with respect to a topic query. However, the proposed algorithms do not provide any theoretical guarantee. Li et al. [28] proposed the Location-aware Influence Maximization (LIM) problem, whose goal is to select a seed set of k nodes so that the number of influenced nodes in a given query region is maximized. The authors of [29] investigated the Distance-aware Influence Maximization (DAIM) problem, which considers the distance between users and the promoted location during seed selection. They extended the RIS model and provided an unbiased estimator for the DAIM problem.
In addition, some works investigated the Competitive Influence Maximization (CIM) problem, which considers IM under competition among several rivals. Bharathi et al. [30] first formulated the CIM problem under a new competitive propagation model that extends the IC model. Chen et al. [12] investigated CIM in the presence of negative opinions, based on the assumption that negative information is often more attractive than official information. Other authors considered the problem in different viral marketing settings, such as proposing a distance-aware problem [31], extending the model to reflect competition [13,32,33,34], or proposing a heuristic algorithm [35].
Recently, some authors studied the selection of seed nodes to influence groups of users or communities instead of individuals [36,37,38,39]. They argue that in real-world scenarios, creating impact on groups is more beneficial than on individuals. Tsang et al. [36] investigated the Fairness Group Maximization problem with two fairness criteria, maximin fairness and diversity. Maximin fairness aims to maximize the minimum fraction of influenced nodes in any group relative to its population, while diversity is an alternative fairness concept that extends the notion of individual rationality to group rationality. They proposed an approximation algorithm based on techniques for handling multiple sub-modular objective functions. More recently, the authors of [37] proposed exact algorithms for fair group influence with multiple criteria, based on a mixed integer linear programming formulation over a specific set of sample graphs under model. In [38], the authors characterize the intricate relationship between diversity and efficiency, which may sometimes be at odds but may also reinforce each other. Nguyen et al. [39] considered Influence Maximization at the Community level, which seeks a seed set of k nodes that influences the largest number of communities. They showed that the objective function is neither sub-modular nor super-modular and proposed approximation algorithms with provable guarantees. Unlike the problem studied in this paper, these studies did not address a priority set in the influence maximization context. Hence, the proposed algorithms cannot be applied to the IMP problem.
Organization. The rest of the paper is organized as follows. Section 2 presents the information diffusion model and the problem definition. Section 3 and Section 4 present our proposed Integrated Greedy and Integrated Greedy-based Sampling algorithms for the IMP problem together with their theoretical analysis. Experimental results are shown in Section 5. In Section 6, we discuss future work and conclude the paper.
2. Model and Problem Definition
In this section, we introduce the network model and the well-known Independent Cascade (IC) information diffusion model [1]. Under the IC model, we formally define the Influence Maximization with Priority (IMP) problem.
2.1. Graph Notation and Independent Cascade Model
Let $G = (V, E)$ be a directed graph representing a social network with node set V and directed edge set E, where $|V| = n$ and $|E| = m$. Let $N_{in}(v)$ and $N_{out}(v)$ be the sets of in-neighbors and out-neighbors of a node v, respectively. We write S and $S^*$ for a seed set that is a solution and an optimal solution of IMP, respectively, and $\mathsf{OPT}$ for the influence of an optimal solution.
In the Independent Cascade (IC) model, each edge $(u,v) \in E$ has an influence probability $p(u,v) \in [0,1]$ that represents the information transmission from u to v. Each node has two possible states, active and inactive. Given a seed set $S \subseteq V$, the diffusion process from S happens in discrete steps $t = 0, 1, 2, \ldots$ as follows:
- At step $t = 0$, all nodes in S are activated.
- At step $t \ge 1$, each node u activated at step $t-1$ has a single chance to activate each inactive out-neighbour v, succeeding with probability $p(u,v)$. An activated node remains active until the end of the diffusion process.
- The propagation process ends when no more nodes are activated.
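The three diffusion steps above can be sketched as a small simulation. This is only an illustrative sketch: the adjacency layout `out_edges` (node mapped to a list of `(neighbour, probability)` pairs) and the function names are assumptions made for the example, not part of the original paper. A queue replaces the explicit step counter, which yields the same distribution of final active sets because every edge still gets exactly one independent activation attempt.

```python
import random
from collections import deque


def simulate_ic(out_edges, seeds, rng):
    """Run one Independent Cascade diffusion and return the activated nodes.

    out_edges: dict mapping node u to a list of (v, p_uv) pairs (assumed layout).
    """
    active = set(seeds)
    frontier = deque(seeds)  # activated nodes whose out-edges are not yet tried
    while frontier:
        u = frontier.popleft()
        for v, p in out_edges.get(u, []):
            # u has a single chance to activate each still-inactive out-neighbour
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_spread(out_edges, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(out_edges, seeds, rng)) for _ in range(runs))
    return total / runs
```

For instance, on a chain a → b → c in which every edge has probability 1, `simulate_ic` started from `["a"]` activates all three nodes.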
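Kempe et al. [1] show that the IC model is equivalent to the live-edge model, in which the expected number of influenced nodes can be computed as follows. We first generate a sample graph g from the original graph G by selecting each edge $e = (u,v) \in E$ independently with probability $p(e)$ and discarding it with probability $1 - p(e)$. The probability that a realization g is generated from G (denoted as $\Pr[g \sim G]$) is
$$\Pr[g \sim G] = \prod_{e \in E(g)} p(e) \prod_{e \in E \setminus E(g)} \bigl(1 - p(e)\bigr).$$
In this equation, $E(g)$ is the edge set of g. The number of sample graphs is $2^{|E|}$. The influence spread of a seed set S in G is calculated as follows:
$$\mathbb{I}(S) = \sum_{g \sim G} \Pr[g \sim G] \, |R(g, S)|,$$
where $R(g, S)$ denotes the set of nodes reachable from S in g. For a set of priority nodes U, the influence spread of S to U is calculated as follows:
$$\mathbb{I}_U(S) = \sum_{g \sim G} \Pr[g \sim G] \, |R_U(g, S)|,$$
where $R_U(g, S) = R(g, S) \cap U$ denotes the set of nodes in U that are reachable from S in g. Kempe et al. [1] also show that $\mathbb{I}(\cdot)$ is a monotone and sub-modular function, i.e., for any $A \subseteq B \subseteq V$, we have:
$$\mathbb{I}(A) \le \mathbb{I}(B),$$
and for any $A \subseteq B \subseteq V$ and $v \in V \setminus B$, we have:
$$\mathbb{I}(A \cup \{v\}) - \mathbb{I}(A) \ge \mathbb{I}(B \cup \{v\}) - \mathbb{I}(B).$$
It is also easy to see that $\mathbb{I}_U(\cdot)$ is a monotone and sub-modular function.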
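Because enumerating all sample graphs is infeasible, the two influence quantities are usually approximated by averaging over randomly drawn live-edge graphs. The following sketch illustrates that idea; the edge-list format `(u, v, p)` and all function names are assumptions made for this example rather than definitions from the paper.

```python
import random


def sample_live_edge_graph(edges, rng):
    """Keep each edge (u, v, p) independently with probability p."""
    live = {}
    for u, v, p in edges:
        if rng.random() < p:
            live.setdefault(u, []).append(v)
    return live


def reachable(live, seeds):
    """Nodes reachable from the seed set in one sampled graph."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in live.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def estimate_influence(edges, seeds, priority, samples=1000, seed=0):
    """Average the reach size, and its overlap with the priority set, over samples."""
    rng = random.Random(seed)
    priority = set(priority)
    total = total_u = 0
    for _ in range(samples):
        reach = reachable(sample_live_edge_graph(edges, rng), seeds)
        total += len(reach)
        total_u += len(reach & priority)
    return total / samples, total_u / samples
```

The first returned value estimates the overall influence spread and the second estimates the spread restricted to the priority set.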
2.2. Problem Definition
We investigate the Influence Maximization with Priority (IMP) problem, defined as follows:
Definition 1
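(IMP problem). Given a graph $G = (V, E)$ under the IC model, a positive integer k (budget), the priority set $U \subseteq V$, and the threshold T with . The IMP problem asks to find a seed set $S \subseteq V$ with $|S| \le k$ whose influence on the priority set satisfies $\mathbb{I}_U(S) \ge T$, so that the influence spread $\mathbb{I}(S)$ is maximized, i.e., find S that solves the following optimization problem:
$$\max_{S \subseteq V} \ \mathbb{I}(S) \quad \text{subject to} \quad |S| \le k, \quad \mathbb{I}_U(S) \ge T.$$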
IMP becomes the IM problem when $T = 0$. Therefore, IM is a special case of IMP, and IMP is also NP-hard. In addition, computing the influence function of a seed set is #P-hard [12]. Thus, finding a solution to the IMP problem within reasonable time is very challenging.
3. Integrated Greedy Algorithm
In this section, we propose the Integrated Greedy (IG) algorithm. It builds on the greedy strategy that is well known for monotone and sub-modular problems and guarantees a lower bound on the quality of the solution. The details of the algorithm are given in Algorithm 1.
| Algorithm 1: Integrated Greedy () algorithm |
![]() |
Let $S_1$ be a solution of the problem of finding a minimum set of seed nodes whose influence on the priority set is at least the threshold T, and let $S_2$ be a solution of the IM problem. The main idea of the algorithm is to modify the natural greedy algorithm [1] by combining these two solutions.
The algorithm is divided into two main phases. In the first phase, it tries to find a solution by a greedy strategy (line 2–4). In each iterator, the algorithm chooses a node u with largest influence incremental to set U into (line 3–4) until the . Since , . Denote as the remaining budget (line 6). The algorithm next finds the candidate solution for with the remaining budget t by using a greedy method in the second phase (line 6–10). In each iterator i, it selects a node u with largest influence incremental (line 7). If u already belongs to , the algorithm increases t by 1 (line 8–9). This phase ends when the remaining budget t is exhausted (line 6). Finally, the algorithm returns the solution S which unites and . It is easy to confirm that , and since . Theorem 1 shows the approximation guarantee of algorithm.
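The two phases described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' Algorithm 1: the callables `inf` and `inf_u` are hypothetical oracles returning the overall influence and the influence on the priority set for a given seed set, and raising an error when the threshold cannot be met within the budget is a choice made only for this example.

```python
def integrated_greedy(nodes, inf, inf_u, k, threshold):
    """Two-phase greedy sketch: satisfy the priority constraint, then maximize spread.

    inf, inf_u: hypothetical oracles mapping a seed set to an influence value.
    """
    nodes = sorted(nodes)  # fixed order so that ties are broken deterministically

    # Phase 1: greedily grow S1 until its influence on the priority set reaches T.
    s1 = set()
    while inf_u(s1) < threshold:
        candidates = [u for u in nodes if u not in s1]
        if len(s1) >= k or not candidates:
            raise ValueError("threshold not reachable within budget (assumed handling)")
        s1.add(max(candidates, key=lambda u: inf_u(s1 | {u})))

    # Phase 2: greedily build S2 with the remaining budget t = k - |S1|.
    t = k - len(s1)
    s2 = set()
    picks = 0
    while picks < t:
        candidates = [u for u in nodes if u not in s2]
        if not candidates:
            break
        u = max(candidates, key=lambda x: inf(s2 | {x}))
        s2.add(u)
        picks += 1
        if u in s1:
            t += 1  # u is already paid for by phase 1, so one more pick is allowed

    return s1 | s2
```

With exact oracles this mirrors the structure of the description; in practice the oracles would have to be replaced by estimates, since exact influence computation is intractable.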
Theorem 1.
algorithm returns , where S is a feasible solution and , satisfies:
The worst-case approximation ratio is obtained when t = 1 and it is equal to 1/k.
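As a quick sanity check, the following worked values assume that the guarantee has the form $1-(1-1/k)^t$. This form is an assumption for illustration, chosen because it reproduces the stated worst case of $1/k$ at $t = 1$.

```latex
\rho(k,t) = 1 - \left(1 - \frac{1}{k}\right)^{t}, \qquad
\rho(k,1) = \frac{1}{k}, \qquad
\rho(k,k) = 1 - \left(1 - \frac{1}{k}\right)^{k} \ge 1 - \frac{1}{e}.
% Example: k = 10, t = 5 gives 1 - 0.9^5 = 1 - 0.59049 = 0.40951.
```

Under this assumption the ratio grows with t, so the guarantee is strongest when phase 1 consumes little of the budget.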
Proof.
Denote is an optimal solution of problem for input data of Algorithm 1 (the graph G and budget k). Obviously, we have . After ending the second phase, assume that , , and . In the second phase, the algorithm repeatedly selects a node u of which incremental influence gain is largest and due to the function is monotone and sub-modular [1], so we have:
Therefore, for any , we have
Minus two inequality terms to , we have:
Rearrange the terms of the above inequality, we have
Together with the fact that and , the above inequality implies
Since and , S is feasible solution of , and
which proves the theorem. ☐
Although Algorithm 1 provides an approximation guarantee, it cannot be applied to real social networks because computing the influence function is #P-hard under the IC model [12]. To overcome this challenge, we propose a randomized algorithm with a provable approximation guarantee that combines the greedy strategy with a sampling technique.
4. Sampling Algorithm with Provable Guarantees
In this section, we present an efficient algorithm for the IMP problem, called the Integrated Greedy Sampling (IGS) algorithm, which provides a theoretical guarantee. In addition, our experiments show that the algorithm can be applied to large networks.
4.1. Estimator of Influence Functions
First, we recap the concept of the Reverse Reachable (RR) set [40], which is used to estimate the overall influence function. Based on it, we propose the concept of the Targeted Reverse Reachable (TRR) set to estimate the influence function on the priority set. We then present the IGS algorithm and provide a theoretical analysis based on statistical evidence.
Definition 2
(Reverse Reachable (RR) set [40]). Given a graph $G = (V, E)$ under the IC model, a random RR set is generated from G by:
- 1. Picking a source node $u \in V$ uniformly at random.
- 2. Generating a sample graph g from G and returning the set of nodes that can reach u in g.
For a random RR set , define a random variable . Borgs et al. [40] show that RR samples can be used to estimate the influence function by applying the following Lemma.
Lemma 1.
For any set of nodes , we have .
Given a set of RR set , and a set node S, we can approximate the value of by defined as follow:
Generating RR sets can be accomplished with the algorithms in [13,14,15,20,23]. A common procedure for generating an RR set is described in Algorithm 2. The algorithm first selects a source node u uniformly at random and adds it to the RR set. It uses a queue Q to store visited nodes whose in-neighbors have not yet been examined; initially, u is added to Q. The algorithm then repeatedly removes a node v from Q and selects each in-neighbor x of v with probability $p(x,v)$ (line 6). If the selection succeeds, it adds x to both Q and the RR set. This process continues until Q is empty.
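The queue-based procedure described above can be sketched as follows. The data layout `in_edges` (node mapped to a list of `(in_neighbour, probability)` pairs), the optional `source` argument, and the function names are assumptions made for this example. The estimator uses the standard reverse influence sampling identity, in which the influence is the number of nodes times the fraction of RR sets hit by the seed set.

```python
import random
from collections import deque


def generate_rr_set(in_edges, nodes, rng, source=None):
    """Generate one RR set by a randomized reverse breadth-first search.

    in_edges: dict mapping node v to a list of (x, p_xv) pairs (assumed layout).
    nodes: list of all nodes; the source is drawn uniformly unless one is given.
    """
    if source is None:
        source = rng.choice(nodes)
    rr = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for x, p in in_edges.get(v, []):
            # each incoming edge is kept with its probability; skip visited nodes
            if x not in rr and rng.random() < p:
                rr.add(x)
                queue.append(x)
    return rr


def estimate_influence_rr(rr_sets, seeds, n):
    """Estimate influence as n times the fraction of RR sets that contain a seed."""
    seeds = set(seeds)
    covered = sum(1 for rr in rr_sets if rr & seeds)
    return n * covered / len(rr_sets)
```

The `x not in rr` check prevents a node from being enqueued twice when the graph contains cycles.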
| Algorithm 2: Generating RR sample under model |
![]() |
We now introduce the definition of the Targeted Reverse Reachable (TRR) set, which is obtained by modifying the RR concept.
Definition 3
(Targeted Reverse Reachable (TRR) set). Given a graph $G = (V, E)$ under the IC model, a random TRR set is generated from G by:
- 1. Picking a source node $u \in U$ uniformly at random.
- 2. Generating a sample graph g from G and returning the set of nodes that can reach u in g.
We define a random variable . Similar to Lemma 1, Lemma 2 shows that we can use the value of to estimate function .
Lemma 2.
For any set of nodes , we have
Proof.
Denote is a TRR sample with a source node u for the sample graph g, we have:
The transition from the second equality to the third equality comes from the definition of and from the third to the fourth then to the fifth is caused by the distribution of choosing a node u as a source node. ☐
Given a set of TRR samples and a set node S, we define and an approximation value of as follow:
By Lemma 2, we obtain a good approximation of the influence on the priority set when the number of TRR samples is large enough. Algorithm 2 can be reused to generate a TRR set with one modification: we replace line 1 by picking the source node uniformly at random from the priority set U and leave the rest unchanged.
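A minimal sketch of this modification is shown below. It assumes that the source is drawn uniformly from the priority set, so that the influence on U is estimated as the size of U times the fraction of TRR sets hit by the seed set; the data layout and function names are again hypothetical.

```python
import random


def generate_trr_set(in_edges, priority, rng):
    """Generate one TRR set: like an RR set, but the source is drawn from U.

    in_edges: dict mapping node v to a list of (x, p_xv) pairs (assumed layout).
    priority: list of priority nodes (the set U).
    """
    source = rng.choice(priority)  # assumed uniform choice over U
    trr, stack = {source}, [source]
    while stack:
        v = stack.pop()
        for x, p in in_edges.get(v, []):
            if x not in trr and rng.random() < p:
                trr.add(x)
                stack.append(x)
    return trr


def estimate_priority_influence(trr_sets, seeds, priority_size):
    """Estimate the influence on U as |U| times the fraction of TRR sets hit."""
    seeds = set(seeds)
    covered = sum(1 for trr in trr_sets if trr & seeds)
    return priority_size * covered / len(trr_sets)
```

Only the source distribution differs from ordinary RR sampling, which is why the same reverse traversal can be reused.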
4.2. Algorithm Description and Theoretical Analysis
Algorithm description. The algorithm is detailed in Algorithm 3. It generates the set of TRR sets , and set two candidate solutions , empty at first. Then the body of the algorithm divides into two phases. In phase 1, it finds a candidate solution with minimum-size so that by using a greedy strategy with potential function over . In each iterator, it selects a node u with maximal incremental value of the potential function (line 4) until . The candidate solution obtained by this phase satisfies the priority constraint, with probability at least (Lemma 4).
The phase 2 selects a candidate solution with the remaining budget () so that the influence spread is maximized. In this phase, it first sets the parameters , and generates set of RR samples . The main of this phase operates in several iterators (line 12–27) until meeting the stopping condition (line 22). In each iterator, it finds a candidate solution by a greedy strategy. It picks a node u with maximal incremental of approximation influence over (line 12) until t nodes are selected. Similar to algorithm, if u already belongs to , the algorithm increases t by 1. After that, the algorithm checks the quality of candidate solution (line 17). It calculates - a lower-bounded of , and -an upper-bounded of an optimal solution respect to problem. These functions ensure the statistical criterion, which are claimed in the Lemmas 5 and 6. If solution meets the approximation condition (line 19), the algorithm returns . If not, it moves to the next iterator and stops when the number of TRR samples is at least (line 21).
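The greedy selection inside each iteration of phase 2 amounts to a maximum-coverage step over the sampled RR sets. The sketch below illustrates only that step, including the rule that a pick already contained in the phase-1 solution extends the budget by one; the stopping test, the lower and upper bounds, and the sample-doubling logic of Algorithm 3 are deliberately omitted, and all names are assumptions.

```python
def greedy_over_rr_sets(rr_sets, candidates, t, s1=frozenset()):
    """Greedily pick nodes that hit the most not-yet-covered RR sets.

    t: remaining budget; s1: phase-1 solution (picks inside it do not use budget).
    """
    chosen = []
    uncovered = list(rr_sets)
    budget = t
    while len(chosen) < budget:
        remaining = [u for u in candidates if u not in chosen]
        if not remaining:
            break
        # marginal gain of u = number of uncovered RR sets that contain u
        best = max(remaining, key=lambda u: sum(1 for rr in uncovered if u in rr))
        chosen.append(best)
        uncovered = [rr for rr in uncovered if best not in rr]
        if best in s1:
            budget += 1
    return chosen
```

Counting hits over the uncovered sets only is what makes each step select the node with the largest marginal gain of the estimated influence.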
| Algorithm 3: Integrated Greedy -based Sampling () algorithm |
![]() |
Theoretical analysis. Fortunately, the sequence of random variables and constructed from the RR and TRR samples can be shown to form a martingale. For any random variable , let a random variable , where . For a sequence of random variables we have . Hence, be a form of martingale [41]. Similarly, is also a form of martingale. Therefore, the following concentration inequality [41] applies:
Lemma 3.
If be a form of martingale, , for , and
where denotes the variance of a random variable. Then, for any λ, we have:
Apply this Lemma with , , , , and , we have:
Similarly, also form a Martingale, so apply Lemma 3, we have:
Let and put it in two above inequalities, we have:
The following Lemma shows the lower-bound of the influence of candidate solution .
Lemma 4.
The candidate solution obtained by phase 1 of Algorithm 3 satisfies .
Proof.
Denote , and . Apply (27) for set , we have:
Assume that , there are at most possibilities for the candidate solutions . Therefore,
☐
Lemma 5
(Lower-bound). For any , a set of RR samples , let , and
We have .
Proof.
Denote and . Apply (24) with , we have:
Therefore, the following event happens with probability at least
We consider two following cases:
- Case 1:
- If , then .
- Case 2:
- If , (40) becomes:
Solve the above inequality for , we obtain:
Combine two above cases and replace , we obtain the proof. ☐
Lemma 6
(Upper-bound). For any , in an iterator t of Algorithm 3, denote is a set of RR samples with , is a candidate solution of phase 2, and
We have .
Proof.
Let , apply inequality (25), we have:
Therefore, the following event happens with the probability at least :
Solve the above quadratic inequality for , we obtain upper-bound for is,
Denote , where is calculated over . Since the phase of Algorithm 3 selects a candidate solution by a greedy strategy. Similar to Theorem 1, we have:
Based on the above theoretical analysis, the following theorem states the approximation guarantee of the IGS algorithm.
Theorem 2.
The Algorithm 3 provides a solution S and an integer t, satisfies:
Proof.
Since and Lemma 4, we have:
We consider two following cases:
- Case 1:
- If the algorithm stops with the condition , apply (26) with set and , we have:From (27), we have:Apply an union probability that the events (54) and (57) happen with the probability at most . Assume that they do not happen, we have:Hence, in this case the algorithm satisfies approximation guarantee with probability at least .
- Case 2:
- If the algorithm stops at any iterator . At this iterator, the condition in line 19 is satisfied, apply Lemma 5 and Lemma 6, the following thing happens with the probability at least :
Combine two above cases, the algorithm meets the approximation ratio condition with the probability at least . ☐
5. Experiments
In this section, we implement our algorithm and compare it with other influence maximization methods in terms of overall influence, influence on the priority nodes, running time, and memory usage. The datasets include several networks with thousands or even millions of nodes and edges (Table 1).
Table 1.
Dataset’s statistics.
5.1. Experimental Settings
All experiments are run on a Linux machine with 2 × Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30 GHz and 4 × 16 GB DIMM ECC DDR4 @ 2400 MHz.
Algorithm comparisons. Since IMP is an extension of IM, we compare the IGS algorithm with several state-of-the-art algorithms: DSSA [15], BCT [2], and OPIM-C [23]. In addition, we use a basic algorithm, Max degree, which is a common baseline for information diffusion problems. In IMP, two factors affect the solution in practice: the budget k for selecting seed nodes and the priority set U. As a result, these two factors also affect the algorithms. Based on this observation, we conduct experiments under two settings: varying k with fixed T, and varying T with fixed k.
The dataset. We choose five datasets from various sources: NetHept, NetPhy, and DBLP are citation networks, Email-Enron is a communication network [15], and Twitter Retweet is an online social network [42]. They are summarized in Table 1. We use these datasets because they are popular in information diffusion research and, in particular, are used by the state-of-the-art algorithms we compare against.
Parameter Settings. Each edge $(u, v)$ is assigned the weight $p(u,v) = 1/d_{in}(v)$, where $d_{in}(v)$ is the in-degree of node v [14,15,20].
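As an illustration of this weighting scheme, the sketch below assigns each edge the reciprocal of the in-degree of its target node. The edge-list representation and the function name are assumptions made for the example.

```python
from collections import Counter


def weighted_cascade(edge_list):
    """Assign p(u, v) = 1 / in-degree(v) to every directed edge (u, v)."""
    in_degree = Counter(v for _, v in edge_list)
    return [(u, v, 1.0 / in_degree[v]) for u, v in edge_list]
```

With this choice, the incoming probabilities of every node that has at least one in-neighbour sum to 1.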
In the first setting, k takes the values 150, 160, 170, 180, 190, and 200, while T is fixed at 100 and U is generated with 200 nodes. In the second setting, k is fixed at 500, U contains about 1000 nodes, and T increases from 100 to 500. In all experiments, we keep , according setting for algorithms [14,15,20] and .
5.2. Experimental Results
We implement IGS, compare it with the state-of-the-art algorithms, and calculate the influence spread on all nodes and on the priority set U. The results are shown in the following tables and figures.
The Influence. Figure 2 and Table 2 indicate that IGS outperforms the others in influencing the priority nodes relative to a given threshold T.
Figure 2.
Comparison of influence spread with k varying from 150 to 200, T = 100, and U size = 200.
Table 2.
Comparison of the influence on U and the overall influence between IGS and the other algorithms with k = 500, U size = 1K, and T varying from 100 to 500.
The above figure gives information about the influence values in case k changes from 150 to 200, U includes 200 nodes and the threshold T is 100. The terms “infU”, “inf” mean the influences to set U ( ) and to all nodes (), respectively. These algorithms output differently on various databases. Looking at red bars, we can see approximately affects the set U twice the value of the threshold T on most databases except Re-Tweet but still higher than T. Conversely, the influence on U of the remaining sharply fluctuate according to the databases. While and influence on U over T with netHEPT and ENRON, they work quite low with the others. and often affect U much lower than T. Besides, the of is highest on netHEPT whereas the one of keeps at top in all other cases. In general, the values of of , and have similarities with each others.
Besides, Table 2 describes the experiment while T comes from 100 to 500, k = 500 and enlarge U up to 1K nodes. This setting is to check the case when U is large and when the threshold T is incremental. Certainly, the condition that has to be maintained so we fixed k = 500. Looking at bold values, we can see although U and S both become large and T increments gradually, the influence on U of is always significantly higher than T, even up to more than ten times. , and also give the outputs over threshold T in many cases, they still have values lower than T = 500 on netPHY, DBLP and RETWEET however. The of is lowest, especially, is only 22.77 on Re-Tweet.
From Figure 2 and Table 2, we can see that the influence of IGS on U is significantly higher than T and that IGS produces better results than the state-of-the-art algorithms. This is because IGS always prioritizes influencing U until the threshold T is exceeded and only then targets other nodes, even for large values of k, U size, and T. The other algorithms cannot always influence U beyond the desired threshold. Overall, the state-of-the-art algorithms cannot influence the given priority set as well as IGS can.
Running time.Figure 3 compares running time of these algorithms. They indicate time of gives lowest values on netHEPT, ENRON and netPHY databases. Nevertheless, stays at top 3 on DBLP while it costs highest running time on the remaining of the dataset to find 150 and 160 seed sets but return to top 3 at the other values of budget k. only takes about 0.1 s to find out the seed set in most cases except RETWEET. Besides, the figures also give information about the other algorithms. First, runs significantly slow on netHEPT than the others. This method often stays at top 3 or top 4 on ENRON, DBLP and RETWEET. Second, running time of and look similiar, while that of -C and is usually higher than the above two algorithms. As the whole, ’s running time gives the most stable results and usually runs around the 0.1-s mark.
Figure 3.
Comparison of running time (s) between our algorithm and the others as k varies from 150 to 200.
Our algorithm is fast and stable because it is parallelized. It spends most of its time finding , while the loop that computes usually stops after 1–2 rounds. The TRR sampling technique also helps to quickly identify which seeds will influence the priority set U.
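To make the role of sampling concrete, the following sketch estimates the influence of a seed set on a priority set by drawing reverse reachable sets whose roots lie in U. It is not the TRR procedure itself, whose details are not given in this section; it assumes the Independent Cascade model, and the names `sample_rr_set`, `estimate_priority_influence` and the `in_edges` adjacency format are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Assumed graph format: in_edges[v] lists (u, p) for every edge u -> v,
# where p is the probability that u activates v (Independent Cascade).
InEdges = Dict[int, List[Tuple[int, float]]]


def sample_rr_set(in_edges: InEdges, root: int, rng: random.Random) -> Set[int]:
    """Sample the set of nodes that can reach `root` in one random live-edge graph."""
    rr = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            # Each incoming edge is kept independently with probability p.
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def estimate_priority_influence(
    in_edges: InEdges,
    priority: Sequence[int],
    seeds: Set[int],
    n_samples: int,
    rng: random.Random,
) -> float:
    """Estimate the expected number of nodes of U influenced by `seeds`."""
    hits = 0
    for _ in range(n_samples):
        root = rng.choice(priority)  # root drawn uniformly from U
        if not sample_rr_set(in_edges, root, rng).isdisjoint(seeds):
            hits += 1
    # |U| times the fraction of samples hit by the seed set.
    return len(priority) * hits / n_samples


# Toy graph: 1 -> 2 -> 3 always propagates, 4 -> 5 never does.
toy_in_edges: InEdges = {2: [(1, 1.0)], 3: [(2, 1.0)], 5: [(4, 0.0)]}
rng = random.Random(0)
print(estimate_priority_influence(toy_in_edges, [3], {1}, 100, rng))  # 1.0
print(estimate_priority_influence(toy_in_edges, [5], {4}, 100, rng))  # 0.0
```

Because every sample is rooted in U, a candidate seed that never appears in any sample contributes nothing to the priority constraint, so such seeds can be ruled out without further simulation.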
Memory usage. Table 3 reports the memory consumption of our algorithm and the state-of-the-art methods, including , , and . The smallest values are highlighted in bold and the largest in red. The results show that our algorithm outperforms the others, especially on small datasets with tens of thousands of nodes and tens to hundreds of thousands of edges, such as netHEPT, ENRON and netPHY. It also consumes far less memory than and on the larger datasets DBLP and RETWEET: where it needs only slightly more than 130 MB and 200 MB, and need about four times as much on DBLP and RETWEET, respectively. In addition, also uses comparatively little memory in all cases. is less stable than and because it behaves as does on ENRON, netPHY, DBLP and RETWEET but suddenly uses the most memory on netHEPT.
Table 3.
Comparison of memory usage (MB) between our algorithm and the others.
The TRR sampling technique first focuses on finding the seeds that influence the priority set U; Algorithm 3 then explores further seeds to add to the seed set. Hence, Algorithm 3 uses less memory in its loop than the other algorithms, because it does not need to check whether a seed node influences U. Moreover, the condition of helps to be generated early, without waiting for the stop condition of the repeat loop.
Finally, our algorithm, , is designed to balance the goal of influencing the given priority set against the goal of propagating influence to the largest number of nodes. As a result, its running time, memory usage and influence are generally better and more stable than those of the other algorithms.
6. Conclusions
In this paper, we investigate the Influence Maximization with Priority problem, a variant of the Influence Maximization problem with a priority constraint. It arises in realistic scenarios in which companies or organizations prioritize influencing potential users during their viral marketing campaigns. The goal is to select a seed set of k nodes that influences the largest number of nodes while its influence on a given priority set U reaches at least a threshold T, which controls how strongly the seed set must influence the priority set. Although the objective function (the influence spread function) is still monotone and submodular, the state-of-the-art algorithms cannot be applied once the priority constraint is taken into account.
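Stated compactly, the problem summarized above can be written as a constrained maximization. The notation is assumed here for illustration and may differ from the symbols used earlier in the paper: V is the node set, \sigma(S) is the expected number of nodes influenced by the seed set S, and \sigma_U(S) is the expected number of influenced nodes in the priority set U.

```latex
\begin{aligned}
\max_{S \subseteq V} \quad & \sigma(S) \\
\text{subject to} \quad & |S| \le k, \\
                        & \sigma_U(S) \ge T .
\end{aligned}
```

Since both \sigma and \sigma_U are monotone, adding nodes to a feasible set never violates the priority constraint or lowers the objective, so the budget constraint |S| \le k can equivalently be read as |S| = k whenever the network has at least k nodes.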
To address this challenge, we propose two algorithms with provable theoretical guarantees, called Integrated Greedy and Integrated Greedy Sampling. We show that Integrated Greedy provides a -approximation solution, and that Integrated Greedy Sampling is an efficient randomized approximation algorithm based on a sampling method that returns a -approximation solution with probability at least , with as input parameters of the problem. Experiments on real-world social networks show that our algorithm outperforms state-of-the-art algorithms, including [15], [2] and [23], in terms of influence, running time, and memory usage.
In future work, we plan to scale our algorithm to billion-scale networks with acceptable running time. We will also consider the problem with multiple priority user sets and thresholds.
Author Contributions
Methodology and writing (original draft preparation), C.V.P. and D.K.T.H.; investigation, Q.C.V. and A.N.S.; conceptualization, H.X.H.; data curation, Q.C.V. and A.N.S.; investigation, C.V.P. and D.K.T.H. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Conflicts of Interest
There is no conflict of interest.
References
1. Kempe, D.; Kleinberg, J.M.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
2. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. A Billion-Scale Approximation Algorithm for Maximizing Benefit in Viral Marketing. IEEE/ACM Trans. Netw. 2017, 25, 2419–2429.
3. Li, Y.; Zhang, D.; Tan, K. Real-time Targeted Influence Maximization for Online Advertisements. PVLDB 2015, 8, 1070–1081.
4. Pham, C.V.; Thai, M.T.; Duong, H.V.; Bui, B.Q.; Hoang, H.X. Maximizing misinformation restriction within time and budget constraints. J. Comb. Optim. 2018, 35, 1202–1240.
5. Tong, G.A.; Wu, W.; Guo, L.; Li, D.; Liu, C.; Liu, B.; Du, D. An efficient randomized algorithm for rumor blocking in online social networks. In Proceedings of the 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9.
6. Budak, C.; Agrawal, D.; El Abbadi, A. Limiting the spread of misinformation in social networks. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, 28 March–1 April 2011; pp. 665–674.
7. Nguyen, H.T.; Cano, A.; Tam, V.; Dinh, T.N. Blocking Self-avoiding Walks Stops Cyber-epidemics: A Scalable GPU-based Approach. IEEE Trans. Knowl. Data Eng. 2020, 32, 1263–1275.
8. Nguyen, N.P.; Yan, G.; Thai, M.T. Analysis of misinformation containment in online social networks. Comput. Netw. 2013, 57, 2133–2146.
9. Zhang, H.; Alim, M.A.; Li, X.; Thai, M.T.; Nguyen, H.T. Misinformation in Online Social Networks: Detect Them All with a Limited Budget. ACM Trans. Inf. Syst. 2016, 34, 18:1–18:24.
10. Zhang, H.; Kuhnle, A.; Zhang, H.; Thai, M.T. Detecting misinformation in online social networks before it is too late. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016, San Francisco, CA, USA, 18–21 August 2016; pp. 541–548.
11. Ye, M.; Liu, X.; Lee, W. Exploring social influence for recommendation: A generative model approach. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’12, Portland, OR, USA, 12–16 August 2012; pp. 671–680.
12. Chen, W.; Collins, A.; Cummings, R.; Ke, T.; Liu, Z.; Rincón, D.; Sun, X.; Wang, Y.; Wei, W.; Yuan, Y. Influence Maximization in Social Networks When Negative Opinions May Emerge and Propagate. In Proceedings of the Eleventh SIAM International Conference on Data Mining, SDM 2011, Mesa, AZ, USA, 28–30 April 2011; pp. 379–390.
13. Borodin, A.; Filmus, Y.; Oren, J. Threshold Models for Competitive Influence in Social Networks. In Proceedings of the Internet and Network Economics, 6th International Workshop, WINE 2010, Stanford, CA, USA, 13–17 December 2010; pp. 539–550.
14. Tang, Y.; Shi, Y.; Xiao, X. Influence Maximization in Near-Linear Time: A Martingale Approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, 31 May–4 June 2015; pp. 1539–1554.
15. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–1 July 2016; pp. 695–710.
16. Chen, W.; Yuan, Y.; Zhang, L. Scalable Influence Maximization in Social Networks under the Linear Threshold Model. In Proceedings of the ICDM 2010, The 10th IEEE International Conference on Data Mining, Sydney, Australia, 14–17 December 2010; pp. 88–97.
17. Chen, S.; Fan, J.; Li, G.; Feng, J.; Tan, K.; Tang, J. Online Topic-Aware Influence Maximization. PVLDB 2015, 8, 666–677.
18. Aslay, Ç.; Barbieri, N.; Bonchi, F.; Baeza-Yates, R.A. Online Topic-aware Influence Maximization Queries. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, 24–28 March 2014; pp. 295–306.
19. Pham, C.V.; Duong, H.V.; Hoang, H.X.; Thai, M.T. Competitive Influence Maximization within Time and Budget Constraints in Online Social Networks: An Algorithmic Approach. Appl. Sci. 2019, 9, 2274.
20. Tang, Y.; Xiao, X.; Shi, Y. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, 22–27 June 2014; pp. 75–86.
21. Domingos, P.M.; Richardson, M. Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001; pp. 57–66.
22. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.M.; Glance, N.S. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429.
23. Tang, J.; Tang, X.; Xiao, X.; Yuan, J. Online Processing Algorithms for Influence Maximization. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD ’18), Houston, TX, USA, 10–15 June 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 991–1005.
24. Nguyen, H.; Zheng, R. On Budgeted Influence Maximization in Social Networks. IEEE J. Sel. Areas Commun. 2013, 31, 1084–1094.
25. Pham, C.V.; Duong, H.V.; Thai, M.T. Importance Sample-Based Approximation Algorithm for Cost-Aware Targeted Viral Marketing. In Proceedings of the Computational Data and Social Networks, 8th International Conference, CSoNet 2019, Ho Chi Minh City, Vietnam, 18–20 November 2019; pp. 120–132.
26. Li, X.; Smith, J.D.; Dinh, T.N.; Thai, M.T. TipTop: (Almost) Exact Solutions for Influence Maximization in Billion-Scale Networks. IEEE/ACM Trans. Netw. 2019, 27, 649–661.
27. Barbieri, N.; Bonchi, F.; Manco, G. Topic-aware social influence propagation models. Knowl. Inf. Syst. 2013, 37, 555–584.
28. Li, G.; Chen, S.; Feng, J.; Tan, K.-L.; Li, W.-S. Efficient Location-Aware Influence Maximization. In Proceedings of the 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, 16–19 April 2018; pp. 1569–1572.
29. Wang, X.; Zhang, Y.; Zhang, W.; Lin, X. Efficient Distance-Aware Influence Maximization in Geo-Social Networks. IEEE Trans. Knowl. Data Eng. 2017, 29, 599–612.
30. Bharathi, S.; Kempe, D.; Salek, M. Competitive Influence Maximization in Social Networks. In Proceedings of the Internet and Network Economics, Third International Workshop, WINE 2007, San Diego, CA, USA, 12–14 December 2007; pp. 306–311.
31. Liu, W.; Yue, K.; Wu, H.; Li, J.; Liu, D.; Tang, D. Containment of competitive influence spread in social networks. Knowl.-Based Syst. 2016, 109, 266–275.
32. He, X.; Song, G.; Chen, W.; Jiang, Q. Influence Blocking Maximization in Social Networks under the Competitive Linear Threshold Model. In Proceedings of the Twelfth SIAM International Conference on Data Mining, Anaheim, CA, USA, 26–28 April 2012; pp. 463–474.
33. Lu, W.; Bonchi, F.; Goyal, A.; Lakshmanan, L.V.S. The bang for the buck: Fair competitive viral marketing from the host perspective. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, 11–14 August 2013; pp. 928–936.
34. Chen, W.; Lakshmanan, L.V.S.; Castillo, C. Information and Influence Propagation in Social Networks; Synthesis Lectures on Data Management; Morgan & Claypool Publishers: San Rafael, CA, USA, 2013.
35. Bozorgi, A.; Samet, S.; Kwisthout, J.; Wareham, T. Community-based influence maximization in social networks under a competitive linear threshold model. Knowl.-Based Syst. 2017, 134, 149–158.
36. Tsang, A.; Wilder, B.; Rice, E.; Tambe, M.; Zick, Y. Group-Fairness in Influence Maximization. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, 10–16 August 2019; pp. 5997–6005.
37. Farnadi, G.; Babaki, B.; Gendreau, M. A Unifying Framework for Fairness-Aware Influence Maximization. In Proceedings of the Companion of The 2020 Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 714–722.
38. Stoica, A.; Han, J.X.; Chaintreau, A. Seeding Network Influence in Biased Networks and the Benefits of Diversity. In Proceedings of the WWW ’20: The Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2089–2098.
39. Nguyen, L.N.; Zhou, K.; Thai, M.T. Influence Maximization at Community Level: A New Challenge with Non-submodularity. In Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019, Dallas, TX, USA, 7–10 July 2019; pp. 327–337.
40. Borgs, C.; Brautbar, M.; Chayes, J.T.; Lucier, B. Maximizing Social Influence in Nearly Optimal Time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, OR, USA, 5–7 January 2014; pp. 946–957.
41. Chung, F.R.K.; Lu, L. Concentration Inequalities and Martingale Inequalities: A Survey. Internet Math. 2006, 3, 79–127.
42. Rossi, R.A.; Ahmed, N.K. The Network Data Repository with Interactive Graph Analytics and Visualization; AAAI: Palo Alto, CA, USA, 2015.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


