1. Introduction
Online Social Networks (OSNs) have become an important platform for communication and e-commerce. Companies exploit the rapid spread of information through the “word of mouth” effect among friends in social networks as a powerful tool for viral marketing. For instance, a company can give free samples to a few users of an OSN so that many more people learn about its products and it has more chances to sell them. The Influence Maximization (IM) problem [1], a key problem in viral marketing, has been studied extensively over the past decade because of its value in business, viral marketing and influence propagation. In essence, IM aims to find a set of nodes (called the seed set) in a social network from which to inject an opinion, innovation or influence so that the largest number of nodes is affected. Kempe et al. [1] first studied IM as an optimization problem under two well-known diffusion models, Independent Cascade (IC) and Linear Threshold (LT). Since IM is NP-hard, they designed a natural greedy algorithm that returned an
-approximation solution. Research has shown that IM is not only commercially valuable in viral marketing [2,3] but also a foundation for applications in many fields, such as epidemic control in social networks [4,5,6,7,8], social network monitoring [9,10] and recommendation systems [11]. Hence, IM has attracted considerable attention recently [2,4,12,13,14,15,16,17,18,19].
Although IM has many applications in viral marketing, previous studies have not considered the impact on priority users, who can play an important role in the effectiveness of viral marketing campaigns. In practice, companies often prioritize specific potential customers who can afford their products or for whom the products are suitable. For example, a company that produces baby diapers will tend to introduce the product to married women aged 20 to 45. Suppose the company has some data about user accounts on a social network; it can then launch a promotion with a suitable number of gifts for married female users of this network. If we only care about the number of influenced individuals, as in IM, we do not evaluate the impact on these potential customers, which can lead to a poor choice of seed set.
Figure 1 shows an example. The network contains 8 nodes and 9 edges, the priority set is
and the weight (influence probability) of each edge is set to 1. Consider the budget
(the number of seed nodes). The optimal solution of IM is
, which influences 6 nodes, including
except
b. Hence, IM cannot reach all priority nodes. To satisfy the priority requirement, the solution must instead be
, whose total influence is only 5.
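The trade-off in this example can be reproduced on a small graph. The sketch below is only an illustration under stated assumptions: the network `toy`, the priority set and the threshold are hypothetical and are not the network of Figure 1, and the helper names `reachable` and `best_seed_set` are ours. Because every edge has probability 1, the spread of a seed set is simply the number of nodes reachable from it, so a brute-force search over seed sets is enough.

```python
from itertools import combinations


def reachable(graph, seeds):
    """Nodes activated from `seeds` when every edge fires with probability 1."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def best_seed_set(graph, k, priority=frozenset(), threshold=0):
    """Brute-force the size-k seed set with the largest spread among those
    that activate at least `threshold` priority nodes."""
    nodes = sorted(set(graph) | {v for vs in graph.values() for v in vs})
    best, best_spread = None, -1
    for seeds in combinations(nodes, k):
        active = reachable(graph, seeds)
        if len(active & priority) < threshold:
            continue  # violates the priority constraint
        if len(active) > best_spread:
            best, best_spread = set(seeds), len(active)
    return best, best_spread


# Hypothetical toy network with 8 nodes (NOT the network of Figure 1).
toy = {
    "d": ["a", "c", "e", "f", "g"],
    "b": ["a", "c", "h"],
}
priority = frozenset({"a", "b", "c"})

plain = best_seed_set(toy, k=1)
constrained = best_seed_set(toy, k=1, priority=priority, threshold=3)
```

On this toy input the unconstrained search picks the seed with the widest reach, while requiring all three priority nodes forces a seed with a smaller overall spread, which mirrors the situation described for Figure 1.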
Motivated by such scenarios, in this paper we investigate the Influence Maximization with Priority (IMP) problem, which adds a priority constraint to the influence process. Given a social network, a priority set U, a budget k and a priority threshold T, the goal is to find a seed set S of size k whose influence on U is at least T and whose overall influence spread is maximized. IMP fits campaigns such as the one above better than IM, and it also generalizes the IM problem. However, the priority constraint makes the problem considerably more challenging. To address it, we propose two approximation algorithms, Integrated Greedy (IG) and Integrated Greedy-based Sampling (IGS), with provable theoretical guarantees. IG obtains its guarantee by modifying the natural greedy algorithm, while IGS is an efficient randomized approximation algorithm based on the sampling method [13,14,15,20]. IGS combines two novel techniques. First, we propose the Targeted Reverse Reachable (TRR) concept, which modifies the Reverse Reachable (RR) sampling technique [13,14,15,20] to estimate the influence of a seed set on a given priority set. Second, we develop a new strategy that selects seeds in accordance with the priority constraint and sets the number of samples so as to obtain a theoretical guarantee. Because IM is a special case of IMP, we ran extensive experiments on various real networks to compare our IGS algorithm with state-of-the-art algorithms for the IM problem, such as DSSA [15], BCT [2] and OPIM-C, in terms of influence on a given priority set, running time and memory usage, while ensuring influence spread approximations comparable to those of IM.
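For reference, the problem statement above can be written compactly as follows. The notation is our assumption and may differ from the formal definition in Section 2: V denotes the node set of the network, σ(S) the expected number of nodes influenced by the seed set S, and σ_U(S) the expected number of influenced nodes that belong to the priority set U.

```latex
\max_{S \subseteq V,\ |S| = k} \ \sigma(S)
\qquad \text{subject to} \qquad
\sigma_U(S) \ge T
```

Under this reading, setting T = 0 removes the constraint and leaves the plain IM objective, which is one way to see why IM is a special case of IMP.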
Our contributions are summarized as follows:
We propose the Influence Maximization with Priority (IMP) problem, which adds a priority constraint to the Influence Maximization (IM) problem. That is, we extend IM with a constraint on the influence on a given set of users. IMP aims to find a seed set S of size k such that the total influence on priority users is at least a given threshold while the influence of the cascade is still maximized.
We propose two approximation algorithms, IG and IGS, for the IMP problem. The IG algorithm provides an approximation ratio of , where is an output of the algorithm. In addition, IGS is a randomized approximation algorithm providing an approximation ratio of with probability at least , where are input parameters and t is an output of the algorithm.
We conduct extensive experiments on various real networks such as netHEPT, netPHY, Email-Enron, DBLP, and Twitter ReTweet. The results indicate that our algorithm, IGS, often outperforms state-of-the-art algorithms in terms of influence, running time and memory usage. In particular, IGS returns solutions whose influence on the priority set is roughly 2 to 10 times greater than the threshold T, while still maintaining influence spread approximations comparable to those of IM algorithms. We also show that IGS is faster and uses less memory than the other algorithms in many cases. Overall, although IGS must also account for the influence on a given target set of users, it still achieves fast running time, low memory usage and high influence over all nodes, comparable to state-of-the-art algorithms such as DSSA, BCT and OPIM-C. This indicates that IGS is well designed.
Related work. Kempe et al. [1] first studied the Influence Maximization (IM) problem, inspired by the idea of exploiting influence among users in social networks for viral marketing [21]. They formulated IM as a discrete optimization problem under two classical information diffusion models, Independent Cascade (IC) and Linear Threshold (LT). They proved that IM could be approximated within a ratio of
for any
and proposed a greedy algorithm that provided an approximation ratio of
for
. Later, Chen et al. [12,16] continued to study IM and proved that computing the exact influence spread of a seed set was #P-hard. Hence, although many heuristic algorithms have been proposed to solve this problem on large networks, they have failed to retain the approximation ratio of
and have provided low-quality solutions. Examples include the cost-effective lazy-forward heuristic (CELF) proposed by Leskovec et al. [22], which improves the greedy algorithm to run up to 700 times faster than the greedy algorithm with Monte Carlo simulation; PMIA, a fast heuristic proposed by Chen et al. [12], which constructs a directed acyclic graph to estimate the influence under the IC model; and the algorithm proposed in [16], which uses local directed acyclic graphs (LDAGs) to calculate the local influence of nodes under the LT model. To keep the
ratio, research on approximation algorithms has continued. Borgs et al. [13] first presented an
-approximation algorithm with probability at least
in
time complexity by introducing the Reverse Influence Sampling (RIS) model. This model has formed the foundation for further algorithm development [14,15,20,23].
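The idea behind RIS can be sketched in a few lines. The code below is a simplified illustration of reverse reachable (RR) set sampling under the IC model followed by greedy maximum coverage. It is not the algorithm of any cited paper and it omits the sample-size analysis that gives those algorithms their guarantees. The toy network, the function names and the fixed number of samples are all assumptions made for the example.

```python
import random


def random_rr_set(nodes, in_edges, rng):
    """Sample one RR set under IC: pick a uniform random root, then walk
    edges backwards, keeping each edge (u -> v) with its probability p."""
    root = rng.choice(nodes)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, ()):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def greedy_max_coverage(rr_sets, k):
    """Greedily pick up to k nodes that hit as many RR sets as possible."""
    uncovered = list(rr_sets)
    seeds = []
    for _ in range(k):
        counts = {}
        for rr in uncovered:
            for u in rr:
                counts[u] = counts.get(u, 0) + 1
        if not counts:
            break  # every RR set is already covered
        best = max(sorted(counts), key=counts.get)  # ties broken by name
        seeds.append(best)
        uncovered = [rr for rr in uncovered if best not in rr]
    return seeds


def estimate_spread(seeds, rr_sets, n):
    """Spread estimate: n times the fraction of RR sets hit by the seeds."""
    seed_set = set(seeds)
    hit = sum(1 for rr in rr_sets if rr & seed_set)
    return n * hit / len(rr_sets)


# Hypothetical toy network: in_edges[v] lists (u, p) for each edge u -> v.
in_edges = {
    "a": [("d", 1.0), ("b", 1.0)],
    "c": [("d", 1.0), ("b", 1.0)],
    "e": [("d", 1.0)],
    "f": [("d", 1.0)],
    "g": [("d", 1.0)],
    "h": [("b", 1.0)],
}
nodes = ["a", "b", "c", "d", "e", "f", "g", "h"]

rng = random.Random(0)  # fixed seed so the run is repeatable
rr_sets = [random_rr_set(nodes, in_edges, rng) for _ in range(2000)]
seeds = greedy_max_coverage(rr_sets, k=1)
spread = estimate_spread(seeds, rr_sets, n=len(nodes))
```

The key property is that the probability that a seed set intersects a random RR set equals its expected spread divided by the number of nodes, so maximizing coverage of the sampled sets approximately maximizes influence. The number of samples (2000 here) is an arbitrary choice for the illustration; choosing it so that the guarantee holds is the subject of the cited works.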
Since then, many works have extended IM to other viral marketing contexts. Nguyen et al. [24] investigated the Budgeted Influence Maximization (BIM) problem, which considered the cost of selecting a node, and proposed a
approximation algorithm. The authors in [2] studied a generalization of the IM and BIM problems called Cost-aware Targeted Viral Marketing (CTVM). In this problem, each node u had an arbitrary cost and a benefit, and the goal of CTVM was to select a seed set within a given budget so that the total benefit was maximized. We believe that this is the problem closest to ours. In CTVM, one can set the parameters so as to maximize the influence on a given target set of users, but cannot simultaneously maximize the influence on the remaining users as in our problem. Later, several works improved the approximation guarantees as well as the scalability of CTVM algorithms [25,26].
Many other variants of the IM problem have also been studied. Some works studied constrained variants of IM, such as [17,18,27], in which edges were associated with topic influence weights. These problems aimed to find a set of k users that maximized the number of influenced users for a given topic query. However, the proposed algorithms did not provide any theoretical guarantee. Li et al. [28] proposed the Location-aware Influence Maximization (LIM) problem, whose goal was to select a seed set of k nodes that maximized the number of influenced nodes in a given query region. The authors in [29] investigated the Distance-aware Influence Maximization (DAIM) problem, which considered the distance between users and the promoted location during seed selection. They extended the RIS model and provided an unbiased estimator for the DAIM problem.
Other works investigated the Competitive Influence Maximization (CIM) problem, which considered IM under competition among many rivals. Bharathi et al. [30] first formulated the CIM problem under a new competitive propagation model that extended the IC model. Chen et al. [12] investigated CIM in the setting of combating negative opinions, based on the assumption that negative information was often more attractive than official information. Some authors considered the problem in other viral marketing settings, for example by proposing a distance-aware problem [31], extending the
model to reflect competition [13,32,33,34], or proposing a heuristic algorithm [35].
Recently, some authors studied the selection of seed nodes in a social network to influence groups of users or communities rather than individuals [36,37,38,39]. They argued that in real-world scenarios, influencing groups is more beneficial than influencing individuals. Tsang et al. [36] investigated the Fairness Group Maximization problem with two fairness criteria, maximin fairness and diversity. Maximin fairness aimed to maximize the minimum influence received by any group relative to its population, while the diversity criterion was an alternative fairness concept that extended the notion of individual rationality to group rationality. They proposed an approximation algorithm based on techniques for handling multiple submodular objective functions. More recently, the authors in [37] proposed exact algorithms for fair group influence with multiple criteria, based on a mixed integer linear programming formulation over a specific set of sample graphs under the
model. In [38], the authors characterized the intricate relationship between diversity and efficiency, which may sometimes be at odds but may also reinforce each other. Nguyen et al. [39] considered the Influence Maximization at Community level problem, which sought a seed set of k nodes that influenced the largest number of communities. They showed that the objective function was neither submodular nor supermodular and proposed approximation algorithms with provable guarantees. Unlike the problem studied in this paper, these studies did not address a priority set in the influence maximization context. Hence, the proposed algorithms cannot be applied to the IMP problem.
Organization. The rest of the paper is organized as follows. Section 2 presents the information diffusion model and problem definitions. Section 3 and Section 4 present our proposed Integrated Greedy and Integrated Greedy-based Sampling algorithms for the IMP problem, together with their theoretical analysis. Experimental results are shown in Section 5. Section 6 discusses future work and concludes the paper.