Article

Competitive Influence Maximization within Time and Budget Constraints in Online Social Networks: An Algorithmic Approach

1
Vietnam National University, University of Engineering and Technology, Hanoi 100803, Vietnam
2
People’s Security Academy, Hanoi 100803, Vietnam
3
Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA
*
Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(11), 2274; https://doi.org/10.3390/app9112274
Submission received: 19 April 2019 / Revised: 26 May 2019 / Accepted: 28 May 2019 / Published: 1 June 2019
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

The Competitive Influence Maximization (CIM) problem, which seeks a seed set of nodes for a player or a company to propagate their product’s information while their competitors are conducting similar strategies, has received much attention recently due to its application in viral marketing. However, existing works neglect the fact that limited budget and time constraints can play an important role in the competitive influence strategy of each company. In addition, based on the assumption that one of the competitors dominates the competitive influence process, the majority of prior studies indicate that the competitive influence function (objective function) is monotone and submodular. As a result, CIM can be approximated within a factor of $1 - 1/e - \epsilon$ by a Greedy algorithm combined with Monte Carlo simulation. Unfortunately, in a more realistic scenario where there is fair competition among competitors, the objective function is no longer submodular. In this paper, we study a general case of the CIM problem, named the Budgeted Competitive Influence Maximization (BCIM) problem, which considers CIM with budget and time constraints under the condition of fair competition. We found that the objective function is neither submodular nor supermodular. Therefore, it cannot admit a Greedy algorithm with approximation ratio of $1 - 1/e$. We propose Sandwich Approximation based on Polling-Based Approximation (SPBA), an approximation algorithm based on the Sandwich framework and the polling-based method. Our experiments on real social network datasets showed the effectiveness and scalability of our algorithm, which outperformed other state-of-the-art methods. Specifically, our algorithm scales to million-scale networks in only 1.5 min.

1. Introduction

Online social networks (OSNs) have recently been a very effective method for diffusing information, propagating opinions or ideas. Many companies have leveraged word-of-mouth effect in OSNs to promote their products. The key problem of viral marketing is Influence Maximization ( IM ), which aims to select a set of k users (called seed set) in a social network with maximum influence spread. Kempe et al. [1] first formulated IM problem in two diffusion models, named Linear Threshold (LT) and Independent Cascade (IC), which simulate the propagation of influence through social networks. IM has been widely studied due to its important role in viral marketing [2,3,4,5,6,7,8,9,10]. However, all of the above-mentioned studies only focus on studying influence propagation of single player or company in social networks. In the context of viral marketing, there are often many competitors simultaneously implementing the same strategy of marketing spread on OSNs. This phenomenon requires the task of maximizing a product’s influences under competitive circumstances, called Competitive Influence Maximization ( CIM ) problem.
Bharathi et al. [11] first proposed the CIM problem, which seeks a seed set to maximize the propagation of a product’s information while competitors employ the same strategy. Since then, related works have investigated CIM in many different contexts. Some authors show that the objective function is monotone and submodular. A set function $f$ over a ground set $U$ is said to be submodular if
$$f(A \cup \{u\}) - f(A) \geq f(B \cup \{u\}) - f(B)$$
for any $A \subseteq B \subseteq U$ and $u \in U \setminus B$. Based on that, they applied the classic hill-climbing algorithm, which provides an approximation ratio of $(1 - 1/e)$, to solve the CIM problem [12,13,14,15,16,17]. For example, Lu et al. [12] studied the problem in the context of fair competitive influence from the host perspective. Chen et al. [13] proposed an independent cascade model with negative opinions (IC-N) by extending the IC model and showed a greedy algorithm with an approximation ratio of $1 - 1/e$. Recently, some works have addressed the problem in other directions, including proposing a heuristic algorithm [15] and studying some variants of CIM [16,18,19].
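The diminishing-returns inequality above can be checked by brute force on small ground sets. The following is a minimal sketch, not part of the original paper: the helper `is_submodular`, the toy `covers` instance, and the `coverage` function are all hypothetical names introduced for illustration.

```python
from itertools import combinations

def is_submodular(f, ground):
    """Brute-force test of f(A | {u}) - f(A) >= f(B | {u}) - f(B)
    for every A subset of B subset of ground and every u outside B."""
    ground = frozenset(ground)
    items = list(ground)
    subsets = [frozenset(c) for r in range(len(items) + 1)
               for c in combinations(items, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for u in ground - B:
                if f(A | {u}) - f(A) < f(B | {u}) - f(B):
                    return False
    return True

# Hypothetical coverage instance: each candidate seed covers some items.
covers = {"x": {1, 2}, "y": {2, 3}, "z": {3, 4}}

def coverage(S):
    """Number of distinct items covered by the seeds in S."""
    return len(set().union(*(covers[s] for s in S)))
```

A coverage function passes the check, while a function with increasing marginal gains such as `lambda S: len(S) ** 2` fails it. The enumeration is exponential in the size of the ground set, so this is only useful for inspecting toy examples.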
Although previous works try to solve the CIM problem in many circumstances, their practicality is limited for the following reasons. Firstly, they often assume that one of the competitors has an advantage in the competitive influence process. In this case, the objective function is monotone and submodular. As a result, the existing algorithms provide an approximation ratio of $1 - 1/e$ by using the Greedy framework, which sequentially selects the node that has the largest marginal benefit [20]. However, users also have different views when they receive the same information. As a result, the influence function is no longer submodular and there is no approximation algorithm for this case. Secondly, prior works do not take into account the time constraint and the cost of selecting a seed user for CIM. In a more realistic scenario, the effectiveness of the competitive influence process depends very much on these two factors. Thirdly, although many CIM algorithms have been proposed, there are no scalable and efficient algorithms for CIM in large (million-scale) social networks. For problems related to information diffusion, the complexity of calculating the objective function is enormous due to the randomness of the probabilistic diffusion model [8,9]. To address these challenges, some works use the Monte Carlo method to estimate the objective function [1,13,15,17]. However, this method has high complexity, and it takes several hours even on very small networks. To the best of our knowledge, there is no randomized algorithm for CIM that meets an approximation guarantee with low complexity.
In this paper, we study a general problem of CIM , Budgeted Competitive Influence Maximization ( BCIM ), which takes into account both arbitrary cost and time constraints for CIM problem. To model this problem, we first introduce Time constraint Competitive Linear Threshold ( TCLT ) to capture competitive influence progress within time constraint by extending Competitive Linear Threshold [21,22]. Under TCLT model, the main challenges of BCIM lie in following aspects. Firstly, BCIM problem is NP-hard and it is #P-Hard [23] to calculate the objective function. Moreover, we point out that the objective function is neither submodular nor supermodular. Thus, it makes BCIM difficult to be solved using greedy-based algorithms, as well as methods for influence maximization. To address the above challenges, in this article, we present SPBA , an efficient randomized algorithm based on polling method and Sandwich Approximation framework [16]. Our main contributions are summarized as follows:
  • We formulate the Time constraint Competitive Linear Threshold (TCLT) model by extending the Competitive Linear Threshold model in [21,22] to simulate competitive influence within a time constraint $\tau$. Given two competitors A and B who need to advertise their products on OSNs, assume that we know the nodes that are activated by B (the B-seed set). Given the limited budget $L$, a heterogeneous cost for A to activate each node (i.e., each node has a cost to add it to the A-seed set), and the time constraint $\tau$, we study the BCIM problem, which seeks an A-seed set within the limited budget $L$ and time constraint $\tau$ that maximizes the number of nodes influenced by A under the TCLT model. We then show that BCIM is NP-hard and that the objective function is neither submodular nor supermodular.
  • We propose SPBA, an efficient randomized algorithm based on Sandwich approximation and the polling method. We first design upper bound and lower bound submodular functions of the objective function and develop a polling-based approximation algorithm that finds a solution for the bound functions with an approximation ratio of $(1 - 1/e - \epsilon)$ with high probability. Based on that, the Sandwich approximation framework in [16] is applied to give a data-dependent approximation factor.
  • We conducted extensive experiments on various real social networks. The experiments suggest that SPBA provides significantly higher quality solutions than existing methods including baseline algorithms and influence maximization algorithms. Furthermore, we also demonstrate that our algorithm can scale to million-scale networks within about 1.5 min.
Organization. The rest of paper is organized as follows. The related work is presented in Section 2 and the preliminaries for Competitive Linear Threshold model and Competitive Influence Maximization are introduced in Section 3. We introduce our propagation model, problem definition and its properties in Section 4. Section 5 presents our proposed algorithms. The experiments are shown in Section 6. Finally, we give some tasks for future work and conclusion in Section 7.

2. Related Work

Since CIM is one of variants of IM , we review the literature related to this work from two areas, namely influence maximization and competitive influence maximization.

2.1. Influence Maximization

The IM problem is a crucial problem in information diffusion research due to its potential commercial value. Basically, IM focuses on finding a set of $k$ seed users on a social network to maximize the number of influenced nodes. Kempe et al. [1] first proposed two information diffusion models, Linear Threshold (LT) and Independent Cascade (IC). On these models, they formulated the IM problem as a combinatorial optimization problem and designed a natural greedy algorithm with an approximation ratio of $1 - 1/e$. IM has received much attention from the following aspects: proposing efficient algorithms [2,3,4,5,6,7,8,9,24] and studying its variants [7,10,25,26,27,28].
Kempe et al. [1] first proposed a Greedy algorithm based on Monte Carlo simulation with a $(1 - 1/e - \epsilon)$ approximation guarantee. To improve the running time of the Greedy algorithm, Leskovec et al. [3] proposed the cost-effective lazy forward (CELF) algorithm, which is up to 700 times faster than the Greedy algorithm. This algorithm is improved in [29]. Several works propose heuristic algorithms to find solutions in large networks [8,9,30,31]. Although those heuristics are often faster in practice, they fail to retain the $(1 - 1/e - \epsilon)$ approximation guarantee and often give lower-quality results than the greedy algorithm. Chen et al. [9] proposed a heuristic algorithm based on the maximum influence arborescence (MIA) structure. In the LT model, Chen et al. [8] proposed using local directed acyclic graphs (LDAG) to approximate the influence of nodes. Recently, Borgs et al. [2] made a theoretical breakthrough for IM by proposing the Reverse Influence Sampling (RIS) algorithm. The RIS algorithm returns a $(1 - 1/e - \epsilon)$-approximate solution with probability at least $1 - n^{-l}$. The main idea of the RIS algorithm is to generate Reverse Reachable (RR) sets to estimate the objective function and to run a greedy algorithm on a sufficiently large collection of RR sets to find the solution. This motivates many state-of-the-art methods for IM including TIM/TIM++ [5], IMM [6], and SSA/D-SSA [4].
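The greedy step used by RIS-style methods is a maximum-coverage selection over the sampled RR sets. The sketch below illustrates only that selection step under simplifying assumptions: the function name `greedy_max_coverage` is hypothetical, the RR sets are assumed to be already generated, and no sample-size bound from [2,4,5,6] is implemented.

```python
def greedy_max_coverage(rr_sets, k):
    """Pick up to k nodes, each time taking the node that appears in the
    largest number of RR sets not yet covered by earlier picks."""
    uncovered = [set(r) for r in rr_sets]
    seeds = []
    for _ in range(k):
        counts = {}
        for r in uncovered:
            for v in r:
                counts[v] = counts.get(v, 0) + 1
        if not counts:
            break  # every non-empty RR set is already covered
        # Sorting first makes ties resolve to the smallest node id.
        best = max(sorted(counts), key=counts.get)
        seeds.append(best)
        uncovered = [r for r in uncovered if best not in r]
    return seeds
```

The fraction of RR sets hit by the returned seeds, scaled by the number of nodes, is then the usual estimate of the influence spread.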
Recently, IM ’s variants have also received much attention due to their potential commercial value [25,27,28]. Lin et al. [28] studied k-Boosting problem, which aims at finding the set of k users to boost so that the “boosted” influence spread is maximized. The authors of [25] investigated distance-aware influence maximization, which takes into account the effect of distance on influence process for IM problem. They showed that the objective function is monotone and submodular and proposed RIS-based and MIA-based algorithms. Influence Maximization with awareness of the topic are also studied in [27]. In this work, each user is associated with a profile that consists of the users preferences on different topics in their model and the problem asks to select seed set with budget and topic queries so that the competitive influence function is maximized.

2.2. Competitive Influence Maximization

In the context of viral marketing, it is often the case that many companies propagate their product’s information simultaneously, which is why the Competitive Influence Maximization (CIM) problem has been studied in recent years. Bharathi et al. [11] first proposed the CIM problem by proposing a new propagation model, which is an extension of the IC model. Later, some authors proposed variations of the IC model for the CIM problem. Chen et al. [13] investigated CIM in the context of combating the dissemination of negative opinions under the IC-N model. This model is based on the observation that rumors and misinformation are often more attractive than official information. Lui et al. [14] considered the CIM problem under a new diffusion–containment model and presented a $(1 - 1/e)$-approximation algorithm. Carnes et al. [17] proposed distance-based and wave propagation models for the competitive influence process in social networks. They showed that the objective function is submodular and devised a greedy algorithm with an approximation ratio of $(1 - 1/e)$. Some other works extend the LT model for the CIM problem [12,21,22,24]. For instance, Borodin et al. [24] proposed Competitive Linear Threshold models and provided some properties of these models. Lu et al. [12] considered the problem of fair competitive viral marketing from the host’s perspective. They proposed the K-LT model and showed that the influence function is monotone and submodular. More generally, Chen et al. [21] summarized two competitive influence models, which are extended from the IC model and the LT model, named the Competitive Independent Cascade (CIC) and Competitive Linear Threshold (CLT) models. They also provided the properties of such models and categorized them by tie-breaking rules, including the fixed probability tie-breaking (TB-FP) rule and the proportional probability tie-breaking (TB-PP) rule.
TB-FP means that, when a node $v$ is influenced by both competitors, it will be influenced by one of them with a fixed probability. TB-PP means that $v$ becomes influenced by one competitor with a proportional probability. TB-FP reflects the dominance of a competitor, and most proposed algorithms are based on this feature to give an approximation ratio of $1 - 1/e$. However, there is no approximation algorithm for the TB-PP case.
In other directions, Bozrgi et al. [15] proposed a community-based algorithm for CIM under the DC model. Some variants of CIM have also been studied. Yan et al. [19] sought the minimum-cost seed set for the threshold competitive influence problem. Lu et al. [16] and Yan et al. [19] proposed competition and complementary approaches for the CIM problem by extending the IC model. The authors of [18] formulated the Dominated Competitive Influence Maximization (DCIM) problem, which aims to maximize the difference between the influence of the desired information and that of its competitors under a new competitive independent cascade model with meeting events.
Different from most of the existing works, in this paper, we study a general problem of CIM, namely BCIM, which considers CIM within budget and time constraints under the condition of fair competition. We show that the objective function is not submodular and that calculating the objective function is #P-hard. To overcome this challenge, we propose a randomized algorithm based on Sandwich approximation and the polling-based method.

3. Preliminaries

To clearly introduce the problem of BCIM , we first introduce some preliminaries. Table 1 summarizes the frequently used notations.

3.1. Competitive Linear Threshold ( CLT ) Model

In this model, a social network is abstracted by a directed graph $G = (V, E)$, where $V$ is the set of nodes (or vertices) representing users and $E$ is the set of edges representing links among users. There are two competitors A and B who want to promote their products in a social network $G$. Each edge $(u, v) \in E$ has two weights $w_A(u, v)$ and $w_B(u, v)$ representing the influence of A and B on edge $(u, v)$, respectively. The weights satisfy the conditions
$$\sum_{u \in N(v)} w_A(u, v) \leq 1, \quad \sum_{u \in N(v)} w_B(u, v) \leq 1, \quad \forall v \in V$$
Each node is in one of three states: A-active, B-active, or inactive, which represent nodes that have been activated by A, activated by B, or not activated by either A or B. Each node $v$ picks two independent thresholds $\theta_A(v)$, $\theta_B(v)$ uniformly from $[0, 1]$, called the A-threshold and B-threshold. The propagation process happens in discrete steps $t = 0, 1, \ldots$. $S_A$ and $S_B$ are the seed sets of competitors A and B ($S_A \cap S_B = \emptyset$). $A_t$ and $B_t$ are the sets of A-active and B-active nodes at step $t$, respectively. The process of propagation operates as follows:
  • At step $t = 0$, $A_0 = S_A$, $B_0 = S_B$.
  • At step $t \geq 1$, it first sets $A_t = A_{t-1}$ and $B_t = B_{t-1}$. Each node $v \notin A_{t-1} \cup B_{t-1}$ becomes A-active if
    $$\sum_{u \in N(v) \cap A_{t-1}} w_A(u, v) \geq \theta_A(v) \quad \text{and} \quad \sum_{u \in N(v) \cap B_{t-1}} w_B(u, v) < \theta_B(v)$$
    Node $v$ becomes B-active if
    $$\sum_{u \in N(v) \cap B_{t-1}} w_B(u, v) \geq \theta_B(v) \quad \text{and} \quad \sum_{u \in N(v) \cap A_{t-1}} w_A(u, v) < \theta_A(v)$$
    When the total influence weights of both competitors on node $v$ reach the corresponding thresholds, a tie-breaking rule determines whether $v$ becomes A-active or B-active. Chen et al. [21] summarized the following tie-breaking rules.
    Fixed probability tie-breaking rule (TB-FP): for a fixed probability $p$, $v$ becomes A-active with probability $p$ and B-active with probability $1 - p$. The special cases of this rule include TB-FP(A), competitor A’s dominance, and TB-FP(B), competitor B’s dominance.
    Proportional probability tie-breaking rule (TB-PP): $A_t(v) = N(v) \cap A_{t-1} \setminus A_{t-2}$ is the A-active successful attempt set of $v$ and $B_t(v) = N(v) \cap B_{t-1} \setminus B_{t-2}$ is the B-active successful attempt set of $v$. Node $v$ becomes A-active with probability $\frac{|A_t(v)|}{|A_t(v)| + |B_t(v)|}$ and B-active with probability $\frac{|B_t(v)|}{|A_t(v)| + |B_t(v)|}$.
  • Once a node becomes activated (A-active or B-active), its status remains in next steps. The propagation process ends when no more nodes can be activated.
TB-FP is used in [11,13,22,32,33] to reflect the dominance of one competitor. This is motivated by the phenomenon of negativity bias, which is well studied in social psychology, and matches the common sense that rumors or misinformation are usually hard to fight with in social networks. In contrast, TB-PP reflects fair competition among competitors. This rule is used for IC-N model (a variant of IC model) [13], while no study uses this rule for a variant of LT model.

3.2. Competitive Influence Maximization

Definition 1.
Given a directed graph $G = (V, E)$ representing a social network under an information diffusion model $M$, there are two competitors A and B. Given a B-seed set $S_B \subseteq V$ and a positive number $k$, find an A-seed set $S_A \subseteq V \setminus S_B$ with $|S_A| \leq k$ so that the number of A-active nodes is maximized.

4. Models and Problem Definition

4.1. Time Constraint Competitive Linear Threshold ( TCLT ) Model

In this section, we introduce our model, which incorporates a limited number of spread steps $\tau$ into the CLT model, namely the Time Constraint Competitive Linear Threshold (TCLT) model. In addition, we propose a new tie-breaking rule that, as we explain below, reflects the competitive context in viral marketing more closely.
In this model, we reuse all notations and symbols of the CLT model. Given a constraint on the number of propagation hops $\tau \geq 1$, the propagation process happens in discrete steps $t = 0, 1, \ldots, \tau$ as follows:
  • At step $t = 0$, $A_0 = S_A$, $B_0 = S_B$.
  • At step $t \geq 1$, first set $A_t = A_{t-1}$ and $B_t = B_{t-1}$. Each node $v \notin A_{t-1} \cup B_{t-1}$ becomes A-active if
    $$\sum_{u \in N(v) \cap A_{t-1}} w_A(u, v) \geq \theta_A(v) \quad \text{and} \quad \sum_{u \in N(v) \cap B_{t-1}} w_B(u, v) < \theta_B(v)$$
    Node $v$ becomes B-active if
    $$\sum_{u \in N(v) \cap B_{t-1}} w_B(u, v) \geq \theta_B(v) \quad \text{and} \quad \sum_{u \in N(v) \cap A_{t-1}} w_A(u, v) < \theta_A(v)$$
  • If in step $t$, a node $v$ has
    $$\sum_{u \in N(v) \cap A_{t-1}} w_A(u, v) \geq \theta_A(v) \quad \text{and} \quad \sum_{u \in N(v) \cap B_{t-1}} w_B(u, v) \geq \theta_B(v)$$
    we propose the weight proportional probability tie-breaking rule (TB-WPP) to determine its state. Accordingly, $v$ is A-activated with probability
    $$p_A(v \mid A_{t-1}, B_{t-1}) = \frac{\sum_{u \in N(v) \cap A_{t-1}} w_A(u, v)}{\sum_{u \in N(v) \cap A_{t-1}} w_A(u, v) + \sum_{u \in N(v) \cap B_{t-1}} w_B(u, v)}$$
    and $v$ is B-activated with probability
    $$p_B(v \mid A_{t-1}, B_{t-1}) = \frac{\sum_{u \in N(v) \cap B_{t-1}} w_B(u, v)}{\sum_{u \in N(v) \cap A_{t-1}} w_A(u, v) + \sum_{u \in N(v) \cap B_{t-1}} w_B(u, v)}$$
  • Once a node becomes activated (A-active or B-active), it keeps this status in the following steps. The propagation process ends after $\tau$ hops of propagation or when no more nodes can be activated.
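The propagation steps listed above can be simulated directly once the thresholds are fixed. The following is a minimal sketch under stated assumptions: `simulate_tclt` and its data layout (a dictionary mapping each node to a list of `(in_neighbor, w_A, w_B)` triples) are hypothetical, thresholds are passed in rather than sampled, and a node needs strictly positive incoming weight to be activated so that a zero threshold cannot trigger a division by zero.

```python
import random

def simulate_tclt(nodes, in_edges, seeds_a, seeds_b, tau, theta_a, theta_b, rng=random):
    """Run at most tau synchronous TCLT steps and return (A-active, B-active).

    in_edges[v] is a list of (u, w_a, w_b) triples for the in-neighbors u of v.
    theta_a / theta_b map each node to its A- and B-threshold.
    """
    a_active, b_active = set(seeds_a), set(seeds_b)
    for _ in range(tau):
        new_a, new_b = set(), set()
        for v in nodes:
            if v in a_active or v in b_active:
                continue  # activated nodes never change state
            in_a = sum(wa for u, wa, _ in in_edges.get(v, []) if u in a_active)
            in_b = sum(wb for u, _, wb in in_edges.get(v, []) if u in b_active)
            # Assumption of this sketch: require positive incoming weight.
            hit_a = in_a > 0 and in_a >= theta_a[v]
            hit_b = in_b > 0 and in_b >= theta_b[v]
            if hit_a and hit_b:
                # TB-WPP: break the tie in proportion to total incoming weight.
                if rng.random() < in_a / (in_a + in_b):
                    new_a.add(v)
                else:
                    new_b.add(v)
            elif hit_a:
                new_a.add(v)
            elif hit_b:
                new_b.add(v)
        if not new_a and not new_b:
            break  # no more nodes can be activated
        a_active |= new_a
        b_active |= new_b
    return a_active, b_active
```

Averaging the size of the returned A-active set over many runs with freshly sampled thresholds gives a Monte Carlo estimate of the spread, which is the costly approach that the polling-based method is meant to avoid.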
Different from TB-PP, the TB-WPP rule considers the total influence weight of the in-neighbors to decide the state of a node. Our TB-WPP rule reflects the competition process more closely. Consider the example in Figure 1 to clarify this observation. Graph $G$ contains four nodes $\{a, b, c, u\}$ and three edges $\{(a, u), (b, u), (c, u)\}$. There is a pair $(w_A, w_B)$ on each edge, $S_A = \{a, b\}$, and $S_B = \{c\}$. At step $t = 1$, if we use TB-PP, node $u$ will change its state from inactive to A-active or B-active with probabilities $\frac{2}{3}$ and $\frac{1}{3}$, respectively. In other words, the probability that $u$ becomes A-active is higher. If we use TB-WPP, node $u$ would change its state from inactive to A-active or B-active with probabilities $\frac{3}{11}$ and $\frac{8}{11}$, i.e., the probability that $u$ becomes B-active is higher. In this case, the influence weight of $c$ on $u$ (0.8) is greater than that of the two nodes $a$ and $b$ (total influence weight 0.3). Considering the total weight in the TB-WPP rule better matches the fact that users influence each other to different degrees depending on the relationship between them. Therefore, it is reasonable to use TB-WPP for competitive influence spread.
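The two probabilities quoted for Figure 1 can be reproduced with exact fractions. This is a small illustrative sketch: the function names `tb_pp` and `tb_wpp` are hypothetical, and the split of A's total weight 0.3 into 0.1 and 0.2 is an assumption, since only the total is stated in the text.

```python
from fractions import Fraction

def tb_pp(num_a_attempts, num_b_attempts):
    """TB-PP: probabilities proportional to the number of activating in-neighbors."""
    total = num_a_attempts + num_b_attempts
    return Fraction(num_a_attempts, total), Fraction(num_b_attempts, total)

def tb_wpp(a_weights, b_weights):
    """TB-WPP: probabilities proportional to the total incoming weight."""
    wa, wb = sum(a_weights), sum(b_weights)
    return wa / (wa + wb), wb / (wa + wb)

# Assumed individual weights; only the totals 0.3 (A) and 0.8 (B) come from the text.
a_weights = [Fraction(1, 10), Fraction(2, 10)]
b_weights = [Fraction(8, 10)]

print(tb_pp(len(a_weights), len(b_weights)))  # (Fraction(2, 3), Fraction(1, 3))
print(tb_wpp(a_weights, b_weights))           # (Fraction(3, 11), Fraction(8, 11))
```

Any other split of the 0.3 total between $a$ and $b$ gives the same TB-WPP result, because the rule depends only on the summed weights.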

4.2. Budgeted Competitive Influence Maximization Problem

In this paper, we assume that the seed set of competitor B, $S_B \subseteq V$, is known and that each node $u$ is associated with an arbitrary cost $c(u) \geq 0$ for adding it to $S_A$. We define Budgeted Competitive Influence Maximization (BCIM) as follows:
Definition 2.
BCIM problem. Given a directed graph $G = (V, E)$ representing a social network under the TCLT model, a B-seed set $S_B \subseteq V$, a budget $L > 0$, and a time constraint $\tau$, find an A-seed set $S_A \subseteq V \setminus S_B$ with total cost $\sum_{u \in S_A} c(u) \leq L$ that maximizes $I(S_A)$.
Theorem 1.
BCIM problem is NP-hard and calculating the objective function I ( · ) is #P-Hard.
Proof. 
We see that, when $S_B = \emptyset$ and $\tau = n$, the TCLT model becomes the well-known LT model [1] and BCIM becomes the IM problem [1]. In other words, IM is a special case of BCIM, so BCIM is NP-hard and calculating the influence $I(S_A)$ is #P-hard. ☐
Although the objective function in the IM problem is monotone and submodular, the objective function in BCIM is unfortunately neither submodular nor supermodular. Therefore, we cannot use the natural greedy algorithm for optimizing submodular or supermodular functions to get an approximation guarantee.
Theorem 2.
The function $I(\cdot)$ is neither submodular nor supermodular under the TCLT model.
Proof. 
We prove this by counterexample (see Figure 2). Consider an instance of the BCIM problem $G = (V, E)$ with $V = \{a, b, c, d, e, f\}$, $E = \{(a, b), (b, c), (c, e), (c, d), (c, f)\}$, and $\tau = 2$. The A-weight and B-weight on each edge are equal to 1, and we set $S_B = \{f\}$. In this example, we have $I(\emptyset) = 0$, $I(\{a\}) = 2$, $I(\{a, c\}) = 5$, and $I(\{a, b, c\}) = 5$. Therefore, $I(\{a, c\}) - I(\{a\}) = 3 > I(\{a\}) - I(\emptyset) = 2$. That is, $I(\cdot)$ is not submodular. On the other hand, we have $I(\{a, c\}) - I(\{a\}) = 3 > I(\{a, b, c\}) - I(\{a, c\}) = 0$. Therefore, $I(\cdot)$ is also not supermodular. ☐

4.3. Competitive Live-Edge ( CLE ) Model

We follow the method in [22] to construct a live-edge model and prove this model is equivalent to TCLT model. This property is used for estimating the objective function as well as the designing of our algorithm in the next sections.
From the original graph $G = (V, E)$ under the TCLT model, we construct a sample graph (or realization) $g$ from $G$ as follows. For each $v \in V$, we randomly select one in-edge $(u, v)$ with probability $w_A(u, v)$, and do not select any in-edge with probability $1 - \sum_{u \in V} w_A(u, v)$. The selected edge is called an A-live edge. In the same way, we also randomly select one in-edge $(u, v)$ (called a B-live edge) with probability $w_B(u, v)$, and do not select any in-edge with probability $1 - \sum_{u \in V} w_B(u, v)$. Let $g_A$ and $g_B$ be the sub-graphs including only A-live edges and only B-live edges, respectively. Finally, we return $g$ as the union of $g_A$ and $g_B$.
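The construction of a sample graph described above amounts to one weighted draw per node and per competitor. The sketch below shows one way to write it; `pick_live_edge`, `sample_live_edge_graph`, and the `(in_neighbor, w_A, w_B)` edge layout are hypothetical names chosen for this illustration.

```python
import random

def pick_live_edge(in_weights, rng=random):
    """in_weights: list of (u, w) pairs with sum of w at most 1.
    Return u with probability w, or None with probability 1 - sum(w)."""
    r = rng.random()
    acc = 0.0
    for u, w in in_weights:
        acc += w
        if r < acc:
            return u
    return None

def sample_live_edge_graph(nodes, in_edges, rng=random):
    """Return (g_a, g_b): for each node v, its selected A-live and B-live
    in-neighbor (or None). in_edges[v] is a list of (u, w_a, w_b) triples."""
    g_a, g_b = {}, {}
    for v in nodes:
        edges = in_edges.get(v, [])
        g_a[v] = pick_live_edge([(u, wa) for u, wa, _ in edges], rng)
        g_b[v] = pick_live_edge([(u, wb) for u, _, wb in edges], rng)
    return g_a, g_b
```

Storing only the chosen in-neighbor per node is enough here, because each node keeps at most one A-live and one B-live in-edge.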
In graph $g$, we denote by $A_t$ and $B_t$ the sets of A-active and B-active nodes on $g$ at step $t$, respectively. We denote by $d_A(A_t, u)$ ($d_B(B_t, u)$) the minimum distance from $A_t$ ($B_t$) to node $u$ on $g_A$ ($g_B$). The activation of A-active and B-active nodes in $g$ happens in discrete steps $t$ as follows:
  • At step $t = 0$, $A_t = S_A$ and $B_t = S_B$.
  • At step $t \geq 1$, first set $A_t = A_{t-1}$ and $B_t = B_{t-1}$. If a node $v \notin A_{t-1} \cup B_{t-1}$ is reachable from $A_{t-1}$ in one step in $g_A$ (i.e., $d_A(A_{t-1}, v) = 1$) but not reachable from $B_{t-1}$ in one step in $g_B$ (i.e., $d_B(B_{t-1}, v) > 1$), then $v$ becomes A-active and is added to $A_t$. Symmetrically, if $v$ is reachable from $B_{t-1}$ in one step in $g_B$ but not reachable from $A_{t-1}$ in one step in $g_A$, then $v$ is added to $B_t$.
  • If at step $t \geq 1$, $v$ is reachable from $A_{t-1}$ in one step in $g_A$ and reachable from $B_{t-1}$ in one step in $g_B$, $v$ is A-activated with probability
    $$p_A(v \mid A_{t-1}, B_{t-1}) = \frac{\sum_{u \in N(v) \cap A_{t-1}} w_A(u, v)}{\sum_{u \in N(v) \cap A_{t-1}} w_A(u, v) + \sum_{u \in N(v) \cap B_{t-1}} w_B(u, v)}$$
    and $v$ is B-activated with probability
    $$p_B(v \mid A_{t-1}, B_{t-1}) = \frac{\sum_{u \in N(v) \cap B_{t-1}} w_B(u, v)}{\sum_{u \in N(v) \cap A_{t-1}} w_A(u, v) + \sum_{u \in N(v) \cap B_{t-1}} w_B(u, v)}$$
  • The process of propagation ends after hop $t = \tau$ or when no more nodes can be activated.
We demonstrate the equivalence of two models through the following theorem.
Theorem 3.
For a given A-seed set $S_A$ and B-seed set $S_B$, the distributions over A-active node sets and B-active node sets at hop $t$ for any $t = 1, 2, \ldots, \tau$ under the TCLT model and the CLE model are equivalent.
The proof of Theorem 3 is presented in Appendix A. We denote $X_G$ as the set of sample graphs generated from $G$ and $\Pr[g \mid G]$ as the probability of generating sample graph $g$ from $G$. We have:
$$\Pr[g \mid G] = \Pr[g_A \mid G] \cdot \Pr[g_B \mid G] = \prod_{v \in V} p_A(v, G, g) \cdot \prod_{v \in V} p_B(v, G, g)$$
where
$$p_A(v, G, g) = \begin{cases} w_A(u, v), & \text{if } \exists u: (u, v) \in E(g_A) \\ 1 - \sum_{u: (u, v) \in E} w_A(u, v), & \text{otherwise} \end{cases} \qquad p_B(v, G, g) = \begin{cases} w_B(u, v), & \text{if } \exists u: (u, v) \in E(g_B) \\ 1 - \sum_{u: (u, v) \in E} w_B(u, v), & \text{otherwise} \end{cases}$$
$E(g_A)$ and $E(g_B)$ are the edge sets of $g_A$ and $g_B$, respectively. We denote $I_B^{\tau}(S_A)$ as the expected number of A-active nodes after $\tau$ hops with the given B-seed set $S_B$. For convenience, we simplify $I_B^{\tau}(S_A)$ as $I(S_A)$. Based on the result of Theorem 3, we have:
$$I(S_A) = \sum_{v \in V \setminus S_B} \sum_{g \in X_G} \Pr[g \mid G] \, \gamma_g^v(S_A)$$
where $\gamma_g^v(S_A)$ is a random variable under sample graph $g$, defined as follows:
$$\gamma_g^v(S_A) = \begin{cases} 1, & \text{if } v \text{ is A-active when running the CLE model in } g \\ 0, & \text{otherwise} \end{cases}$$
Node $v$ is called the source node. Let $v$ be randomly selected from $V \setminus S_B$ and $g$ be a random sample graph generated from $G$. Lemma 1 shows that we can use $\gamma_g^v(\cdot)$ to estimate the objective function.
Lemma 1.
For any $S_A \subseteq V \setminus S_B$, we have $I(S_A) = n_0 \cdot \mathbb{E}[\gamma(S_A)]$, where $\mathbb{E}[\gamma(S_A)]$ is the expectation of $\gamma_g^v(S_A)$ over all random sources and sample graphs.
Proof. 
Since the source node is randomly selected, the probability that $v$ is selected is equal to $\frac{1}{n_0}$ for $n_0 = |V \setminus S_B|$. We have:
$$I(S_A) = \sum_{g \in X_G} \Pr[g \mid G] \sum_{v \in V \setminus S_B} \gamma_g^v(S_A) = n_0 \sum_{g \in X_G} \Pr[g \mid G] \sum_{v \in V \setminus S_B} \gamma_g^v(S_A) \frac{1}{n_0} = n_0 \cdot \sum_{g \in X_G} \Pr[g \mid G] \sum_{v \in V \setminus S_B} \gamma_g^v(S_A) \Pr[v \text{ is source node}] = n_0 \cdot \mathbb{E}[\gamma(S_A)]$$
The final equality follows from the definition of $\gamma(S_A)$. This completes the proof. ☐
Based on Lemma 1, we design upper and lower submodular functions, which are cores of our proposed algorithms in next sections.

5. Our Proposed Algorithm for BCIM Problem

In this section, we present SPBA, our approximation algorithm for the BCIM problem. Since $I(\cdot)$ is not submodular, we use the Sandwich Approximation (SA) method [16] to design an approximation algorithm for the problem.
Outline. Our algorithm contains two key components: (1) We devise lower bound and upper bound submodular functions of $I$, namely $L(\cdot)$ and $U(\cdot)$, respectively. We then design the Polling-Based Algorithm (PBA), a $(1 - 1/e - \epsilon)$-approximation algorithm that finds a solution maximizing $L$ and $U$ based on the polling-based method [4,5,6,7]. (2) We apply SA with the upper and lower bound functions to provide a solution with an approximation guarantee. It first finds a solution to the BCIM problem with any strategy. It then finds approximate solutions for the lower bound and the upper bound with the PBA algorithm. Finally, it returns the solution that has the best result for the BCIM problem. The framework of SPBA is presented in Figure 3.
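The final selection step of the Sandwich framework is simple to state in code. The sketch below is a schematic illustration only: `sandwich_approximation` and its four callable parameters are hypothetical placeholders for an arbitrary BCIM heuristic, the PBA solvers for the lower and upper bounds, and an estimator of $I(\cdot)$.

```python
def sandwich_approximation(solve_any, solve_lower, solve_upper, estimate_influence):
    """Return whichever of the three candidate seed sets scores best on the
    original (non-submodular) objective.

    solve_any:          returns a seed set found by any strategy for BCIM
    solve_lower:        returns an approximate maximizer of the lower bound
    solve_upper:        returns an approximate maximizer of the upper bound
    estimate_influence: maps a seed set to an estimate of I(S_A)
    """
    candidates = [solve_any(), solve_lower(), solve_upper()]
    return max(candidates, key=estimate_influence)
```

All three candidates must satisfy the budget constraint, since the function only compares them and does not check feasibility.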

5.1. Lower and Upper Bound  Functions

We leverage the equivalence between the TCLT and CLE model and result of Lemma 1 to design lower and upper bound of objective functions.

5.1.1. Upper Bound Function

For a random source node $v$ and a sample graph $g$ with a given B-seed set $S_B$, the idea of this method is that we only choose the set of nodes satisfying: (1) the length of the influence path from them to $v$ is at most $\tau$; and (2) the influence path from them to $v$ is not blocked by $S_B$. We consider the set $C_U(g, v)$ defined as follows:
$$C_U(g, v) = \{ u \mid d_A(u, v) \leq \tau, \ d_A(u, v) < d_B(u, S_B) \}$$
Figure 4 shows an example of $C_U(g, v)$. In this example, the sample graph $g$ contains nine nodes and eight edges, the source node is $v$, $S_B = \{c, h\}$, and $\tau = 4$. $g_A$ contains the edges $(a, v), (b, a), (c, b), (d, h)$. $g_B$ contains the edges $(f, v), (e, f), (f, e), (h, f)$. We have $C_U(g, v) = \{b, a, v\}$. Node $d$ lies on the simple path that ends at $v$, but it cannot influence $v$ since its influence is blocked by $c$. We define the Upper bound Reachable Reversal (URR) set as follows:
Definition 3
(URR set). Given graph $G = (V, E, w_A, w_B)$, a random URR set $R_j$ is generated from $G$ by: (1) picking a random source node $v \in V$; and (2) generating a sample graph $g$ from $G$ by running the CLE model, and returning $R_j = C_U(g, v)$.
For any $S_A \subseteq V \setminus S_B$, we define the random variable:
$$X_j = \begin{cases} 1, & \text{if } R_j \cap S_A \neq \emptyset \\ 0, & \text{otherwise} \end{cases}$$
We denote by $R_j(g, v)$ the URR set with source node $v$ and sample graph $g$, and by $X_j(g, v)$ the value of $X_j$ corresponding to $R_j(g, v)$. The following lemma shows the upper bound property of $X_j$.
Lemma 2.
For any set $S_A \subseteq V \setminus S_B$, a random source node $v$, and a random sample graph $g$, we have $X_j(g, v) \geq \gamma_g^v(S_A)$.
Proof. 
We consider the following two cases.
Case 1: $S_A \cap R_j(g, v) = \emptyset$. No node $u \in S_A$ can reach $v$ in $g_A$ within $\tau$ steps. By running the CLE model, $S_A$ cannot activate $v$. Hence, $X_j(g, v) = \gamma_g^v(S_A) = 0$.
Case 2: $S_A \cap R_j(g, v) \neq \emptyset$. Assume that $u \in S_A \cap R_j(g, v)$; then $u$ can reach $v$ in $g_A$ within $\tau$ steps, thus $\mathbb{E}[\gamma_g^v(S_A)] \leq 1 = X_j(g, v)$. Therefore, $\gamma_g^v(S_A) \leq X_j(g, v)$. ☐
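Define $\mathsf{U}(S_A) = n_0 \cdot \mathbb{E}[X_j]$, and let $\mathcal{R}$ be a collection of URR sets. We estimate $\mathsf{U}(S_A)$ as
$$\hat{\mathsf{U}}(S_A) = \frac{n_0}{|\mathcal{R}|} \sum_{R_j \in \mathcal{R}} X_j$$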
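The estimator above is a scaled coverage count. A minimal Python sketch follows, assuming each sampled set is stored as a Python `set` of node identifiers; the function name `estimate_bound` and the toy data are illustrative and not taken from the paper.

```python
def estimate_bound(rr_sets, seeds, n0):
    """Estimate n0 * E[X_j] as n0 times the fraction of sampled sets hit by `seeds`."""
    # X_j = 1 exactly when the sampled set R_j intersects the seed set.
    covered = sum(1 for r in rr_sets if r & seeds)
    return n0 * covered / len(rr_sets)


# Toy usage: four sampled sets, two of which contain node 2.
samples = [{1, 2}, {3}, {2, 4}, {5}]
print(estimate_bound(samples, {2}, n0=10))
```

The same function applies to the lower-bound estimator when the sampled sets are LRR sets instead of URR sets.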
Lemma 3 shows the properties of the function $\mathsf{U}$.
Lemma 3.
Given seed set $S_B \subseteq V$, for any set of nodes $S_A \subseteq V \setminus S_B$, we have $\mathsf{U}(S_A) \ge \mathbb{I}(S_A)$, and $\mathsf{U}(\cdot)$ is a monotone and submodular function.
Proof.
Using Lemma 2, we have
$$\mathbb{I}(A) = n_0 \cdot \sum_{g \in \mathcal{X}_G} \Pr[g \mid G] \sum_{v \in V \setminus S_B} \gamma_g^v(A) \le n_0 \cdot \sum_{g \in \mathcal{X}_G} \Pr[g \mid G] \sum_{v \in V \setminus S_B} X_j(g, v) = \mathsf{U}(A)$$
The function $\mathsf{U}(S) = n_0 \cdot \mathbb{E}[X_j] = n_0 \cdot \sum_{X_j} \Pr[X_j]\, E(S \cap R_j \neq \emptyset)$ is a weighted coverage function, in which every $R_j$ is an element, the universe is the set of all URR sets, and each node $u \in V$ corresponds to the subset of URR sets $R_j$ that are covered by $u$. The probability $n_0 \Pr[g \mid G]$ is the weight of element $R_j^g(u)$. Since a weighted coverage function is monotone and submodular, it follows that $\mathsf{U}(A)$ has the same properties. ☐
Lemma 3 suggests that we can use $\mathsf{U}$ as a submodular upper bound of $\mathbb{I}$. We further devise an algorithm, summarized in Algorithm 4, to generate a URR set. The algorithm first selects a source node $v \in V \setminus S_B$ uniformly at random (Line 1). It then attempts to select an in-neighbor $u$ of $v$ on $g_A$ according to the CLE model (Line 4), moves from $v$ to $u$, and repeats the process. The algorithm stops after $\tau$ steps (Line 3), when no edge is selected (Line 10), or when the selected node belongs to $S_B$ or already belongs to $R_j$ (Line 7).
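The reverse walk described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the algorithm listing itself is not reproduced in the text, so the sketch assumes an LT-style live-edge rule in which a node picks at most one in-neighbor $u$ with probability $w_A(u, v)$; the names `sample_in_neighbor` and `reverse_walk` and the input layout are hypothetical, and the source node is passed in explicitly rather than drawn at random.

```python
import random


def sample_in_neighbor(in_edges, rng):
    """Pick at most one in-neighbor; `in_edges` is a list of (node, weight) pairs."""
    r = rng.random()
    acc = 0.0
    for u, w in in_edges:
        acc += w
        if r < acc:
            return u
    return None  # no live in-edge was selected


def reverse_walk(in_nbrs_a, seeds_b, source, tau, rng):
    """Walk backwards from `source` on A-edges for at most `tau` steps."""
    visited = {source}
    current = source
    for _ in range(tau):
        u = sample_in_neighbor(in_nbrs_a.get(current, []), rng)
        # Stop when no edge is selected, or the node is a B-seed or already visited.
        if u is None or u in seeds_b or u in visited:
            break
        visited.add(u)
        current = u
    return visited


# Toy chain c -> b -> a -> v with weight 1 on every edge, so the walk is deterministic.
chain = {"v": [("a", 1.0)], "a": [("b", 1.0)], "b": [("c", 1.0)]}
print(sorted(reverse_walk(chain, {"c"}, "v", tau=4, rng=random.Random(0))))
```

In the toy chain the walk collects `v`, `a` and `b` and stops when it reaches the B-seed `c`.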
Algorithm 1: Generate LRR set.

5.1.2. Lower Bound Submodular Function

We next devise a submodular lower bound of the objective function. The idea is to select only the nodes that make $v$ become A-active with probability 1 when estimating the objective function. We consider the set $C_L(g, v)$ defined as follows:
$$C_L(g, v) = \{u \mid d_A(u, v) < \tau,\ d_A(u, w) < d_B(S_B, w),\ \forall w \in P_A(u, v)\}$$
where $P_A(u, v)$ is the simple path from $u$ to $v$ in $g_A$. The left inequality in Equation (17) ensures that $v$ can be reached from $u$ on $g_A$. The right inequality ensures that the influence from $u$ to $v$ is not blocked by the influence from $S_B$.
Consider the example in Figure 5. The sample graph $g$ contains nine nodes and eight edges, the source node is $v$, $S_B = \{c, h\}$ and $\tau = 5$. $g_A$ contains the edges $(a, v), (b, a), (c, b), (d, b)$, and $g_B$ contains the edges $(f, v), (e, f), (h, e), (i, e)$. We have $C_L(g, v) = \{b, a, v\}$. According to the CLE model, we can easily prove that $\gamma_g^v(u) = 1$, $\forall u \in C_L(g, v)$. Based on that, we define the Lower bound Reachable Reversal (LRR) set as follows:
Definition 4 (LRR set). Given graph $G = (V, E, w_A, w_B)$, a random LRR set $R_j$ is generated from $G$ by: (1) picking a random node $v \in V$; and (2) generating a sample graph $g$ from $G$ by the CLE model and returning $R_j = C_L(g, v)$.
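To make the definition of $C_L(g, v)$ concrete, the following Python sketch evaluates it on the Figure 5 example. It assumes that $d_B(S_B, w)$ is the hop distance from the nearest B-seed to $w$ in $g_B$ and that $d_A(u, w)$ is measured along the A-path from $u$ to $v$; the helper names `bfs_dist` and `lower_bound_set` are illustrative and do not come from the paper.

```python
from collections import deque


def bfs_dist(edges, sources):
    """Hop distance from the nearest node in `sources` to each reachable node."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        x = queue.popleft()
        for y in adj.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def lower_bound_set(g_a, g_b, seeds_b, v, tau):
    """Nodes u with d_A(u, v) < tau whose A-path to v is never reached first by S_B."""
    d_b = bfs_dist(g_b, sorted(seeds_b))
    in_nbrs = {}
    for u, w in g_a:
        in_nbrs.setdefault(w, []).append(u)
    result = set()
    stack = [[v]]  # each entry is an A-path [u, ..., v]
    while stack:
        path = stack.pop()
        if len(path) - 1 >= tau:
            continue
        # Along the path, d_A(u, path[i]) equals i.
        if any(i >= d_b.get(w, float("inf")) for i, w in enumerate(path)):
            continue  # blocked; every node further back is blocked as well
        result.add(path[0])
        for p in in_nbrs.get(path[0], []):
            if p not in path:
                stack.append([p] + path)
    return result


g_a = [("a", "v"), ("b", "a"), ("c", "b"), ("d", "b")]
g_b = [("f", "v"), ("e", "f"), ("h", "e"), ("i", "e")]
print(sorted(lower_bound_set(g_a, g_b, {"c", "h"}, "v", tau=5)))  # ['a', 'b', 'v']
```

Node `d` is excluded because the B-seed `h` reaches `v` in three hops, the same number of hops `d` needs, and node `c` is excluded because it is itself a B-seed.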
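For any $S_A \subseteq V \setminus S_B$, define the random variable:
$$Y_j = \begin{cases} 1, & \text{if } R_j \cap S_A \neq \emptyset \\ 0, & \text{otherwise} \end{cases}$$
and define $\mathsf{L}(S) = n_0 \cdot \mathbb{E}[Y_j]$. Let $\mathcal{R}$ be a collection of LRR sets. We estimate $\mathsf{L}(S)$ as
$$\hat{\mathsf{L}}(S_A) = \frac{n_0}{|\mathcal{R}|} \sum_{R_j \in \mathcal{R}} Y_j$$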
Lemma 4 shows the properties of the function $\mathsf{L}$.
Lemma 4.
Given seed set $S_B$, for any set of nodes $S_A \subseteq V \setminus S_B$, we have $\mathsf{L}(S_A) \le \mathbb{I}(S_A)$, and $\mathsf{L}(\cdot)$ is a monotone and submodular function.
The proof of Lemma 4 is omitted here because it is similar to that of Lemma 3. Based on Lemma 4, we use $\mathsf{L}$ as a submodular lower bound of $\mathbb{I}$. Algorithm 1 depicts the generation of an LRR set. We first randomly select a source node $v$, and then select an edge $(u, v)$ on $g_A$ according to the competitive live-edge (CLE) model. If edge $(u, v)$ is selected, we update $d_A$ as the distance from $u$ to $v$ on $g_A$ and check the distance condition $d_A(u, v) < d_B(u, S_B)$ by Algorithm 2. In Algorithm 2, we sequentially generate a simple path from $u$ on $g_B$ (called a B-path) according to the CLE model until the length of the path exceeds $d_A$ or no B-edge is selected. If $d_B \le d_A$ (Lines 13–16, Algorithm 2), it returns True and Algorithm 1 returns the current LRR set $R_j$ (Line 14, Algorithm 1). In Algorithm 1, if node $u$ is added to $R_j$, the algorithm moves from $v$ to $u$ and repeats the process until the distance from the currently selected node to $v$ on $g_A$ exceeds $\tau$ (Line 18, Algorithm 1) or no A-edge is selected (Line 16, Algorithm 1). This process ensures that, if node $u$ is added to the set $R_j$, the previous nodes on $P_A(u, v)$ are not affected by $S_B$.
Algorithm 2: Check the distance from $u$ to B on $g_B$, CheckBP$(u, g_B, d_A)$.

5.2. Polling-Based Algorithm for Maximum Bound Functions

We now introduce an approximation algorithm, namely PBA, for maximizing the lower and upper bound functions of the previous subsection when nodes have heterogeneous costs. Our algorithm is based on the polling method, which was proposed for the IM problem [2,4,5,6]. We describe our algorithm for maximizing the lower bound function; applying it to the upper bound function is similar. The details of our algorithm are depicted in Algorithm 3.
Algorithm 3: Polling-Based Approximation algorithm ( PBA ).
Algorithm 4: Generate URR set.

5.2.1. Description of PBA

PBA first generates a collection $\mathcal{R}_1$ of $\Lambda$ URR sets. The main phase of PBA consists of several iterations (at most $t_{max}$). In each iteration, the algorithm first finds a candidate solution $S_A$ by using the Greedy algorithm (Algorithm 5) to solve the Budgeted Maximum Coverage (BMC) problem [20] (Line 6), which provides an approximation ratio of $(1 - 1/e)$. We denote by Greedy$(\mathcal{R}, L)$ the Greedy algorithm whose input consists of a collection $\mathcal{R}$ of URR sets and a budget $L > 0$. Then, the quality of $S_A$ is checked by Algorithm 6, which independently generates $|\mathcal{R}_t|$ more URR sets, adds them to $\mathcal{R}_c$, calculates
$$\mathrm{Cov}_{\mathcal{R}_c}(S_A) = \sum_{R_j \in \mathcal{R}_c} \min\{|S_A \cap R_j|, 1\}$$
and uses it to calculate the parameters $\epsilon_1, \epsilon_2$. We next calculate the lower bound of $\mathbb{I}(S_A)$, $f_l(S_A, \mathcal{R}_c, \epsilon_1)$, and the upper bound of the optimal solution, $f_u(\mathsf{OPT}_u, \mathcal{R}_t, \epsilon_2)$. If the current solution $S_A$ meets the approximation guarantee condition
$$\frac{f_l(S_A, \mathcal{R}_c, \epsilon_1)}{f_u(\mathsf{OPT}_u, \mathcal{R}_t, \epsilon_2)} \ge 1 - \frac{1}{e} - \epsilon$$
the algorithm returns $S_A$. If not, it moves to the next iteration and stops when the number of URR sets is at least $N_{max}$.
Algorithm 5: Greedy algorithm for Budgeted Maximum Coverage problem, Greedy$(\mathcal{R}, L)$.
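The Greedy step can be illustrated with a short Python sketch. Since the listing of Algorithm 5 is not reproduced in the text, this is an assumed variant in the spirit of the budgeted maximum coverage greedy of [20]: it repeatedly adds the affordable node with the best marginal coverage per unit cost and finally compares the result with the best single affordable node. The names `cov` and `greedy_bmc` and the toy instances are illustrative.

```python
def cov(rr_sets, seeds):
    """Number of sampled sets that contain at least one seed."""
    return sum(1 for r in rr_sets if r & seeds)


def greedy_bmc(rr_sets, cost, budget):
    """Cost-effective greedy for budgeted maximum coverage (illustrative variant)."""
    nodes = set().union(*rr_sets) if rr_sets else set()
    candidates = {u for u in nodes if cost[u] <= budget}
    chosen, spent = set(), 0.0
    uncovered = list(rr_sets)
    while candidates:
        best, best_ratio = None, 0.0
        for u in sorted(candidates):
            if spent + cost[u] > budget:
                continue
            gain = sum(1 for r in uncovered if u in r)
            if gain / cost[u] > best_ratio:
                best, best_ratio = u, gain / cost[u]
        if best is None:
            break
        chosen.add(best)
        spent += cost[best]
        uncovered = [r for r in uncovered if best not in r]
        candidates.discard(best)
    # Safeguard: a single expensive node may beat the ratio-greedy solution.
    singles = sorted(u for u in nodes if cost[u] <= budget)
    if singles:
        top = max(singles, key=lambda u: cov(rr_sets, {u}))
        if cov(rr_sets, {top}) > cov(rr_sets, chosen):
            return {top}
    return chosen


sets_1 = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}, {"d"}]
print(sorted(greedy_bmc(sets_1, {"a": 1, "b": 2, "c": 1, "d": 3}, budget=2)))
```

On the toy instance the greedy picks `a` and then `c`, covering four of the five sets within the budget of 2.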
Algorithm 6: Check quality of solution ( CheckQS ).

5.2.2. Theoretical Analysis

Now, we prove that PBA returns a $(1 - 1/e - \epsilon)$-approximation solution with probability at least $1 - \delta$.
We observe that $X_j \in [0, 1]$. Let the random variable $Z_i = \sum_{j=1}^{i} (X_j - \mathbb{E}[X_j])$, $\forall i \ge 1$. For the sequence of random variables $Z_1, Z_2, \ldots$, we have $\mathbb{E}[Z_i \mid Z_1, \ldots, Z_{i-1}] = \mathbb{E}[Z_{i-1}] + \mathbb{E}[X_i - \mu_X] = \mathbb{E}[Z_{i-1}]$. Hence, $Z_1, Z_2, \ldots$ form a martingale [34]. Therefore, we obtain the same results as in [6].
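Lemma 5 ([6]). For any $T > 0$ and $\epsilon > 0$, let $\mu$ be the mean of $X_j$ and let $\hat{\mu} = \frac{\sum_{i=1}^{T} X_i}{T}$ be an estimation of $\mu$. We have:
$$\Pr[\hat{\mu} \ge (1 + \epsilon)\mu] \le \exp\left(\frac{-T \mu \epsilon^2}{2 + \frac{2}{3}\epsilon}\right)$$
$$\Pr[\hat{\mu} \le (1 - \epsilon)\mu] \le \exp\left(\frac{-T \mu \epsilon^2}{2}\right)$$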
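As a numerical illustration of Lemma 5, the following Python sketch evaluates both tail bounds and inverts the lower-tail bound to obtain a sufficient sample count, $T \ge 2\ln(1/\delta)/(\mu\epsilon^2)$. The function names and the parameter values are chosen for illustration only.

```python
import math


def upper_tail_bound(T, mu, eps):
    """Bound on Pr[mu_hat >= (1 + eps) * mu] from Lemma 5."""
    return math.exp(-T * mu * eps ** 2 / (2 + 2 * eps / 3))


def lower_tail_bound(T, mu, eps):
    """Bound on Pr[mu_hat <= (1 - eps) * mu] from Lemma 5."""
    return math.exp(-T * mu * eps ** 2 / 2)


def samples_for_lower_tail(mu, eps, delta):
    """Smallest integer T with lower_tail_bound(T, mu, eps) <= delta."""
    return math.ceil(2 * math.log(1 / delta) / (mu * eps ** 2))


T = samples_for_lower_tail(mu=0.01, eps=0.1, delta=0.05)
print(T, lower_tail_bound(T, 0.01, 0.1), upper_tail_bound(T, 0.01, 0.1))
```

The required number of samples grows as $1/\mu$, which is why the thresholds below are stated relative to the optimal value.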
Based on Lemma 5, Tang et al. [6] proposed the IMM algorithm, which is based on the Reverse Influence Sampling (RIS) [24] process, for solving the IM problem. They showed that the number of random Reachable Reversal (RR) sets that ensures the RIS process returns a $(1 - 1/e)$-approximation with probability $1 - \delta$ is
$$\theta_{max} = 2n \left( \left(1 - \frac{1}{e}\right) \sqrt{\ln\frac{2}{\delta}} + \sqrt{\left(1 - \frac{1}{e}\right)\left(\ln\frac{2}{\delta} + \ln\binom{n}{k}\right)} \right)^2 \frac{1}{\epsilon^2 k}$$
This threshold is also used to obtain the stopping condition for IM algorithms [4,7]. However, it does not guarantee that the candidate solution $S_A$ is a $(1 - 1/e)$-approximation under heterogeneous selecting costs. For this case, we provide the number of URR sets that guarantees the $(1 - 1/e)$-approximation ratio in the following theorem.
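Theorem 4.
Let $\epsilon > 0$ and $\delta \in (0, 1)$ be parameters. If $|\mathcal{R}|$ is greater than or equal to
$$N(\epsilon, \delta) = 2n_0 \left( \left(1 - \frac{1}{e}\right) \sqrt{\ln\frac{2}{\delta}} + \sqrt{\left(1 - \frac{1}{e}\right)\left(\ln\frac{2}{\delta} + \ln n_0^{k_{max}}\right)} \right)^2 \frac{1}{\epsilon^2\, \mathsf{OPT}_u}$$
then Algorithm 5 returns a $(1 - 1/e - \epsilon)$-approximation solution with probability at least $1 - \delta$.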
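Lemma 6.
Let the event $B_t^{1} = (|\mathcal{R}_t| \ge N_{max}) \wedge (\mathsf{U}(S) < (1 - 1/e - \epsilon)\,\mathsf{OPT}_u)$; then $\Pr[B_t^{1}] < \frac{\delta}{3}$.
Proof.
From Theorem 4, since $N_{max} \ge N(\epsilon, \frac{\delta}{3})$, the bad event $\mathsf{U}(S) < (1 - 1/e - \epsilon)\,\mathsf{OPT}_u$ happens with probability at most $\frac{\delta}{3}$. ☐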
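Lemma 7.
For each $1 \le t \le t_{max}$, let $f_l(S_A, \mathcal{R}_c, \epsilon_1) = \frac{n_0}{|\mathcal{R}_c|(1 + \epsilon_1)} \mathrm{Cov}_{\mathcal{R}_c}(S_A)$. We have: $\Pr[f_l(S_A, \mathcal{R}_c, \epsilon_1) \le \mathsf{U}(S_A)] \ge 1 - \delta_1$.
Proof.
We denote by $\hat{\mathsf{U}}_c(S_A)$ the estimation of $\mathsf{U}(S_A)$ over $\mathcal{R}_c$. In each iteration $t$, after the for loop (Line 7) in Algorithm 6 ends, we have
$$\mathrm{Cov}_{\mathcal{R}_c}(S_A) = \Upsilon(\epsilon_1, \delta_1) = (1 + \epsilon_1)\left(2 + \frac{2}{3}\epsilon_1\right) \ln\frac{2}{\delta_1} \cdot \frac{1}{\epsilon_1^2}$$
This satisfies the stopping rule theorem in [35], which therefore guarantees that
$$\Pr[\hat{\mathsf{U}}_c(S_A) \le (1 + \epsilon_1)\,\mathsf{U}(S_A)] = \Pr[\hat{\mu}_c \le (1 + \epsilon_1)\mu] \ge 1 - \delta_1$$
Hence, $\Pr[f_l(S_A) \le \mathsf{U}(S_A)] \ge 1 - \delta_1$. ☐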
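The threshold $\Upsilon(\epsilon_1, \delta_1)$ and the sample-until-covered loop it controls can be sketched in Python as follows. This is an assumption-laden illustration of a stopping-rule style estimator, not the paper's Algorithm 6: the callable `draw_hit`, the cap `max_samples` and the function names are hypothetical.

```python
import math


def upsilon(eps, delta):
    """Coverage threshold (1 + eps) * (2 + 2*eps/3) * ln(2/delta) / eps^2."""
    return (1 + eps) * (2 + 2 * eps / 3) * math.log(2 / delta) / eps ** 2


def estimate_until_covered(draw_hit, eps, delta, max_samples=10 ** 6):
    """Draw samples until the hit count reaches the threshold; return (mu_hat, samples)."""
    threshold = upsilon(eps, delta)
    hits = samples = 0
    while hits < threshold and samples < max_samples:
        hits += 1 if draw_hit() else 0  # draw_hit() plays the role of X_j for a fresh set
        samples += 1
    return hits / samples, samples


def lower_bound_value(n0, hits, samples, eps):
    """f_l = n0 * Cov / (|R_c| * (1 + eps)), following Lemma 7."""
    return n0 * hits / (samples * (1 + eps))


print(round(upsilon(0.1, 0.05), 1))
print(estimate_until_covered(lambda: True, 0.1, 0.05))
```

With a sampler that always hits, the loop stops after 839 draws, the first integer at or above the threshold of about 838.6.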
Lemma 8.
Assume that the bad event in Lemma 7 does not happen. Let $f_u(S_A, \mathcal{R}_t, \epsilon_2) = \frac{n_0\, \mathrm{Cov}_{\mathcal{R}_t}(S_A)}{|\mathcal{R}_t| (1 - \frac{1}{e})(1 - \epsilon_2)}$. We have: $\Pr[f_u(S_A, \mathcal{R}_t, \epsilon_2) \ge \mathsf{U}(S_U^*)] \ge 1 - \delta_1$.
Proof.
Since the bad event in Lemma 7 does not happen, we have $(1 + \epsilon_1)\,\mu(S_A) \ge \hat{\mu}_c(S_A)$. Applying Lemma 5 to the optimal set $S_U^*$, with random variable $X_j$, mean $\mu(S_U^*) = \mathsf{U}(S_U^*)/n_0$, and $|\mathcal{R}_t|$ samples, we obtain
$$\Pr[\hat{\mathsf{U}}_t(S_U^*) < (1 - \epsilon_2)\,\mathsf{U}(S_U^*)] = \Pr[\hat{\mu}_t(S_U^*) < (1 - \epsilon_2)\,\mu(S_U^*)] \le \exp\left(\frac{-|\mathcal{R}_t|\, \epsilon_2^2\, \mu(S_U^*)}{2}\right) \le \exp\left(\frac{-|\mathcal{R}_t|\, \epsilon_2^2\, \mu(S_A)}{2}\right) \le \exp\left(\frac{-|\mathcal{R}_t|\, \epsilon_2^2\, \hat{\mu}_c(S_A)/(1 + \epsilon_1)}{2}\right) = \frac{\delta}{3 t_{max}} = \delta_1$$
Now, since the Greedy algorithm returns a $(1 - 1/e)$-approximation solution, we have:
$$\hat{\mathsf{U}}_t(S_A) = \frac{n_0}{|\mathcal{R}_t|} \mathrm{Cov}_{\mathcal{R}_t}(S_A) \ge \frac{n_0}{|\mathcal{R}_t|} (1 - 1/e)\, \mathrm{Cov}_{\mathcal{R}_t}(S_t^*) \ge \frac{n_0}{|\mathcal{R}_t|} (1 - 1/e)\, \mathrm{Cov}_{\mathcal{R}_t}(S_U^*) = (1 - 1/e)\, \hat{\mathsf{U}}_t(S_U^*) \ge (1 - 1/e)(1 - \epsilon_2)\, \mathsf{U}(S_U^*)$$
where $S_t^*$ is an optimal solution of the instance $(\mathcal{R}_t, L)$ of the maximum coverage problem. Therefore, $f_u(S_A, \mathcal{R}_t, \epsilon_2) = \frac{\hat{\mathsf{U}}_t(S_A)}{(1 - 1/e)(1 - \epsilon_2)} \ge \mathsf{U}(S_U^*)$ with probability at least $1 - \delta_1$. ☐
Theorem 5.
Given $0 \le \epsilon, \delta \le 1$, the PBA algorithm returns a node set $S_A$ satisfying:
$$\Pr[\mathsf{U}(S_A) \ge (1 - 1/e - \epsilon)\, \mathsf{U}(S_U^*)] \ge 1 - \delta$$
Proof.
Assume that none of the bad events in Lemmas 6–8 happens in any iteration $t = 1, 2, \ldots, t_{max}$. Applying the union bound to the probabilities of the bad events, this assumption holds with probability at least
$$1 - \left(\frac{\delta}{3} + \delta_1 \cdot t_{max} + \delta_1 \cdot t_{max}\right) = 1 - \delta$$
Under this assumption, we show that the PBA algorithm returns a solution satisfying:
$$\mathsf{U}(S_A) \ge (1 - 1/e - \epsilon)\, \mathsf{U}(S_U^*)$$
If the algorithm stops with the condition $|\mathcal{R}_t| \ge N_{max}$, the solution $S_A$ satisfies Equation (27) due to Lemma 6. Otherwise, the PBA algorithm stops at some iteration $t$, $t = 1, 2, \ldots, t_{max}$, in which CheckQS on Line 5 returns "True". Since the bad events do not happen and the condition on Line 12 of Algorithm 6 is true, we have
$$\frac{\mathsf{U}(S_A)}{\mathsf{OPT}_u} \ge \frac{f_l(S_A, \mathcal{R}_c, \epsilon_1)}{f_u(S_A, \mathcal{R}_t, \epsilon_2)} \ge 1 - \frac{1}{e} - \epsilon$$
This completes the proof. ☐

5.2.3. Improved Guarantees with Tightened Bound

Lemma 8 provides an upper bound of $\mathsf{OPT}_u$, in which we use the inequality $\mathrm{Cov}_{\mathcal{R}_t}(S_A) \ge (1 - 1/e)\, \mathrm{Cov}_{\mathcal{R}_t}(S_t^*)$. Although this bound is tight in the worst case [20], it is loose for specific instances of the budgeted maximum coverage problem. We propose another upper bound of $\mathrm{Cov}_{\mathcal{R}_t}(S_t^*)$ that is much tighter in practice, as explained in the following. In the Greedy algorithm, we denote by $S_i$ the partial solution at iteration $i$. The following lemma provides a tighter bound of $\mathrm{Cov}_{\mathcal{R}_t}(S_t^*)$.
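Lemma 9.
Assume that the number of iterations of this algorithm is $k$ and $u_i$ is the node added at the $i$-th iteration. Letting $g(\mathcal{R}_t, S_k) = \frac{L \cdot \mathrm{Cov}_{\mathcal{R}_t}(S_k)}{c(u_k)} + \left(1 - \frac{L}{c(u_k)}\right) \mathrm{Cov}_{\mathcal{R}_t}(S_{k-1})$, we have
$$\mathrm{Cov}_{\mathcal{R}_t}(S_t^*) \le g(\mathcal{R}_t, S_k) \le \frac{\mathrm{Cov}_{\mathcal{R}_t}(S_k)}{1 - 1/e}$$
Proof.
From Lemma 1 in [20], we have
$$\mathrm{Cov}_{\mathcal{R}_t}(S_k) - \mathrm{Cov}_{\mathcal{R}_t}(S_{k-1}) \ge \frac{c(u_k)}{L} \left(\mathrm{Cov}_{\mathcal{R}_t}(S_t^*) - \mathrm{Cov}_{\mathcal{R}_t}(S_{k-1})\right)$$
Rearranging Equation (30) yields $\mathrm{Cov}_{\mathcal{R}_t}(S_t^*) \le g(\mathcal{R}_t, S_k)$. On the other hand,
$$g(\mathcal{R}_t, S_k) - \mathrm{Cov}_{\mathcal{R}_t}(S_k) = \left(1 - \frac{c(u_k)}{L}\right) \left(g(\mathcal{R}_t, S_k) - \mathrm{Cov}_{\mathcal{R}_t}(S_{k-1})\right) = \prod_{i=1}^{k} \left(1 - \frac{c(u_i)}{L}\right) g(\mathcal{R}_t, S_k)$$
It follows that
$$g(\mathcal{R}_t, S_k) = \frac{\mathrm{Cov}_{\mathcal{R}_t}(S_k)}{1 - \prod_{i=1}^{k} \left(1 - \frac{c(u_i)}{L}\right)} \le \frac{\mathrm{Cov}_{\mathcal{R}_t}(S_k)}{1 - 1/e}$$
☐
By Lemma 9, we have the new upper bound of $\mathsf{OPT}_u$ as follows:
$$f_u^a(S_A, \mathcal{R}_t, \epsilon_2) = \frac{n_0}{|\mathcal{R}_t|(1 - \epsilon_2)}\, g(\mathcal{R}_t, S_k)$$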
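A small numerical sketch in Python shows how the tightened bound is evaluated from quantities that the Greedy run already produces. The function names and the numbers are illustrative assumptions, not values from the experiments.

```python
import math


def tightened_cover_bound(budget, cov_k, cov_prev, cost_last):
    """Return g(R_t, S_k) and the worst-case bound Cov(S_k) / (1 - 1/e)."""
    g = budget * cov_k / cost_last + (1 - budget / cost_last) * cov_prev
    worst = cov_k / (1 - 1 / math.e)
    return g, worst


def upper_bound_opt(n0, num_sets, eps2, cover_bound):
    """f_u^a = n0 * g / (|R_t| * (1 - eps2))."""
    return n0 * cover_bound / (num_sets * (1 - eps2))


# Hypothetical greedy trace: budget 10, last added node costs 2,
# coverage 45 before the last step and 50 after it.
g, worst = tightened_cover_bound(budget=10, cov_k=50, cov_prev=45, cost_last=2)
print(g, round(worst, 1))                                   # 70.0 79.1
print(upper_bound_opt(1000, 500, eps2=0.5, cover_bound=g))  # 280.0
```

In this trace the data-dependent bound (70) is noticeably smaller than the worst-case bound (about 79.1), so the quality check can succeed with fewer samples.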

5.3. Sandwich Approximation

We apply the Sandwich Approximation framework in [16] to design our algorithm, namely SPBA. Let $S_U$ and $S_L$ be the solutions selected by the PBA algorithm for maximizing $\mathsf{U}$ and $\mathsf{L}$, respectively, within a total cost of at most $L$. $S_A$ is a solution for the original problem. We denote by $\hat{\mathbb{I}}(S_A)$ a $(\delta', \epsilon')$-approximation of $\mathbb{I}(S_A)$, i.e.,
$$\Pr[(1 - \epsilon')\, \mathbb{I}(S_A) \le \hat{\mathbb{I}}(S_A) \le (1 + \epsilon')\, \mathbb{I}(S_A)] \ge 1 - \delta'$$
The sandwich approximation algorithm operates as follows. First, we find a solution to the original problem with any strategy. Second, we find approximate solutions to the lower bound and the upper bound by the PBA algorithm. Last, we return $S_{sa} = \arg\max_{S \in \{S_L, S_A, S_U\}} \hat{\mathbb{I}}(S)$ as the solution of the SPBA algorithm. The details of our algorithm are shown in Algorithm 7.
Algorithm 7: Sandwich Approximation based on PBA algorithm (SPBA).
Input: Graph $G = (V, E, w_A, w_B)$, budget $L > 0$, and $\epsilon, \delta, \epsilon', \delta' \in (0, 1)$
Output: Seed set $S_{sa}$
1. $S_U \leftarrow \mathsf{PBA}(\mathsf{U}, G, L, \epsilon, \delta)$
2. $S_L \leftarrow \mathsf{PBA}(\mathsf{L}, G, L, \epsilon, \delta)$
3. $S_A \leftarrow$ a solution for maximizing $\mathbb{I}$ by any algorithm.
4. $S_{sa} \leftarrow \arg\max_{S \in \{S_U, S_L, S_A\}} \hat{\mathbb{I}}(S)$
5. return $S_{sa}$;
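The selection step of the sandwich framework reduces to an arg max over three candidate seed sets. The Python sketch below assumes the three candidates have already been computed and that `estimate_influence` is any callable returning an estimate of the objective; both the function name `sandwich_select` and the toy estimator are hypothetical.

```python
def sandwich_select(s_upper, s_lower, s_heuristic, estimate_influence):
    """Return the candidate seed set with the largest estimated influence."""
    candidates = [s_upper, s_lower, s_heuristic]
    return max(candidates, key=estimate_influence)


# Toy estimator: pretend the influence of a seed set is the sum of per-node scores.
scores = {"a": 5.0, "b": 3.0, "c": 4.5, "d": 1.0}


def toy_estimate(seeds):
    return sum(scores[u] for u in seeds)


best = sandwich_select({"a", "d"}, {"b", "c"}, {"d"}, toy_estimate)
print(sorted(best))  # ['b', 'c']
```

Because only three estimates are compared, the cost of this step is dominated by the estimator itself rather than by the selection.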
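The following theorem shows the approximation ratio of our algorithm.
Theorem 6.
Let $S_A^*$ be the optimal solution, and $S_{sa}$ be a solution returned by Algorithm 7. We have:
$$\mathbb{I}(S_{sa}) \ge \max\left\{ \frac{\mathbb{I}(S_U)}{\mathsf{U}(S_U)}, \frac{\mathsf{L}(S_L^*)}{\mathbb{I}(S_A^*)} \right\} \cdot \frac{1 - \epsilon'}{1 + \epsilon'} \left(1 - \frac{1}{e} - \epsilon\right) \cdot \mathsf{OPT}$$
with probability at least $1 - 2\delta - \delta'$.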
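Proof.
Due to $\mathsf{U}(S_U^*) \ge \mathsf{U}(S_A^*) \ge \mathbb{I}(S_A^*)$, we have:
$$\hat{\mathbb{I}}(S_U) = \frac{\hat{\mathbb{I}}(S_U)}{\mathsf{U}(S_U)}\, \mathsf{U}(S_U) \ge \frac{\mathbb{I}(S_U)}{\mathsf{U}(S_U)} \left(1 - \frac{1}{e} - \epsilon\right) \mathsf{U}(S_U^*) \ge \frac{\mathbb{I}(S_U)}{\mathsf{U}(S_U)} (1 - \epsilon') \left(1 - \frac{1}{e} - \epsilon\right) \mathbb{I}(S_A^*)$$
On the other hand,
$$\hat{\mathbb{I}}(S_L) \ge (1 - \epsilon')\, \mathbb{I}(S_L) \ge (1 - \epsilon')\, \mathsf{L}(S_L) \ge \left(1 - \frac{1}{e} - \epsilon\right) \mathsf{L}(S_L^*) \ge \frac{\mathsf{L}(S_L^*)}{\mathbb{I}(S_A^*)} (1 - \epsilon') \left(1 - \frac{1}{e} - \epsilon\right) \mathbb{I}(S_A^*)$$
Since $(1 - \epsilon')\, \mathbb{I}(S_{sa}) \le \hat{\mathbb{I}}(S_{sa}) \le (1 + \epsilon')\, \mathbb{I}(S_{sa})$, we have:
$$\mathbb{I}(S_{sa}) \ge \frac{1}{1 + \epsilon'}\, \hat{\mathbb{I}}(S_{sa}) \ge \max\left\{ \frac{\mathbb{I}(S_U)}{\mathsf{U}(S_U)}, \frac{\mathsf{L}(S_L^*)}{\mathbb{I}(S_A^*)} \right\} \cdot \frac{1 - \epsilon'}{1 + \epsilon'} \left(1 - \frac{1}{e} - \epsilon\right) \cdot \mathsf{OPT}$$
Applying the union bound, the inequality in Equation (38) holds with probability at least $1 - 2\delta - \delta'$. ☐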

6. Experiments

We experimentally evaluated our algorithm and compared it with other algorithms, namely baseline algorithms and influence maximization methods, on two aspects: solution quality and scalability on various network datasets.

6.1. Experimental Settings

6.1.1. Datasets

We performed our experiments on six real-world datasets: Gnutella, Enron, Epinions, Email-Eu, DBLP and Wiki. The basic statistics of these networks are summarized in Table 2.

6.1.2. Algorithms Compared

We compared our algorithm with the influence maximization algorithm BCT and several baseline algorithms, which are described as follows:
  • BCT: An influence maximization algorithm under heterogeneous selecting costs. We chose BCT for comparison because BCIM is a variant of IM and BCT handles nodes with arbitrary costs.
  • Degree: This algorithm selects the nodes with the highest degree and keeps adding the highest-degree nodes until the total cost of the selected nodes exceeds L.
  • Random: This algorithm randomly selects nodes within budget L.
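The two baselines can be sketched in a few lines of Python. The function names are illustrative; the sketch assumes that node degrees and costs are given as dictionaries and that a baseline stops adding nodes before the budget would be exceeded.

```python
import random


def degree_baseline(degree, cost, budget):
    """Add nodes in decreasing degree order until the next one would exceed the budget."""
    chosen, spent = [], 0.0
    for u in sorted(degree, key=lambda x: (-degree[x], x)):
        if spent + cost[u] > budget:
            break
        chosen.append(u)
        spent += cost[u]
    return chosen


def random_baseline(nodes, cost, budget, rng):
    """Visit nodes in random order and keep every node that still fits in the budget."""
    order = sorted(nodes)
    rng.shuffle(order)
    chosen, spent = [], 0.0
    for u in order:
        if spent + cost[u] <= budget:
            chosen.append(u)
            spent += cost[u]
    return chosen


degree = {"a": 5, "b": 3, "c": 4}
cost = {"a": 2.0, "b": 1.0, "c": 2.0}
print(degree_baseline(degree, cost, budget=4))  # ['a', 'c']
print(random_baseline(degree, cost, budget=4, rng=random.Random(7)))
```

Neither baseline looks at the competitor's seed set or the time constraint, which is consistent with the weaker results reported for them below.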

6.1.3. Parameters

In all experiments, we kept $\epsilon = 0.1$ and $\delta = 1/n$ as general settings. We set $\epsilon' = \delta' = 0.01$ and used the stopping condition algorithm in [35] to estimate $\hat{\mathbb{I}}$. We assigned the edge weights in the TCLT model according to the LT model in previous studies [4,5,6,7,8,13]. The weight of the edge $(u, v)$ was calculated as follows:
$$w(u, v) = \frac{1}{|N(v)|}$$
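A short Python sketch of this weighting scheme follows. It assumes that $N(v)$ denotes the set of in-neighbors of $v$ and that the edge list contains no duplicates; the function name `lt_edge_weights` is illustrative.

```python
def lt_edge_weights(edges):
    """Assign w(u, v) = 1 / |N(v)|, with N(v) taken as the in-neighbors of v."""
    in_degree = {}
    for _, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}


weights = lt_edge_weights([("a", "c"), ("b", "c"), ("c", "d")])
print(weights)  # {('a', 'c'): 0.5, ('b', 'c'): 0.5, ('c', 'd'): 1.0}
```

Under this assumption the incoming weights of every node sum to 1, as the LT model requires them to be at most 1.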
Our implementation was written in C++ and compiled with GCC 4.7. All experiments were carried out on a Linux machine with 2 × Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30 GHz and 8 × 16 GB DIMM ECC DDR4 @ 2400 MHz.

6.2. Results

6.2.1. Comparison of Algorithms under General Case

In this experiment, we compared the algorithms with $\tau = 5$, the budget $L$ varying from 0 to 100, and node costs uniformly distributed in $[1.0, 3.0]$. Figure 6 shows the performance of all algorithms on the Gnutella, Enron, Epinions, Email-Eu, DBLP and Wiki networks. We observed that our algorithm SPBA consistently had the best performance. SPBA was 10–30% better than BCT. This is because BCT only considers the number of influenced nodes and ignores the competitive influence under the time constraint $\tau$. This also confirms that IM algorithms do not perform well on the BCIM problem. The Random algorithm had the worst performance in all cases. The Degree algorithm performed well on the Enron dataset but poorly on the other datasets; SPBA was up to 7.7 times better than Degree. The reason is that Degree only uses the topological properties of the social network and cannot account for the competitive diffusion process. In contrast, our algorithm takes advantage of the upper and lower bounds of the objective function to obtain the approximation ratio. This explains why our algorithm performed well while the others performed poorly in many cases.

6.2.2. Comparison of Algorithms under Unit-Cost Setting

To show the performance of these algorithms more clearly, we conducted experiments in the unit-cost case (i.e., all node costs are equal to 1) on the Gnutella, Enron, Epinions and Email-Eu datasets. We set $\tau = 5$ and varied $L$ from 1 to 100. Figure 7 displays the results of all algorithms. Once again, our algorithm SPBA gave the best performance. SPBA was 1.06–1.76 times better than BCT and 1.2–17.2 times better than Degree. These results are consistent with what was observed in the previous case.

6.2.3. Comparison of Running Time

Figure 8 shows the running time of the algorithms on the six datasets. SPBA had the longest running time on all datasets. This is because the running time of SPBA consists of the total running time of $\mathsf{PBA}(\mathsf{L}, G, L, \epsilon, \delta)$, $\mathsf{PBA}(\mathsf{U}, G, L, \epsilon, \delta)$ and the calculation of $\hat{\mathbb{I}}(\cdot)$. Random and Degree are simple heuristics with low computational cost, so they had the shortest running times. Although BCT is also based on the polling method, it ran faster than our algorithm, for two reasons. Firstly, the sampling processes of BCT and our algorithm are different. The sampling complexity of BCT mainly depends on the number of randomly selected nodes, while the sampling process in our algorithm is more complicated because it must check the influence paths from $S_B$. Secondly, to obtain a data-dependent approximation, our algorithm must solve three problems with the polling-based method. It is worth noting that SPBA scales to million-scale networks. For the Wiki network, which has 1.79 million nodes and 28.5 million edges, our algorithm finished in 90 s.

6.2.4. Impact of τ

Considering the importance of early competitive influence in viral marketing, we were interested in the role of the time constraint in influence. We compared our solution with the three other algorithms while varying $\tau$ from 3 to 5. Figure 9 shows the results of the algorithms when $L = 50$. SPBA was clearly still the best performer. Specifically, SPBA was 1.01–1.23 times better than BCT and up to 2.5 times better than Degree.

7. Conclusions

In this paper, we investigate the BCIM problem, which finds the seed set of a player to maximize their influence while their competitors are conducting similar strategies. We first propose the TCLT model to capture the competitive influence of two competitors on a social network and formulate BCIM in this model. We provide hardness results and properties of the objective function. A randomized approximation algorithm, SPBA, is proposed for finding the solution of BCIM. Experiments on real-world social networks were conducted. The results show that our proposed algorithm outperformed the other heuristics.

Author Contributions

Investigation, C.V.P. and H.V.D.; Methodology, C.V.P.; Supervision, H.X.H. and M.T.T.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 3.
We use induction on the step $t = 0, 1, 2, \ldots, \tau$ for the two models. At step $t = 0$, we have $A_0 = A'_0$ and $B_0 = B'_0$. Assume that, at step $t \ge 0$, the two models give the same distribution of the sets of A-active and B-active nodes, which are $A_t$ and $B_t$. For the TCLT model, we consider a node $v$ that has not been activated by the end of step $t$, i.e., $v \notin A_t \cup B_t$. We divide the state of $v$ in step $t + 1$ into two cases:
Case 1: The total A-influence weight exceeds the threshold $\theta_A(v)$ while the total B-influence weight is smaller than $\theta_B(v)$ in step $t + 1$. The probability that this case happens is:
$$P_{1,t+1}(v) = \frac{\sum_{u \in A_t \setminus A_{t-1}} w_A(u, v) \left(1 - \sum_{u \in B_t \setminus B_{t-1}} w_B(u, v)\right)}{\left(1 - \sum_{u \in A_{t-1}} w_A(u, v)\right)\left(1 - \sum_{u \in B_{t-1}} w_B(u, v)\right)}$$
Case 2: The total A-influence weight exceeds the threshold $\theta_A(v)$ while the total B-influence weight also exceeds the threshold $\theta_B(v)$. The probability that this case happens is:
$$P_{2,t+1}(v) = \frac{\sum_{u \in A_t \setminus A_{t-1}} w_A(u, v) \sum_{u \in B_t \setminus B_{t-1}} w_B(u, v)}{\left(1 - \sum_{u \in A_{t-1}} w_A(u, v)\right)\left(1 - \sum_{u \in B_{t-1}} w_B(u, v)\right)}$$
In this case, the TB-WPP rule is used to determine the state of $v$. According to this rule, the probability that node $v$ is A-activated at step $t + 1$ is equal to
$$P_{1,t+1}(v) + p_A(v \mid A_{t-1}, B_{t-1}) \cdot P_{2,t+1}(v)$$
For the CLE model, we assume that $A'_t$ and $B'_t$ are the sets of A-active and B-active nodes at step $t$. For $v \notin B'_t \cup A'_t$, the probability that $v$ has an in-edge from $A'_t$ and does not have an in-edge from $B'_t$ (called $P'_{1,t+1}(v)$) is equal to
$$P'_{1,t+1}(v) = \frac{\sum_{u \in A'_t \setminus A'_{t-1}} w_A(u, v) \left(1 - \sum_{u \in B'_t \setminus B'_{t-1}} w_B(u, v)\right)}{\left(1 - \sum_{u \in A'_{t-1}} w_A(u, v)\right)\left(1 - \sum_{u \in B'_{t-1}} w_B(u, v)\right)} = P_{1,t+1}(v)$$
We denote by $P'_{2,t+1}(v)$ the probability that $v$ has both an in-edge from $A'_t$ and an in-edge from $B'_t$. We have
$$P'_{2,t+1}(v) = \frac{\sum_{u \in A'_t \setminus A'_{t-1}} w_A(u, v) \sum_{u \in B'_t \setminus B'_{t-1}} w_B(u, v)}{\left(1 - \sum_{u \in A'_{t-1}} w_A(u, v)\right)\left(1 - \sum_{u \in B'_{t-1}} w_B(u, v)\right)} = P_{2,t+1}(v)$$
According to the CLE model, the probability that $v$ is A-activated in this case is $P'_{2,t+1}(v) \cdot p_A(v \mid A'_{t-1}, B'_{t-1})$. Thus, the probability that $v$ is A-activated at step $t + 1$ is
$$P'_{1,t+1}(v) + P'_{2,t+1}(v) \cdot p_A(v \mid A'_{t-1}, B'_{t-1}) = P_{1,t+1}(v) + P_{2,t+1}(v) \cdot p_A(v \mid A_{t-1}, B_{t-1})$$
Due to $A_0 = A'_0 = A$, by step-by-step induction, we conclude that the random CLE model produces the same distribution over A-active sets as the TCLT model at any hop $t = 0, 1, \ldots, \tau$. Similarly, the two models produce the same distribution over B-active sets. ☐
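Proof of Theorem 4.
We first show that, for any parameters $\epsilon_1 > 0$ and $\delta_1 \in (0, 1)$: if
$$|\mathcal{R}| \ge \theta_1 = \frac{2 n_0 \ln(1/\delta_1)}{\mathsf{OPT}_u\, \epsilon_1^2}$$
then
$$\hat{\mathsf{U}}(S_U^*) \ge (1 - \epsilon_1) \cdot \mathsf{OPT}_u$$
with probability at least $1 - \delta_1$. Indeed, applying Lemma 5 with $T = |\mathcal{R}|$, we obtain
$$\Pr[\hat{\mathsf{U}}(S_U^*) \le (1 - \epsilon_1)\, \mathsf{U}(S_U^*)] = \Pr\left[\frac{n_0}{T} \sum_{j=1}^{T} X_j \le (1 - \epsilon_1)\, \mu \cdot n_0\right] = \Pr[\hat{\mu} \le (1 - \epsilon_1)\mu] \le \exp\left(\frac{-\epsilon_1^2 \mu T}{2}\right) = \delta_1$$
For the instance $(\mathcal{R}, L)$, we denote by $S$ the solution returned by Greedy, and by $S_0^*$ an optimal solution. Since Algorithm 5 returns a $(1 - 1/e)$-approximation solution, the following event happens with probability at least $1 - \delta_1$:
$$\hat{\mathsf{U}}(S) \ge (1 - 1/e)\, \hat{\mathsf{U}}(S_0^*) \ge (1 - 1/e)\, \hat{\mathsf{U}}(S_U^*) \ge (1 - 1/e)(1 - \epsilon_1)\, \mathsf{U}(S_U^*)$$
We next show that, for $\epsilon_2 > 0$ with $\epsilon_2 = \epsilon - (1 - 1/e)\epsilon_1$ and $\delta_2 \in (0, 1)$, if Equation (A8) holds and
$$T \ge \theta_2 = \frac{2 n_0 (1 - 1/e) \ln(n_0^{k_{max}}/\delta_2)}{\mathsf{OPT}_u\, (\epsilon - \epsilon_1 (1 - 1/e))^2}$$
then the following inequality holds with probability at least $1 - \delta_2$:
$$\mathsf{U}(S) \ge (1 - 1/e - \epsilon) \cdot \mathsf{OPT}_u$$
Since Equation (A8) holds, we have
$$\hat{\mathsf{U}}(S) \ge (1 - 1/e)(1 - \epsilon_1)\, \mathsf{U}(S_U^*) = (1 - 1/e - \epsilon)\, \mathsf{OPT}_u + \epsilon_2\, \mathsf{OPT}_u$$
Applying Equation (A13) and combining it with Lemma 5, we have:
$$\Pr[\mathsf{U}(S) \le (1 - 1/e - \epsilon) \cdot \mathsf{OPT}_u] \le \Pr[\hat{\mathsf{U}}(S) - \mathsf{U}(S) \ge \epsilon_2 \cdot \mathsf{OPT}_u] = \Pr\left[\frac{n_0}{T} \sum_{i=1}^{T} X_i - n_0 \mu \ge \epsilon_2 \cdot \mathsf{OPT}_u\right] = \Pr\left[\hat{\mu} - \mu \ge \frac{\mathsf{OPT}_u\, \epsilon_2}{n_0 \mu} \cdot \mu\right]$$
$$\le \exp\left(\frac{-\left(\frac{\mathsf{OPT}_u\, \epsilon_2}{n_0 \mu}\right)^2 T \mu}{2 + \frac{2}{3} \cdot \frac{\mathsf{OPT}_u\, \epsilon_2}{n_0 \mu}}\right) \le \exp\left(\frac{-\epsilon_2^2 \cdot \mathsf{OPT}_u^2}{2 n_0^2 \mu + \frac{2}{3} \epsilon_2 n_0\, \mathsf{OPT}_u} \cdot T\right) \le \exp\left(\frac{-\epsilon_2^2 \cdot \mathsf{OPT}_u^2}{2 n_0 (1 - 1/e - \epsilon) \cdot \mathsf{OPT}_u + \frac{2}{3} \epsilon_2 n_0\, \mathsf{OPT}_u} \cdot T\right)$$
$$\le \exp\left(\frac{-(\epsilon - (1 - 1/e)\epsilon_1)^2 \cdot \mathsf{OPT}_u}{2 n_0 (1 - 1/e)} \cdot \theta_2\right) = \frac{\delta_2}{n_0^{k_{max}}}$$
Due to $k_{max} = \max\{k : S \subseteq V,\ c(S) \le L\}$, there are at most $n_0^{k_{max}}$ candidate solutions. By applying the union bound, the inequality in Equation (A12) holds with probability at least $1 - \delta_2$. From the above results, if $T \ge \max\{\theta_1, \theta_2\}$, Algorithm 5 returns a $(1 - 1/e - \epsilon)$-approximation solution with probability at least $1 - (\delta_1 + \delta_2)$. By setting $\delta_1 = \delta_2 = \delta/2$, $\epsilon_2 = \epsilon - (1 - 1/e)\epsilon_1$, and
$$\epsilon_1 = \frac{\epsilon \sqrt{\ln(2/\delta)}}{(1 - 1/e)\sqrt{\ln(2/\delta)} + \sqrt{(1 - 1/e) \ln\left(2 n_0^{k_{max}}/\delta\right)}}$$
we obtain $\theta_1 = \theta_2 = N(\epsilon, \delta)$. Hence, if $T \ge N(\epsilon, \delta)$, Algorithm 5 returns a $(1 - 1/e - \epsilon)$-approximation solution with probability at least $1 - (\delta_1 + \delta_2) = 1 - \delta$. ☐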

References

  1. Kempe, D.; Kleinberg, J.M.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
  2. Borgs, C.; Brautbar, M.; Chayes, J.T.; Lucier, B. Maximizing Social Influence in Nearly Optimal Time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, OR, USA, 5–7 January 2014; pp. 946–957.
  3. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.M.; Glance, N.S. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429.
  4. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–1 July 2016; pp. 695–710.
  5. Tang, Y.; Xiao, X.; Shi, Y. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, 22–27 June 2014; pp. 75–86.
  6. Tang, Y.; Shi, Y.; Xiao, X. Influence Maximization in Near-Linear Time: A Martingale Approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, 31 May–4 June 2015; pp. 1539–1554.
  7. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. A Billion-Scale Approximation Algorithm for Maximizing Benefit in Viral Marketing. IEEE/ACM Trans. Netw. 2017, 25, 2419–2429.
  8. Chen, W.; Yuan, Y.; Zhang, L. Scalable Influence Maximization in Social Networks under the Linear Threshold Model. In Proceedings of the 10th IEEE International Conference on Data Mining, Sydney, Australia, 14–17 December 2010; pp. 88–97.
  9. Chen, W.; Wang, C.; Wang, Y. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 1029–1038.
  10. Nguyen, H.; Zheng, R. On Budgeted Influence Maximization in Social Networks. IEEE J. Sel. Areas Commun. 2013, 31, 1084–1094.
  11. Bharathi, S.; Kempe, D.; Salek, M. Competitive Influence Maximization in Social Networks. In Proceedings of the Internet and Network Economics, Third International Workshop, WINE 2007, San Diego, CA, USA, 12–14 December 2007; pp. 306–311.
  12. Lu, W.; Bonchi, F.; Goyal, A.; Lakshmanan, L.V.S. The bang for the buck: Fair competitive viral marketing from the host perspective. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, 11–14 August 2013; pp. 928–936.
  13. Chen, W.; Collins, A.; Cummings, R.; Ke, T.; Liu, Z.; Rincón, D.; Sun, X.; Wang, Y.; Wei, W.; Yuan, Y. Influence Maximization in Social Networks When Negative Opinions May Emerge and Propagate. In Proceedings of the Eleventh SIAM International Conference on Data Mining, SDM 2011, Mesa, AZ, USA, 28–30 April 2011; pp. 379–390.
  14. Liu, W.; Yue, K.; Wu, H.; Li, J.; Liu, D.; Tang, D. Containment of competitive influence spread in social networks. Knowl.-Based Syst. 2016, 109, 266–275.
  15. Bozorgi, A.; Samet, S.; Kwisthout, J.; Wareham, T. Community-based influence maximization in social networks under a competitive linear threshold model. Knowl.-Based Syst. 2017, 134, 149–158.
  16. Lu, W.; Chen, W.; Lakshmanan, L.V.S. From Competition to Complementarity: Comparative Influence Diffusion and Maximization. PVLDB 2015, 9, 60–71.
  17. Carnes, T.; Nagarajan, C.; Wild, S.; van Zuylen, A. Maximizing Influence in a Competitive Social Network: A Follower’s Perspective. In Proceedings of the Ninth International Conference on Electronic Commerce, Minneapolis, MN, USA, 19–22 August 2007; pp. 351–360.
  18. Wang, X.; Zhang, Y.; Zhang, W.; Lin, X. Dominated competitive influence maximization with time-critical and time-delayed diffusion in social networks. J. Comput. Sci. 2018, 28, 318–327.
  19. Yan, R.; Zhu, Y.; Li, D.; Ye, Z. Minimum cost seed set for threshold influence problem under competitive models. World Wide Web 2018.
  20. Khuller, S.; Moss, A.; Naor, J. The Budgeted Maximum Coverage Problem. Inf. Process. Lett. 1999, 70, 39–45.
  21. Chen, W.; Lakshmanan, L.V.S.; Castillo, C. Information and Influence Propagation in Social Networks; Synthesis Lectures on Data Management, Morgan & Claypool Publishers: Williston, VT, USA, 2013.
  22. He, X.; Song, G.; Chen, W.; Jiang, Q. Influence Blocking Maximization in Social Networks under the Competitive Linear Threshold Model. In Proceedings of the Twelfth SIAM International Conference on Data Mining, Anaheim, CA, USA, 26–28 April 2012; pp. 463–474.
  23. Valiant, L.G. The Complexity of Enumeration and Reliability Problems. SIAM J. Comput. 1979, 8, 410–421.
  24. Borodin, A.; Filmus, Y.; Oren, J. Threshold Models for Competitive Influence in Social Networks. In Proceedings of the Internet and Network Economics, 6th International Workshop, WINE 2010, Stanford, CA, USA, 13–17 December 2010; pp. 539–550.
  25. Wang, X.; Zhang, Y.; Zhang, W.; Lin, X. Efficient Distance-Aware Influence Maximization in Geo-Social Networks. IEEE Trans. Knowl. Data Eng. 2017, 29, 599–612.
  26. Song, C.; Hsu, W.; Lee, M. Targeted Influence Maximization in Social Networks. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, 24–28 October 2016; pp. 1683–1692.
  27. Li, Y.; Zhang, D.; Tan, K. Real-time Targeted Influence Maximization for Online Advertisements. PVLDB 2015, 8, 1070–1081.
  28. Lin, Y.; Chen, W.; Lui, J.C.S. Boosting Information Spread: An Algorithmic Approach. In Proceedings of the 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, 19–22 April 2017; pp. 883–894.
  29. Goyal, A.; Lu, W.; Lakshmanan, L.V.S. CELF++: Optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, 28 March–1 April 2011; pp. 47–48.
  30. Jung, K.; Heo, W.; Chen, W. IRIE: Scalable and Robust Influence Maximization in Social Networks. In Proceedings of the 12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium, 10–13 December 2012; pp. 918–923.
  31. Goyal, A.; Lu, W.; Lakshmanan, L.V.S. SIMPATH: An Efficient Algorithm for Influence Maximization under the Linear Threshold Model. In Proceedings of the 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, 11–14 December 2011; pp. 211–220.
  32. Budak, C.; Agrawal, D.; El Abbadi, A. Limiting the spread of misinformation in social networks. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, 28 March–1 April 2011; pp. 665–674.
  33. Tong, G.A.; Wu, W.; Guo, L.; Li, D.; Liu, C.; Liu, B.; Du, D. An efficient randomized algorithm for rumor blocking in online social networks. In Proceedings of the 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9.
  34. Chung, F.R.K.; Lu, L. Survey: Concentration Inequalities and Martingale Inequalities: A Survey. Int. Math. 2006, 3, 79–127.
  35. Dagum, P.; Karp, R.M.; Luby, M.; Ross, S.M. An Optimal Algorithm for Monte Carlo Estimation. SIAM J. Comput. 2000, 29, 1484–1496.
  36. Leskovec, J.; Kleinberg, J.M.; Faloutsos, C. Graph evolution: Densification and shrinking diameters. TKDD 2007, 1, 2.
  37. Leskovec, J.; Lang, K.J.; Dasgupta, A.; Mahoney, M.W. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Int. Math. 2009, 6, 29–123.
  37. Leskovec, J.; Lang, K.J.; Dasgupta, A.; Mahoney, M.W. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Int. Math. 2009, 6, 29–123. [Google Scholar] [CrossRef] [Green Version]
  38. Richardson, M.; Agrawal, R.; Domingos, P.M. Trust Management for the Semantic Web. In Proceedings of the Semantic Web—ISWC 2003, Second International Semantic Web Conference, Sanibel Island, FL, USA, 20–23 October 2003; pp. 351–368. [Google Scholar] [CrossRef]
  39. Yang, J.; Leskovec, J. Defining and Evaluating Network Communities based on Ground-truth. CoRR 2012, 42, 181–213. [Google Scholar]
  40. Yin, H.; Benson, A.R.; Leskovec, J.; Gleich, D.F. Local Higher-Order Graph Clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; ACM: New York, NY, USA, 2017; pp. 555–564. [Google Scholar] [CrossRef]
Figure 1. Illustration of the TB-WPP and TB-PP rules.
Figure 2. A counterexample.
Figure 3. Framework of the SPBA algorithm.
Figure 4. Description of $C_U(g, v)$.
Figure 5. Description of $C_L(g, v)$.
Figure 6. Comparison of the algorithms under the general cost setting.
Figure 7. Comparison of the algorithms under the unit-cost setting.
Figure 8. Running time of the algorithms for the BCIM problem.
Figure 9. Comparison of the algorithms as τ varies.
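Table 1. Notations.
| Notation | Description |
| --- | --- |
| $n, m$ | The number of nodes and the number of edges |
| $N_{-}(v), N_{+}(v)$ | The sets of incoming and outgoing neighbor nodes of $v$ |
| $S_A, S_B$ | Seed sets of A and B, respectively |
| $n_0$ | $n - \lvert S_B \rvert$ |
| $I(\cdot), L(\cdot), U(\cdot)$ | The expected number of A-active nodes, its lower bound, and its upper bound, respectively |
| $\hat{U}_c(S_A), \hat{U}_t(S_A)$ | Estimations of $U(S_A)$ over the sets $\mathcal{R}_c$ and $\mathcal{R}_t$, respectively |
| $S_A^*, S_L^*, S_U^*$ | Optimal solution for BCIM, optimal solution for maximizing $L(\cdot)$, and optimal solution for maximizing $U(\cdot)$ |
| $\mathsf{OPT}, \mathsf{OPT}_l, \mathsf{OPT}_u$ | $I(S^*), L(S_L^*), U(S_U^*)$ |
| $\Upsilon(\epsilon, \delta)$ | $1 + (1 + \epsilon)\left(2 + \frac{2}{3}\epsilon\right)\ln\frac{2}{\delta}\,\frac{1}{\epsilon^2}$ |
| $\mathrm{Cov}_{\mathcal{R}}(S)$ | Number of LRR (or URR) sets $R_j$ covered by $S$ |
| $k_{max}$ | $\max\{k : A \subseteq V,\ c(A) \le L\}$ |
| $\alpha, \beta$ | $\left(1 - \frac{1}{e}\right)\sqrt{\ln\frac{2}{\delta}}$, $\sqrt{\left(1 - \frac{1}{e}\right)\left(\ln\frac{2}{\delta} + \ln\binom{n}{k_{max}}\right)}$ |
| $N(\epsilon, \delta)$ | $2n\,(\alpha + \beta)^2\,\frac{1}{\epsilon^2\,\mathsf{OPT}_l}$ |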
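The sample-size quantities listed in Table 1 (Υ(ϵ, δ), k_max, α, β and N(ϵ, δ)) can be evaluated numerically. The following Python sketch is illustrative only and is not the authors' implementation. It rests on several assumptions that the flattened table does not confirm: α and β are read with the square roots customary in polling-based analyses, "ln n k_max" is read as the logarithm of a binomial coefficient, N(ϵ, δ) is read as dividing by OPT_l, and the cost c(A) is taken to be the sum of per-node costs. All function names, and the sample values in the usage part, are hypothetical.

```python
import math


def upsilon(eps, delta):
    """Upsilon(eps, delta) = 1 + (1 + eps)(2 + 2*eps/3) * ln(2/delta) / eps^2."""
    return 1 + (1 + eps) * (2 + 2 * eps / 3) * math.log(2 / delta) / eps ** 2


def k_max(costs, budget):
    """Largest number of nodes whose total cost fits the budget.

    ASSUMPTION: c(A) is the sum of per-node costs, so taking the
    cheapest nodes first maximizes the size of a feasible set.
    """
    total, k = 0.0, 0
    for c in sorted(costs):
        if total + c > budget:
            break
        total += c
        k += 1
    return k


def log_binomial(n, k):
    """ln C(n, k) via log-gamma, which avoids huge integers on large graphs."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def alpha_beta(n, kmax, delta):
    """alpha and beta under the ASSUMED square-root reading of Table 1."""
    c = 1 - 1 / math.e
    log_term = math.log(2 / delta)
    alpha = c * math.sqrt(log_term)
    beta = math.sqrt(c * (log_term + log_binomial(n, kmax)))
    return alpha, beta


def sample_bound(n, kmax, eps, delta, opt_l):
    """N(eps, delta) = 2n (alpha + beta)^2 / (eps^2 * OPT_l), same assumption."""
    alpha, beta = alpha_beta(n, kmax, delta)
    return 2 * n * (alpha + beta) ** 2 / (eps ** 2 * opt_l)


if __name__ == "__main__":
    # Hypothetical inputs: unit costs, budget 50, and a made-up OPT_l of 100.
    n = 6301
    kmax = k_max([1.0] * n, budget=50)
    print(kmax, upsilon(0.1, 0.1), sample_bound(n, kmax, 0.1, 0.1, opt_l=100.0))
```

Because OPT_l is unknown in practice, a value such as the one passed to `sample_bound` would have to be replaced by a lower bound estimated during the algorithm; the sketch only shows how the quantities depend on n, k_max, ϵ and δ.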
Table 2. Datasets.
| Dataset | Nodes | Edges | Type | Avg. Degree |
| --- | --- | --- | --- | --- |
| Gnutella [36] | 6,301 | 20,777 | Directed | 3.29 |
| Enron [37] | 36,692 | 183,831 | Undirected | 5.01 |
| Epinions [38] | 75,879 | 508,837 | Directed | 6.7 |
| Email-Eu [36] | 265,214 | 420,045 | Directed | 1.58 |
| DBLP [39] | 317,080 | 1,049,866 | Undirected | 5.01 |
| Wiki [40] | 1,791,489 | 28,511,807 | Directed | 6.7 |

Share and Cite

MDPI and ACS Style

Pham, C.V.; Duong, H.V.; Hoang, H.X.; Thai, M.T. Competitive Influence Maximization within Time and Budget Constraints in Online Social Networks: An Algorithmic Approach. Appl. Sci. 2019, 9, 2274. https://doi.org/10.3390/app9112274

AMA Style

Pham CV, Duong HV, Hoang HX, Thai MT. Competitive Influence Maximization within Time and Budget Constraints in Online Social Networks: An Algorithmic Approach. Applied Sciences. 2019; 9(11):2274. https://doi.org/10.3390/app9112274

Chicago/Turabian Style

Pham, Canh V., Hieu V. Duong, Huan X. Hoang, and My T. Thai. 2019. "Competitive Influence Maximization within Time and Budget Constraints in Online Social Networks: An Algorithmic Approach" Applied Sciences 9, no. 11: 2274. https://doi.org/10.3390/app9112274

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
