Competitive Influence Maximization within Time and Budget Constraints in Online Social Networks: An Algorithmic Approach

Abstract: The Competitive Influence Maximization (CIM) problem, which seeks a seed set of nodes for a player or a company to propagate its product's information while its competitors conduct similar strategies, has received much attention recently due to its application in viral marketing. However, existing works neglect the fact that limited budget and time constraints can play an important role in the competitive influence strategy of each company. In addition, based on the assumption that one of the competitors dominates the competitive influence process, the majority of prior studies indicate that the competitive influence function (objective function) is monotone and submodular. This led to the fact that CIM can be approximated within a factor of 1 − 1/e − ε by a Greedy algorithm combined with the Monte Carlo simulation method. Unfortunately, in a more realistic scenario where there is fair competition among competitors, the objective function is no longer submodular. In this paper, we study a general case of the CIM problem, named the Budgeted Competitive Influence Maximization (BCIM) problem, which considers CIM with budget and time constraints under the condition of fair competition. We find that the objective function is neither submodular nor supermodular. Therefore, it does not admit a Greedy algorithm with an approximation ratio of 1 − 1/e. We propose Sandwich Approximation based on Polling-Based Approximation (SPBA), an approximation algorithm based on the Sandwich framework and the polling-based method. Our experiments on real social network datasets showed the effectiveness and scalability of our algorithm, which outperformed other state-of-the-art methods. Specifically, our algorithm is scalable to million-scale networks in only 1.5 min.


Introduction
Online social networks (OSNs) have recently become a very effective medium for diffusing information and propagating opinions or ideas. Many companies have leveraged the word-of-mouth effect in OSNs to promote their products. The key problem of viral marketing is Influence Maximization (IM), which aims to select a set of k users (called a seed set) in a social network with maximum influence spread. Kempe et al. [1] first formulated the IM problem in two diffusion models, named Linear Threshold (LT) and Independent Cascade (IC), which simulate the propagation of influence through social networks. IM has been widely studied due to its important role in viral marketing [2][3][4][5][6][7][8][9][10]. However, all of the above-mentioned studies focus only on the influence propagation of a single player or company in social networks. In the context of viral marketing, there are often many competitors simultaneously implementing the same marketing-spread strategy on OSNs. This phenomenon gives rise to the task of maximizing a product's influence under competitive circumstances, called the Competitive Influence Maximization (CIM) problem.
Bharathi et al. [11] first proposed the CIM problem, which seeks a seed set to maximize the propagation of a product's information while competitors employ the same strategy. Since then, related works have investigated CIM in many different contexts. Some authors show that the objective function is monotone and submodular: a set function f over a ground set U is submodular if f(A ∪ {u}) − f(A) ≥ f(B ∪ {u}) − f(B) for any A ⊆ B ⊆ U and u ∈ U \ B. Based on that, they applied the classic hill-climbing algorithm, which provides an approximation ratio of (1 − 1/e), to solve the CIM problem [12][13][14][15][16][17]. For example, Lu et al. [12] studied the problem in the context of fair competitive influence from the host perspective. Chen et al. [13] proposed the independent cascade model with negative opinions (IC-N) by extending the IC model and showed a greedy algorithm with an approximation ratio of 1 − 1/e. Recently, some works have addressed the problem in other directions, including proposing heuristic algorithms [15] and studying some variants of CIM [16,18,19].
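The hill-climbing greedy mentioned above can be sketched generically. Below is a minimal Python illustration over a coverage function, a canonical monotone submodular objective; the concrete sets are made-up toy data, not taken from any CIM instance:

```python
def greedy(f, ground, k):
    """Hill-climbing greedy: repeatedly add the element with the largest
    marginal gain. For a monotone submodular f and cardinality budget k,
    this yields a (1 - 1/e)-approximation."""
    S = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for u in ground - S:
            gain = f(S | {u}) - f(S)  # marginal benefit of adding u
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:  # no element gives positive gain
            break
        S.add(best)
    return S

# Coverage function: f(S) = size of the union of the chosen sets.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
f = lambda S: len(set().union(*(sets[x] for x in S))) if S else 0
print(greedy(f, set(sets), 2))  # picks "a" and "c", covering 6 elements
```

This is the template that the CIM algorithms discussed above instantiate, with f replaced by a (simulated) influence-spread estimate.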
Although previous works try to solve the CIM problem in many circumstances, their feasibility is limited for the following reasons. Firstly, they often assume that one of the competitors dominates the competitive influence process. In this case, the objective function is monotone and submodular. As a result, the existing algorithms provide an approximation ratio of 1 − 1/e based on the Greedy framework, which sequentially selects the node with the largest marginal benefit [20]. However, users may also respond differently when they receive the same information; as a result, the influence function is no longer submodular, and there is no approximation algorithm for this case. Secondly, the prior works do not take into account the time constraint and the cost of selecting a seed user for CIM. In a more realistic scenario, the effectiveness of the competitive influence process depends heavily on these two factors. Thirdly, although many CIM algorithms have been proposed, there are no scalable and efficient algorithms for CIM in large (million-scale) social networks. For problems related to information diffusion, the complexity of calculating the objective function is enormous due to the randomness of the probabilistic diffusion model [8,9]. To address these challenges, some works use the Monte Carlo method to estimate the objective function [1,13,15,17]. However, this method has high complexity and takes several hours even on very small networks. To the best of our knowledge, there is no randomized algorithm for CIM that provides an approximation guarantee with low complexity.
In this paper, we study a general problem of CIM, Budgeted Competitive Influence Maximization (BCIM), which takes both arbitrary costs and time constraints into account for the CIM problem. To model this problem, we first introduce the Time constraint Competitive Linear Threshold (TCLT) model to capture competitive influence progress within a time constraint by extending the Competitive Linear Threshold model [21,22]. Under the TCLT model, the main challenges of BCIM lie in the following aspects. Firstly, the BCIM problem is NP-hard, and it is #P-hard [23] to calculate the objective function. Moreover, we point out that the objective function is neither submodular nor supermodular. This makes BCIM difficult to solve using greedy-based algorithms, as well as methods for influence maximization. To address the above challenges, in this article, we present SPBA, an efficient randomized algorithm based on the polling method and the Sandwich Approximation framework [16]. Our main contributions are summarized as follows:
• We formulate the Time constraint Competitive Linear Threshold (TCLT) model by extending the Competitive Linear Threshold model in [21,22] to simulate competitive influence within a time constraint τ. Given two competitors A and B who need to advertise their products on OSNs, assume that we know the nodes activated by B (the B-seed set). Given a limited budget L, a heterogeneous cost for each node to be activated by A (i.e., each node has a cost to add it into the A-seed set), and the time constraint τ, we study the BCIM problem, which seeks an A-seed set within the budget L and time constraint τ that maximizes the number of nodes influenced by A under the TCLT model. We then show that BCIM is NP-hard and that the objective function is neither submodular nor supermodular.

• We propose SPBA, an efficient randomized algorithm based on Sandwich approximation and the polling method. We first design upper bound and lower bound submodular functions of the objective function and develop a polling-based approximation algorithm that finds solutions to the bound functions with an approximation ratio of (1 − 1/√e − ε) with high probability. Based on that, the Sandwich approximation framework in [16] is applied to give a data-dependent approximation factor.
• We conducted extensive experiments on various real social networks. The experiments suggest that SPBA provides significantly higher-quality solutions than existing methods, including baseline algorithms and influence maximization algorithms. Furthermore, we also demonstrate that our algorithm can scale to million-scale networks within about 1.5 min.
Organization. The rest of the paper is organized as follows. Related work is presented in Section 2, and the preliminaries for the Competitive Linear Threshold model and Competitive Influence Maximization are introduced in Section 3. We introduce our propagation model, the problem definition, and its properties in Section 4. Section 5 presents our proposed algorithms. The experiments are shown in Section 6. Finally, we give conclusions and some tasks for future work in Section 7.

Related Work
Since CIM is one of the variants of IM, we review the literature related to this work in two areas, namely influence maximization and competitive influence maximization.

Influence Maximization
The IM problem is a crucial problem in information diffusion research due to its potential commercial value. Basically, IM focuses on finding a set of k seed users in a social network to maximize the number of influenced nodes. Kempe et al. [1] first proposed two information diffusion models, Linear Threshold (LT) and Independent Cascade (IC). On these models, they formulated IM as a combinatorial optimization problem and designed a natural greedy algorithm with an approximation ratio of 1 − 1/e. IM has received much attention from the following aspects: proposing efficient algorithms [2][3][4][5][6][7][8][9][24] and studying its variants [7,10,[25][26][27][28]].
Kempe et al. [1] first proposed a Greedy algorithm based on Monte Carlo simulation with a (1 − 1/e − ε) approximation guarantee. To improve the running time of the Greedy algorithm, Leskovec et al. [3] proposed the cost-effective lazy forward (CELF) algorithm, which is up to 700 times faster than the Greedy algorithm; this algorithm was further improved in [29]. Several works propose heuristic algorithms to find solutions in large networks [8,9,30,31]. Although those heuristics are often faster in practice, they fail to retain the (1 − 1/e − ε) approximation guarantee and often give lower-quality results than the greedy algorithm. Chen et al. [9] proposed a heuristic algorithm based on the maximum influence arborescence (MIA) structure. For the LT model, Chen et al. [8] proposed using local directed acyclic graphs (LDAG) to approximate the influence of nodes. Recently, Borgs et al. [2] made a theoretical breakthrough in solving IM by proposing the Reverse Influence Sampling (RIS) algorithm, which returns a (1 − 1/e − ε) approximate solution with probability at least 1 − n^{−l}. The main idea of RIS is to generate Reverse Reachable (RR) sets to estimate the objective function and to run a greedy algorithm over a large enough collection of RR sets to find the solution. This motivates many state-of-the-art methods for IM, including TIM/TIM+ [5], IMM [6], and SSA/D-SSA [4].
Recently, variants of IM have also received much attention due to their potential commercial value [25,27,28]. Lin et al. [28] studied the k-Boosting problem, which aims at finding a set of k users to boost so that the "boosted" influence spread is maximized. The authors of [25] investigated distance-aware influence maximization, which takes into account the effect of distance on the influence process; they showed that the objective function is monotone and submodular and proposed RIS-based and MIA-based algorithms. Topic-aware influence maximization is also studied in [27]: each user is associated with a profile consisting of the user's preferences on different topics, and the problem asks to select a seed set with budget and topic queries so that the competitive influence function is maximized.

Competitive Influence Maximization
In the context of viral marketing, it is often the case that many companies propagate their products' information simultaneously, which is why the Competitive Influence Maximization (CIM) problem has been studied in recent years. Bharathi et al. [11] first proposed the CIM problem together with a new propagation model, which is an extension of the IC model. Later, some authors proposed variations of the IC model for the CIM problem. Chen et al. [13] investigated CIM in the context of combating the dissemination of negative opinions under the IC-N model, which is based on the observation that rumors and misinformation are often more attractive than official information. Lui et al. [14] considered the CIM problem under a new diffusion-containment model and presented a (1 − 1/e)-approximation algorithm. Carnes et al. [17] proposed distance-based and wave propagation models for the competitive influence process in social networks; they showed that the objective function is submodular and devised a greedy algorithm with an approximation ratio of (1 − 1/e). Some other works take an extended LT model approach to the CIM problem [12,21,22,24]. For instance, Borodin et al. [24] proposed Competitive Linear Threshold models and provided some properties of these models. Lu et al. [12] considered the problem of fair competitive viral marketing from the host's perspective; they proposed the K-LT model and showed that the influence function is monotone and submodular. Generally, Chen et al. [21] summarized two competitive influence models extended from the IC model and the LT model, named the Competitive Independent Cascade (CIC) and Competitive Linear Threshold (CLT) models. They also provided the properties of such models and categorized them by tie-breaking rules, including the fixed probability tie-breaking (TB-FP) rule and the proportional probability tie-breaking (TB-PP) rule.
TB-FP means that, when a node v is influenced by both competitors, it becomes influenced by one of them with a fixed probability. TB-PP means that v becomes influenced by one competitor with a proportional probability. TB-FP reflects the dominance of a competitor, and most proposed algorithms rely on this feature to give an approximation ratio of 1 − 1/e. However, there is no approximation algorithm for the TB-PP case.
In other directions, Bozorgi et al. [15] proposed a community-based algorithm for CIM under the DC model. Some variants of CIM have also been studied. Yan et al. [19] found the seed set with minimum cost for the threshold competitive influence problem. Lu et al. [16] and Yan et al. [19] proposed competition-and-complementarity approaches for the CIM problem by extending the IC model. The authors of [18] formulated the Dominated Competitive Influence Maximization (DCIM) problem, which aims to maximize the difference between the influence of the desired information and that of its competitors under a new competitive independent cascade model with meeting events.
Different from most of the existing works, in this paper we study a general problem of CIM, namely BCIM, which considers CIM within budget and time constraints under the condition of fair competition. We show that the objective function is not submodular and that computing the objective function is #P-hard. To overcome these challenges, we propose a randomized algorithm based on Sandwich approximation and the polling-based method.

Preliminaries
To clearly introduce the problem of BCIM, we first introduce some preliminaries. Table 1 summarizes the frequently used notations.
- the sets of incoming and outgoing neighbor nodes of v
- S_A, S_B: the seed sets of A and B, respectively
- the expected number of A-active nodes, its lower bound, and its upper bound, respectively
- the optimal solution for BCIM and the optimal solutions for maximizing L(·) and U(·), respectively

Competitive Linear Threshold (CLT) Model
In this model, a social network is abstracted as a directed graph G = (V, E), where V is the set of nodes (or vertices) representing users and E is the set of edges representing links among users. There are two competitors A and B who want to promote their products in a social network G. Each edge (u, v) ∈ E has two weights w_A(u, v) and w_B(u, v), representing the influence of A and B on edge (u, v), respectively. The weights satisfy ∑_{u∈N_in(v)} w_A(u, v) ≤ 1 and ∑_{u∈N_in(v)} w_B(u, v) ≤ 1 for every node v, where N_in(v) is the set of in-neighbors of v. Each node takes one of three statuses: A-active, B-active, and inactive, representing nodes that have been successfully activated by A, activated by B, or not activated by either A or B, respectively. Each node v picks two independent thresholds θ_A(v) and θ_B(v) uniformly from [0, 1], called the A-threshold and B-threshold. The propagation process happens in discrete steps t = 0, 1, . . .. S_A and S_B are the seed sets of competitors A and B (S_A ∩ S_B = ∅), and A_t and B_t are the sets of A-active and B-active nodes at step t, respectively. At each step t, an inactive node v becomes A-active if the total influence weight of its A-active in-neighbors reaches its A-threshold, i.e., ∑_{u∈A_{t−1}∩N_in(v)} w_A(u, v) ≥ θ_A(v), and similarly for B. When both thresholds are reached at the same step, a tie-breaking rule, as summarized by Chen et al. [21], determines whether v becomes A-active or B-active.
- Fixed probability tie-breaking rule (TB-FP): with a fixed probability p, node v becomes A-active with probability p and B-active with probability 1 − p. Special cases of this rule include TB-FP(A), reflecting competitor A's dominance.
- Proportional probability tie-breaking rule (TB-PP): node v becomes A-active or B-active with probability proportional to the number of A-active and B-active in-neighbors that reach its thresholds at that step.
Once a node becomes activated (A-active or B-active), its status remains unchanged in the next steps. The propagation process ends when no more nodes can be activated.
TB-FP is used in [11,13,22,32,33] to reflect the dominance of one competitor. This is motivated by the phenomenon of negativity bias, which is well studied in social psychology, and matches the common observation that rumors or misinformation are usually hard to fight in social networks. In contrast, TB-PP reflects fair competition among competitors. This rule is used for the IC-N model (a variant of the IC model) [13], while no study uses this rule for a variant of the LT model.

Competitive Influence Maximization
Definition 1. Given a directed graph G = (V, E) representing a social network under an information diffusion model M, there are two competitors A and B. Given a B-seed set S_B ⊂ V and a positive number k, find an A-seed set S_A ⊆ V \ S_B with |S_A| ≤ k so that the expected number of A-active nodes is maximized.

Time Constraint Competitive Linear Threshold (TCLT) Model
In this section, we introduce our model, which incorporates the CLT model with a limited number of spread steps τ, namely the Time Constraint Competitive Linear Threshold (TCLT) model. In addition, we propose a new tie-breaking rule that, as we argue below, more faithfully reflects the competitive context in viral marketing.
In this model, we reuse all notations and symbols of the CLT model. Given a constraint of propagation hops τ ≥ 1, the propagation process happens in discrete steps t = 0, 1, . . . , τ as follows. An inactive node v becomes A-active at step t if only its A-threshold is reached, i.e., ∑_{u∈A_{t−1}∩N_in(v)} w_A(u, v) ≥ θ_A(v), and becomes B-active if only its B-threshold is reached, i.e., ∑_{u∈B_{t−1}∩N_in(v)} w_B(u, v) ≥ θ_B(v). When both thresholds are reached at the same step, we propose the weight proportional probability tie-breaking rule (TB-WPP) to determine its state. Accordingly, denoting W_A(v) = ∑_{u∈A_{t−1}∩N_in(v)} w_A(u, v) and W_B(v) = ∑_{u∈B_{t−1}∩N_in(v)} w_B(u, v), node v is A-activated with probability W_A(v)/(W_A(v) + W_B(v)) and B-activated with probability W_B(v)/(W_A(v) + W_B(v)). Once a node becomes activated (A-active or B-active), it keeps this status in the next steps.
The propagation process ends after τ hops of propagation or when no more nodes can be activated.
Different from TB-PP, the TB-WPP rule considers the total influence weight of the in-neighbors to decide the state of a node. Our TB-WPP rule reflects the competition process more closely. Consider the example in Figure 1 to clarify this observation. Graph G contains four nodes {a, b, c, u} and three edges. Under TB-PP, node u changes its state from inactive to A-active or B-active with probabilities 2/3 and 1/3, respectively; in other words, u is more likely to become A-active. If we use TB-WPP, node u changes its state from inactive to A-active or B-active with probabilities 3/11 and 8/11, i.e., u is more likely to become B-active. In this case, the influence weight of c on u (0.8) is greater than that of the two nodes a and b (total influence weight 0.3). Considering the total weight in the TB-WPP rule better matches the fact that users exert different influences on each other depending on the relationship between them. Therefore, it is reasonable to use TB-WPP for the competitive influence spread.
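The example above can be verified numerically. In the following sketch, the individual edge weights (0.1 and 0.2 from a and b, 0.8 from c) are assumptions chosen only to match the totals quoted in the text:

```python
from fractions import Fraction

# Active in-neighbors of u at this step: a, b are A-active (total
# A-weight 0.3); c is B-active (B-weight 0.8). Individual weights
# below are illustrative assumptions consistent with those totals.
a_weights = [Fraction(1, 10), Fraction(2, 10)]   # w_A(a, u), w_A(b, u)
b_weights = [Fraction(8, 10)]                    # w_B(c, u)

# TB-PP: proportional to the *number* of active in-neighbors.
p_A_pp = Fraction(len(a_weights), len(a_weights) + len(b_weights))

# TB-WPP: proportional to the *total influence weight*.
W_A, W_B = sum(a_weights), sum(b_weights)
p_A_wpp = W_A / (W_A + W_B)

print(p_A_pp, 1 - p_A_pp)     # 2/3 vs. 1/3: A wins under TB-PP
print(p_A_wpp, 1 - p_A_wpp)   # 3/11 vs. 8/11: B wins under TB-WPP
```

Exact rationals (`Fraction`) are used so the probabilities match the fractions quoted in the example with no floating-point noise.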

Budgeted Competitive Influence Maximization Problem
In this paper, we assume that the seed set of competitor B, S_B ⊂ V, is known and that each node u is associated with an arbitrary cost c(u) ≥ 0 to add it to S_A. We define Budgeted Competitive Influence Maximization (BCIM) as follows. Definition 2 (BCIM problem). Given a directed graph G = (V, E) representing a social network under the TCLT model, a B-seed set S_B ⊂ V, a budget L > 0, and a time constraint τ, find an A-seed set S_A ⊆ V \ S_B with total cost ∑_{u∈S_A} c(u) ≤ L that maximizes I(S_A). Theorem 1. The BCIM problem is NP-hard, and calculating the objective function I(·) is #P-hard.
Proof. We see that, when S_B = ∅ and τ = n, the TCLT model becomes the well-known LT model [1] and BCIM becomes the IM problem [1]. In other words, IM is a special case of BCIM, so BCIM is an NP-hard problem and calculating the influence I(S_A) is #P-hard.
Although the objective function of the IM problem is monotone and submodular, unfortunately the objective function of BCIM is neither submodular nor supermodular. Therefore, we cannot use the natural greedy algorithms for optimizing submodular or supermodular functions to obtain an approximation guarantee. Theorem 2. The function I(·) is neither submodular nor supermodular under the TCLT model. Proof. We prove this by a counterexample (see Figure 2). Consider an instance of the BCIM problem in which the A-weight and B-weight on each edge equal 1 and S_B = {f}. In this example, we have I(∅) = 0, and the remaining influence values computed from Figure 2 violate both the submodularity and the supermodularity inequalities.
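Claims of this kind can be confirmed by brute force on tiny instances. Below is a generic checker of the submodularity inequality; the two toy functions (a coverage function and a threshold-style function) are illustrative stand-ins, not the TCLT objective itself:

```python
from itertools import combinations

def is_submodular(f, ground):
    """Brute-force check of f(A ∪ {u}) - f(A) >= f(B ∪ {u}) - f(B)
    for all A ⊆ B ⊆ ground and u ∉ B (feasible only for tiny grounds)."""
    subsets = [set(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for u in set(ground) - B:
                if f(A | {u}) - f(A) < f(B | {u}) - f(B):
                    return False  # a violating triple (A, B, u) exists
    return True

# Coverage (submodular) vs. a threshold-style function (not submodular).
sets3 = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
cover = lambda S: len(set().union(*S)) if S else 0
f_cov = lambda S: cover([sets3[i] for i in S])   # acts on index subsets
f_thr = lambda S: 1 if len(S) >= 2 else 0        # gain appears only at size 2
print(is_submodular(f_cov, {0, 1, 2}))  # True
print(is_submodular(f_thr, {0, 1, 2}))  # False
```

The same checker, fed with influence values simulated on the graph of Figure 2, would exhibit the violating triples claimed in the proof.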

Competitive Live-Edge (CLE) Model
We follow the method in [22] to construct a live-edge model and prove that this model is equivalent to the TCLT model. This property is used for estimating the objective function as well as for designing our algorithm in the next sections.
From the original graph G = (V, E) under the TCLT model, we construct a sample graph (or realization) g of G as follows. For each v ∈ V, we randomly select at most one in-edge (u, v) with probability w_A(u, v), selecting no in-edge with probability 1 − ∑_{u∈N_in(v)} w_A(u, v); the selected edge is called an A-live edge. Independently, we also randomly select at most one in-edge (u, v) (called a B-live edge) with probability w_B(u, v), selecting no in-edge with probability 1 − ∑_{u∈N_in(v)} w_B(u, v). Let g_A and g_B be the sub-graphs including only the A-live edges and the B-live edges, respectively. Finally, we return g as the union of g_A and g_B.
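A minimal sketch of this sampling step, assuming an in-neighbor-list representation and weight dictionaries (the names and toy inputs are illustrative assumptions):

```python
import random

def sample_live_edge_graph(in_nbrs, wA, wB, rng):
    """Draw one realization g = (g_A, g_B) under the CLE model: for each
    node v, pick at most one A-live in-edge (u, v) with probability
    w_A(u, v), and independently at most one B-live in-edge with
    probability w_B(u, v)."""
    gA, gB = {}, {}  # v -> its selected live in-neighbor (if any)
    for v, nbrs in in_nbrs.items():
        for w, g in ((wA, gA), (wB, gB)):
            r, acc = rng.random(), 0.0
            for u in nbrs:
                acc += w.get((u, v), 0.0)
                if r < acc:      # edge (u, v) becomes live
                    g[v] = u
                    break        # at most one live in-edge per node
    return gA, gB
```

Because the weights into each node sum to at most 1, a single uniform draw per node and per competitor selects either one live in-edge or none, exactly as in the construction above.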
In graph g, we denote A_t and B_t as the sets of A-active and B-active nodes on g at step t, respectively. The propagation of A-active and B-active nodes in g happens in discrete steps t along the A-live and B-live edges: a node becomes A-active (B-active) once its selected A-live (B-live) in-neighbor is A-active (B-active), and when a node can be activated by both competitors at the same step, its state is determined by the tie-breaking rule. The process of propagation ends after hop t = τ or when no more nodes can be activated.
We demonstrate the equivalence of two models through the following theorem.
Theorem 3. For a given A-seed set S_A and B-seed set S_B, the distributions over A-active node sets and B-active node sets at hop t, for any t = 1, 2, . . . , τ, are equivalent under the TCLT and CLE models.
The proof of Theorem 3 is presented in Appendix A. We denote X_G as the set of sample graphs generated from G and Pr[g|G] as the probability of generating sample graph g from G, where E(g_A) and E(g_B) are the sets of edges of g_A and g_B, respectively. We denote I_B^τ(S_A) as the expected number of A-active nodes after τ hops with a given B-seed set S_B; for convenience, we write I(S_A) for I_B^τ(S_A). Based on the result of Theorem 3, we have I(S_A) = ∑_{g∈X_G} Pr[g|G] · ∑_{v∈V\S_B} γ_g^v(S_A), where γ_g^v(S_A) is a random variable under sample graph g defined by γ_g^v(S_A) = 1 if v becomes A-active in g when A uses seed set S_A, and γ_g^v(S_A) = 0 otherwise. Node v is called the source node. Let v be randomly selected from V \ S_B and let g be a random graph generated from G. Lemma 1 shows that we can use γ_g^v(·) to estimate the objective function.

Lemma 1. For any S_A ⊆ V \ S_B, I(S_A) = n_0 · E[γ_g^v(S_A)], where the expectation of γ_g^v(S_A) is taken over all random source nodes and sample graphs.
Proof. Since the source node is randomly selected, the probability that v is selected equals 1/n_0, where n_0 = |V \ S_B|. We have E[γ_g^v(S_A)] = ∑_{g∈X_G} Pr[g|G] ∑_{v∈V\S_B} (1/n_0) · γ_g^v(S_A) = (1/n_0) ∑_{g∈X_G} Pr[g|G] ∑_{v∈V\S_B} γ_g^v(S_A) = (1/n_0) · I(S_A). The last equality follows from the expression of I(S_A) in terms of γ_g^v(S_A). This completes the proof.
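Lemma 1 directly yields a Monte Carlo estimator: average T independent 0/1 samples of γ and scale by n_0. A sketch, where `gamma_sampler` is an assumed callback that draws a random source node and sample graph and returns the indicator γ_g^v(S_A):

```python
def estimate_influence(gamma_sampler, n0, T, S_A):
    """Unbiased estimator from Lemma 1: I(S_A) = n0 * E[gamma_g^v(S_A)],
    so averaging T i.i.d. 0/1 samples and scaling by n0 gives an
    unbiased Monte Carlo estimate of I(S_A)."""
    hits = sum(gamma_sampler(S_A) for _ in range(T))
    return n0 * hits / T
```

For example, with n0 = 10 and a sampler that returns 1 on half of its draws, the estimate is 5.0; the variance shrinks as T grows, which is exactly what the polling-based algorithm later exploits.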
Based on Lemma 1, we design upper and lower submodular functions, which are cores of our proposed algorithms in next sections.

Our Proposed Algorithm for BCIM Problem
In this section, we present SPBA, our approximation algorithm for the BCIM problem. Since I(·) is not submodular, we use the Sandwich Approximation (SA) method [16] to design an approximation algorithm for the problem.
Outline. Our algorithm contains two key components: (1) We devise lower bound and upper bound submodular functions of I, namely L(·) and U(·), respectively. We then design the Polling-Based Algorithm (PBA), a (1 − 1/√e − ε)-approximation algorithm, to maximize L and U based on the polling method [4][5][6][7]. (2) We apply SA with the upper and lower bound functions to provide a solution with an approximation guarantee. It first finds a solution to the BCIM problem with any strategy, then finds approximate solutions to the lower and upper bounds by the PBA algorithm, and finally returns the solution that gives the best result for the BCIM problem. The framework of SPBA is presented in Figure 3.
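The SA step can be sketched abstractly: run the bound-maximizing algorithm on L and U, obtain any feasible solution for I, and keep the best of the three under I. The stand-in `solve` and the toy functions below are illustrative assumptions, not the actual PBA machinery:

```python
def sandwich(solve, lower, upper, I, heuristic):
    """Sandwich Approximation [16] skeleton: approximately maximize the
    submodular lower and upper bounds, take any feasible solution for I
    itself, and return whichever of the three is best under the true
    objective I. This yields a data-dependent approximation factor."""
    S_lower = solve(lower)   # approx. maximizer of the lower bound L
    S_upper = solve(upper)   # approx. maximizer of the upper bound U
    S_mid = heuristic()      # any feasible solution for I
    return max((S_lower, S_upper, S_mid), key=I)

# Toy check with stand-in functions (illustrative only).
cands = [frozenset({1}), frozenset({1, 2}), frozenset({3})]
solve = lambda f: max(cands, key=f)              # mock bound maximizer
I_true = lambda S: 5 if S == frozenset({3}) else len(S)
best = sandwich(solve, len, len, I_true, lambda: frozenset({3}))
print(best)
```

In SPBA, `solve` is PBA run over URR/LRR collections and `I` is estimated by the polling samples; the skeleton above only shows how the three candidates are combined.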

Lower and Upper Bound Functions
We leverage the equivalence between the TCLT and CLE models and the result of Lemma 1 to design lower and upper bounds of the objective function.

Upper Bound Function
For a random source node v and a sample graph g with a given B-seed set S_B, the idea of this method is to choose only the set of nodes satisfying: (1) the length of the influence path from them to v is smaller than τ; and (2) the influence path from them to v is not blocked by S_B. We denote this set by C_U(g, v). Figure 4 shows an example of C_U(g, v). In this example, the sample graph g contains nine nodes and eight edges, the source node is v, S_B = {c, h}, and τ = 4; the edges of g_A are as shown in Figure 4. Node d lies on a simple path that ends at v, but it cannot influence v since its influence is blocked by c. We define the Upper bound Reachable Reversal (URR) set as follows. Definition 3 (URR set). Given graph G = (V, E, w_A, w_B), a random URR set R_j is generated from G by: (1) picking a random source node v ∈ V; and (2) generating a sample graph g from G by running the CLE model, and returning R_j ← C_U(g, v).
For any S_A ⊆ V \ S_B, denote a random variable X_j = 1 if S_A ∩ R_j ≠ ∅ and X_j = 0 otherwise, and define U(S_A) = n_0 · E[X_j]. We denote R_j(g, v) as a URR set with source node v and sample graph g, and X_j(g, v) as the value of X_j corresponding to R_j(g, v). The following lemma shows the upper-bound character of X_j.

Lemma 2. For any set S_A ⊆ V \ S_B, a random source node v, and a random sample graph g, X_j(g, v) ≥ γ_g^v(S_A).
Proof. We consider the two following cases. Case 1: S_A ∩ R_j(g, v) = ∅. In this case, no node u ∈ S_A can reach v in g_A within τ steps without being blocked by S_B; by running the CLE model, S_A cannot activate v. Hence, X_j(g, v) = 0 = γ_g^v(S_A). Case 2: S_A ∩ R_j(g, v) ≠ ∅. Then X_j(g, v) = 1 ≥ γ_g^v(S_A), since γ_g^v(S_A) ∈ {0, 1}.
Lemma 3 shows the properties of the U function.

Lemma 3. Given a seed set S_B ⊂ V, for any set of nodes S_A ⊆ V \ S_B, we have U(S_A) ≥ I(S_A), and U(·) is a monotone and submodular function.
Proof. Using Lemma 2 and Lemma 1, we have U(S_A) = n_0 · E[X_j] ≥ n_0 · E[γ_g^v(S_A)] = I(S_A). Moreover, U(·) is a form of weighted coverage function, in which every R_j is an element, the universe is the set of all URR sets, and each node u ∈ V corresponds to the subset of URR sets R_j that contain u; the probability n_0 · Pr[g|G] is the weight of the element R_j. Since the weighted coverage function is monotone and submodular, U(·) has the same properties. Lemma 3 suggests that we can use U as an upper bound submodular function of I. We further devise an algorithm, summarized in Algorithm 1, to generate a URR set. It first randomly selects a source node v ∈ V \ S_B with uniform distribution. It then attempts to select an in-neighbor u of v on g_A according to the CLE model, moves from v to u, and repeats the process. The algorithm stops after τ steps, when no edge is selected, or when the selected node belongs to S_B or already belongs to R_j.
Algorithm 1: Generate a URR set.
1. Pick a random source node v ∈ V \ S_B; R_j ← {v}
2. for step = 1 to τ do
3.   Select an A-edge (u, v) by the CLE model
4.   if no edge is selected then break
5.   if u ∈ S_B or u ∈ R_j then break
6.   Add u to R_j; v ← u
7. end
8. return R_j
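The reverse-walk generation of a URR set can be sketched as follows; the graph representation and toy inputs are illustrative assumptions (in the LT-style live-edge model each node has at most one live in-edge, so the reverse traversal is a single walk):

```python
import random

def generate_urr(V, S_B, in_nbrs, wA, tau, rng):
    """Build a URR set by a reverse random walk on A-live edges from a
    random source v in V \\ S_B, for at most tau steps, stopping when no
    edge is selected or the walk hits S_B or revisits a collected node."""
    v = rng.choice(sorted(set(V) - set(S_B)))
    Rj = [v]
    for _ in range(tau):
        # Select at most one A-live in-edge of v, as in the CLE model.
        r, acc, nxt = rng.random(), 0.0, None
        for u in in_nbrs.get(v, []):
            acc += wA.get((u, v), 0.0)
            if r < acc:
                nxt = u
                break
        if nxt is None or nxt in S_B or nxt in Rj:
            break
        Rj.append(nxt)
        v = nxt
    return set(Rj)
```

For instance, if the only candidate source is v and its sole A-live in-neighbor lies in S_B, the walk stops immediately and the URR set is just {v}.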

Lower Bound Submodular Function
We next devise a lower bound submodular function of the objective function. The idea of this method is that, in estimating the objective function, we only choose the set of nodes that make v become A-active with probability 1. We consider the set C_L(g, v) of nodes u whose simple path P_A(u, v) from u to v in g_A satisfies Equation (17): the left inequality in Equation (17) ensures that v is reachable from u on g_A, and the right inequality ensures that the influence from u to v is not blocked by the influence from S_B. Consider the example in Figure 5. The sample graph g contains nine nodes and eight edges, the source node is v, S_B = {c, h}, and τ = 5. g_A contains the edges (a, v), (b, a), (c, b), (d, b); g_B contains the edges (f, v), (e, f), (h, e), (i, e). We have C_L(g, v) = {b, a, v}. According to the CLE model, we can easily prove that γ_g^v(u) = 1 for all u ∈ C_L(g, v). Based on that, we define the Lower bound Reachable Reversal (LRR) set as follows. Definition 4 (LRR set). Given graph G = (V, E, w_A, w_B), a random LRR set R_j is generated from G by: (1) picking a random node v ∈ V; and (2) generating a sample graph g from G by the CLE model and returning R_j ← C_L(g, v).
For any S_A ⊆ V \ S_B, denote a random variable X_j = 1 if S_A ∩ R_j ≠ ∅ and X_j = 0 otherwise, where R_j is a random LRR set, and define L(S_A) = n_0 · E[X_j]. Lemma 4 shows the properties of the L function; its proof is omitted here because it is similar to that of Lemma 3. Based on Lemma 4, we use L as a lower bound submodular function of I. The generation of an LRR set proceeds as follows. We first randomly select a source node v and then select an edge (u, v) on g_A according to the CLE model. If edge (u, v) is selected, we update d_A as the distance from u to v on g_A and check the distance condition d_A(u, v) < d_B(u, S_B) by Algorithm 2. In Algorithm 2, we sequentially generate a simple path from u on g_B (called a B-path) according to the CLE model until the length of the path exceeds d_A or no B-edge is selected. If d_B ≤ d_A, Algorithm 2 returns True and the current LRR set R_j is returned. Otherwise, if node u is selected into R_j, we move from v to u and repeat the process until the distance from the currently selected node to v on g_A exceeds τ, or no A-edge is selected. This process ensures that, if a node u is added to the set R_j, the previous nodes on P_A(u, v) are not affected by S_B.
Algorithm 2: Check distance condition.
1. Check ← False; d_B ← 0
2. while d_B ≤ d_A do
3.   if there exists a B-edge (u, v) in g_B then
4.     Select a B-edge (u, v) using the Competitive live-edge model
5.     d_B ← d_B + 1; move to the selected in-neighbor
6.   else
7.     break ; // If no B-edge is selected
8.   end
9.   if the current node belongs to S_B then
10.     Check ← True; break
11.   end
12. end
13. return Check

Polling-Based Algorithm for Maximum Bound Functions
We now introduce PBA, an approximation algorithm for maximizing the lower and upper bound functions from the previous subsections when all nodes have heterogeneous costs. Our algorithm is based on the polling method, which was proposed for the IM problem [2][4][5][6]. We describe the algorithm for maximizing the lower bound function; applying it to the upper bound function is similar. The details of our algorithm are depicted in Algorithm 3.
Algorithm 3: PBA.
1. Generate a collection R_1 of Λ URR sets; t ← 1
2. repeat
3.   S_A ← Greedy(R_t, L)
4.   if CheckQS(R_t, Cov_{R_t}(S_A), δ, ε) = True or |R_t| ≥ N_max then
5.     return S_A
6.   end
7.   t ← t + 1
8. until |R_t| ≥ N_max
9. return S_A

Algorithm 4: Generate URR set.
1. Pick a random source node v; R_j ← {v}
2. for step = 1 to τ do
3.   Select an A-edge (u, v) by using the Competitive live-edge model
4.   if an edge (u, v) is selected then
5.     if u ∈ R_j or u ∈ S_B then break
6.     Add u into R_j; v ← u
7.   else break
8. end
9. return R_j

PBA first generates a collection R_1 of Λ URR sets. The main phase of PBA consists of several iterations (at most t_max). In each iteration, the algorithm first finds a candidate solution S_A by using the Greedy algorithm (Algorithm 5) to solve the Budgeted Maximum Coverage (BMC) problem [20]; this provides an approximation ratio of (1 − 1/√e). We denote Greedy(R, L) as the Greedy algorithm whose input consists of a collection of URR sets R and a budget L > 0. Then, S_A is checked for quality in Algorithm 6, which independently generates |R_t| more URR sets, adds them to R_c, and uses them to calculate the parameters ε_1 and ε_2. We next calculate the lower bound of I(S_A), f_l(S_A, R_c, ε_1), and the upper bound of the optimal solution, f_u(OPT_u, R_t, ε_2). If the current solution S_A meets the approximation guarantee condition, the algorithm returns S_A; if not, it moves to the next iteration and stops when the number of URR sets is at least N_max.

Algorithm 5: Greedy algorithm for the Budgeted Maximum Coverage problem, Greedy(R, L).
Input: a collection of URR sets R, budget L
Output: seed set S_A
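The Greedy(R, L) step can be sketched as a cost-ratio greedy over the collected URR sets, combined with the best affordable single node, which is the combination analyzed by Khuller et al. [20] for the (1 − 1/√e) bound. Strictly positive costs are assumed, and the data in the test is illustrative:

```python
def coverage(S, rr_sets):
    """Number of RR sets intersected by the node set S."""
    return sum(1 for R in rr_sets if S & R)

def greedy_bmc(rr_sets, cost, budget):
    """Budgeted Maximum Coverage sketch: repeatedly pick the affordable
    node with the largest marginal-coverage-to-cost ratio, then return
    the better of the greedy set and the best affordable single node."""
    nodes = {u for R in rr_sets for u in R}
    S, spent, covered = set(), 0.0, set()
    while True:
        best, best_ratio = None, 0.0
        for u in nodes - S:
            if spent + cost[u] > budget:
                continue  # u is no longer affordable
            gain = sum(1 for j, R in enumerate(rr_sets)
                       if j not in covered and u in R)
            if gain / cost[u] > best_ratio:
                best, best_ratio = u, gain / cost[u]
        if best is None:
            break
        S.add(best)
        spent += cost[best]
        covered |= {j for j, R in enumerate(rr_sets) if best in R}
    singles = [{u} for u in nodes if cost[u] <= budget]
    best_single = max(singles, key=lambda T: coverage(T, rr_sets),
                      default=set())
    return max(S, best_single, key=lambda T: coverage(T, rr_sets))
```

Taking the better of the ratio-greedy set and the best single node is essential: ratio greedy alone can be arbitrarily bad on budgeted instances, while the combined rule retains a constant-factor guarantee.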

Algorithm 6: Check quality of solution (CheckQS).
Input: R_t, Cov_{R_t}(S_A), δ, ε
Output: True, or (False and R_{t+1})
    Generate a URR set R_j by Algorithm 4 and add it to R_c
    ...
    return True
else
    return False and R_c
end
return False;
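The interplay of Algorithms 3 and 6 can be sketched at a high level: in each iteration, build a candidate via the BMC greedy, draw an independent collection of check samples, and accept the candidate once its certified lower bound reaches the target fraction of the certified upper bound on the optimum. The bound functions f_l and f_u, whose exact ε-corrections are defined in the paper, are passed in as callables here; all names are illustrative assumptions.

```python
import math

# High-level sketch of PBA's main loop (Algorithm 3 with CheckQS).
def pba(sample, greedy, f_lower, f_upper, eps, n_init, n_max, t_max):
    r_t = [sample() for _ in range(n_init)]        # initial collection R_1
    target = 1.0 - 1.0 / math.sqrt(math.e) - eps   # desired guarantee
    for _ in range(t_max):
        s_a = greedy(r_t)                          # candidate via BMC greedy
        r_c = [sample() for _ in range(len(r_t))]  # independent check samples
        if f_lower(s_a, r_c) >= target * f_upper(r_t) or len(r_t) >= n_max:
            return s_a
        r_t += [sample() for _ in range(len(r_t))]  # otherwise double samples
    return greedy(r_t)
```

Doubling the sample collection between iterations keeps the total sampling cost within a constant factor of the final iteration's cost.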
We observe that X_j ∈ [0, 1]. Let M_i = ∑_{j=1}^{i} (X_j − µ); the sequence M_1, M_2, . . . is a form of martingale [34]. Therefore, we obtain the same concentration results as in [6].

Lemma 5 ([6]). For any T > 0 and ε > 0, let µ be the mean of X_j and μ̂ = (∑_{i=1}^{T} X_i)/T its estimate. We have:
Pr[μ̂ ≥ (1 + ε)µ] ≤ exp(−T µ ε² / (2 + (2/3)ε)) and Pr[μ̂ ≤ (1 − ε)µ] ≤ exp(−T µ ε² / 2).
Based on Lemma 5, Tang et al. [6] proposed the IMM algorithm, built on the Reverse Influence Sampling (RIS) [24] process, for solving the IM problem. They derived a threshold on the number of random Reverse Reachable (RR) sets that ensures the RIS process returns a (1 − 1/e)-approximation with probability 1 − δ. This threshold is also used to obtain stopping conditions for IM algorithms [4,7]. However, it does not guarantee that the candidate solution S_A is a (1 − 1/√e)-approximation under heterogeneous selection costs. For this case, we provide the number of URR sets that guarantees a (1 − 1/√e)-approximation ratio in the following theorem.
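From the standard lower-tail martingale bound of this form, Pr[μ̂ ≤ (1 − ε)µ] ≤ exp(−T µ ε²/2), one can solve for the sample size T that drives the failure probability below δ. A minimal sketch, with `mu` a lower bound on the true mean (the function name is an illustrative assumption, not the paper's notation):

```python
import math

# Smallest T with exp(-T * mu * eps^2 / 2) <= delta,
# i.e. T >= 2 * ln(1/delta) / (mu * eps^2).
def samples_needed(mu, eps, delta):
    return math.ceil(2.0 * math.log(1.0 / delta) / (mu * eps * eps))
```

This illustrates why the required number of samples grows as 1/ε² and only logarithmically in 1/δ; the paper's N_max threshold has the same qualitative shape with additional terms for the union bound over candidates.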
Proof. From Theorem 4, since N_max ≥ N(ε, δ/3), the bad event for U(S) does not happen.

Proof. We denote by Û_c(S_A) an estimate of U(S_A) over R_c. In each iteration t, after the for loop ends (Line 7) in Algorithm 6, we have:
It satisfies the stopping rule theorem in [35]; therefore, it guarantees that:

Lemma 8. Assume that the bad event in Lemma 7 does not happen. Let f_u(S_A, R_t, ε_2) =

Proof. Since the bad event in Lemma 7 does not happen, we have µ(S_A)(1 + ε_1) ≥ μ̂_c(S_A). Applying Lemma 5 to the optimal set S*_U, with random variable X_j, mean µ(S*_U) = U(S*_U)/n_0, and |R_t| samples:
Now, since the Greedy algorithm returns a (1 − 1/√e)-approximation solution, we have:
where S*_t is an optimal solution of the instance (R_t, L) of the maximum coverage problem. Therefore:

Theorem 5. Given 0 < ε, δ < 1, the PBA algorithm returns a seed set S satisfying:

Proof. Assume that none of the bad events in Lemmas 6-8 happens in any iteration t = 1, 2, . . . , t_max. Applying a union bound over the probabilities of the bad events, this assumption holds with probability at least:
Under this assumption, we show that the PBA algorithm returns a solution satisfying:
If the algorithm stops with the condition |R_t| ≥ N_max, the solution S satisfies Equation (27) by Lemma 6. Otherwise, the PBA algorithm stops at some iteration t, t = 1, 2, . . . , t_max, in which CheckQS on Line 5 returns "True". Since the bad events do not happen and the condition on Line 12 of Algorithm 6 is true, we have:
This completes the proof.

Improved Guarantees with Tightened Bound
Lemma 8 provides an upper bound on OPT_u using the inequality Cov_{R_t}(S*_t) ≤ Cov_{R_t}(S_A)/(1 − 1/√e). This bound is tight in the worst case [20], but loose for specific instances of the budgeted maximum coverage problem. We propose another upper bound on Cov_{R_t}(S*_t) that is much tighter in practice, as explained in the following. In the Greedy algorithm, we denote by S_i the partial solution after iteration i. The following lemma provides a tighter bound on Cov_{R_t}(S*_t).

Lemma 9. Assume that the Greedy algorithm performs k iterations and that u_i is the node added at the i-th iteration. Letting g(R_t, S_k) denote the resulting bound, we have Cov_{R_t}(S*_t) ≤ g(R_t, S_k).

Proof. From Lemma 1 in [20], we have:
Rearranging Equation (30) yields Cov_{R_t}(S*_t) ≤ g(R_t, S_k). On the other hand:
It follows that:

By Lemma 9, we obtain the following new upper bound OPT_u:

Sandwich Approximation
We apply the Sandwich Approximation framework of [16] to design our algorithm, namely SPBA. Let S_L and S_U be the solutions selected by the PBA algorithm for maximizing the lower bound function L and the upper bound function U, respectively, within total cost at most L, and let S be a solution to the original problem. We denote by Î(S_A) a (δ', ε')-approximation of I(S_A), i.e.,
The sandwich approximation algorithm operates as follows. First, we find a solution S to the original problem with any strategy. Second, we find approximate solutions to the lower bound and the upper bound by the PBA algorithm. Last, we return S_sa = arg max_{S' ∈ {S_L, S, S_U}} Î(S') as the solution of the SPBA algorithm. The details of our algorithm are shown in Algorithm 7 (Input: graph G = (V, E, w_A, w_B), budget L > 0, and ε, δ, ε', δ' ∈ (0, 1); Output: seed set S_A). The following theorem shows the approximation ratio of our algorithm.

Theorem 6. Let S* be the optimal solution, and S_sa the solution returned by Algorithm 7. Then, with probability at least 1 − 2δ − δ', we have:
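The sandwich step itself is simple and can be sketched directly: run the maximizer on both bound functions, obtain any feasible solution for the original objective, and keep whichever of the three scores best under the influence estimator. The callables `pba_solver` and `estimate_influence` stand in for the components described above; the names are illustrative assumptions.

```python
# Sandwich approximation step (sketch of Algorithm 7's final selection).
def sandwich(pba_solver, estimate_influence, lower, upper, original_solver):
    s_l = pba_solver(lower)     # maximize the lower bound function
    s_u = pba_solver(upper)     # maximize the upper bound function
    s = original_solver()       # any solution to the original problem
    # Return the candidate with the largest estimated influence.
    return max((s_l, s, s_u), key=estimate_influence)
```

The framework's guarantee comes from comparing the returned candidate against both bound optima, which is why a data-dependent ratio survives even though the original objective is neither submodular nor supermodular.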

Experiments
We experimentally evaluated our algorithm and compared it with other algorithms, namely baseline algorithms and influence maximization methods, on two aspects: solution quality and scalability, across various network datasets.

Datasets
We performed our experiments on six real-world datasets: Gnutella, Enron, Epinions, Email-Eu, DBLP and Wiki. The basic statistics of these networks are summarized in Table 2.

Random: This algorithm randomly selects nodes within budget L.

Parameters
In all the experiments, we kept ε = 0.1 and δ = 1/n as general settings. We set ε' = δ' = 0.01 and used the stopping-condition algorithm in [35] to estimate Î. We assigned the weights of edges in the TCLT model according to the LT model used in previous studies [4-8,13]. Our implementation was written in C++ and compiled with GCC 4.7. All experiments were carried out on a Linux machine with two Intel Xeon E5-2697 v4 CPUs @ 2.30 GHz and eight 16 GB DIMM ECC DDR4 modules @ 2400 MHz.
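The exact weight formula is not reproduced above; a common LT-model convention in the cited studies is to assign edge (u, v) the weight 1/d_in(v), the reciprocal of v's in-degree, so that the incoming weights of every node sum to 1. The sketch below assumes that convention.

```python
# LT-style edge weights (sketch, assuming the 1/in-degree convention).
def lt_edge_weights(edges):
    in_degree = {}
    for u, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    # Each edge (u, v) gets weight 1 / d_in(v).
    return {(u, v): 1.0 / in_degree[v] for (u, v) in edges}
```

With this choice the total incoming weight of each node is exactly 1, matching the LT model's threshold normalization.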

Comparison of Algorithms under General Case
In this experiment, we compared the algorithms with τ = 5, the budget L varied from 0 to 100, and node costs uniformly distributed in [1.0, 3.0]. Figure 6 shows the performance of all algorithms on the Gnutella, Enron, Epinions, Email-Eu, DBLP and Wiki networks. On average, our algorithm SPBA always had the best performance; it was 10-30% better than BCT. This is because BCT only considers the number of influenced nodes and ignores the competitive influence under the time constraint τ. It also confirms that IM algorithms do not perform well on the BCIM problem. The Random algorithm had the worst performance in all cases. The Degree algorithm performed well on the Enron dataset but poorly on the other datasets; SPBA was up to 7.7 times better than Degree. The reason is that Degree uses only the topological properties of the social network and does not consider the competitive diffusion process. In contrast, our algorithm exploits the upper and lower bounds of the objective function to obtain an approximation ratio. This explains why our algorithm performed well while the others performed poorly in many cases.

Comparison of Algorithms under Unit-Cost Setting
To show the performance of these algorithms more clearly, we conducted experiments in the unit-cost case (i.e., all node costs equal to 1) on the Gnutella, Enron, Epinions and Email-Eu datasets. We set τ = 5 and varied L from 1 to 100. Figure 7 displays the results of all algorithms. Once again, our algorithm SPBA gave the best performance; it was 1.06-1.76 times better than BCT.

Figure 8 shows the running time of the algorithms on the six datasets. SPBA had the longest running time on all datasets, because its running time comprises the total running time of PBA(L, G, L, ε, δ), PBA(U, G, L, ε, δ) and the calculation of Î(·). Random and Degree are simple heuristics with low cost, which results in their shortest running times. Although BCT is also based on the polling method, it ran faster than our algorithm, for two reasons. First, the sampling processes differ: the sampling complexity of BCT depends mainly on the number of randomly selected nodes, while the sampling process in our algorithm is more complicated because it must check the influence paths from S_B. Second, to obtain the approximation guarantee, our algorithm must solve three problems with the polling-based method. It is worth noting that SPBA is scalable to million-scale networks: on the Wiki network, which has 1.79 million nodes and 28.5 million edges, our algorithm finished in 90 s.

Impact of τ
Considering the importance of early competitive influence in viral marketing, we examined the role of the time constraint τ. We compared our solution with the three other algorithms while varying τ from 3 to 5. Figure 9 shows the results when L = 50. SPBA clearly remained the best performer; specifically, it was 1.01-1.23 times better than BCT and up to 2.5 times better than Degree.

Conclusions
In this paper, we investigated the BCIM problem, which finds the seed set of a player to maximize its influence while its competitors conduct similar strategies. We first proposed the TCLT model to capture the competitive influence of two competitors on a social network and formulated BCIM under this model. We then provided hardness results and properties of the objective function, and proposed SPBA, a randomized approximation algorithm, for solving BCIM. Experiments conducted on real-world social networks show that our proposed algorithm outperforms the other heuristics.
Case 1: The total A-influence weight exceeds threshold θ_A(v) while the total B-influence weight is smaller than θ_B(v) at step t + 1. The probability that this case happens is:

Case 2: The total A-influence weight exceeds threshold θ_A(v) while the total B-influence weight also exceeds threshold θ_B(v). The probability that this case happens is:

In this case, the TB-WPP rule is used to determine the state of v. According to this rule, the probability that node v is A-activated at step t + 1 is equal to:

In the CLE model, let A_t and B_t be the sets of A-active and B-active nodes at step t, respectively. For v ∉ B_t ∪ A_t, the probability that v has an in-edge from A_t and does not have an in-edge from B_t (denoted P_{1,t}(v)) is equal to:

We denote by P_{2,t}(v) the probability that v has both an in-edge from A_t and an in-edge from B_t. We have:

According to the CLE model, the probability that v is A-activated in this case is P_{2,t}(v) · p_A(v | A_{t−1}, B_{t−1}). Thus, the probability that v is A-activated at step t + 1 is P_{1,t}(v) + P_{2,t}(v) · p_A(v | A_{t−1}, B_{t−1}).

Since both models start from the same seed set A_0 = A, by step-by-step induction we conclude that the random CLE model produces the same distribution over A-active sets as the TCLT model at any hop t = 0, 1, . . . , τ. Similarly, the two models produce the same distribution over B-active sets.