Multiple Benefit Thresholds Problem in Online Social Networks: An Algorithmic Approach
Abstract
1. Introduction
- The Multiple Benefit Thresholds (MBT) problem is formulated for the first time under the Independent Cascade (IC) information diffusion model.
- To solve the problem, we propose Efficient Sampling for Multiple Seed Set Selection (ESSM), an approximation algorithm with theoretical bounds. It is built on a novel algorithmic framework that uses sampling to estimate the benefit function and reuses the seed set and the samples obtained for a smaller benefit threshold to find the seed sets for the larger ones. As a result, the algorithm finds multiple seed sets in a single run. As a solution guarantee, the algorithm returns multiple seed sets that satisfy the benefit and total-cost bounds with high probability (w.h.p.), where the bounds are stated in terms of an input parameter and the best seed set for each threshold.
- Extensive experiments are performed on six real-world networks (Gnutella, Email-Enron, Net-Hept, Net-Phy, Amazon, and DBLP) to compare the efficiency of our algorithm with that of state-of-the-art algorithms. The results indicate that our algorithm outperforms them in both cost and running time.
2. Related Works
3. Methodology
3.1. Independent Cascade Model
- At the beginning (step t = 0), all nodes in the seed set are active.
- At each subsequent step (step t ≥ 1), a node u that was activated in the previous step has a single chance to influence each of its inactive neighbors v, succeeding with the probability assigned to the edge (u, v).
- All active nodes retain their status until the end of the diffusion process, and the process ends at step t if no new node is activated in that step.
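The three rules above can be sketched as a short simulation. This is a minimal illustration, not code from the paper: the adjacency layout (a dict mapping each node to a list of `(neighbor, probability)` pairs) and the function name `simulate_ic` are assumptions made for the example.

```python
import random


def simulate_ic(graph, seeds, rng=None):
    """Run one Independent Cascade diffusion and return the set of active nodes.

    graph: dict mapping node -> list of (neighbor, probability) pairs
           (an assumed layout, chosen for this sketch).
    seeds: iterable of nodes that are active at step t = 0.
    """
    rng = rng or random.Random()
    active = set(seeds)        # step t = 0: every seed is active
    frontier = list(active)    # nodes activated in the previous step
    while frontier:            # the process ends when a step activates nobody
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # u gets exactly one chance to activate each inactive neighbor v
                if v not in active and rng.random() < p:
                    active.add(v)          # active nodes never revert
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```

Averaging the size (or total benefit) of the returned set over many runs gives a Monte Carlo estimate of the spread of a seed set, which is the quantity that sampling-based algorithms try to approximate more cheaply.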
3.2. Problem Definition
3.3. Our Proposed Algorithm
3.3.1. Benefit Sampling
Algorithm 1: Generating a benefit sample (BS) under the IC model.
Input: Graph under IC model
Output: A BS set
1: Choose a source node u with probability
2: Initialize a queue and
3: while Q is not empty do
4:
5: for do
6: With probability do: , ;
7: end for
8: end while
9: return
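The queue-based structure of Algorithm 1 can be sketched as follows. Several details are assumptions made for illustration, because the corresponding symbols are not available here: the source node is drawn with probability proportional to its benefit, the traversal follows incoming edges, and each incoming edge is kept with its own probability. The names `generate_benefit_sample`, `in_neighbors`, and `benefit` are hypothetical.

```python
import random
from collections import deque


def generate_benefit_sample(in_neighbors, benefit, rng=None):
    """Return one benefit sample as a set of nodes.

    in_neighbors: dict mapping node v -> list of (u, probability) pairs,
                  one per incoming edge (u, v) (assumed layout).
    benefit: dict mapping node -> non-negative benefit, not all zero.
    """
    rng = rng or random.Random()
    nodes = list(benefit)
    # Assumption: the source is chosen proportionally to its benefit.
    source = rng.choices(nodes, weights=[benefit[v] for v in nodes], k=1)[0]
    queue = deque([source])
    sample = {source}
    while queue:
        v = queue.popleft()
        for u, p in in_neighbors.get(v, []):
            # Keep the edge (u, v) with probability p; visit u at most once.
            if u not in sample and rng.random() < p:
                sample.add(u)
                queue.append(u)
    return sample
```

Under these assumptions, the fraction of samples that a seed set intersects, scaled by the total benefit of all nodes, serves as an estimate of the benefit of that seed set.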
3.3.2. ESSM Algorithm
Algorithm 2: ESSM algorithm.
Input: A graph , ,
Output:
1: Generate containing BSs by using Algorithm 1
2:
3: for to k do
4:
5:
6: Calculate by Equation (6)
7: while do
8:
9:
10:
11:
12: if then
13: Generate more BSs and add them into
14:
15:
16: end if
17: end while
18: end for
19: return
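The central idea of ESSM, reusing the seed set found for a smaller threshold when searching for the next larger one, can be illustrated with a simplified sketch. This is not the authors' exact procedure: the greedy rule (most newly covered samples per unit cost), the estimator (total benefit times the fraction of covered samples), and the fixed sample pool (no extra sampling step) are all assumptions, and the names `seeds_for_thresholds`, `cost`, and `total_benefit` are hypothetical.

```python
from collections import Counter


def seeds_for_thresholds(samples, cost, total_benefit, thresholds):
    """Grow one seed set across increasing thresholds.

    samples: list of sets of nodes (benefit samples).
    cost: dict mapping node -> positive cost.
    total_benefit: total benefit of all nodes, used to scale the estimate.
    thresholds: iterable of benefit thresholds.
    Returns a dict mapping each threshold to a copy of the seed set reached.
    """
    seeds = set()
    uncovered = list(samples)   # samples not yet hit by any seed
    result = {}
    for threshold in sorted(thresholds):
        while True:
            covered = len(samples) - len(uncovered)
            if total_benefit * covered / len(samples) >= threshold:
                break
            # Number of still-uncovered samples each node would cover.
            gain = Counter(v for s in uncovered for v in s)
            if not gain:
                break  # all samples covered; the threshold is out of reach
            best = max(gain, key=lambda v: gain[v] / cost[v])
            seeds.add(best)
            uncovered = [s for s in uncovered if best not in s]
        # The seed set for this threshold is the starting point for the next.
        result[threshold] = set(seeds)
    return result
```

Because the seed set only grows, the work done for a smaller threshold is never repeated, which is what allows all seed sets to be produced in one pass over the thresholds.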
4. Experiments and Discussion
4.1. Experiment Settings
4.1.1. Datasets
- Gnutella [41] is a snapshot of the Gnutella peer-to-peer file-sharing network from August 2002. Its 6301 nodes represent hosts, and its 20,777 edges represent connections between hosts in the Gnutella network topology.
- Email-Enron [42] covers all email communication within a dataset of around half a million emails. The data were originally made public and posted on the web by the Federal Energy Regulatory Commission during its investigation. Nodes are email addresses, and the graph contains an undirected edge between i and j if address i sent at least one email to address j. Non-Enron email addresses act as sinks and sources in the network because only their communication with Enron email addresses is observed. The Enron email data were originally released by William Cohen at CMU.
- Amazon [44] was collected on 2 March 2003 by crawling the Amazon website. It is based on the "Customers Who Bought This Item Also Bought" feature of the Amazon website. If a product i is frequently co-purchased with product j, the graph contains a directed edge from i to j.
- DBLP computer science bibliography [45] provides a comprehensive list of research papers in computer science. In the co-authorship network built from it, two authors are connected if they have published at least one paper together.
4.1.2. Algorithms Compared
- BCT is an algorithm for the CTVM problem [4]. It is used for comparison because the MBT and CTVM problems are similar: both consider the costs and benefits of the nodes. However, because the two problems differ, BCT is adapted as follows: for each threshold, we run a binary search on the cost over a given range until the benefit reached falls within the target interval for that threshold, and we return the seed set with the minimum cost.
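The binary-search adaptation described above can be sketched as follows. The callable `solve_with_budget` is a hypothetical stand-in for a budgeted solver such as BCT; it is assumed to return a seed set and its (estimated) benefit, and the benefit is assumed not to decrease as the budget grows. The stopping rule here is a simplification: the search narrows the budget range to a tolerance instead of checking a target benefit interval.

```python
def min_cost_seed_set(solve_with_budget, threshold, low, high, tol=1e-3):
    """Binary-search the smallest budget whose solution reaches the threshold.

    solve_with_budget: hypothetical callable, budget -> (seed_set, benefit).
    low, high: search range for the budget.
    Returns (seed_set, budget) for the cheapest feasible budget tried,
    or None if no tried budget reached the threshold.
    """
    best = None
    while high - low > tol:
        mid = (low + high) / 2
        seeds, benefit = solve_with_budget(mid)
        if benefit >= threshold:
            best = (seeds, mid)   # feasible: try to spend less
            high = mid
        else:
            low = mid             # infeasible: a larger budget is needed
    return best
```

Each threshold requires its own search, and each search calls the solver a number of times that is logarithmic in the width of the budget range, so the cost of this baseline grows with the number of thresholds.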
4.1.3. Parameter Settings
4.2. Experimental Results
4.2.1. Comparison of the Cost
4.2.2. Comparison of Running Time
4.2.3. Comparison of Memory Usage
4.3. Discussions
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kempe, D.; Kleinberg, J.M.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
- Tang, Y.; Xiao, X.; Shi, Y. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 75–86.
- Tang, Y.; Shi, Y.; Xiao, X. Influence Maximization in Near-Linear Time: A Martingale Approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 31 May–4 June 2015; pp. 1539–1554.
- Nguyen, H.T.; Thai, M.T.; Dinh, T.N. A Billion-Scale Approximation Algorithm for Maximizing Benefit in Viral Marketing. IEEE ACM Trans. Netw. 2017, 25, 2419–2429.
- Chen, W.; Lakshmanan, L.V.S.; Castillo, C. Information and Influence Propagation in Social Networks; Synthesis Lectures on Data Management; Morgan & Claypool Publishers: San Rafael, CA, USA, 2013.
- Chen, W.; Wang, C.; Wang, Y. Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 1029–1038.
- Chen, W.; Collins, A.; Cummings, R.; Ke, T.; Liu, Z.; Rincón, D.; Sun, X.; Wang, Y.; Wei, W.; Yuan, Y. Influence Maximization in Social Networks When Negative Opinions May Emerge and Propagate. In Proceedings of the Eleventh SIAM International Conference on Data Mining, Mesa, AZ, USA, 28–30 April 2011; pp. 379–390.
- Kuhnle, A.; Pan, T.; Alim, M.A.; Thai, M.T. Scalable Bicriteria Algorithms for the Threshold Activation Problem in Online Social Networks. In Proceedings of the IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017.
- Pham, C.V.; Duong, H.V.; Bui, B.Q.; Thai, M.T. Budgeted Competitive Influence Maximization on Online Social Networks. In Lecture Notes in Computer Science, Proceedings of the Computational Data and Social Networks—7th International Conference, CSoNet 2018, Shanghai, China, 18–20 December 2018; Chen, X., Sen, A., Li, W.W., Thai, M.T., Eds.; Springer: Cham, Switzerland, 2018; Volume 11280, pp. 13–24.
- Pham, C.V.; Thai, M.T.; Ha, D.K.; Ngo, D.Q.; Hoang, H.X. Time-Critical Viral Marketing Strategy with the Competition on Online Social Networks. In Lecture Notes in Computer Science, Proceedings of the Computational Social Networks—5th International Conference, CSoNet 2016, Ho Chi Minh City, Vietnam, 2–4 August 2016; Nguyen, H.T., Snásel, V., Eds.; Springer: Cham, Switzerland, 2016; Volume 9795, pp. 111–122.
- Pham, C.V.; Dinh, H.M.; Nguyen, H.D.; Xuan, H.H.; Dang, H.T. Limiting the Spread of Epidemics within Time Constraint on Online Social Networks. In Proceedings of the Eighth International Symposium on Information and Communication Technology, Nha Trang City, Vietnam, 7–8 December 2017; pp. 262–269.
- Pham, C.V.; Phu, Q.V.; Hoang, H.X.; Pei, J.; Thai, M.T. Minimum budget for misinformation blocking in online social networks. Comb. Optim. 2019, 38, 1101–1127.
- Budak, C.; Agrawal, D.; El Abbadi, A. Limiting the spread of misinformation in social networks. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, 28 March–1 April 2011; pp. 665–674.
- Zhang, H.; Alim, M.A.; Li, X.; Thai, M.T.; Nguyen, H.T. Misinformation in Online Social Networks: Detect Them All with a Limited Budget. ACM Trans. Inf. Syst. 2016, 34, 1–24.
- Pham, C.V.; Pham, D.V.; Bui, B.Q.; Nguyen, A.V. Minimum budget for misinformation detection in online social networks with provable guarantees. Optim. Lett. 2022, 16, 515–544.
- Goyal, A.; Lu, W.; Lakshmanan, L.V. Simpath: An Efficient Algorithm for Influence Maximization under the Linear Threshold Model. In Proceedings of the 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, 11–14 December 2011; pp. 211–220.
- Crawford, V.G.; Kuhnle, A.; Thai, M.T. Submodular Cost Submodular Cover with an Approximate Oracle. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; PMLR: Mountain View, CA, USA, 2019; Volume 97, pp. 1426–1435.
- Pham, C.V.; Duong, H.V.; Thai, M.T. Importance Sample-Based Approximation Algorithm for Cost-Aware Targeted Viral Marketing. In Proceedings of the Computational Data and Social Networks—8th International Conference, Ho Chi Minh City, Vietnam, 18–20 November 2019; pp. 120–132.
- Pham, P.N.H.; Nguyen, B.T.; Pham, C.V.; Nghia, N.D.; Snásel, V. Efficient Algorithm for Multiple Benefit Thresholds Problem in Online Social Networks. In Proceedings of the 15th IEEE-RIVF International Conference on Computing and Communication Technologies, Hanoi, Vietnam, 19–21 August 2021; pp. 1–6.
- Borgs, C.; Brautbar, M.; Chayes, J.T.; Lucier, B. Maximizing Social Influence in Nearly Optimal Time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, OR, USA, 5–7 January 2014; pp. 946–957.
- Nguyen, H.T.; Thai, M.T.; Dinh, T.N. Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–1 July 2016; pp. 695–710.
- Chen, W.; Yuan, Y.; Zhang, L. Scalable Influence Maximization in Social Networks under the Linear Threshold Model. In Proceedings of the ICDM 2010, the 10th IEEE International Conference on Data Mining, Sydney, Australia, 14–17 December 2010; pp. 88–97.
- Bozorgi, A.; Samet, S.; Kwisthout, J.; Wareham, T. Community-based influence maximization in social networks under a competitive linear threshold model. Knowl.-Based Syst. 2017, 134, 149–158.
- Borodin, A.; Filmus, Y.; Oren, J. Threshold Models for Competitive Influence in Social Networks. In Proceedings of the Internet and Network Economics—6th International Workshop, WINE 2010, Stanford, CA, USA, 13–17 December 2010; pp. 539–550.
- Tang, J.; Tang, X.; Xiao, X.; Yuan, J. Online Processing Algorithms for Influence Maximization. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, 10–15 June 2018; Das, G., Jermaine, C.M., Bernstein, P.A., Eds.; pp. 991–1005.
- Akram, M.; Zafar, F. Hybrid Soft Computing Models Applied to Graph Theory. In Studies in Fuzziness and Soft Computing; Springer: Cham, Switzerland, 2020; Volume 380.
- Akram, M.; Luqman, A. Fuzzy Hypergraphs and Related Extensions. In Studies in Fuzziness and Soft Computing; Springer: Singapore, 2020; Volume 390.
- Li, Y.; Zhang, D.; Tan, K. Targeted Influence Maximization for Online Advertisements. PVLDB 2015, 8, 1070–1081.
- Barbieri, N.; Bonchi, F.; Manco, G. Topic-aware social influence propagation models. Knowl. Inf. Syst. 2013, 37, 555–584.
- Chen, S.; Fan, J.; Li, G.; Feng, J.; Tan, K.; Tang, J. Online Topic-Aware Influence Maximization. PVLDB 2015, 8, 666–677.
- Li, G.; Chen, S.; Feng, J.; Tan, K.L.; Li, W.-S. Efficient Location-Aware Influence Maximization. In Proceedings of the 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, 16–19 April 2018; pp. 1569–1572.
- Wang, X.; Zhang, Y.; Zhang, W.; Lin, X. Efficient Distance-Aware Influence Maximization in Geo-Social Networks. IEEE Trans. Knowl. Data Eng. 2017, 29, 599–612.
- Bharathi, S.; Kempe, D.; Salek, M. Competitive Influence Maximization in Social Networks. In Proceedings of the Internet and Network Economics, Third International Workshop, WINE 2007, San Diego, CA, USA, 12–14 December 2007; pp. 306–311.
- Chen, W.; Lu, W.; Zhang, N. Time-Critical Influence Maximization in Social Networks with Time-Delayed Diffusion Process. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; pp. 592–598.
- Nguyen, H.; Zheng, R. On Budgeted Influence Maximization in Social Networks. IEEE J. Sel. Areas Commun. 2013, 31, 1084–1094.
- Goyal, A.; Bonchi, F.; Lakshmanan, L.V.S.; Venkatasubramanian, S. On minimizing budget and time in influence propagation over social networks. Soc. Netw. Anal. Min. 2013, 3, 179–192.
- Cohen, E.; Delling, D.; Pajor, T.; Werneck, R.F. Sketch-Based Influence Maximization and Computation: Scaling Up with Guarantees. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 3–7 November 2014; pp. 629–638.
- Goyal, A.; Lu, W.; Lakshmanan, L.V. CELF++: Optimizing the Greedy Algorithm for Influence Maximization in Social Networks. In Proceedings of the 20th International Conference Companion on World Wide Web, New York, NY, USA, 28 March 2011; pp. 47–48.
- Chung, F.R.K.; Lu, L. Survey: Concentration Inequalities and Martingale Inequalities: A Survey. Internet Math. 2006, 3, 79–127.
- Sachdeva, S.; Vishnoi, N.K. Approximation Theory and the Design of Fast Algorithms. arXiv 2013, arXiv:1309.4882.
- Leskovec, J.; Kleinberg, J.M.; Faloutsos, C. Graph evolution: Densification and shrinking diameters. TKDD 2007, 1, 2.
- Leskovec, J.; Lang, K.J.; Dasgupta, A.; Mahoney, M.W. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Math. 2009, 6, 29–123.
- Chen, W.; Wang, Y.; Yang, S. Efficient influence maximization in social networks. In Proceedings of the KDD ’09 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 199–208.
- Leskovec, J.; Adamic, L.A.; Huberman, B.A. From Competition to Complementarity: Comparative Influence Diffusion and Maximization. arXiv 2015, arXiv:1507.00317.
- Yang, J.; Leskovec, J. Defining and Evaluating Network Communities based on Ground-truth. Knowl. Inf. Syst. 2015, 42, 181–213.
Notation | Description |
---|---|
The number of nodes and of edges in G, respectively | |
The sets of incoming and outgoing neighbors of v | |
The solution returned by our algorithm for threshold | |
The benefit function and an estimate of the benefit function | |
The optimal seed set for threshold | |
Dataset | #Nodes | #Edges | Avg. Degree | Source |
---|---|---|---|---|
Gnutella | 6301 | 20,777 | 3.3 | [41] |
Email-Enron | 36,692 | 183,831 | 5.0 | [42] |
Net-Hept | 15,233 | 58,891 | 5.5 | [43] |
Net-Phy | 37,154 | 231,584 | 13.4 | [5] |
Amazon | 262,111 | 1,234,877 | 9.4 | [44] |
DBLP | 317,080 | 1,049,866 | 6.6 | [45] |
Dataset | Threshold | BCT | IT | DEGREE | ESSM |
---|---|---|---|---|---|
Gnutella | 300 | 0.758 | 0.77 | 0.855 | 1.02 |
Gnutella | 540 | 0.805 | 0.75 | 0.852 | 1.02 |
Gnutella | 780 | 0.805 | 0.758 | 0.719 | 1.02 |
Gnutella | 1020 | 0.758 | 0.789 | 0.75 | 1.02 |
Gnutella | 1260 | 0.758 | 0.789 | 0.723 | 1.02 |
Gnutella | 1500 | 0.809 | 0.816 | 0.723 | 1.02 |
Gnutella | 1740 | 0.824 | 0.855 | 0.785 | 1.02 |
Gnutella | 1980 | 0.824 | 0.813 | 0.746 | 1.02 |
Email-Enron | 3300 | 4859.98 | 1.051 | 0.746 | 0.813 |
Email-Enron | 3350 | 4874.96 | 1.051 | 0.855 | 0.809 |
Email-Enron | 3400 | 4841.89 | 1.051 | 0.77 | 0.715 |
Email-Enron | 3450 | 4863.27 | 1.051 | 0.75 | 0.855 |
Email-Enron | 3500 | 4839.59 | 2.328 | 0.809 | 0.75 |
Email-Enron | 3550 | 4856.99 | 2.582 | 0.711 | 0.816 |
Email-Enron | 3600 | 4858.67 | 2.582 | 0.746 | 0.7 |
Email-Enron | 3650 | 4835.6 | 2.582 | 0.855 | 0.715 |
Net-Hept | 2800 | 0.723 | 0.711 | 0.711 | 0.77 |
Net-Hept | 2850 | 0.723 | 0.742 | 0.855 | 0.77 |
Net-Hept | 2900 | 0.723 | 0.75 | 0.754 | 0.77 |
Net-Hept | 2950 | 0.77 | 0.75 | 0.855 | 0.77 |
Net-Hept | 3000 | 0.805 | 0.77 | 0.754 | 0.77 |
Net-Hept | 3050 | 0.746 | 0.809 | 0.809 | 0.77 |
Net-Hept | 3100 | 0.75 | 0.715 | 0.75 | 0.77 |
Net-Hept | 3150 | 0.75 | 0.754 | 0.809 | 0.77 |
Net-Phy | 1700 | 2800.25 | 0.805 | 20.66 | 1.117 |
Net-Phy | 1900 | 1444.21 | 0.723 | 20.66 | 1.117 |
Net-Phy | 2100 | 1446.43 | 0.82 | 20.66 | 1.117 |
Net-Phy | 2300 | 1442.56 | 0.82 | 20.66 | 1.117 |
Net-Phy | 2500 | 1434.55 | 0.867 | 20.66 | 1.117 |
Net-Phy | 2700 | 1429.17 | 0.75 | 20.66 | 1.117 |
Net-Phy | 2900 | 1426.53 | 0.758 | 20.66 | 1.117 |
Net-Phy | 3100 | 1437.99 | 0.793 | 20.66 | 1.117 |
Amazon | 600 | 0.195 | N/A | 0.723 | 12.453 |
Amazon | 1800 | 0.742 | N/A | 0.789 | 12.453 |
Amazon | 3000 | 0.742 | N/A | 0.809 | 12.512 |
Amazon | 4200 | 0.746 | N/A | 0.719 | 12.512 |
Amazon | 5400 | 0.715 | N/A | 0.813 | 12.512 |
Amazon | 6600 | 0.715 | N/A | 0.746 | 12.512 |
Amazon | 7800 | 0.805 | N/A | 0.758 | 12.512 |
Amazon | 9000 | 0.715 | N/A | 0.742 | 12.512 |
DBLP | 1280 | 26,316.8 | N/A | 0.711 | 19.121 |
DBLP | 1400 | 41,369.6 | N/A | 0.715 | 19.227 |
DBLP | 1520 | 26,009.6 | N/A | 0.813 | 19.227 |
DBLP | 1640 | 24,883.2 | N/A | 0.816 | 19.227 |
DBLP | 1760 | 24,883.2 | N/A | 0.719 | 19.227 |
DBLP | 1880 | 24,883.2 | N/A | 0.711 | 19.227 |
DBLP | 2000 | 24,883.2 | N/A | 0.711 | 19.227 |
DBLP | 2120 | 24,883.2 | N/A | 0.754 | 19.227 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pham, P.N.H.; Nguyen, B.-N.T.; Co, Q.T.N.; Snášel, V. Multiple Benefit Thresholds Problem in Online Social Networks: An Algorithmic Approach. Mathematics 2022, 10, 876. https://doi.org/10.3390/math10060876