A Distributed Bagging Ensemble Methodology for Community Prediction in Social Networks
Abstract
1. Introduction
2. Background
2.1. Classic Community Detection Methods and Algorithms
- Projecting the original community detection definition to conceptually different contexts, thereby transforming the graph representation into alternative forms (transformation methods),
- Enhancing the community extraction criteria with the inherent vertex information (node attributes or profile information), and
- Applying general probabilistic heuristic criteria.
2.2. Community Detection and Prediction Concept Combination
2.3. Ensemble Learning Methods
3. Proposed Methodology
- The number of common neighbors: this is the number of nodes shared by the kth-depth node sets of the edge's two incident vertices. Since a common node might lie at a different depth from each incident vertex, each kth-depth set should include all nodes up to the kth depth.
- The loose similarity: this is the fraction of the relevant kth-depth sets’ common nodes, over the union’s total number of nodes.
- The dissimilarity: this is the fraction of the relevant kth-depth sets’ uncommon nodes, over the union’s total number of nodes.
- The edge balance: this is the absolute difference between the sizes of the relevant kth-depth distinct sets, divided by the union’s total number of nodes. This feature’s value ranges from 0 to 1, with values close to 0 indicating a balanced edge that has nearly equal numbers of kth-depth vertices on both sides.
- The edge information: considering an edge on the graph’s kth-power adjacency matrix, this metric quantifies the number of distinct kth-length paths between the incident vertices [23] and indicates the interconnection strength of that pair of nodes.
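The five features above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`k_depth_set`, `walk_count`, `edge_features`) are hypothetical, a kth-depth set is assumed to exclude the source vertex itself, the "distinct sets" of the edge balance are read as the two set differences, and the edge information is computed as the (u, v) entry of the kth power of the adjacency matrix, which counts k-length walks.

```python
from collections import deque


def k_depth_set(adj, source, k):
    """Nodes within k hops of `source`, excluding `source` itself (assumption)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the kth depth
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(source)
    return seen


def walk_count(adj, u, v, k):
    """Number of length-k walks from u to v, i.e. entry (u, v) of A^k."""
    counts = {u: 1}
    for _ in range(k):
        next_counts = {}
        for node, c in counts.items():
            for neighbor in adj[node]:
                next_counts[neighbor] = next_counts.get(neighbor, 0) + c
        counts = next_counts
    return counts.get(v, 0)


def edge_features(adj, u, v, k):
    """Topology features of edge (u, v) at depth k >= 1."""
    a, b = k_depth_set(adj, u, k), k_depth_set(adj, v, k)
    common, union = a & b, a | b
    total = len(union)

    def frac(n):
        return n / total if total else 0.0

    return {
        "common_neighbors": len(common),
        "loose_similarity": frac(len(common)),
        "dissimilarity": frac(len(a ^ b)),  # nodes in exactly one set
        "edge_balance": frac(abs(len(a - b) - len(b - a))),
        "edge_information": walk_count(adj, u, v, k),
    }


# Toy undirected graph as an adjacency list (hypothetical data).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
features_k1 = edge_features(adj, 0, 1, k=1)
features_k2 = edge_features(adj, 0, 1, k=2)
```

Under these assumptions the loose similarity and the dissimilarity always sum to 1, so a classifier would likely need only one of the two.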
4. Experiments and Results Discussion
- the community prediction methodology requiring the extraction of a representative subgraph [4], henceforth referred to as the representative subgraph’s community prediction methodology for brevity,
- Accuracy: The fraction of the correctly classified edges, either inter-connected or intra-connected, over all the predictions made.
- Specificity: The fraction of the correctly predicted intra-connection edges over all the intra-connection predictions made.
- Sensitivity: The fraction of the correctly predicted inter-connection edges over all the inter-connection predictions made.
- Precision: The fraction of the correctly predicted inter-connection edges over the truly inter-connection edges.
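The four metrics can be computed directly from the true and predicted edge labels. The sketch below follows the wording of the list literally, which is an interpretation rather than the authors' confirmed computation: specificity and sensitivity are normalized by the number of predictions made for the respective class, and precision by the number of truly inter-connection edges. This differs from the more common confusion-matrix convention. The function name and the label strings `"inter"` and `"intra"` are hypothetical.

```python
def edge_metrics(y_true, y_pred, inter="inter", intra="intra"):
    """Evaluation metrics for edge classification, as worded in the list above."""
    pairs = list(zip(y_true, y_pred))
    correct_inter = sum(1 for t, p in pairs if t == inter and p == inter)
    correct_intra = sum(1 for t, p in pairs if t == intra and p == intra)
    predicted_inter = sum(1 for _, p in pairs if p == inter)
    predicted_intra = sum(1 for _, p in pairs if p == intra)
    actual_inter = sum(1 for t, _ in pairs if t == inter)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(correct_inter + correct_intra, len(pairs)),
        "specificity": ratio(correct_intra, predicted_intra),
        "sensitivity": ratio(correct_inter, predicted_inter),
        "precision": ratio(correct_inter, actual_inter),
    }


# Hypothetical labels for eight edges.
y_true = ["inter", "inter", "inter", "intra", "intra", "intra", "intra", "intra"]
y_pred = ["inter", "inter", "intra", "intra", "intra", "intra", "inter", "inter"]
metrics = edge_metrics(y_true, y_pred)
```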
5. Conclusions
- The adoption of more accurate and more complex base classifiers,
- The consideration of more sophisticated subgraph extraction techniques that would ensure the data graph’s descriptiveness and exclude multiple occurrences of the same information,
- The examination of the depth-first search strategy during the subgraph extraction step,
- The reconsideration and the potential extension of the network topology metrics used,
- The prediction enhancement by including the node attributes information,
- The consideration of different community detection algorithms for classifying the subgraphs’ edges.
Author Contributions
Funding
Conflicts of Interest
References
- Internet Fact. Available online: https://hostingfacts.com/internet-facts-stats/ (accessed on 12 February 2020).
- Schaeffer, S. Graph Clustering. Comput. Sci. Rev. 2007, 1, 27–64.
- Lancichinetti, A.; Kivelä, M.; Saramäki, J.; Fortunato, S. Characterizing the community structure of complex networks. PLoS ONE 2010, 5, e11976.
- Makris, C.; Pettas, D.; Pispirigos, G. Distributed Community Prediction for Social Graphs Based on Louvain Algorithm. AIAI 2019, 500–511.
- Fortunato, S. Community detection in graphs. CoRR. arXiv 2009, arXiv:0906.0612.
- Newman, M.E.; Girvan, M. Finding and Evaluating Community Structure in Networks. Phys. Rev. E 2004, 69, 026113.
- Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826.
- Jia, C.; Li, Y.; Carson, M.B.; Wang, X.; Yu, J. Node Attribute-enhanced Community Detection in Complex Networks. Sci. Rep. 2017, 7, 2626.
- Devi, J.; Eswaran, P. An Analysis of Overlapping Community Detection Algorithms in Social Networks. Procedia Comput. Sci. 2016, 89, 349–358.
- Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of community hierarchies in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008.
- Peel, L.; Larremore, D.B.; Clauset, A. The ground truth about metadata and community detection in networks. Sci. Adv. 2017, 3, e1602548.
- Takaffoli, M.; Rabbany, R.; Zaïane, O.R. Community Evolution Prediction in Dynamic Social Networks. In Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), Beijing, China, 17–20 August 2014; pp. 9–16.
- Appel, A.P.; Cunha, R.L.; Aggarwal, C.C.; Terakado, M.M. Temporally Evolving Community Detection and Prediction in Content-Centric Networks. In Proceedings of the European Conference, ECML PKDD 2018, Dublin, Ireland, 10–14 September 2018.
- Cherifi, H.; Gonçalves, B.; Menezes, R.; Sinatra, R. Improving Network Community Structure with Link Prediction Ranking. In Complex Networks VII; Springer International Publishing: Cham, Switzerland, 2016.
- Cheng, H.-M.; Zhang, Z. Community Detection Based on Link Prediction Methods. CoRR. arXiv 2016, arXiv:1611.00254.
- Zamani, M.; Schwartz, H.A.; Lynn, V.E.; Giorgi, S.; Balasubramanian, N. Residualized Factor Adaptation for Community Social Media Prediction Tasks. arXiv 2018, arXiv:1808.09479.
- Shao, J.; Zhang, Z.; Yu, Z.; Wang, J.; Zhao, Y.; Yang, Q. Community Detection and Link Prediction via Cluster-driven Low-rank Matrix Completion. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 3382–3388.
- Gao, F.; Musial, K.; Gabrys, B. A Community Bridge Boosting Social Network Link Prediction Model. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Sydney, Australia, 31 July–3 August 2017; pp. 683–689.
- Li, L.; Fang, S.; Bai, S.; Xu, S.; Cheng, J.; Chen, X. Effective Link Prediction Based on Community Relationship Strength. IEEE Access 2019, 7, 43233–43248.
- Sethu, H.; Chu, X. A new algorithm for extracting a small representative subgraph from a very large graph. arXiv 2012, arXiv:1207.4825.
- Cukierski, W.; Hamner, B.; Yang, B. Graph-based features for supervised link prediction. In Proceedings of the International Joint Conference on Neural Networks, San Jose, CA, USA, 31 July–5 August 2011; pp. 1237–1244.
- Sotera Distributed Graph Analytics (DGA): Sotera Defence Solution. Available online: https://github.com/Sotera/spark-distributed-louvain-modularity.git (accessed on 12 February 2020).
- Kranda, D. The Square of Adjacency Matrices. arXiv 2012, arXiv:1207.3122.
- Meng, X.; Bradley, J.; Yavuz, B.; Sparks, E.; Venkataraman, S.; Liu, D.; Freeman, J.; Tsai, D.B.; Amde, M.; Owen, S.; et al. MLlib: Machine Learning in Apache Spark. J. Mach. Learn. Res. 2016, 17, 1235–1241.
- NetworkX Clauset–Newman–Moore Implementation. Available online: https://networkx.github.io/documentation/latest/reference/algorithms/generated/networkx.algorithms.community.modularity_max.greedy_modularity_communities.html (accessed on 12 January 2020).
- Zachary Karate Club Network Dataset. KONECT, April 2017. Available online: http://konect.uni-koblenz.de/networks/ucidata-zachary (accessed on 12 February 2020).
- Dolphins Network Dataset—KONECT, April 2017. Available online: http://konect.uni-koblenz.de/networks/dolphins (accessed on 12 February 2020).
- Hamster Friendships Network Dataset. KONECT, April 2017. Available online: http://konect.uni-koblenz.de/networks/petster-friendships-hamster (accessed on 12 February 2020).
- Kumar, S.; Hooi, B.; Makhija, D.; Kumar, M.; Subrahmanian, V.S.; Faloutsos, C. REV2: Fraudulent User Prediction in Rating Platforms. 11th ACM International Conference on Web Search and Data Mining (WSDM). 2018. Available online: https://snap.stanford.edu/data/soc-sign-bitcoin-alpha.html (accessed on 12 February 2020).
- Yin, H.; Benson, A.R.; Leskovec, J.; Gleich, D.F. Local Higher-order Graph Clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017. Available online: https://snap.stanford.edu/data/email-Eu-core.html (accessed on 12 February 2020).
- McAuley, J.; Leskovec, J. Learning to Discover Social Circles in Ego Networks. NIPS. 2012. Available online: https://snap.stanford.edu/data/ego-Facebook.html (accessed on 12 February 2020).
- Klimt, B.; Yang, Y. Introducing the Enron corpus. CEAS Conference. 2004. Available online: https://snap.stanford.edu/data/email-Enron.html (accessed on 12 February 2020).
- Douban Network Dataset. KONECT, April 2017. Available online: http://konect.uni-koblenz.de/networks/douban (accessed on 12 February 2020).
- Richardson, M.; Agrawal, R.; Domingos, P. Trust Management for the Semantic Web. ISWC, 2003. Available online: https://snap.stanford.edu/data/soc-Epinions1.html (accessed on 12 February 2020).
Algorithm | Computational Complexity |
---|---|
Girvan–Newman [6] | O(m * n^2) |
Random-walk edge betweenness [3] | O(m * n + n^3) |
Latora–Marchiori [2] | O(m^3 * n) |
Newman [2] | O(m * n + n^2) |
Clauset–Newman–Moore [3] | O(m * d * (log n)) |
Spectral clustering [2] | O(n^3) |
Current-flow betweenness [2] | O(m * n + n^3) |
Louvain [10] | O(n * (log n) * P) |
Evaluated Dataset | Number of Nodes | Number of Edges | Average Degree |
---|---|---|---|
Karate [26] | 34 | 78 | 4.59 |
Dolphins [27] | 64 | 159 | 5 |
Hamsterster [28] | 1858 | 12534 | 13.49 |
Bitcoin [29] | 3783 | 24186 | 12.78 |
Email-Eu-Core [30] | 1005 | 25571 | 31.96 |
Facebook [31] | 4039 | 88234 | 43.69 |
Email-Enron [32] | 36692 | 183831 | 10.02 |
Douban [33] | 154908 | 327162 | 4.23 |
Epinions [34] | 75879 | 508837 | 10.69 |
Evaluated Dataset | Ratio of the Training Dataset over the Complete Available Dataset | Number of Bagging Predictor’s Base Estimators |
---|---|---|
Karate [26] | 35% | 3 |
Dolphins [27] | 35% | 3 |
Hamsterster [28] | 16% | 5 |
Bitcoin [29] | 10% | 5 |
Email-Eu-Core [30] | 10% | 5 |
Facebook [31] | 8% | 3 |
Email-Enron [32] | 8% | 5 |
Douban [33] | 2% | 3 |
Epinions [34] | 3% | 3 |
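To illustrate how the number of base estimators in the table enters a bagging predictor, the following is a minimal sketch of generic bagging (bootstrap aggregating) with majority voting. It is not necessarily how the authors draw their training subsets or which base classifier they use: the threshold stump `fit_stump`, the single-feature samples, and all function names are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Hypothetical base learner: one-feature threshold between the class means."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    if len(by_label) == 1:
        # Degenerate bootstrap containing a single class: constant predictor.
        only_label = next(iter(by_label))
        return lambda x: only_label
    # Binary classification is assumed, so exactly two labels remain here.
    (label_a, xs_a), (label_b, xs_b) = by_label.items()
    mean_a, mean_b = sum(xs_a) / len(xs_a), sum(xs_b) / len(xs_b)
    threshold = (mean_a + mean_b) / 2
    low, high = (label_a, label_b) if mean_a <= mean_b else (label_b, label_a)
    return lambda x: low if x <= threshold else high


def fit_bagging(samples, n_estimators, fit_base=fit_stump, seed=0):
    """Train each base estimator on its own bootstrap resample of `samples`."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(samples) for _ in samples]  # with replacement
        models.append(fit_base(bootstrap))
    return models


def predict_bagging(models, x):
    """Aggregate the base estimators by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical training edges: (feature value, label).
train = [(0.05 + 0.02 * i, "inter") for i in range(10)]
train += [(0.70 + 0.02 * i, "intra") for i in range(10)]

models = fit_bagging(train, n_estimators=3)
low_similarity_label = predict_bagging(models, 0.1)
high_similarity_label = predict_bagging(models, 0.8)
```

An odd number of base estimators, as in every row of the table, avoids tied votes in binary classification.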
Evaluated Dataset | Accuracy | Specificity | Sensitivity | Precision |
---|---|---|---|---|
Karate [26] | 28.21% | 30.65% | 18.75% | 19.38% |
Dolphins [27] | 17.11% | 14.43% | 25.95% | 21.35% |
Hamsterster [28] | 27.07% | 31.89% | 13.2% | 25.36% |
Bitcoin [29] | 26.02% | 35.52% | −28.49% | 14.7% |
Email-Eu-Core [30] | 8.59% | 12.34% | 0.66% | 10.76% |
Facebook [31] | 26.03% | 27.27% | −5.03% | 12.02% |
Email-Enron [32] | 4.04% | 6.86% | −9.03% | 3.94% |
Douban [33] | 3.65% | 13.14% | −22.93% | 7.17% |
Epinions [34] | 14.1% | 20.42% | −22.54% | 4.06% |
Average | 17.2% | 21.39% | −3.27% | 13.19% |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Makris, C.; Pispirigos, G.; Rizos, I.O. A Distributed Bagging Ensemble Methodology for Community Prediction in Social Networks. Information 2020, 11, 199. https://doi.org/10.3390/info11040199