Estimating the Number of Communities in Weighted Networks
Abstract
1. Introduction
2. Methodology
2.1. The Degree-Corrected Distribution-Free Model
2.2. Estimation of the Number of Communities
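- Compute the top-k eigendecomposition of A and collect the k eigenvectors in a matrix with one row per node.
- Row-normalize this eigenvector matrix so that each row has unit norm.
- Apply the k-means algorithm to all rows of the row-normalized matrix with k clusters to obtain a partition of the nodes into k clusters.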
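The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: it assumes A is a symmetric n × n weight matrix, that "top-k" means the k eigenvalues largest in magnitude, that rows are scaled by their Euclidean norm, and that a plain Lloyd-style k-means is acceptable. The function names (`top_k_eigenvectors`, `row_normalize`, `kmeans`, `spectral_labels`) and the toy matrix are hypothetical.

```python
import numpy as np


def top_k_eigenvectors(A, k):
    """Eigenvectors of symmetric A for the k eigenvalues largest in magnitude
    (the magnitude ordering is an assumption of this sketch)."""
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(eigvals))[:k]
    return eigvecs[:, order]


def row_normalize(U, eps=1e-12):
    """Scale every row of U to unit Euclidean norm; all-zero rows stay zero."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, eps)


def _sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random restarts; returns integer labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        # Start from k distinct rows of X chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = _sq_dists(X, centers).argmin(axis=1)
            # An empty cluster keeps its previous center.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        dists = _sq_dists(X, centers)
        labels, cost = dists.argmin(axis=1), dists.min(axis=1).sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def spectral_labels(A, k, seed=0):
    """Steps 1-3: eigendecomposition, row normalization, k-means on the rows."""
    U = top_k_eigenvectors(np.asarray(A, dtype=float), k)
    return kmeans(row_normalize(U), k, seed=seed)


# Toy weighted network (hypothetical data): two blocks of five nodes,
# weight 2.0 within a block and 0.1 between blocks, no self-loops.
A_toy = np.full((10, 10), 0.1)
A_toy[:5, :5] = 2.0
A_toy[5:, 5:] = 2.0
np.fill_diagonal(A_toy, 0.0)

labels_toy = spectral_labels(A_toy, k=2)
print(labels_toy)
```

The label values themselves are arbitrary; only the grouping of nodes matters. In practice a library routine for k-means could replace the hand-written loop, which is kept here only so the sketch depends on NumPy alone.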
3. Experimental Results
3.1. Simulations
3.1.1. Bernoulli Distribution
3.1.2. Binomial Distribution
3.1.3. Poisson Distribution
3.1.4. Geometric Distribution
3.1.5. Exponential Distribution
3.1.6. Normal Distribution
3.1.7. Laplace Distribution
3.1.8. Uniform Distribution
3.1.9. Signed Networks
3.2. Real-World Networks
4. Conclusions and Future Work
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Barabási, A.L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.
2. Albert, R.; Barabási, A.L. Statistical mechanics of complex networks. Rev. Mod. Phys. 2002, 74, 47.
3. Newman, M.E. The structure and function of complex networks. SIAM Rev. 2003, 45, 167–256.
4. Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D.U. Complex networks: Structure and dynamics. Phys. Rep. 2006, 424, 175–308.
5. Lusseau, D.; Newman, M.E. Identifying the role that animals play in their social networks. Proc. R. Soc. Lond. Ser. B Biol. Sci. 2004, 271, S477–S481.
6. Guimera, R.; Nunes Amaral, L.A. Functional cartography of complex metabolic networks. Nature 2005, 433, 895–900.
7. Barabási, A.L.; Oltvai, Z.N. Network biology: Understanding the cell’s functional organization. Nat. Rev. Genet. 2004, 5, 101–113.
8. Palla, G.; Barabási, A.L.; Vicsek, T. Quantifying social group evolution. Nature 2007, 446, 664–667.
9. Bullmore, E.; Sporns, O. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 2009, 10, 186–198.
10. Foster, J. From simplistic to complex systems in economics. Camb. J. Econ. 2005, 29, 873–892.
11. Schweitzer, F.; Fagiolo, G.; Sornette, D.; Vega-Redondo, F.; Vespignani, A.; White, D.R. Economic networks: The new challenges. Science 2009, 325, 422–425.
12. Pastor-Satorras, R.; Castellano, C.; Van Mieghem, P.; Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 2015, 87, 925.
13. Chow, K.; Ay, A.; Elhesha, R.; Kahveci, T. ANCA: Alignment-based network construction algorithm. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Washington, DC, USA, 29 August–1 September 2018; pp. 21–26.
14. Elhesha, R.; Sarkar, A.; Cinaglia, P.; Boucher, C.; Kahveci, T. Co-evolving patterns in temporal networks of varying evolution. In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, New York, NY, USA, 7–10 September 2019; pp. 494–503.
15. Cinaglia, P.; Cannataro, M. Network alignment and motif discovery in dynamic networks. Netw. Model. Anal. Health Inform. Bioinform. 2022, 11, 38.
16. Newman, M.E. Analysis of weighted networks. Phys. Rev. E 2004, 70, 056131.
17. Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174.
18. Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1–44.
19. Newman, M.E. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. USA 2001, 98, 404–409.
20. Ji, P.; Jin, J. Coauthorship and citation networks for statisticians. Ann. Appl. Stat. 2016, 10, 1779–1812.
21. Ji, P.; Jin, J.; Ke, Z.T.; Li, W. Co-citation and Co-authorship Networks of Statisticians. J. Bus. Econ. Stat. 2022, 40, 469–485.
22. Schwikowski, B.; Uetz, P.; Fields, S. A network of protein–protein interactions in yeast. Nat. Biotechnol. 2000, 18, 1257–1261.
23. Ideker, T.; Sharan, R. Protein networks in disease. Genome Res. 2008, 18, 644–652.
24. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137.
25. Rohe, K.; Chatterjee, S.; Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Stat. 2011, 39, 1878–1915.
26. Amini, A.A.; Chen, A.; Bickel, P.J.; Levina, E. Pseudo-likelihood methods for community detection in large sparse networks. Ann. Stat. 2013, 41, 2097–2122.
27. Lei, J.; Rinaldo, A. Consistency of spectral clustering in stochastic block models. Ann. Stat. 2015, 43, 215–237.
28. Jin, J. Fast community detection by SCORE. Ann. Stat. 2015, 43, 57–89.
29. Joseph, A.; Yu, B. Impact of regularization on spectral clustering. Ann. Stat. 2016, 44, 1765–1791.
30. Mao, X.; Sarkar, P.; Chakrabarti, D. On Mixed Memberships and Symmetric Nonnegative Matrix Factorizations. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2324–2333.
31. Chen, Y.; Li, X.; Xu, J. Convexified modularity maximization for degree-corrected stochastic block models. Ann. Stat. 2018, 46, 1573–1602.
32. Zhang, Y.; Levina, E.; Zhu, J. Detecting overlapping communities in networks using spectral methods. SIAM J. Math. Data Sci. 2020, 2, 265–283.
33. Mao, X.; Sarkar, P.; Chakrabarti, D. Overlapping Clustering Models, and One (class) SVM to Bind Them All. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31, pp. 2126–2136.
34. Mao, X.; Sarkar, P.; Chakrabarti, D. Estimating Mixed Memberships With Sharp Eigenvector Deviations. J. Am. Stat. Assoc. 2020, 116, 1928–1940.
35. Li, X.; Chen, Y.; Xu, J. Convex relaxation methods for community detection. Stat. Sci. 2021, 36, 2–15.
36. Jing, B.; Li, T.; Ying, N.; Yu, X. Community detection in sparse networks using the symmetrized laplacian inverse matrix (slim). Stat. Sin. 2022, 32, 1.
37. Newman, M.E.; Reinert, G. Estimating the number of communities in a network. Phys. Rev. Lett. 2016, 117, 078301.
38. Bickel, P.J.; Sarkar, P. Hypothesis testing for automated community detection in networks. J. R. Stat. Soc. Ser. B Stat. Methodol. 2016, 78, 253–273.
39. Lei, J. A goodness-of-fit test for stochastic block models. Ann. Stat. 2016, 44, 401–424.
40. Riolo, M.A.; Cantwell, G.T.; Reinert, G.; Newman, M.E. Efficient method for estimating the number of communities in a network. Phys. Rev. E 2017, 96, 032310.
41. Saldaña, D.F.; Yu, Y.; Feng, Y. How many communities are there. J. Comput. Graph. Stat. 2017, 26, 171–181.
42. Wang, Y.R.; Bickel, P.J. Likelihood-based model selection for stochastic block models. Ann. Stat. 2017, 45, 500–528.
43. Yan, B.; Sarkar, P.; Cheng, X. Provable estimation of the number of blocks in block models. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Virtual Event, 28–30 March 2022; pp. 1185–1194.
44. Chen, K.; Lei, J. Network cross-validation for determining the number of communities in network data. J. Am. Stat. Assoc. 2018, 113, 241–251.
45. Ma, S.; Su, L.; Zhang, Y. Determining the number of communities in degree-corrected stochastic block models. J. Mach. Learn. Res. 2021, 22.
46. Le, C.M.; Levina, E. Estimating the number of communities by spectral methods. Electron. J. Stat. 2022, 16, 3315–3342.
47. Jin, J.; Ke, Z.T.; Luo, S.; Wang, M. Optimal estimation of the number of network communities. J. Am. Stat. Assoc. 2022.
48. Aicher, C.; Jacobs, A.Z.; Clauset, A. Learning latent block structure in weighted networks. J. Complex Netw. 2015, 3, 221–248.
49. Jog, V.; Loh, P.L. Recovering communities in weighted stochastic block models. In Proceedings of the 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 29 September–2 October 2015; pp. 1308–1315.
50. Ahn, K.; Lee, K.; Suh, C. Hypergraph Spectral Clustering in the Weighted Stochastic Block Model. IEEE J. Sel. Top. Signal Process. 2018, 12, 959–974.
51. Palowitch, J.; Bhamidi, S.; Nobel, A.B. Significance-based community detection in weighted networks. J. Mach. Learn. Res. 2018, 18, 1–48.
52. Peixoto, T.P. Nonparametric weighted stochastic block models. Phys. Rev. E 2018, 97, 012306.
53. Xu, M.; Jog, V.; Loh, P.L. Optimal rates for community estimation in the weighted stochastic block model. Ann. Stat. 2020, 48, 183–204.
54. Ng, T.L.J.; Murphy, T.B. Weighted stochastic block model. Stat. Methods Appl. 2021, 30, 1365–1398.
55. Qing, H. Distribution-Free Model for Community Detection. Prog. Theor. Exp. Phys. 2023, 2023, 033A01.
56. Qing, H. Degree-corrected distribution-free model for community detection in weighted networks. Sci. Rep. 2022, 12, 15153.
57. Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 016107.
58. Gómez, S.; Jensen, P.; Arenas, A. Analysis of community structure in networks of correlated data. Phys. Rev. E 2009, 80, 016114.
59. Newman, M.E.J. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582.
60. Budel, G.; Van Mieghem, P. Detecting the number of clusters in a network. J. Complex Netw. 2020, 8, cnaa047.
61. Yang, B.; Cheung, W.; Liu, J. Community mining from signed social networks. IEEE Trans. Knowl. Data Eng. 2007, 19, 1333–1348.
62. Liu, W.; Jiang, X.; Pellegrini, M.; Wang, X. Discovering communities in complex networks by edge label propagation. Sci. Rep. 2016, 6, 22470.
63. Zachary, W.W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977, 33, 452–473.
64. Read, K.E. Cultures of the central highlands, New Guinea. Southwest. J. Anthropol. 1954, 10, 1–43.
65. Ferligoj, A.; Kramberger, A. An analysis of the slovene parliamentary parties network. Dev. Stat. Methodol. 1996, 12, 209–216.
66. Lusseau, D.; Schneider, K.; Boisseau, O.J.; Haase, P.; Slooten, E.; Dawson, S.M. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 2003, 54, 396–405.
67. Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826.
68. Newman, M.E. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 2006, 74, 036104.
69. Adamic, L.A.; Glance, N. The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, Chicago, IL, USA, 21–25 August 2005; pp. 36–43.
70. Qing, H. Mixed membership distribution-free model. arXiv 2021, arXiv:2112.04389.
Dataset | Source | n | K | Weighted? | nDFAwm | ME | NB | BHm | BHa | BHmc | BHac |
---|---|---|---|---|---|---|---|---|---|---|---|
Karate club (weighted) | [63] | 34 | 2 | Yes | 2 | 2 | 4 | 4 | 4 | 4 | 4 |
Gahuku-Gama subtribes | [64] | 16 | 3 | Yes | 3 | N/A | 1 | 1 | 12 | N/A | 13 |
Slovene Parliamentary Party | [65] | 10 | 2 | Yes | 2 | 2 | N/A | N/A | N/A | N/A | N/A |
Dolphins | [66] | 62 | 2, 4 | No | 4 | 2 | 2 | 2 | 2 | 2 | 2 |
College football | [67] | 110 | 11 | No | 11 | 10 | 10 | 10 | 10 | 10 | 10 |
Karate club | [63] | 34 | 2 | No | 2 | 34 | 2 | 2 | 2 | 2 | 2 |
Political books | [68] | 105 | 3 | No | 4 | 2 | 3 | 3 | 4 | 4 | 4 |
Political blogs | [69] | 1222 | 2 | No | 2 | 2 | 7 | 7 | 7 | 8 | 8 |
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qing, H. Estimating the Number of Communities in Weighted Networks. Entropy 2023, 25, 551. https://doi.org/10.3390/e25040551