Dynamics-Preserving Graph Embedding for Community Mining and Network Immunization
Abstract
1. Introduction
1.1. Motivation
1.2. Related Work
1.3. Our Contributions
- We develop a dynamics-preserving graph embedding method (EpiEm) that generates node representations preserving the dynamic characteristics of epidemic spreading on networks. Specifically, we first generate a set of propagation sequences by simulating the Susceptible-Infectious process on a network, and then learn node representations from an influence matrix using the singular value decomposition method.
- We propose an embedding-based network immunization strategy to immunize network nodes based on the preserved infectiousness and vulnerability in their representations. Such representations embed not only the structural properties of the network, but also the epidemic dynamics on the network.
- Through experiments on both synthetic and real-world networks, we demonstrate that the proposed embedding method outperforms state-of-the-art graph embedding methods on community mining tasks. Moreover, we show that the embedding-based network immunization strategy outperforms several typical network immunization strategies because it takes the dynamic characteristics of epidemic spreading into account.
2. EpiEm: A Dynamics-Preserving Graph Embedding Method
2.1. Generating Propagation Sequences
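- (1) Calculate the state transition rate of each node. The rate at which a susceptible individual i becomes infected is r × the number of his/her infected neighbors, where r is the transmission rate. Infected individuals remain infected. The total transition rate at time t is the sum of the transition rates of all nodes.
- (2) After a waiting time, determine the next node to change its state, where the waiting time is sampled from an exponential distribution with mean equal to the reciprocal of the total transition rate. Node k is selected to change its state with probability proportional to its transition rate.
- (3) Repeat (1) and (2) until a predetermined termination time is reached.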
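Algorithm 1: generateSequence(G, r, i, k, T)
Input: Network G, transmission rate r, starting node i, sequence ID k, termination time T
Output: The k-th propagation sequence starting from node i
1 Initialize the current time t to 0;
2 Initialize the propagation sequence with the starting node i;
3 while t < T do
4   Calculate the total transition rate based on Step (1);
5   Generate the waiting time based on the total transition rate;
6   Determine the next node v based on Step (2);
7   Advance t by the waiting time;
8   Append v to the sequence;
9 return the propagation sequence;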
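The sequence-generation procedure can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the adjacency-dictionary input `adj`, the function name `generate_sequence`, the optional `rng` argument, and the choice to discard an infection event whose time falls beyond T are all assumptions made for the example.

```python
import random


def generate_sequence(adj, r, start, T, rng=None):
    """Simulate an SI process from `start` with a Gillespie-style loop.

    adj: dict mapping each node to a list of its neighbours (undirected).
    Returns the nodes in the order they become infected, starting with `start`.
    """
    rng = rng or random.Random()
    infected = {start}
    sequence = [start]
    t = 0.0
    while t < T:
        # Step (1): each susceptible node has rate r * (number of infected neighbours).
        rates = {}
        for node in infected:
            for nb in adj[node]:
                if nb not in infected:
                    rates[nb] = rates.get(nb, 0.0) + r
        total = sum(rates.values())
        if total == 0.0:
            break  # no susceptible node can be reached any more
        # Waiting time: exponential with mean 1 / total (expovariate takes the rate).
        t += rng.expovariate(total)
        if t >= T:
            break  # assumption: events past the termination time are dropped
        # Step (2): choose the next node with probability proportional to its rate.
        nodes = list(rates)
        v = rng.choices(nodes, weights=[rates[n] for n in nodes], k=1)[0]
        infected.add(v)
        sequence.append(v)
    return sequence
```

On a path graph `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}` started at node 0, only one node is susceptible to infection at any moment, so a sufficiently large T always yields the sequence 0, 1, 2, 3.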
2.2. Learning Node Representations
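Algorithm 2: The EpiEm Algorithm
Input: Network G, transmission rate r, dimension d, number of propagation sequences per node K, termination time T, influence matrix X
Output: Infectiousness embedding, vulnerability embedding
1 for each node i in G do
2   for k = 1 to K do
3     Generate the k-th propagation sequence of node i with generateSequence(G, r, i, k, T);
4     for each node j in the sequence do
5       Update the influence of node i on node j in X;
6–8 Factorize X with the singular value decomposition and derive the d-dimensional infectiousness and vulnerability embeddings;
9 return the infectiousness embedding and the vulnerability embedding;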
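The factorization step can be illustrated with NumPy. The sketch below rests on assumptions that are not taken from the paper: the influence matrix simply counts how often node j appears in sequences started from node i, nodes are integers 0..n-1, and each embedding is a truncated singular-vector matrix scaled by the square root of the singular values. The function names are hypothetical.

```python
import numpy as np


def build_influence_matrix(sequences, n):
    """Assumed influence matrix: X[i, j] counts appearances of j in sequences from i.

    sequences: dict mapping a start node to a list of propagation sequences.
    """
    X = np.zeros((n, n))
    for i, seqs in sequences.items():
        for seq in seqs:
            for j in seq[1:]:  # skip the starting node itself
                X[i, j] += 1.0
    return X


def svd_embeddings(X, d):
    """Return two d-dimensional embeddings from a truncated SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scale = np.sqrt(s[:d])
    source_side = U[:, :d] * scale     # rows of X: nodes acting as spreaders
    target_side = Vt[:d, :].T * scale  # columns of X: nodes being infected
    return source_side, target_side
```

With this scaling, the product of the source-side embedding and the transposed target-side embedding is the best rank-d approximation of X, so a pair of nodes with a large inner product is one where the first node frequently reaches the second in the simulated sequences.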
2.3. An Embedding-Based Network Immunization Strategy
3. Experiments
3.1. Node Clustering and Visualization
3.1.1. Clustering Visualization on Barbell and Karate Networks
3.1.2. Quantitative Evaluation for Clustering on Real Networks
- Spectral Clustering [40]: This is a matrix factorization approach that uses the eigenvectors corresponding to the d smallest eigenvalues of the normalized Laplacian matrix of the graph as the feature representations of nodes.
- DeepWalk [30]: This approach is one of the first attempts to apply the word2vec approach for network embedding. The neighbor information of the nodes is captured via simulating uniform random walks. (We use the code provided by the author, Source: https://github.com/phanein/deepwalk)
- Node2vec [33]: This is another random-walk-based network embedding method. The random walks are balanced between breadth-first and depth-first sampling strategies through the hyperparameters p and q. Unless otherwise specified, we adopt the default values of p and q from the authors' paper. (We use the code provided by the author, Source: https://github.com/aditya-grover/node2vec)
- Brazilian air-traffic network [35]: The network has 131 nodes and 1038 edges. The data were collected by the National Civil Aviation Administration (ANAC) from January to December 2016 and record the total number of landings and takeoffs in 2016. The dataset has four node labels.
- European air-traffic network [35]: The network has 399 nodes and 5995 edges. The data were collected by the Statistical Office of the European Union (Eurostat) from January to November 2016 and record airport activities. The dataset has four node labels.
- USA air-traffic network [35]: The network has 1190 nodes and edges. The data counts airport activity by the Bureau of Transportation Statistics from January to October. The dataset has four node labels.
- Homogeneity [61]: It measures the percentage of detected clusters containing only a single class label through conditional entropy.
- Completeness [61]: It measures the percentage of nodes with the same class label allocated to the same cluster through conditional entropy.
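Both metrics can be computed directly from conditional entropies, following the V-measure definitions in [61]: homogeneity is 1 − H(C|K)/H(C) and completeness is 1 − H(K|C)/H(K), where C denotes the class labels and K the cluster assignments. The following standard-library sketch is illustrative only; the function names are chosen for this example.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a, b):
    """Conditional entropy H(a | b) for two equal-length label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum(c / n * log(c / b_counts[y]) for (_, y), c in joint.items())


def homogeneity(classes, clusters):
    """1 - H(C|K)/H(C); defined as 1 when there is only one class."""
    h_c = _entropy(classes)
    return 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c


def completeness(classes, clusters):
    """1 - H(K|C)/H(K); defined as 1 when there is only one cluster."""
    h_k = _entropy(clusters)
    return 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
```

For example, with classes `[0, 0, 1, 1]`, placing every node in its own cluster gives perfect homogeneity but a completeness of only 0.5, while placing all nodes in one cluster gives perfect completeness and zero homogeneity.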
3.2. Network Immunization
- Random immunization [18]: A proportion of nodes in the network is randomly selected for vaccination before the epidemic spreads. Every node has the same probability of being selected.
- Max-degree immunization [52]: A widely used targeted immunization strategy. A proportion of nodes with the highest degrees is selected for vaccination before the epidemic spreads.
- Eigenvector centrality immunization [15]: Another widely used targeted immunization strategy. Eigenvector centrality is given by the principal eigenvector of the network adjacency matrix, in which each element is the eigenvector centrality of the corresponding node. The nodes with the largest eigenvector centrality are selected for vaccination before the epidemic spreads.
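The three baseline strategies differ only in how they rank nodes, which the following sketch makes explicit. It assumes an undirected, connected network stored as an adjacency dictionary `adj`; the function names, the `fraction` parameter, and the use of power iteration on A + I (which has the same eigenvectors as A but avoids oscillation on bipartite graphs) are choices made for this example, not details from the cited works.

```python
import random


def random_immunization(adj, fraction, rng=None):
    """Pick a uniformly random subset containing `fraction` of the nodes."""
    rng = rng or random.Random()
    k = int(fraction * len(adj))
    return set(rng.sample(list(adj), k))


def max_degree_immunization(adj, fraction):
    """Pick the `fraction` of nodes with the highest degree."""
    k = int(fraction * len(adj))
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    return set(ranked[:k])


def eigenvector_centrality(adj, iterations=200):
    """Approximate the principal eigenvector of the adjacency matrix."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # One power-iteration step on A + I, followed by L2 normalisation.
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x


def eigenvector_immunization(adj, fraction, iterations=200):
    """Pick the `fraction` of nodes with the largest eigenvector centrality."""
    k = int(fraction * len(adj))
    centrality = eigenvector_centrality(adj, iterations)
    return set(sorted(adj, key=centrality.get, reverse=True)[:k])
```

On a star graph with one hub and four leaves, immunizing 20% of the nodes selects the hub under both targeted strategies, whereas random immunization selects the hub only one time in five on average.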
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Borgatti, S.P.; Mehra, A.; Brass, D.J.; Labianca, G. Network analysis in the social sciences. Science 2009, 323, 892–895.
- Bassett, D.S.; Sporns, O. Network neuroscience. Nat. Neurosci. 2017, 20, 353.
- Theocharidis, A.; Van Dongen, S.; Enright, A.J.; Freeman, T.C. Network visualization and analysis of gene expression data using BioLayout Express 3D. Nat. Protoc. 2009, 4, 1535.
- Lawrence, S.; Giles, C.L. Searching the world wide web. Science 1998, 280, 98–100.
- Albert, R.; Barabási, A.L. Statistical mechanics of complex networks. Rev. Mod. Phys. 2002, 74, 47.
- Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D.U. Complex networks: Structure and dynamics. Phys. Rep. 2006, 424, 175–308.
- Schaeffer, S.E. Graph clustering. Comput. Sci. Rev. 2007, 1, 27–64.
- Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174.
- Ying, J.C.; Shi, B.N.; Tseng, V.S.; Tsai, H.W.; Cheng, K.H.; Lin, S.C. Preference-aware community detection for item recommendation. In Proceedings of the 2013 Conference on Technologies and Applications of Artificial Intelligence, Taipei, Taiwan, 6–8 December 2013; pp. 49–54.
- Harenberg, S.; Bello, G.; Gjeltema, L.; Ranshous, S.; Harlalka, J.; Seay, R.; Padmanabhan, K.; Samatova, N. Community detection in large-scale networks: A survey and empirical evaluation. Wiley Interdiscip. Rev. Comput. Stat. 2014, 6, 426–439.
- Choudhury, D.; Paul, A. Community detection in social networks: An overview. Int. J. Res. Eng. Technol. 2013, 2, 6–13.
- Borgatti, S.P. Centrality and network flow. Soc. Netw. 2005, 27, 55–71.
- Hu, P.; Fan, W.; Mei, S. Identifying node importance in complex networks. Phys. A Stat. Mech. Appl. 2015, 429, 169–176.
- Cao, J.; Ding, C.; Shi, B. Motif-based functional backbone extraction of complex networks. Phys. A Stat. Mech. Appl. 2019, 526, 121123.
- Restrepo, J.G.; Ott, E.; Hunt, B.R. Characterizing the dynamical importance of network nodes and links. Phys. Rev. Lett. 2006, 97, 094102.
- Zhang, J.; Xu, X.K.; Li, P.; Zhang, K.; Small, M. Node importance for dynamical process on networks: A multiscale characterization. Chaos Interdiscip. J. Nonlinear Sci. 2011, 21, 016107.
- Pastor-Satorras, R.; Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 2001, 86, 3200.
- Pastor-Satorras, R.; Vespignani, A. Immunization of complex networks. Phys. Rev. E 2002, 65, 036104.
- Nowzari, C.; Preciado, V.M.; Pappas, G.J. Analysis and control of epidemics: A survey of spreading processes on complex networks. IEEE Control Syst. Mag. 2016, 36, 26–46.
- Kernighan, B.W.; Lin, S. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 1970, 49, 291–307.
- Newman, M.E. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582.
- Tremblay, N.; Borgnat, P. Graph wavelets for multiscale community mining. IEEE Trans. Signal Process. 2014, 62, 5227–5239.
- Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828.
- Hamilton, W.L.; Ying, R.; Leskovec, J. Representation learning on graphs: Methods and applications. arXiv 2017, arXiv:1709.05584.
- Cui, P.; Wang, X.; Pei, J.; Zhu, W. A survey on network embedding. IEEE Trans. Knowl. Data Eng. 2018.
- Goyal, P.; Ferrara, E. Graph embedding techniques, applications, and performance: A survey. Knowl. Based Syst. 2018, 151, 78–94.
- Cai, H.; Zheng, V.W.; Chang, K.C.C. A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Trans. Knowl. Data Eng. 2018, 30, 1616–1637.
- Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 1067–1077.
- Wang, D.; Cui, P.; Zhu, W. Structural deep network embedding. In Proceedings of the 22nd ACM International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 1225–1234.
- Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2014; pp. 701–710.
- Dong, Y.; Chawla, N.V.; Swami, A. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2017; pp. 135–144.
- Zitnik, M.; Leskovec, J. Predicting multicellular function through multi-layer tissue networks. Bioinformatics 2017, 33, i190–i198.
- Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 855–864.
- Lyu, T.; Zhang, Y.; Zhang, Y. Enhancing the network embedding quality with structural similarity. In Proceedings of the ACM Conference on Information and Knowledge Management; ACM: New York, NY, USA, 2017; pp. 147–156.
- Ribeiro, L.F.; Saverese, P.H.; Figueiredo, D.R. struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd ACM International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2017; pp. 385–394.
- Shi, B.; Zhou, C.; Qiu, H.; Xu, X.; Liu, J. Unifying structural proximity and equivalence for network embedding. IEEE Access 2019, 7, 106124–106138.
- Madar, N.; Kalisky, T.; Cohen, R.; Ben-avraham, D.; Havlin, S. Immunization and epidemic dynamics in complex networks. Eur. Phys. J. B 2004, 38, 269–276.
- Tenenbaum, J.B.; De Silva, V.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323.
- Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
- Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proceedings of the NIPS 2002 Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 9–14 December 2002; pp. 585–591.
- Ahmed, A.; Shervashidze, N.; Narayanamurthy, S.; Josifovski, V.; Smola, A.J. Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web; ACM: New York, NY, USA, 2013; pp. 37–48.
- Ou, M.; Cui, P.; Pei, J.; Zhang, Z.; Zhu, W. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 1105–1114.
- Wang, X.; Cui, P.; Wang, J.; Pei, J.; Zhu, W.; Yang, S. Community Preserving Network Embedding. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence AAAI, San Francisco, CA, USA, 4–9 February 2017; pp. 203–209.
- Tu, K.; Cui, P.; Wang, X.; Yu, P.S.; Zhu, W. Deep recursive network embedding with regular equivalence. In Proceedings of the 24th ACM International Conference on Knowledge Discovery & Data Mining; ACM: New York, NY, USA, 2018; pp. 2357–2366.
- Donnat, C.; Zitnik, M.; Hallac, D.; Leskovec, J. Learning structural node embeddings via diffusion wavelets. In Proceedings of the 24th ACM International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2018; pp. 1320–1329.
- Cao, S.; Lu, W.; Xu, Q. GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management; ACM: New York, NY, USA, 2015; pp. 891–900.
- Masuda, N.; Porter, M.A.; Lambiotte, R. Random walks and diffusion on networks. Phys. Rep. 2017, 716, 1–58.
- Keeling, M.J.; Eames, K.T. Networks and epidemic models. J. R. Soc. Interface 2005, 2, 295–307.
- Witten, G.; Poulter, G. Simulations of infectious diseases on networks. Comput. Biol. Med. 2007, 37, 195–205.
- Newman, M.E. The structure and function of complex networks. SIAM Rev. 2003, 45, 167–256.
- Cohen, R.; Havlin, S.; Ben-Avraham, D. Efficient immunization strategies for computer networks and populations. Phys. Rev. Lett. 2003, 91, 247901.
- Hadidjojo, J.; Cheong, S.A. Equal graph partitioning on estimated infection network as an effective epidemic mitigation measure. PLoS ONE 2011, 6.
- Schneider, C.M.; Mihaljev, T.; Havlin, S.; Herrmann, H.J. Suppressing epidemics with a limited amount of immunization units. Phys. Rev. E 2011, 84, 061911.
- Allen, L.J. Some discrete-time SI, SIR, and SIS epidemic models. Math. Biosci. 1994, 124, 83–105.
- Gillespie, D.T. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 1977, 81, 2340–2361.
- Shi, B.; Liu, G.; Qiu, H.; Wang, Z.; Ren, Y.; Chen, D. Exploring voluntary vaccination with bounded rationality through reinforcement learning. Phys. A Stat. Mech. Appl. 2019, 515, 171–182.
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Kalman, D. A singularly valuable decomposition: The SVD of a matrix. Coll. Math. J. 1996, 27, 2–23.
- Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Application of Dimensionality Reduction in Recommender System—A Case Study; Technical Report; Minnesota Univ Minneapolis Dept of Computer Science: Minneapolis, MN, USA, 2000.
- Zhang, Z.; Cui, P.; Wang, X.; Pei, J.; Yao, X.; Zhu, W. Arbitrary-order proximity preserved network embedding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; ACM: New York, NY, USA, 2018; pp. 2778–2786.
- Rosenberg, A.; Hirschberg, J. V-measure: A conditional entropy-based external cluster evaluation measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 28–30 June 2007.
- McCallum, A.K.; Nigam, K.; Rennie, J.; Seymore, K. Automating the construction of internet portals with machine learning. Inf. Retr. 2000, 3, 127–163.
Network | Spectral Clustering | DeepWalk | Node2vec | EpiEm |
---|---|---|---|---|
Brazilian | d = 16 | d = 16 | d = 16, p = 4, q = 4 | d = 16, r = 0.5, K = 10 |
European | d = 128 | d = 128 | d = 128, p = 1, q = 0.25 | d = 128, r = 0.5, K = 10 |
USA | d = 128 | d = 128 | d = 128, p = 2, q = 1 | d = 128, r = 0.5, K = 10 |
Network | Method | Homogeneity | Completeness |
---|---|---|---|
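Brazilian air-traffic network | DeepWalk | 0.056 | 0.076 |
Brazilian air-traffic network | Node2vec | 0.048 | 0.057 |
Brazilian air-traffic network | Spectral | 0.062 | 0.106 |
Brazilian air-traffic network | EpiEm | 0.293 | 0.361 |
European air-traffic network | DeepWalk | 0.025 | 0.019 |
European air-traffic network | Node2vec | 0.030 | 0.024 |
European air-traffic network | Spectral | 0.022 | 0.084 |
European air-traffic network | EpiEm | 0.207 | 0.243 |
USA air-traffic network | DeepWalk | 0.062 | 0.115 |
USA air-traffic network | Node2vec | 0.087 | 0.112 |
USA air-traffic network | Spectral | 0.007 | 0.120 |
USA air-traffic network | EpiEm | 0.250 | 0.444 |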
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhong, J.; Qiu, H.; Shi, B. Dynamics-Preserving Graph Embedding for Community Mining and Network Immunization. Information 2020, 11, 250. https://doi.org/10.3390/info11050250