A Two-Phase Feature Selection Method for Identifying Influential Spreaders of Disease Epidemics in Complex Networks
Abstract
1. Introduction
2. Methods
2.1. Data Set Generation
2.1.1. Features Based on Centralities
2.1.2. Labels Based on the SIR Model
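
As a rough illustration of how SIR dynamics can score a node's spreading ability, the sketch below estimates the mean outbreak size when an epidemic starts from a single seed node. It is a minimal discrete-time SIR simulation, not the labelling procedure of this paper: the names `sir_outbreak_size` and `mean_outbreak_size`, the infection probability `beta`, the recovery probability `mu`, the number of runs, and the toy path graph are all assumptions made for illustration. How outbreak sizes are turned into the ±1 labels is not reproduced here.

```python
import random

def sir_outbreak_size(adj, seed_node, beta, mu=1.0, rng=None):
    """Run one discrete-time SIR outbreak and return the number of recovered nodes.

    adj maps each node to an iterable of its neighbours; mu must be > 0,
    otherwise infected nodes never recover and the loop does not end.
    """
    rng = rng or random.Random()
    infected, recovered = {seed_node}, set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)        # S -> I with probability beta
        newly_recovered = {u for u in infected if rng.random() < mu}
        recovered |= newly_recovered             # I -> R with probability mu
        infected = (infected - newly_recovered) | newly_infected
    return len(recovered)

def mean_outbreak_size(adj, seed_node, beta, mu=1.0, runs=200, seed=0):
    """Average the outbreak size over independent runs from the same seed node."""
    rng = random.Random(seed)
    total = sum(sir_outbreak_size(adj, seed_node, beta, mu, rng) for _ in range(runs))
    return total / runs

# Hypothetical toy network: a path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(mean_outbreak_size(path, 0, beta=0.5))
```

Averaging over many runs matters because a single outbreak is highly random; nodes would then be compared by their mean outbreak size.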
2.2. A Two-Phase Feature Selection Method
2.2.1. Initial Selection of the Features
- Step 1: Train SVM classifiers on the training data set with 10-fold cross-validation;
- Step 2: Sum the 10 importance scores of each feature (one per fold) independently, and then accumulate the classifier performance over the folds;
- Step 3: Remove the least important feature;
- Step 4: Return to Step 1 until all features are eliminated.
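
The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear SVM is a small stochastic subgradient trainer written for the example, the importance score is taken to be the squared weight w² (the usual SVM-RFE criterion), accuracy stands in for "classifier performance", and the function names and synthetic data are hypothetical.

```python
import random

def train_linear_svm(X, y, epochs=20, lr=0.01, lam=0.01, seed=0):
    """Fit w, b by stochastic subgradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]       # shrink (L2 penalty)
            if margin < 1.0:                               # hinge loss is active
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def svm_rfe_cv(X, y, n_folds=10):
    """Return (elimination order, [(feature subset, CV accuracy), ...])."""
    remaining = list(range(len(X[0])))
    eliminated, history = [], []
    while remaining:                                       # Step 4: repeat until empty
        importance = {f: 0.0 for f in remaining}
        correct = 0
        for k in range(n_folds):                           # Step 1: train on each fold
            train = [i for i in range(len(X)) if i % n_folds != k]
            valid = [i for i in range(len(X)) if i % n_folds == k]
            X_train = [[X[i][f] for f in remaining] for i in train]
            w, b = train_linear_svm(X_train, [y[i] for i in train])
            for f, wf in zip(remaining, w):                # Step 2: sum importance scores
                importance[f] += wf * wf
            correct += sum(
                predict(w, b, [X[i][f] for f in remaining]) == y[i] for i in valid
            )                                              # Step 2: accumulate performance
        history.append((list(remaining), correct / len(X)))
        worst = min(remaining, key=lambda f: importance[f])
        remaining.remove(worst)                            # Step 3: drop least important
        eliminated.append(worst)
    return eliminated, history

# Synthetic data (assumption): feature 0 carries the label, features 1-2 are noise.
rng = random.Random(42)
y = [rng.choice([-1, 1]) for _ in range(60)]
X = [[2.0 * label + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

order, history = svm_rfe_cv(X, y)
# Keep the subset with the best accumulated accuracy; break ties toward fewer features.
best_subset, best_score = max(history, key=lambda h: (h[1], -len(h[0])))
print(order, best_subset, best_score)
```

In practice a library routine such as scikit-learn's `RFECV` with a linear-kernel SVM would likely replace the hand-written trainer; the loop is spelled out here only to mirror the steps.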
2.2.2. Secondary Selection of the Features
Algorithm 1: The proposed two-phase feature selection method FFS-SFS
3. Results
- (1) All: ignore feature selection and use the original feature set;
- (2) Imbalanced: directly adopt SVM-RFE-CV on the raw imbalanced training set;
- (3) First-time Feature Selection–ReliefF (FFS-ReliefF): filter features using ReliefF based on the result of the initial selection;
- (4) First-time Feature Selection–Weight (FFS-Weight): choose features according to their frequency in the results of the initial selection.
3.1. Results on BA Networks
3.1.1. Contrasting Experiment
3.1.2. Ablation Experiment
3.2. Results on ER Networks and WS Networks
3.2.1. Contrasting Experiment on ER Networks
3.2.2. Ablation Experiment on ER Networks
3.2.3. Results on WS Networks
3.3. Results on Real-World Networks
4. Conclusions
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Soares, F.; Villavicencio, A.; Fogliatto, F.S.; Pitombeira Rigatto, M.H.; José Anzanello, M.; Idiart, M.A.; Stevenson, M. A novel specific artificial intelligence-based method to identify COVID-19 cases using simple blood exams. medRxiv 2020.
- Belkacem, A.N.; Ouhbi, S.; Lakas, A.; Benkhelifa, E.; Chen, C. End-to-end AI-based point-of-care diagnosis system for classifying respiratory illnesses and early detection of COVID-19: A theoretical framework. Front. Med. 2021, 8, 585578.
- Bhosale, Y.H.; Patnaik, K.S. Application of deep learning techniques in diagnosis of COVID-19 (coronavirus): A systematic review. Neural Process. Lett. 2022, 55, 3551–3603.
- Chen, H.J.; Mao, L.; Chen, Y.; Yuan, L.; Wang, F.; Li, X.; Cai, Q.; Qiu, J.; Chen, F. Machine learning-based CT radiomics model distinguishes COVID-19 from non-COVID-19 pneumonia. BMC Infect. Dis. 2021, 21, 931.
- Pastor-Satorras, R.; Castellano, C.; Van Mieghem, P.; Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 2015, 87, 925.
- Borge-Holthoefer, J.; Moreno, Y. Absence of influential spreaders in rumor dynamics. Phys. Rev. E 2012, 85, 026116.
- De Arruda, G.F.; Barbieri, A.L.; Rodriguez, P.M.; Rodrigues, F.A.; Moreno, Y.; da Fontoura Costa, L. Role of centrality for the identification of influential spreaders in complex networks. Phys. Rev. E 2014, 90, 032812.
- Lü, L.Y.; Chen, D.B.; Ren, X.L.; Zhang, Q.M.; Zhang, Y.C.; Zhou, T. Vital nodes identification in complex networks. Phys. Rep. 2016, 650, 1–63.
- Freeman, L.C. Centrality in social networks conceptual clarification. Soc. Netw. 1978, 1, 215–239.
- Kitsak, M.; Gallos, L.K.; Havlin, S.; Liljeros, F.; Muchnik, L.; Stanley, H.E.; Makse, H.A. Identification of influential spreaders in complex networks. Nat. Phys. 2010, 6, 888–893.
- Bonacich, P. Factoring and weighting approaches to status scores and clique identification. J. Math. Sociol. 1972, 2, 113–120.
- Katz, L.; Moustaki, I. A new status index derived from sociometric analysis. Psychometrika 1953, 18, 39–43.
- Page, L. The pagerank citation ranking: Bringing order to the Web. In Stanford Digital Library Technologies Project; Technical report; Stanford University: Stanford, CA, USA, 1998.
- Mehta, P.; Bukov, M.; Wang, C.H.; Day, A.G.; Richardson, C.; Fisher, C.K.; Schwab, D.J. A high-bias, low-variance introduction to machine learning for physicists. Phys. Rep. 2019, 810, 1–124.
- Ni, Q.; Tang, M.; Liu, Y.; Lai, Y.C. Machine learning dynamical phase transitions in complex networks. Phys. Rev. E 2019, 100, 052312.
- Ni, Q.; Kang, J.; Tang, M.; Liu, Y.; Zou, Y. Learning epidemic threshold in complex networks by Convolutional Neural Network. Chaos 2019, 29, 113106.
- Tripathi, R.; Reza, A.; Garg, D. Prediction of the disease controllability in a complex network using machine learning algorithms. arXiv 2019, arXiv:1902.10224.
- Shah, C.; Dehmamy, N.; Perra, N.; Chinazzi, M.; Barabási, A.L.; Vespignani, A.; Yu, R. Finding patient zero: Learning contagion source with graph neural networks. arXiv 2020, arXiv:2006.11913.
- Murphy, C.; Laurence, E.; Allard, A. Deep learning of contagion dynamics on complex networks. Nat. Commun. 2021, 12, 4720.
- Tomy, A.; Razzanelli, M.; Di Lauro, F.; Rus, D.; Della Santina, C. Estimating the state of epidemics spreading with graph neural networks. Nonlinear Dyn. 2022, 109, 249–263.
- Rodrigues, F.A.; Peron, T.; Connaughton, C.; Kurths, J.; Moreno, Y. A machine learning approach to predicting dynamical observables from network structure. arXiv 2019, arXiv:1910.00544.
- Bucur, D.; Holme, P. Beyond ranking nodes: Predicting epidemic outbreak sizes by network centralities. PLoS Comput. Biol. 2020, 16, e1008052.
- Zhao, G.; Jia, P.; Huang, C.; Zhou, A.; Fang, Y. A machine learning based framework for identifying influential nodes in complex networks. IEEE Access 2020, 8, 65462–65471.
- Bucur, D. Top influencers can be identified universally by combining classical centralities. Sci. Rep. 2020, 10, 20550.
- Yu, E.Y.; Wang, Y.P.; Fu, Y.; Chen, D.B.; Xie, M. Identifying critical nodes in complex networks via graph convolutional networks. Knowl.-Based Syst. 2020, 198, 105893.
- Zhao, G.; Jia, P.; Zhou, A.; Zhang, B. InfGCN: Identifying influential nodes in complex networks with graph convolutional networks. Neurocomputing 2020, 414, 18–26.
- Wang, Q.; Ren, J.; Wang, Y.; Zhang, B.; Cheng, Y.; Zhao, X. CDA: A clustering degree based influential spreader identification algorithm in weighted complex network. IEEE Access 2018, 6, 19550–19559.
- Anukrishna, P.; Paul, V. A review on feature selection for high dimensional data. In Proceedings of the 2017 International Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 19–20 January 2017; pp. 1–4.
- Azadifar, S.; Rostami, M.; Berahmand, K.; Moradi, P.; Oussalah, M. Graph-based relevancy-redundancy gene selection method for cancer diagnosis. Comput. Biol. Med. 2022, 147, 105766.
- Zhou, Y.; Zhang, W.; Kang, J.; Zhang, X.; Wang, X. A problem-specific non-dominated sorting genetic algorithm for supervised feature selection. Inf. Sci. 2021, 547, 841–859.
- Zhou, Y.; Kang, J.; Kwong, S.; Wang, X.; Zhang, Q. An evolutionary multi-objective optimization framework of discretization-based feature selection for classification. Swarm Evol. Comput. 2021, 60, 100770.
- Viegas, F.; Rocha, L.; Gonçalves, M.; Mourão, F.; Sá, G.; Salles, T.; Andrade, G.; Sandin, I. A genetic programming approach for feature selection in highly dimensional skewed data. Neurocomputing 2018, 273, 554–569.
- Cilibrasi, R.L.; Vitányi, P.M. A fast quartet tree heuristic for hierarchical clustering. Pattern Recogn. 2011, 44, 662–677.
- Pei, S.; Muchnik, L.; Andrade, J.S., Jr.; Zheng, Z.; Makse, H.A. Searching for superspreaders of information in real-world social media. Sci. Rep. 2014, 4, 5547.
- Shu, P.; Wang, W.; Tang, M.; Do, Y. Numerical identification of epidemic thresholds for susceptible-infected-recovered model on finite-size networks. Chaos 2015, 25, 063104.
- Zhang, F.; Kaufman, H.L.; Deng, Y.; Drabier, R. Recursive SVM biomarker selection for early detection of breast cancer in peripheral blood. BMC Med. Genom. 2013, 6, S4.
- Zhao, Z.; Wang, L.; Liu, H.; Ye, J. On similarity preserving feature selection. IEEE Trans. Knowl. Data Eng. 2011, 25, 619–632.
- Barabási, A.L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.
- Erdös, P.; Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 1960, 5, 17–60.
- Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’ networks. Nature 1998, 393, 440–442.
- Gleiser, P.M.; Danon, L. Community Structure in Jazz. Adv. Complex Syst. 2003, 6, 565–574.
- Yin, H.; Benson, A.R.; Leskovec, J.; Gleich, D.F. Local Higher-Order Graph Clustering. In Proceedings of the International Conference on Knowledge Discovery & Data Mining (KDD), Halifax, NS, Canada, 13–17 August 2017; pp. 555–564.
- Colizza, V.; Pastor-Satorras, R.; Vespignani, A. Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nat. Phys. 2007, 3, 276–282.
- Boguñá, M.; Pastor-Satorras, R.; Díaz-Guilera, A.; Arenas, A. Models of social networks based on social distance attachment. Phys. Rev. E 2004, 70, 056122.
- Sa-ngasoongsong, A.; Bukkapatnam, S.T. Variable Selection for Multivariate Cointegrated Time Series Prediction with PROC VARCLUS in SAS® Enterprise Miner™ 7.1. Available online: https://support.sas.com/resources/papers/proceedings12/340-2012.pdf (accessed on 22 April 2012).
- Szalay, K.Z.; Csermely, P. Perturbation Centrality and Turbine: A Novel Centrality Measure Obtained Using a Versatile Network Dynamics Tool. PLoS ONE 2013, 8, e78059.
- Centola, D. The spread of behavior in an online social network experiment. Science 2010, 329, 1194–1197.

Centrality | Definition | Formula |
---|---|---|
K (degree) | Counting the number of one-hop neighbors of a node | |
One-hop | Counting the degrees of one-hop neighbors of a node | |
Two-hop | Counting the degrees of two-hop neighbors of a node | |
K-shell | Resulting from K-shell decomposition [10] | - |
C (clustering) | Measuring the fraction of triangles around the node | |
B (betweenness) | Measuring the ability of a node to act as a bridge between pairs of nodes | |
Closeness | Averaging the shortest path lengths to other nodes | |
Eigenvector (EC) | Scoring a node by combining the centralities of its neighbors | |
PageRank (PR) | A variant of eigenvector centrality | |
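
To make the neighborhood-based definitions in the table concrete, here is a small dependency-free sketch that computes four of them (degree, one-hop, two-hop, and clustering) from an edge list. It is an illustration under assumptions: "two-hop neighbors" is read as nodes at distance exactly two, "counting the degrees" is read as summing them, and the function names, dictionary keys, and toy edge list are hypothetical. The path-based and spectral measures (betweenness, closeness, eigenvector, PageRank) are left to a graph library.

```python
from itertools import combinations

def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_features(adj):
    """Compute degree, one-hop, two-hop and clustering features for every node."""
    degree = {v: len(nb) for v, nb in adj.items()}
    features = {}
    for v, nb in adj.items():
        # Nodes reachable in two steps, excluding v and its direct neighbours.
        two_hop = set().union(*(adj[u] for u in nb)) - nb - {v}
        # Links among the neighbours of v, i.e. triangles through v.
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        pairs = degree[v] * (degree[v] - 1) / 2
        features[v] = {
            "K": degree[v],
            "one_hop": sum(degree[u] for u in nb),
            "two_hop": sum(degree[u] for u in two_hop),
            "C": links / pairs if pairs else 0.0,
        }
    return features

# Hypothetical toy graph: a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
feats = local_features(build_adjacency(edges))
for node in sorted(feats):
    print(node, feats[node])
```

Each node's dictionary corresponds to part of one row of the feature matrix; the remaining columns and the label would be appended in the same way.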

Node | K | One-hop | Two-hop | K-shell | C | B | Closeness | EC | PR | Label
---|---|---|---|---|---|---|---|---|---|---
1 | 6 | 52 | 568 | 3 | 0 | 793.44 | 0.2818 | 0.0124 | 0.0010 | −1
2 | 4 | 55 | 441 | 3 | 0.1667 | 322.06 | 0.2733 | 0.0108 | 0.0007 | −1
3 | 4 | 115 | 1060 | 3 | 0 | 463.69 | 0.3091 | 0.0278 | 0.0006 | 1
4 | 6 | 58 | 545 | 3 | 0 | 642.25 | 0.2807 | 0.0101 | 0.0010 | −1
… | … | … | … | … | … | … | … | … | … | …
1000 | 13 | 135 | 1268 | 3 | 0.0128 | 2592.87 | 0.3175 | 0.0332 | 0.0021 | 1

Method | | | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---
All | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
Imbalanced | 8 | 8 | 7 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
FFS-ReliefF | 3 | 3 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2
FFS-Weight | 3 | 3 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2
FFS-SFS | 3 | 3 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2
Selected by FFS-SFS | {two-hop, clustering, betweenness} | {one-hop, two-hop, PR} | {two-hop, PR} | {two-hop, PR} | {two-hop, clustering, betweenness} | {one-hop, two-hop, PR} | {one-hop, two-hop, PR} | {one-hop, two-hop, PR} | {two-hop, clustering, betweenness} | {degree, two-hop} | {degree, two-hop} | {degree, two-hop}

Method | | | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---
w/o SFS | 8 | 5 | 5 | 3 | 8 | 6 | 5 | 6 | 8 | 5 | 5 | 4
w/o FFS | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3
FFS-SFS | 3 | 3 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2
Selected by FFS-SFS | {two-hop, clustering, betweenness} | {one-hop, two-hop, PR} | {two-hop, PR} | {two-hop, PR} | {two-hop, clustering, betweenness} | {one-hop, two-hop, PR} | {one-hop, two-hop, PR} | {one-hop, two-hop, PR} | {two-hop, clustering, betweenness} | {degree, two-hop} | {degree, two-hop} | {degree, two-hop}

Method | | | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---
All | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9
Imbalanced | 9 | 9 | 9 | 2 | 9 | 2 | 9 | 3 | 9 | 9 | 9 | 3
FFS-ReliefF | 3 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 3 | 3 | 3 | 3
FFS-Weight | 3 | 2 | 2 | 2 | 3 | 1 | 1 | 2 | 3 | 3 | 3 | 3
FFS-SFS | 3 | 2 | 2 | 2 | 3 | 2 | 1 | 2 | 3 | 3 | 3 | 2
Selected by FFS-SFS | {k-shell, clustering, betweenness} | {degree, one-hop} | {degree, one-hop} | {degree, one-hop} | {two-hop, k-shell, clustering} | {two-hop, EC} | {EC} | {two-hop, EC} | {k-shell, clustering, EC} | {degree, one-hop, EC} | {degree, two-hop, closeness} | {degree, two-hop}

Method | | | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---
w/o SFS | 9 | 3 | 6 | 2 | 9 | 1 | 1 | 1 | 9 | 4 | 4 | 4
w/o FFS | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3
FFS-SFS | 3 | 2 | 2 | 2 | 3 | 2 | 1 | 2 | 3 | 3 | 3 | 2
Selected by FFS-SFS | {k-shell, clustering, betweenness} | {degree, one-hop} | {degree, one-hop} | {degree, one-hop} | {two-hop, k-shell, clustering} | {two-hop, EC} | {EC} | {two-hop, EC} | {k-shell, clustering, EC} | {degree, one-hop, EC} | {degree, two-hop, closeness} | {degree, two-hop}

Network | N | m | ⟨k⟩ | k_max | c | d
---|---|---|---|---|---|---
Jazz | 198 | 2742 | 27.697 | 100 | 0.6175 | 0.140593
 | 986 | 16,687 | 32.751 | 347 | 0.4071 | 0.034363
USairport | 1572 | 17,214 | 34.428 | 314 | 0.5048 | 0.013941
Pretty Good Privacy | 10,680 | 24,316 | 4.554 | 205 | 0.2659 | 0.000426
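
As a quick sanity check on the table, the standard formulas for an undirected simple graph, average degree 2m/N and density d = 2m/(N(N−1)), can be evaluated directly from N and m. The sketch below is an illustration rather than the authors' code; the function names are hypothetical, and it is checked here only against the Jazz and Pretty Good Privacy rows.

```python
def average_degree(n_nodes, n_edges):
    """Mean degree of an undirected graph: every edge contributes to two nodes."""
    return 2 * n_edges / n_nodes

def density(n_nodes, n_edges):
    """Fraction of the N(N-1)/2 possible undirected edges that are present."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

# (N, m) taken from the Jazz and Pretty Good Privacy rows of the table.
for name, n, m in [("Jazz", 198, 2742), ("Pretty Good Privacy", 10680, 24316)]:
    print(name, round(average_degree(n, m), 3), round(density(n, m), 6))
```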
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, X.; Han, Y.; Wang, B. A Two-Phase Feature Selection Method for Identifying Influential Spreaders of Disease Epidemics in Complex Networks. Entropy 2023, 25, 1068. https://doi.org/10.3390/e25071068