Analyzing Physics-Inspired Metaheuristic Algorithms in Feature Selection with K-Nearest-Neighbor
Abstract
1. Introduction
- The main novelty of this paper lies in its comparative analysis of six well-cited physics-inspired metaphor algorithms for feature selection.
- To the best of our knowledge, this is the first time these physics-inspired algorithms have been compared on this specific problem, and our findings provide valuable insights into their relative performance.
- Our study also has broader implications for machine learning and data mining, as it sheds light on the effectiveness of different optimization algorithms for feature selection.
- Our work contributes to the growing body of research on metaheuristics and highlights the potential value of physics-inspired optimization algorithms in machine learning and data mining.
- Additionally, our use of classification datasets of widely varying size allows us to assess the applicability of these algorithms across a broad range of problems, making our results more generalizable and useful to practitioners.
2. Methodology
2.1. Wrapper Method for Feature Selection
2.2. Fitness Function
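Since the fitness expression itself is not reproduced in this outline, the following is a minimal Python sketch of the standard wrapper fitness used with K-NN in this literature, assuming the usual weighted sum of validation error and selected-feature ratio; the weight `alpha = 0.99` is a common convention in comparable studies, not a value confirmed by this paper. The number of neighbors and validation split follow the settings in Section 3.2.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def wrapper_fitness(mask, X, y, alpha=0.99, k=5, val_ratio=0.2, seed=0):
    """Score a binary feature mask: weighted sum of K-NN validation error
    and the fraction of features kept (lower is better)."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():                       # an empty subset cannot classify
        return 1.0                           # worst possible fitness
    X_train, X_val, y_train, y_val = train_test_split(
        X[:, mask], y, test_size=val_ratio, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error = 1.0 - knn.score(X_val, y_val)    # misclassification rate
    return alpha * error + (1.0 - alpha) * mask.sum() / mask.size
```

Each metaheuristic then minimizes this fitness over binary masks, which is why the fitness values reported in Section 3.3.1 track both classification accuracy and the number of selected features.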
2.3. Physics-Inspired Metaphor Algorithms
2.3.1. Simulated Annealing
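The defining rule of simulated annealing [30] is the Metropolis acceptance criterion; the settings in Section 3.2 (cooling rate 0.93, initial temperature 100) are consistent with the common geometric cooling schedule, which we assume here since the schedule is not stated explicitly:

$$P(\text{accept worse move}) = \exp\!\left(-\frac{\Delta f}{T}\right), \qquad T \leftarrow 0.93\,T,$$

where $\Delta f \ge 0$ is the fitness deterioration of the candidate feature subset and $T$ is the current temperature.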
2.3.2. Gravitational Search Algorithm
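As defined in the original GSA paper [17], each agent attracts every other agent with a gravitational force proportional to their fitness-derived masses, and the gravitational constant decays over the run (Section 3.2 uses $G_0 = 100$ and $\alpha = 20$):

$$F_{ij}^{d}(t) = G(t)\,\frac{M_i(t)\,M_j(t)}{R_{ij}(t)+\varepsilon}\,\bigl(x_j^{d}(t)-x_i^{d}(t)\bigr), \qquad G(t) = G_0\,e^{-\alpha t/T_{\max}},$$

where $R_{ij}$ is the Euclidean distance between agents $i$ and $j$ and $\varepsilon$ is a small constant; the resulting accelerations $a_i^{d} = F_i^{d}/M_i$ drive the velocity and position updates.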
2.3.3. Sine Cosine Algorithm
- $r_2$ is randomly generated in the range of 0 to $2\pi$.
- $r_3$ is also a random number, generated in the range of 0 to 2.
- $r_4$ is also a random number, generated in the range of 0 to 1; based on its value, it is decided whether the sine function or the cosine function is used in updating the position of the current solution, as shown in the update rule below.
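These random components enter the position update of the Sine Cosine Algorithm as defined in the original paper [31]:

$$
X_i^{t+1} =
\begin{cases}
X_i^{t} + r_1 \sin(r_2)\,\bigl|r_3 P^{t} - X_i^{t}\bigr|, & r_4 < 0.5,\\
X_i^{t} + r_1 \cos(r_2)\,\bigl|r_3 P^{t} - X_i^{t}\bigr|, & r_4 \ge 0.5,
\end{cases}
$$

where $P^{t}$ is the best solution found so far and $r_1 = a - t\,(a/T_{\max})$ decreases linearly over the run (with the constant $a = 2$ from Section 3.2).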
2.3.4. Atom Search Optimization
2.3.5. Henry Gas Solubility Optimization
“At a constant temperature, the amount of a given gas that dissolves in a given type and volume of liquid is directly proportional to the partial pressure of that gas in equilibrium with that liquid”.
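In symbols, this is Henry's law,

$$S_g = H \times P_g,$$

where $S_g$ is the solubility of the gas, $P_g$ its partial pressure, and $H$ the temperature-dependent Henry's constant; HGSO [33] updates candidate solutions by analogy with how solubility rises and falls with pressure and temperature, balancing exploration and exploitation.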
2.3.6. Equilibrium Optimizer (EO)
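In the notation of the original EO paper [34], each particle's concentration vector is pulled toward a randomly chosen member of the equilibrium pool:

$$\vec{C}(t+1) = \vec{C}_{eq} + \bigl(\vec{C}(t)-\vec{C}_{eq}\bigr)\,\vec{F} + \frac{\vec{G}}{\vec{\lambda}\,V}\,\bigl(1-\vec{F}\bigr),$$

where $\vec{F} = a_1\,\mathrm{sign}(\vec{r}-0.5)\,\bigl(e^{-\vec{\lambda} t}-1\bigr)$ is the exponential term controlling exploration, $\vec{G}$ is the generation rate, and $a_1 = 2$, $a_2 = 1$, $GP = 0.5$, and $V = 1$ follow the settings in Section 3.2.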
3. Results and Discussion
3.1. Datasets
3.2. Parameter Settings
3.3. Performance Evaluation
3.3.1. Fitness Comparison
3.3.2. Comparison of Classification Accuracy
3.3.3. Convergence Analysis
3.3.4. Overall Performance Analysis
3.4. Comparison with Other Methods from the Literature
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Köppen, M. The curse of dimensionality. In Proceedings of the 5th Online World Conference on Soft Computing in Industrial Applications (WSC5), Online, 4–18 September 2000; Volume 1, pp. 4–8.
2. Ikotun, A.M.; Almutari, M.S.; Ezugwu, A.E. K-Means-Based Nature-Inspired Metaheuristic Algorithms for Automatic Data Clustering Problems: Recent Advances and Future Directions. Appl. Sci. 2021, 11, 11246.
3. Khalid, S.; Khalil, T.; Nasreen, S. A survey of feature selection and feature extraction techniques in machine learning. In Proceedings of the Science and Information Conference (SAI), London, UK, 27–29 August 2014; pp. 372–378.
4. Porkodi, R. Comparison of filter based feature selection algorithms: An overview. Int. J. Innov. Res. Technol. Sci. 2014, 2, 108–113.
5. Jović, A.; Brkić, K.; Bogunović, N. A review of feature selection methods with applications. In Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 25–29 May 2015; pp. 1200–1205.
6. Brezočnik, L.; Fister, I.; Podgorelec, V. Swarm Intelligence Algorithms for Feature Selection: A Review. Appl. Sci. 2018, 8, 1521.
7. Askari, Q.; Saeed, M.; Younas, I. Heap-based optimizer inspired by corporate rank hierarchy for global optimization. Expert Syst. Appl. 2020, 161, 113702.
8. Rahman, A.; Sokkalingam, R.; Othman, M.; Biswas, K.; Abdullah, L.; Kadir, E.A. Nature-Inspired Metaheuristic Techniques for Combinatorial Optimization Problems: Overview and Recent Advances. Mathematics 2021, 9, 2633.
9. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
10. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
11. Passino, K.M. Bacterial foraging optimization. Int. J. Swarm Intell. Res. (IJSIR) 2010, 1, 1–16.
12. Whitley, D. A genetic algorithm tutorial. Stat. Comput. 1994, 4, 65–85.
13. Storn, R. On the usage of differential evolution for function optimization. In Proceedings of the North American Fuzzy Information Processing, Berkeley, CA, USA, 19–22 June 1996; pp. 519–523.
14. Simon, D. Biogeography-based optimization. IEEE Trans. Evol. Comput. 2008, 12, 702–713.
15. Askari, Q.; Younas, I.; Saeed, M. Political Optimizer: A novel socio-inspired meta-heuristic for global optimization. Knowl.-Based Syst. 2020, 195, 105709.
16. Fadakar, E.; Ebrahimi, M. A new metaheuristic football game inspired algorithm. In Proceedings of the 2016 1st Conference on Swarm Intelligence and Evolutionary Computation (CSIEC), Bam, Iran, 9–11 March 2016; pp. 6–11.
17. Rashedi, E.; Nezamabadi-Pour, H.; Saryazdi, S. GSA: A Gravitational Search Algorithm. Inf. Sci. 2009, 179, 2232–2248.
18. Hatamlou, A. Black hole: A new heuristic optimization approach for data clustering. Inf. Sci. 2013, 222, 175–184.
19. Zerigat, D.H.; Benasla, L.; Belmadani, A.; Rahli, M. Galaxy-based search algorithm to solve combined economic and emission dispatch. UPB Sci. Bull. Ser. C Electr. Eng. 2014, 76, 209–220.
20. Abualigah, L.M.; Khader, A.T.; Hanandeh, E.S. A new feature selection method to improve the document clustering using particle swarm optimization algorithm. J. Comput. Sci. 2018, 25, 456–466.
21. Zakeri, A.; Hokmabadi, A. Efficient feature selection method using real-valued grasshopper optimization algorithm. Expert Syst. Appl. 2019, 119, 61–72.
22. Mafarja, M.M.; Mirjalili, S. Hybrid Whale Optimization Algorithm with simulated annealing for feature selection. Neurocomputing 2017, 260, 302–312.
23. Vijayanand, R.; Devaraj, D. A Novel Feature Selection Method Using Whale Optimization Algorithm and Genetic Operators for Intrusion Detection System in Wireless Mesh Network. IEEE Access 2020, 8, 56847–56854.
24. Kelidari, M.; Hamidzadeh, J. Feature selection by using chaotic cuckoo optimization algorithm with levy flight, opposition-based learning and disruption operator. Soft Comput. 2021, 25, 2911–2933.
25. Zawbaa, H.M.; Emary, E.; Parv, B.; Sharawi, M. Feature selection approach based on moth-flame optimization algorithm. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016; pp. 4612–4617.
26. Selvakumar, B.; Muneeswaran, K. Firefly algorithm based feature selection for network intrusion detection. Comput. Secur. 2019, 81, 148–155.
27. Abdel-Basset, M.; Ding, W.; El-Shahat, D. A hybrid Harris Hawks optimization algorithm with simulated annealing for feature selection. Artif. Intell. Rev. 2021, 54, 593–637.
28. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82.
29. Too, J.; Liang, G.; Chen, H. Memory-based Harris hawk optimization with learning agents: A feature selection approach. Eng. Comput. 2021, 38, 4457–4478.
30. Bertsimas, D.; Tsitsiklis, J. Simulated annealing. Stat. Sci. 1993, 8, 10–15.
31. Mirjalili, S. SCA: A Sine Cosine Algorithm for solving optimization problems. Knowl.-Based Syst. 2016, 96, 120–133.
32. Zhao, W.; Wang, L.; Zhang, Z. Atom search optimization and its application to solve a hydrogeologic parameter estimation problem. Knowl.-Based Syst. 2019, 163, 283–304.
33. Hashim, F.A.; Houssein, E.H.; Mabrouk, M.S.; Al-Atabany, W.; Mirjalili, S. Henry gas solubility optimization: A novel physics-based algorithm. Future Gener. Comput. Syst. 2019, 101, 646–667.
34. Faramarzi, A.; Heidarinejad, M.; Stephens, B.; Mirjalili, S. Equilibrium optimizer: A novel optimization algorithm. Knowl.-Based Syst. 2020, 191, 105190.
35. Conrads, T.P.; Fusaro, V.A.; Ross, S.; Johann, D.; Rajapakse, V.; Hitt, B.A.; Steinberg, S.M.; Kohn, E.C.; Fishman, D.A.; Whitely, G.; et al. High-resolution serum proteomic features for ovarian cancer detection. Endocr.-Relat. Cancer 2004, 11, 163–178.
36. Street, W.N.; Wolberg, W.H.; Mangasarian, O.L. Nuclear feature extraction for breast tumor diagnosis. In Proceedings of the IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, San Jose, CA, USA, 31 January–5 February 1993; Volume 1905, pp. 861–870.
37. Elminaam, D.S.A.; Nabil, A.; Ibraheem, S.A.; Houssein, E.H. An Efficient Marine Predators Algorithm for Feature Selection. IEEE Access 2021, 9, 60136–60153.
38. Ibrahim, S.; Nazir, S.; Velastin, S.A. Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis. J. Imaging 2021, 7, 225.
Table 1. Datasets used in the experiments.

| Dataset | Symbol | Number of Instances | Number of Features | Source |
|---|---|---|---|---|
| Breast cancer | DS1 | 569 | 30 | https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic), accessed on 12 September 2022 |
| German | DS2 | 1000 | 24 | https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data), accessed on 12 September 2022 |
| Heart | DS3 | 303 | 13 | https://archive.ics.uci.edu/ml/datasets/heart+Disease, accessed on 12 September 2022 |
| Ionosphere | DS4 | 351 | 34 | https://archive.ics.uci.edu/ml/datasets/ionosphere, accessed on 12 September 2022 |
| Ovarian cancer | DS5 | 216 | 4000 | Conrads et al. [35] |
| Sonar | DS6 | 208 | 60 | https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks), accessed on 12 September 2022 |
Table 2. Parameter settings of the algorithms. (Parameter symbols lost in extraction have been restored from the standard formulations of each algorithm.)

| Algorithm | Parameter | Value |
|---|---|---|
| Common for all algorithms | Number of neighbors in K-NN (k) | 5 |
| | Maximum iterations (T_max) | 200 |
| | Number of search agents (N) | 30 |
| | Number of independent runs | 20 |
| | Ratio of validation data | 0.2 |
| Simulated Annealing | Cooling rate (α) | 0.93 |
| | Initial temperature (T_0) | 100 |
| Gravitational Search Algorithm | Initial gravitational constant (G_0) | 100 |
| | Constant (α) | 20 |
| Sine Cosine Algorithm | Constant (a) | 2 |
| Atom Search Optimization | Depth weight (α) | 50 |
| | Multiplier weight (β) | 0.2 |
| Henry Gas Solubility Optimization | Number of gas types | 2 |
| | Constant (K) | 1 |
| | Influence of other gas (α) | 1 |
| | Constant (β) | 1 |
| | l1 | 0.05 |
| | l2 | 100 |
| | l3 | 0.01 |
| Equilibrium Optimizer | a1 | 2 |
| | a2 | 1 |
| | Generation probability (GP) | 0.5 |
| | Volume (V) | 1 |
Table 3. Fitness comparison of the six algorithms (lower is better).

| Dataset | SA | GSA | SCA | ASO | HGSO | EO |
|---|---|---|---|---|---|---|
| DS1 | 0.01869 | 0.01758 | 0.0198 | 0.01157 | 0.0198 | 0.01157 |
| DS2 | 0.22692 | 0.20258 | 0.20092 | 0.19268 | 0.19888 | 0.18153 |
| DS3 | 0.13662 | 0.08635 | 0.10285 | 0.08712 | 0.11781 | 0.06985 |
| DS4 | 0.08574 | 0.04419 | 0.02946 | 0.05951 | 0.01532 | 0.01591 |
| DS5 | 0.02798 | 0.00453 | 0.00001 | 0.00407 | 0.00001 | 0.00001 |
| DS6 | 0.07777 | 0.00333 | 0.02515 | 0.02665 | 0.04863 | 0.02515 |
Table 4. Classification accuracy comparison of the six algorithms.

| Dataset | SA | GSA | SCA | ASO | HGSO | EO |
|---|---|---|---|---|---|---|
| DS1 | 0.98561 | 0.98561 | 0.98561 | 0.99281 | 0.98561 | 0.99511 |
| DS2 | 0.775 | 0.8 | 0.8 | 0.81 | 0.805 | 0.82 |
| DS3 | 0.86667 | 0.91667 | 0.9 | 0.91667 | 0.88333 | 0.93333 |
| DS4 | 0.91429 | 0.95714 | 0.97143 | 0.94286 | 0.98571 | 0.98571 |
| DS5 | 0.97674 | 1 | 1 | 1 | 1 | 1 |
| DS6 | 0.92683 | 1 | 0.97561 | 0.97561 | 0.95122 | 0.97561 |
Table 5. Average number of selected features (AFS) ± standard deviation.

| Dataset | SA | GSA | SCA | ASO | HGSO | EO |
|---|---|---|---|---|---|---|
| DS1 | 5.4 ± 2.07364 | 3.6 ± 0.54772 | 4.8 ± 0.44721 | 4.2 ± 0.83666 | 3.8 ± 0.83666 | 4.8 ± 0.83666 |
| DS2 | 9.6 ± 1.81659 | 10.2 ± 2.16795 | 6 ± 1.41421 | 11.8 ± 1.30384 | 9.4 ± 3.91152 | 8.4 ± 1.14018 |
| DS3 | 4.4 ± 1.67332 | 4.4 ± 0.54772 | 4 ± 1.41421 | 4.8 ± 1.09545 | 3.4 ± 0.89443 | 4.2 ± 0.83666 |
| DS4 | 11.6 ± 4.92950 | 10 ± 2.91548 | 3.4 ± 0.89443 | 10.6 ± 3.43511 | 3.6 ± 1.14018 | 5 ± 0.70711 |
| DS5 | 1986.6 ± 37.35371 | 1891.8 ± 73.07667 | 18 ± 28.53945 | 1682.6 ± 107.37225 | 77.2 ± 60.95654 | 3.4 ± 0.54772 |
| DS6 | 25.8 ± 5.84808 | 23.8 ± 4.14729 | 8 ± 2.91548 | 17.8 ± 3.11448 | 7.8 ± 5.01996 | 11.4 ± 4.15933 |
Table 6. Ranks of the algorithms by average fitness, average accuracy, and average number of selected features (AFS).

| Dataset | Stats | SA | GSA | SCA | ASO | HGSO | EO |
|---|---|---|---|---|---|---|---|
| DS1 | Avg. fitness rank | 5 | 3 | 6 | 1 | 4 | 2 |
| | Avg. accuracy rank | 5 | 3 | 6 | 1 | 4 | 2 |
| | AFS rank | 6 | 1 | 4 | 3 | 2 | 4 |
| DS2 | Avg. fitness rank | 6 | 5 | 3 | 2 | 4 | 1 |
| | Avg. accuracy rank | 6 | 5 | 3 | 2 | 4 | 1 |
| | AFS rank | 4 | 5 | 1 | 6 | 3 | 2 |
| DS3 | Avg. fitness rank | 6 | 1 | 5 | 3 | 4 | 2 |
| | Avg. accuracy rank | 6 | 1 | 5 | 3 | 4 | 2 |
| | AFS rank | 4 | 4 | 2 | 6 | 1 | 3 |
| DS4 | Avg. fitness rank | 6 | 4 | 3 | 5 | 2 | 1 |
| | Avg. accuracy rank | 5 | 4 | 3 | 5 | 2 | 1 |
| | AFS rank | 6 | 4 | 1 | 5 | 2 | 3 |
| DS5 | Avg. fitness rank | 6 | 5 | 1 | 4 | 3 | 1 |
| | Avg. accuracy rank | 6 | 5 | 1 | 4 | 3 | 1 |
| | AFS rank | 6 | 5 | 2 | 4 | 3 | 1 |
| DS6 | Avg. fitness rank | 6 | 3 | 4 | 1 | 5 | 2 |
| | Avg. accuracy rank | 6 | 3 | 4 | 1 | 5 | 2 |
| | AFS rank | 6 | 5 | 2 | 4 | 1 | 3 |
| Avg. Rank | | 5.61 | 3.66 | 3.11 | 3.33 | 3.11 | 1.88 |
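Each overall average rank is the mean of an algorithm's 18 individual ranks (three metrics across six datasets). For example, SA's 18 ranks sum to 101, giving 101/18 ≈ 5.61, while EO's sum of 34 gives 34/18 ≈ 1.89, making EO the best performer overall.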
Table 7. Comparison of classification accuracy with methods from the literature [37,38]. The % improvement column gives the margin by which the best solution for each dataset exceeds the listed method.

| Method | Breast Cancer | % Improvement | Ionosphere | % Improvement | Sonar | % Improvement |
|---|---|---|---|---|---|---|
| EO | 0.995 | Best Solution | 0.986 | Best Solution | 0.976 | 2.46% |
| SCA | 0.986 | 0.91% | 0.971 | 1.54% | 0.976 | 2.46% |
| HGSO | 0.986 | 0.91% | 0.986 | Best Solution | 0.951 | 5.15% |
| GWO [37] | 0.970 | 2.58% | 0.951 | 3.68% | 0.970 | 3.09% |
| MFO [37] | 0.605 | 64.46% | 0.774 | 27.39% | 0.547 | 82.82% |
| WOA [37] | 0.973 | 2.26% | 0.957 | 3.03% | 0.976 | 2.46% |
| SSA [37] | 0.982 | 1.32% | 0.985 | 0.10% | 1.000 | Best Solution |
| BOA [37] | 0.903 | 10.19% | 0.901 | 9.43% | 0.881 | 13.51% |
| HHO [37] | 0.929 | 7.10% | 0.929 | 6.14% | 0.833 | 20.05% |
| MPA [37] | 0.982 | 1.32% | 0.985 | 0.10% | 0.976 | 2.46% |
| Naive Bayes [38] | 0.845 | 17.75% | - | - | - | - |
| Logistic Regression [38] | 0.879 | 13.20% | - | - | - | - |
| Random Forest [38] | 0.995 | Best Solution | - | - | - | - |
| SVM [38] | 0.620 | 60.48% | - | - | - | - |
| K-NN [38] | 0.900 | 10.56% | - | - | - | - |
| Decision Tree [38] | 0.880 | 13.07% | - | - | - | - |
| SGD [38] | 0.903 | 10.19% | - | - | - | - |
| PCA-Naive Bayes [38] | 0.975 | 2.05% | - | - | - | - |
| PCA-Logistic Regression [38] | 0.975 | 2.05% | - | - | - | - |
| PCA-Random Forest [38] | 0.962 | 3.43% | - | - | - | - |
| PCA-SVM [38] | 0.942 | 5.63% | - | - | - | - |
| PCA-K-NN [38] | 0.921 | 8.03% | - | - | - | - |
| PCA-Decision Tree [38] | 0.905 | 9.94% | - | - | - | - |
| PCA-SGD [38] | 0.916 | 8.62% | - | - | - | - |
Table 8. Comparison of the average number of selected features with methods from the literature [37]. The % feature reduction is measured against the full feature set of each dataset.

| Method | Breast Cancer | % Feature Reduction | Ionosphere | % Feature Reduction | Sonar | % Feature Reduction |
|---|---|---|---|---|---|---|
| EO | 4.8 | 84% | 5 | 85% | 11.4 | 81% |
| SCA | 4.8 | 84% | 3.4 | 90% | 8 | 87% |
| HGSO | 3.8 | 87% | 3.6 | 89% | 7.8 | 87% |
| GWO [37] | 7 | 77% | 4 | 88% | 11 | 82% |
| MFO [37] | 6 | 80% | 23 | 32% | 31 | 48% |
| WOA [37] | 8 | 73% | 7 | 79% | 26 | 57% |
| SSA [37] | 11 | 63% | 14 | 59% | 16 | 73% |
| BOA [37] | 12 | 60% | 20 | 41% | 26 | 57% |
| HHO [37] | 12 | 60% | 10 | 71% | 20 | 67% |
| MPA [37] | 12 | 60% | 6 | 82% | 8 | 87% |