Enhanced Feature Subset Selection Using Niche Based Bat Algorithm
Abstract
1. Introduction
1.1. Motivation for Using Bat Algorithm for This Research
1.2. Paper Division
2. Background
2.1. Feature Subset Selection
2.1.1. Different Approaches for Feature Subset Selection
Filter Approach
Wrapper Approach
Embedded Approach
2.1.2. Optimization Problem
Single Objective
Multi-Objective
2.1.3. Scalarization
2.1.4. Multi-Objective Feature Subset Selection
- Increase classifier accuracy;
- Decrease number of features.
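The two objectives above are typically scalarized into a single fitness value, as in the weighted-sum fitness functions listed in the experiment tables, (w1 × k-NN accuracy) + (w2 × no. of features). A minimal sketch of such a scalarized fitness follows; the weight values are illustrative assumptions, and the feature-count term is normalized and inverted here (an assumption not spelled out in the tables) so that both terms reward larger values:

```python
def scalarized_fitness(accuracy, n_selected, n_total, w1=0.8, w2=0.2):
    """Weighted-sum scalarization of the two feature-selection objectives:
    maximize classifier accuracy, minimize the number of selected features.
    The feature count is normalized by the total and inverted so that
    subsets with fewer features score higher."""
    return w1 * accuracy + w2 * (1.0 - n_selected / n_total)
```

For example, a subset of 12 of the Ionosphere dataset's 34 features with 92% k-NN accuracy would score `scalarized_fitness(0.92, 12, 34)` ≈ 0.865 under these illustrative weights.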
2.1.5. Genetic Algorithm
2.1.6. Particle Swarm Optimization (PSO)
3. Literature Review
3.1. Fitness Sharing Method
3.2. Swarming Method
4. Proposed Methodology
4.1. Bat Algorithm
Algorithm 1: Bat Algorithm
1. Define the objective function f(x), x = (x1, …, xd).
2. Initialize the population of bats with random positions xi and velocities vi, i = 1, 2, …, m.
3. Define the pulse frequency fi at xi, ∀i = 1, 2, …, m.
4. Initialize the pulse rates ri and the loudness Ai, i = 1, 2, …, m.
5. While t < T
6. For every bat bi, generate new solutions using Equations (1), (2) and (3).
7. If rand > ri, then
8. Select a solution from among the best solutions.
9. Create a local solution close to the selected best solution.
10. End if
11. Generate a new solution by flying randomly.
12. If rand < Ai and f(xi) < f(x*), then
13. Accept the new solution.
14. Increase ri and reduce Ai.
15. End if
16. End for
17. Rank the bats and find the current best x*.
18. End while
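Algorithm 1 can be sketched in Python for a continuous objective. The loudness, pulse rate, and frequency range below mirror the parameter tables in the experiments (A = 0.9, r = 0.3, fmin = 0, fmax = 2); the search bounds, the local-walk step size, and keeping ri and Ai fixed (rather than increasing/decreasing them on acceptance, as in steps 14) are simplifying assumptions:

```python
import numpy as np

def bat_algorithm(objective, dim, n_bats=30, n_iter=100,
                  fmin=0.0, fmax=2.0, loudness=0.9, pulse_rate=0.3,
                  lower=-5.0, upper=5.0, seed=0):
    """Minimal continuous Bat Algorithm following Algorithm 1 (minimization).
    ri and Ai are held constant here for brevity."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, (n_bats, dim))   # bat positions
    v = np.zeros((n_bats, dim))                    # bat velocities
    fitness = np.array([objective(xi) for xi in x])
    best = x[np.argmin(fitness)].copy()
    best_f = float(fitness.min())
    for _ in range(n_iter):
        for i in range(n_bats):
            # frequency, velocity and position updates (Eqs. (1)-(3))
            f = fmin + (fmax - fmin) * rng.random()
            v[i] += (x[i] - best) * f
            cand = np.clip(x[i] + v[i], lower, upper)
            # local random walk around the current best (steps 7-9)
            if rng.random() > pulse_rate:
                cand = np.clip(best + 0.01 * rng.standard_normal(dim),
                               lower, upper)
            fc = objective(cand)
            # loudness-gated acceptance of improving solutions (steps 12-14)
            if rng.random() < loudness and fc < fitness[i]:
                x[i], fitness[i] = cand, fc
            if fc < best_f:
                best, best_f = cand.copy(), fc
    return best, best_f
```

On a simple sphere function, `bat_algorithm(lambda x: float(np.sum(x**2)), dim=2)` converges toward the origin, illustrating the interplay of the global frequency-driven move and the local walk around the best bat.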
4.2. How the Bat Algorithm Works
4.3. Bat Representation as a Solution for Feature Selection
4.4. Evaluation Function
4.5. Classifier Used
Classification Measurement
4.6. Niche Based Bat Algorithm (NBBA): The Proposed Algorithm
Algorithm 2: Niche Based Bat Algorithm
1. Create and initialize an nx-dimensional main bat population, S;
2. Repeat
3. Train the main population, S, for one iteration using the local random walk;
4. Update the fitness of each bat S.xi in the main population;
5. For each sub-population Sk do
6. Train the sub-population's bats, Sk.xi, using the main bat algorithm, which includes global search;
7. Update each bat's fitness;
8. Update the sub-population radius Sk.R;
9. End for
10. If possible, merge sub-populations;
11. Allow sub-populations to absorb bats from the main population that moved into the radius of any sub-population;
12. If possible, create new sub-populations;
13. Until the stopping condition is true;
14. Return the best solution Sk.ŷ of each sub-population Sk as a solution.
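The niche bookkeeping in steps 10-11 of Algorithm 2 (merging overlapping sub-populations, and absorbing main-population bats that wander into a niche's radius) can be sketched as below. The helper names, the Euclidean radius definition, and the hypersphere-overlap merge test are illustrative assumptions in the spirit of niching swarm methods, not the paper's exact rules:

```python
import numpy as np

class SubPopulation:
    """A niche Sk: its member solutions, center, and radius Sk.R."""
    def __init__(self, members):
        self.members = list(members)
        self.update_radius()

    def update_radius(self):
        # radius = distance from the center to the farthest member
        self.center = np.mean(self.members, axis=0)
        self.radius = max(np.linalg.norm(m - self.center)
                          for m in self.members)

    def best(self, objective):
        """The niche's own best solution (Sk.y-hat, step 14)."""
        return min(self.members, key=objective)

def merge(subpops):
    """Step 10: merge sub-populations whose hyperspheres overlap."""
    merged, used = [], set()
    for i, a in enumerate(subpops):
        if i in used:
            continue
        for j in range(i + 1, len(subpops)):
            b = subpops[j]
            if (j not in used and
                    np.linalg.norm(a.center - b.center) < a.radius + b.radius):
                a = SubPopulation(a.members + b.members)
                used.add(j)
        merged.append(a)
    return merged

def absorb(subpops, main_pop):
    """Step 11: move main-population bats lying inside a niche radius."""
    remaining = []
    for x in main_pop:
        for sp in subpops:
            if np.linalg.norm(x - sp.center) <= sp.radius:
                sp.members.append(x)
                sp.update_radius()
                break
        else:
            remaining.append(x)
    return subpops, remaining
```

Each outer iteration of Algorithm 2 would call `merge` and then `absorb`, so that distinct niches converge to different optima while the main population keeps exploring.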
4.7. How NBBA Works
4.8. Contribution
5. Experimentation and Results
5.1. Datasets
5.2. Results
6. Discussion
Comparative Analysis
7. Conclusions
8. Future Work
8.1. Advantages of NBBA
8.2. Limitations
Author Contributions
Funding
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| GA | Genetic Algorithm |
| PSO | Particle Swarm Optimization |
| BA | Bat Algorithm |
| MOGA | Multi-Objective Genetic Algorithm |
| MOPSO | Multi-Objective Particle Swarm Optimization |
| MOBA | Multi-Objective Bat Algorithm |
| k-NN | k-Nearest Neighbor |
| NBBA | Niche Based Bat Algorithm |
| ACO | Ant Colony Optimization |
References
- Gheyas, I.A.; Smith, L.S. Feature subset selection in large dimensionality domains. Pattern Recognit. 2010, 43, 5–13.
- Boopathi, V.; Subramaniyam, S.; Malik, A.; Lee, G.; Manavalan, B.; Yang, D.-C. mACPpred: A Support Vector Machine-Based Meta-Predictor for Identification of Anticancer Peptides. Int. J. Mol. Sci. 2019, 20, 1964.
- Chen, Y.-W.; Lin, C.-J. Combining SVMs with Various Feature Selection Strategies. In Feature Extraction; Springer: Berlin/Heidelberg, Germany, 2006; pp. 315–324.
- Manavalan, B.; Shin, T.H.; Kim, M.O.; Lee, G. PIP-EL: A New Ensemble Learning Method for Improved Proinflammatory Peptide Predictions. Front. Immunol. 2018, 9, 1783.
- Su, R.; Hu, J.; Zou, Q.; Manavalan, B.; Wei, L. Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief. Bioinform. 2019.
- Gulgezen, G.; Cataltepe, Z.; Yu, L. Stable feature selection using MRMR algorithm. In Proceedings of the 2009 IEEE 17th Signal Processing and Communications Applications Conference, Antalya, Turkey, 9–11 April 2009; pp. 596–599.
- Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
- Osei-Bryson, K.-M.; Giles, K.; Kositanurit, B. Exploration of a hybrid feature selection algorithm. J. Oper. Res. Soc. 2003, 54, 790–797.
- Soufan, O.; Kleftogiannis, D.; Kalnis, P.; Bajic, V.B. DWFS: A wrapper feature selection tool based on a parallel genetic algorithm. PLoS ONE 2015, 10, e0117988.
- Engelbrecht, A.P. Computational Intelligence: An Introduction; John Wiley & Sons: Hoboken, NJ, USA, 2007.
- Vafaie, H.; de Jong, K. Genetic algorithms as a tool for feature selection in machine learning. In Proceedings of the Fourth International Conference on Tools with Artificial Intelligence TAI '92, Arlington, VA, USA, 10–13 November 1992; pp. 200–203.
- Tu, C.; Chuang, L.; Chang, J.; Yang, C. Feature selection using PSO-SVM. IAENG Int. J. Comput. Sci. 2007, 33, 1–6.
- Sun, Y.; Gao, Y. A Multi-Objective Particle Swarm Optimization Algorithm Based on Gaussian Mutation and an Improved Learning Strategy. Mathematics 2019, 7, 148.
- K Nearest Neighbor. Available online: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm (accessed on 27 September 2018).
- Sareni, B.; Krähenbühl, L. Fitness sharing and niching methods revisited. IEEE Trans. Evol. Comput. 1998, 2, 97–106.
- Yang, X.-S. A New Metaheuristic Bat-Inspired Algorithm; Springer: Berlin/Heidelberg, Germany, 2010; pp. 65–74.
- Yang, X.-S. Bat Algorithm: Literature Review and Applications. Int. J. Bio-Inspired Comput. 2013.
- Musikapun, P.; Pongcharoen, P. Solving Multi-Stage Multi-Machine Multi-Product Scheduling Problem Using Bat Algorithm. In Proceedings of the 2nd International Conference on Management and Artificial Intelligence, Bangkok, Thailand, 7–8 April 2012; IACSIT Press: Singapore, 2012; Volume 35.
- Yadav, S.L.; Phogat, M. A Review on Bat Algorithm. Int. J. Comput. Sci. Eng. 2017, 5, 39–43.
- Taha, A.M.; Tang, A.Y.C. Bat Algorithm for Rough Set Attribute Reduction. J. Theor. Appl. Inf. Technol. 2013, 51, 10.
- An Introduction to Feature Selection. Available online: https://machinelearningmastery.com/an-introduction-to-feature-selection/ (accessed on 27 September 2018).
- Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324.
- Tsai, C.-F.; Dao, T.-K.; Yang, W.-J.; Nguyen, T.-T.; Pan, T.-S. Parallelized Bat Algorithm with a Communication Strategy; Springer: Cham, Switzerland, 2014; pp. 87–95.
- Taha, A.M.; Mustapha, A.; Chen, S.-D. Naive Bayes-guided bat algorithm for feature selection. Sci. World J. 2013, 2013, 325973.
- Xue, B.; Zhang, M.; Browne, W.N. Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach. IEEE Trans. Cybern. 2013, 43, 1656–1671.
- Narasimhan, B. Altered particle swarm optimization based attribute selection strategy with improved fuzzy Artificial Neural Network classifier for coronary artery heart disease risk prediction. Int. J. Adv. Res. Ideas Innov. Technol. 2019, 5, 1196–1203.
- Akinyelu, A.A. On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection. KSII Trans. Internet Inf. Syst. 2018, 12.
- Brits, R.; Engelbrecht, A.P.; van den Bergh, F. A niching particle swarm optimizer. In Proceedings of the Conference on Simulated Evolution and Learning, Singapore, 1 January 2002; pp. 692–696.
- Nakamura, R.Y.M.; Pereira, L.A.M.; Costa, K.A.; Rodrigues, D.; Papa, J.P.; Yang, X.-S. BBA: A Binary Bat Algorithm for Feature Selection. In Proceedings of the 25th SIBGRAPI Conference on Graphics, Patterns and Images, Ouro Preto, Brazil, 22–25 August 2012; pp. 291–297.
- Ayyad, S.M.; Saleh, A.I.; Labib, L.M. Gene expression cancer classification using modified k-nearest neighbors technique. BioSystems 2019, 176, 41–51.
- Raikwal, J.S.; Saxena, K. Performance Evaluation of SVM and k-nearest neighbor Algorithm over Medical Data set. Int. J. Comput. Appl. 2012, 50, 35–39.
- Gunavathi, K.P.C. Performance Analysis of Genetic Algorithm with kNN and SVM for Feature Selection in Tumor Classification. Int. J. Comput. Inf. Eng. 2014, 8, 1491–1497.
- Ma, J. Prediction of heart disease using k-nearest neighbor and particle swarm optimization. Biomed. Res. 2017, 28, 4154–4158.
- Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
Dataset Name | Total Samples | No. of Attributes | Attribute Characteristics | Train/Test Ratio | No. of Classes |
---|---|---|---|---|---|
Ionosphere | 351 | 34 | integer, real | 70/30 | 2 (Good/Bad) |
Sonar | 208 | 60 | real | 70/30 | 2 (Mine/Rock) |
Madelon | 4400 | 500 | real | 70/30 | 2 (+1/−1) |
Parameters | GA | MOGA | Single Objective Niche GA | Multi-Objective Niche GA |
---|---|---|---|---|
Particle representation | Bit string | Bit string | Bit string | Bit string |
Population | 30 | 50 | 50 | 50 |
Selection method | Tournament selection | Tournament selection | Tournament selection | Tournament selection |
Elitism | 0.3 | 0.3 | 0.3 | 0.3 |
Generation Gap | 0.7 | 0.7 | 0.7 | 0.7 |
Classifier | k-nearest neighbor (k-NN) | k-nearest neighbor | k-nearest neighbor | k-nearest neighbor |
Fitness function | k-NN accuracy | (w1 × k-NN accuracy) + (w2 × no. of features) | k-NN accuracy | (w1 × k-NN accuracy) + (w2 × no. of features) |
Parameter | PSO | MOPSO | Single Objective Niche PSO | Multi-Objective Niche PSO |
---|---|---|---|---|
Particle representation | Bit string | Bit string | Bit string | Bit string |
Swarm size | 30 | 30 | 50 | 50 |
Inertia Weight | 0.8 | 0.8 | 0.8 | 0.8 |
Classifier | k-nearest neighbor | k-nearest neighbor | k-nearest neighbor | k-nearest neighbor |
Fitness function | k-NN accuracy | (w1 × k-NN accuracy) + (w2 × no. of features) | k-NN accuracy | (w1 × k-NN accuracy) + (w2 × no. of features) |
Parameter | BA | MOBA | Single Objective NBBA | Multi-Objective NBBA |
---|---|---|---|---|
Particle representation | Bit string | Bit string | Bit string | Bit string |
Population | 30 | 30 | 50 | 50 |
Loudness | 0.9 | 0.9 | 0.9 | 0.9 |
Pulse rate | 0.3 | 0.3 | 0.3 | 0.3 |
Fmin | 0 | 0 | 0 | 0 |
Fmax | 2 | 2 | 2 | 2 |
Classifier | k-nearest neighbor | k-nearest neighbor | k-nearest neighbor | k-nearest neighbor |
Fitness function | k-NN accuracy | (w1 × k-NN accuracy) + (w2 × no. of features) | k-NN accuracy | (w1 × k-NN accuracy) + (w2 × no. of features) |
| | Ionosphere | Sonar | Madelon |
|---|---|---|---|
| Instances | 351 | 208 | 4400 |
| Features | 34 | 60 | 500 |
| Features selected: GA | 12 | 33 | 265 |
| Features selected: MOGA | 10 | 17 | 256 |
| Features selected: Single-objective Niche GA | 16 | 30 | 287 |
| Features selected: Multi-objective Niche GA | 8 | 15 | 196 |
| Accuracy: k-NN | 80.64% | 80.00% | 69.50% |
| Accuracy: GA + k-NN | 92.38% | 91.96% | 77.83% |
| Accuracy: MOGA + k-NN | 91.28% | 88.70% | 75.16% |
| Accuracy: Single-objective Niche GA + k-NN | 90.28% | 91.93% | 78.50% |
| Accuracy: Multi-objective Niche GA + k-NN | 93.33% | 91.93% | 79.16% |
| | Ionosphere | Sonar | Madelon |
|---|---|---|---|
| Instances | 351 | 208 | 4400 |
| Features | 34 | 60 | 500 |
| Features selected: PSO | 16 | 36 | 325 |
| Features selected: MOPSO | 9 | 21 | 312 |
| Features selected: Single-objective Niche PSO | 26 | 36 | 317 |
| Features selected: Multi-objective Niche PSO | 9 | 19 | 213 |
| Accuracy: k-NN | 80.64% | 80.00% | 69.50% |
| Accuracy: PSO + k-NN | 91.70% | 88.70% | 75.80% |
| Accuracy: MOPSO + k-NN | 89.52% | 91.93% | 76.00% |
| Accuracy: Single-objective Niche PSO + k-NN | 92.38% | 90.32% | 76.50% |
| Accuracy: Multi-objective Niche PSO + k-NN | 91.42% | 91.93% | 76.50% |
| | Ionosphere | Sonar | Madelon |
|---|---|---|---|
| Instances | 351 | 208 | 4400 |
| Features | 34 | 60 | 500 |
| Features selected: BA | 17 | 34 | 253 |
| Features selected: MOBA | 8 | 16 | 225 |
| Features selected: Single-objective NBBA | 19 | 34 | 248 |
| Features selected: Multi-objective NBBA | 6 | 13 | 178 |
| Accuracy: k-NN | 80.64% | 80.00% | 69.50% |
| Accuracy: BA + k-NN | 93.33% | 91.93% | 78.33% |
| Accuracy: MOBA + k-NN | 92.38% | 93.54% | 78.33% |
| Accuracy: Single-objective NBBA + k-NN | 91.70% | 93.54% | 79.16% |
| Accuracy: Multi-objective NBBA + k-NN | 93.33% | 95.16% | 80.16% |
| Dataset | GA Features | GA Accuracy | PSO Features | PSO Accuracy | BA Features | BA Accuracy |
|---|---|---|---|---|---|---|
| Ionosphere | 12 | 92.38% | 16 | 91.70% | 17 | 93.33% |
| Sonar | 33 | 91.93% | 36 | 88.70% | 34 | 91.93% |
| Madelon | 265 | 77.83% | 325 | 75.83% | 253 | 78.33% |
| Dataset | Niche GA Features | Niche GA Accuracy | Niche PSO Features | Niche PSO Accuracy | Niche BA Features | Niche BA Accuracy |
|---|---|---|---|---|---|---|
| Ionosphere | 16 | 90.28% | 26 | 92.38% | 19 | 91.70% |
| Sonar | 30 | 91.93% | 36 | 90.32% | 34 | 93.54% |
| Madelon | 287 | 78.50% | 317 | 76.50% | 248 | 79.16% |
| Dataset | MOGA Features | MOGA Accuracy | MOPSO Features | MOPSO Accuracy | MOBA Features | MOBA Accuracy |
|---|---|---|---|---|---|---|
| Ionosphere | 10 | 91.28% | 9 | 89.52% | 8 | 92.38% |
| Sonar | 17 | 88.70% | 21 | 91.93% | 16 | 93.54% |
| Madelon | 256 | 75.16% | 312 | 76.00% | 225 | 78.33% |
| Dataset | Niche GA Features | Niche GA Accuracy | Niche PSO Features | Niche PSO Accuracy | Niche BA Features | Niche BA Accuracy |
|---|---|---|---|---|---|---|
| Ionosphere | 8 | 93.33% | 9 | 91.42% | 6 | 93.33% |
| Sonar | 15 | 91.93% | 19 | 91.93% | 13 | 95.16% |
| Madelon | 196 | 79.16% | 213 | 76.50% | 178 | 80.16% |
| Algorithm | Ionosphere Features | Ionosphere Accuracy | Sonar Features | Sonar Accuracy | Madelon Features | Madelon Accuracy |
|---|---|---|---|---|---|---|
| GA | 12 | 92.38% | 33 | 91.93% | 265 | 77.83% |
| Niche GA | 16 | 90.28% | 30 | 91.93% | 287 | 78.50% |
| PSO | 16 | 91.70% | 36 | 88.70% | 325 | 75.83% |
| Niche PSO | 26 | 92.38% | 36 | 90.32% | 317 | 76.50% |
| BA | 17 | 93.33% | 34 | 91.93% | 253 | 78.33% |
| Niche BA | 19 | 91.70% | 34 | 93.54% | 248 | 79.16% |
| Algorithm | Ionosphere Features | Ionosphere Accuracy | Sonar Features | Sonar Accuracy | Madelon Features | Madelon Accuracy |
|---|---|---|---|---|---|---|
| MOGA | 10 | 91.28% | 17 | 88.70% | 256 | 75.16% |
| Multi-objective Niche GA | 8 | 93.33% | 15 | 91.93% | 196 | 79.16% |
| MOPSO | 9 | 89.52% | 21 | 91.93% | 312 | 76.00% |
| Multi-objective Niche PSO | 9 | 91.42% | 19 | 91.93% | 213 | 76.50% |
| MOBA | 8 | 92.38% | 16 | 93.54% | 225 | 78.33% |
| Multi-objective NBBA | 6 | 93.33% | 13 | 95.16% | 178 | 80.16% |
| Dataset | Generations | Population | NSGA-II Features | NSGA-II Accuracy | NBBA Features | NBBA Accuracy |
|---|---|---|---|---|---|---|
| Ionosphere | 100 | 20 | 2 | 96.59 | 6 | 93.33 |
| Ionosphere | 50 | 16 | 2 | 96.59 | – | – |
| Sonar | 100 | 20 | 6 | 98.07 | 13 | 95.16 |
| Sonar | 50 | 16 | 12 | 100 | – | – |
| Madelon | 100 | 20 | 149 | 86 | 178 | 80.16 |
| Madelon | 500 | 20 | 123 | 89.4 | – | – |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Saleem, N.; Zafar, K.; Sabzwari, A.F. Enhanced Feature Subset Selection Using Niche Based Bat Algorithm. Computation 2019, 7, 49. https://doi.org/10.3390/computation7030049