A Feature Selection Algorithm Performance Metric for Comparative Analysis
Abstract
1. Introduction
2. Background and Related Work
2.1. Feature Selection
2.2. Approaches to Evaluating Feature Selection Algorithm Performance
2.3. Algorithm Selection
3. Fitness Function
4. Baseline Fitness Improvement
5. Fitness and BFI Robustness
- Test Case A. One irrelevant feature is selected, e.g., F4. This case should have a lower fitness than all other test cases: no relevant feature is selected, so performance is expected to be poor.
- Test Case B. All features are selected. This case should have a fitness greater than test case A, because a solution with discriminatory information should perform better than a solution with none.
- Test Case C. One relevant feature is selected, e.g., F2 or F3. This case should have the highest fitness of all test cases. It is the ideal solution, in which a single feature classifies perfectly.
- Test Case D. Two relevant features, one of which is redundant, are selected, e.g., F2 and F3. This case should have a fitness greater than test case A but lower than test case C. Both features are correlated with the class, so it is expected to perform better than test case A, which has no relevant features. Because a redundant feature is included, it should score lower than the best-case scenario in test case C.
- Test Case E. One irrelevant feature and one relevant feature are selected, e.g., F4 and F2. This case should have a fitness greater than test cases B and D but lower than test case C. It should be better than test case B because it uses fewer features. It contains a feature that is strongly correlated with the class and should therefore perform better than a solution with no relevant features, as in test case A. It selects one more feature than test case C and should therefore be penalised more.
- Test Case F. A mix of one relevant, one redundant and one irrelevant feature is selected, e.g., F2, F3 and F4. This case should have a fitness better than test case B and worse than test case E. It selects one feature fewer than test case B and is therefore expected to perform better than test case B. It selects more features than test case E and should therefore perform worse than test case E.
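The expected orderings in test cases A to F can be written as executable checks. The sketch below is illustrative only: the function name `check_expectations` and the fitness values passed to it are hypothetical and are not results from this study.

```python
def check_expectations(f):
    """Return, per test case, whether its expected fitness ordering holds.

    `f` maps each test case label ("A".."F") to a fitness value.
    """
    def others(case):
        return [value for label, value in f.items() if label != case]

    return {
        # A: lower than every other test case.
        "A": all(f["A"] < value for value in others("A")),
        # B: greater than A.
        "B": f["B"] > f["A"],
        # C: highest of all test cases.
        "C": all(f["C"] > value for value in others("C")),
        # D: greater than A, lower than C.
        "D": f["A"] < f["D"] < f["C"],
        # E: greater than B and D, lower than C.
        "E": max(f["B"], f["D"]) < f["E"] < f["C"],
        # F: better than B, worse than E.
        "F": f["B"] < f["F"] < f["E"],
    }


# Hypothetical fitness values, chosen only to exercise the checks.
example = {"A": 0.10, "B": 0.40, "C": 1.00, "D": 0.80, "E": 0.85, "F": 0.60}
for case, passed in check_expectations(example).items():
    print(case, passed)
```

A fitness function that ranks the all-features solution (test case B) below the irrelevant-feature solution (test case A) fails the checks for both A and B, which is the kind of failure these test cases are designed to expose.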
6. Materials and Methods
6.1. Datasets
6.2. Feature Selection Algorithms
- Population size: 20
- Number of generations: 20
- Crossover probability: 0.6
- Mutation probability: 0.033
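The four parameter values above fix only part of a genetic algorithm. The sketch below wires them into a minimal generational GA over feature bitmasks. Everything else is an assumption made for illustration and is not taken from this study: binary tournament selection, one-point crossover, per-bit mutation, retention of the best individual seen, and the `toy_fitness` function, which simply rewards two hypothetical relevant features and penalises subset size.

```python
import random

POP_SIZE = 20        # population size
GENERATIONS = 20     # number of generations
P_CROSSOVER = 0.6    # crossover probability
P_MUTATION = 0.033   # mutation probability (applied per bit here, an assumption)


def toy_fitness(mask):
    """Hypothetical stand-in fitness: features 0 and 1 are 'relevant'."""
    relevant = mask[0] + mask[1]
    return relevant / 2 - 0.05 * sum(mask)


def tournament(pop, fitness, rng, k=2):
    """Pick the fitter of k randomly drawn individuals (assumed operator)."""
    return max(rng.sample(pop, k), key=fitness)


def run_ga(n_features, fitness, seed=0):
    """Evolve feature bitmasks and return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(POP_SIZE)]
    best = max(pop, key=fitness)
    for _ in range(GENERATIONS):
        nxt = []
        while len(nxt) < POP_SIZE:
            # Copy the parents so the previous generation is never mutated.
            a = tournament(pop, fitness, rng)[:]
            b = tournament(pop, fitness, rng)[:]
            if rng.random() < P_CROSSOVER:
                cut = rng.randrange(1, n_features)  # one-point crossover
                a[cut:], b[cut:] = b[cut:], a[cut:]
            for child in (a, b):
                for i in range(n_features):
                    if rng.random() < P_MUTATION:
                        child[i] ^= 1  # flip the bit
                nxt.append(child)
        pop = nxt[:POP_SIZE]
        best = max(pop + [best], key=fitness)
    return best, fitness(best)


best_mask, best_fit = run_ga(8, toy_fitness)
print(best_mask, best_fit)
```

In a wrapper setting, `toy_fitness` would be replaced by a function that trains and evaluates a classifier on the selected features, which is where most of the running time would go.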
7. Results
- Comparison between two stochastic algorithms: A Mann–Whitney U test is conducted to determine if the BFI sample distributions are equal. If the null hypothesis is rejected, the BFI median values are used to determine which algorithm performed better. Should the null hypothesis not be rejected, the algorithms are assigned the same rank.
- Comparison between a stochastic and deterministic algorithm: The BFI sample median for the stochastic algorithm is compared with the single BFI value of the deterministic algorithm. If the BFI value of the deterministic algorithm falls within the interquartile range of the stochastic algorithm BFI sample, then the algorithms are assigned the same rank.
- Comparison between two deterministic algorithms: The single BFI values are simply compared. Algorithms with the same BFI value are assigned the same rank.
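The three comparison rules above can be sketched as pairwise functions that return 1 if the first algorithm ranks better, -1 if it ranks worse, and 0 if both share a rank. This is a sketch under stated assumptions rather than the procedure used in the study: the return convention, the function names and the 0.05 significance level are hypothetical, and the Mann–Whitney U p-value uses a plain normal approximation without tie or continuity correction so that the example needs only the standard library.

```python
from statistics import NormalDist, median, quantiles


def mann_whitney_p(x, y):
    """Two-sided p-value for the Mann-Whitney U test (normal approximation)."""
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def compare_stochastic(bfi_a, bfi_b, alpha=0.05):
    """Two stochastic algorithms: U test first, then medians if rejected."""
    if mann_whitney_p(bfi_a, bfi_b) >= alpha:
        return 0  # null hypothesis not rejected: same rank
    med_a, med_b = median(bfi_a), median(bfi_b)
    return (med_a > med_b) - (med_a < med_b)


def compare_mixed(bfi_sample, bfi_value):
    """Stochastic sample versus a single deterministic BFI value."""
    q1, _, q3 = quantiles(bfi_sample, n=4)
    if q1 <= bfi_value <= q3:
        return 0  # value lies within the interquartile range: same rank
    med = median(bfi_sample)
    return (med > bfi_value) - (med < bfi_value)


def compare_deterministic(bfi_a, bfi_b):
    """Two deterministic algorithms: compare the single BFI values."""
    return (bfi_a > bfi_b) - (bfi_a < bfi_b)


# Hypothetical BFI samples, used only to exercise the functions.
high = [0.40 + 0.01 * i for i in range(10)]
low = [0.30 + 0.01 * i for i in range(10)]
print(compare_stochastic(high, low), compare_mixed(low, 0.35),
      compare_deterministic(0.25, 0.25))
```

For real analyses, a library implementation of the U test with exact p-values and tie handling would be preferable to the approximation used here.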
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Algorithm | Fitness | Baseline fitness | BFI |
---|---|---|---|
Dataset 1 | | | |
A | 0.9 | 0.5 | 0.4 |
B | 0.4 | 0.5 | −0.1 |
C | 0.7 | 0.5 | 0.2 |
Dataset 2 | | | |
A | 0.7 | 0.3 | 0.4 |
B | 0.5 | 0.3 | 0.2 |
C | 0.9 | 0.3 | 0.6 |
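The arithmetic behind the table above can be reproduced in a few lines. The sketch reads the three numeric columns as fitness, baseline fitness and BFI, an interpretation supported by the third column equalling the first minus the second in every row. The names `bfi`, `bfi_table` and `TABLE_1` are hypothetical.

```python
def bfi(fitness, baseline_fitness):
    """Baseline fitness improvement: fitness gained over the baseline."""
    return fitness - baseline_fitness


# Values copied from the table above.
TABLE_1 = {
    "Dataset 1": {"baseline": 0.5, "fitness": {"A": 0.9, "B": 0.4, "C": 0.7}},
    "Dataset 2": {"baseline": 0.3, "fitness": {"A": 0.7, "B": 0.5, "C": 0.9}},
}


def bfi_table(table):
    """Compute the BFI of every algorithm on every dataset (2 decimals)."""
    return {
        dataset: {
            algorithm: round(bfi(fitness, row["baseline"]), 2)
            for algorithm, fitness in row["fitness"].items()
        }
        for dataset, row in table.items()
    }


improvements = bfi_table(TABLE_1)
for dataset, row in improvements.items():
    print(dataset, row)
```

Algorithm A illustrates why the baseline matters: its raw fitness differs between the two datasets (0.9 and 0.7), yet its improvement over the baseline is 0.4 on both.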
Feature | Description |
---|---|
F1 | A unique incremental ID |
F2 | A completely correlated feature to the class |
F3 | A completely correlated and a completely redundant feature |
F4 | A completely irrelevant feature |
Class | true or false |
F1 | F2 | F3 | F4 | Class |
---|---|---|---|---|
1 | 1 | 2 | 0 | true |
2 | 2 | 1 | 0 | false |
3 | 1 | 2 | 0 | true |
4 | 2 | 1 | 0 | false |
5 | 1 | 2 | 0 | true |
Test Case | Fitness | BFI | Performance | Penalty | Pass? |
---|---|---|---|---|---|
A | 0.50 | 0.50 | 0.50 | 0.00 | false |
B | 0.00 | 0.00 | 1.00 | 1.00 | false |
C | 1.00 | 1.00 | 1.00 | 0.00 | true |
D | 0.87 | 0.87 | 1.00 | 0.13 | true |
E | 0.87 | 0.87 | 1.00 | 0.13 | true |
F | 0.60 | 0.60 | 1.00 | 0.40 | true |
Test Case | Fitness | BFI | Performance | Penalty | Pass? |
---|---|---|---|---|---|
A | 0.50 | −0.25 | 0.50 | 0.00 | true |
B | 0.75 | 0.00 | 1.00 | 0.25 | true |
C | 1.00 | 0.25 | 1.00 | 0.00 | true |
D | 0.97 | 0.22 | 1.00 | 0.03 | true |
E | 0.97 | 0.22 | 1.00 | 0.03 | true |
F | 0.90 | 0.14 | 1.00 | 0.10 | true |
Identifier | Name | # Attributes | # Instances | # Classes |
---|---|---|---|---|
D1 | colic | 28 | 368 | 2 |
D2 | vote | 17 | 435 | 2 |
D3 | urbanland | 148 | 675 | 9 |
D4 | lung-cancer | 57 | 32 | 2 |
D5 | primary-tumor | 18 | 339 | 22 |
D6 | heart-c | 14 | 303 | 5 |
D7 | breast-cancer | 10 | 286 | 2 |
D8 | solar-flare | 13 | 323 | 2 |
D9 | sponge | 46 | 76 | 3 |
D10 | flags | 30 | 194 | 8 |
D11 | heart-h | 14 | 294 | 5 |
D12 | zoo | 18 | 101 | 7 |
D13 | lymph | 19 | 148 | 4 |
D14 | autos | 26 | 205 | 7 |
D15 | breast-w | 10 | 699 | 2 |
D16 | synthetic-control | 62 | 600 | 6 |
D17 | sonar | 61 | 208 | 2 |
D18 | credit-a | 16 | 690 | 2 |
D19 | dermatology | 35 | 366 | 6 |
D20 | cylinder-bands | 40 | 540 | 2 |
D21 | audiology | 70 | 226 | 24 |
D22 | labor | 17 | 57 | 2 |
D23 | hepatitis | 20 | 155 | 2 |
D24 | heart-statlog | 14 | 270 | 2 |
D25 | soybean | 36 | 683 | 19 |
D26 | hill-valley | 101 | 606 | 2 |
D27 | glass | 10 | 214 | 7 |
D28 | ionosphere | 35 | 351 | 2 |
D29 | molecular-biology | 59 | 106 | 4 |
Identifier | Algorithm Name | Algorithm Type |
---|---|---|
RAND | Random Feature Selection | Control Method |
AMSO | Adaptive Multi-Swarm Optimisation | Filter and Wrapper Method |
GAFS | Genetic Algorithm for Feature Selection | Wrapper Method |
SBFS | Generalised Sequential Backward Selection | Wrapper Method |
SFFS | Generalised Sequential Forward Selection | Wrapper Method |
PCFS | Pearson Correlation Coefficient Ranker | Filter Method |
IGFS | Information Gain Ranker | Filter Method |
Dataset | RAND | AMSO | GAFS | SBFS | SFFS | PCFS | IGFS |
---|---|---|---|---|---|---|---|
D1 | 0.2307 (0.28–0.17) | 0.3942 (0.40–0.38) | 0.4607 (0.47–0.45) | 0.3985 | 0.4620 | 0.3235 | 0.3642 |
D2 | 0.1488 (0.19–0.13) | 0.2684 (0.27–0.27) | 0.2684 (0.27–0.27) | 0.2684 | 0.2684 | 0.2684 | 0.2684 |
D3 | 0.1959 (0.21–0.18) | 0.3043 (0.31–0.30) | 0.2984 (0.31–0.29) | 0.1975 | 0.3253 | 0.2936 | 0.2996 |
D4 | 0.1362 (0.16–0.13) | 0.4324 (0.43–0.38) | 0.4688 (0.51–0.44) | 0.3685 | 0.1875 | 0.2299 | 0.2506 |
D5 | 0.1130 (0.13–0.10) | 0.1995 (0.20–0.19) | 0.2207 (0.23–0.21) | 0.2225 | 0.2207 | 0.1497 | 0.1902 |
D6 | 0.1105 (0.13–0.08) | 0.2238 (0.23–0.22) | 0.2284 (0.23–0.23) | 0.2251 | 0.2284 | 0.1978 | 0.1978 |
D7 | 0.1886 (0.21–0.16) | 0.2687 (0.27–0.27) | 0.2703 (0.27–0.27) | 0.2748 | 0.2703 | 0.2500 | 0.2360 |
D8 | 0.1987 (0.21–0.16) | 0.2500 (0.25–0.25) | 0.2500 (0.25–0.25) | 0.2500 | 0.2500 | 0.2500 | 0.2500 |
D9 | 0.1899 (0.20–0.18) | 0.2500 (0.25–0.25) | 0.2500 (0.25–0.25) | 0.2500 | 0.2500 | 0.2500 | 0.2500 |
D10 | 0.1321 (0.17–0.11) | 0.3320 (0.34–0.33) | 0.3536 (0.36–0.34) | 0.3001 | 0.3378 | 0.3217 | 0.2785 |
D11 | 0.2053 (0.22–0.17) | 0.2500 (0.25–0.25) | 0.2500 (0.25–0.25) | 0.2500 | 0.2500 | 0.2500 | 0.2500 |
D12 | 0.1563 (0.19–0.11) | 0.2484 (0.26–0.24) | 0.2807 (0.28–0.26) | 0.2817 | 0.2057 | 0.2217 | 0.2025 |
D13 | 0.1551 (0.18–0.14) | 0.2841 (0.29–0.28) | 0.3111 (0.32–0.31) | 0.3038 | 0.3089 | 0.2901 | 0.3142 |
D14 | 0.1923 (0.21–0.12) | 0.4629 (0.46–0.42) | 0.4368 (0.44–0.42) | 0.4238 | 0.4335 | 0.3523 | 0.3226 |
D15 | 0.1806 (0.20–0.16) | 0.2379 (0.24–0.24) | 0.2379 (0.24–0.24) | 0.2379 | 0.2357 | 0.2169 | 0.2169 |
D16 | 0.1899 (0.20–0.18) | 0.2500 (0.25–0.25) | 0.2500 (0.25–0.25) | 0.2500 | 0.2500 | 0.2500 | 0.2500 |
D17 | 0.1885 (0.20–0.18) | 0.4027 (0.40–0.39) | 0.3663 (0.39–0.33) | 0.2781 | 0.4124 | 0.2825 | 0.2619 |
D18 | 0.1716 (0.25–0.03) | 0.3399 (0.34–0.34) | 0.3399 (0.34–0.34) | 0.3073 | 0.3399 | 0.3399 | 0.3399 |
D19 | 0.0947 (0.13–0.06) | 0.2301 (0.24–0.22) | 0.2453 (0.25–0.23) | 0.2093 | 0.2712 | 0.1603 | 0.1603 |
D20 | 0.1675 (0.19–0.15) | 0.3000 (0.30–0.29) | 0.3512 (0.37–0.34) | 0.3210 | 0.3909 | 0.3223 | 0.2860 |
D21 | 0.1176 (0.18–0.08) | 0.3522 (0.35–0.34) | 0.3524 (0.36–0.35) | 0.3555 | 0.3444 | 0.3433 | 0.3365 |
D22 | 0.1366 (0.20–0.08) | 0.3509 (0.38–0.32) | 0.3693 (0.41–0.35) | 0.3693 | 0.3168 | 0.2757 | 0.3100 |
D23 | 0.1875 (0.20–0.17) | 0.3241 (0.32–0.32) | 0.3353 (0.34–0.32) | 0.2689 | 0.3353 | 0.3111 | 0.2500 |
D24 | 0.1513 (0.18–0.12) | 0.2963 (0.30–0.30) | 0.2963 (0.30–0.29) | 0.2232 | 0.2963 | 0.2963 | 0.2963 |
D25 | 0.0723 (0.09–0.02) | 0.1664 (0.17–0.16) | 0.2069 (0.21–0.20) | 0.1752 | 0.1860 | 0.1457 | 0.1674 |
D26 | 0.1934 (0.21–0.17) | 0.2843 (0.29–0.28) | 0.3117 (0.32–0.31) | 0.2122 | 0.3159 | 0.2610 | 0.2592 |
D27 | 0.1723 (0.22–0.14) | 0.3054 (0.33–0.29) | 0.3427 (0.34–0.34) | 0.3405 | 0.3054 | 0.2471 | 0.2367 |
D28 | 0.1894 (0.20–0.17) | 0.2696 (0.28–0.27) | 0.2756 (0.28–0.27) | 0.2440 | 0.2594 | 0.2525 | 0.2613 |
D29 | 0.1467 (0.17–0.11) | 0.3919 (0.41–0.38) | 0.4043 (0.43–0.40) | 0.2644 | 0.4363 | 0.3395 | 0.3231 |
Dataset | AMSO | GAFS | SBFS | SFFS | PCFS | IGFS |
---|---|---|---|---|---|---|
D1 | 2 | 1 | 2 | 1 | 4 | 3 |
D2 | 1 | 1 | 1 | 1 | 1 | 1 |
D3 | 2 | 3 | 5 | 1 | 4 | 3 |
D4 | 2 | 1 | 3 | 6 | 5 | 4 |
D5 | 3 | 1 | 1 | 2 | 5 | 4 |
D6 | 2 | 1 | 3 | 1 | 4 | 4 |
D7 | 3 | 1 | 1 | 2 | 4 | 5 |
D8 | 1 | 1 | 1 | 1 | 1 | 1 |
D9 | 1 | 2 | 1 | 1 | 1 | 1 |
D10 | 2 | 1 | 4 | 2 | 3 | 5 |
D11 | 1 | 1 | 1 | 1 | 1 | 1 |
D12 | 2 | 1 | 1 | 4 | 3 | 5 |
D13 | 4 | 1 | 3 | 2 | 4 | 1 |
D14 | 1 | 1 | 1 | 1 | 2 | 3 |
D15 | 1 | 2 | 1 | 2 | 3 | 3 |
D16 | 1 | 2 | 1 | 1 | 1 | 1 |
D17 | 2 | 3 | 5 | 1 | 4 | 6 |
D18 | 1 | 1 | 2 | 1 | 1 | 1 |
D19 | 3 | 2 | 4 | 1 | 5 | 5 |
D20 | 5 | 2 | 4 | 1 | 3 | 6 |
D21 | 2 | 1 | 1 | 3 | 4 | 5 |
D22 | 1 | 1 | 1 | 2 | 4 | 3 |
D23 | 2 | 1 | 4 | 1 | 3 | 5 |
D24 | 1 | 1 | 2 | 1 | 1 | 1 |
D25 | 4 | 1 | 3 | 2 | 5 | 4 |
D26 | 3 | 2 | 6 | 1 | 4 | 5 |
D27 | 2 | 1 | 1 | 2 | 3 | 4 |
D28 | 1 | 1 | 5 | 3 | 4 | 2 |
D29 | 3 | 2 | 6 | 1 | 4 | 5 |
Algorithm | Best | Worst |
---|---|---|
(1) , | ||
AMSO | 8 (13%) | 1 (3%) |
GAFS | 17 (29%) | 2 (6%) |
SBFS | 10 (17%) | 6 (20%) |
SFFS | 14 (24%) | 1 (3%) |
PCFS | 4 (6%) | 8 (26%) |
IGFS | 5 (8%) | 12 (40%) |
(2) , | ||
AMSO | 9 (18%) | 1 (3%) |
GAFS | 8 (16%) | 3 (10%) |
SBFS | 8 (16%) | 6 (20%) |
SFFS | 19 (38%) | 2 (6%) |
PCFS | 3 (6%) | 10 (33%) |
IGFS | 3 (6%) | 8 (26%) |
(3) , | ||
AMSO | 7 (12%) | 1 (3%) |
GAFS | 14 (25%) | 3 (11%) |
SBFS | 8 (14%) | 8 (29%) |
SFFS | 18 (33%) | 1 (3%) |
PCFS | 4 (7%) | 7 (25%) |
IGFS | 3 (5%) | 7 (25%) |
(4) , | ||
AMSO | 5 (9%) | 2 (6%) |
GAFS | 18 (35%) | 2 (6%) |
SBFS | 8 (15%) | 6 (19%) |
SFFS | 13 (25%) | 3 (9%) |
PCFS | 3 (5%) | 7 (22%) |
IGFS | 4 (7%) | 11 (35%) |
(5) , | ||
AMSO | 2 (4%) | 7 (20%) |
GAFS | 21 (42%) | 0 (0%) |
SBFS | 11 (22%) | 2 (5%) |
SFFS | 10 (20%) | 8 (22%) |
PCFS | 1 (2%) | 6 (17%) |
IGFS | 4 (8%) | 12 (34%) |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Mostert, W.; Malan, K.M.; Engelbrecht, A.P. A Feature Selection Algorithm Performance Metric for Comparative Analysis. Algorithms 2021, 14, 100. https://doi.org/10.3390/a14030100