Kernel Partial Least Squares Feature Selection Based on Maximum Weight Minimum Redundancy
Abstract
1. Introduction
2. KPLS Method and Feature Selection
2.1. PLS Method
2.2. KPLS Method
2.3. Improved MWMR Method
3. Proposed KPLS Feature Selection on the Basis of the Improved MWMR Method
- (1) Calculation of the latent matrix using the KPLS algorithm;
- (2) Calculation of the feature weighting score, W_score(fi|c), based on each feature, fi, and the class label, c, of the dataset;
- (3) Calculation of the feature redundancy score, R_score(fi|fj), based on each pair of features, fi and fj, of the dataset;
- (4) Calculation of the objective function from the feature weighting score and the feature redundancy score (one plausible form is sketched after this list);
- (5) Selection of an optimal feature subset on the basis of the objective function.
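This excerpt does not reproduce the paper's Equation (5), so the following is only a plausible form of the objective, written in our own notation and patterned on the maximum-weight minimum-redundancy framework of Wang et al. [27]: a candidate feature fi is scored by trading its weight against its mean redundancy with the already selected subset F*, via a weight factor λ:

$$
J(f_i) = \lambda \, W_{\mathrm{score}}(f_i \mid c) \;-\; (1-\lambda)\,\frac{1}{|F^{*}|}\sum_{f_j \in F^{*}} R_{\mathrm{score}}(f_i \mid f_j)
$$

At each greedy step, the candidate maximizing J(fi) is appended to F*.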
Algorithm 1. KPLS based on maximum weight minimum redundancy (KPLS-MWMR).

Input: Feature dataset X, class label Y, feature number k, weight factor.
Output: A selected feature subset, F*.
(1) Initialize the feature dataset, F;
(2) Let the feature set F* = ∅;
(3) Calculate the latent matrix F = KPLS(X, Y) using the KPLS algorithm [29];
(4) Calculate the feature weighting score W_score(F|Y) using the ReliefF algorithm [20];
(5) Arrange the feature weighting scores in descending order: [WS, rank] = descend(W_score(F|Y));
(6) Form a feature subset S = X(:, rank);
(7) Select the optimal feature subset F* = S(:, 1);
(8) For each j < k:
(9)   f1 = S(:, j);
(10)  w = WS(:, j);
(11)  Compute the feature redundancy score r = R_score(f1|(S − F*)) using the PCC algorithm [15];
(12)  Calculate the evaluation criterion R according to Equation (5);
(13)  Arrange the values of R in descending order: [weight, rank] = descend(R);
(14)  Update F*: F* = [F*, S(:, rank(1))];
(15)  Delete the selected optimal feature in S: S(:, rank(1)) = [ ];
(16)  Update j: j = j + 1;
(17) Repeat;
(18) End;
(19) Return the optimal subset F* of k features.
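For concreteness, below is a minimal, runnable Python sketch of the greedy loop in Algorithm 1. It is not the authors' implementation: the KPLS latent matrix and the ReliefF weighting scores are assumed to be computed beforehand and passed in, the redundancy score R_score is taken to be the absolute Pearson correlation (matching the PCC step), and the names `mwmr_select` and `lam` are ours.

```python
import numpy as np

def mwmr_select(F, w, k, lam=0.5):
    """Greedy maximum-weight minimum-redundancy selection (sketch).

    F   : (n_samples, n_features) latent matrix, e.g., from a KPLS projection
    w   : (n_features,) per-feature weighting scores, e.g., from ReliefF
    k   : number of features to select
    lam : weight factor trading feature weight against redundancy
    """
    n_features = F.shape[1]
    # Absolute Pearson correlation between every pair of features serves as
    # the pairwise redundancy R_score(fi|fj); assumes non-constant columns.
    corr = np.abs(np.corrcoef(F, rowvar=False))
    # Start from the single highest-weight feature (steps 5-7).
    selected = [int(np.argmax(w))]
    candidates = set(range(n_features)) - set(selected)
    # Greedily add the candidate with the best weight/redundancy trade-off.
    while len(selected) < min(k, n_features):
        best_j, best_score = None, -np.inf
        for j in candidates:
            redundancy = corr[j, selected].mean()
            score = lam * w[j] - (1.0 - lam) * redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        candidates.discard(best_j)
    return selected
```

Given a latent matrix F and ReliefF scores w, `mwmr_select(F, w, k=10)` returns the column indices of the ten selected features in the order they were chosen.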
4. Experimental Results
4.1. Experiments Performed Using Synthetic Data
4.2. Experiments Performed Using Public Data
- (1) Classification accuracy;
- (2) Kappa coefficient;
- (3) F1-score (a short scikit-learn snippet computing all three criteria follows this list).
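A minimal sketch of computing the three evaluation criteria with scikit-learn; the toy labels below are placeholders for a classifier's output on the selected feature subset, and macro averaging for the F1-score is our assumption (the averaging mode is not stated in this excerpt).

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

# Toy labels standing in for a classifier trained on the selected features.
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2])

print("Accuracy :", accuracy_score(y_true, y_pred))            # fraction correct
print("Kappa    :", cohen_kappa_score(y_true, y_pred))         # chance-corrected agreement
print("F1-score :", f1_score(y_true, y_pred, average="macro")) # macro over classes
```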
4.3. Weight Factor Sensitivity Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Cai, J.; Luo, J.W.; Wang, S.L.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79. [Google Scholar] [CrossRef]
- Solorio-Fernández, S.; Carrasco-Ochoa, J.A.; Martínez-Trinidad, J.F. A review of unsupervised feature selection methods. Artif. Intell. Rev. 2020, 53, 907–948. [Google Scholar] [CrossRef]
- Thirumoorthy, K.; Muneeswaran, K. Feature selection using hybrid poor and rich optimization algorithm for text classification. Pattern Recog. Lett. 2021, 147, 63–70. [Google Scholar]
- Raghuwanshi, G.; Tyagi, V. A novel technique for content-based image retrieval based on region-weight assignment. Multimed. Tools Appl. 2019, 78, 1889–1911. [Google Scholar] [CrossRef]
- Liu, K.; Jiao, Y.; Du, C.; Zhang, X.; Chen, X.; Xu, F.; Jiang, C. Driver Stress Detection Using Ultra-Short-Term HRV Analysis under Real World Driving Conditions. Entropy 2023, 25, 194. [Google Scholar] [CrossRef]
- Ocloo, I.X.; Chen, H. Feature Selection in High-Dimensional Modes via EBIC with Energy Distance Correlation. Entropy 2023, 25, 14. [Google Scholar] [CrossRef]
- Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
- Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
- Tang, J.; Alelyani, S.; Liu, H. Feature selection for classification: A review. In Data Classification: Algorithms and Applications; CRC Press: Boca Raton, FL, USA, 2014; pp. 37–64. [Google Scholar]
- Dy, J.G.; Brodley, C.E. Feature selection for unsupervised learning. J. Mach. Learn. Res. 2004, 5, 845–889. [Google Scholar]
- Lal, T.N.; Chapelle, O.; Weston, J.; Elisseeff, A.; Zadeh, L. Embedded methods. In Feature Extraction Foundations and Applications; Springer: Berlin/Heidelberg, Germany, 2006; pp. 137–165. [Google Scholar]
- Hu, L.; Gao, W.; Zhao, K.; Zhang, P.; Wang, F. Feature selection considering two types of feature relevancy and feature interdependency. Expert Syst. Appl. 2018, 93, 423–434. [Google Scholar] [CrossRef]
- Stańczyk, U. Pruning Decision Rules by Reduct-Based Weighting and Ranking of Features. Entropy 2022, 24, 1602. [Google Scholar] [CrossRef] [PubMed]
- Battiti, R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 1994, 5, 537–550. [Google Scholar] [CrossRef]
- Yilmaz, T.; Yazici, A.; Kitsuregawa, M. RELIEF-MM: Effective modality weighting for multimedia information retrieval. Multimed. Syst. 2014, 20, 389–413. [Google Scholar] [CrossRef]
- Feng, J.; Jiao, L.; Liu, F.; Sun, T.; Zhang, X. Unsupervised feature selection based on maximum information and minimum redundancy for hyperspectral images. Pattern Recog. 2016, 51, 295–309. [Google Scholar] [CrossRef]
- Zhou, H.; Wang, X.; Zhu, R. Feature selection based on mutual information with correlation coefficient. Appl. Intell. 2021, 52, 5457–5474. [Google Scholar] [CrossRef]
- Ramasamy, M.; Meena Kowshalya, A. Information gain based feature selection for improved textual sentiment analysis. Wirel. Pers. Commun. 2022, 125, 1203–1219. [Google Scholar] [CrossRef]
- Huang, M.; Sun, L.; Xu, J.; Zhang, S. Multilabel feature selection using relief and minimum redundancy maximum relevance based on neighborhood rough sets. IEEE Access 2020, 8, 62011–62031. [Google Scholar] [CrossRef]
- Eiras-Franco, C.; Guijarro-Berdias, B.; Alonso-Betanzos, A.; Bahamonde, A. Scalable feature selection using ReliefF aided by locality-sensitive hashing. Int. J. Intell. Syst. 2021, 36, 6161–6179. [Google Scholar] [CrossRef]
- Paramban, M.; Paramasivan, T. Feature selection using efficient fusion of fisher score and greedy searching for Alzheimer’s classification. J. King Saud Univ. Comput. Inf. Sci. 2021, 34, 4993–5006. [Google Scholar]
- He, X.; Cai, D.; Niyogi, P. Laplacian score for feature selection. In Proceedings of the Advances in Neural Information Processing Systems 18 Neural Information Processing Systems (NIPS 2005), Vancouver, BC, Canada, 5–8 December 2005. [Google Scholar]
- Zhang, D.; Chen, S.; Zhou, Z.-H. Constraint score: A new filter method for feature selection with pairwise constraints. Pattern Recog. 2008, 41, 1440–1451. [Google Scholar] [CrossRef]
- Palma-Mendoza, R.J.; De-Marcos, L.; Rodriguez, D.; Alonso-Betanzos, A. Distributed correlation-based feature selection in spark. Inform. Sci. 2018, 496, 287–299. [Google Scholar] [CrossRef]
- Yu, L.; Liu, H. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th International Conference Machine Learning, Washington, DC, USA, 21–24 August 2003; pp. 856–863. [Google Scholar]
- Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.; Wu, L.; Kong, J.; Li, Y.; Zhang, B. Maximum weight and minimum redundancy: A novel framework for feature subset selection. Pattern Recog. 2013, 46, 1616–1627. [Google Scholar] [CrossRef]
- Tran, T.N.; Afanador, N.L.; Buydens, L.M.; Blanchet, L. Interpretation of variable importance in partial least squares with significance multivariate correlation (SMC). Chemom. Intell. Lab. Syst. 2014, 138, 153–160. [Google Scholar] [CrossRef]
- Rosipal, R.; Trejo, L.J. Kernel partial least squares regression in reproducing kernel Hilbert space. J. Mach. Learn. Res. 2001, 2, 97–123. [Google Scholar]
- Qiao, J.; Yin, H. Optimizing kernel function with applications to kernel principal analysis and locality preserving projection for feature extraction. J. Inform. Hiding Mul. Sig. Process. 2013, 4, 280–290. [Google Scholar]
- Zhang, D.L.; Qiao, J.; Li, J.B.; Chu, S.C.; Roddick, J.F. Optimizing matrix mapping with data dependent kernel for image classification. J. Inform. Hiding Mul. Sig. Process. 2014, 5, 72–79. [Google Scholar]
- Hsu, C.W.; Chang, C.C.; Lin, C.J. A Practical Guide to Support Vector Classification; National Taiwan University: Taipei, Taiwan, 2003. [Google Scholar]
- Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and beyond; MIT Press: Cambridge, MA, USA, 2002. [Google Scholar]
- Talukdar, U.; Hazarika, S.M.; Gan, J.Q. A kernel partial least square based feature selection method. Pattern Recog. 2018, 83, 91–106. [Google Scholar] [CrossRef]
- Golub, G.H.; Heath, M.; Wahba, G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 1979, 21, 215–223. [Google Scholar] [CrossRef]
- Lin, C.; Tang, J.L.; Li, B.X. Embedded supervised feature selection for multi-class data. In Proceedings of the 2017 SIAM International Conference on Data Mining, Houston, TX, USA, 27–29 April 2017; pp. 516–524. [Google Scholar]
- UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/mL/index.php (accessed on 27 October 2022).
- Li, J.; Liu, H. Kent Ridge Biomedical Data Set Repository; Nanyang Technological University: Singapore, 2004. [Google Scholar]
- Rigby, A.S. Statistical methods in epidemiology. V. Towards an understanding of the kappa coefficient. Disabil. Rehabil. 2000, 22, 339–344. [Google Scholar] [CrossRef]
- Liu, C.; Wang, W.; Wang, M.; Lv, F.; Konan, M. An efficient instance selection algorithm to reconstruct training set for support vector machine. Knowl. Based Syst. 2017, 116, 58–73. [Google Scholar] [CrossRef]
Algorithms | Top Three Features | Accuracy
---|---|---
FS | f1, f2, f3 | 0.973
CFS | f52, f58, f3 | 0.907
mRMR | f3, f2, f1 | 0.973
ReliefF | f3, f2, f1 | 0.977
KPLS-MWMR | f1, f3, f2 | 0.983
Datasets | Instances | Features | Classes | Training Instances
---|---|---|---|---
Ionosphere | 351 | 34 | 2 | 246
Sonar | 208 | 60 | 2 | 146
Musk | 4776 | 166 | 2 | 3343
Arrhythmia | 452 | 274 | 13 | 316
SRBCT | 83 | 2308 | 4 | 58
Lung | 203 | 3312 | 5 | 142
DrivFace | 606 | 6400 | 3 | 424
Carcinom | 174 | 9182 | 11 | 122
LSVT | 126 | 310 | 2 | 88
Madelon | 2000 | 500 | 2 | 1400
Datasets | FS | CFS | mRMR | ReliefF | KPLS-MWMR
---|---|---|---|---|---
Musk | 0.4807 | 0.5019 | 0.4519 | 0.4397 | 0.5143
Arrhythmia | 0.1726 | 0.2277 | 0.3244 | 0.2772 | 0.3410
SRBCT | 0.7510 | 0.6698 | 0.8994 | 0.8829 | 0.9383
Lung | 0.8087 | 0.7864 | 0.8271 | 0.8876 | 0.8367
DrivFace | 0.4295 | 0.7050 | 0.6832 | 0.6460 | 0.7062
Carcinom | 0.6770 | 0.6295 | 0.7438 | 0.7303 | 0.7649
LSVT | 0.5935 | 0.5490 | 0.6369 | 0.5714 | 0.6871
Madelon | 0.1980 | 0.0450 | 0.0780 | 0.2240 | 0.2277
Datasets | FS | CFS | mRMR | ReliefF | KPLS-MWMR
---|---|---|---|---|---
Musk | 0.7403 | 0.7504 | 0.7259 | 0.7193 | 0.7564
Arrhythmia | 0.4576 | 0.4851 | 0.4134 | 0.4951 | 0.4954
SRBCT | 0.8027 | 0.7828 | 0.9300 | 0.9191 | 0.9642
Lung | 0.7654 | 0.7947 | 0.8186 | 0.8930 | 0.8230
DrivFace | 0.5864 | 0.8057 | 0.7891 | 0.7689 | 0.8094
Carcinom | 0.6519 | 0.6034 | 0.7206 | 0.7099 | 0.7315
LSVT | 0.7941 | 0.7705 | 0.8168 | 0.7824 | 0.8433
Madelon | 0.5990 | 0.5230 | 0.5390 | 0.6120 | 0.6220