# A Multi-Objective Multi-Label Feature Selection Algorithm Based on Shapley Value


## Abstract


## 1. Introduction

- The Shapley value and multi-objective multi-label feature selection are fused from two perspectives: the feature and the individual.
- Two improved operators are proposed, which adaptively adjust the crossover and mutation probabilities by evaluating each feature’s contribution, balancing the algorithm’s global and local search.
- An improved archive maintenance strategy is put forward to increase the convergence performance of the multi-objective optimization method.
- Experiments on datasets of different scales demonstrate the effectiveness and adaptability of the proposed algorithm.

## 2. Related Works

## 3. Preliminaries

#### 3.1. Multi-Objective Optimization

#### 3.2. Shapley Value

- (1) Collective rationality: the sum of the participants’ individual incomes is less than the coalition’s income.
- (2) Individual rationality: compared with not joining the coalition, every participant gains a higher profit.
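To make these properties concrete, the Shapley value of each player can be computed exactly by averaging that player’s marginal contribution over all orderings of the players. The sketch below uses a hypothetical three-player characteristic function `v` (not from the paper) in which a coalition earns 1 only if it contains player 1 and at least one other player:

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal contribution
    over all orderings in which the coalition is assembled."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            phi[p] += v(with_p) - v(coalition)  # marginal contribution of p
            coalition = with_p
    return {p: phi[p] / len(orderings) for p in players}

# Hypothetical game: a coalition earns 1 only if it contains player 1
# and at least one other player.
def v(S):
    return 1.0 if 1 in S and len(S) >= 2 else 0.0

phi = shapley_values([1, 2, 3], v)
```

By the efficiency property the values sum to the grand coalition’s income, and the symmetric players 2 and 3 receive equal shares.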

#### 3.3. Multi-Label Learning

- Ranking Loss: it evaluates the fraction of label pairs in which an irrelevant label is ranked before a relevant label; $\overline{y_j}$ is the complementary set of $y_j$.$$RL\left(h,\mathcal{H}\right)=\frac{1}{t}\sum_{j=1}^{t}\frac{1}{\left|y_j\right|\left|\overline{y_j}\right|}\left|\left\{\left(k,l\right)\in y_j\times \overline{y_j}\ \mathrm{s.t.}\ h\left(x_j,k\right)\le h\left(x_j,l\right)\right\}\right|$$
- Average Precision: for each relevant label of an instance, it measures the average fraction of relevant labels that are ranked at least as high as that label; $\mathrm{rank}_h$ is the descending rank function.$$AP\left(h,\mathcal{H}\right)=\frac{1}{t}\sum_{j=1}^{t}\frac{1}{\left|y_j\right|}\sum_{y\in y_j}\frac{\left|\left\{y^{\prime}\mid \mathrm{rank}_h\left(x_j,y^{\prime}\right)\le \mathrm{rank}_h\left(x_j,y\right),\ y^{\prime}\in y_j\right\}\right|}{\mathrm{rank}_h\left(x_j,y\right)}$$
- Coverage: it records how many steps, on average, one must move down a sample’s predicted label ranking to cover all the true labels associated with that sample.$$CV\left(h,\mathcal{H}\right)=\frac{1}{t}\sum_{j=1}^{t}\max_{y\in y_j}\mathrm{rank}_h\left(x_j,y\right)-1$$
- Hamming Loss: it measures the proportion of misclassified labels, where $\oplus$ denotes the symmetric difference between the predicted and the true label sets.$$HL\left(h,\mathcal{H}\right)=\frac{1}{t}\sum_{j=1}^{t}\left|h\left(x_j\right)\oplus y_j\right|$$
- Macro-F1: a label-based index that averages the F-measure over every label.$$MaF\left(h,\mathcal{H}\right)=\frac{1}{q}\sum_{j=1}^{q}\frac{2\sum_{i=1}^{t}y_{ij}h_j\left(x_i\right)}{\sum_{i=1}^{t}y_{ij}+\sum_{i=1}^{t}h_j\left(x_i\right)}$$
- Micro-F1: a label-based index that computes the F-measure over the whole prediction matrix.$$MiF\left(h,\mathcal{H}\right)=\frac{2\sum_{j=1}^{t}\left|h\left(x_j\right)\cap y_j\right|}{\sum_{j=1}^{t}\left|y_j\right|+\sum_{j=1}^{t}\left|h\left(x_j\right)\right|}$$
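As a quick illustration, two of these measures can be evaluated on a tiny invented example. The sketch below computes Hamming loss (here normalized by both the number of samples $t$ and the number of labels $q$, a common convention) and coverage from toy label and score matrices:

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of label positions where h(x_j) and y_j disagree
    (symmetric difference, normalized by t samples and q labels)."""
    t, q = len(Y_true), len(Y_true[0])
    wrong = sum(a != b for yt, yp in zip(Y_true, Y_pred)
                for a, b in zip(yt, yp))
    return wrong / (t * q)

def coverage(scores, Y_true):
    """Average over samples of the worst (largest) descending rank
    among the relevant labels, minus one."""
    total = 0
    for s, y in zip(scores, Y_true):
        order = sorted(range(len(s)), key=lambda k: -s[k])  # descending scores
        rank = {lab: r + 1 for r, lab in enumerate(order)}  # 1-based ranks
        total += max(rank[k] for k in range(len(y)) if y[k]) - 1
    return total / len(Y_true)

Y_true = [[1, 0, 1], [0, 1, 0]]              # toy ground-truth label matrix
Y_pred = [[1, 0, 0], [0, 1, 1]]              # toy binary predictions
scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]]  # toy ranking scores
```

On this toy data two of the six label slots are wrong, and the relevant labels of the first sample are covered after the top two ranked labels.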

## 4. The Proposed Method

#### 4.1. Objective Function

#### 4.2. Mutation Operator

- (a) The number of individuals that select the $i$-th feature is $u$, with $\frac{m}{2}\le u\le m$, and the number of individuals that do not select the $i$-th feature is $m-u$.
- (b) The number of individuals that select the $j$-th feature is $w$, with $\frac{m}{2}\le w\le m$, and the number of individuals that do not select the $j$-th feature is $m-w$.

```
Algorithm 1 Mutation probability calculation
Input:  population pop; population scale N_p; feature dimension d;
        default mutation rate η; parameter marker t
Output: mutation probability P_mu
 1: for i = 1 : N_p
 2:     A(i) = Fit(pop(i));                 // fitness of every individual in pop
 3: end
 4: for i = 1 : d
 5:     φ_i = Shapley(A);                   // Shapley value of the i-th feature
 6:     number1(i) = Select(pop(:, i));     // individuals that select the i-th feature
 7:     number2(i) = Unselect(pop(:, i));   // individuals that do not select the i-th feature
 8: end
 9: for i = 1 : d
10:     if t = 0
11:         if φ_i < 0 && number1(i)/number2(i) < 1/2
12:             P_mu(i) = η + η * (1 − number1(i)/number2(i));
13:             t = 1;
14:         elseif φ_i > 0 && number2(i)/number1(i) < 1/2
15:             P_mu(i) = η + η * (1 − number2(i)/number1(i));
16:             t = 1;
17:         else
18:             P_mu(i) = η;
19:         end
20:     end
21:     if t = 0
22:         if abs(φ_i) < max(abs(φ))/2
23:             P_mu(i) = η + η * (1 − abs(φ_i)/max(abs(φ)));
24:         else
25:             P_mu(i) = η;
26:         end
27:     end
28: end
```
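Assuming `Fit`, `Shapley`, `Select`, and `Unselect` have already produced the per-feature quantities, the adaptive rule of Algorithm 1 (lines 9–28) can be sketched in Python roughly as follows; variable names mirror the pseudocode and the invocation uses invented toy values:

```python
def mutation_probability(phi, number1, number2, eta, t=0):
    """Adaptive mutation probabilities (sketch of Algorithm 1, lines 9-28).
    phi[i]     : Shapley value of feature i
    number1[i] : individuals that select feature i
    number2[i] : individuals that do not select feature i
    eta        : default mutation rate
    t          : parameter marker choosing the adjustment branch
    """
    d = len(phi)
    max_abs = max(abs(p) for p in phi)
    P_mu = [eta] * d                        # default mutation rate everywhere
    for i in range(d):
        if t == 0:
            if phi[i] < 0 and number1[i] / number2[i] < 0.5:
                # harmful feature selected by a minority: raise its rate
                P_mu[i] = eta + eta * (1 - number1[i] / number2[i])
                t = 1
            elif phi[i] > 0 and number2[i] / number1[i] < 0.5:
                # useful feature unselected by a minority: raise its rate
                P_mu[i] = eta + eta * (1 - number2[i] / number1[i])
                t = 1
        if t == 0 and abs(phi[i]) < max_abs / 2:
            # low-contribution feature: rate driven by |Shapley| magnitude
            P_mu[i] = eta + eta * (1 - abs(phi[i]) / max_abs)
    return P_mu

P_mu = mutation_probability([-1.0, 2.0], [1, 8], [9, 2], eta=0.1)  # toy values
```

As in the pseudocode, once a sign-based adjustment fires the marker `t` switches the remaining features back to the default rate.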

```
Algorithm 2 Mutation operation
Input:  population pop; population mutation ratio P_mr; mutation probability P_m;
        population scale N_p; feature dimension d
Output: mutation population pop_m
 1: n = N_p * P_mr;
 2: for i = 1 : n
 3:     for j = 1 : d
 4:         if rand(1) > (1 − P_m(j))
 5:             pop_m(i, j) = abs(pop(i, j) − 1);   // flip the j-th bit
 6:         else
 7:             pop_m(i, j) = pop(i, j);
 8:         end
 9:     end
10: end
```
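A direct Python transcription of this bit-flip mutation is short; the population below is an invented toy example, and `rng` seeds the randomness for reproducibility:

```python
import random

def mutate(pop, P_mr, P_m, rng=None):
    """Bit-flip mutation (sketch of Algorithm 2): the first n = N_p * P_mr
    individuals are processed; bit j flips with probability P_m[j]."""
    rng = rng or random.Random(0)
    n = int(len(pop) * P_mr)
    pop_m = []
    for i in range(n):
        child = []
        for j, bit in enumerate(pop[i]):
            if rng.random() > 1 - P_m[j]:   # flip with probability P_m[j]
                child.append(abs(bit - 1))  # 0 -> 1, 1 -> 0
            else:
                child.append(bit)
        pop_m.append(child)
    return pop_m

offspring = mutate([[0, 1, 0], [1, 1, 1]], P_mr=1.0, P_m=[0.1, 0.1, 0.1])
```

With a zero mutation probability the individuals pass through unchanged, and with probability one every bit is flipped.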

#### 4.3. Crossover Operator

```
Algorithm 3 Crossover operation
Input:  population pop; population crossover ratio P_cr; crossover probability P_c;
        population scale N_p; feature dimension d
Output: crossover population pop_c
 1: n = N_p * P_cr;
 2: m = 0;
 3: for i = 1 : n
 4:     i1 = rand(1, n);        // random integer from 1 to n
 5:     i2 = rand(1, n);
 6:     m = 2 * i − 1;
 7:     for j = 1 : d
 8:         if rand(1) > rand(0.5, P_c)
 9:             pop_c(m, j)     = pop(i2, j);
10:             pop_c(m + 1, j) = pop(i1, j);
11:         else
12:             pop_c(m, j)     = pop(i1, j);
13:             pop_c(m + 1, j) = pop(i2, j);
14:         end
15:     end
16: end
```
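Algorithm 3 can be sketched similarly. The `rand(0.5, P_c)` threshold in the listing is interpreted here (an assumption) as making each gene position swap between the two parents with probability `P_c`; indices are 0-based and the population is an invented toy example:

```python
import random

def crossover(pop, P_cr, P_c, rng=None):
    """Uniform-style crossover (sketch of Algorithm 3): n = N_p * P_cr pairs
    of randomly picked parents produce 2n children; each gene position is
    swapped between the two parents with probability P_c."""
    rng = rng or random.Random(0)
    n = int(len(pop) * P_cr)
    d = len(pop[0])
    pop_c = [[0] * d for _ in range(2 * n)]
    for i in range(n):
        i1 = rng.randrange(n)              # first parent (0-based index)
        i2 = rng.randrange(n)              # second parent
        m = 2 * i
        for j in range(d):
            if rng.random() < P_c:         # swap the genes at position j
                pop_c[m][j], pop_c[m + 1][j] = pop[i2][j], pop[i1][j]
            else:
                pop_c[m][j], pop_c[m + 1][j] = pop[i1][j], pop[i2][j]
    return pop_c

children = crossover([[0, 1, 0], [1, 0, 1]], P_cr=1.0, P_c=0.5)  # toy population
```

At the extremes `P_c = 0` and `P_c = 1` each child is a straight copy of one of its two parents, and intermediate values mix genes position by position.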

#### 4.4. The Improved Niche Preservation Mechanism

#### 4.5. The Overall Flow of the Algorithm

## 5. Experiments

#### 5.1. Experiment Settings

#### 5.2. Comparing Methods

#### 5.3. Evaluation of Experimental Results on Multi-Label Classification

#### 5.4. The Comparison on Hypervolume Indicator

#### 5.5. Shapley Value Analysis

#### 5.6. Complexity Analysis

#### 5.7. Comparison of Running Time

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 3.** Comparison results under the six indicators on the flags dataset as the number of selected features increases.

**Figure 4.** Comparison results under the six indicators on the emotions dataset as the number of selected features increases.

**Figure 5.** Comparison results under the six indicators on the yeast dataset as the number of selected features increases.

**Figure 6.** Comparison results under the six indicators on the virus dataset as the number of selected features increases.

**Figure 7.** Comparison results under the six indicators on the Languagelog dataset as the number of selected features increases.

**Figure 8.** Comparison results under the six indicators on the genbase dataset as the number of selected features increases.

**Figure 9.** Comparison results under the six indicators on the medical dataset as the number of selected features increases.

Dataset | Features | Domain | Labels | Samples | Training | Testing
---|---|---|---|---|---|---
flags | 19 | images | 7 | 194 | 129 | 65
emotions | 72 | music | 6 | 593 | 300 | 293
yeast | 103 | biology | 14 | 2417 | 1500 | 917
virus | 749 | biology | 6 | 207 | 124 | 83
languagelog | 1004 | biology | 75 | 1459 | 1167 | 292
genbase | 1185 | biology | 27 | 662 | 463 | 199
medical | 1449 | text | 45 | 978 | 333 | 645

Methods | Flags | Emotions | Yeast | Virus | Languagelog | Genbase | Medical | Avg. Rank
---|---|---|---|---|---|---|---|---
SHAPFS-ML | 0.1915 | 0.1686 | 0.1705 | 0.0146 | 0.1629 | 0.2028 | 0.0627 | 1.1429
NSGA III | 0.2056 | 0.1839 | 0.1751 | 0.0157 | 0.1720 | 0.2044 | 0.0677 | 2.8571
MDFS | 0.2372 | 0.2778 | 0.1794 | 0.0443 | 0.1692 | 0.2131 | 0.0749 | 5.1429
MCLS | 0.1982 | 0.2186 | 0.1765 | 0.0297 | 0.1627 | 0.2100 | 0.0862 | 3.5714
MIFS | 0.2015 | 0.2546 | 0.1913 | 0.0306 | 0.1706 | 0.2064 | 0.1449 | 4.8571
MDDM_proj | 0.2056 | 0.2050 | 0.1902 | 0.0668 | 0.1779 | 0.2088 | 0.0844 | 5.1429
MDDM_spc | 0.2056 | 0.2013 | 0.1854 | 0.1250 | 0.1810 | 0.2112 | 0.0819 | 5.2857

Methods | Flags | Emotions | Yeast | Virus | Languagelog | Genbase | Medical | Avg. Rank
---|---|---|---|---|---|---|---|---
SHAPFS-ML | 0.8454 | 0.8002 | 0.7566 | 0.9749 | 0.3158 | 0.3774 | 0.8365 | 1.0000
NSGA III | 0.8225 | 0.7848 | 0.7504 | 0.9703 | 0.3109 | 0.3749 | 0.8138 | 2.4286
MDFS | 0.7929 | 0.7002 | 0.7484 | 0.9291 | 0.2920 | 0.3572 | 0.7412 | 5.1429
MCLS | 0.8284 | 0.7505 | 0.7535 | 0.9478 | 0.3120 | 0.3570 | 0.7022 | 3.4286
MIFS | 0.8154 | 0.7212 | 0.7322 | 0.9438 | 0.3056 | 0.3608 | 0.4149 | 5.1429
MDDM_proj | 0.8182 | 0.7579 | 0.7307 | 0.9000 | 0.2896 | 0.3511 | 0.6940 | 5.7143
MDDM_spc | 0.8182 | 0.7594 | 0.7381 | 0.8275 | 0.2808 | 0.3585 | 0.6988 | 5.0000

Methods | Flags | Emotions | Yeast | Virus | Languagelog | Genbase | Medical | Avg. Rank
---|---|---|---|---|---|---|---|---
SHAPFS-ML | 3.6308 | 1.8960 | 6.3108 | 0.2892 | 13.1370 | 6.2111 | 3.5488 | 1.0714
NSGA III | 3.7385 | 2.0248 | 6.4046 | 0.2892 | 13.7295 | 6.2412 | 3.7364 | 2.9286
MDFS | 3.8462 | 2.4653 | 6.4526 | 0.4337 | 13.5445 | 6.4271 | 4.1380 | 5.0000
MCLS | 3.7231 | 2.1485 | 6.4820 | 0.3614 | 13.1849 | 6.3920 | 4.6698 | 4.2143
MIFS | 3.6923 | 2.3020 | 6.6249 | 0.3614 | 13.6130 | 6.3065 | 7.3752 | 4.6429
MDDM_proj | 3.7231 | 2.0842 | 6.5540 | 0.5542 | 14.0274 | 6.3618 | 4.6465 | 4.9286
MDDM_spc | 3.7231 | 2.0842 | 6.5267 | 0.8554 | 14.5582 | 6.4171 | 4.5085 | 5.2143

Methods | Flags | Emotions | Yeast | Virus | Languagelog | Genbase | Medical | Avg. Rank
---|---|---|---|---|---|---|---|---
SHAPFS-ML | 0.2352 | 0.2129 | 0.1993 | 0.0301 | 0.0157 | 0.0491 | 0.0137 | 1.0000
NSGA III | 0.2681 | 0.2170 | 0.1998 | 0.0321 | 0.0158 | 0.0493 | 0.0139 | 2.1429
MDFS | 0.3011 | 0.2921 | 0.2050 | 0.3207 | 0.0162 | 0.0497 | 0.0184 | 4.9286
MCLS | 0.2440 | 0.2639 | 0.2013 | 0.2979 | 0.0163 | 0.0503 | 0.0189 | 4.2857
MIFS | 0.2703 | 0.2698 | 0.2113 | 0.3070 | 0.0159 | 0.0495 | 0.0278 | 5.0000
MDDM_proj | 0.3033 | 0.2409 | 0.2107 | 0.3298 | 0.0162 | 0.0519 | 0.0194 | 6.0000
MDDM_spc | 0.3033 | 0.2343 | 0.2098 | 0.3055 | 0.0160 | 0.0516 | 0.0186 | 4.6429

Methods | Flags | Emotions | Yeast | Virus | Languagelog | Genbase | Medical | Avg. Rank
---|---|---|---|---|---|---|---|---
SHAPFS-ML | 0.7563 | 0.6446 | 0.6335 | 0.9223 | 0.0853 | 0.0149 | 0.7246 | 1.4286
NSGA III | 0.7150 | 0.6362 | 0.6240 | 0.9149 | 0.0649 | 0.0075 | 0.7183 | 2.8571
MDFS | 0.6761 | 0.4636 | 0.6187 | 0.7978 | 0.0201 | 0.0148 | 0.5968 | 5.2857
MCLS | 0.7388 | 0.5114 | 0.6268 | 0.8830 | 0.0165 | 0.0357 | 0.5929 | 3.5714
MIFS | 0.7036 | 0.5156 | 0.6001 | 0.8601 | 0.0492 | 0.0000 | 0.0218 | 5.2857
MDDM_proj | 0.7000 | 0.5829 | 0.5943 | 0.6848 | 0.0484 | 0.0211 | 0.5751 | 5.0000
MDDM_spc | 0.7000 | 0.5860 | 0.6160 | 0.5122 | 0.0539 | 0.0212 | 0.5627 | 4.4286

Methods | Flags | Emotions | Yeast | Virus | Languagelog | Genbase | Medical | Avg. Rank
---|---|---|---|---|---|---|---|---
SHAPFS-ML | 0.6546 | 0.6193 | 0.3649 | 0.7589 | 0.2308 | 0.1518 | 0.2932 | 1.4286
NSGA III | 0.5279 | 0.5674 | 0.3482 | 0.7062 | 0.2237 | 0.1500 | 0.2778 | 3.2857
MDFS | 0.4777 | 0.3946 | 0.3367 | 0.5084 | 0.2166 | 0.1515 | 0.2144 | 5.4286
MCLS | 0.5657 | 0.4300 | 0.3641 | 0.6115 | 0.2165 | 0.1551 | 0.2183 | 3.5714
MIFS | 0.5528 | 0.4544 | 0.3033 | 0.6870 | 0.2217 | 0.1481 | 0.0911 | 4.8571
MDDM_proj | 0.5402 | 0.4965 | 0.2950 | 0.4782 | 0.2213 | 0.1521 | 0.2076 | 4.5714
MDDM_spc | 0.5402 | 0.5228 | 0.3227 | 0.3002 | 0.2213 | 0.1521 | 0.1835 | 4.4286

Methods | Flags | Emotions | Yeast | Virus | Languagelog | Genbase | Medical
---|---|---|---|---|---|---|---
SHAPFS-ML | 0.2632 | 0.4583 | 0.2816 | 0.3605 | 0.4781 | 0.4051 | 0.4465
NSGA III | 0.2105 | 0.3889 | 0.2427 | 0.3712 | 0.4094 | 0.4203 | 0.3823
MDFS | 0.2632 | 0.2639 | 0.2816 | 0.1669 | 0.4452 | 0.0793 | 0.4272
MCLS | 0.3684 | 0.4306 | 0.4466 | 0.1696 | 0.2241 | 0.0194 | 0.4976
MIFS | 0.6316 | 0.4583 | 0.4951 | 0.2377 | 0.3924 | 0.0270 | 0.4341
MDDM_proj | 0.1053 | 0.2500 | 0.4951 | 0.4099 | 0.3466 | 0.4473 | 0.5003
MDDM_spc | 0.1053 | 0.4583 | 0.5146 | 0.4686 | 0.2510 | 0.3983 | 0.5003

Methods | Flags (Average) | Flags (Best) | Flags (Worst) | Emotions (Average) | Emotions (Best) | Emotions (Worst)
---|---|---|---|---|---|---
SHAPFS-ML | 0.6301 | 0.7693 | 0.5375 | 0.7326 | 0.7781 | 0.7176
NSGA III | 0.5835 | 0.6237 | 0.4800 | 0.5928 | 0.7453 | 0.5668

Methods | Yeast (Average) | Yeast (Best) | Yeast (Worst) | Virus (Average) | Virus (Best) | Virus (Worst)
---|---|---|---|---|---|---
SHAPFS-ML | 0.7264 | 0.7758 | 0.6749 | 0.5667 | 0.6170 | 0.4612
NSGA III | 0.6929 | 0.7246 | 0.6184 | 0.5124 | 0.5488 | 0.4392

Methods | Languagelog (Average) | Languagelog (Best) | Languagelog (Worst) | Genbase (Average) | Genbase (Best) | Genbase (Worst)
---|---|---|---|---|---|---
SHAPFS-ML | 0.4286 | 0.4515 | 0.4192 | 0.3847 | 0.3864 | 0.3807
NSGA III | 0.3971 | 0.4206 | 0.3734 | 0.3755 | 0.3834 | 0.3506

Methods | Medical (Average) | Medical (Best) | Medical (Worst)
---|---|---|---
SHAPFS-ML | 0.4217 | 0.4702 | 0.3984
NSGA III | 0.3953 | 0.4410 | 0.3705

Methods | Flags | Emotions | Yeast | Virus | Languagelog | Genbase | Medical
---|---|---|---|---|---|---|---
SHAPFS-ML | 85.8350 | 422.3640 | 7168.8290 | 129.0880 | 5569.5000 | 1112.0010 | 1840.4340
NSGA III | 89.6130 | 422.0730 | 7375.0450 | 117.2760 | 5728.6390 | 1072.4730 | 1759.2850


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Dong, H.; Sun, J.; Sun, X. A Multi-Objective Multi-Label Feature Selection Algorithm Based on Shapley Value. *Entropy* **2021**, *23*, 1094. https://doi.org/10.3390/e23081094