# How Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification?

## Abstract


## 1. Introduction

- RH1: a data-driven approach performs statistically better than the average random baseline;
- RH2: a data-driven approach is more likely to outperform RAkELd than methods based on a priori assumptions;
- RH3: a data-driven approach has a higher likelihood of outperforming RAkELd in the worst case than methods based on a priori assumptions;
- RH4: a data-driven approach is more likely to perform better than RAkELd than otherwise, i.e., the worst-case likelihood is greater than $0.5$;
- RH5: the data-driven approach is more time efficient than RAkELd.

## 2. Related Work

## 3. Multi-Label Classification

- objects are represented as feature vectors $\overline{x}$ from the input space X;
- categories, i.e., labels or classes, come from a set L that spans the output space Y:
  - in the case of single-label single-class classification, $\left|L\right|=1$ and $Y=\left\{0,1\right\}$;
  - in the case of single-label multi-class classification, $\left|L\right|>1$ and $Y=\left\{0,1,\dots,\left|L\right|\right\}$;
  - in the case of multi-label classification, $Y={2}^{L}$;

- the empirical evidence collected: $D=({D}_{x},{D}_{y})\subset X\times Y$;
- a quality criterion function q.
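To make the notation concrete, here is a small hypothetical example (the variable names are ours, not the paper's) of how a multi-label dataset $D=({D}_{x},{D}_{y})$ is typically stored: features as vectors, and each $y\in {2}^{L}$ encoded as a binary indicator row over the label set L.

```python
# Toy multi-label dataset: 4 examples, |L| = 3 labels.
# D_x holds the feature vectors; each row of D_y encodes one element
# of the power set 2^L as a binary indicator vector over L.
D_x = [
    [0.2, 1.5],
    [1.1, 0.3],
    [0.9, 2.2],
    [0.4, 0.8],
]
D_y = [
    [1, 0, 1],  # example 0 carries labels {0, 2}
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],  # the empty label set is also a valid element of 2^L
]

# Recover each example's label set from its indicator row.
label_sets = [frozenset(j for j, v in enumerate(row) if v) for row in D_y]
print(label_sets[0])  # frozenset({0, 2})
```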

## 4. The Data-Driven Approach

#### 4.1. Label Co-Occurrence Graph

#### 4.2. Dividing the Label Space

#### 4.3. Classification Scheme

- the label co-occurrence graph is constructed based on the training dataset;
- the selected community detection algorithm is executed on the label co-occurrence graph;
- for every community ${L}_{i}$, a new training dataset ${D}_{i}$ is created by taking the original input space with only the label columns that are present in ${L}_{i}$;
- for every community, a classifier ${h}_{i}$ is learned on training set ${D}_{i}$.
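The four steps above can be sketched in plain Python. This is an illustrative sketch under simplifying assumptions: label co-occurrence is counted directly from the training label matrix, and connected components stand in for the community detection algorithms (fast greedy, infomap, walktrap, etc.) that the paper runs via `igraph`; the actual training of each classifier ${h}_{i}$ is omitted, only the per-community datasets are produced.

```python
from collections import defaultdict

def cooccurrence_edges(Y):
    """Step 1: build the label co-occurrence graph from the training labels.
    An edge (i, j) exists if labels i and j appear together in some example;
    the weight counts in how many examples they co-occur."""
    weights = defaultdict(int)
    for row in Y:
        active = [j for j, v in enumerate(row) if v]
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                weights[(active[a], active[b])] += 1
    return weights

def connected_components(n_labels, edges):
    """Step 2 (stand-in): the paper runs community detection here;
    we use connected components of the co-occurrence graph instead."""
    adj = defaultdict(set)
    for (i, j) in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, communities = set(), []
    for start in range(n_labels):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        communities.append(sorted(comp))
    return communities

def per_community_datasets(X, Y, communities):
    """Steps 3-4: for every community L_i, keep the full input space and
    only the label columns in L_i; a classifier h_i is then trained on each."""
    return [(X, [[row[j] for j in comm] for row in Y]) for comm in communities]

Y = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
X = [[0.1], [0.2], [0.3]]
comms = connected_components(4, cooccurrence_edges(Y))
print(comms)  # [[0, 1], [2, 3]]
```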

## 5. Experiments and Materials

#### 5.1. Datasets

#### 5.2. Experiment Design

- five methods that divide the label space based on structure inferred from the training data via label co-occurrence graphs, in both unweighted and weighted versions of the graphs;
- two methods that take an a priori assumption about the nature of the label space: binary relevance and label powerset;
- one random label space partitioning approach that draws partitions with equal probability: RAkELd.
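For reference, the random baseline draws label space partitions uniformly. A minimal sketch of one such draw follows; this is our illustrative reading of RAkELd's partitioning step (shuffle the labels, cut into disjoint subsets of size k), not the reference implementation.

```python
import random

def rakeld_partition(labels, k, rng=random):
    """Draw one random disjoint partition of the label set into
    subsets of size k (the final subset may be smaller)."""
    labels = list(labels)
    rng.shuffle(labels)
    return [sorted(labels[i:i + k]) for i in range(0, len(labels), k)]

rng = random.Random(42)
partition = rakeld_partition(range(6), k=2, rng=rng)
print(partition)  # a random disjoint partition into 3 pairs
```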

#### 5.3. Environment

The experiments were performed with `scikit-multilearn` (Version 0.0.1) [33], a scikit-learn API-compatible library for multi-label classification in Python that provides its own implementation of several classifiers and uses `scikit-learn` [34] multi-class classification methods. All of the datasets come from the `MULAN` [35] dataset library [17] and follow `MULAN`'s division into the train and test subsets. The base classifiers are decision trees from the `scikit-learn` package (Version 0.15), with the Gini index as the impurity function. We employ community detection methods from the Python version of the `igraph` library [36] for both weighted and unweighted graphs. The performance measures' implementation comes from the `scikit-learn` `metrics` package.

#### 5.4. Evaluation Methods

- X is the set of objects used in the testing scenario for evaluation;
- L is the set of labels that spans the output space Y;
- $\overline{x}$ denotes an example object undergoing classification;
- $h\left(\overline{x}\right)$ denotes the label set assigned to object $\overline{x}$ by the evaluated classifier h;
- y denotes the set of true labels for the observation $\overline{x}$;
- $t{p}_{j}$, $f{p}_{j}$, $f{n}_{j}$, $t{n}_{j}$ are, respectively, the true positives, false positives, false negatives and true negatives of label ${L}_{j}$, counted per label over the output of classifier h on the set of testing objects $\overline{x}\in X$, i.e., $h\left(X\right)$;
- the operator $[\![p]\!]$ converts a logical value to a number, i.e., it yields 1 if p is true and 0 if p is false.
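Using this notation, the label-based measures aggregate $t{p}_{j}$, $f{p}_{j}$, $f{n}_{j}$ differently: micro-averaging pools the counts over all labels before computing F1, while macro-averaging computes F1 per label and then averages. A minimal sketch with toy counts (the numbers are ours, not from the paper):

```python
def f1(tp, fp, fn):
    """F1 = 2*tp / (2*tp + fp + fn); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Per-label counts (tp_j, fp_j, fn_j) for |L| = 3 labels (toy values).
counts = [(8, 2, 1), (1, 0, 3), (5, 5, 5)]

# Micro-averaged F1: pool the counts over all labels first.
micro = f1(sum(c[0] for c in counts),
           sum(c[1] for c in counts),
           sum(c[2] for c in counts))

# Macro-averaged F1: compute F1 per label, then take the mean.
macro = sum(f1(*c) for c in counts) / len(counts)

print(round(micro, 4), round(macro, 4))  # 0.6364 0.5807
```

Micro-averaging is dominated by frequent labels, while macro-averaging gives every label equal weight, which is why the two scores diverge on imbalanced label sets.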

#### 5.4.1. Example-Based Evaluation Methods

#### 5.4.2. Label-Based Evaluation Methods

## 6. Results and Discussion

#### 6.1. Micro-Averaged F1 Score

#### 6.2. Macro-Averaged F1 Score

#### 6.3. Subset Accuracy

#### 6.4. Jaccard Score

#### 6.5. Hamming Loss

#### 6.6. Efficiency

## 7. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A. Result Tables

**Table A1.** Number of random samplings from the universe of RAkELd label space partitions for cases different from 250 samples.

| Set Name | k | Number of Samplings |
|---|---|---|
| birds | 17 | 163 |
| emotions | 2 | 15 |
| emotions | 3 | 10 |
| emotions | 4 | 15 |
| emotions | 5 | 6 |
| scene | 2 | 15 |
| scene | 3 | 10 |
| scene | 4 | 15 |
| scene | 5 | 6 |
| tmc2007-500 | 21 | 22 |
| yeast | 12 | 91 |

**Table A2.** Likelihood of performing better than RAkELd in micro-averaged F1 score aggregated over datasets. Bold numbers signify the best likelihoods of a method performing better than RAkELd in the worst-case (Minimum) and average cases (Mean and Median).

| | Minimum | Median | Mean | Std |
|---|---|---|---|---|
| BR | 0.500000 | 0.885556 | 0.840028 | 0.152530 |
| LP | 0.438280 | 0.640237 | 0.704891 | 0.171726 |
| fast greedy | 0.565217 | 0.806757 | 0.820198 | 0.127463 |
| fast greedy-weighted | 0.673913 | **0.922643** | **0.863821** | 0.106477 |
| infomap | 0.478261 | 0.713565 | 0.720074 | 0.153453 |
| infomap-weighted | 0.433657 | 0.792426 | 0.748072 | 0.170349 |
| label_propagation | 0.364309 | 0.734100 | 0.717083 | 0.175682 |
| label_propagation-weighted | 0.478964 | 0.815100 | 0.750085 | 0.174347 |
| leading_eigenvector | 0.630606 | 0.783227 | 0.803901 | 0.116216 |
| leading_eigenvector-weighted | 0.630435 | 0.846506 | 0.834201 | 0.107420 |
| walktrap | 0.667500 | 0.742232 | 0.781920 | 0.102776 |
| walktrap-weighted | **0.695652** | 0.856861 | 0.852037 | 0.091232 |

**Table A3.** Likelihood of performing better than RAkELd in macro-averaged F1 score aggregated over datasets. Bold numbers signify the best likelihoods of a method performing better than RAkELd in the worst-case (Minimum) and average cases (Mean and Median).

| | Minimum | Median | Mean | Std |
|---|---|---|---|---|
| BR | 0.500000 | **0.985683** | **0.919245** | 0.160273 |
| LP | 0.543478 | 0.829283 | 0.779482 | 0.163611 |
| fast greedy | 0.543478 | 0.883312 | 0.866478 | 0.139572 |
| fast greedy-weighted | **0.695652** | 0.969111 | 0.900483 | 0.116022 |
| infomap | 0.478261 | 0.820006 | 0.793086 | 0.147108 |
| infomap-weighted | 0.500000 | 0.889556 | 0.818476 | 0.164492 |
| label_propagation | 0.449376 | 0.801890 | 0.754205 | 0.182183 |
| label_propagation-weighted | 0.521739 | 0.855195 | 0.821663 | 0.148561 |
| leading_eigenvector | **0.695652** | 0.863565 | 0.851696 | 0.108860 |
| leading_eigenvector-weighted | **0.695652** | 0.889778 | 0.872119 | 0.118911 |
| walktrap | **0.695652** | 0.846889 | 0.844807 | 0.105977 |
| walktrap-weighted | **0.695652** | 0.894335 | 0.885253 | 0.095436 |

**Table A4.** Likelihood of performing better than RAkELd in subset accuracy aggregated over datasets. Bold numbers signify the best likelihoods of a method performing better than RAkELd in the worst-case (Minimum) and average cases (Mean and Median).

| | Minimum | Median | Mean | Std |
|---|---|---|---|---|
| BR | 0.000000 | 0.498000 | 0.558057 | 0.323240 |
| LP | 0.336570 | 0.958889 | 0.885476 | 0.190809 |
| fast greedy | 0.213130 | 0.827043 | 0.791811 | 0.214756 |
| fast greedy-weighted | 0.061951 | 0.843413 | 0.747624 | 0.288200 |
| infomap | **0.580500** | **0.968667** | **0.910039** | 0.143954 |
| infomap-weighted | 0.525659 | 0.899430 | 0.859231 | 0.152637 |
| label_propagation | 0.380028 | 0.964444 | 0.900555 | 0.174854 |
| label_propagation-weighted | 0.336570 | 0.913652 | 0.861642 | 0.189690 |
| leading_eigenvector | 0.000000 | 0.826087 | 0.734935 | 0.340537 |
| leading_eigenvector-weighted | 0.000000 | 0.843778 | 0.712555 | 0.345277 |
| walktrap | 0.078132 | 0.834338 | 0.739359 | 0.288072 |
| walktrap-weighted | 0.429958 | 0.812667 | 0.810542 | 0.175723 |

**Table A5.** Likelihood of performing better than RAkELd in Hamming loss aggregated over datasets. Bold numbers signify the best likelihoods of a method performing better than RAkELd in the worst-case (Minimum) and average cases (Mean and Median).

| | Minimum | Median | Mean | Std |
|---|---|---|---|---|
| BR | 0.000000 | 0.369565 | 0.408252 | 0.375954 |
| LP | 0.000000 | 0.434653 | 0.380021 | 0.338184 |
| fast greedy | 0.022222 | 0.469130 | 0.507034 | 0.266736 |
| fast greedy-weighted | **0.093333** | **0.554208** | **0.558218** | 0.280209 |
| infomap | 0.000000 | 0.306667 | 0.396299 | 0.352963 |
| infomap-weighted | 0.000000 | 0.332902 | 0.397576 | 0.345339 |
| label_propagation | 0.000000 | 0.457130 | 0.415966 | 0.361819 |
| label_propagation-weighted | 0.000000 | 0.489511 | 0.441813 | 0.363469 |
| leading_eigenvector | 0.044383 | 0.465511 | 0.456441 | 0.330251 |
| leading_eigenvector-weighted | 0.000444 | 0.451556 | 0.457361 | 0.328774 |
| walktrap | 0.070667 | 0.438943 | 0.444424 | 0.312845 |
| walktrap-weighted | 0.089333 | 0.379150 | 0.484974 | 0.331186 |

**Table A6.** Likelihood of performing better than RAkELd in Jaccard similarity aggregated over datasets. Bold numbers signify the best likelihoods of a method performing better than RAkELd in the worst-case (Minimum) and average cases (Mean and Median).

| | Minimum | Median | Mean | Std |
|---|---|---|---|---|
| BR | 0.456522 | 0.778060 | 0.759178 | 0.202349 |
| LP | 0.355987 | 0.902632 | 0.854545 | 0.189086 |
| fast greedy | 0.542302 | 0.876667 | 0.837570 | 0.135707 |
| fast greedy-weighted | 0.298197 | 0.875222 | 0.792386 | 0.205646 |
| infomap | **0.650000** | **0.945261** | **0.889663** | 0.125510 |
| infomap-weighted | 0.510865 | 0.889488 | 0.855242 | 0.148578 |
| label_propagation | 0.345816 | 0.928789 | 0.878622 | 0.181165 |
| label_propagation-weighted | 0.440592 | 0.920261 | 0.853821 | 0.181186 |
| leading_eigenvector | 0.163199 | 0.889874 | 0.821436 | 0.222219 |
| leading_eigenvector-weighted | 0.464170 | 0.812444 | 0.794091 | 0.154102 |
| walktrap | 0.238558 | 0.891304 | 0.804866 | 0.227916 |
| walktrap-weighted | 0.644889 | 0.863722 | 0.852369 | 0.122837 |

**Table A7.** The p-values of the assessment of the performance of the multi-label learning approaches compared against the random baseline by the Iman–Davenport–Friedman multiple comparison, per measure.

| Measure | Iman–Davenport p-Value |
|---|---|
| Subset Accuracy | 0.0000000004 |
| F1-macro | 0.0000000000 |
| F1-micro | 0.0000000177 |
| Hamming Loss | 0.0491215784 |
| Jaccard | 0.0000124790 |

**Table A8.** The post-hoc pairwise comparison p-values of the assessment of the performance of the multi-label learning approaches compared against the random baseline by the Iman–Davenport–Friedman test with Rom post hoc procedure, per measure. Bold numbers signify methods that performed statistically significantly better than RAkELd with a p-value $<0.05$.

| | Accuracy | F1-macro | F1-micro | Hamming Loss | Jaccard |
|---|---|---|---|---|---|
| BR | 0.3590121 | **0.0000003** | **0.0000500** | 1.0000000 | **0.0064229** |
| LP | **0.0000641** | **0.0280673** | **0.0205862** | 1.0000000 | **0.0000705** |
| fast greedy | 0.0515844 | **0.0000234** | **0.0000656** | 1.0000000 | **0.0038623** |
| fast greedy-weighted | 0.1089704 | **0.0000001** | **0.0000001** | 0.3472374 | **0.0085647** |
| infomap | **0.0000257** | **0.0484159** | **0.0205862** | 1.0000000 | **0.0000705** |
| infomap-weighted | **0.0002717** | **0.0010778** | **0.0024092** | 1.0000000 | **0.0001198** |
| label_propagation | **0.0000112** | **0.0484159** | **0.0205862** | 1.0000000 | **0.0000098** |
| label_propagation-weighted | **0.0005315** | **0.0015858** | **0.0024092** | 1.0000000 | **0.0001196** |
| leading_eigenvector | 0.1860319 | **0.0001282** | **0.0002372** | 1.0000000 | **0.0056673** |
| leading_eigenvector-weighted | 0.2570154 | **0.0000274** | **0.0000653** | 1.0000000 | **0.0259068** |
| walktrap | 0.0780264 | **0.0000397** | **0.0004457** | 1.0000000 | **0.0056673** |
| walktrap-weighted | **0.0192676** | **0.0000239** | **0.0000040** | 1.0000000 | **0.0018482** |

## Appendix B. Efficiency Figures

**Figure B4.** Efficiency of the leading eigenvector modularity maximization data-driven approach against RAkELd.

## References

1. Tsoumakas, G.; Katakis, I. Multi-label classification: An overview. Int. J. Data Warehous. Min. **2007**, 3, 1–13.
2. Dembczyński, K.; Waegeman, W.; Cheng, W.; Hüllermeier, E. On label dependence and loss minimization in multi-label classification. Mach. Learn. **2012**, 88, 5–45.
3. Tsoumakas, G.; Katakis, I.; Vlahavas, I. Random k-Labelsets for Multilabel Classification. IEEE Trans. Knowl. Data Eng. **2011**, 23, 1079–1089.
4. Read, J.; Pfahringer, B.; Holmes, G.; Frank, E. Classifier chains for multi-label classification. Mach. Learn. **2011**, 85, 333–359.
5. Dembczynski, K.; Cheng, W.; Hüllermeier, E. Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 279–286.
6. Tsoumakas, G.; Katakis, I.; Vlahavas, I. Effective and efficient multilabel classification in domains with large number of labels. In Proceedings of the ECML/PKDD 2008 Workshop on Mining Multidimensional Data (MMD '08), Antwerp, Belgium, 19 September 2008; pp. 30–44.
7. Madjarov, G.; Kocev, D.; Gjorgjevikj, D.; Džeroski, S. An extensive experimental comparison of methods for multi-label learning. Pattern Recognit. **2012**, 45, 3084–3104.
8. Zhang, M.L.; Zhou, Z.H. A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. **2014**, 26, 1819–1837.
9. Clauset, A.; Newman, M.E.J.; Moore, C. Finding community structure in very large networks. Phys. Rev. E **2004**, 70, 066111.
10. Newman, M.E.J.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E **2004**, 69, 026113.
11. Newman, M.E. Analysis of weighted networks. Phys. Rev. E **2004**, 70, 056131.
12. Brandes, U.; Delling, D.; Gaertler, M.; Görke, R.; Hoefer, M.; Nikoloski, Z.; Wagner, D. On Modularity Clustering. IEEE Trans. Knowl. Data Eng. **2008**, 20, 172–188.
13. Newman, M.E.J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E **2006**, 74, 036104.
14. Rosvall, M.; Axelsson, D.; Bergstrom, C.T. The map equation. Eur. Phys. J. Spec. Top. **2009**, 178, 13–23.
15. Raghavan, U.N.; Albert, R.; Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E **2007**, 76, 036106.
16. Pons, P.; Latapy, M. Computing communities in large networks using random walks (long version). 2005; arXiv:physics/0512106.
17. MULAN. Available online: http://mulan.sourceforge.net/datasets-mlc.html (accessed on 21 July 2016).
18. Katakis, I.; Tsoumakas, G.; Vlahavas, I. Multilabel text classification for automated tag suggestion. Available online: http://www.kde.cs.uni-kassel.de/ws/rsdc08/pdf/all_rsdc_v2.pdf#page=83 (accessed on 21 July 2016).
19. Klimt, B.; Yang, Y. The enron corpus: A new dataset for email classification research. In Machine Learning: ECML 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 217–226.
20. UC Berkeley Enron Email Analysis. Available online: http://bailando.sims.berkeley.edu/enron_email.html (accessed on 21 July 2016).
21. Computationalmedicine.org. Available online: http://www.computationalmedicine.org/challenge/ (accessed on 21 July 2016).
22. Boutell, M.R.; Luo, J.; Shen, X.; Brown, C.M. Learning multi-label scene classification. Pattern Recognit. **2004**, 37, 1757–1771.
23. Briggs, F.; Lakshminarayanan, B.; Neal, L.; Fern, X.Z.; Raich, R.; Hadley, S.J.K.; Hadley, A.S.; Betts, M.G. Acoustic classification of multiple simultaneous bird species: A multi-instance multi-label approach. J. Acoust. Soc. Am. **2012**, 131, 4640–4650.
24. Briggs, F.; Huang, Y.; Raich, R.; Eftaxias, K.; Lei, Z.; Cukierski, W.; Hadley, S.; Hadley, A.; Betts, M.; Fern, X.; et al. The 9th annual MLSP competition: New methods for acoustic classification of multiple simultaneous bird species in a noisy environment. In Proceedings of the 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP '13), Southampton, UK, 22–25 September 2013; pp. 1–8.
25. Duygulu, P.; Barnard, K.; Freitas, J.F.G.D.; Forsyth, D.A. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. In Proceedings of the 7th European Conference on Computer Vision-Part IV (ECCV '02), Copenhagen, Denmark, 28–31 May 2002; pp. 97–112.
26. Snoek, C.G.M.; Worring, M.; Gemert, J.C.V.; Geusebroek, J.M.; Smeulders, A.W.M. The challenge problem for automated detection of 101 semantic concepts in multimedia. In Proceedings of the ACM International Conference on Multimedia, Santa Barbara, CA, USA, 23–27 October 2006; pp. 421–430.
27. MediaMill, Research on Visual Search. Available online: http://www.science.uva.nl/research/mediamill/challenge/ (accessed on 21 July 2016).
28. Trohidis, K.; Tsoumakas, G.; Kalliris, G.; Vlahavas, I.P. Multi-Label Classification of Music into Emotions. In Proceedings of the Ninth International Conference on Music Information Retrieval (ISMIR 2008), Philadelphia, PA, USA, 14–18 September 2008; Volume 8, pp. 325–330.
29. Elisseeff, A.; Weston, J. A Kernel Method for Multi-Labelled Classification. In Advances in Neural Information Processing Systems 14; MIT Press: Cambridge, MA, USA, 2001; pp. 681–687.
30. Diplaris, S.; Tsoumakas, G.; Mitkas, P.A.; Vlahavas, I. Protein Classification with Multiple Algorithms. In Advances in Informatics; Springer: Berlin/Heidelberg, Germany, 2005; pp. 448–456.
31. Derrac, J.; García, S.; Molina, D.; Herrera, F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. **2011**, 1, 3–18.
32. Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. **2006**, 7, 1–30.
33. Scikit-Multilearn. Available online: http://scikit-multilearn.github.io/ (accessed on 21 July 2016).
34. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
35. Tsoumakas, G.; Spyromitros-Xioufis, E.; Vilcek, J.; Vlahavas, I. Mulan: A Java Library for Multi-Label Learning. J. Mach. Learn. Res. **2011**, 12, 2411–2414.
36. Csardi, G.; Nepusz, T. The igraph software package for complex network research. Inter. J. Complex Syst. **2006**, 1695, 1–9.
37. Yang, Y. An Evaluation of Statistical Approaches to Text Categorization. Inf. Retr. **1999**, 1, 69–90.

**Figure 1.** Statistical evaluation of the methods' performance in terms of the micro-averaged F1 score. Gray, baseline; white, statistically identical to the baseline; otherwise, the p-value of the hypothesis that a method performs better than the baseline.

**Figure 2.** Histogram of the methods' likelihood of performing better than RAkELd in the micro-averaged F1 score aggregated over datasets.

**Figure 3.** Statistical evaluation of the methods' performance in terms of the macro-averaged F1 score. Gray, baseline; white, statistically identical to the baseline; otherwise, the p-value of the hypothesis that a method performs better than the baseline.

**Figure 4.** Histogram of the methods' likelihood of performing better than RAkELd in the macro-averaged F1 score aggregated over datasets.

**Figure 5.** Statistical evaluation of the methods' performance in terms of the Jaccard similarity score. Gray, baseline; white, statistically identical to the baseline; otherwise, the p-value of the hypothesis that a method performs better than the baseline.

**Figure 6.** Histogram of the methods' likelihood of performing better than RAkELd in subset accuracy aggregated over datasets.

**Figure 7.** Statistical evaluation of the methods' performance in terms of the micro-averaged F1 score. Gray, baseline; white, statistically identical to the baseline; otherwise, the p-value of the hypothesis that a method performs better than the baseline.

**Figure 8.** Histogram of the methods' likelihood of performing better than RAkELd in Jaccard similarity aggregated over datasets.

**Figure 9.** Histogram of the methods' likelihood of performing better than RAkELd in Hamming loss aggregated over datasets.

**Table 1.** Likelihood of performing better than RAkELd in the micro-averaged F1 score of every method for each dataset.

| | BR | LP | Fast Greedy | Fast Greedy-Weighted | Infomap | Infomap-Weighted | Label_Propagation | Label_Propagation-Weighted | Leading_Eigenvector | Leading_Eigenvector-Weighted | Walktrap | Walktrap-Weighted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Corel5k | 0.856444 | 0.608000 | 0.961333 | 0.804000 | 0.524000 | 0.881778 | 0.601333 | 0.524000 | 0.949778 | 0.818222 | 0.745333 | 0.799111 |
| bibtex | 0.997778 | 0.782667 | 0.756444 | 0.794222 | 0.664889 | 0.816889 | 0.749333 | 0.882222 | 0.812000 | 0.800889 | 0.835111 | 0.833333 |
| birds | 0.968562 | 0.438280 | 0.843736 | 0.946833 | 0.591771 | 0.433657 | 0.364309 | 0.478964 | 0.630606 | 0.830791 | 0.694868 | 0.836338 |
| delicious | 0.914667 | 0.869333 | 0.941778 | 0.936444 | 0.864000 | 0.874222 | 0.892889 | 0.868889 | 0.934667 | 0.918667 | 0.912889 | 0.916000 |
| emotions | 0.500000 | 0.521739 | 0.565217 | 0.673913 | 0.739130 | 0.586957 | 0.673913 | 0.913043 | 0.717391 | 0.630435 | 0.739130 | 0.891304 |
| enron | 0.802000 | 0.873000 | 0.934500 | 0.938000 | 0.786000 | 0.776500 | 0.815500 | 0.839000 | 0.776000 | 0.945500 | 0.761000 | 0.859500 |
| genbase | 0.941778 | 0.880000 | 0.864444 | 0.919111 | 0.862222 | 0.913333 | 0.880000 | 0.882667 | 0.882667 | 0.862222 | 0.882667 | 0.880000 |
| mediamill | 0.740889 | 0.609333 | 0.769778 | 0.932000 | 0.627556 | 0.562222 | 0.615111 | 0.589333 | 0.709333 | 0.886667 | 0.715111 | 0.854222 |
| medical | 0.938500 | 0.596500 | 0.769000 | 0.799500 | 0.688000 | 0.736000 | 0.772000 | 0.623000 | 0.770000 | 0.729500 | 0.667500 | 0.698500 |
| scene | 0.673913 | 0.608696 | 0.695652 | 0.695652 | 0.478261 | 0.586957 | 0.521739 | 0.608696 | 0.673913 | 0.695652 | 0.695652 | 0.695652 |
| tmc2007-500 | 0.999343 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| yeast | 0.746458 | 0.671141 | 0.740492 | 0.926174 | 0.815063 | 0.808352 | 0.718867 | 0.791201 | 0.790455 | 0.891872 | 0.733781 | 0.960477 |

**Table 2.** Likelihood of performing better than RAkELd in the macro-averaged F1 score of every method for each dataset.

| | BR | LP | Fast Greedy | Fast Greedy-Weighted | Infomap | Infomap-Weighted | Label_Propagation | Label_Propagation-Weighted | Leading_Eigenvector | Leading_Eigenvector-Weighted | Walktrap | Walktrap-Weighted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Corel5k | 1.000000 | 0.665778 | 0.980889 | 0.968444 | 0.615111 | 0.996889 | 0.596444 | 0.712444 | 0.836444 | 0.901333 | 0.845333 | 0.969778 |
| bibtex | 1.000000 | 0.871111 | 0.839111 | 0.869333 | 0.803111 | 0.887111 | 0.849333 | 0.945778 | 0.880000 | 0.878222 | 0.881333 | 0.876000 |
| birds | 0.992603 | 0.559408 | 0.883957 | 0.999075 | 0.804901 | 0.673601 | 0.449376 | 0.671290 | 0.736477 | 0.847896 | 0.786408 | 0.897365 |
| delicious | 0.997778 | 0.937778 | 1.000000 | 1.000000 | 0.929333 | 0.956444 | 0.996889 | 0.974222 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| emotions | 0.500000 | 0.543478 | 0.543478 | 0.717391 | 0.717391 | 0.608696 | 0.630435 | 0.913043 | 0.695652 | 0.695652 | 0.717391 | 0.891304 |
| enron | 0.991500 | 0.966500 | 1.000000 | 0.973000 | 0.943500 | 0.948000 | 0.875500 | 0.954000 | 0.981500 | 0.992000 | 0.949000 | 0.830000 |
| genbase | 0.953333 | 0.829333 | 0.881778 | 0.892444 | 0.836000 | 0.892000 | 0.840444 | 0.840889 | 0.882667 | 0.836000 | 0.848444 | 0.840444 |
| mediamill | 0.964444 | 0.860444 | 0.882667 | 0.969778 | 0.835111 | 0.743111 | 0.792444 | 0.759556 | 0.881333 | 0.964889 | 0.840889 | 0.943111 |
| medical | 0.977500 | 0.725500 | 0.768500 | 0.750500 | 0.696000 | 0.722500 | 0.730000 | 0.697500 | 0.783500 | 0.703000 | 0.701500 | 0.745000 |
| scene | 0.673913 | 0.565217 | 0.695652 | 0.695652 | 0.478261 | 0.500000 | 0.478261 | 0.521739 | 0.695652 | 0.695652 | 0.695652 | 0.695652 |
| tmc2007-500 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| yeast | 0.979866 | 0.829232 | 0.921700 | 0.970172 | 0.858315 | 0.893363 | 0.811335 | 0.869500 | 0.847129 | 0.950783 | 0.871738 | 0.934377 |

**Table 3.** Likelihood of performing better than RAkELd in the subset accuracy of every method for each dataset.

| | BR | LP | Fast Greedy | Fast Greedy-Weighted | Infomap | Infomap-Weighted | Label_Propagation | Label_Propagation-Weighted | Leading_Eigenvector | Leading_Eigenvector-Weighted | Walktrap | Walktrap-Weighted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Corel5k | 0.000000 | 0.953778 | 0.652000 | 0.301778 | 0.965778 | 0.652000 | 0.953778 | 0.826667 | 0.000000 | 0.000000 | 0.301778 | 0.780889 |
| bibtex | 0.492000 | 0.975111 | 0.828000 | 0.723111 | 0.971556 | 0.761778 | 0.975111 | 0.761778 | 0.800000 | 0.788000 | 0.799111 | 0.761778 |
| birds | 0.380028 | 0.336570 | 0.213130 | 0.061951 | 0.651872 | 0.525659 | 0.380028 | 0.336570 | 0.051780 | 0.039297 | 0.078132 | 0.429958 |
| delicious | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| emotions | 0.326087 | 0.717391 | 0.826087 | 0.847826 | 0.956522 | 0.891304 | 0.847826 | 0.891304 | 0.826087 | 0.891304 | 0.760870 | 1.000000 |
| enron | 0.504000 | 0.964000 | 0.775500 | 0.817000 | 0.959000 | 0.941000 | 0.990500 | 0.986500 | 0.865500 | 0.795000 | 0.659500 | 0.775500 |
| genbase | 0.907556 | 0.871556 | 0.851111 | 0.907556 | 0.851111 | 0.907556 | 0.871556 | 0.871556 | 0.871556 | 0.851111 | 0.871556 | 0.871556 |
| mediamill | 0.318222 | 0.924000 | 0.801333 | 0.856889 | 0.987111 | 0.858667 | 0.953333 | 0.936000 | 0.776889 | 0.836444 | 0.914222 | 0.844444 |
| medical | 0.866500 | 0.893000 | 0.686500 | 0.839000 | 0.580500 | 0.782500 | 0.839000 | 0.743500 | 0.814000 | 0.853000 | 0.631000 | 0.665500 |
| scene | 0.282609 | 1.000000 | 0.869565 | 0.652174 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.826087 | 0.543478 | 0.869565 | 0.630435 |
| tmc2007-500 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| yeast | 0.619687 | 0.990306 | 0.998509 | 0.964206 | 0.997017 | 0.990306 | 0.995526 | 0.985831 | 0.987323 | 0.953020 | 0.986577 | 0.966443 |

**Table 4.** Likelihood of performing better than RAkELd in Jaccard similarity of every method for each dataset.

| | BR | LP | Fast Greedy | Fast Greedy-Weighted | Infomap | Infomap-Weighted | Label_Propagation | Label_Propagation-Weighted | Leading_Eigenvector | Leading_Eigenvector-Weighted | Walktrap | Walktrap-Weighted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Corel5k | 0.675111 | 0.828444 | 0.888889 | 0.562667 | 0.788000 | 0.787111 | 0.803111 | 0.597333 | 0.888444 | 0.644444 | 0.504889 | 0.644889 |
| bibtex | 0.984444 | 0.998667 | 0.780889 | 0.720889 | 0.975556 | 0.758222 | 0.995111 | 0.866222 | 0.867556 | 0.796000 | 0.938222 | 0.824000 |
| birds | 0.820620 | 0.355987 | 0.542302 | 0.298197 | 0.653722 | 0.510865 | 0.345816 | 0.440592 | 0.163199 | 0.464170 | 0.238558 | 0.725381 |
| delicious | 0.976000 | 0.973333 | 0.943556 | 0.900444 | 0.964444 | 0.968444 | 0.992444 | 0.984000 | 0.938222 | 0.828889 | 0.988889 | 0.963111 |
| emotions | 0.456522 | 0.652174 | 0.760870 | 0.652174 | 0.956522 | 0.826087 | 0.891304 | 0.956522 | 0.891304 | 0.652174 | 0.891304 | 1.000000 |
| enron | 0.735500 | 0.984500 | 0.951000 | 0.904500 | 0.934000 | 0.960500 | 0.976000 | 0.994000 | 0.784500 | 0.957000 | 0.760000 | 0.827000 |
| genbase | 0.911111 | 0.904444 | 0.864444 | 0.911111 | 0.869333 | 0.952889 | 0.909778 | 0.884000 | 0.904000 | 0.869333 | 0.884000 | 0.909778 |
| mediamill | 0.537333 | 0.866667 | 0.682222 | 0.976000 | 0.921778 | 0.774667 | 0.893333 | 0.823556 | 0.706222 | 0.895556 | 0.914222 | 0.900444 |
| medical | 0.897500 | 0.789500 | 0.765500 | 0.850000 | 0.650000 | 0.745000 | 0.810500 | 0.722000 | 0.827500 | 0.751000 | 0.691000 | 0.688500 |
| scene | 0.456522 | 1.000000 | 0.891304 | 0.782609 | 0.978261 | 1.000000 | 0.978261 | 1.000000 | 0.913043 | 0.739130 | 0.891304 | 0.782609 |
| tmc2007-500 | 0.998029 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.999343 | 1.000000 | 1.000000 | 1.000000 |
| yeast | 0.661447 | 0.900820 | 0.979866 | 0.950037 | 0.984340 | 0.979120 | 0.947800 | 0.977629 | 0.973900 | 0.931394 | 0.956003 | 0.962714 |

**Table 5.** Likelihood of performing better than RAkELd in Hamming loss of every method for each dataset.

| | BR | LP | Fast Greedy | Fast Greedy-Weighted | Infomap | Infomap-Weighted | Label_Propagation | Label_Propagation-Weighted | Leading_Eigenvector | Leading_Eigenvector-Weighted | Walktrap | Walktrap-Weighted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Corel5k | 0.000000 | 0.004000 | 0.400000 | 0.148000 | 0.003556 | 0.115556 | 0.004889 | 0.009778 | 0.069333 | 0.000444 | 0.350667 | 0.243556 |
| bibtex | 0.097778 | 0.023111 | 0.022222 | 0.093333 | 0.011556 | 0.044444 | 0.012444 | 0.116000 | 0.059111 | 0.059111 | 0.070667 | 0.089333 |
| birds | 0.474341 | 0.008322 | 0.459085 | 0.540915 | 0.055941 | 0.019880 | 0.002774 | 0.006472 | 0.044383 | 0.013870 | 0.104022 | 0.154415 |
| delicious | 0.000000 | 0.000000 | 0.244444 | 0.350667 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.249778 | 0.387111 | 0.081778 | 0.120444 |
| emotions | 0.369565 | 0.391304 | 0.478261 | 0.695652 | 0.695652 | 0.521739 | 0.586957 | 0.891304 | 0.673913 | 0.521739 | 0.608696 | 0.847826 |
| enron | 0.044000 | 0.483000 | 0.460000 | 0.567500 | 0.284000 | 0.274500 | 0.436000 | 0.522500 | 0.474500 | 0.313000 | 0.151000 | 0.277000 |
| genbase | 0.911111 | 0.877778 | 0.856889 | 0.911111 | 0.856889 | 0.911111 | 0.877778 | 0.877778 | 0.877778 | 0.856889 | 0.877778 | 0.877778 |
| mediamill | 0.138222 | 0.256000 | 0.314667 | 0.403111 | 0.329333 | 0.238667 | 0.260889 | 0.276444 | 0.222222 | 0.403111 | 0.345333 | 0.301778 |
| medical | 0.947000 | 0.517000 | 0.735000 | 0.762000 | 0.636500 | 0.725000 | 0.783500 | 0.529000 | 0.740500 | 0.747000 | 0.585500 | 0.675500 |
| scene | 0.369565 | 0.521739 | 0.521739 | 0.500000 | 0.282609 | 0.391304 | 0.478261 | 0.456522 | 0.456522 | 0.500000 | 0.630435 | 0.456522 |
| tmc2007-500 | 0.999343 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| yeast | 0.548098 | 0.478001 | 0.592095 | 0.726324 | 0.599553 | 0.528710 | 0.548098 | 0.615958 | 0.609247 | 0.686055 | 0.527218 | 0.775541 |

**Table 6.** Number of communities (number of label subspaces) detected in training sets by each method per dataset.

| | Fast Greedy | Fast Greedy-Weighted | Infomap | Infomap-Weighted | Label_Propagation | Label_Propagation-Weighted | Leading_Eigenvector | Leading_Eigenvector-Weighted | Walktrap | Walktrap-Weighted |
|---|---|---|---|---|---|---|---|---|---|---|
| Corel5k | 10 | 13 | 8 | 25 | 4 | 7 | 7 | 15 | 22 | 25 |
| bibtex | 3 | 5 | 2 | 8 | 1 | 7 | 4 | 5 | 9 | 8 |
| birds | 3 | 3 | 1 | 1 | 1 | 1 | 2 | 3 | 2 | 7 |
| delicious | 3 | 6 | 1 | 3 | 1 | 2 | 2 | 6 | 5 | 4 |
| emotions | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 2 |
| enron | 4 | 4 | 2 | 2 | 2 | 2 | 3 | 3 | 13 | 7 |
| genbase | 10 | 11 | 10 | 11 | 10 | 11 | 10 | 11 | 12 | 11 |
| mediamill | 3 | 3 | 1 | 2 | 1 | 1 | 3 | 2 | 3 | 4 |
| medical | 20 | 19 | 20 | 20 | 18 | 18 | 20 | 18 | 20 | 20 |
| scene | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
| tmc2007-500 | 2 | 3 | 1 | 1 | 1 | 1 | 2 | 3 | 1 | 2 |
| yeast | 1 | 4 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 4 |

| Measure | Micro-Averaged F1 | Macro-Averaged F1 | Subset Accuracy | Jaccard Similarity | Hamming Loss |
|---|---|---|---|---|---|
| RH1: The data-driven approach is significantly better than random ($\alpha =0.05$) | Yes | Yes | Yes | Yes | No |
| RH2: The data-driven approach is more likely to outperform RAkELd than a priori methods | Yes | No | Yes | Yes | Yes |
| RH3: The data-driven approach is more likely to outperform RAkELd than a priori methods in the worst case | Yes | Yes | Yes | Yes | Yes |
| RH4: The data-driven approach is more likely to perform better than RAkELd in the worst case, than otherwise | Yes | Yes | Yes | Yes | No |
| RH5: The data-driven approach is more time efficient than RAkELd | Yes | Yes | Yes | Yes | Yes |
| Recommended data-driven approach | Weighted fast greedy and weighted walktrap | Weighted fast greedy | Unweighted infomap | Unweighted infomap | Weighted fast greedy |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Szymański, P.; Kajdanowicz, T.; Kersting, K.
How Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification? *Entropy* **2016**, *18*, 282.
https://doi.org/10.3390/e18080282
