# Ensemble of Networks for Multilabel Classification

## Abstract


## 1. Introduction

- To the best of our knowledge, we are the first to propose an ensemble method for multilabel classification based on combining sets of LSTM, GRU, and TCN networks, and the first to apply TCN to this problem.
- Two new topologies of GRU and TCN are also proposed here, as well as a novel topology that combines the two.
- Another advance in multilabel classification is the application of variants of Adam optimization for building our ensembles.
- Finally, for comparison with future works by other researchers in this area, the MATLAB source code for this study is available at https://github.com/LorisNanni (accessed on 1 November 2022).

## 2. Related Works

## 3. Datasets

- Cal500 [55]: This dataset contains human-generated annotations, which label some popular Western music tracks. Tracks were composed by 500 artists. Cal500 has 502 instances, including 68 numeric features and 174 unique labels.
- Scene [11]: This dataset contains 2407 color images. It includes a predefined training and testing set. The images can have the following labels: beach (369), sunset (364), fall foliage (360), field (327), mountain (223), and urban (210). Sixty-three images have been assigned two category labels and one image three, making the total number of labels fifteen. The images all went through a preprocessing procedure. First, the images were converted to the CIE Luv space, which is perceptually uniform (close to Euclidean distances). Second, the images were divided into a 7 × 7 grid, which produced 49 blocks. Third, the mean and variance of each band were computed. The mean represents a low-resolution image, while the variance represents computationally inexpensive texture features. Finally, the images were transformed into a feature vector (49 × 3 × 2 = 294 dimensions).
- Image [56]: This dataset contains 2000 natural scene images. Images are divided into five base categories: desert (340 images), mountains (268 images), sea (341 images), sunset (216 images), and trees (378 images). Categorizing images into these five basic types produced a large dataset of images that belonged to two categories (442 images) and a smaller set that belonged to three categories (15 images). The total number of labels in this set, however, is 20 due to the joint categories. All images went through similar preprocessing methods as discussed in [11].
- Yeast [57]: This dataset contains biological data. In total there are 2417 micro-array expression data and phylogenetic profiles. They are represented by 103 features and are classified into 14 classes based on function. A gene can be classified into more than one class.
- Arts [5]: This dataset contains 5000 art images, which are described by a total of 462 numeric features. Each image can be classified into 26 classes.
- Liu [15]: This dataset contains drug data used to predict side effects. In total it has 832 compounds. They are represented by 2892 features and 1385 labels.
- ATC [58]: This dataset contains 3883 ATC coded pharmaceuticals. Each sample is represented by 42 features and 14 classes.
- ATC_f: This dataset is a variation of the ATC dataset described above. In this dataset, however, the patterns are represented by a descriptor of 806 dimensions (i.e., all three descriptors are examined in this dataset as described in [59]).
- mAn [4]: This dataset contains protein data represented by 20 features and 20 labels.
- Bibtex: This dataset is highly sparse and was used in [5].
- Enron: a highly sparse dataset used in [5].
- Health: a highly sparse dataset used in [5].

#### Performance Indicators

- Hamming loss is the fraction of misclassified labels:$$\mathrm{HLoss}\left(H\right)=\frac{1}{ml}\sum_{i=1}^{m}\sum_{j=1}^{l} I\left(\mathbf{y}_{i}\left(j\right)\ne h_{i}\left(j\right)\right),$$
- One error is the fraction of instances whose most confident label is incorrect. The indicator should be minimized:$$\mathrm{OneError}\left(F\right)=\frac{1}{m}\sum_{i=1}^{m} I\left(h_{i}\left(\underset{j}{\mathrm{argmax}}\, f_{i}\left(j\right)\right)\ne \mathbf{y}_{i}\left(\underset{j}{\mathrm{argmax}}\, f_{i}\left(j\right)\right)\right),$$
- Ranking loss is the average fraction of reversely ordered label pairs for each instance. It is derived from the confidence values by counting the pairs ranked incorrectly (i.e., a wrong label ranked before a true label). Ranking loss is also an error measure and should therefore be minimized.
- Coverage is the average number of steps needed to move down the ranked label list of an instance to cover all its relevant labels. As such, coverage should be minimized.
- Average precision is the average fraction of relevant labels ranked higher than a particular label. As such, average precision should be maximized.
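Two of the indicators above, Hamming loss and one error, can be sketched in a few lines of NumPy. The study's released code is MATLAB, so the Python below is only an illustrative reimplementation of the definitions; the variable names are ours.

```python
import numpy as np

def hamming_loss(Y, H):
    """Fraction of misclassified labels over m instances and l labels."""
    m, l = Y.shape
    return np.sum(Y != H) / (m * l)

def one_error(Y, F):
    """Fraction of instances whose most confident label is not relevant."""
    top = np.argmax(F, axis=1)              # most confident label per instance
    return np.mean(Y[np.arange(len(Y)), top] == 0)

# toy example: 3 instances, 4 labels
Y = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0]])               # ground-truth label matrix
F = np.array([[0.9, 0.1, 0.8, 0.2],
              [0.3, 0.7, 0.1, 0.2],
              [0.2, 0.8, 0.6, 0.1]])       # classifier confidence scores
H = (F >= 0.5).astype(int)                 # thresholded 0/1 predictions
# H disagrees with Y in 2 of the 12 cells, so HLoss = 1/6;
# the top-ranked label is relevant for every instance, so OneError = 0.
```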

- Aiming is the ratio of correctly predicted labels to practically predicted labels:$$\mathrm{Aiming}\left(H\right)=\frac{1}{m}\sum_{i=1}^{m}\frac{\left\|h_{i}\cap \mathbf{y}_{i}\right\|}{\left\|h_{i}\right\|}.$$
- Recall is the ratio of correctly predicted labels to actual labels:$$\mathrm{Recall}\left(H\right)=\frac{1}{m}\sum_{i=1}^{m}\frac{\left\|h_{i}\cap \mathbf{y}_{i}\right\|}{\left\|\mathbf{y}_{i}\right\|}.$$
- Accuracy is the average ratio of correctly predicted labels over total labels:$$\mathrm{Accuracy}\left(H\right)=\frac{1}{m}\sum_{i=1}^{m}\frac{\left\|h_{i}\cap \mathbf{y}_{i}\right\|}{\left\|h_{i}\cup \mathbf{y}_{i}\right\|}.$$
- Absolute true is the ratio of perfectly correct prediction events to the total number of prediction events:$$\mathrm{AbsTrue}\left(H\right)=\frac{1}{m}\sum_{i=1}^{m}I\left(h_{i}=\mathbf{y}_{i}\right).$$
- Absolute false is the ratio of completely wrong predicted labels to the total number of labels:$$\mathrm{AbsFalse}\left(H\right)=\frac{1}{m}\sum_{i=1}^{m}\frac{\left\|h_{i}\cup \mathbf{y}_{i}\right\|-\left\|h_{i}\cap \mathbf{y}_{i}\right\|}{l}.$$
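The five set-based indicators above can likewise be computed directly from the 0/1 label matrices. The sketch below is an illustrative NumPy version (not the study's MATLAB code); each row of `Y`/`H` encodes the actual/predicted label set of one instance.

```python
import numpy as np

def chou_metrics(Y, H):
    """Aiming, recall, accuracy, absolute true, and absolute false for
    0/1-encoded label matrices Y (actual) and H (predicted)."""
    m, l = Y.shape
    inter = np.sum(Y & H, axis=1)        # |h_i ∩ y_i| per instance
    union = np.sum(Y | H, axis=1)        # |h_i ∪ y_i| per instance
    aiming = np.mean(inter / np.maximum(np.sum(H, axis=1), 1))
    recall = np.mean(inter / np.maximum(np.sum(Y, axis=1), 1))
    accuracy = np.mean(inter / np.maximum(union, 1))
    abs_true = np.mean(np.all(Y == H, axis=1))
    abs_false = np.mean((union - inter) / l)
    return aiming, recall, accuracy, abs_true, abs_false
```

The `np.maximum(..., 1)` guards only avoid division by zero for instances with an empty predicted or actual label set.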

## 4. Proposed Approaches

#### 4.1. Model Architectures

#### 4.2. Pre-Processing

- Feature normalization in the range [0, 1] for the dataset ATC_f for IMCC [5];
- For the datasets Liu, Arts, bibtex, enron, and health, feature transform was performed with PCA, where 99% of the variance was retained. Feature transform is only necessary for our proposed networks and not for IMCC and TB. Poor performance resulted when using the original sparse data as input to our proposed networks.
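The two pre-processing steps can be sketched as follows; this is an illustrative NumPy version (the study's code is MATLAB), with the 99% variance threshold taken from the text.

```python
import numpy as np

def minmax_01(X):
    """Rescale each feature to [0, 1] (the normalization used for ATC_f)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.maximum(hi - lo, 1e-12)

def pca_99(X, retained=0.99):
    """Project X onto the principal components retaining 99% of the variance,
    as applied to the sparse datasets (Liu, Arts, bibtex, enron, health)."""
    Xc = X - X.mean(axis=0)                       # center the features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(S**2) / np.sum(S**2)    # cumulative explained variance
    k = int(np.searchsorted(explained, retained)) + 1
    return Xc @ Vt[:k].T                          # reduced representation
```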

#### 4.3. Long Short-Term Memory (LSTM)

Let the output (hidden state) be $h_t$ and the cell state be $c_t$ at time step $t$. The first LSTM block uses the initial state of the network and the first time step of the sequence to compute the first output and the updated cell state. At time step $t$, the block uses the current state of the network $(c_{t-1}, h_{t-1})$ and the next time step of the sequence to compute the output $h_t$ and the updated cell state $c_t$.
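The recurrence above can be sketched as a single LSTM step with the standard gate layout. This is an illustrative NumPy version (the study's code is MATLAB), with the four gates stored as slices of stacked weight matrices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (4n, d) input weights, U: (4n, n) recurrent weights, b: (4n,) biases,
    where n is the number of hidden units and d the input dimension."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])           # input gate
    f = sigmoid(z[n:2*n])         # forget gate
    g = np.tanh(z[2*n:3*n])       # candidate cell state
    o = sigmoid(z[3*n:4*n])       # output gate
    c_t = f * c_prev + i * g      # updated cell state
    h_t = o * np.tanh(c_t)        # updated hidden state / output
    return h_t, c_t
```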

#### 4.4. Gated Recurrent Units (GRU)

#### 4.5. Temporal Convolutional Neural Networks (TCN)

#### 4.6. IMCC

#### 4.7. Pooling

#### 4.8. Fully Connected Layer and Sigmoid Layer

#### 4.9. Training

#### 4.10. Ensemble Generation

## 5. Optimizers

#### 5.1. Adam Optimizer

The moving averages $m_t$ (the first moment) and $u_t$ (the second moment) can be defined as$$m_t=\beta_1 m_{t-1}+\left(1-\beta_1\right)g_t,\qquad u_t=\beta_2 u_{t-1}+\left(1-\beta_2\right)g_t^2,$$where $g_t$ is the gradient at step $t$ and $\beta_1,\beta_2\in[0,1)$ are the exponential decay rates. After bias correction, $\hat{m}_t=m_t/\left(1-\beta_1^t\right)$ and $\hat{u}_t=u_t/\left(1-\beta_2^t\right)$, the parameters $\theta$ are updated as$$\theta_t=\theta_{t-1}-\alpha\,\frac{\hat{m}_t}{\sqrt{\hat{u}_t}+\varepsilon},$$where $\alpha$ is the learning rate, $\varepsilon$ is a small constant that prevents division by zero (e.g., $10^{-8}$), and all operations are component-wise [45].
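A single Adam update, with the standard default hyperparameters of [45], can be sketched as follows; this is an illustrative Python version, not the MATLAB code used in the experiments.

```python
import numpy as np

def adam_step(theta, grad, m, u, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; all operations are component-wise."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment moving average
    u = beta2 * u + (1 - beta2) * grad**2     # second-moment moving average
    m_hat = m / (1 - beta1**t)                # bias-corrected first moment
    u_hat = u / (1 - beta2**t)                # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(u_hat) + eps)
    return theta, m, u

# minimize f(theta) = theta^2 (gradient 2*theta) from theta = 5
theta, m, u = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, u = adam_step(theta, 2 * theta, m, u, t, alpha=0.05)
```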

#### 5.2. The DiffGrad Optimizer

#### 5.3. DiffGrad Variants

## 6. Experimental Results

- DG_Cos is the fusion of DG_10 + Cos_10;
- DG_Cos_Exp is the fusion of DG_10 + Cos_10 + Exp_10;
- DG_Cos_Exp_Sto is the fusion of DG_10 + Cos_10 + Exp_10 + Sto_10;
- StoGRU is an ensemble composed of 40 GRU_A, combined by average rule, each coupled with the stochastic approach explained in Section 4.10;
- StoGRU_B as StoGRU but based on GRU_B;
- StoTCN is an ensemble of 40 TCN_A, combined by average rule, each coupled with the stochastic approach explained in Section 4.10;
- StoTCN_B as StoTCN but based on TCN_B;
- StoGRU_TCN is an ensemble of 40 GRU_TCN each coupled with the stochastic approach explained in Section 4.10;
- StoLSTM_GRU is an ensemble of 40 LSTM_GRU each coupled with the stochastic approach explained in Section 4.10;
- ENNbase is the fusion by average rule of StoGRU and StoTCN;
- ENN is the fusion by average rule of StoGRU, StoTCN, StoGRU_B, StoTCN_B and StoGRU_TCN;
- ENNlarge is the fusion by average rule of StoGRU, StoTCN, StoGRU_B, StoTCN_B, StoGRU_TCN and StoLSTM_GRU.
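The average rule used throughout the list above is simply the mean of the score matrices produced by the base networks; a minimal illustrative sketch (variable names are ours):

```python
import numpy as np

def average_rule(score_list):
    """Average rule: mean of the score matrices of the base classifiers.
    Each matrix has one row per instance and one column per label."""
    return np.mean(np.stack(score_list, axis=0), axis=0)

# e.g., three base networks scoring 2 instances x 3 labels
scores = [np.array([[0.9, 0.2, 0.4], [0.1, 0.8, 0.6]]) for _ in range(3)]
fused = average_rule(scores)
```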

- ENNlarge + w × IMCC is the sum rule between ENNlarge and IMCC. Before fusion, the scores of ENNlarge (note that the ensemble ENNlarge is itself obtained by average rule) were normalized, since they have a different range of values than IMCC. Normalization was performed as $ENNlarge=\left(ENNlarge-0.5\right)\times 2$, and the classification threshold of the ensemble is simply set to zero. The scores of IMCC were weighted by a factor of w.
- ENNlarge + w × IMCC + TB is the same as the previous fusion, but TB is included in the ensemble. Before fusion, the scores of TB were normalized, since they have a different range of values than IMCC. Normalization was performed as $TB=\left(TB-0.5\right)\times 2$.
- StoLSTM_GRU + IMCC + TB is the sum rule among StoLSTM_GRU, IMCC, and TB. StoLSTM_GRU and TB are normalized before the fusion as $StoLSTM\_GRU=\left(StoLSTM\_GRU-0.5\right)\times 2$ and $TB=\left(TB-0.5\right)\times 2$.
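The normalization-then-sum scheme described above can be sketched in a few lines. This is an illustrative helper (the function name `fuse` is ours, not from the released code): sigmoid-range scores in [0, 1] are mapped to [−1, 1] before being summed with the IMCC scores, which are weighted by w.

```python
import numpy as np

def fuse(enn, imcc, tb=None, w=3.0):
    """Weighted sum-rule fusion of ENNlarge, IMCC, and (optionally) TB."""
    fused = (enn - 0.5) * 2.0 + w * imcc     # normalize ENNlarge, weight IMCC
    if tb is not None:
        fused = fused + (tb - 0.5) * 2.0     # TB normalized the same way
    return fused                             # threshold at zero for ENN scores
```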

- GRU/TCN-based methods work poorly on very sparse datasets (i.e., on Arts, Liu, bibtex, enron, and health);
- StoGRU_TCN outperforms the other ensembles based on GRU/TCN; StoGRU and StoTCN perform similarly;
- StoLSTM_GRU works very well on sparse datasets. On datasets that are not sparse, its performance is similar to that of the other GRU/TCN-based methods. The average performance of StoLSTM_GRU is higher than that of IMCC;
- ENNlarge outperforms each method from which it was built;
- ENNlarge + 3 × IMCC + TB outperforms ENNlarge + 3 × IMCC with a p-value of 0.09;
- StoLSTM_GRU + IMCC + TB is the best choice for sparse datasets;
- ENNlarge + 3 × IMCC + TB matches or outperforms IMCC on all the datasets (note that ENN + IMCC + TB and StoLSTM_GRU + IMCC + TB have performance equal to or lower than IMCC on some datasets). ENNlarge + 3 × IMCC + TB is our suggested approach.
- In the following tests, we simplify the names of our best ensembles to reduce clutter:
- Ens refers to ENNlarge + 3 × IMCC+TB;
- EnsSparse refers to StoLSTM_GRU + IMCC + TB.

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Galindo, E.G.; Ventura, S. Multi label learning: A review of the state of the art and ongoing research. Wiley Interdiscip. Rev. Data Min. Knowl. Discov.
**2014**, 4, 411–444. [Google Scholar] [CrossRef] - Cheng, X.; Lin, W.-Z.; Xiao, X.; Chou, K.-C. pLoc_bal-mAnimal: Predict subcellular localization of animal proteins by balancing training dataset and PseAAC. Bioinformatics
**2018**, 35, 398–406. [Google Scholar] [CrossRef] [PubMed] - Chen, L.; Li, Z.; Zeng, T.; Zhang, Y.-H.; Li, H.; Huang, T.; Cai, Y.-D. Predicting gene phenotype by multi-label multi-class model based on essential functional features. Mol. Genet. Genom. MGG
**2021**, 296, 905–918. [Google Scholar] [CrossRef] [PubMed] - Shao, Y.; Chou, K. pLoc_Deep-mAnimal: A Novel Deep CNN-BLSTM Network to Predict Subcellular Localization of Animal Proteins. Nat. Sci.
**2020**, 12, 281–291. [Google Scholar] [CrossRef] - Shu, S.; Lv, F.; Feng, L.; Huang, J.; He, S.; He, J.; Li, L. Incorporating Multiple Cluster Centers for Multi-Label Learning. arXiv
**2020**, arXiv:2004.08113. [Google Scholar] [CrossRef] - Ibrahim, M.; Khan, M.U.G.; Mehmood, F.; Asim, M.; Mahmood, W. GHS-NET a generic hybridized shallow neural network for multi-label biomedical text classification. J. Biomed. Inform.
**2021**, 116, 103699. [Google Scholar] [CrossRef] - Ravanelli, M.; Brakel, P.; Omologo, M.; Bengio, Y. Light Gated Recurrent Units for Speech Recognition. IEEE Trans. Emerg. Top. Comput. Intell.
**2018**, 2, 92–102. [Google Scholar] [CrossRef] [Green Version] - Kim, Y.; Kim, J. Human-Like Emotion Recognition: Multi-Label Learning from Noisy Labeled Audio-Visual Expressive Speech. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018. [Google Scholar]
- Messaoud, M.B.; Jenhani, I.; Jemaa, N.B.; Mkaouer, M.W. A Multi-label Active Learning Approach for Mobile App User Review Classification. In Proceedings of the KSEM, Athens, Greece, 28–30 August 2019. [Google Scholar]
- Singh, J.P.; Nongmeikapam, K. Negative Comments Multi-Label Classification. In Proceedings of the 2020 International Conference on Computational Performance Evaluation (ComPE), Shillong, India, 2–4 July 2020; pp. 379–385. [Google Scholar] [CrossRef]
- Boutell, M.; Luo, J.; Shen, X.; Brown, C.M. Learning multi-label scene classification. Pattern Recognit.
**2004**, 37, 1757–1771. [Google Scholar] [CrossRef] [Green Version] - Tsoumakas, G.; Katakis, I.; Vlahavas, I. Mining Multi-label Data. In Data Mining and Knowledge Discovery Handbook; Springer: New York, NY, USA, 2020; pp. 667–685. [Google Scholar]
- Read, J.; Pfahringer, B.; Holmes, G.; Frank, E. Classifier chains for multi-label classification. Mach Learn.
**2011**, 85, 333–359. [Google Scholar] [CrossRef] - Qian, W.; Xiong, C.; Wang, Y. A ranking-based feature selection for multi-label classification with fuzzy relative discernibility. Appl. Soft Comput.
**2021**, 102, 106995. [Google Scholar] [CrossRef] - Zhang, W.; Liu, F.; Luo, L.; Zhang, J. Predicting drug side effects by multi-label learning and ensemble learning. BMC Bioinform.
**2015**, 16, 365. [Google Scholar] [CrossRef] [PubMed] [Green Version] - Huang, J.; Li, G.; Huang, Q.; Wu, X. Joint Feature Selection and Classification for Multilabel Learning. IEEE Trans. Cybern.
**2018**, 48, 876–889. [Google Scholar] [CrossRef] [PubMed] - Huang, J.; Li, G.; Huang, Q.; Wu, X. Learning Label-Specific Features and Class-Dependent Labels for Multi-Label Classification. IEEE Trans. Knowl. Data Eng.
**2016**, 28, 3309–3323. [Google Scholar] [CrossRef] - Wang, P.; Ge, R.; Xiao, X.; Zhou, M.; Zhou, F. hMuLab: A Biomedical Hybrid MUlti-LABel Classifier Based on Multiple Linear Regression. IEEE/ACM Trans. Comput. Biol. Bioinform.
**2017**, 14, 1173–1180. [Google Scholar] [CrossRef] - Tsoumakas, G.; Katakis, I.; Vlahavas, I. Random k-Labelsets for Multilabel Classification. IEEE Trans. Knowl. Data Eng.
**2011**, 23, 1079–1089. [Google Scholar] [CrossRef] - Yang, Y.; Jiang, J. Adaptive Bi-Weighting Toward Automatic Initialization and Model Selection for HMM-Based Hybrid Meta-Clustering Ensembles. IEEE Trans. Cybern.
**2019**, 49, 1657–1668. [Google Scholar] [CrossRef] - Moyano, J.M.; Galindo, E.G.; Cios, K.; Ventura, S. Review of ensembles of multi-label classifiers: Models, experimental study and prospects. Inf. Fusion
**2018**, 44, 33–45. [Google Scholar] [CrossRef] - Xia, Y.; Chen, K.; Yang, Y. Multi-label classification with weighted classifier selection and stacked ensemble. Inf. Sci.
**2021**, 557, 421–442. [Google Scholar] [CrossRef] - Moyano, J.M.; Galindo, E.G.; Cios, K.; Ventura, S. An evolutionary approach to build ensembles of multi-label classifiers. Inf. Fusion
**2019**, 50, 168–180. [Google Scholar] [CrossRef] - Wang, R.; Kwong, S.; Wang, X.; Jia, Y. Active k-labelsets ensemble for multi-label classification. Pattern Recognit.
**2021**, 109, 107583. [Google Scholar] [CrossRef] - DeepDetect. DeepDetect. Available online: https://www.deepdetect.com/ (accessed on 1 January 2021).
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition; Cornell University: Ithaca, NY, USA, 2014. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-Resnet and the Impact of Residual Connections on Learning; Cornell University: Ithaca, NY, USA, 2016; pp. 1–12. Available online: https://arxiv.org/pdf/1602.07261.pdf (accessed on 1 January 2021).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv
**2018**, arXiv:1804.02767. [Google Scholar] - Howard, J.; Gugger, S. Fastai: A Layered API for Deep Learning. Information
**2020**, 11, 108. [Google Scholar] [CrossRef] [Green Version] - Imagga. Imagga Website. Available online: https://imagga.com/solutions/auto-tagging (accessed on 1 January 2021).
- Wolfram. Wolfram Alpha: Image Identification Project. Available online: https://www.imageidentify.com/ (accessed on 1 January 2020).
- Clarifai. Clarifai Website. Available online: https://www.clarifai.com/ (accessed on 1 January 2021).
- Microsoft. Computer-Vision API Website. Available online: https://www.microsoft.com/cognitive-services/en-us/computer-vision-api (accessed on 1 January 2021).
- IBM. Visual Recognition. Available online: https://www.ibm.com/watson/services/visual-recognition/ (accessed on 1 January 2020).
- Google. Google Cloud Vision. Available online: https://cloud.google.com/vision/ (accessed on 1 January 2021).
- Kubany, A.; Ishay, S.B.; Ohayon, R.-s.; Shmilovici, A.; Rokach, L.; Doitshman, T. Comparison of state-of-the-art deep learning APIs for image multi-label classification using semantic metrics. Expert Syst. Appl.
**2020**, 161, 113656. [Google Scholar] [CrossRef] - Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell.
**1998**, 20, 832–844. [Google Scholar] - Li, D.; Wu, H.; Zhao, J.; Tao, Y.; Fu, J. Automatic Classification System of Arrhythmias Using 12-Lead ECGs with a Deep Neural Network Based on an Attention Mechanism. Symmetry
**2020**, 12, 1827. [Google Scholar] [CrossRef] - Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput.
**1997**, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed] - Cho, K.; Merrienboer, B.V.; Gülçehre, Ç.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the EMNLP, Doha, Qatar, 25–29 October 2014; pp. 25–32. [Google Scholar]
- Lea, C.S.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G. Temporal Convolutional Networks for Action Segmentation and Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1003–1012. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv
**2015**, arXiv:1412.6980. [Google Scholar] - Nanni, L.; Lumini, A.; Manfe, A.; Brahnam, S.; Venturin, G. Gated recurrent units and temporal convolutional network for multilabel classification. arXiv
**2021**, arXiv:2110.04414. [Google Scholar] - Zhang, M.-L.; Zhou, Z.-H. Multilabel Neural Networks with Applications to Functional Genomics and Text Categorization. IEEE Trans. Knowl. Data Eng.
**2006**, 18, 1338–1351. [Google Scholar] [CrossRef] [Green Version] - Stivaktakis, R.; Tsagkatakis, G.; Tsakalides, P. Deep Learning for Multilabel Land Cover Scene Categorization Using Data Augmentation. IEEE Geosci. Remote Sens. Lett.
**2019**, 16, 1031–1035. [Google Scholar] [CrossRef] - Zhu, H.; Cheng, C.; Yin, H.; Li, X.; Zuo, P.; Ding, J.; Lin, F.; Wang, J.; Zhou, B.; Li, Y.; et al. Automatic multilabel electrocardiogram diagnosis of heart rhythm or conduction abnormalities with deep learning: A cohort study. Lancet. Digit. Health
**2020**, 2, e348–e357. [Google Scholar] [CrossRef] [PubMed] - Navamajiti, N.; Saethang, T.; Wichadakul, D. McBel-Plnc: A Deep Learning Model for Multiclass Multilabel Classification of Protein-lncRNA Interactions. In Proceedings of the 2019 6th International Conference on Biomedical and Bioinformatics Engineering (ICBBE’19), Shanghai, China, 13–15 November 2019. [Google Scholar]
- Namazi, B.; Sankaranarayanan, G.; Devarajan, V. LapTool-Net: A Contextual Detector of Surgical Tools in Laparoscopic Videos Based on Recurrent Convolutional Neural Networks. arXiv
**2019**, arXiv:1905.08983. [Google Scholar] - Zhou, X.; Li, Y.; Liang, W. CNN-RNN Based Intelligent Recommendation for Online Medical Pre-Diagnosis Support. IEEE/ACM Trans. Comput. Biol. Bioinform.
**2021**, 18, 912–921. [Google Scholar] [CrossRef] [PubMed] - Samy, A.E.; El-Beltagy, S.R.; Hassanien, E. A Context Integrated Model for Multi-label Emotion Detection. Procedia Comput. Sci.
**2018**, 142, 61–71. [Google Scholar] [CrossRef] - Zhang, J.; Chen, Q.; Liu, B. NCBRPred: Predicting nucleic acid binding residues in proteins based on multilabel learning. Brief. Bioinform.
**2021**, 22, bbaa397. [Google Scholar] [CrossRef] - Turnbull, D.; Barrington, L.; Torres, D.A.; Lanckriet, G. Semantic Annotation and Retrieval of Music and Sound Effects. IEEE Trans. Audio Speech Lang. Process.
**2008**, 16, 467–476. [Google Scholar] [CrossRef] [Green Version] - Zhang, M.-L.; Zhou, Z. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit.
**2007**, 40, 2038–2048. [Google Scholar] [CrossRef] [Green Version] - Elisseeff, A.; Weston, J. A Kernel Method for Multi-Labelled Classification; MIT Press Direct: Cambridge, MA, USA, 2001. [Google Scholar] [CrossRef] [Green Version]
- Chen, L. Predicting anatomical therapeutic chemical (ATC) classification of drugs by integrating chemical-chemical interactions and similarities. PLoS ONE
**2012**, 7, e35254. [Google Scholar] [CrossRef] [Green Version] - Nanni, L.; Lumini, A.; Brahnam, S. Neural networks for anatomical therapeutic chemical (ATC) classification. Appl. Comput. Inform.
**2022**. Available online: https://www.emerald.com/insight/content/doi/10.1108/ACI-11-2021-0301/full/html (accessed on 1 January 2021). [CrossRef] - Chou, K.C. Some remarks on predicting multi-label attributes in molecular biosystems. Mol. Biosyst.
**2013**, 9, 1092–1100. [Google Scholar] [CrossRef] - Su, Y.; Huang, Y.; Kuo, C.-C.J. On Extended Long Short-term Memory and Dependent Bidirectional Recurrent Neural Network. Neurocomputing
**2019**, 356, 151–161. [Google Scholar] [CrossRef] [Green Version] - Gers, F.; Schmidhuber, J.; Cummins, F. Learning to Forget: Continual Prediction with LSTM. Neural Comput.
**2000**, 12, 2451–2471. [Google Scholar] [CrossRef] [PubMed] - Chung, J.; Gülçehre, Ç.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv
**2014**, arXiv:1412.3555. [Google Scholar] - Jing, L.; Gülçehre, Ç.; Peurifoy, J.; Shen, Y.; Tegmark, M.; Soljb, M.; Bengio, Y. Gated Orthogonal Recurrent Units: On Learning to Forget. Neural Comput.
**2019**, 31, 765–783. [Google Scholar] [CrossRef] - Zhang, K.; Liu, Z.; Zheng, L. Short-Term Prediction of Passenger Demand in Multi-Zone Level: Temporal Convolutional Neural Network With Multi-Task Learning. IEEE Trans. Intell. Transp. Syst.
**2020**, 21, 1480–1490. [Google Scholar] [CrossRef] - Jain, A.K.; Murty, M.N.; Flynn, P.J. Data clustering: A review. ACM Comp. Surv.
**1999**, 31, 264–323. [Google Scholar] [CrossRef] - Dubey, S.; Chakraborty, S.; Roy, S.K.; Mukherjee, S.; Singh, S.K.; Chaudhuri, B. diffGrad: An Optimization Method for Convolutional Neural Networks. IEEE Trans. Neural Netw. Learn. Syst.
**2020**, 31, 4500–4511. [Google Scholar] [CrossRef] [Green Version] - Nanni, L.; Maguolo, G.; Lumini, A. Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks. arXiv
**2021**, arXiv:2103.14689. [Google Scholar] - Nanni, L.; Manfe, A.; Maguolo, G.; Lumini, A.; Brahnam, S. High performing ensemble of convolutional neural networks for insect pest image detection. arXiv
**2021**, arXiv:2108.12539. [Google Scholar] [CrossRef] - Smith, L.N. Cyclical Learning Rates for Training Neural Networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017; pp. 464–472. [Google Scholar]
- Bogatinovski, J.; Todorovski, L.; Džeroski, S.; Kocev, D. Comprehensive comparative study of multi-label classification methods. Expert Syst. Appl.
**2022**, 203, 117215. [Google Scholar] [CrossRef] - Liu, M.; Wu, Y.; Chen, Y.; Sun, J.; Zhao, Z.; Chen, X.-w.; Matheny, M.; Xu, H. Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs. J. Am. Med. Inform. Assoc. JAMIA
**2012**, 19, e28–e35. [Google Scholar] [CrossRef] [PubMed] [Green Version] - Yang, L.; Wu, X.-Z.; Jiang, Y.; Zhou, Z. Multi-Label Learning with Deep Forest. arXiv
**2020**, arXiv:1911.06557. [Google Scholar] - Nakano, F.K.; Pliakos, K.; Vens, C. Deep tree-ensembles for multi-output prediction. Pattern Recognit
**2022**, 121, 108211. [Google Scholar] [CrossRef] - Fu, X.; Li, D.; Zhai, Y. Multi-label learning with kernel local label information. Expert Syst. Appl.
**2022**, 207, 118027. [Google Scholar] [CrossRef] - Yu, Z.B.; Zhang, M.L. Multi-Label Classification With Label-Specific Feature Generation: A Wrapped Approach. IEEE Trans. Pattern Anal. Mach. Intell.
**2022**, 44, 5199–5210. [Google Scholar] [CrossRef] - Li, X.; Zhang, T.; Wang, S.; Zhu, G.; Wang, R.; Chang, T.-H. Large-Scale Bandwidth and Power Optimization for Multi-Modal Edge Intelligence Autonomous Driving. arXiv
**2022**, arXiv:2210.09659. [Google Scholar] - Asif, U.; Tang, J.; Harrer, S. Ensemble knowledge distillation for learning improved and efficient networks. arXiv
**2019**, arXiv:1909.08097. [Google Scholar]

**Table 1.** Summary of existing deep learning studies. An X indicates that a particular technique or model was used.

| | [47] | [48] | [49] | [50] | [2] | [51] | [52] | [53] | [54] | [41] | [9] | Here |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ensemble | X | X | X | X | | | | | | | | |
| GRU | X | X | X | X | X | | | | | | | |
| CNN | X | X | X | X | X | | | | | | | |
| LSTM | X | X | X | | | | | | | | | |
| TCN | X | | | | | | | | | | | |
| RNN | X | | | | | | | | | | | |
| Inception model | X | | | | | | | | | | | |
| Random forest | X | X | | | | | | | | | | |
| Gradient descent | X | X | | | | | | | | | | |
| Stochastic gradient descent | X | X | | | | | | | | | | |
| Data augmentation | X | X | X | X | | | | | | | | |
| Multiple loss functions | X | | | | | | | | | | | |
| Binary cross entropy (BCE) loss | X | X | X | X | X | | | | | | | |
| Multiclass cross entropy loss | X | | | | | | | | | | | |
| Logistic regression | X | | | | | | | | | | | |
| Multiple linear regression | X | | | | | | | | | | | |
| Quantile regression | X | X | | | | | | | | | | |
| Various optimization methods | X | | | | | | | | | | | |
| Adagrad optimization | X | | | | | | | | | | | |
| Variants of Adam optimization | X | X | X | | | | | | | | | |

**Table 2.** Summary of the datasets (LCard = label cardinality, i.e., the average number of labels per pattern).

| Name | #Patterns | #Features | #Labels | LCard |
|---|---|---|---|---|
| CAL500 | 502 | 68 | 174 | 26.044 |
| Image | 2000 | 294 | 5 | 1.236 |
| Scene | 2407 | 294 | 5 | 1.074 |
| Yeast | 2417 | 103 | 14 | 4.24 |
| Arts | 5000 | 462 | 26 | 1.636 |
| ATC | 3883 | 42 | 14 | 1.265 |
| ATC_f | 3883 | 700 | 14 | 1.265 |
| Liu | 832 | 2892 | 1385 | 71.160 |
| mAn | 3916 | 20 | 20 | 1.650 |
| bibtex | 7395 | 1836 | 159 | 2.402 |
| enron | 1702 | 1001 | 53 | 3.378 |
| health | 5000 | 612 | 32 | 1.662 |

**Table 3.** Configuration of the networks and ensembles evaluated in this work.

| Name | Hidden Units | #Classifiers | #Epoch | Optimizer |
|---|---|---|---|---|
| Adam_sa | 50 | 1 | 50 | Adam |
| Adam_10s | 50 | 10 | 50 | Adam |
| Adam_10 | 50 | 10 | 150 | Adam |
| DG_10 | 50 | 10 | 150 | DGrad |
| Cos_10 | 50 | 10 | 150 | Cos |
| Exp_10 | 50 | 10 | 150 | Exp |
| Sto_10 | 50 | 10 | 150 | Sto |
| DG_Cos = DG_10 + Cos_10 | 50 | 20 | 150 | DGrad, Cos |
| DG_Cos_Exp = DG_10 + Cos_10 + Exp_10 | 50 | 30 | 150 | DGrad, Cos, Exp |
| DG_Cos_Exp_Sto = DG_10 + Cos_10 + Exp_10 + Sto_10 | 50 | 40 | 150 | DGrad, Cos, Exp, Sto |
| StoGRU | 50 | 40 | 150 | DGrad, Cos, Exp, Sto |
| StoGRU_B | 50 | 40 | 150 | DGrad, Cos, Exp, Sto |
| StoTCN | --- | 40 | 100 | DGrad, Cos, Exp, Sto |
| StoTCN_B | --- | 40 | 100 | DGrad, Cos, Exp, Sto |
| StoGRU_TCN | 50 (GRU) | 40 | 150 | DGrad, Cos, Exp, Sto |
| StoLSTM_GRU | 125 (LSTM layer), 100 (GRU layer) | 40 | 150 | DGrad, Cos, Exp, Sto |
| ENNbase = StoTCN + StoGRU | 50 (GRU) | 80 | 100 (TCN)/150 (GRU) | DGrad, Cos, Exp, Sto |
| ENN = StoTCN + StoGRU + StoTCN_B + StoGRU_B + StoGRU_TCN | 50 (GRU, GRU_B & GRU_TCN) | 200 | 100 (TCN & TCN_B)/150 (GRU, GRU_B & GRU_TCN) | DGrad, Cos, Exp, Sto |
| ENNlarge = StoTCN + StoGRU + StoTCN_B + StoGRU_B + StoGRU_TCN + StoLSTM_GRU | 50 (GRU, GRU_B & GRU_TCN), 100/125 (LSTM_GRU) | 240 | 100 (TCN & TCN_B)/150 (GRU, GRU_B, LSTM_GRU & GRU_TCN) | DGrad, Cos, Exp, Sto |

**Table 4.** Ablation study showing that StoGRU is the best method among the approaches based on GRU_A (other approaches produced similar results).

| Method | Comparison |
|---|---|
| Adam_10s | Outperforms Adam_sa with a p-value of 0.0156 |
| Adam_10 | Outperforms Adam_sa with a p-value of 0.0172; same performance as Adam_10s |
| DG_10 | Outperforms Adam_10 with a p-value of 0.0064 |
| Cos_10 | Outperforms Adam_10 with a p-value of 0.0137 |
| Exp_10 | Outperforms Adam_10 with a p-value of 0.0016 |
| Sto_10 | Outperforms Adam_10 with a p-value of 0.1014 |
| DG_Cos_Exp_Sto | Outperforms Exp_10 (the best of the approaches above) with a p-value of 0.0134 |
| StoGRU | Outperforms DG_Cos_Exp_Sto with a p-value of 0.0922 |

**Table 5.** Average precision of the ensembles and the state of the art on the twelve benchmarks (boldface values indicate the best performance within each group of similar approaches).

Average Precision | Cal500 | Image | Scene | Yeast | Arts | ATC | ATC_f | Liu | mAn | Bibtex | Enron | Health | Ave |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
IMCC | 0.502 | 0.836 | 0.904 | 0.773 | 0.619 | 0.866 | 0.922 | 0.523 | 0.978 | 0.623 | 0.714 | 0.781 | 0.753 |
TB | 0.489 | 0.844 | 0.873 | 0.778 | 0.625 | 0.882 | 0.897 | 0.518 | 0.983 | 0.572 | 0.701 | 0.753 | 0.743 |
StoGRU | 0.498 | 0.851 | 0.911 | 0.740 | 0.561 | 0.872 | 0.872 | 0.485 | 0.979 | 0.403 | 0.680 | 0.739 | 0.715 |
StoGRU_B | 0.490 | 0.861 | 0.908 | 0.741 | 0.555 | 0.877 | 0.848 | 0.485 | 0.978 | 0.400 | 0.688 | 0.724 | 0.712 |
StoTCN | 0.498 | 0.847 | 0.913 | 0.764 | 0.506 | 0.882 | 0.900 | 0.498 | 0.977 | 0.406 | 0.669 | 0.710 | 0.714 |
StoTCN_B | 0.497 | 0.855 | 0.917 | 0.765 | 0.541 | 0.883 | 0.903 | 0.505 | 0.976 | 0.404 | 0.666 | 0.732 | 0.720 |
StoGRU_TCN | 0.491 | 0.852 | 0.916 | 0.752 | 0.592 | 0.890 | 0.913 | 0.510 | 0.977 | 0.354 | 0.674 | 0.764 | 0.724 |
StoLSTM_GRU | 0.493 | 0.839 | 0.901 | 0.771 | 0.633 | 0.888 | 0.912 | 0.541 | 0.978 | 0.618 | 0.702 | 0.790 | 0.756 |
ENNbase | 0.502 | 0.855 | 0.922 | 0.756 | 0.552 | 0.888 | 0.916 | 0.497 | 0.979 | 0.417 | 0.687 | 0.735 | 0.726 |
ENN | 0.499 | 0.859 | 0.924 | 0.762 | 0.582 | 0.893 | 0.916 | 0.505 | 0.979 | 0.424 | 0.689 | 0.749 | 0.732 |
ENNlarge | 0.498 | 0.860 | 0.923 | 0.776 | 0.628 | 0.892 | 0.926 | 0.520 | 0.979 | 0.534 | 0.708 | 0.780 | 0.752 |
ENNlarge + IMCC | 0.502 | 0.853 | 0.920 | 0.784 | 0.633 | 0.883 | 0.926 | 0.526 | 0.979 | 0.627 | 0.717 | 0.790 | 0.762 |
ENNlarge + 3 × IMCC | 0.503 | 0.847 | 0.913 | 0.778 | 0.628 | 0.875 | 0.925 | 0.526 | 0.979 | 0.626 | 0.718 | 0.787 | 0.759 |
ENNlarge + IMCC + TB | 0.500 | 0.856 | 0.913 | 0.784 | 0.641 | 0.889 | 0.927 | 0.526 | 0.982 | 0.622 | 0.718 | 0.788 | 0.762 |
ENNlarge + 3 × IMCC + TB | 0.502 | 0.850 | 0.910 | 0.783 | 0.637 | 0.880 | 0.926 | 0.527 | 0.981 | 0.627 | 0.717 | 0.787 | 0.761 |
StoLSTM_GRU + IMCC + TB | 0.500 | 0.851 | 0.898 | 0.786 | 0.641 | 0.883 | 0.920 | 0.538 | 0.983 | 0.635 | 0.727 | 0.800 | 0.764 |
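The composite rows in Table 5 (e.g., ENNlarge + IMCC + TB) indicate score-level fusion of the component classifiers. A minimal sketch assuming weighted sum-rule fusion of normalized per-label score matrices, with "3 ×" read as a weight of 3 (the normalization and fusion rule here are assumptions, not the paper's exact procedure):

```python
# Hedged sketch of score fusion for ensembles such as "ENN + 3 × IMCC":
# per-label score matrices are min-max normalized and summed with weights.
import numpy as np

def minmax(s):
    """Min-max normalize a score matrix to [0, 1]."""
    s = np.asarray(s, float)
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

def fuse(score_mats, weights):
    """Weighted sum-rule fusion of per-label score matrices."""
    return sum(w * minmax(s) for s, w in zip(score_mats, weights))

enn = np.array([[0.9, 0.1], [0.2, 0.8]])    # toy per-label scores (2 samples, 2 labels)
imcc = np.array([[0.7, 0.3], [0.4, 0.6]])
fused = fuse([enn, imcc], weights=[1, 3])   # "ENN + 3 × IMCC"
```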

**Table 6.** Comparison of IMCC and our proposed ensemble Ens using five performance indicators. Bold highlights superior performance.

| | One Error ↓ | Hamming Loss ↓ | Ranking Loss ↓ | Coverage ↓ | Avg Precision ↑ |
|---|---|---|---|---|---|
| Cal500-IMCC | 0.150 | 0.134 | 0.182 | 0.736 | 0.502 |
| Cal500-Ens | 0.150 | 0.134 | 0.179 | 0.729 | 0.502 |
| Image-IMCC | 0.237 | 0.150 | 0.138 | 0.173 | 0.836 |
| Image-Ens | 0.225 | 0.147 | 0.127 | 0.159 | 0.850 |
| Scene-IMCC | 0.164 | 0.070 | 0.053 | 0.062 | 0.904 |
| Scene-Ens | 0.151 | 0.067 | 0.047 | 0.057 | 0.910 |
| Yeast-IMCC | 0.220 | 0.185 | 0.165 | 0.448 | 0.773 |
| Yeast-Ens | 0.211 | 0.178 | 0.155 | 0.433 | 0.783 |
| Arts-IMCC | 0.438 | 0.054 | 0.164 | 0.242 | 0.619 |
| Arts-Ens | 0.431 | 0.053 | 0.144 | 0.219 | 0.637 |
| Bibtex-IMCC | 0.336 | 0.012 | 0.079 | 0.158 | 0.623 |
| Bibtex-Ens | 0.338 | 0.012 | 0.072 | 0.143 | 0.627 |
| Enron-IMCC | 0.226 | 0.044 | 0.072 | 0.211 | 0.714 |
| Enron-Ens | 0.226 | 0.044 | 0.069 | 0.204 | 0.717 |
| Health-IMCC | 0.266 | 0.035 | 0.052 | 0.107 | 0.781 |
| Health-Ens | 0.262 | 0.035 | 0.046 | 0.097 | 0.787 |
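The five indicators in Table 6 are standard multilabel metrics: lower is better for the first four, higher is better for average precision. A runnable sketch of how they can be computed with scikit-learn on toy data (the arrays are illustrative; one-error has no built-in, so it is computed directly):

```python
# Sketch of the Table 6 indicators on toy data (not the paper's pipeline).
import numpy as np
from sklearn.metrics import (hamming_loss, label_ranking_loss, coverage_error,
                             label_ranking_average_precision_score)

Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])          # ground-truth labels
scores = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])
Y_pred = (scores >= 0.5).astype(int)                           # thresholded predictions

# One error: fraction of samples whose top-ranked label is not relevant.
top = scores.argmax(axis=1)
one_error = np.mean([Y_true[i, top[i]] == 0 for i in range(len(Y_true))])

print("one error     ", one_error)
print("hamming loss  ", hamming_loss(Y_true, Y_pred))
print("ranking loss  ", label_ranking_loss(Y_true, scores))
print("coverage      ", coverage_error(Y_true, scores))        # ranks count from 1
print("avg precision ", label_ranking_average_precision_score(Y_true, scores))
```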

**Table 7.** Performance comparison of Ens and IMCC on the mAn dataset. Bold highlights superior performance.

mAn | Aiming | Coverage | Accuracy | Absolute True | Absolute False |
---|---|---|---|---|---|
[2] | 88.31 | 85.06 | 84.34 | 78.78 | 0.07 |
[4] | 96.21 | 97.77 | 95.46 | 92.26 | 0.00 |
IMCC | 92.80 | 92.02 | 88.83 | 80.76 | 1.43 |
Ens | 93.84 | 93.06 | 90.24 | 83.64 | 1.20 |
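Aiming, Coverage, Accuracy, Absolute True, and Absolute False in Table 7 are the set-theoretic multilabel indicators common in protein-localization benchmarks (table values are percentages). A sketch of their usual definitions on toy labels, under the assumption that the paper follows the standard formulation:

```python
# Hedged sketch of the Table 7 indicators using the common set-theoretic
# definitions; the toy label matrices below are illustrative only.
import numpy as np

def multilabel_metrics(Y_true, Y_pred):
    """Y_true, Y_pred: binary arrays of shape (n_samples, n_labels)."""
    Y_true, Y_pred = np.asarray(Y_true, bool), np.asarray(Y_pred, bool)
    inter = (Y_true & Y_pred).sum(1)          # |true ∩ predicted| per sample
    union = (Y_true | Y_pred).sum(1)          # |true ∪ predicted| per sample
    n_labels = Y_true.shape[1]
    return {
        "aiming":         np.mean(inter / np.maximum(Y_pred.sum(1), 1)),  # precision-like
        "coverage":       np.mean(inter / np.maximum(Y_true.sum(1), 1)),  # recall-like
        "accuracy":       np.mean(inter / np.maximum(union, 1)),          # Jaccard
        "absolute_true":  np.mean((Y_true == Y_pred).all(1)),             # exact match
        "absolute_false": np.mean((union - inter) / n_labels),            # error rate
    }

m = multilabel_metrics([[1, 0, 1], [0, 1, 0]], [[1, 0, 0], [0, 1, 0]])
```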

**Table 8.** Comparisons with results reported in a recent survey [71] using average precision as the performance indicator. Bold highlights superior performance.

| | EPS | CDE | MLkNN | MLARM | BR | DEEP1 | PCT | HOMER | AdaBoost.MH | BPNN | RAkEL | CLR | RFPCT | PSt | TREMLC | RFDTBR | MBR | CDN | ECCJ48 | EBRJ48 | DEEP4 | RSLP | CLEMS | Ens |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bibtex | 0.466 | 0.414 | 0.161 | 0.423 | 0.350 | 0.335 | 0.016 | 0.316 | 0.472 | 0.434 | 0.081 | 0.463 | 0.515 | 0.538 | 0.483 | 0.559 | 0.265 | 0.231 | 0.492 | 0.546 | 0.258 | 0.491 | 0.183 | 0.627 |
| Cal500 | 0.440 | 0.411 | 0.441 | 0.352 | 0.236 | 0.489 | 0.497 | 0.288 | 0.475 | 0.508 | 0.271 | 0.355 | 0.520 | 0.485 | 0.497 | 0.500 | 0.381 | 0.292 | 0.427 | 0.458 | 0.329 | 0.490 | 0.446 | 0.502 |
| Enron | 0.622 | 0.580 | 0.517 | 0.529 | 0.512 | 0.624 | 0.531 | 0.388 | 0.627 | 0.642 | 0.447 | 0.623 | 0.683 | 0.629 | 0.675 | 0.686 | 0.498 | 0.485 | 0.623 | 0.675 | 0.537 | 0.548 | 0.538 | 0.717 |
| Scene | 0.789 | 0.812 | 0.785 | 0.715 | 0.790 | 0.686 | 0.745 | 0.753 | 0.880 | 0.855 | 0.851 | 0.889 | 0.868 | 0.887 | 0.856 | 0.874 | 0.711 | 0.501 | 0.813 | 0.856 | 0.810 | 0.797 | 0.850 | 0.910 |
| Yeast | 0.745 | 0.718 | 0.704 | 0.598 | 0.663 | 0.701 | 0.732 | 0.693 | 0.711 | 0.761 | 0.693 | 0.710 | 0.762 | 0.766 | 0.760 | 0.763 | 0.576 | 0.437 | 0.719 | 0.740 | 0.709 | 0.748 | 0.715 | 0.787 |

**Table 9.** Other comparisons with the literature using average precision. Bold highlights superior performance.

Average Precision | Cal500 | Image | Scene | Yeast | Arts | ATC | ATC_f | Liu | mAn | Bibtex | Enron | Health |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Ens | 0.503 | 0.849 | 0.912 | 0.780 | 0.632 | 0.878 | 0.926 | 0.527 | 0.981 | 0.626 | 0.717 | 0.788 |
EnsSparse | 0.500 | 0.851 | 0.898 | 0.786 | 0.641 | 0.883 | 0.920 | 0.538 | 0.983 | 0.635 | 0.727 | 0.800 |
FastAi [32] | 0.425 | 0.824 | 0.899 | 0.718 | 0.588 | 0.860 | 0.908 | 0.414 | 0.976 | --- | --- | --- |
IMCC | 0.502 | 0.836 | 0.904 | 0.773 | 0.619 | 0.866 | 0.922 | 0.523 | 0.978 | 0.623 | 0.714 | 0.781 |
hML | 0.453 | 0.810 | 0.885 | 0.792 | 0.538 | 0.831 | 0.854 | 0.433 | 0.965 | --- | --- | --- |
ECC [5] | 0.491 | 0.797 | 0.857 | 0.756 | 0.617 | --- | --- | --- | --- | 0.617 | 0.657 | 0.719 |
MAHR [5] | 0.441 | 0.801 | 0.861 | 0.745 | 0.524 | --- | --- | --- | --- | 0.524 | 0.641 | 0.715 |
LLSF [5] | 0.501 | 0.789 | 0.847 | 0.617 | 0.627 | --- | --- | --- | --- | 0.627 | 0.703 | 0.780 |
JFSC [5] | 0.501 | 0.789 | 0.836 | 0.762 | 0.597 | --- | --- | --- | --- | 0.597 | 0.643 | 0.751 |
LIFT [5] | 0.496 | 0.789 | 0.859 | 0.766 | 0.627 | --- | --- | --- | --- | 0.627 | 0.684 | 0.708 |
[15] | --- | --- | --- | --- | --- | --- | --- | 0.513 | --- | --- | --- | --- |
[72] | --- | --- | --- | --- | --- | --- | --- | 0.261 | --- | --- | --- | --- |
hMuLab [18] | --- | --- | --- | 0.778 | --- | --- | --- | --- | --- | --- | --- | --- |
MlKnn [18] | --- | --- | --- | 0.766 | --- | --- | --- | --- | --- | --- | --- | --- |
RaKel [18] | --- | --- | --- | 0.715 | --- | --- | --- | --- | --- | --- | --- | --- |
ClassifierChain [18] | --- | --- | --- | 0.624 | --- | --- | --- | --- | --- | --- | --- | --- |
IBLR [18] | --- | --- | --- | 0.768 | --- | --- | --- | --- | --- | --- | --- | --- |
MLDF [73] | 0.512 | 0.842 | 0.891 | 0.770 | --- | --- | --- | --- | --- | --- | 0.742 | --- |
RF_PCT [73] | 0.512 | 0.829 | 0.873 | 0.758 | --- | --- | --- | --- | --- | --- | 0.729 | --- |
DBPNN [73] | 0.495 | 0.672 | 0.563 | 0.738 | --- | --- | --- | --- | --- | --- | 0.679 | --- |
MLFE [73] | 0.488 | 0.817 | 0.882 | 0.759 | --- | --- | --- | --- | --- | --- | 0.656 | --- |
ECC [73] | 0.482 | 0.739 | 0.853 | 0.752 | --- | --- | --- | --- | --- | --- | 0.646 | --- |
RAKEL [73] | 0.353 | 0.788 | 0.843 | 0.720 | --- | --- | --- | --- | --- | --- | 0.596 | --- |
[14] | --- | --- | --- | 0.758 | --- | --- | --- | --- | --- | --- | --- | --- |
[74] | 0.484 | --- | --- | 0.740 | --- | --- | --- | --- | --- | --- | --- | --- |
[75] | --- | --- | --- | 0.775 | 0.636 | --- | --- | --- | --- | --- | --- | --- |
Wrap [76] | 0.520 | --- | 0.832 | 0.761 | --- | --- | --- | --- | --- | 0.578 | 0.710 | --- |
Wrap^{k} [76] | 0.518 | --- | 0.892 | 0.781 | --- | --- | --- | --- | --- | 0.571 | 0.720 | --- |


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Nanni, L.; Trambaiollo, L.; Brahnam, S.; Guo, X.; Woolsey, C.
Ensemble of Networks for Multilabel Classification. *Signals* **2022**, *3*, 911-931.
https://doi.org/10.3390/signals3040054
