Mathematics
  • Article
  • Open Access

30 December 2024

Multi-Label Classification Algorithm for Adaptive Heterogeneous Classifier Group

School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
Author to whom correspondence should be addressed.
This article belongs to the Section E1: Mathematics and Computer Science

Abstract

Ensemble classification is widely used in multi-label algorithms and can be divided into homogeneous and heterogeneous ensembles according to classifier type. A heterogeneous ensemble can generate classifiers with better diversity than a homogeneous ensemble and improve the performance of classification results. An Adaptive Heterogeneous Classifier Group (AHCG) algorithm is proposed. The AHCG first proposes the concept of a Heterogeneous Classifier Group (HCG); that is, two different groups of ensemble classifiers are used in the training and testing phases. Secondly, the Adaptive Selection Strategy (ASS) is proposed, which selects the group of ensemble classifiers to be used in the test phase. The least squares method is used to calculate the weights of the in-group base classifiers, and the base classifiers are dynamically updated according to these weights. Extensive experiments on seven datasets show that this algorithm performs better than most existing ensemble classification algorithms in terms of accuracy, example-based F1, micro-averaged F1, and macro-averaged F1.

1. Introduction

The traditional supervised learning task is based on single-label classification; that is, each data instance is associated with a single label. In reality, however, many problems involve multiple labels. Multi-label classification is applied to text classification [1], medical diagnostic classification [2], protein classification [3], and music [4] or video classification [5], among others. For example, in medical diagnostic classification, a patient can have both diabetes and hypertension.
Given a d-dimensional input space $X = X_1 \times \cdots \times X_d$ and a label space $L = \{l_1, l_2, \ldots, l_p\}$ with $p > 1$, a multi-label instance can be defined as a pair $(x, l)$, where $x = (x_1, \ldots, x_d) \in X$ and $l \subseteq L$; $l$ is called a label set. The indicator for label $l_p$ equals 1 when the label is associated with instance $x$, and 0 otherwise. The goal of multi-label classification (MLC) [6] is to build a prediction model $h: X \to 2^L$ that provides a set of relevant labels for an unknown instance. Each instance may be associated with several labels from the previously defined label set. Therefore, for every $x \in X$, the label space $L$ is partitioned into a pair $(l, \bar{l})$, where $l = h(x)$ is the set of relevant labels and $\bar{l}$ the set of irrelevant labels.
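As a concrete illustration of this representation (our example, not from the paper; the label names are hypothetical), a label set can be encoded as a binary indicator vector:

```python
# Label space L with p = 4 labels; one instance x whose relevant label set
# is {diabetes, hypertension}, as in the medical example above.
label_names = ["diabetes", "hypertension", "asthma", "anemia"]

x = [0.7, 1.2, 3.4]    # feature vector in X = X1 x ... x Xd (here d = 3)
l = [1, 1, 0, 0]       # binary label set: l[j] = 1 iff label j is relevant

relevant = [n for n, bit in zip(label_names, l) if bit]
irrelevant = [n for n, bit in zip(label_names, l) if not bit]
print(relevant)        # ['diabetes', 'hypertension']
print(irrelevant)      # ['asthma', 'anemia']
```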
Ensemble techniques are becoming increasingly important, as they have been shown to improve the accuracy of single classifiers [7]. The base classifiers in an ensemble can be homogeneous or heterogeneous; heterogeneous base classifiers are constructed using different algorithms. Classical multi-label homogeneous ensemble algorithms include the Ensemble of Classifier Chains (ECC) [8], the Ensemble of Binary Relevance (EBR) [9], and the Ensemble of Pruned Sets (EPS) [10]. Heterogeneous ensemble algorithms are widely used in single-label classification. Heterogeneous dynamic ensemble selection based on accuracy and diversity (HDES-AD) [11] can adaptively switch between different types of base classifiers in the ensemble to enhance its predictive performance in non-stationary environments. A heterogeneous ensemble using a variety of learning algorithms has a higher potential to generate diversified classifiers than a homogeneous ensemble. The authors of [12] use a heterogeneous ensemble for imbalanced learning. In multi-label classification, another study [7] uses a heterogeneous ensemble of classifiers to address sample imbalance and label-related problems; it proposes combining state-of-the-art multi-label methods through ensemble techniques rather than embedding ensemble techniques inside multi-label learners.
Although existing ensemble classification algorithms can deal with some multi-label classification problems, most of them adopt the homogeneous ensemble method, which has a single type of base classifier and lacks diversity. At the same time, the traditional heterogeneous ensemble achieves heterogeneity only at the level of individual base classifiers of different types. To solve these problems, this paper proposes an Adaptive Heterogeneous Classifier Group (AHCG) multi-label algorithm, the main contributions of which are as follows:
(1)
The concept of a Heterogeneous Classifier Group is proposed. Two different ensemble classifiers are used in the testing and training stages. Unlike the previous notion of a heterogeneous ensemble, heterogeneity no longer comes from mixing different kinds of base classifiers within a single ensemble.
(2)
The Adaptive Selection Strategy is proposed. The adaptive mean square error formula is used to calculate the sum of the error values of each group of ensemble classifiers, and the most suitable group of ensemble classifiers is selected for testing by comparing these values.
(3)
The least squares method is used to calculate the weight of the base classifier in the Heterogeneous Classifier Group and dynamically update it according to the weight.
(4)
Experiments are carried out on seven real datasets, and the AHCG algorithm is compared with eight homogeneous ensemble methods. Good results are obtained on all four evaluation metrics.

3. AHCG Algorithm

In this section, the Adaptive Heterogeneous Classifier Group (AHCG) Algorithm is introduced in detail. The concepts of a Heterogeneous Classifier Group and an Adaptive Selection Strategy are proposed. Table 1 describes the meanings of the symbols covered in this section.
Table 1. Symbols used in the AHCG algorithm.

3.1. HCG Concept

The AHCG algorithm proposes the concept of a Heterogeneous Classifier Group (HCG). Unlike previous heterogeneous ensembles, heterogeneity no longer comes from different base classifiers within one ensemble; instead, two different ensemble classifiers are used across the testing and training phases. In the testing phase, the ASS described in Section 3.2 is used to select the appropriate group of ensemble classifiers for testing a data instance. In the training phase, two different groups of ensemble classifiers are constructed simultaneously, and the dynamic update strategy described in Section 3.3 is then used to update and replace the in-group base classifiers of each group. Combining two different groups of ensemble classifiers increases the heterogeneity of the ensemble.
The HCG can combine any two ensemble classifiers and is a general concept. We use C1 and C2 to represent two base classifiers generated by different algorithms, respectively. Figure 1 shows the traditional homogeneous and heterogeneous ensemble diagrams, and Figure 2 depicts the schematic diagram of the HCG. Experiments show that the HCG concept proposed in this paper is reasonable and has better performance than heterogeneous ensemble methods.
Figure 1. Traditional ensemble diagram.
Figure 2. HCG diagram.

3.2. Adaptive Selection Strategy

The AHCG proposes an Adaptive Selection Strategy (ASS). In the testing phase, the ensemble classifier with the minimum sum of the mean square error is selected from the HCG.
The Accuracy Updated Ensemble (AUE2) [44] is a classic single-label ensemble classification algorithm that uses the Mean Square Error (MSE) to weight its classifiers. It defines the quantities MSEhk and MSEr. MSEhk is the prediction error of base classifier Ck on data block dh, where f_y^k(x) is the probability that instance x belongs to class y as given by base classifier Ck. MSEr is the mean square error of a random prediction classifier and serves as a reference point for the current class distribution, where p(y) is the distribution of class y. The calculation formulas are as follows:
$MSE_{hk} = \frac{1}{|d_h|} \sum_{(x,y) \in d_h} \left(1 - f_y^k(x)\right)^2 \quad (1)$
$MSE_r = \sum_{y} p(y)\left(1 - p(y)\right)^2 \quad (2)$
Finally, MSEhk and MSEr are combined so that the weight reflects both the accuracy of the base classifier and the current class distribution. A very small positive value θ is added to avoid division by zero. The formula is shown in (3):
$w_{hk} = \frac{1}{MSE_r + MSE_{hk} + \theta} \quad (3)$
AUE2 calculates the weight of the newly created base classifier with a modified formula:
$w_k = \frac{1}{MSE_r + \theta} \quad (4)$
The AHCG algorithm, however, targets two groups of ensemble classifiers and needs to calculate the sum of the mean square errors of each group separately. The formula is therefore shown in (5):
$AMSE_{hk} = \sum_{k=1}^{K} \frac{1}{MSE_{hk} + MSE_r + \theta} \quad (5)$
The values of MSEhk and MSEr both lie between 0 and 1, and θ is a very small constant. $\sum_{k=1}^{K} MSE_{hk}$ reflects the accuracy of the base classifiers in the classifier group, and MSEr represents the prediction mean square error of a randomly predicting classifier. The smaller the sum of the error values of a group, the more stable the performance of its base classifiers; the AMSE values of the two groups are compared to select the group used for testing.
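The following is a minimal NumPy sketch (our illustration, not the authors' code) of Formulas (1), (2), and (5); reading the (x, y) pairs of Formula (1) as (instance, relevant label) pairs, and p(y) as the per-label frequency, are our assumptions for the multi-label setting:

```python
import numpy as np

def mse_hk(probs, y_true):
    """Formula (1): prediction error of one base classifier on block d_h.
    probs[i, j]  - predicted probability f_y^k(x) that instance i has label j;
    y_true[i, j] - 1 if label j is relevant for instance i, else 0."""
    mask = y_true == 1                     # (instance, relevant label) pairs in d_h
    return float(np.mean((1.0 - probs[mask]) ** 2))

def mse_r(y_true):
    """Formula (2): MSE of a random predictor under the label distribution p(y)."""
    p = y_true.mean(axis=0)                # empirical p(y) for each label
    return float(np.sum(p * (1.0 - p) ** 2))

def amse(group_probs, y_true, theta=1e-8):
    """Formula (5): AMSE of one classifier group; group_probs holds one
    probability matrix per base classifier in the group."""
    r = mse_r(y_true)
    return sum(1.0 / (mse_hk(pk, y_true) + r + theta) for pk in group_probs)
```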
Figure 3 shows a schematic diagram of the ASS, using an HCG constructed from C1 and C2 as an example; C1 and C2 again denote two base classifiers generated by different algorithms. CR(C1) and CR(C2) represent their classification results for all class labels of the data instance, expressed as Boolean values. First, the AMSE value of each group in the HCG is calculated, and it is determined whether the two groups produce the same classification result for the data instance. If so, the group with the smaller AMSE is selected for testing. Otherwise, it is determined whether the C2 result is correct: if so, C2 is selected for testing; otherwise, C1 is selected.
Figure 3. ASS diagram.
Algorithm 1 is the pseudocode of the AHCG algorithm proposed in this paper, which mainly describes the ASS. The input is the data stream D; DC is the data block; K is the number of base classifiers; and M1 and M2 represent two different groups of ensemble classifiers.
Algorithm 1. AHCG
Input: D: data stream; DC: data block; K: number of base classifiers per group; M1, M2: two groups of ensemble classifiers
Output: predicted results $\hat{y}$
1. While D has more instances do
2.     Calculate the AMSE of M1 and M2, respectively, using Formula (5)
3.     If CR(M1) = CR(M2)        // classification results are the same
4.         If AMSE1 < AMSE2
5.             $\hat{y}$ = predict(xi, M1)
6.         Else
7.             $\hat{y}$ = predict(xi, M2)
8.         End if
9.     Else if CR(M1)            // classification results are different
10.        $\hat{y}$ = predict(xi, M1)
11.    Else
12.        $\hat{y}$ = predict(xi, M2)
13.    End if
14.    Train(M1, DC)             // see Algorithm 2
15.    Train(M2, DC)             // see Algorithm 2
16. End while
Algorithm 1 starts with the input of the data stream; lines 2–13 give a detailed description of the ASS. After that, the two groups of the HCG are each trained on data blocks of size DC (lines 14–15). The specific implementation of the training is shown in Algorithm 2 in Section 3.3.
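A compact Python sketch of one ASS step follows (our illustration under stated assumptions: predict() returns a binary label vector, and the true labels needed for CR(·) become available after testing, as in the prequential evaluation setting used in Section 4):

```python
import numpy as np

def ass_predict(x_i, y_true, M1, M2, amse1, amse2):
    """One step of the Adaptive Selection Strategy (Algorithm 1, lines 2-13).
    M1 and M2 are the two classifier groups of the HCG; amse1 and amse2 are
    their AMSE values computed with Formula (5)."""
    pred1, pred2 = M1.predict(x_i), M2.predict(x_i)
    if np.array_equal(pred1, pred2):       # CR(M1) = CR(M2): same results
        return pred1 if amse1 < amse2 else pred2
    if np.array_equal(pred1, y_true):      # results differ and M1 is correct
        return pred1
    return pred2                           # otherwise fall back to M2
```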

3.3. Dynamic Update of Base Classifiers

During the training of HCG base classifiers, the in-group base classifiers need to be weighted and updated. Each group of ensemble classifiers is composed of K base classifiers, and each data block in the stream is used for training. For incoming data streams, the algorithm uses a sliding window consisting of recent instances to evaluate the stream.
In [17], the least squares formula is used to obtain the weights that dynamically update the base classifiers. Here, y is the vector of true values for a given data point, and w is the weight vector. The least squares objective is as follows:
$\min_{w} \| y - Sw \|_2^2 \quad (6)$
In the ensemble framework, a combiner can be defined as a function $g: \mathbb{R}^{K \times p} \to \mathbb{R}^p$. The objective minimized in Formula (6) can then be expressed as follows:
$g(w_1, w_2, \ldots, w_K) = \sum_{i=1}^{n} \sum_{j=1}^{p} \left( \sum_{k=1}^{K} w_k S_{kj}^i - y_j^i \right)^2 \quad (7)$
Taking the partial derivative with respect to each weight $w_q$ and setting the gradient to zero gives the following:
$\frac{\partial g(w_1, w_2, \ldots, w_K)}{\partial w_q} = \sum_{k=1}^{K} w_k \left( \sum_{i=1}^{n} \sum_{j=1}^{p} S_{qj}^i S_{kj}^i \right) - \sum_{i=1}^{n} \sum_{j=1}^{p} y_j^i S_{qj}^i \quad (8)$
The formula can be simplified to:
$\sum_{k=1}^{K} w_k \left( \sum_{i=1}^{n} \sum_{j=1}^{p} S_{qj}^i S_{kj}^i \right) = \sum_{i=1}^{n} \sum_{j=1}^{p} y_j^i S_{qj}^i \quad (9)$
The AHCG algorithm uses Formula (9) to calculate the weight of the base classifiers in each group of ensemble classifiers. Formula (9) is simplified to obtain
$\sum_{k=1}^{K} w_k = \sum_{i=1}^{n} \sum_{j=1}^{p} y_j^i S_{qj}^i \left( \sum_{i=1}^{n} \sum_{j=1}^{p} S_{qj}^i S_{kj}^i \right)^{-1} \quad (10)$
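As an illustration (our sketch, not the paper's code), the normal equations of Formula (9) can be solved directly for the weight vector; S stacks the base classifiers' scores and y is the true label matrix:

```python
import numpy as np

def group_weights(S, y):
    """Solve Formula (9) for the K base-classifier weights.
    S[k, i, j] - score S_kj^i of base classifier k for label j of instance i;
    y[i, j]    - true binary label matrix."""
    K = S.shape[0]
    Sf = S.reshape(K, -1)                      # flatten the (i, j) double sum
    A = Sf @ Sf.T                              # A[q, k] = sum_ij S_qj^i * S_kj^i
    b = Sf @ y.reshape(-1)                     # b[q]    = sum_ij y_j^i * S_qj^i
    w, *_ = np.linalg.lstsq(A, b, rcond=None)  # robust to a singular A
    return w
```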
Figure 4 shows the schematic diagram of the training algorithm, again using an HCG constructed from C1 and C2 as an example, where C1 and C2 denote two base classifiers generated by different algorithms. Formula (10) is used to calculate the weights of the base classifiers of the HCG, and these weights decide whether a base classifier is updated and replaced. For the kth base classifier, one must determine whether the classifier group is full. If it is full, the weights are calculated using Formula (9), the base classifier with the lowest weight is discarded, and a new base classifier is added. Both groups apply the same replacement and update principle.
Figure 4. Training algorithm diagram.
Algorithm 2 is a detailed description of the dynamic weighting of the in-group base classifiers. The least squares method is used to calculate the weights of the base classifiers.
Algorithm 2. Train
Inputs: DC: data block; K: number of base classifiers; M: ensemble classifier; C: base classifier
Output: ensemble classifier
1. Cin ← build a new base classifier from DC
2. If M has K classifiers        // the classifier group is full
3.     Use Formula (9) to calculate the weights
4.     Cout ← the base classifier with the worst weight
5.     M ← M − Cout              // drop the classifier with the worst weight
6. End if
7. M ← M + Cin                   // add the new classifier to the group
8. Train all base classifiers except Cin
Algorithm 2 updates and replaces the base classifiers according to their weights within the group. First, the DC data block is used to train a new base classifier. When the group already contains K classifiers, a base classifier must be replaced: Formula (9) is used to calculate the weights, the base classifier with the worst weight is removed, and the newly trained base classifier is added. Otherwise, the new classifier is added directly.
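A minimal Python sketch of this update loop follows (our illustration; build_classifier, weights_fn, and the sklearn-style partial_fit interface are assumptions, not the paper's API):

```python
def train_group(M, DC, K, build_classifier, weights_fn):
    """Dynamic update of one classifier group (Algorithm 2).
    M  - list of base classifiers (at most K);
    DC - the current data block."""
    c_in = build_classifier(DC)           # train a new base classifier on DC
    if len(M) == K:                       # group is full: replace a member
        w = weights_fn(M, DC)             # e.g., least squares weights, Formula (9)
        worst = min(range(len(M)), key=lambda q: w[q])
        M.pop(worst)                      # drop the classifier with the worst weight
    M.append(c_in)                        # add the newly trained classifier
    for c in M[:-1]:                      # incrementally train all except C_in
        c.partial_fit(DC)
    return M
```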

3.4. Time Complexity Analysis

We first compute the worst-case time complexity of Algorithm 2. The time complexity of building a base classifier and of training the base classifiers other than Cin is taken as O(1). Assuming that the current ensemble holds K base classifiers, the conditional branch costs $O(n^2 \times p^2)$. The worst-case time complexity of Algorithm 1 is then $O(|N| \times (K + n^2 \times p^2))$, where n is the number of instances in DC, p is the number of labels, K is the number of base classifiers, and N is the total number of instances in the dataset.
To show the time efficiency differences between the algorithms more intuitively, the time complexity of the benchmark algorithms was also analyzed; the results are shown in Table 2. The time complexity of the EBR is that of training k groups of BR models, $O(k \times p \times f(m, |N|))$. The time complexity of the ECC is $O(k \times p \times f(m + p, |N|))$; this algorithm models the correlation between labels through a chain structure, so its complexity is slightly higher than that of the EBR. The time complexity of the EPS is $O(N \times p \times \log g + N \times 2^p \times p \times \log g)$, which is mainly determined by constructing and traversing the PS model. The time complexity of GORT, EaBR, EaCC, and EaPS is $O\left(\frac{N}{n}\left(nKc + npK^2 + K^3\right)\right)$. The time complexity of EBRT is $O(2N^2)$, which is mainly determined by the construction, training, and prediction time of the model.
Table 2. Time complexity analysis of the algorithm.
The time complexity of DHEML is determined by the size of the dataset: the larger the dataset, the more complex the feature relationships, and the more expensive the geometric weighting becomes. That of AESAKNNS depends on factors such as data preprocessing, the K-nearest neighbor search, adaptive multi-label processing, and the application of Bayesian rules. As can be seen from the table, the time complexity of the AHCG algorithm is high because computing and comparing the classifier groups is expensive. Here, p represents the number of labels, m the number of features, N the size of the dataset, g the number of pruning operations, K the number of base classifiers, n the number of instances in DC, c the prediction time of a single classifier, and k the number of iterations.

4. Experimental Results and Analysis

The hardware environment for this experiment is a personal computer with an Intel Core i5-7200U CPU at 2.5 GHz and 12 GB of memory. The operating system is Windows 10, and the software environment is Massive Online Analysis (MOA) 2021 [45] combined with the multi-label methods of MEKA [6].
An interleaved-test-then-train (ITTT) [40] assessment method is used to evaluate the AHCG. This method is designed for computation over data streams. Unlike the traditional batch evaluation process, stream processing cannot afford multiple training and testing passes as the amount of data increases; to complete training and testing in a reasonable time, repeated passes and data splitting must be reduced. ITTT uses each instance for testing before it is used for training, so that accuracy can be updated incrementally.
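The ITTT loop can be summarized in a few lines (our sketch; stream, model, and metric are placeholder objects, not MOA/MEKA APIs):

```python
def interleaved_test_then_train(stream, model, metric):
    """Prequential (ITTT) evaluation sketch: each instance is first used for
    testing, then for training, so the metric is updated incrementally."""
    for x, y in stream:
        y_hat = model.predict(x)   # test first ...
        metric.update(y, y_hat)    # ... update the running metric ...
        model.learn(x, y)          # ... then train on the same instance
    return metric
```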
This section conducts experiments on the datasets described in Section 4.1. The experiments are as follows: (1) comparative experiments with ensemble algorithms; (2) comparison with algorithms dealing with concept drift; and (3) Friedman statistical analysis. In the experiments of this paper, the block size n is set to 500, the number of base classifiers k is set to 10, and the remaining parameters are left at their default settings.

4.1. Experimental Settings

(1)
Datasets
The datasets are described in terms of research domain, number of instances (n), number of attributes (m), number of labels (L), label cardinality (LC(D)), and label density (LD(D)), as shown in Table 3; label cardinality and label density are defined in Formulas (11) and (12).
Table 3. Datasets.
$LC(D) = \frac{1}{n} \sum_{i=1}^{n} |y_i| \quad (11)$
$LD(D) = \frac{LC(D)}{L} = \frac{1}{Ln} \sum_{i=1}^{n} |y_i| \quad (12)$
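A short sketch of both statistics (our illustration), with Y as a binary instance-by-label matrix:

```python
import numpy as np

def label_cardinality(Y):
    """Formula (11): average number of relevant labels per instance."""
    return float(np.mean(Y.sum(axis=1)))

def label_density(Y):
    """Formula (12): label cardinality normalized by the number of labels."""
    return label_cardinality(Y) / Y.shape[1]

# Example: 3 instances, 4 labels -> LC = 5/3 and LD = 5/12
Y = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0]])
print(label_cardinality(Y), label_density(Y))
```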
(2)
Benchmark algorithm
To verify the validity of the AHCG algorithm, the EPS and ECC are combined into an HCG and named after their constituent methods, denoted AHCG1. Similarly, AHCG2 is composed of the EPS and EBR, and AHCG3 is composed of the EBR and ECC. This article sets up eleven baseline algorithms, as shown below.
  • EBR [9]: An ensemble version of the BR model: each instance of BR is randomly generated, regardless of the relationship between labels;
  • ECC [9]: An ensemble version of CCs, where the chain order of each CC is randomly generated; it takes into account global label dependencies;
  • EPS [10]: An improved ensemble version of LP that focuses on the most important label relationships by pruning label sets that occur infrequently;
  • GORT [40]: Algorithm using iSOUP regression tree;
  • EBRT [46]: A regression tree method for multi-label classification via multi-objective regression in a streaming setting;
  • EaBR, EaCC, and EaPS [40]: Use ADWIN as their concept drift detector;
  • MLSAMPkNN [20]: The algorithm uses a punitive kNN with self-adjusting memory;
  • AESAKNNS [21]: The algorithm uses ensemble kNN;
  • DHEML [43]: This algorithm is a heterogeneous ensemble algorithm and uses the Adaptive Selection Strategy.
(3)
Evaluation Metrics
In multi-label classification, single-label evaluation metrics are not directly applicable, and many metrics have been designed specifically for the multi-label setting. Accuracy, example-based F1, micro-averaged F1, and macro-averaged F1 are used for assessment in this paper. Table 4 explains the mathematical symbols used in the formulas.
Table 4. Mathematical symbols for evaluation metrics.
In Formulas (13)–(16), Acc(h) represents accuracy, and $F_\beta(h)$ represents the F value based on the data instances. β is a balance factor, usually set to 1. $P_i$ and $R_i$ are the precision and recall of the ith label.
$Acc(h) = \frac{1}{p} \sum_{i=1}^{p} \frac{|Y_i \cap h(x_i)|}{|Y_i \cup h(x_i)|} \quad (13)$
$F_\beta(h) = \frac{(1+\beta^2) \cdot Pre(h) \cdot Re(h)}{\beta^2 \cdot Pre(h) + Re(h)} \quad (14)$
$Micro\_F1 = \frac{2 \times Micro\_Precision \times Micro\_Recall}{Micro\_Precision + Micro\_Recall} \quad (15)$
$Macro\_F1 = \frac{1}{p} \sum_{i=1}^{p} \frac{2 \times P_i \times R_i}{P_i + R_i} \quad (16)$
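The following sketch computes these metrics from binary matrices of true labels Y and predictions H (our illustration; the handling of empty label sets and zero denominators is our assumption):

```python
import numpy as np

def example_accuracy(Y, H):
    """Formula (13): Jaccard-style accuracy averaged over instances."""
    inter = np.logical_and(Y, H).sum(axis=1)
    union = np.logical_or(Y, H).sum(axis=1)
    # If both Y_i and h(x_i) are empty, count the instance as fully correct.
    return float(np.mean(np.where(union > 0, inter / np.maximum(union, 1), 1.0)))

def micro_f1(Y, H):
    """Formula (15): F1 over counts pooled across all labels and instances."""
    tp = np.logical_and(Y, H).sum()
    precision = tp / max(H.sum(), 1)
    recall = tp / max(Y.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

def macro_f1(Y, H):
    """Formula (16): unweighted mean of the per-label F1 values."""
    tp = np.logical_and(Y, H).sum(axis=0)
    p_i = tp / np.maximum(H.sum(axis=0), 1)   # per-label precision P_i
    r_i = tp / np.maximum(Y.sum(axis=0), 1)   # per-label recall R_i
    return float(np.mean(2 * p_i * r_i / np.maximum(p_i + r_i, 1e-12)))
```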

4.2. Experimental Analysis

(1)
Comparative experiments of ensemble algorithms
The algorithms of AHCG1, AHCG2, and AHCG3 are compared with the EBR, ECC, EPS, and DHEML on 12 datasets. Detailed experimental results include the accuracy, example-based F1, micro-averaged F1, macro-averaged F1, and time efficiency of each algorithm. As shown in Table 5, the best results are in bold.
Table 5. The experimental results of the integrated algorithm accuracy, case-based F1, micro-averaged F1, and macro-averaged F1 are compared.
As can be seen from the table, the algorithms using the AHCG rank better on average than the EBR, ECC, and EPS classifiers on all four evaluation metrics. The AHCG3 algorithm received the best ranking in terms of accuracy, example-based F1, and macro-averaged F1, and second place in micro-averaged F1. The overall performances of the AHCG2 and AHCG3 algorithms are slightly better than that of DHEML, which supports the effectiveness of the proposed HCG. In particular, the accuracy of the AHCG algorithm is significantly higher than that of the homogeneous ensemble classifiers. For example, on the Medical dataset, the AHCG3 algorithm outperformed the EBR algorithm by 9.3% and the ECC algorithm by 9.7%. On the Ohsumed dataset, the AHCG3 algorithm was 8.4% higher than the EBR algorithm and 9.5% higher than the ECC algorithm.
In general, algorithms using the AHCG perform better than homogeneous ensemble classifier algorithms, because the ASS can select the group of classifiers that is likely to achieve better performance for testing. In some cases, however, the prediction result of the AHCG algorithm is not as good as that of a single ensemble. On Enron, for example, the EBR algorithm is more accurate than AHCG2 or AHCG3. This is because the first criterion in selecting the test method is whether the two groups of ensemble classifiers produce the same test results for all labels of an instance. For the AHCG3 method, if both groups of the HCG test an instance correctly, or both test it incorrectly, and the AMSE value of the ECC group is smaller than or equal to that of the EBR group, the ECC group is selected by default for testing; in such cases the default choice can reduce the quality of the results. The running times of AHCG2 and AHCG3 are less than that of the DHEML algorithm (as shown in Table 6; the best results are in bold).
Table 6. Experimental results for running times.
In terms of time efficiency, as the time complexity analysis of the AHCG algorithm shows, the algorithm's efficiency is mainly determined by the number of instances, the number of base classifiers, the number of labels, and the number of instances in the data block. On small datasets, the AHCG algorithm is faster than the homogeneous ensemble algorithms except for the EPS algorithm. The EPS algorithm saves time because it prunes infrequent label sets to focus on the most important label relationships. On the Medical dataset, the AHCG3 algorithm saves 289,288 ms over the EBR algorithm and 295,062 ms over the ECC algorithm. However, on large datasets with many instances, the AHCG is less time efficient and takes more time than the algorithms that use only homogeneous classifiers.
(2)
Comparison with the classification of algorithms dealing with concept drift
The algorithms of AHCG1, AHCG2, and AHCG3 are compared with GORT, EBRT, EaBR, EaCC, EaPS, MLSAMPkNN, AESAKNNS, and DHEML on 12 datasets. Both the AHCG and the comparison algorithms are designed to deal with concept drift. Detailed experimental results include the accuracy, example-based F1, micro-averaged F1, and macro-averaged F1 of each algorithm. As shown in Table 7, the best results are in bold.
Table 7. Experimental results of accuracy, example-based F1, micro-averaged F1, and macro-averaged F1.
The AHCG algorithm obtained better results than the other algorithms on the example-based F1, macro-averaged F1, and micro-averaged F1 metrics, ranking in the top three on average among all algorithms. Moreover, AHCG3 ranked first in example-based F1 and macro-averaged F1, third in micro-averaged F1, and second in accuracy, while AHCG2 ranked first in micro-averaged F1.
Compared with the EaBR, EaCC, and EaPS algorithms, which use a window mechanism, the AHCG algorithms achieve better experimental results. The accuracy of the comparison algorithm EaCC on the Slashdot, Reuters, and Ohsumed datasets is relatively poor, and the accuracies of AHCG1 and AHCG3 are better than that of EaCC on all three. For example, on the Slashdot dataset, the accuracy of AHCG1 is 5.5% higher than that of EaCC, and the accuracy of AHCG3 is 5.8% higher. On the Reuters dataset, the accuracy of AHCG1 is 7.4% higher than that of EaCC, and the accuracy of AHCG3 is 13.2% higher. On the Ohsumed dataset, the accuracy of AHCG1 is up to 20% higher than that of EaCC, and the accuracy of AHCG3 is up to 27.1% higher. As can be seen from Table 7, the accuracy, example-based F1, micro-averaged F1, and macro-averaged F1 results of the AHCG algorithm are all higher than those of the DHEML algorithm. The experimental results show that the AHCG algorithm is superior to this heterogeneous ensemble method in dealing with concept drift problems.
(3)
Friedman statistical analysis
To detect statistical significance between algorithms, Friedman statistical analysis is adopted in the result analysis [46]. This section studies the significance of the differences between the AHCG algorithms and the comparison algorithms involved. The results of this test can be seen in a critical distance diagram, where the algorithms are sorted by average rank and algorithms within the critical distance of each other are connected by a line. In this experiment, the better models appear on the left side of the diagram. The critical distance is calculated with Formula (17).
$CD = q_{\alpha} \sqrt{\frac{k(k+1)}{6N}} \quad (17)$
where the significance level α = 0.05, k represents the number of algorithms, and N represents the number of datasets; in this experiment, k = 14 and N = 12, giving CD = 5.26. Figure 5 shows the CD plots for accuracy, example-based F1, micro-averaged F1, and macro-averaged F1, where the average rank of each algorithm is marked along the axis.
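For reference, Formula (17) can be evaluated directly (our sketch); the value of $q_\alpha$ comes from the studentized range table, and a value of roughly 3.08 reproduces the CD of 5.26 reported above:

```python
import math

def critical_distance(q_alpha, k, N):
    """Formula (17): critical distance for the Friedman post hoc comparison."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))

# k = 14 algorithms, N = 12 datasets; q_alpha ~ 3.08 yields CD ~ 5.26.
print(round(critical_distance(3.08, 14, 12), 2))
```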
Figure 5 shows that the AHCG variants AHCG1, AHCG2, and AHCG3 are superior to the other comparison algorithms in terms of accuracy, example-based F1, micro-averaged F1, and macro-averaged F1. Among them, AHCG1 ranks first in example-based F1 and macro-averaged F1, and AHCG2 ranks first in accuracy and micro-averaged F1. The superiority of the HCG algorithms shows that using different classifier groups can improve the heterogeneity among the classifiers and thus the classification performance.
Figure 5. Algorithm critical distance graph. (The blue line marks the critical distance; algorithms outside this region differ significantly from the control.)

5. Summary

To improve the diversity of ensemble classifiers, this paper proposes the AHCG algorithm, which uses the concept of an HCG. An Adaptive Selection Strategy is proposed to select the appropriate ensemble classifier for testing according to the AMSE. The least squares method is used to calculate the weights of the base classifiers in each group, which are then used to update and replace them. According to the experiments, the AHCG algorithm obtains better values for accuracy, example-based F1, micro-averaged F1, and macro-averaged F1 than the compared homogeneous ensemble classifier algorithms and ranks high overall. It is a general algorithmic structure that can be applied to most algorithms. The AHCG currently generates and updates the in-group classifiers serially, which limits its time efficiency on large datasets, although it can still classify large multi-label text data. In future work, our research group will focus on the time efficiency of the algorithm so that it can be applied to image classification. Provided that the evaluation metrics remain stable, the HCG can run the in-group classifier generation and update phases in parallel to improve time efficiency. Meanwhile, we will evaluate the method on further datasets and refine the dynamic update of base classifiers to reduce the impact of their number on the results.

Author Contributions

Conceptualization, M.H. and S.Y.; methodology, M.H.; software, H.W.; validation, M.H., S.Y. and H.W.; formal analysis, J.D.; investigation, J.D.; resources, H.W.; data curation, S.Y.; writing—original draft preparation, M.H.; writing—review and editing, S.Y.; visualization, M.H.; supervision, J.D.; project administration, H.W.; funding acquisition, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ningxia Natural Science Foundation Project (2022AAC03279), the National Nature Science Foundation of China (62062004), and the Graduate Innovation Project of North Minzu University (YCX24371).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep learning-based text classification: A comprehensive review. ACM Comput. Surv. (CSUR) 2021, 54, 1–40. [Google Scholar] [CrossRef]
  2. Liu, Q.; She, X.; Xia, Q. AI based diagnostics product design for osteosarcoma cells microscopy imaging of bone cancer patients using CA-MobileNet V3. J. Bone Oncol. 2024, 49, 100644. [Google Scholar] [CrossRef] [PubMed]
  3. Rana, P.; Meijering, E.; Sowmya, A.; Song, Y. Multi-Label Classification Based On Subcellular Region-Guided Feature Description for Protein Localisation. In Proceedings of the 18th International Symposium on Biomedical Imaging, Nice, France, 13–16 April 2021; pp. 1929–1933. [Google Scholar]
  4. Ding, Y.; Zhang, H.; Huang, W.; Zhou, X.; Shi, Z. Efficient Music Genre Recognition Using ECAS-CNN: A Novel Channel-Aware Neural Network Architecture. Sensors 2024, 24, 7021. [Google Scholar] [CrossRef] [PubMed]
  5. Xie, F.; Pan, X.; Yang, T.; Ernewein, B.; Li, M.; Robinson, D. A novel computer vision and point cloud-based approach for accurate structural analysis of a tall irregular timber structure. Structures 2024, 70, 107697. [Google Scholar] [CrossRef]
  6. Read, J.; Reutemann, P.; Pfahringer, B.; Holmes, G. Meka: A multi-label/multi-target extension to weka. J. Mach. Learn. Res. 2016, 17, 1–5. [Google Scholar]
  7. Osojnik, A.; Panov, P.; Džeroski, S. Multi-label classification via multi-target regression on data streams. Mach. Learn. 2017, 106, 745–770. [Google Scholar] [CrossRef]
  8. Duan, J.; Gu, Y.; Yu, H.; Yang, X.; Gao, S. ECC++: An algorithm family based on ensemble of classifier chains for classifying imbalanced multi-label data. Expert Syst. Appl. 2024, 236, 121366. [Google Scholar] [CrossRef]
  9. Mauri, L.; Damiani, E. Hardening behavioral classifiers against polymorphic malware: An ensemble approach based on minority report. Inf. Sci. 2025, 689, 121499. [Google Scholar] [CrossRef]
  10. Ganaie, M.; Hu, M.; Malik, A.; Tanveer, M.; Suganthan, P. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 2022, 115, 105151. [Google Scholar] [CrossRef]
  11. Alzubi, O.A.; Alzubi, J.A.; Alweshah, M.; Qiqieh, I.; Al-Shami, S.; Ramachandran, M. An optimal pruning algorithm of classifier ensembles: Dynamic programming approach. Neural Comput. Appl. 2020, 32, 16091–16107. [Google Scholar] [CrossRef]
  12. Museba, T.; Nelwamondo, F.; Ouahada, K. An Adaptive Heterogeneous Online Learning Ensemble Classifier for Nonstationary Environments. Comput. Intell. Neurosci. 2021, 2021, 6669706. [Google Scholar]
  13. Hg, Z.; Altnay, H. Imbalance Learning Using Heterogeneous Ensembles. Expert Syst. Appl. 2019, 142, 113005. [Google Scholar]
  14. Read, J.; Martino, L.; Olmos, P.M.; Luengo, D. Scalable multi-output label prediction: From classifier chains to classifier trellises. Pattern Recognit. 2015, 48, 2096–2109. [Google Scholar] [CrossRef]
  15. Wang, R.; Kwong, S.; Wang, X.; Jia, Y. Active k-labelsets ensemble for multi-label classification. Pattern Recognit. 2021, 109, 107583. [Google Scholar] [CrossRef]
  16. Zhang, J.; Bian, Z.; Wang, S. Style linear k-nearest neighbor classification method. Appl. Soft Comput. 2024, 150, 111011. [Google Scholar] [CrossRef]
  17. Xiao, N.; Dai, S. A network big data classification method based on decision tree algorithm. Int. J. Reason.-Based Intell. Syst. 2024, 16, 66–73. [Google Scholar] [CrossRef]
  18. Zhang, Z.; Wang, Z.; Liu, H.; Sun, Y. Ensemble Multi-label Classification Algorithm Based on Tree-Bayesian Network. Comput. Sci. 2018, 45, 195–201. [Google Scholar]
  19. Roy, A.; Chakraborty, S. Support vector machine in structural reliability analysis: A review. Reliab. Eng. Syst. Saf. 2023, 233, 109126. [Google Scholar] [CrossRef]
  20. Kavitha, P.M.; Muruganantham, B. Mal_CNN: An Enhancement for Malicious Image Classification Based on Neural Network. Cybern. Syst. 2024, 55, 739–752. [Google Scholar] [CrossRef]
  21. Roseberry, M.; Krawczyk, B.; Cano, A. Multi-label punitive kNN with self-adjusting memory for drifting data streams. ACM Trans. Knowl. Discov. Data 2019, 13, 1–31. [Google Scholar] [CrossRef]
  22. Alberghini, G.; Junior, S.B.; Cano, A. Adaptive ensemble of self-adjusting nearest neighbor subspaces for multi-label drifting data streams. Neurocomputing 2022, 481, 228–248. [Google Scholar] [CrossRef]
  23. Rastin, N.; Jahromi, M.Z.; Taheri, M. Feature weighting to tackle label dependencies in multi-label stacking nearest neighbor. Appl. Intell. 2021, 51, 5200–5218. [Google Scholar] [CrossRef]
  24. Liu, C.; Cao, L. A coupled k-nearest neighbor algorithm for multi-label classification. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Ho Chi Minh City, Vietnam, 19–22 May 2015; pp. 176–187. [Google Scholar]
  25. Luo, F.; Guo, W.; Yu, Y.; Chen, G. A multi-label classification algorithm based on kernel extreme learning machine. Neurocomputing 2017, 260, 313–320. [Google Scholar] [CrossRef]
  26. Rezaei, M.; Eftekhari, M.; Movahed, F.S. ML-CK-ELM: An efficient multi-layer extreme learning machine using combined kernels for multi-label classification. Sci. Iran. 2020, 27, 3005–3018. [Google Scholar] [CrossRef]
  27. Bezembinder, E.M.; Wismans LJ, J.; Berkum EC, V. Constructing multi-labelled decision trees for junction design using the predicted probabilities. In Proceedings of the 20th IEEE International Conference on Intelligent Transportation Systems, Yokohama, Japan, 16–19 October 2017; pp. 1–7. [Google Scholar]
  28. Majzoubi, M.; Choromanska, A. Ldsm: Logarithm-depth streaming multi-label decision trees. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, Online, 26–28 August 2020; pp. 4247–4257. [Google Scholar]
  29. Moral-García, S.; Mantas, C.J.; Castellano, J.G.; Abellán, J. Ensemble of classifier chains and credal C4.5 for solving multi-label classification. Prog. Artif. Intell. 2019, 8, 195–213. [Google Scholar] [CrossRef]
  30. Lotf, H.; Ramdani, M. Multi-Label Classification: A Novel approach using decision trees for learning Label-relations and preventing cyclical dependencies: Relations Recognition and Removing Cycles (3RC). In Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications, Rabat, Morocco, 23–24 September 2020; pp. 1–6. [Google Scholar]
  31. Nan, G.; Li, Q.; Dou, R.; Liu, J. Local positive and negative correlation-based k -labelsets for multi-label classification. Neurocomputing 2018, 318, 90–101. [Google Scholar] [CrossRef]
  32. Moyano, J.M.; Ventura, S. Auto-adaptive grammar-guided genetic programming algorithm to build ensembles of multi-label classifiers. Inf. Fusion 2022, 78, 1–19. [Google Scholar] [CrossRef]
  33. Mahdavi-Shahri, A.; Houshmand, M.; Yaghoobi, M.; Jalali, M. Applying an ensemble learning method for improving multi-label classification performance. In Proceedings of the 2nd International Conference of Signal Processing and Intelligent Systems, Tehran, Iran, 14–15 December 2016; pp. 1–6. [Google Scholar]
  34. Moyano, J.M.; Gibaja, E.L.; Cios, K.J.; Ventura, S. Generating ensembles of multi-label classifiers using cooperative coevolutionary algorithms. In Proceedings of the 24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 29 August–8 September 2020; pp. 1379–1386. [Google Scholar]
  35. Zhang, L.; Shah, S.K.; Kakadiaris, I.A. Hierarchical multi-label classification using fully associative ensemble learning. Pattern Recognit. 2017, 70, 89–103. [Google Scholar] [CrossRef]
  36. Wei, X.; Yu, Z.; Zhang, C.; Hu, Q. Ensemble of label specific features for multi-label classification. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, San Diego, CA, USA, 23–27 July 2018; pp. 1–6. [Google Scholar]
  37. Cheng, K.; Gao, S.; Dong, W.; Yang, X.; Wang, Q.; Yu, H. Boosting label weighted extreme learning machine for classifying multi-label imbalanced data. Neurocomputing 2020, 403, 360–370. [Google Scholar] [CrossRef]
  38. Li, K.; Kong, X.; Lu, Z.; Wenyin, L.; Yin, J. Boosting weighted ELM for imbalanced learning. Neurocomputing 2014, 128, 15–21. [Google Scholar] [CrossRef]
  39. Büyükçakir, A.; Bonab, H.; Can, F. A novel online stacked ensemble for multi-label stream classification. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 1063–1072. [Google Scholar]
  40. Li, D.; Ji, L.; Liu, S. Research on sentiment classification method based on ensemble learning of heterogeneous classifiers. Eng. J. Wuhan Univ. 2021, 54, 975–982. [Google Scholar]
  41. Wu, D.; Han, B. Network Intrusion Detection Method Based on Optimization Heterogeneous Ensemble Learning of Glowworms. Fire Control Command Control 2021, 46, 26–31. [Google Scholar]
  42. Ding, J.; Wu, H.; Han, M. Multi-label data stream classification algorithm based on dynamic heterogeneous ensemble. Comput. Eng. Des. 2023, 44, 3031–3038. [Google Scholar]
  43. Singh, I.P.; Ghorbel, E.; Oyedotun, O.; Aouada, D. Multi-label image classification using adaptive graph convolutional networks: From a single domain to multiple domains. Comput. Vis. Image Underst. 2024, 247, 104062. [Google Scholar] [CrossRef]
  44. Brzezinski, D.; Stefanowski, J. Reacting to different types of concept drift: The accuracy updated ensemble algorithm. IEEE Trans. Neural Netw. Learn. Syst. 2013, 25, 81–94. [Google Scholar] [CrossRef] [PubMed]
  45. Bifet, A.; Holmes, G.; Pfahringer, B.; Read, J.; Kranen, P.; Kremer, H.; Jansen, T. MOA: A realtime analytics open source framework. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2011, Athens, Greece, 5–9 September 2011; Proceedings, Part III 22; Springer: Berlin/Heidelberg, Germany, 2011; pp. 617–620. [Google Scholar]
  46. Liu, J.; Xu, Y. T-Friedman test: A new statistical test for multiple comparison with an adjustable conservativeness measure. Int. J. Comput. Intell. Syst. 2022, 15, 29. [Google Scholar] [CrossRef]
