Mathematics
  • Article
  • Open Access

30 December 2024

Multi-Label Classification Algorithm for Adaptive Heterogeneous Classifier Group

School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
Author to whom correspondence should be addressed.
This article belongs to the Section E1: Mathematics and Computer Science

Abstract

Ensemble classification is widely used in multi-label algorithms and can be divided into homogeneous and heterogeneous ensembles according to classifier type. A heterogeneous ensemble can generate classifiers with better diversity than a homogeneous ensemble and improve the performance of classification results. An Adaptive Heterogeneous Classifier Group (AHCG) algorithm is proposed. The AHCG first proposes the concept of a Heterogeneous Classifier Group (HCG); that is, two different groups of ensemble classifiers are used in the training and testing phases. Secondly, the Adaptive Selection Strategy (ASS) is proposed, which selects the group of ensemble classifiers to be used in the test phase. The least squares method is used to calculate the weights of the in-group base classifiers, and the base classifiers are dynamically updated according to these weights. Extensive experiments on seven datasets show that this algorithm performs better than most existing ensemble classification algorithms in terms of accuracy, example-based F1, micro-averaged F1, and macro-averaged F1.

1. Introduction

The traditional supervised learning task is based on single-label classification; that is, each data instance is associated with a single label. In reality, however, many problems involve multiple labels. Multi-label classification is applied to text classification [1], medical diagnostic classification [2], protein classification [3], and music [4] or video classification [5], among others. For example, in medical diagnostic classification, a patient can have both diabetes and hypertension.
Given a d-dimensional input space $X = X_1 \times \cdots \times X_d$ and a label space $L = \{l_1, l_2, \ldots, l_p\}$ with $p > 1$, a multi-label instance can be defined as a pair $(x, l)$, where $x = (x_1, \ldots, x_d) \in X$ and $l \subseteq L$; $l$ is called a label set. The indicator for label $l_p$ equals 1 when the label is associated with instance $x$, and 0 otherwise. The goal of multi-label classification (MLC) [6] is to build a prediction model $h: X \to 2^L$ that provides a set of relevant labels for an unknown instance. Each instance may be associated with several labels from the previously defined label set. Therefore, for every $x \in X$, the label space $L$ is partitioned into a pair $(l, \bar{l})$, where $l = h(x)$ is the set of relevant labels and $\bar{l}$ the set of irrelevant labels.
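As a concrete illustration of this representation (our example, not from the paper; the label names are hypothetical), a label set can be encoded as a binary indicator vector:

```python
# Label space L with p = 4 labels; one instance x whose relevant label set
# is {diabetes, hypertension}, as in the medical example above.
label_names = ["diabetes", "hypertension", "asthma", "anemia"]

x = [0.7, 1.2, 3.4]    # feature vector in X = X1 x ... x Xd (here d = 3)
l = [1, 1, 0, 0]       # binary label set: l[j] = 1 iff label j is relevant

relevant = [n for n, bit in zip(label_names, l) if bit]
irrelevant = [n for n, bit in zip(label_names, l) if not bit]
print(relevant)        # ['diabetes', 'hypertension']
print(irrelevant)      # ['asthma', 'anemia']
```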
Ensemble techniques are becoming increasingly important, as they have been shown to improve the accuracy of single classifiers [7]. The base classifiers in an ensemble can be homogeneous or heterogeneous; heterogeneous base classifiers are constructed using different algorithms. Classical multi-label homogeneous ensemble algorithms include the Ensemble of Classifier Chains (ECC) [8], the Ensemble of Binary Relevance (EBR) [9], and the Ensemble of Pruned Sets (EPS) [10]. Heterogeneous ensemble algorithms are widely used in single-label classification. Heterogeneous dynamic ensemble selection based on accuracy and diversity (HDES-AD) [11] can adaptively switch between different types of base classifiers in the ensemble to enhance its predictive performance in non-stationary environments. A heterogeneous ensemble using a variety of learning algorithms has a higher potential to generate diversified classifiers than a homogeneous ensemble. The authors of [12] use a heterogeneous ensemble for imbalanced learning. In multi-label classification, another study [7] uses a heterogeneous ensemble of classifiers to address sample imbalance and label-related problems; it proposes combining state-of-the-art multi-label methods through ensemble techniques rather than embedding ensemble techniques inside multi-label learners.
Although existing ensemble classification algorithms can deal with some multi-label classification problems, most of them adopt the homogeneous ensemble method, which has a single type of base classifier and lacks diversity. At the same time, the traditional heterogeneous ensemble achieves heterogeneity only at the level of individual base classifiers of different types. To solve these problems, this paper proposes an Adaptive Heterogeneous Classifier Group (AHCG) multi-label algorithm, the main contributions of which are as follows:
(1)
The concept of a Heterogeneous Classifier Group is proposed. Two different ensemble classifiers are used in the testing and training stages. Unlike the previous notion of a heterogeneous ensemble, heterogeneity no longer comes from mixing different kinds of base classifiers within a single ensemble.
(2)
The Adaptive Selection Strategy is proposed. The adaptive mean square error formula is used to calculate the sum of the error values of each group of ensemble classifiers, and the most suitable group of ensemble classifiers is selected for testing by comparing these values.
(3)
The least squares method is used to calculate the weight of the base classifier in the Heterogeneous Classifier Group and dynamically update it according to the weight.
(4)
Experiments are carried out on seven real datasets, and the AHCG algorithm is compared with eight homogeneous ensemble methods. Good results are obtained on all four evaluation metrics.

3. AHCG Algorithm

In this section, the Adaptive Heterogeneous Classifier Group (AHCG) Algorithm is introduced in detail. The concepts of a Heterogeneous Classifier Group and an Adaptive Selection Strategy are proposed. Table 1 describes the meanings of the symbols covered in this section.
Table 1. Symbols used in the AHCG algorithm.

3.1. HCG Concept

The AHCG algorithm proposes the concept of a Heterogeneous Classifier Group (HCG). Unlike previous heterogeneous ensembles, heterogeneity no longer comes from different base classifiers within one ensemble; instead, two different ensemble classifiers are used across the testing and training phases. In the testing phase, the ASS described in Section 3.2 is used to select the appropriate group of ensemble classifiers for testing a data instance. In the training phase, two different groups of ensemble classifiers are constructed simultaneously, and the dynamic update strategy described in Section 3.3 is then used to update and replace the in-group base classifiers of each group. Combining two different groups of ensemble classifiers increases the heterogeneity of the ensemble.
The HCG can combine any two ensemble classifiers and is a general concept. We use C1 and C2 to represent two base classifiers generated by different algorithms, respectively. Figure 1 shows the traditional homogeneous and heterogeneous ensemble diagrams, and Figure 2 depicts the schematic diagram of the HCG. Experiments show that the HCG concept proposed in this paper is reasonable and has better performance than heterogeneous ensemble methods.
Figure 1. Traditional ensemble diagram.
Figure 2. HCG diagram.

3.2. Adaptive Selection Strategy

The AHCG proposes an Adaptive Selection Strategy (ASS). In the testing phase, the ensemble classifier with the minimum sum of the mean square error is selected from the HCG.
The Accuracy Updated Ensemble (AUE2) [44] is a classic single-label ensemble classification algorithm that uses the Mean Square Error (MSE) to weight its classifiers. It defines the quantities MSEhk and MSEr. MSEhk is the prediction error of base classifier Ck on data block dh, where f_y^k(x) is the probability that instance x belongs to class y as given by base classifier Ck. MSEr is the mean square error of a random prediction classifier and serves as a reference point for the current class distribution, where p(y) is the distribution of class y. The calculation formulas are as follows:
$MSE_{hk} = \frac{1}{|d_h|} \sum_{(x,y) \in d_h} \left(1 - f_y^k(x)\right)^2 \quad (1)$
$MSE_r = \sum_{y} p(y)\left(1 - p(y)\right)^2 \quad (2)$
Finally, MSEhk and MSEr are combined so that the weight reflects both the accuracy of the base classifier and the current class distribution. A very small positive value θ is added to avoid division by zero. The formula is shown in (3):
$w_{hk} = \frac{1}{MSE_r + MSE_{hk} + \theta} \quad (3)$
AUE2 calculates the weight of the newly created base classifier with a modified formula:
$w_k = \frac{1}{MSE_r + \theta} \quad (4)$
The AHCG algorithm, however, targets two groups of ensemble classifiers and needs to calculate the sum of the mean square errors of each group separately. The formula is therefore shown in (5):
$AMSE_{hk} = \sum_{k=1}^{K} \frac{1}{MSE_{hk} + MSE_r + \theta} \quad (5)$
The values of MSEhk and MSEr both lie between 0 and 1, and θ is a very small constant. $\sum_{k=1}^{K} MSE_{hk}$ reflects the accuracy of the base classifiers in the classifier group, and MSEr represents the prediction mean square error of a randomly predicting classifier. The smaller the sum of the error values of a group, the more stable the performance of its base classifiers; the AMSE values of the two groups are compared to select the group used for testing.
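The following is a minimal NumPy sketch (our illustration, not the authors' code) of Formulas (1), (2), and (5); reading the (x, y) pairs of Formula (1) as (instance, relevant label) pairs, and p(y) as the per-label frequency, are our assumptions for the multi-label setting:

```python
import numpy as np

def mse_hk(probs, y_true):
    """Formula (1): prediction error of one base classifier on block d_h.
    probs[i, j]  - predicted probability f_y^k(x) that instance i has label j;
    y_true[i, j] - 1 if label j is relevant for instance i, else 0."""
    mask = y_true == 1                     # (instance, relevant label) pairs in d_h
    return float(np.mean((1.0 - probs[mask]) ** 2))

def mse_r(y_true):
    """Formula (2): MSE of a random predictor under the label distribution p(y)."""
    p = y_true.mean(axis=0)                # empirical p(y) for each label
    return float(np.sum(p * (1.0 - p) ** 2))

def amse(group_probs, y_true, theta=1e-8):
    """Formula (5): AMSE of one classifier group; group_probs holds one
    probability matrix per base classifier in the group."""
    r = mse_r(y_true)
    return sum(1.0 / (mse_hk(pk, y_true) + r + theta) for pk in group_probs)
```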
Figure 3 shows a schematic diagram of the ASS, using an HCG constructed from C1 and C2 as an example; C1 and C2 again denote two base classifiers generated by different algorithms. CR(C1) and CR(C2) represent their classification results for all class labels of the data instance, expressed as Boolean values. First, the AMSE value of each group in the HCG is calculated, and it is determined whether the two groups produce the same classification result for the data instance. If so, the group with the smaller AMSE is selected for testing. Otherwise, it is determined whether the C2 result is correct: if so, C2 is selected for testing; otherwise, C1 is selected.
Figure 3. ASS diagram.
Algorithm 1 is the pseudocode of the AHCG algorithm proposed in this paper, which mainly describes the ASS. The input is the data stream D; DC is the data block; K is the number of base classifiers; and M1 and M2 represent two different groups of ensemble classifiers.
Algorithm 1. AHCG
Input: D: data stream; DC: data block; K: number of base classifiers per group; M1, M2: two groups of ensemble classifiers
Output: predicted results $\hat{y}$
1. While D has more instances do
2.     Calculate the AMSE of M1 and M2, respectively, using Formula (5)
3.     If CR(M1) = CR(M2)        // classification results are the same
4.         If AMSE1 < AMSE2
5.             $\hat{y}$ = predict(xi, M1)
6.         Else
7.             $\hat{y}$ = predict(xi, M2)
8.         End if
9.     Else if CR(M1)            // classification results are different
10.        $\hat{y}$ = predict(xi, M1)
11.    Else
12.        $\hat{y}$ = predict(xi, M2)
13.    End if
14.    Train(M1, DC)             // see Algorithm 2
15.    Train(M2, DC)             // see Algorithm 2
16. End while
Algorithm 1 starts with the input of the data stream; lines 2–13 give a detailed description of the ASS. After that, the two groups of the HCG are each trained on data blocks of size DC (lines 14–15). The specific implementation of the training is shown in Algorithm 2 in Section 3.3.
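A compact Python sketch of one ASS step follows (our illustration under stated assumptions: predict() returns a binary label vector, and the true labels needed for CR(·) become available after testing, as in the prequential evaluation setting used in Section 4):

```python
import numpy as np

def ass_predict(x_i, y_true, M1, M2, amse1, amse2):
    """One step of the Adaptive Selection Strategy (Algorithm 1, lines 2-13).
    M1 and M2 are the two classifier groups of the HCG; amse1 and amse2 are
    their AMSE values computed with Formula (5)."""
    pred1, pred2 = M1.predict(x_i), M2.predict(x_i)
    if np.array_equal(pred1, pred2):       # CR(M1) = CR(M2): same results
        return pred1 if amse1 < amse2 else pred2
    if np.array_equal(pred1, y_true):      # results differ and M1 is correct
        return pred1
    return pred2                           # otherwise fall back to M2
```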

3.3. Dynamic Update of Base Classifiers

During the training of HCG base classifiers, the in-group base classifiers need to be weighted and updated. Each group of ensemble classifiers is composed of K base classifiers, and each data block in the stream is used for training. For incoming data streams, the algorithm uses a sliding window consisting of recent instances to evaluate the stream.
In [17], the least squares formula is used to obtain the weights that dynamically update the base classifiers. Here, y is the vector of true values for a given data point, and w is the weight vector. The least squares objective is as follows:
$\min_{w} \| y - Sw \|_2^2 \quad (6)$
In the ensemble framework, a combiner can be defined as a function $g: \mathbb{R}^{K \times p} \to \mathbb{R}^p$. The objective minimized in Formula (6) can then be expressed as follows:
$g(w_1, w_2, \ldots, w_K) = \sum_{i=1}^{n} \sum_{j=1}^{p} \left( \sum_{k=1}^{K} w_k S_{kj}^i - y_j^i \right)^2 \quad (7)$
Taking the partial derivative with respect to each weight $w_q$ and setting the gradient to zero gives the following:
$\frac{\partial g(w_1, w_2, \ldots, w_K)}{\partial w_q} = \sum_{k=1}^{K} w_k \left( \sum_{i=1}^{n} \sum_{j=1}^{p} S_{qj}^i S_{kj}^i \right) - \sum_{i=1}^{n} \sum_{j=1}^{p} y_j^i S_{qj}^i \quad (8)$
The formula can be simplified to:
$\sum_{k=1}^{K} w_k \left( \sum_{i=1}^{n} \sum_{j=1}^{p} S_{qj}^i S_{kj}^i \right) = \sum_{i=1}^{n} \sum_{j=1}^{p} y_j^i S_{qj}^i \quad (9)$
The AHCG algorithm uses Formula (9) to calculate the weight of the base classifiers in each group of ensemble classifiers. Formula (9) is simplified to obtain
$\sum_{k=1}^{K} w_k = \sum_{i=1}^{n} \sum_{j=1}^{p} y_j^i S_{qj}^i \left( \sum_{i=1}^{n} \sum_{j=1}^{p} S_{qj}^i S_{kj}^i \right)^{-1} \quad (10)$
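As an illustration (our sketch, not the paper's code), the normal equations of Formula (9) can be solved directly for the weight vector; S stacks the base classifiers' scores and y is the true label matrix:

```python
import numpy as np

def group_weights(S, y):
    """Solve Formula (9) for the K base-classifier weights.
    S[k, i, j] - score S_kj^i of base classifier k for label j of instance i;
    y[i, j]    - true binary label matrix."""
    K = S.shape[0]
    Sf = S.reshape(K, -1)                      # flatten the (i, j) double sum
    A = Sf @ Sf.T                              # A[q, k] = sum_ij S_qj^i * S_kj^i
    b = Sf @ y.reshape(-1)                     # b[q]    = sum_ij y_j^i * S_qj^i
    w, *_ = np.linalg.lstsq(A, b, rcond=None)  # robust to a singular A
    return w
```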
Figure 4 shows the schematic diagram of the training algorithm, again using an HCG constructed from C1 and C2 as an example, where C1 and C2 denote two base classifiers generated by different algorithms. Formula (10) is used to calculate the weights of the base classifiers of the HCG, and these weights decide whether a base classifier is updated and replaced. For the kth base classifier, one must determine whether the classifier group is full. If it is full, the weights are calculated using Formula (9), the base classifier with the lowest weight is discarded, and a new base classifier is added. Both groups apply the same replacement and update principle.
Figure 4. Training algorithm diagram.
Algorithm 2 is a detailed description of the dynamic weighting of the in-group base classifiers. The least squares method is used to calculate the weights of the base classifiers.
Algorithm 2. Train
Inputs: DC: data block; K: number of base classifiers; M: ensemble classifier; C: base classifier
Output: ensemble classifier
1. Cin ← build a new base classifier from DC
2. If M has K classifiers        // the classifier group is full
3.     Use Formula (9) to calculate the weights
4.     Cout ← the base classifier with the worst weight
5.     M ← M − Cout              // drop the classifier with the worst weight
6. End if
7. M ← M + Cin                   // add the new classifier to the group
8. Train all base classifiers except Cin
Algorithm 2 updates and replaces the base classifiers according to their weights within the group. First, the DC data block is used to train a new base classifier. When the group already contains K classifiers, a base classifier must be replaced: Formula (9) is used to calculate the weights, the base classifier with the worst weight is removed, and the newly trained base classifier is added. Otherwise, the new classifier is added directly.
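A minimal Python sketch of this update loop follows (our illustration; build_classifier, weights_fn, and the sklearn-style partial_fit interface are assumptions, not the paper's API):

```python
def train_group(M, DC, K, build_classifier, weights_fn):
    """Dynamic update of one classifier group (Algorithm 2).
    M  - list of base classifiers (at most K);
    DC - the current data block."""
    c_in = build_classifier(DC)           # train a new base classifier on DC
    if len(M) == K:                       # group is full: replace a member
        w = weights_fn(M, DC)             # e.g., least squares weights, Formula (9)
        worst = min(range(len(M)), key=lambda q: w[q])
        M.pop(worst)                      # drop the classifier with the worst weight
    M.append(c_in)                        # add the newly trained classifier
    for c in M[:-1]:                      # incrementally train all except C_in
        c.partial_fit(DC)
    return M
```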

3.4. Time Complexity Analysis

We first compute the worst-case time complexity of Algorithm 2. The time complexity of building a base classifier and of training the base classifiers other than Cin is taken as O(1). Assuming that the current ensemble holds K base classifiers, the conditional branch costs $O(n^2 \times p^2)$. The worst-case time complexity of Algorithm 1 is then $O(|N| \times (K + n^2 \times p^2))$, where n is the number of instances in DC, p is the number of labels, K is the number of base classifiers, and N is the total number of instances in the dataset.
To show the time efficiency differences between the algorithms more intuitively, the time complexity of the benchmark algorithms was also analyzed; the results are shown in Table 2. The time complexity of the EBR is that of training k groups of BR models, $O(k \times p \times f(m, |N|))$. The time complexity of the ECC is $O(k \times p \times f(m + p, |N|))$; this algorithm models the correlation between labels through a chain structure, so its complexity is slightly higher than that of the EBR. The time complexity of the EPS is $O(N \times p \times \log g + N \times 2^p \times p \times \log g)$, which is mainly determined by constructing and traversing the PS model. The time complexity of GORT, EaBR, EaCC, and EaPS is $O\left(\frac{N}{n}\left(nKc + npK^2 + K^3\right)\right)$. The time complexity of EBRT is $O(2N^2)$, which is mainly determined by the construction, training, and prediction time of the model.
Table 2. Time complexity analysis of the algorithm.
The time complexity of DHEML is determined by the size of the dataset: the larger the dataset, the more complex the feature relationships, and the more expensive the geometric weighting becomes. That of AESAKNNS depends on factors such as data preprocessing, the K-nearest neighbor search, adaptive multi-label processing, and the application of Bayesian rules. As can be seen from the table, the time complexity of the AHCG algorithm is high because computing and comparing the classifier groups is expensive. Here, p represents the number of labels, m the number of features, N the size of the dataset, g the number of pruning operations, K the number of base classifiers, n the number of instances in DC, c the prediction time of a single classifier, and k the number of iterations.

4. Experimental Results and Analysis

The hardware environment for this experiment is a personal computer with an Intel Core i5-7200U CPU at 2.5 GHz and 12 GB of memory. The operating system is Windows 10, and the software environment is Massive Online Analysis (MOA) 2021 [45] combined with the multi-label methods of MEKA [6].
An interleaved-test-then-train (ITTT) [40] assessment method is used to evaluate the AHCG. This method is designed for computation over data streams. Unlike the traditional batch evaluation process, stream processing cannot afford multiple training and testing passes as the amount of data increases; to complete training and testing in a reasonable time, repeated passes and data splitting must be reduced. ITTT uses each instance for testing before it is used for training, so that accuracy can be updated incrementally.
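The ITTT loop can be summarized in a few lines (our sketch; stream, model, and metric are placeholder objects, not MOA/MEKA APIs):

```python
def interleaved_test_then_train(stream, model, metric):
    """Prequential (ITTT) evaluation sketch: each instance is first used for
    testing, then for training, so the metric is updated incrementally."""
    for x, y in stream:
        y_hat = model.predict(x)   # test first ...
        metric.update(y, y_hat)    # ... update the running metric ...
        model.learn(x, y)          # ... then train on the same instance
    return metric
```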
This section conducts experiments on the datasets described in Section 4.1. The experiments are as follows: (1) comparative experiments with ensemble algorithms; (2) comparison with algorithms dealing with concept drift; and (3) Friedman statistical analysis. In the experiments of this paper, the block size n is set to 500, the number of base classifiers k is set to 10, and the remaining parameters are left at their default settings.

4.1. Experimental Settings

(1)
Datasets
The datasets are described in terms of research domain, number of instances (n), number of attributes (m), number of labels (L), label cardinality (LC(D)), and label density (LD(D)), as shown in Table 3; label cardinality and label density are defined in Formulas (11) and (12).
Table 3. Datasets.
$LC(D) = \frac{1}{n} \sum_{i=1}^{n} |y_i| \quad (11)$
$LD(D) = \frac{LC(D)}{L} = \frac{1}{Ln} \sum_{i=1}^{n} |y_i| \quad (12)$
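A short sketch of both statistics (our illustration), with Y as a binary instance-by-label matrix:

```python
import numpy as np

def label_cardinality(Y):
    """Formula (11): average number of relevant labels per instance."""
    return float(np.mean(Y.sum(axis=1)))

def label_density(Y):
    """Formula (12): label cardinality normalized by the number of labels."""
    return label_cardinality(Y) / Y.shape[1]

# Example: 3 instances, 4 labels -> LC = 5/3 and LD = 5/12
Y = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0]])
print(label_cardinality(Y), label_density(Y))
```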
(2)
Benchmark algorithm
To verify the validity of the AHCG algorithm, the EPS and ECC are combined into an HCG and named after their constituent methods, denoted AHCG1. Similarly, AHCG2 is composed of the EPS and EBR, and AHCG3 is composed of the EBR and ECC. This article sets up eleven baseline algorithms, as shown below.
  • EBR [9]: An ensemble version of the BR model: each instance of BR is randomly generated, regardless of the relationship between labels;
  • ECC [9]: An ensemble version of CCs, where the chain order of each CC is randomly generated; it takes into account global label dependencies;
  • EPS [10]: An improved ensemble version of LP that focuses on the most important label relationships by pruning label sets that occur infrequently;
  • GORT [40]: Algorithm using iSOUP regression tree;
  • EBRT [46]: A regression tree method for multi-label classification via multi-objective regression in a streaming setting;
  • EaBR, EaCC, and EaPS [40]: Use ADWIN as their concept drift detector;
  • MLSAMPkNN [20]: The algorithm uses a punitive kNN with self-adjusting memory;
  • AESAKNNS [21]: The algorithm uses ensemble kNN;
  • DHEML [43]: This algorithm is a heterogeneous ensemble algorithm and uses the Adaptive Selection Strategy.
(3)
Evaluation Metrics
In multi-label classification, single-label evaluation metrics are not directly applicable, and many metrics have been designed specifically for the multi-label setting. Accuracy, example-based F1, micro-averaged F1, and macro-averaged F1 are used for assessment in this paper. Table 4 explains the mathematical symbols used in the formulas.
Table 4. Mathematical symbols for evaluation metrics.
In Formulas (13)–(16), Acc(h) represents accuracy, and $F_\beta(h)$ represents the F value based on the data instances. β is a balance factor, usually set to 1. $P_i$ and $R_i$ are the precision and recall of the ith label.
$Acc(h) = \frac{1}{p} \sum_{i=1}^{p} \frac{|Y_i \cap h(x_i)|}{|Y_i \cup h(x_i)|} \quad (13)$
$F_\beta(h) = \frac{(1+\beta^2) \cdot Pre(h) \cdot Re(h)}{\beta^2 \cdot Pre(h) + Re(h)} \quad (14)$
$Micro\_F1 = \frac{2 \times Micro\_Precision \times Micro\_Recall}{Micro\_Precision + Micro\_Recall} \quad (15)$
$Macro\_F1 = \frac{1}{p} \sum_{i=1}^{p} \frac{2 \times P_i \times R_i}{P_i + R_i} \quad (16)$
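The following sketch computes these metrics from binary matrices of true labels Y and predictions H (our illustration; the handling of empty label sets and zero denominators is our assumption):

```python
import numpy as np

def example_accuracy(Y, H):
    """Formula (13): Jaccard-style accuracy averaged over instances."""
    inter = np.logical_and(Y, H).sum(axis=1)
    union = np.logical_or(Y, H).sum(axis=1)
    # If both Y_i and h(x_i) are empty, count the instance as fully correct.
    return float(np.mean(np.where(union > 0, inter / np.maximum(union, 1), 1.0)))

def micro_f1(Y, H):
    """Formula (15): F1 over counts pooled across all labels and instances."""
    tp = np.logical_and(Y, H).sum()
    precision = tp / max(H.sum(), 1)
    recall = tp / max(Y.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

def macro_f1(Y, H):
    """Formula (16): unweighted mean of the per-label F1 values."""
    tp = np.logical_and(Y, H).sum(axis=0)
    p_i = tp / np.maximum(H.sum(axis=0), 1)   # per-label precision P_i
    r_i = tp / np.maximum(Y.sum(axis=0), 1)   # per-label recall R_i
    return float(np.mean(2 * p_i * r_i / np.maximum(p_i + r_i, 1e-12)))
```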

4.2. Experimental Analysis

(1)
Comparative experiments of ensemble algorithms
The algorithms of AHCG1, AHCG2, and AHCG3 are compared with the EBR, ECC, EPS, and DHEML on 12 datasets. Detailed experimental results include the accuracy, example-based F1, micro-averaged F1, macro-averaged F1, and time efficiency of each algorithm. As shown in Table 5, the best results are in bold.
Table 5. The experimental results of the integrated algorithm accuracy, case-based F1, micro-averaged F1, and macro-averaged F1 are compared.
As can be seen from the table, the algorithms using the AHCG rank better on average than the EBR, ECC, and EPS classifiers on all four evaluation metrics. The AHCG3 algorithm received the best ranking in terms of accuracy, example-based F1, and macro-averaged F1, and second place in micro-averaged F1. The overall performances of the AHCG2 and AHCG3 algorithms are slightly better than that of DHEML, which supports the effectiveness of the proposed HCG. In particular, the accuracy of the AHCG algorithm is significantly higher than that of the homogeneous ensemble classifiers. For example, on the Medical dataset, the AHCG3 algorithm outperformed the EBR algorithm by 9.3% and the ECC algorithm by 9.7%. On the Ohsumed dataset, the AHCG3 algorithm was 8.4% higher than the EBR algorithm and 9.5% higher than the ECC algorithm.
In general, algorithms using the AHCG perform better than homogeneous ensemble classifier algorithms, because the ASS can select the group of classifiers that is likely to achieve better performance for testing. In some cases, however, the prediction result of the AHCG algorithm is not as good as that of a single ensemble. On Enron, for example, the EBR algorithm is more accurate than AHCG2 or AHCG3. This is because the first criterion in selecting the test method is whether the two groups of ensemble classifiers produce the same test results for all labels of an instance. For the AHCG3 method, if both groups of the HCG test an instance correctly, or both test it incorrectly, and the AMSE value of the ECC group is smaller than or equal to that of the EBR group, the ECC group is selected by default for testing; in such cases the default choice can reduce the quality of the results. The running times of AHCG2 and AHCG3 are less than that of the DHEML algorithm (as shown in Table 6; the best results are in bold).
Table 6. Experimental results for running times.
In terms of time efficiency, as the time complexity analysis of the AHCG algorithm shows, the algorithm's efficiency is mainly determined by the number of instances, the number of base classifiers, the number of labels, and the number of instances in the data block. On small datasets, the AHCG algorithm is faster than the homogeneous ensemble algorithms except for the EPS algorithm. The EPS algorithm saves time because it prunes infrequent label sets to focus on the most important label relationships. On the Medical dataset, the AHCG3 algorithm saves 289,288 ms over the EBR algorithm and 295,062 ms over the ECC algorithm. However, on large datasets with many instances, the AHCG is less time efficient and takes more time than the algorithms that use only homogeneous classifiers.
(2)
Comparison with the classification of algorithms dealing with concept drift
The algorithms of AHCG1, AHCG2, and AHCG3 are compared with GORT, EBRT, EaBR, EaCC, EaPS, MLSAMPkNN, AESAKNNS, and DHEML on 12 datasets. Both the AHCG and the comparison algorithms are designed to deal with concept drift. Detailed experimental results include the accuracy, example-based F1, micro-averaged F1, and macro-averaged F1 of each algorithm. As shown in Table 7, the best results are in bold.
Table 7. Experimental results of accuracy, example-based F1, micro-averaged F1, and macro-averaged F1.
The AHCG algorithm obtained better results than the other algorithms on the example-based F1, macro-averaged F1, and micro-averaged F1 metrics, ranking in the top three on average among all algorithms. Moreover, AHCG3 ranked first in example-based F1 and macro-averaged F1, third in micro-averaged F1, and second in accuracy, while AHCG2 ranked first in micro-averaged F1.
Compared with the EaBR, EaCC, and EaPS algorithms, which use a window mechanism, the AHCG algorithms achieve better experimental results. The accuracy of the comparison algorithm EaCC on the Slashdot, Reuters, and Ohsumed datasets is relatively poor, and the accuracies of AHCG1 and AHCG3 are better than that of EaCC on all three. For example, on the Slashdot dataset, the accuracy of AHCG1 is 5.5% higher than that of EaCC, and the accuracy of AHCG3 is 5.8% higher. On the Reuters dataset, the accuracy of AHCG1 is 7.4% higher than that of EaCC, and the accuracy of AHCG3 is 13.2% higher. On the Ohsumed dataset, the accuracy of AHCG1 is up to 20% higher than that of EaCC, and the accuracy of AHCG3 is up to 27.1% higher. As can be seen from Table 7, the accuracy, example-based F1, micro-averaged F1, and macro-averaged F1 results of the AHCG algorithm are all higher than those of the DHEML algorithm. The experimental results show that the AHCG algorithm is superior to this heterogeneous ensemble method in dealing with concept drift problems.
(3)
Friedman statistical analysis
To detect statistical significance between algorithms, Friedman statistical analysis is adopted in the result analysis [46]. This section studies the significance of the differences between the AHCG algorithms and the comparison algorithms involved. The results of this test can be seen in a critical distance diagram, where the algorithms are sorted by average rank and algorithms within the critical distance of each other are connected by a line. In this experiment, the better models appear on the left side of the diagram. The critical distance is calculated with Formula (17).
$CD = q_{\alpha} \sqrt{\frac{k(k+1)}{6N}} \quad (17)$
where the significance level α = 0.05, k represents the number of algorithms, and N represents the number of datasets; in this experiment, k = 14 and N = 12, giving CD = 5.26. Figure 5 shows the CD plots for accuracy, example-based F1, micro-averaged F1, and macro-averaged F1, where the average rank of each algorithm is marked along the axis.
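For reference, Formula (17) can be evaluated directly (our sketch); the value of $q_\alpha$ comes from the studentized range table, and a value of roughly 3.08 reproduces the CD of 5.26 reported above:

```python
import math

def critical_distance(q_alpha, k, N):
    """Formula (17): critical distance for the Friedman post hoc comparison."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))

# k = 14 algorithms, N = 12 datasets; q_alpha ~ 3.08 yields CD ~ 5.26.
print(round(critical_distance(3.08, 14, 12), 2))
```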
Figure 5 shows that the AHCG variants AHCG1, AHCG2, and AHCG3 are superior to the other comparison algorithms in terms of accuracy, example-based F1, micro-averaged F1, and macro-averaged F1. Among them, AHCG1 ranks first in example-based F1 and macro-averaged F1, and AHCG2 ranks first in accuracy and micro-averaged F1. The superiority of the HCG algorithms shows that using different classifier groups can improve the heterogeneity among the classifiers and thus the classification performance.
Figure 5. Algorithm critical distance graph. (The blue line marks the critical distance; algorithms outside this region differ significantly from the control.)

5. Summary

To improve the diversity of ensemble classifiers, this paper proposes the AHCG algorithm, which uses the concept of an HCG. An Adaptive Selection Strategy is proposed to select the appropriate ensemble classifier for testing according to the AMSE. The least squares method is used to calculate the weights of the base classifiers in each group, which are then used to update and replace them. According to the experiments, the AHCG algorithm obtains better values for accuracy, example-based F1, micro-averaged F1, and macro-averaged F1 than the compared homogeneous ensemble classifier algorithms and ranks high overall. It is a general algorithmic structure that can be applied to most algorithms. The AHCG currently generates and updates the in-group classifiers serially, which limits its time efficiency on large datasets, although it can still classify large multi-label text data. In future work, our research group will focus on the time efficiency of the algorithm so that it can be applied to image classification. Provided that the evaluation metrics remain stable, the HCG can run the in-group classifier generation and update phases in parallel to improve time efficiency. Meanwhile, we will evaluate the method on further datasets and refine the dynamic update of base classifiers to reduce the impact of their number on the results.

Author Contributions

Conceptualization, M.H. and S.Y.; methodology, M.H.; software, H.W.; validation, M.H., S.Y. and H.W.; formal analysis, J.D.; investigation, J.D.; resources, H.W.; data curation, S.Y.; writing—original draft preparation, M.H.; writing—review and editing, S.Y.; visualization, M.H.; supervision, J.D.; project administration, H.W.; funding acquisition, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ningxia Natural Science Foundation Project (2022AAC03279), the National Nature Science Foundation of China (62062004), and the Graduate Innovation Project of North Minzu University (YCX24371).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep learning-based text classification: A comprehensive review. ACM Comput. Surv. (CSUR) 2021, 54, 1–40. [Google Scholar] [CrossRef]
  2. Liu, Q.; She, X.; Xia, Q. AI based diagnostics product design for osteosarcoma cells microscopy imaging of bone cancer patients using CA-MobileNet V3. J. Bone Oncol. 2024, 49, 100644. [Google Scholar] [CrossRef] [PubMed]
  3. Rana, P.; Meijering, E.; Sowmya, A.; Song, Y. Multi-Label Classification Based On Subcellular Region-Guided Feature Description for Protein Localisation. In Proceedings of the 18th International Symposium on Biomedical Imaging, Nice, France, 13–16 April 2021; pp. 1929–1933. [Google Scholar]
  4. Ding, Y.; Zhang, H.; Huang, W.; Zhou, X.; Shi, Z. Efficient Music Genre Recognition Using ECAS-CNN: A Novel Channel-Aware Neural Network Architecture. Sensors 2024, 24, 7021. [Google Scholar] [CrossRef] [PubMed]
  5. Xie, F.; Pan, X.; Yang, T.; Ernewein, B.; Li, M.; Robinson, D. A novel computer vision and point cloud-based approach for accurate structural analysis of a tall irregular timber structure. Structures 2024, 70, 107697. [Google Scholar] [CrossRef]
  6. Read, J.; Reutemann, P.; Pfahringer, B.; Holmes, G. Meka: A multi-label/multi-target extension to weka. J. Mach. Learn. Res. 2016, 17, 1–5. [Google Scholar]
  7. Osojnik, A.; Panov, P.; Džeroski, S. Multi-label classification via multi-target regression on data streams. Mach. Learn. 2017, 106, 745–770. [Google Scholar] [CrossRef]
  8. Duan, J.; Gu, Y.; Yu, H.; Yang, X.; Gao, S. ECC++: An algorithm family based on ensemble of classifier chains for classifying imbalanced multi-label data. Expert Syst. Appl. 2024, 236, 121366. [Google Scholar] [CrossRef]
  9. Mauri, L.; Damiani, E. Hardening behavioral classifiers against polymorphic malware: An ensemble approach based on minority report. Inf. Sci. 2025, 689, 121499. [Google Scholar] [CrossRef]
  10. Ganaie, M.; Hu, M.; Malik, A.; Tanveer, M.; Suganthan, P. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 2022, 115, 105151. [Google Scholar] [CrossRef]
  11. Alzubi, O.A.; Alzubi, J.A.; Alweshah, M.; Qiqieh, I.; Al-Shami, S.; Ramachandran, M. An optimal pruning algorithm of classifier ensembles: Dynamic programming approach. Neural Comput. Appl. 2020, 32, 16091–16107. [Google Scholar] [CrossRef]
  12. Museba, T.; Nelwamondo, F.; Ouahada, K. An Adaptive Heterogeneous Online Learning Ensemble Classifier for Nonstationary Environments. Comput. Intell. Neurosci. 2021, 2021, 6669706. [Google Scholar]
  13. Hg, Z.; Altnay, H. Imbalance Learning Using Heterogeneous Ensembles. Expert Syst. Appl. 2019, 142, 113005. [Google Scholar]
  14. Read, J.; Martino, L.; Olmos, P.M.; Luengo, D. Scalable multi-output label prediction: From classifier chains to classifier trellises. Pattern Recognit. 2015, 48, 2096–2109. [Google Scholar] [CrossRef]
  15. Wang, R.; Kwong, S.; Wang, X.; Jia, Y. Active k-labelsets ensemble for multi-label classification. Pattern Recognit. 2021, 109, 107583. [Google Scholar] [CrossRef]
  16. Zhang, J.; Bian, Z.; Wang, S. Style linear k-nearest neighbor classification method. Appl. Soft Comput. 2024, 150, 111011. [Google Scholar] [CrossRef]
  17. Xiao, N.; Dai, S. A network big data classification method based on decision tree algorithm. Int. J. Reason.-Based Intell. Syst. 2024, 16, 66–73. [Google Scholar] [CrossRef]
  18. Zhang, Z.; Wang, Z.; Liu, H.; Sun, Y. Ensemble Multi-label Classification Algorithm Based on Tree-Bayesian Network. Comput. Sci. 2018, 45, 195–201. [Google Scholar]
  19. Roy, A.; Chakraborty, S. Support vector machine in structural reliability analysis: A review. Reliab. Eng. Syst. Saf. 2023, 233, 109126. [Google Scholar] [CrossRef]
  20. Kavitha, P.M.; Muruganantham, B. Mal_CNN: An Enhancement for Malicious Image Classification Based on Neural Network. Cybern. Syst. 2024, 55, 739–752. [Google Scholar] [CrossRef]
  21. Roseberry, M.; Krawczyk, B.; Cano, A. Multi-label punitive kNN with self-adjusting memory for drifting data streams. ACM Trans. Knowl. Discov. Data 2019, 13, 1–31. [Google Scholar] [CrossRef]
  22. Alberghini, G.; Junior, S.B.; Cano, A. Adaptive ensemble of self-adjusting nearest neighbor subspaces for multi-label drifting data streams. Neurocomputing 2022, 481, 228–248. [Google Scholar] [CrossRef]
  23. Rastin, N.; Jahromi, M.Z.; Taheri, M. Feature weighting to tackle label dependencies in multi-label stacking nearest neighbor. Appl. Intell. 2021, 51, 5200–5218. [Google Scholar] [CrossRef]
  24. Liu, C.; Cao, L. A coupled k-nearest neighbor algorithm for multi-label classification. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Ho Chi Minh City, Vietnam, 19–22 May 2015; pp. 176–187. [Google Scholar]
  25. Luo, F.; Guo, W.; Yu, Y.; Chen, G. A multi-label classification algorithm based on kernel extreme learning machine. Neurocomputing 2017, 260, 313–320. [Google Scholar] [CrossRef]
  26. Rezaei, M.; Eftekhari, M.; Movahed, F.S. ML-CK-ELM: An efficient multi-layer extreme learning machine using combined kernels for multi-label classification. Sci. Iran. 2020, 27, 3005–3018. [Google Scholar] [CrossRef]
  27. Bezembinder, E.M.; Wismans LJ, J.; Berkum EC, V. Constructing multi-labelled decision trees for junction design using the predicted probabilities. In Proceedings of the 20th IEEE International Conference on Intelligent Transportation Systems, Yokohama, Japan, 16–19 October 2017; pp. 1–7. [Google Scholar]
  28. Majzoubi, M.; Choromanska, A. Ldsm: Logarithm-depth streaming multi-label decision trees. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, Online, 26–28 August 2020; pp. 4247–4257. [Google Scholar]
  29. Moral-García, S.; Mantas, C.J.; Castellano, J.G.; Abellán, J. Ensemble of classifier chains and credal C4.5 for solving multi-label classification. Prog. Artif. Intell. 2019, 8, 195–213. [Google Scholar] [CrossRef]
  30. Lotf, H.; Ramdani, M. Multi-Label Classification: A Novel approach using decision trees for learning Label-relations and preventing cyclical dependencies: Relations Recognition and Removing Cycles (3RC). In Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications, Rabat, Morocco, 23–24 September 2020; pp. 1–6. [Google Scholar]
  31. Nan, G.; Li, Q.; Dou, R.; Liu, J. Local positive and negative correlation-based k -labelsets for multi-label classification. Neurocomputing 2018, 318, 90–101. [Google Scholar] [CrossRef]
  32. Moyano, J.M.; Ventura, S. Auto-adaptive grammar-guided genetic programming algorithm to build ensembles of multi-label classifiers. Inf. Fusion 2022, 78, 1–19. [Google Scholar] [CrossRef]
  33. Mahdavi-Shahri, A.; Houshmand, M.; Yaghoobi, M.; Jalali, M. Applying an ensemble learning method for improving multi-label classification performance. In Proceedings of the 2nd International Conference of Signal Processing and Intelligent Systems, Tehran, Iran, 14–15 December 2016; pp. 1–6. [Google Scholar]
  34. Moyano, J.M.; Gibaja, E.L.; Cios, K.J.; Ventura, S. Generating ensembles of multi-label classifiers using cooperative coevolutionary algorithms. In Proceedings of the 24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 29 August–8 September 2020; pp. 1379–1386. [Google Scholar]
  35. Zhang, L.; Shah, S.K.; Kakadiaris, I.A. Hierarchical multi-label classification using fully associative ensemble learning. Pattern Recognit. 2017, 70, 89–103. [Google Scholar] [CrossRef]
  36. Wei, X.; Yu, Z.; Zhang, C.; Hu, Q. Ensemble of label specific features for multi-label classification. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, San Diego, CA, USA, 23–27 July 2018; pp. 1–6. [Google Scholar]
  37. Cheng, K.; Gao, S.; Dong, W.; Yang, X.; Wang, Q.; Yu, H. Boosting label weighted extreme learning machine for classifying multi-label imbalanced data. Neurocomputing 2020, 403, 360–370. [Google Scholar] [CrossRef]
  38. Li, K.; Kong, X.; Lu, Z.; Wenyin, L.; Yin, J. Boosting weighted ELM for imbalanced learning. Neurocomputing 2014, 128, 15–21. [Google Scholar] [CrossRef]
  39. Büyükçakir, A.; Bonab, H.; Can, F. A novel online stacked ensemble for multi-label stream classification. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 1063–1072. [Google Scholar]
  40. Li, D.; Ji, L.; Liu, S. Research on sentiment classification method based on ensemble learning of heterogeneous classifiers. Eng. J. Wuhan Univ. 2021, 54, 975–982. [Google Scholar]
  41. Wu, D.; Han, B. Network Intrusion Detection Method Based on Optimization Heterogeneous Ensemble Learning of Glowworms. Fire Control Command Control 2021, 46, 26–31. [Google Scholar]
  42. Ding, J.; Wu, H.; Han, M. Multi-label data stream classification algorithm based on dynamic heterogeneous ensemble. Comput. Eng. Des. 2023, 44, 3031–3038. [Google Scholar]
  43. Singh, I.P.; Ghorbel, E.; Oyedotun, O.; Aouada, D. Multi-label image classification using adaptive graph convolutional networks: From a single domain to multiple domains. Comput. Vis. Image Underst. 2024, 247, 104062. [Google Scholar] [CrossRef]
  44. Brzezinski, D.; Stefanowski, J. Reacting to different types of concept drift: The accuracy updated ensemble algorithm. IEEE Trans. Neural Netw. Learn. Syst. 2013, 25, 81–94. [Google Scholar] [CrossRef] [PubMed]
  45. Bifet, A.; Holmes, G.; Pfahringer, B.; Read, J.; Kranen, P.; Kremer, H.; Jansen, T. MOA: A realtime analytics open source framework. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2011, Athens, Greece, 5–9 September 2011; Proceedings, Part III 22; Springer: Berlin/Heidelberg, Germany, 2011; pp. 617–620. [Google Scholar]
  46. Liu, J.; Xu, Y. T-Friedman test: A new statistical test for multiple comparison with an adjustable conservativeness measure. Int. J. Comput. Intell. Syst. 2022, 15, 29. [Google Scholar] [CrossRef]
