Article

Func-Bagging: An Ensemble Learning Strategy for Improving the Performance of Heterogeneous Anomaly Detection Models

by Ruinan Qiu 1, Yongfeng Yin 1,*, Qingran Su 2 and Tianyi Guan 1
1 School of Software, Beihang University, Haidian, Beijing 100191, China
2 School of Computer Science and Engineering, Beihang University, Haidian, Beijing 100191, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(2), 905; https://doi.org/10.3390/app15020905
Submission received: 15 December 2024 / Revised: 14 January 2025 / Accepted: 16 January 2025 / Published: 17 January 2025

Featured Application

This work presents an adaptive weight distribution strategy for bagging-based heterogeneous ensemble learning, with a particular focus on improving classification performance in imbalanced datasets. The proposed method can be applied to tasks such as anomaly detection, where class imbalance is common, and it offers a robust solution for enhancing model accuracy and stability in real-world applications, particularly in domains like fall detection, fault diagnosis, and action identification.

Abstract

In the field of ensemble learning, bagging and stacking are two widely used ensemble strategies. Bagging enhances model robustness through repeated sampling and weighted averaging of homogeneous classifiers, while stacking improves classification performance by integrating multiple models through meta-learning, exploiting the diversity of heterogeneous classifiers. However, the fixed weight distribution strategy in traditional bagging is often inadequate for complex or imbalanced datasets. This paper combines the heterogeneous classifier integration of stacking with the weighted averaging strategy of bagging, proposing a new adaptive weight distribution approach that improves bagging's performance in heterogeneous ensemble settings. Specifically, we propose three weight generation functions with "high at both ends, low in the middle" curve shapes and demonstrate the superiority of this strategy over fixed-weight methods on two datasets. Additionally, we design a specialized neural network which, once adequately trained, validates the rationality of the proposed adaptive weight distribution strategy and further improves the model's robustness. These methods are collectively called func-bagging. Experimental results show that func-bagging achieves an average improvement of 1.810% in peak performance over the base classifiers and outperforms stacking and bagging, while offering better dataset adaptability and interpretability. Func-bagging is therefore particularly effective in scenarios with class imbalance and is applicable to classification tasks with imbalanced classes, such as anomaly detection.

1. Introduction

1.1. Background

A classifier is a system that takes instances from a dataset and assigns a category or label to each instance [1]. To accomplish this task, a classifier needs to possess certain knowledge. Classifiers can be created through various learning methods, such as deduction, analogy, or memorization, but the most common approach is to learn knowledge from a set of pre-classified instances, a method known as supervised learning. Much of machine learning research focuses on developing automated approaches for classification tasks. Despite the proposal of various models, such as artificial neural networks [2], decision trees [3], inductive logic programming [4], and Bayesian learning algorithms [5], building a perfect classifier for any specific task remains a challenging endeavor. Furthermore, no single method can be considered superior across all datasets. As a result, combining different classification models has become a viable choice for achieving higher accuracy, known as ensemble learning. The core idea of ensemble learning is to create a set of classifiers and combine their outputs such that the ensemble’s performance surpasses that of any individual classifier [6]. To achieve this, it is necessary to ensure that (1) each classifier is both accurate and diverse, and (2) the combination of outputs amplifies correct decisions and mitigates erroneous ones. Research in ensemble learning often focuses on generating multiple classifiers using a single learning algorithm and combining their outputs using mathematical functions. Stacking, on the other hand, generates classifier members through various learning algorithms and subsequently uses another algorithm to learn how to combine their outputs [7].
Currently, commonly used ensemble methods include boosting, bagging, and stacking, each with its unique strategies and applicable scenarios. Boosting is an iterative ensemble method aimed at reducing the bias between the base classifier’s output and the true labels, gradually bringing the output closer to the true labels. Typical boosting methods, such as AdaBoost and Gradient Boosting, have demonstrated excellent performance in various classification and regression tasks. Bagging is an ensemble method that generates homogeneous classifiers through repeated sampling and parallel training, reducing the model’s variance and enhancing robustness. A representative algorithm of bagging is Random Forest, which trains multiple decision trees on different subsets and makes predictions through averaging or voting, effectively reducing overfitting risks. Stacking, another ensemble method, focuses on improving performance through the combination of heterogeneous classifiers. It generates various base classifiers in the first layer and uses a meta-learner in the second layer to learn how to combine their outputs for better prediction accuracy [8].
This paper focuses on bagging and stacking. While both bagging and stacking have important applications in ensemble learning, they each have notable limitations. In bagging, the traditional fixed-weight combination strategy often fails to achieve satisfactory results when dealing with class imbalance problems. Although stacking can integrate heterogeneous classifiers, it lacks a directly effective and interpretable weighting strategy for combining their outputs. To address these limitations, this paper proposes a novel approach that combines the heterogeneous classifier integration idea of stacking with the weighted averaging strategy of bagging, using an adaptive weight distribution strategy to improve the performance of bagging. We design three weight generation functions with “high at both ends, low in the middle” curve shapes and experimentally demonstrate the advantages of this strategy in heterogeneous classifier integration. Additionally, we construct a specialized neural network and loss function to further validate the rationale behind the weight distribution strategy, achieving excellent performance on imbalanced datasets. The innovation of this paper lies in the introduction of an adaptive weighting strategy for heterogeneous classifier integration, overcoming the limitations of traditional bagging and stacking, and showing significant advantages, particularly in class-imbalanced tasks.

1.2. Related Work

Junlang Wang et al. proposed a method for multi-component fault diagnosis in hydraulic systems. The method first uses Pearson correlation coefficients and Neighborhood Component Analysis (NCA) for data channel selection and feature dimensionality reduction, to reduce data redundancy and improve computational efficiency. Two different types of Deep Neural Networks (DNNs) are then constructed as base learners: Stacked Sparse Autoencoder and Deep Hierarchical Extreme Learning Machine (D-ELM). A bagging voting ensemble strategy is used to combine these DNN base learners to enhance the robustness and accuracy of the diagnostic system. This paper utilizes the original bagging voting ensemble strategy, where multiple learners vote for a sample, and the class with the most votes is assigned as the predicted class for the sample [9].
Yufei Xia et al. introduced a trainable combiner to optimize the ensemble results, training an XGBoost model as a meta-classifier to learn how to generate the final prediction based on the outputs of base learners. However, this meta-classifier is prediction-oriented and lacks interpretability in terms of the weight distribution of each base learner [10].
V. Sobanadevi and G. Ravi proposed a heterogeneous ensemble-based model (HBSE) for credit card fraud detection, training a logistic regression model as a Level-1 meta-learner, using the outputs of Level-0 base learners as input, and producing the final prediction. Similar to the previous paper, this approach also lacks interpretability in the weight distribution of each base learner [11].
M. Paz Sesmero et al. provided a detailed discussion of various variants of the stacking method, which was first proposed by Wolpert in 1992 [7]. For instance, Skalak introduced the use of instance-based learning classifiers as Level-0 classifiers (base learners). In this variant, base learners are no longer traditional machine learning models, but instead classify by storing a few prototypes for each class [12]. Fan et al. proposed a method to evaluate the overall accuracy of a stacking ensemble model using conflict-based accuracy estimates, employing two tree-based classifiers and one rule-based classifier as base learners. For the Level-1 meta-learner, they used a non-pruned decision tree instead of a traditional decision tree [13]. Merz proposed another variant of stacking, called Stacking with Correspondence Analysis and Nearest Neighbor (SCANN). In this approach, correspondence analysis is used to detect correlations between base learners, effectively removing redundant information. After removing the dependency between base learners, a nearest neighbor method is used as the meta-learner in the new feature space. SCANN improves model diversity and handles heterogeneous base learners effectively [14]. Ting and Witten proposed several important innovations, including using class probabilities rather than single class predictions as the output of Level-0 classifiers and using Multi-response Linear Regression (MLR) as the Level-1 classifier. While class probabilities were used as inputs to Level-1, the output was still the final prediction, which remained result-oriented and lacked interpretability in the weight distribution of base learners [15].
Hossein Ghaderi Zefrehi et al. addressed the class imbalance problem in classification tasks, proposing a solution using heterogeneous ensembles. In binary classification, class imbalance typically refers to situations where one class has far more samples than the other, leading to poor performance on the minority class. The paper mentions that ensemble classifiers can alleviate this issue by using different sampling methods, especially by applying different balanced datasets to each member of the ensemble, with the datasets generated via random undersampling or oversampling [16].
Qiang Li Zhao et al. explored classifier ensemble methods in incremental learning, proposing a new bagging-based incremental learning method called Bagging++. In this method, incremental base learners are trained on new data and integrated into the original ensemble model after training. This paper also uses the simple voting ensemble strategy [17].
Kuo-Wei Hsu et al. theoretically demonstrated that the greater the divergence (heterogeneity) between base classifier algorithms in an ensemble, the stronger the resulting model and the more robust its performance. They designed a bagging framework based on heterogeneous base classifiers, but the final fusion strategy for combining the results of base classifiers was not explored, and the original bagging voting strategy was still used [18].
Nguyen et al. proposed an ensemble selection method based on classifier prediction confidence, which takes into account both the classifier’s prediction confidence on test samples and its overall reliability. In this method, a classifier’s prediction result is selected only if its confidence exceeds its reliability threshold. By optimizing the empirical 0-1 loss on the training set, this method effectively combines the characteristics of static and dynamic ensemble selection, and experimental results on 62 datasets show that this method outperforms various traditional ensemble strategies [19].
The key influences of related work above are shown in Table 1. In summary, existing research based on stacking generally produces final predictions in the meta-classifier of the last layer, with a strong focus on result-oriented output and lacking interpretability in the weight distribution of individual base classifiers. In contrast, bagging-based research mainly focuses on improving the performance and robustness of the model by introducing heterogeneous base classifiers, but still retains the original bagging method’s strategy for combining base learner results, which typically involves voting ensemble or fixed-weight averaging. Some ensemble methods based on classifier prediction confidence attempt to find suitable thresholds to determine which base learners can participate in the final decision, but ultimately still use simple voting ensemble or fixed-weight averaging strategies.
Based on the related work above, the contributions of this paper are as follows:
  • For the final result fusion strategy of bagging, we propose an intuitive anomaly detection weight allocation strategy: when merging results, if the predicted score (predict probability, also known as prediction confidence) given by a classifier is closer to 0 or 1, it indicates that the classifier has higher confidence in its judgment, and therefore should be assigned higher weight.
  • Based on the above weight allocation strategy, three weight generation functions with a “high at both ends, low in the middle” curve are designed, and better results than fixed weight allocation strategies are obtained on two datasets.
  • To explore the relationship between the outputs of base learners and weight allocation, a special neural network and loss function are designed. The network’s input is the output of the base learners, and the output is the weight to be allocated. After sufficient training, the function curve drawn by this network is consistent with the proposed weight allocation strategy, proving the reasonableness of the strategy. Since this neural network is trained in a data-driven manner, it generalizes better compared to the weight generation functions with fixed expressions mentioned earlier.

2. Materials and Methods

2.1. Problem Definition

In supervised binary classification tasks, the dataset typically consists of a collection of labeled data pairs, denoted as $D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)\}$, where $n$ is the number of samples, $X_i = (x_1, x_2, \ldots, x_m)$ represents the feature vector of the $i$-th sample with $m$ features, and $y_i \in \{0, 1\}$ indicates the label of the current sample.
In ensemble learning, the final result of the ensemble model is usually obtained by combining the outputs of multiple base learners. Let $H = \{h_1, h_2, \ldots, h_J\}$ denote the set of well-trained base learners, where $J$ is the number of base learners. Each base learner can be considered as a function, expressed as $h_j(X) = (\hat{y}_j, \mathrm{prob}_j)$.
In the result fusion of bagging ensemble methods, two common strategies are voting and averaging. The voting method is expressed as

$$H(X) = \arg\max_{c \in C} \sum_{j=1}^{J} w_j \cdot I(h_j(x) = c) \tag{1}$$

where $w_j$ represents the weight of the $j$-th learner, and $I(\cdot)$ is the indicator function, which takes the value 1 when $h_j(x) = c$ and 0 otherwise.

The weighted averaging method is expressed as

$$H(X) = \sum_{j=1}^{J} w_j \cdot h_j(x) \tag{2}$$

where $w_j$ is the weight of the $j$-th learner and $w_j \in [0, 1]$. This paper mainly investigates the relationship between $h_j(X)$ and $w_j$ in the weighted averaging method, which can be regarded as a function of the form

$$F(h_j(X)) = w_j \tag{3}$$

Thus, we can rewrite Equation (2) as

$$H(X) = \sum_{j=1}^{J} F(h_j(X)) \cdot h_j(x) \tag{4}$$

By extending the range of $F(x)$ from $[0, 1]$ to the entire real line $\mathbb{R}$, we rewrite Equation (4) as

$$H(X) = \frac{\sum_{j=1}^{J} F(h_j(X)) \cdot h_j(x)}{\sum_{j=1}^{J} F(h_j(X))} \tag{5}$$
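To make the fusion in Equation (5) concrete, the following is a minimal Python sketch, assuming each base learner outputs a predicted probability of the positive class in [0, 1]; the function and variable names are illustrative, not from the original implementation.

```python
import numpy as np

def func_bagging_fuse(scores, weight_fn):
    """Fuse base learner scores via Equation (5).

    scores    -- array of shape (J, n): predicted probabilities of the
                 positive class from J base learners on n samples.
    weight_fn -- the weight generation function F, applied elementwise.
    """
    scores = np.asarray(scores, dtype=float)
    weights = weight_fn(scores)                 # w_j = F(h_j(X))
    numerator = (weights * scores).sum(axis=0)  # sum_j F(h_j(X)) * h_j(x)
    return numerator / weights.sum(axis=0)      # normalize by sum_j F(h_j(X))
```

With a constant `weight_fn` this reduces to a plain average; Section 2.2 supplies the concrete choices of weight generation function.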
Nguyen et al. point out that the higher a classifier's prediction confidence, the higher its weight in the ensemble should be [19,20,21]. Specifically, in anomaly detection or binary classification tasks, when a classifier's predicted score for a certain class (often the anomaly class) is closer to 0 or 1, the classifier has higher confidence in its decision and should therefore receive a higher weight. Visually, the graph of $F(x)$ should exhibit a "high at both ends, low in the middle" pattern, with $x = 0.5$ as the axis of symmetry. This type of function is defined mathematically as follows:
  • If the range of $f(x)$ is $[0, +\infty)$, then
    $$\lim_{x \to 0^+} f(x) = \lim_{x \to 1^-} f(x) = +\infty.$$
    There exists a $c \in (0, 1)$ at which the function attains its minimum value, i.e.,
    $$f(c) = \min_{x \in (0, 1)} f(x).$$
    The function is high at both ends and low in the middle.
  • If the range of $f(x)$ is $(-\infty, 0]$, then
    $$\lim_{x \to 0^+} f(x) = \lim_{x \to 1^-} f(x) = -\infty.$$
    There exists a $c \in (0, 1)$ at which the function attains its maximum value, i.e.,
    $$f(c) = \max_{x \in (0, 1)} f(x).$$
    The function is low at both ends and high in the middle.

2.2. Three “High at Both Ends, Low in the Middle” Functions

Based on the problem definition and analysis, three functions that satisfy the “high at both ends, low in the middle” property and are symmetric around x = 0.5 are proposed. The first function is the tangent function, transformed by shifting and taking the absolute value (referred to as the tan function hereafter). Its functional expression is
$$\left| \tan\left( (x - 0.5)\pi \right) \right| \tag{6}$$
The graph of this function is shown in Figure 1.
The second function is the secant function, shifted (referred to as the sec function hereafter). Its functional expression is
$$\sec\left( (x - 0.5)\pi \right) \tag{7}$$
The graph of this function is shown in Figure 2.
The third function is a shifted rational function (referred to as the fractional function hereafter). Its functional expression is
$$\frac{1}{x(1 - x)} \tag{8}$$
The graph of this function is shown in Figure 3.
As shown, all three functions assign higher weights to base learners whose predicted scores are closer to 0 or 1, while those with predicted scores closer to 0.5 receive smaller weights. Compared to the other two functions, the tan function assigns a weight of 0 at $x = 0.5$, which carries a risk of a zero denominator in Equation (5). However, on the domain $x \in [0, 1]$ its derivative is relatively large everywhere, which differentiates the outputs of the base learners more sharply. The sec function and the fractional function, by contrast, have derivatives approaching 0 on the interval $x \in [0.2, 0.8]$, giving base learners whose predicted scores fall in this range small, nearly uniform weights. For $x \in [0, 0.2) \cup (0.8, 1]$, the derivatives grow without bound, giving base learners in these regions larger and more distinct weights.
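For reference, a direct NumPy transcription of the three weight generation functions in Equations (6)–(8) follows. In practice, predicted scores may need to be clipped away from 0 and 1 to avoid numerical overflow at the endpoints; that clipping is an implementation assumption, not specified in the text.

```python
import numpy as np

# Equation (6): |tan((x - 0.5) * pi)| -- weight 0 at x = 0.5, grows toward
# infinity as x approaches 0 or 1.
def tan_weight(x):
    return np.abs(np.tan((x - 0.5) * np.pi))

# Equation (7): sec((x - 0.5) * pi) -- weight 1 at x = 0.5, grows toward
# infinity as x approaches 0 or 1.
def sec_weight(x):
    return 1.0 / np.cos((x - 0.5) * np.pi)

# Equation (8): 1 / (x * (1 - x)) -- weight 4 at x = 0.5, grows toward
# infinity as x approaches 0 or 1.
def frac_weight(x):
    return 1.0 / (x * (1.0 - x))
```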
Each of the three functions has its advantages and disadvantages, but none of them are universally applicable to all datasets. Therefore, in the next section, a neural network-based weight generation function is proposed. By cleverly constructing the neural network and designing an appropriate loss function, this function can learn, in a data-driven manner, a weight generation function that is suitable for a specific dataset.

2.3. Neural Network-Based Weight Generation Function

A neural network-based weight generation function is designed, and its overall structure is shown in Figure 4. The outputs of classifiers 1 to $J$ do not interact directly and remain independent until the calculation of the loss function, which is indicated by the dashed lines. The feed-forward neural network (FFN) in the middle receives a scalar as input and outputs a scalar, which represents the weight assigned to the corresponding classifier. The functional form of this part is given in Equation (3).
Then, multiple multiplicative skip connections [22] are used to multiply the output of each classifier by its assigned weight. The sum of these weighted outputs gives $H(X)$, as shown in Equation (5). Finally, the deviation between $H(x)$ and the true label $y$ is computed as the loss; here, the mean squared error (MSE) is used as the measure of deviation, and the loss function is given in Equation (9).
From Equation (5), we can see that the outputs of the classifiers interact within the loss function and, through the backpropagation algorithm, collectively influence the weights of the neurons in the FFN [23], as shown in Equation (10), where $\theta$ denotes a neuron weight. This neural network is trained in a data-driven manner and can flexibly adapt to different datasets and combinations of classifiers.
$$L = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - H(x_i) \right)^2 \tag{9}$$

$$\frac{\partial L}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial L}{\partial H(x_i)} \cdot \frac{\partial H(x_i)}{\partial F(h_j(X))} \cdot \frac{\partial F(h_j(X))}{\partial \theta} \tag{10}$$
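A minimal PyTorch sketch of this training scheme is given below. The FFN maps each classifier's scalar score to a scalar weight, the multiplicative skip connections with normalization implement Equation (5), and the loss is the MSE of Equation (9). The hidden size, activation, optimizer, and the random stand-in data are illustrative assumptions; the paper's exact layer configuration is the one depicted in Figure 5.

```python
import torch
import torch.nn as nn

class WeightFFN(nn.Module):
    """Scalar-in, scalar-out weight generation function F (Section 2.3)."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def ensemble_output(ffn, probs):
    """Equation (5). probs: tensor of shape (n, J) with base learner scores."""
    w = ffn(probs.unsqueeze(-1)).squeeze(-1)      # F(h_j(X)), shape (n, J)
    return (w * probs).sum(dim=1) / w.sum(dim=1)  # weighted sum, normalized

# Training sketch with random stand-ins for the J = 2 base learner outputs.
ffn = WeightFFN()
opt = torch.optim.Adam(ffn.parameters(), lr=1e-3)
probs = torch.rand(64, 2)                # h_j(X) for two classifiers
y = torch.randint(0, 2, (64,)).float()   # true labels
for _ in range(200):
    loss = nn.functional.mse_loss(ensemble_output(ffn, probs), y)  # Eq. (9)
    opt.zero_grad(); loss.backward(); opt.step()                   # Eq. (10)
```

Because the same FFN is applied to every classifier's score, the learned mapping from score to weight is shared, and the gradients from all classifiers jointly shape its curve.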

2.4. Datasets and Evaluation Metrics

Battery Charging Detection Data. This dataset comes from Baidu PaddlePaddle Learning Competition: Energy AI Challenge—Anomaly Detection Track (https://aistudio.baidu.com/datasetdetail/168245, (accessed on 22 July 2024)). It consists of processed data from full vehicle charging segments, aiming to analyze whether a vehicle is currently experiencing any faults based on charging data. The competition selected data that contain actual battery faults, but where the traditional alarm system failed to provide early warning. The data represent a mix of multiple vehicles of the same model. The charging segment data are relatively stable in new energy vehicle battery data, and the parameter changes in the battery follow a regular pattern. It is important to note that these data have been cleaned, and the raw values no longer reflect the characteristics of the battery itself, but the trend of the data changes still follows the battery’s regular patterns. The dataset is in .pkl file format, with each .pkl file containing a tuple (data, metadata). Each .pkl file corresponds to one battery and contains 28,390 samples. The shape of each data point is (256, 8), with each column representing different features as shown in Table 2. The metadata contains label and mileage information. The label “00” indicates normal, and “10” indicates abnormal. The mileage indicates the total distance for which the battery has been used. The proportion of abnormal samples is approximately 17%.
Fall Detection Data. This dataset is a de-identified reconstruction of the dataset used in the paper by Kaluza et al. [24], and is currently hosted on Kaggle (https://www.kaggle.com/datasets/jorekai/anomaly-detection-falling-people-events, (accessed on 4 November 2024)). The dataset consists of multiple CSV files, with each file recording the X, Y, Z position data of four sensors worn by different experimental subjects, along with time-point level anomaly labels, as shown in Table 3. Note that the dataset deliberately omits a timestamp feature: timestamps often act as a dominant feature in classification tasks, which can skew learning and limit the potential for broad generalization. The training set contains approximately 134,229 samples, with 5% of them being anomalous. The test set contains a total of 30,030 samples.
Motion State Recognition Data. This dataset comes from the preliminary round of the Second National Embedded Software Development Competition hosted by VeriSilicon (https://bhpan.buaa.edu.cn/link/AA7C53FB7574F1434DB72FE002F6DDB366, accessed on 15 December 2024). The dataset consists of multiple txt files, with each file recording three-axis data (total of six axes) from the accelerometer and gyroscope of an intelligent wristband worn by different experimental subjects. The accelerometer and gyroscope data are sampled at a rate of 25 Hz. The accelerometer has a range of ± 8 G with a resolution of 4096/1 G (one gravity is represented by the value 4096). The gyroscope has a range of ± 2000 °/s with a resolution of 16.4/(1 °/s) (one degree per second is represented by the value 16.4). The dataset includes six types of motions: walking, jogging, sitting, waving, squats, and jumping jacks. For jumping jacks, due to variations in the physical abilities of the data collectors, some files contain rest phases between actions (performing action, then rest, then performing action), where the subject remains still during the rest phase. This experiment only uses data for jumping jacks and jogging, containing 74,314 samples, with a balanced class distribution.
In this paper, the Area Under the ROC Curve (AUC) is used as the evaluation metric. The Receiver Operating Characteristic (ROC) curve is a graphical representation of the classification model’s performance at different thresholds. It plots the relationship between False Positive Rate (FPR) and True Positive Rate (TPR). The confusion matrix for binary classification [25] is defined as follows:
| Predicted \ Actual | Positive (P) | Negative (N) |
|---|---|---|
| Positive (P) | TP (True Positive) | FP (False Positive) |
| Negative (N) | FN (False Negative) | TN (True Negative) |
Then, the True Positive Rate (TPR) and False Positive Rate (FPR) are calculated as

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
AUC is the area under the ROC curve, typically ranging between 0 and 1. The closer the AUC is to 1, the better the classification performance of the model. If the AUC is close to 0.5, it indicates the model’s performance is near random guessing, and if AUC is below 0.5, the model performs poorly. AUC is a robust performance metric for binary classification, especially in cases with imbalanced data, as it is unaffected by changes in the class distribution and provides a more stable performance evaluation standard. The AUC calculation method is as follows [26]:
$$\mathrm{AUC} = \frac{\sum I(P_{pos}, P_{neg})}{M \times N}, \qquad I(P_{pos}, P_{neg}) = \begin{cases} 1, & P_{pos} > P_{neg} \\ 0.5, & P_{pos} = P_{neg} \\ 0, & P_{pos} < P_{neg} \end{cases}$$

where $M$ and $N$ are the numbers of positive and negative samples, respectively, and the sum runs over all $M \times N$ positive-negative score pairs.
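This pairwise definition can be computed directly, as in the short NumPy sketch below; in practice, `sklearn.metrics.roc_auc_score` yields the same value more efficiently.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """AUC as the fraction of correctly ordered positive-negative pairs,
    counting ties as 0.5, per the formula above (M positives, N negatives)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]           # M positive-class scores
    neg = scores[labels == 0]           # N negative-class scores
    diff = pos[:, None] - neg[None, :]  # all M x N score pairs
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
```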

3. Results

3.1. Comparison of Fixed Expression Weight Generation Function and Fixed Weight Ensemble Results

The ensemble of (SVM + MLP), (KNN + MLP), and (RF + MLP) learners was applied to the battery charging detection data, fall detection data, and motion state recognition data, respectively, where SVM is the Support Vector Machine [27] model, MLP is the Multi-Layer Perceptron [28] model, KNN is the K-Nearest Neighbors [29] model, and RF is the Random Forest [30] model. The hyperparameters of each model are shown in Table 4. The Area Under the Curve (AUC) metrics were recorded for the single learner, three weight generation function ensembles, and fixed weight ensembles (the optimal weight distribution chosen from 1:9 to 9:1 for the fixed weight ensemble). The results are shown in Table 5.
As observed, on two imbalanced anomaly detection datasets, the ensemble of fixed expression weight generation functions significantly outperforms the single model results, as well as the fixed weight ensemble results. However, on the balanced dataset, although the fixed expression weight generation function ensemble performs better than the single model, it slightly lags behind the fixed weight ensemble. Therefore, the ensemble method based on fixed expression weight generation functions performs better for anomaly detection tasks with imbalanced classes. For balanced binary classification tasks, an adaptive learning method based on neural network-based weight generation functions should be used to optimize the ensemble.

3.2. Neural Network-Based Weight Generation Function

A simple feed-forward neural network (FFN) was constructed, with the structure shown in Figure 5, and fully trained following the scheme in Figure 4. The learned function curves on the three datasets are shown in Figure 6, Figure 7 and Figure 8. Notably, the "low at both ends, high in the middle" trend in Figure 8 arises because the learned function's range is $(-\infty, 0)$; factoring $-1$ out of both the numerator and denominator of Equation (5) flips the function graph, making it equivalent to the "high at both ends, low in the middle" trend. This experiment further supports the rationality of the ensemble strategy that assigns weights based on confidence levels.
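Concretely, multiplying the numerator and denominator of Equation (5) by $-1$ leaves the ensemble output unchanged:

$$H(X) = \frac{\sum_{j=1}^{J} F(h_j(X)) \cdot h_j(x)}{\sum_{j=1}^{J} F(h_j(X))} = \frac{\sum_{j=1}^{J} \bigl(-F(h_j(X))\bigr) \cdot h_j(x)}{\sum_{j=1}^{J} \bigl(-F(h_j(X))\bigr)}$$

so a negative-valued learned function and its positive mirror image $-F$ induce exactly the same ensemble.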
The ensemble results using the fully trained neural network-based weight generation function are shown in Table 6. It can be observed that the neural network-based weight generation function performs better when dealing with balanced datasets.
The comparative experiments on the three datasets are shown in Table 7, Table 8 and Table 9. As the results show, both the fixed function ensemble and the neural network function ensemble achieve larger improvements than the stacking, bagging, and AdaBoost methods. Func-bagging is therefore superior, offering greater flexibility, larger base learner performance gains, and better interpretability. However, it should be noted that func-bagging is a post-processing method and is more decoupled from the training pipeline than stacking, bagging, and AdaBoost: each base classifier must be trained and its output scores obtained separately before func-bagging can be performed. Therefore, if a unified training process is required, stacking and bagging are more appropriate.
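To illustrate this post-processing workflow, here is a hedged end-to-end sketch using scikit-learn. The model names follow Table 4, but the hyperparameters and the data variables `X_train`, `y_train`, `X_test`, `y_test` are placeholders, and `func_bagging_fuse` and `frac_weight` are the helper sketches from Sections 2.1 and 2.2.

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

# X_train, y_train, X_test, y_test: dataset splits prepared beforehand.

# Step 1: train each heterogeneous base learner independently.
svm = SVC(probability=True).fit(X_train, y_train)
mlp = MLPClassifier(max_iter=200).fit(X_train, y_train)

# Step 2: collect the positive-class predicted scores h_j(X).
scores = [m.predict_proba(X_test)[:, 1] for m in (svm, mlp)]

# Step 3: func-bagging runs only now, as a separate post-processing step.
fused = func_bagging_fuse(scores, frac_weight)
print("ensemble AUC:", roc_auc_score(y_test, fused))
```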

4. Conclusions

This paper proposes a confidence-based weight generation function ensemble strategy for heterogeneous classifiers, which includes fixed function ensembles and neural network-based function ensembles. In the fixed function ensemble, three types of “high at both ends, low in the middle” functions are designed, demonstrating superior performance in imbalanced anomaly detection tasks. In the neural network-based function ensemble, an ingeniously designed neural network structure and loss function are used to adaptively learn the relationship between weight distribution and classifier outputs. On one hand, the neural network function ensemble exhibits a trend consistent with the “high at both ends, low in the middle” fixed functions, verifying the rationality of the proposed strategy. On the other hand, the neural network function ensemble outperforms the fixed function ensemble in balanced binary classification tasks, showcasing the great potential of this method.
The proposed strategy is currently applicable only to binary classification problems. Future work will explore its extension to multi-class classification tasks. As shown in Figure 9, each block predicts one label, so there are N blocks for N labels in total, each fed by T base classifiers. The final multi-class prediction is defined as

$$\mathrm{pred\_label} = \arg\max_{i = 1, \ldots, N} \sum_{j=1}^{T} w_{ij} \cdot \mathrm{score}_{ij}$$
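A minimal sketch of this decision rule, assuming the per-label weights $w_{ij}$ and scores $\mathrm{score}_{ij}$ have already been produced by the N blocks:

```python
import numpy as np

def multiclass_predict(weights, scores):
    """weights, scores: arrays of shape (N, T) -- block i holds the T base
    classifiers' weights and scores for label i. Returns the argmax label."""
    weighted = np.asarray(weights) * np.asarray(scores)  # w_ij * score_ij
    return int(np.argmax(weighted.sum(axis=1)))          # argmax over labels
```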
Additionally, the neural network structure used in this paper is relatively simple and did not show particularly outstanding performance across the three datasets. Future research will focus on improving the network structure to achieve better ensemble results.

Author Contributions

Conceptualization, R.Q. and Y.Y.; methodology, R.Q. and Q.S.; validation, R.Q., Y.Y. and T.G.; investigation, R.Q.; resources, R.Q.; data curation, R.Q.; writing—original draft preparation, R.Q.; writing—review and editing, R.Q.; visualization, Y.Y.; supervision, Q.S.; project administration, Q.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China (2022YFB4501900).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors are grateful to Beihang University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Read, J.; Pfahringer, B.; Holmes, G.; Frank, E. Classifier chains: A review and perspectives. J. Artif. Intell. Res. 2021, 70, 683–718. [Google Scholar] [CrossRef]
  2. Abdolrasol, M.G.; Hussain, S.M.S.; Ustun, T.S.; Sarker, M.R.; Hannan, M.A.; Mohamed, R.; Ali, J.A.; Mekhilef, S.; Milad, A. Artificial neural networks based optimization techniques: A review. Electronics 2021, 10, 2689. [Google Scholar] [CrossRef]
  3. Costa, V.G.; Pedreira, C.E. Recent advances in decision trees: An updated survey. Artif. Intell. Rev. 2023, 56, 4765–4800. [Google Scholar] [CrossRef]
  4. Cropper, A.; Dumančić, S. Inductive logic programming at 30: A new introduction. J. Artif. Intell. Res. 2022, 74, 765–850. [Google Scholar] [CrossRef]
  5. Khan, M.E.; Rue, H. The Bayesian learning rule. arXiv 2021, arXiv:2107.04562. [Google Scholar]
  6. Mienye, I.D.; Sun, Y. A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access 2022, 10, 99129–99149. [Google Scholar] [CrossRef]
  7. Sesmero, M.P.; Ledezma, A.I.; Sanchis, A. Generating ensembles of heterogeneous classifiers using stacked generalization. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2015, 5, 21–34. [Google Scholar] [CrossRef]
  8. Odegua, R. An empirical study of ensemble techniques (bagging, boosting and stacking). In Proceedings of the Deep Learning Indaba, Sandton, South Africa, 25–30 August 2019. [Google Scholar]
  9. Wang, J.; Xu, H.; Liu, J.; Peng, X.; He, C. A bagging-strategy based heterogeneous ensemble deep neural networks approach for the multiple components fault diagnosis of hydraulic systems. Meas. Sci. Technol. 2023, 34, 065007. [Google Scholar] [CrossRef]
  10. Xia, Y.; Liu, C.; Da, B.; Xie, F. A novel heterogeneous ensemble credit scoring model based on stacking approach. Expert Syst. Appl. 2018, 93, 182–199. [Google Scholar] [CrossRef]
  11. Sobanadevi, V.; Ravi, G. Handling data imbalance using a heterogeneous bagging-based stacked ensemble (HBSE) for credit card fraud detection. In Intelligence in Big Data Technologies—Beyond the Hype: Proceedings of ICBDCC 2019; Springer: Singapore, 2021; pp. 517–525. [Google Scholar]
  12. Skalak, D.B. Prototype Selection for Composite Nearest Neighbor Classifiers; University of Massachusetts Amherst: Amherst, MA, USA, 1997. [Google Scholar]
  13. Fan, W.; Stolfo, S.; Chan, P. Using conflicts among multiple base classifiers to measure the performance of stacking. In Proceedings of the ICML-99 Workshop on Recent Advances in Meta-Learning and Future Work; Stefan Institute Publisher: Ljubljana, Slovenia, 1999; pp. 10–17. [Google Scholar]
  14. Merz, C.J. Using correspondence analysis to combine classifiers. Mach. Learn. 1999, 36, 33–58. [Google Scholar] [CrossRef]
  15. Ting, K.M.; Witten, I.H. Issues in stacked generalization. J. Artif. Intell. Res. 1999, 10, 271–289. [Google Scholar] [CrossRef]
  16. Zefrehi, H.G.; Altınçay, H. Imbalance learning using heterogeneous ensembles. Expert Syst. Appl. 2020, 142, 113005. [Google Scholar] [CrossRef]
  17. Zhao, Q.L.; Jiang, Y.H.; Xu, M. Incremental learning by heterogeneous bagging ensemble. In Advanced Data Mining and Applications, Proceedings of the 6th International Conference, ADMA 2010, Chongqing, China, 19–21 November 2010, Proceedings, Part II; Springer: Berlin/Heidelberg, Germany, 2010; pp. 1–12. [Google Scholar]
  18. Hsu, K.W.; Srivastava, J. Improving bagging performance through multi-algorithm ensembles. In New Frontiers in Applied Data Mining, Proceedings of the PAKDD 2011 International Workshops, Shenzhen, China, 24–27 May 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 471–482. [Google Scholar]
  19. Nguyen, T.T.; Luong, A.V.; Dang, M.T.; Liew, A.W.C.; McCall, J. Ensemble selection based on classifier prediction confidence. Pattern Recognit. 2020, 100, 107104. [Google Scholar] [CrossRef]
  20. Nguyen, T.T.; Nguyen, M.P.; Pham, X.C.; Liew, A.W.-C. Heterogeneous classifier ensemble with fuzzy rule-based meta learner. Inf. Sci. 2018, 422, 144–160. [Google Scholar] [CrossRef]
  21. Nguyen, T.T.; Nguyen, M.P.; Pham, X.C.; Liew, A.W.-C.; Pedrycz, W. Combining heterogeneous classifiers via granular prototypes. Appl. Soft Comput. 2018, 73, 795–815. [Google Scholar] [CrossRef]
  22. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 7132–7141. [Google Scholar]
  23. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  24. Kaluza, B.; Mirchevska, V.; Dovgan, E.; Luštrek, M.; Gams, M. An agent-based approach to care in independent living. In Ambient Intelligence, Proceedings of the First International Joint Conference, AmI 2010, Malaga, Spain, 10–12 November 2010. Proceedings 1; Springer: Berlin/Heidelberg, Germany, 2010; pp. 177–186. [Google Scholar]
  25. Heydarian, M.; Doyle, T.E.; Samavi, R. MLCM: Multi-label confusion matrix. IEEE Access 2022, 10, 19083–19095. [Google Scholar] [CrossRef]
  26. Narkhede, S. Understanding AUC-ROC Curve. Towards Data Sci. 2018, 26, 220–227. [Google Scholar]
  27. Yue, S.; Li, P.; Hao, P. SVM classification: Its contents and challenges. Appl.-Math.-J. Chin. Univ. 2003, 18, 332–342. [Google Scholar] [CrossRef]
  28. Singh, J.; Banerjee, R. A study on single and multi-layer perceptron neural network. In Proceedings of the 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 27–29 March 2019; IEEE: New York, NY, USA, 2019; pp. 35–40. [Google Scholar]
  29. Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. KNN model-based approach in classification. In On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, Proceedings of the OTM Confederated International Conferences, CoopIS, DOA, and ODBASE 2003 Catania, Sicily, Italy, 3–7 November 2003. Proceedings; Springer: Berlin/Heidelberg, Germany, 2003; pp. 986–996. [Google Scholar]
  30. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  31. Harris, J.K. Primer on binary logistic regression. Fam. Med. Community Health 2021, 9 (Suppl. S1), e001290. [Google Scholar] [CrossRef] [PubMed]
  32. Wang, W.; Sun, D. The improved AdaBoost algorithms for imbalanced data classification. Inf. Sci. 2021, 563, 358–374. [Google Scholar] [CrossRef]
  33. Reddy, E.M.K.; Gurrala, A.; Hasitha, V.B.; Kumar, K.V.R. Introduction to Naive Bayes and a review on its subtypes with applications. In Bayesian Reasoning and Gaussian Processes for Machine Learning Applications; Chapman and Hall/CRC: Boca Raton, FL, USA, 2022; pp. 1–14. [Google Scholar]
Figure 1. Graph of the function $|\tan((x - 0.5)\pi)|$.
Figure 2. Graph of the function $\sec((x - 0.5)\pi)$.
Figure 3. Graph of the function $\frac{1}{x(1 - x)}$.
Figure 4. Architecture of the Neural Network-Based Weight Generation Function.
Figure 5. Neural Network-Based Weight Generation Function Structure.
Figure 6. Neural Network-Based Weight Generation Function on Battery Data.
Figure 7. Neural Network-Based Weight Generation Function on Fall Data.
Figure 8. Neural Network-Based Weight Generation Function on Motion Data.
Figure 9. Architecture of the Multi-Class Problem.
Table 1. Related Work and Key Influences.

| Author(s) | Key Influence |
|---|---|
| Skalak (1997) [12] | Introduced instance-based learning classifiers as Level-0 base learners, using prototypes for classification instead of traditional models. |
| Merz (1999) [14] | Proposed Stacking with Correspondence Analysis and Nearest Neighbor (SCANN), using correspondence analysis to remove redundancy between base learners and a nearest neighbor method as the meta-learner, improving model diversity. |
| Fan et al. (1999) [13] | Evaluated stacking ensemble accuracy with conflict-based estimates, using tree-based and rule-based classifiers as base learners. |
| Qiang Li Zhao et al. (2010) [17] | Explored classifier ensemble methods in incremental learning, proposing Bagging++, which integrates new incremental base learners into the original ensemble model using a simple voting strategy. |
| Kuo-Wei Hsu et al. (2011) [18] | Demonstrated that higher divergence (heterogeneity) between base classifiers leads to stronger model performance, although they used the original bagging voting strategy without exploring a new fusion strategy. |
| M. Paz Sesmero et al. (2015) [7] | Enhanced understanding of stacking method variants and their applications. |
| Yufei Xia et al. (2018) [10] | Enhanced ensemble performance but lacked interpretability in base learner weight distribution. |
| Hossein Ghaderi Zefrehi et al. (2020) [16] | Addressed the class imbalance problem in binary classification tasks using heterogeneous ensembles and various sampling methods (undersampling and oversampling). |
| Nguyen et al. (2020) [19] | Proposed an ensemble selection method based on classifier prediction confidence and reliability, optimizing the empirical 0-1 loss to effectively combine static and dynamic ensemble selection, outperforming traditional strategies in experiments on 62 datasets. |
| V. Sobanadevi and G. Ravi (2021) [11] | Improved credit card fraud detection but lacked interpretability in base learner contributions. |
| Junlang Wang et al. (2023) [9] | Improved computational efficiency and diagnostic accuracy through feature selection, dimensionality reduction, and ensemble learning. |
Table 2. Battery Charging Detection Data Features and Their Meanings.

| Feature Name | Meaning |
|---|---|
| volt | Overall voltage |
| current | Overall current |
| soc | State of Charge |
| max_single_volt | Maximum cell voltage |
| min_single_volt | Minimum cell voltage |
| max_temp | Maximum temperature |
| min_temp | Minimum temperature |
| timestamp | Timestamp |

These features are used to monitor and detect battery performance during the charging process.
Table 3. Fall Detection Data Features and Their Meanings.

| Feature Name | Meaning |
|---|---|
| x | x-coordinate of the sensor |
| y | y-coordinate of the sensor |
| z | z-coordinate of the sensor |
| 010-000-024-033 | Sensor 1 data |
| 010-000-030-096 | Sensor 2 data |
| 020-000-032-221 | Sensor 3 data |
| 020-000-033-111 | Sensor 4 data |
| anomaly | Anomaly label (whether a fall occurred) |

The data include multi-axis sensor measurements to detect falls.
Table 4. Hyperparameters of Each Model.

| Hyperparameter | SVM | MLP | KNN | RF |
|---|---|---|---|---|
| C (penalty coefficient) | 0.88 | * | * | * |
| kernel | RBF | * | * | * |
| gamma | 1/n_features | * | * | * |
| random_state | 42 | 10 | * | 42 |
| solver | * | lbfgs | * | * |
| learning_rate | * | 0.00001 | * | * |
| hidden_layer_sizes | * | (5, 2) | * | * |
| max_iter | * | 200 | * | * |
| n_neighbors | * | * | 200 | * |
| n_estimators | * | * | * | 3 |
| n_jobs | * | * | * | −1 |

* The corresponding model does not have this parameter.
Table 5. Experimental Results on Three Datasets.

| Dataset | SVM | MLP | KNN | RF | Fixed Weight Ensemble | Tan Function Ensemble | Sec Function Ensemble | Fractional Function Ensemble |
|---|---|---|---|---|---|---|---|---|
| Battery Data | 0.9366 | 0.9313 | * | * | 0.9441 | 0.9556 | **0.9558** | 0.9557 |
| Fall Data | * | 0.8168 | 0.8170 | * | 0.8353 | 0.8329 | 0.8375 | **0.8377** |
| Motion Data | * | 0.9806 | * | 0.9779 | **0.9883** | 0.9806 | 0.9855 | 0.9879 |

* The corresponding dataset does not use this model. Bold: the largest value in the row.
Table 6. Experimental Results with Neural Network-Based Weight Generation Function.

| Dataset | SVM | MLP | KNN | RF | Fixed Weight Ensemble | Fixed Function Ensemble | Neural Network Function Ensemble |
|---|---|---|---|---|---|---|---|
| Battery Data | 0.9364 | 0.9367 | * | * | 0.9504 | **0.9515** | 0.9508 |
| Fall Data | * | 0.8203 | 0.8170 | * | 0.8375 | **0.8407** | 0.8354 |
| Motion Data | * | 0.9809 | * | 0.9787 | 0.98879 | 0.9884 | **0.98881** |

* The corresponding dataset does not use this model. Bold: the largest value in the row.
Table 7. Comparative Experiment on Battery Data.

| Model | AUC | Improvement over Base Classifier | Note |
|---|---|---|---|
| SVM | 0.9364 | * | Base classifier |
| MLP | 0.9367 | * | Base classifier |
| Fixed Weight Ensemble | 0.9504 | 1.479% | The weight ratio of the two base classifiers is 5:5 |
| Fixed Function Ensemble | **0.9515** | **1.596%** | Frac function |
| Neural Network Function Ensemble | 0.9508 | 1.522% | * |
| Stacking | 0.9497 | 1.404% | * |
| Bagging (SVM) | 0.9374 | 0.107% | Bagging does not support integrating two heterogeneous models |
| Bagging (MLP) | 0.9445 | 0.833% | * |
| Logistic [31] | 0.8497 | * | Base classifier |
| AdaBoost [32] (Logistic) | 0.8471 | −0.306% | AdaBoost does not support models without the sample_weight parameter |
| Gaussian Naive Bayes [33] | 0.8201 | * | Base classifier |
| AdaBoost (Gaussian Naive Bayes) | 0.7077 | −13.706% | * |

* The model itself is a base classifier, or there are no notes. Bold: the largest value in the column.
Table 8. Comparative Experiment on Fall Data.

| Model | AUC | Improvement over Base Classifier | Note |
|---|---|---|---|
| MLP | 0.8168 | * | One of the base classifiers |
| KNN | 0.8170 | * | One of the base classifiers |
| Fixed Weight Ensemble | 0.8375 | 2.522% | The weight ratio of the two base classifiers is 4:6 |
| Fixed Function Ensemble | **0.8407** | **2.913%** | Sec function |
| Neural Network Function Ensemble | 0.8354 | 2.277% | * |
| Stacking | 0.8336 | 2.044% | * |
| Bagging (MLP) | 0.8345 | 2.167% | * |
| Bagging (KNN) | 0.8192 | 0.269% | * |
| Logistic | 0.8118 | * | Base classifier |
| AdaBoost (Logistic) | 0.8123 | 0.062% | * |
| Gaussian Naive Bayes | 0.8001 | * | Base classifier |
| AdaBoost (Gaussian Naive Bayes) | 0.6538 | −18.285% | * |

* The model itself is a base classifier, or there are no notes. Bold: the largest value in the column.
Table 9. Comparative Experiment on Motion Data.

| Model | AUC | Improvement over Base Classifier | Note |
|---|---|---|---|
| MLP | 0.9809 | * | One of the base classifiers |
| RF | 0.9787 | * | One of the base classifiers |
| Fixed Weight Ensemble | 0.98879 | 0.918% | The weight ratio of the two base classifiers is 6:4 |
| Fixed Function Ensemble | 0.9884 | 0.878% | Sec function |
| Neural Network Function Ensemble | **0.98881** | **0.920%** | * |
| Stacking | 0.98879 | 0.918% | * |
| Bagging (MLP) | 0.9851 | 0.633% | * |
| Bagging (RF) | 0.98875 | 0.913% | * |
| Logistic | 0.5428 | * | Base classifier |
| AdaBoost (Logistic) | 0.5419 | −0.166% | * |
| Gaussian Naive Bayes | 0.8077 | * | Base classifier |
| AdaBoost (Gaussian Naive Bayes) | 0.6817 | −15.600% | * |

* The model itself is a base classifier, or there are no notes. Bold: the largest value in the column.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
