1. Introduction
As more devices and services become interconnected through the internet, the risk of cyberattacks is increasing rapidly [1]. Among these attacks, denial-of-service (DoS) is particularly dangerous because it overwhelms target systems or networks with excessive traffic, disrupting critical services and causing severe damage [2]. Therefore, network intrusion detection systems (NIDS) have become a crucial defense mechanism against such evolving threats [3]. By monitoring and analyzing network traffic, NIDS can effectively identify malicious activities. Numerous studies have applied machine learning to develop efficient NIDS frameworks [4].
Machine learning is well suited for intrusion detection because it can automatically learn data patterns and predict attack behaviors [5]. However, constructing a machine learning-based NIDS remains challenging due to the high class imbalance in real-world network data. Machine learning models typically assume evenly distributed classes, yet attack instances are far less frequent than normal traffic [6]. This imbalance results in biased models that tend to overfit normal traffic and fail to detect rare but critical DoS attacks. To address this issue, researchers have proposed various solutions, which can generally be categorized into three approaches: data-level, algorithm-level, and hybrid methods [7,8].
At the data-level, sampling techniques are commonly used to balance class distributions. Under-sampling methods reduce the majority class size by randomly removing samples, thus addressing imbalance without increasing data volume [9,10]. Conversely, over-sampling methods, such as the synthetic minority over-sampling technique (SMOTE) and its variants (e.g., Borderline-SMOTE), generate synthetic samples of minority classes to balance datasets [11]. These methods have demonstrated improved performance on class-imbalanced data; however, under-sampling may cause significant information loss [12], whereas over-sampling can lead to overfitting by generating redundant synthetic samples [13].
At the algorithm-level, cost-sensitive learning is often employed to mitigate class imbalance without altering the data distribution [14]. This approach assigns higher misclassification costs to minority class instances within the loss function, enabling the model to pay greater attention to them during training. Although cost-sensitive methods preserve the original data size, their effectiveness depends heavily on the proper selection of cost parameters. Consequently, they may lead to trade-offs between sensitivity and specificity, making the optimization process more unstable [15].
To overcome the limitations of both approaches, hybrid methods integrate data-level and model-level strategies [16]. A representative example is EasyEnsemble [17]. This technique applies random under-sampling multiple times to generate several balanced subsets of the training data. Each subset is then used to train an independent AdaBoost model, which increases the weights of misclassified samples so that subsequent models focus more on them. This design allows EasyEnsemble to combine data-level resampling with algorithm-level cost-sensitive weighting, effectively mitigating information loss while emphasizing minority class instances during learning [18]. Recent studies have also explored deep learning–based hybrid approaches that combine data-level sampling methods with algorithm-level deep learning architectures [19,20].
However, hybrid methods still have limitations. The final prediction of hybrid methods such as EasyEnsemble is typically obtained through a simple aggregation of multiple model outputs, which may dilute the unique patterns and characteristics learned by each individual model. Although deep learning–based hybrid approaches have shown improved performance, they rely heavily on synthetic data quality and often fail to explicitly preserve minority class characteristics. Thus, effectively improving minority class detection and ensuring stable prediction performance in these approaches remain open problems. To this end, we propose MCH-Ensemble (Minority Class Highlighting Ensemble), a novel hybrid framework. We aim to enhance DoS attack detection by emphasizing minority class features, improving minority class recognition and thereby achieving more accurate predictions.
In MCH-Ensemble, k balanced training subsets are generated from the original dataset using random under-sampling, where k is a hyperparameter that controls the number of subsets. Each subset is then used to train base models such as decision tree, extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). Subsequently, the attack class instances predicted by each model are compared with the actual attack class labels, and a constant weight of 1 is added to the feature values of correctly predicted instances. This process highlights well-predicted minority features and introduces a boosting-like effect. All highlighted subsets are then merged into a single dataset, which is used to train a random forest model as the final classifier.
MCH-Ensemble enhances detection performance by leveraging the bagging mechanism of random forest. In bagging, multiple decision trees are trained on different subsets of the data, and their results are aggregated. By highlighting features of attack classes, the method produces a more detailed dataset that distinguishes among normal traffic, attack traffic, and severe attacks. Each tree learns distinct patterns from these variations, and when combined, the ensemble can detect attacks more accurately by capturing richer and more diverse decision boundaries.
For evaluation, we employ network intrusion detection datasets, specifically focusing on DoS attack traffic. To determine the optimal configuration, decision tree, XGBoost, and LightGBM models are compared in combination with the random forest meta-model to identify the most effective setup for attack class detection.
The main contributions of this study are summarized as follows:
We propose MCH-Ensemble, a hybrid ensemble framework that effectively addresses class imbalance in DoS attack detection.
We introduce a novel highlighting mechanism that emphasizes attack class features within the data, thereby improving the model’s ability to classify subtle attack patterns.
We demonstrate that integrating these highlighted datasets with a random forest meta-model enhances prediction performance by leveraging diverse and fine-grained bagging samples.
The effectiveness of MCH-Ensemble is evaluated on the UNSW-NB15, CIC-IDS2017, and WSN-DS datasets to identify the optimal classifier combination for enhanced detection performance.
The remainder of this paper is organized as follows: Section 2 presents related work. Section 3 describes the proposed MCH-Ensemble method. Section 4 discusses the experimental results, and Section 5 concludes the paper and outlines directions for future research.
3. Methodology
This section introduces the methods used in this study, including machine learning-based classifiers, under-sampling ensemble, and minority class highlighting. These methods collectively form the proposed framework.
3.1. Machine Learning-Based Classifiers
3.1.1. Decision Tree
A decision tree is a tree-structured model that generates decision rules from training data to classify new instances [32]. It consists of nodes and branches, where each internal node represents a decision criterion, and branches connect nodes based on possible outcomes. The root node initiates the data division, while the leaf nodes represent the final output classes according to the learned decision rules [33]. Decision trees have the advantage of interpretability because their decision process can be visualized in a hierarchical structure [34]. However, since a single tree is trained on the entire dataset, it is prone to overfitting when the tree depth increases.
3.1.2. Random Forest
Random forest is an ensemble learning algorithm that combines multiple decision trees to improve prediction stability and accuracy [35]. It is based on bagging, which stands for “bootstrap aggregation” [36]. In this process, the model first performs sampling with replacement to create several bootstrap training sets. Each training set is used to train an independent decision tree. The predictions from all trees are then aggregated, typically through majority voting, to produce the final classification result. Because random forest aggregates multiple models, it provides robust predictions even for large-scale datasets. However, it has limitations in interpreting the relative importance of individual features and may require parameter tuning to achieve optimal performance [37].
3.1.3. Extreme Gradient Boosting (XGBoost)
Extreme gradient boosting (XGBoost) is an ensemble algorithm based on gradient boosting principles [38]. It builds successive models that correct the residual errors of previous ones, assigning feature-specific weights and combining the outputs to generate the final prediction. Initially, XGBoost trains a decision tree using the training data and computes the residual errors as the difference between actual and predicted values. These residuals are then used to generate weighted training samples for the next iteration, allowing subsequent trees to focus on previously misclassified instances. XGBoost improves upon traditional gradient boosting by employing GPU-accelerated parallel processing to reduce training time. It also mitigates overfitting through level-wise tree growth, maintaining balanced structures that enable stable and efficient training on large-scale datasets [39].
3.1.4. Light Gradient Boosting Machine (LightGBM)
LightGBM is a boosting-based ensemble model designed for high training efficiency using a leaf-wise growth strategy [40]. In contrast to XGBoost’s level-wise method, LightGBM splits the leaf node that yields the maximum loss reduction at each iteration, enabling faster convergence and improved accuracy. This leaf-wise splitting provides substantially higher training speed than level-wise approaches while maintaining competitive predictive performance [41].
3.2. Under-Sampling Ensemble
To address the class imbalance problem, we employ an under-sampling ensemble method that constructs multiple balanced training datasets through repeated sampling with replacement. This approach compensates for the potential information loss typically caused by standard under-sampling. By creating multiple balanced training datasets from diverse subsets of majority class instances [42], the method enhances generalization capability and reduces the likelihood of losing informative samples. Furthermore, this ensemble allows classifiers to learn from a wider range of feature representations.
Figure 1 illustrates the process of the under-sampling ensemble. Here, n_minority represents the number of minority class instances. Majority class instances are randomly sampled k times with replacement to match n_minority. The parameter k is a hyperparameter and is set to 5 in this study (Appendix A). The number of sampled majority class instances (n_sampled-majority) equals n_minority. These sampled majority class instances are then combined with the minority class instances to generate balanced datasets. All minority class instances remain constant across subsets. As a result, k balanced training datasets are produced.
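The subset construction described above can be sketched in a few lines of NumPy (an illustrative sketch; the function name, seed handling, and label convention are our own assumptions, not part of the original implementation):

```python
import numpy as np

def undersampling_ensemble(X, y, k=5, minority_label=1, seed=0):
    """Build k balanced training subsets: the majority class is sampled
    with replacement k times, each time down to n_minority instances,
    while every minority instance is kept in every subset."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    subsets = []
    for _ in range(k):
        sampled_maj = rng.choice(maj_idx, size=min_idx.size, replace=True)
        idx = np.concatenate([min_idx, sampled_maj])
        subsets.append((X[idx], y[idx]))
    return subsets
```

Each of the k returned subsets contains all minority instances plus an equal-sized random draw of majority instances, matching the process in Figure 1.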
3.3. Minority Class Highlighting
To emphasize the distinctive characteristics of the minority class, we apply a minority class highlighting (MCH) process. MCH increases the feature values of correctly predicted minority class instances, thereby refining data representation and sharpening the decision boundary between minority and majority classes. This process helps reduce misclassification, whether a minority instance is incorrectly classified as a majority instance or vice versa, especially for samples located near the classification boundary.
The MCH procedure is illustrated in Figure 2. Let X1, X2, X3, …, Xn denote independent variables; y denotes the actual class, and ŷ represents the predicted class produced by the classifier. (a) The classifier is first trained on a balanced dataset generated through sampling. (b) Predictions are obtained from the trained classifier. (c) The predicted minority class labels are compared with the actual minority class labels. (d) For instances that are classified as minority class in both the actual and predicted results, a constant value of 1 is added to their feature values. Finally, the highlighted training data is obtained through MCH. For example, consider an instance whose actual label is minority (y = 1) and that is predicted as minority (ŷ = 1). If its original feature values are (X1 = 0.0944, X2 = −0.0320, and X3 = −0.0511), then MCH increases each feature by 1 and yields (X1 = 1.0944, X2 = 0.9680, and X3 = 0.9489). This illustrates how MCH selectively amplifies correctly classified minority instances to reinforce their representation in the training data.
MCH is expressed in Equation (1) as follows:

X′_ij = X_ij + b, if y_i = ŷ_i = 1 (minority class); X′_ij = X_ij, otherwise, for j = 1, 2, …, n. (1)

The feature vector of instance i is denoted as X_i = (X_i1, X_i2, …, X_in); y_i and ŷ_i represent its actual and predicted labels, respectively. The highlighting coefficient is set to b = 1. MCH weights the feature values only when both the actual and predicted labels indicate the minority class. This rule ensures that only the instances correctly detected by the classifier are emphasized, thereby fine-graining minority-class characteristics.
MCH achieves finer segmentation of minority-class features via distinct weighting schemes. In machine learning, input feature transformation can be conceptually represented as linear operations of the form aX + b, where a and b denote the scaling factor and bias, respectively. Highlighting feature values via multiplication (×a) modifies gradients, potentially distorting the data distribution or excessively amplifying certain dimensions. In contrast, MCH employs an additive (+b) approach, enabling minority-class instances to be subtly shifted without altering the overall distribution. Therefore, MCH enhances the representation of minority-class features while preserving the intrinsic structure of the data; it thus enables the classifier to more effectively learn their underlying structural patterns.
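The additive highlighting rule of Equation (1) can be sketched as follows (the function name and array interface are illustrative assumptions; the constant b = 1 and the minority label follow the text above):

```python
import numpy as np

def minority_class_highlight(X, y_true, y_pred, b=1.0, minority_label=1):
    """Add the highlighting coefficient b to every feature of instances
    whose actual AND predicted labels are both the minority class;
    all other instances are left unchanged."""
    X_out = np.asarray(X, dtype=float).copy()
    hit = (y_true == minority_label) & (y_pred == minority_label)
    X_out[hit] += b
    return X_out
```

Applied to the worked example, an instance with features (0.0944, −0.0320, −0.0511), y = 1, and ŷ = 1 becomes (1.0944, 0.9680, 0.9489), while misclassified or majority instances are untouched.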
3.4. MCH-Ensemble: Minority Class Highlighting Ensemble
We propose MCH-Ensemble (Minority Class Highlighting Ensemble), a framework that enhances prediction accuracy by fine-graining the characteristics of the minority class. The overall architecture of the proposed method is illustrated in Figure 3. MCH-Ensemble consists of two main stages: (1) training base classifiers on balanced datasets (Figure 4) and (2) training a random forest on the highlighted data (Figure 5). In the first stage, an under-sampling ensemble is applied to the imbalanced dataset, producing k balanced training subsets. The hyperparameter k is set to 5 in this study. Each balanced subset is used to train a base classifier. The base classifiers include decision tree, extreme gradient boosting (XGBoost), and LightGBM. After training, predictions are obtained from each of the k classifiers. In the second stage, the MCH process is performed using the outputs of the base classifiers. This step produces k datasets that contain fine-grained representations of the minority class. These highlighted datasets are then combined and used to train a random forest model, which serves as the final meta-classifier. The random forest learns from the enriched feature representations and produces the final prediction results.
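Assuming a decision tree as the base learner and scikit-learn estimators, the two stages can be sketched end to end (a simplified illustration under default hyperparameters, not the authors' exact implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def mch_ensemble_fit(X, y, k=5, b=1.0, seed=0):
    """Sketch of the two-stage MCH-Ensemble.
    Stage 1: train one base classifier (here, a decision tree) on each
    of k balanced under-sampled subsets.
    Stage 2: add b to the features of correctly predicted minority
    instances in each subset, merge the k highlighted subsets, and
    train a random forest meta-model on the merged data."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == 1)
    maj_idx = np.flatnonzero(y == 0)
    parts_X, parts_y = [], []
    for _ in range(k):
        idx = np.concatenate(
            [min_idx, rng.choice(maj_idx, size=min_idx.size, replace=True)])
        Xb, yb = np.asarray(X, dtype=float)[idx], y[idx]
        base = DecisionTreeClassifier(random_state=0).fit(Xb, yb)
        y_hat = base.predict(Xb)
        Xb = Xb.copy()
        Xb[(yb == 1) & (y_hat == 1)] += b  # minority class highlighting
        parts_X.append(Xb)
        parts_y.append(yb)
    meta = RandomForestClassifier(n_estimators=100, random_state=0)
    meta.fit(np.vstack(parts_X), np.concatenate(parts_y))
    return meta
```

Swapping `DecisionTreeClassifier` for an XGBoost or LightGBM estimator yields the other base-learner configurations evaluated in this study.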
4. Experimental Setup
This section describes the experimental setup, which includes four components: data collection, data preprocessing, modeling, and evaluation metrics.
4.1. Data Collection
Three different DoS network intrusion datasets are used to evaluate the proposed model. All three datasets are imbalanced, meaning the proportions of normal and attack instances differ significantly. The UNSW-NB15 dataset [43] was obtained from OpenDrive, the CIC-IDS2017 dataset [44,45] from the Canadian Institute for Cybersecurity, and the WSN-DS dataset [46,47] from the Kaggle platform. The details of these datasets are summarized in Table 1.
In this study, “Normal” instances represent the majority class, whereas “Attack” instances correspond to the minority class. The UNSW-NB15 dataset contains 109,353 instances with 45 features, including 93,000 normal and 16,353 attack samples, resulting in an imbalance ratio of 5.69. The CIC-IDS2017 dataset comprises 692,703 instances with 79 features, including 440,031 normal and 252,672 attack samples, giving an imbalance ratio of 1.74. The WSN-DS dataset exhibits the highest imbalance ratio of 9.83, consisting of 340,066 normal and 34,595 attack samples.
4.2. Data Preprocessing
Data preprocessing is a critical step before model training to ensure reliability and consistency. The preprocessing steps are as follows:
Removal of null values and standardization of numeric independent variables;
Selection of significant independent variables using the variance inflation factor (VIF);
Splitting the dataset into training and testing subsets.
Rows containing null values were removed because missing data can degrade model prediction accuracy [48]. Each dataset contained only numerical variables, so no additional categorical encoding was required. When features with different units or scales are used in a single model, the variables with larger magnitudes may disproportionately influence the results. Therefore, numerical independent variables with varying distributions were standardized to have a mean of 0 and a standard deviation of 1 [49].
We next evaluated multicollinearity using the VIF. Multicollinearity refers to the presence of strong correlations among independent variables, which makes it difficult to estimate the individual effect of each predictor on the dependent variable and reduces model reliability [50]. The VIF is a widely used indicator for detecting multicollinearity, where a VIF value of 10 or higher indicates a high correlation among variables [51]. In the UNSW-NB15 dataset, the variable tcprtt exhibited a VIF value of 9.5, which, although below the threshold of 10, was excluded because it showed relatively high correlation with other independent variables [52]. After this analysis, the significant independent variables selected for each dataset were as follows: 29 for UNSW-NB15, 16 for CIC-IDS2017, and 13 for WSN-DS. For reproducibility, the preprocessed datasets are summarized in detail in Appendix B.
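For illustration, the VIF screening can be computed with plain least squares (a sketch; the `vif` helper is our own and assumes standardized numeric features, equivalent in spirit to statsmodels' `variance_inflation_factor`):

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column: regress each feature on
    all remaining features (with intercept) and return 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ beta
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / max(1.0 - r2, 1e-12)
    return out
```

Features with VIF at or above the threshold of 10 (or borderline cases such as tcprtt at 9.5) are candidates for removal.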
For the experiments, 80% of the CIC-IDS2017 and WSN-DS datasets was used for training, and the remaining 20% was used for testing; both sets maintained the original class imbalance distribution. For the UNSW-NB15 dataset, the training data consisted of 56,000 “Normal” and 12,264 “Attack” instances, and the testing data consisted of 37,000 “Normal” and 4089 “Attack” instances, preserving the original imbalance ratio. In all cases, the training data were balanced using the under-sampling ensemble, while the testing data were balanced through a single round of under-sampling. The testing datasets were used to validate the proposed method.
However, real-world network intrusions are inherently imbalanced. To verify that the proposed method performs well under realistic conditions, additional results obtained on the original imbalanced testing data are reported in Appendix C.
4.3. Modeling: Minority Class Highlighting Ensemble (MCH-Ensemble)
After data preprocessing, the proposed model was developed using the MCH-Ensemble framework. Figure 6 presents a summary of the model’s construction and design. Machine learning-based classifiers—decision tree, XGBoost, and LightGBM—were employed as base learners. A random forest was used as the meta-model to combine the outputs from the base classifiers.
All four classifiers were tuned using Optuna, which automatically derived optimal hyperparameters for the three datasets [53]. These hyperparameters are listed in Table 2, Table 3 and Table 4, respectively.
4.4. Evaluation Metrics
To comprehensively evaluate the performance of the proposed method, five standard classification metrics are used: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC) [54,55,56].
Accuracy represents the proportion of correctly predicted instances among all predictions and is defined as:

Accuracy = (TP + TN) / (TP + TN + FP + FN), (2)

where TP (true positive) denotes the number of actual minority class instances correctly predicted as minority class; TN (true negative) represents the number of actual majority class instances correctly predicted as majority class; FP (false positive) refers to majority class instances incorrectly predicted as minority class; and FN (false negative) represents minority class instances incorrectly predicted as majority class.
Precision measures the proportion of correctly predicted positive instances among all predicted positive instances:

Precision = TP / (TP + FP). (3)

Recall measures the proportion of actual positive instances correctly identified by the model:

Recall = TP / (TP + FN). (4)

F1-score is the harmonic mean of precision and recall, providing a balanced measure of both metrics:

F1-score = 2 × (Precision × Recall) / (Precision + Recall). (5)

AUC-ROC measures the overall discriminative ability of a classifier by quantifying the area under the ROC curve. The ROC curve represents the relationship between the true positive rate (TPR) and false positive rate (FPR) across all possible decision thresholds. The TPR and FPR are defined in Equations (6) and (7), respectively:

TPR = TP / (TP + FN), (6)
FPR = FP / (FP + TN). (7)

The AUC-ROC is calculated as the area under the curve of TPR plotted against FPR:

AUC-ROC = ∫₀¹ TPR d(FPR). (8)
The ROC curve was plotted to visually assess the performance of MCH-Ensemble [57]. The x-axis and y-axis represent the FPR and TPR, respectively. A curve that is closer to the upper-left corner indicates superior classification performance.
A model demonstrates better performance when accuracy approaches 100%, and precision, recall, F1-score, and AUC-ROC approach 1.
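The threshold-based metrics above can be verified with a small confusion-matrix helper (an illustrative sketch treating the attack class as the positive label 1; scikit-learn's `metrics` module provides equivalent functions):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score from binary
    predictions, with the attack (minority) class as positive label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```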
5. Experimental Results
This section presents the experimental results used to evaluate the performance of the proposed MCH-Ensemble across three datasets: UNSW-NB15, CIC-IDS2017, and WSN-DS.
The proposed framework incorporates two key innovations: (1) MCH, which refines the representation of minority class features, and (2) an ensemble of base classifiers and a random forest meta-model, which improves predictive performance by learning from diverse, fine-grained bagging samples. To validate the effectiveness of these components, we conducted a series of comparative experiments. We employed three heterogeneous ensemble configurations: decision tree + random forest (DT + RF), XGBoost + random forest (XGB + RF), and LightGBM + random forest (LGBM + RF). Each ensemble was evaluated both with and without MCH to assess the contribution of the highlighting process.
5.1. UNSW-NB15
Table 5 presents the performance of all ensemble models on the UNSW-NB15 testing set across five evaluation metrics. Overall, ensembles incorporating MCH consistently outperformed those without it. Notably, all ensembles without MCH exhibited identical performance. This is because the diverse outputs of the base classifiers are not reflected in the training sets, so the random forest meta-model is trained on the simple aggregation of the k balanced training datasets generated by the under-sampling ensemble. This confirms that the MCH step is crucial for fine-graining the training data and enhancing model diversity in the random forest. Moreover, all ensembles that employed MCH achieved a precision of 1.0000, indicating that emphasizing minority class features enabled the model to more effectively learn the distinguishing characteristics of attack instances.
Among all the tested configurations, the XGB + RF ensemble achieved the best overall performance on the testing set: an accuracy of 98.79%, a precision of 1.0000, a recall of 0.9758, an F1-score of 0.9877, and an AUC-ROC of 0.9963. The recall value was slightly lower than those of the models without MCH but the other performance metrics were higher, confirming the superiority of the proposed method. These results indicate that the XGB + RF ensemble is the most suitable configuration for the UNSW-NB15 dataset.
Figure 7 compares the ROC curves of ensemble models on the UNSW-NB15 testing set. These results demonstrate that the models reliably and consistently classify the classes. The XGB + RF ensemble achieved the highest AUC scores among all the compared ensemble models.
5.2. CIC-IDS2017
Table 6 shows the performance of all ensemble models on the CIC-IDS2017 testing set across five evaluation metrics. Overall, all models achieved excellent performance, indicating that the dataset itself contains features that make it straightforward to distinguish between attack and normal instances. In particular, all models with MCH achieved a precision and an AUC-ROC of 1.0000, indicating that they effectively and accurately identified attack instances while minimizing false positives.
Among all configurations, the XGBoost + random forest (XGB + RF) ensemble achieved the best overall results on the testing set, with an accuracy of 99.99%, precision of 1.0000, recall of 0.9999, F1-score of 0.9999, and AUC-ROC of 1.0000. These findings demonstrate that the XGB + RF ensemble is the most suitable configuration for the CIC-IDS2017 dataset.
Figure 8 compares the ROC curves of ensemble models on the CIC-IDS2017 testing set. The ensemble model without MCH achieved an excellent performance with an AUC of 0.9996, whereas those with MCH classified more accurately and achieved a higher AUC of 1.0000.
5.3. WSN-DS
Table 7 summarizes the performance of all ensemble models on the WSN-DS testing set across the five evaluation metrics. All ensembles without MCH performed identically on the WSN-DS testing set, similar to the UNSW-NB15 and CIC-IDS2017 datasets, because the random forest meta-model was trained on the same aggregated dataset. In general, ensembles incorporating MCH achieved better results than those without it and achieved a precision of 1.0000, as on the UNSW-NB15 dataset, because emphasizing minority class features enabled the model to more effectively learn the distinguishing characteristics of attack instances.
Among all the tested configurations, the DT + RF ensemble achieved the best overall performance on the testing set, with an accuracy of 99.39%, a precision of 1.0000, a recall of 0.9879, an F1-score of 0.9939, and an AUC-ROC of 0.9971. Although its AUC-ROC value was slightly lower than that of LGBM + RF, the other performance metrics were higher; this indicated that the proposed method achieved superior performance across all key metrics. The DT + RF was the simplest and most intuitive among the evaluated configurations and exhibited high performance with minimal model complexity. Given these results, decision tree + random forest (DT + RF) can be considered the most suitable method for the WSN-DS dataset.
Figure 9 compares the ROC curves of ensemble models on the WSN-DS testing set. All models classified the classes reliably and consistently, wherein the LGBM + RF achieved a slightly higher AUC of 0.9972 compared with other models.
5.4. Stepwise Performance Comparison of MCH-Ensemble
The overall performances of the MCH-Ensemble on the UNSW-NB15, CIC-IDS2017, and WSN-DS datasets were validated via experiments. However, the components that contribute to the observed performance gains must also be analyzed. The MCH-Ensemble comprises two main stages: (1) training base classifiers on balanced datasets and (2) training a random forest on the highlighted data. The contribution of each stage was analyzed via these comparative experiments: training the base classifier without the under-sampling ensemble; training the base classifier using the under-sampling ensemble; and training the MCH-Ensemble.
Table 8, Table 9 and Table 10 present the performance of each stage across five evaluation metrics for each dataset. The MCH-Ensemble demonstrated overall superior performance across all three datasets compared with the conventional single model and the model using the under-sampling ensemble. The single model exhibited reduced predictive capability for minority classes, resulting in relatively low recall compared with the other metrics. Although the under-sampling ensemble improved recall, its performance on some metrics deteriorated compared with the single model because it averaged the predictions of multiple classifiers trained on various balanced subsets of data. Both models showed noticeable variations in metric values depending on the dataset, indicating their limited performance stability. The MCH-Ensemble exhibited minimal performance fluctuation across the three datasets, indicating that it effectively addressed class imbalance and maintained robust and stable performance regardless of dataset characteristics.
The computational cost of MCH-Ensemble was also measured. Table 11 shows the training time and memory consumption of the proposed model on each dataset. The computational cost varied based on the dataset size and the degree of class imbalance; therefore, efficiency should be carefully considered in practical applications.
5.5. Performance Comparison with Previous Studies
The performance of MCH-Ensemble in improving DoS detection was confirmed by comparing it with existing hybrid methods on the same five metrics across the three datasets (Table 12).
MCH-Ensemble outperformed the existing hybrid methods in most evaluation metrics, exhibiting consistent performance. This indicates that MCH-Ensemble enhances the detection capability of ensemble-based intrusion detection models and provides a more robust and generalizable approach for identifying DoS attacks in diverse network environments.
Figure 10 compares the ROC curves of MCH-Ensemble and existing classification methods across the three datasets. In terms of AUC, MCH-Ensemble performed equal to or slightly below the other methods, but it exhibited higher values on the remaining evaluation metrics.
6. Discussion and Conclusions
DoS attack detection in NIDS is an emerging research topic in cybersecurity. Although machine learning techniques have been widely adopted to build effective NIDS models, real-world network data often suffer from severe class imbalance. Machine learning techniques are typically designed assuming evenly distributed classes; therefore, class imbalance leads to biased models or failure to detect rare but critical DoS attacks. This raises an important research question: how can minority class detection be effectively improved while ensuring stable prediction performance under class imbalance? To address this question, we proposed MCH-Ensemble, a method designed to address class imbalance and enhance prediction accuracy for DoS detection by emphasizing minority-class features to improve minority-class recognition.
The proposed framework comprises two main stages. In the first stage, k balanced training datasets were constructed using the under-sampling ensemble technique, where k is a hyperparameter set to 5 in this study. Machine learning-based classifiers were trained on each balanced dataset, and their predictions were obtained. In the second stage, MCH was applied to the minority class predictions of the k classifiers. This process generated k highlighted training datasets with enhanced representation of minority class features. Finally, a random forest model was trained on all highlighted datasets to produce the final predictions.
Experiments were conducted using three DoS datasets—UNSW-NB15, CIC-IDS2017, and WSN-DS—with varying imbalance ratios. Model performance was evaluated using five metrics: accuracy, precision, recall, F1-score, and AUC-ROC, and ROC curves were plotted for visual comparison. The results demonstrated that the XGBoost + random forest (XGB + RF) ensemble achieved the best performance on the UNSW-NB15 and CIC-IDS2017 datasets, whereas the decision tree + random forest (DT + RF) ensemble performed optimally on the WSN-DS dataset. These findings indicate that different dataset characteristics can influence the optimal model configuration. MCH-Ensemble also showed improved performance compared with existing models. On the UNSW-NB15 and CIC-IDS2017 datasets, it improved accuracy, precision, recall, F1-score, and AUC-ROC by approximately 1.2% and 0.61%, 9.8% and 0.77%, 0.7% and 0.56%, 5.3% and 0.66%, and 0.1% and 0.06%, respectively; on the WSN-DS dataset, the corresponding improvements were approximately 0.17%, 1.66%, 0.11%, 0.88%, and 0.06%. Overall, the proposed method exhibited robust and stable performance across datasets with diverse imbalance ratios.
The significance of MCH-Ensemble lies in its ability to extend beyond conventional data preprocessing methods. By highlighting minority class features, the framework captures attack characteristics more precisely, thereby improving detection performance. Moreover, the proposed architecture is model-agnostic, allowing it to be readily integrated with various machine learning or deep learning-based classification models.
Nevertheless, this study has certain limitations. All experiments were conducted exclusively on DoS-type attacks, which limited the evaluation scope of MCH-Ensemble. Although this study focused on binary DoS detection, MCH-Ensemble was designed to refine and emphasize minority-class feature representations rather than being restricted to a specific attack type. Future research will therefore apply this model to broader network intrusion categories and validate its capability in multiclass scenarios. Furthermore, the validation experiments were conducted using publicly available NIDS datasets; future research should include validation on datasets from other domains to assess the generalizability of the proposed approach. We plan to extend our experiments to additional domains to further verify the robustness of MCH-Ensemble. The advantages and limitations of MCH-Ensemble are summarized in Appendix D.