Comparative Analysis of Feature Selection Methods in Clustering-Based Detection Methods

Zeinalpour, Alireza; McElroy, Charles P.

doi:10.3390/electronics14112119

by Alireza Zeinalpour * and Charles P. McElroy *

Department of Information Systems, Monte Ahuja College of Business, Cleveland State University, Cleveland, OH 44115, USA

* Authors to whom correspondence should be addressed.

Electronics 2025, 14(11), 2119; https://doi.org/10.3390/electronics14112119

Submission received: 2 April 2025 / Revised: 15 May 2025 / Accepted: 19 May 2025 / Published: 23 May 2025

Abstract

Feature selection plays a crucial role in the effectiveness of distributed denial of service (DDoS) attack detection methods, particularly as network traffic data becomes increasingly complex. This study conducts a categorical investigation of feature selection methods in clustering-based DDoS attack detection, comparing wrapper and hybrid approaches. Through two experiments using one-way ANOVA analyses, the research evaluated the effectiveness of different clustering approaches and supervised learning algorithms. The findings reveal that clustering-based wrapper methods performed more effectively than supervised learning approaches in feature selection for clustering-based DDoS attack detection methods. The results show strong statistical significance for clustering-based methods, with p-values of less than 0.05 and η² values indicating robust relationships between methods. Our clustering-based wrapper approach achieved a 57.7% reduction in false positive rates compared to supervised learning methods (mean FPR of 0.17 versus 0.40) on the CICIDS2017 dataset, with certain configurations reaching a false positive rate of 0.000. A similar pattern was observed with the NSL-KDD dataset, where clustering-based methods reduced false positive rates by 63.1% compared to supervised approaches (0.048 versus 0.128). This study provides empirical evidence of effective combinations that organizations and agencies can use to implement high-performing DDoS attack detection methods.

Keywords:

clustering-based detection; supervised learning; wrapper method; density-based clustering; not-density-based clustering

1. Introduction

Distributed Denial of Service (DDoS) attacks have grown in both frequency and complexity over the last decade [1]. These attacks pose major security risks to computer networks [2]. An attacker launches a DDoS attack by overwhelming a victim system’s resources with a large number of network traffic requests [1]. Introducing effective methods for identifying these attacks is essential, yet challenging because of their complexity [2]. There is a dearth of categorical investigations of machine learning algorithms that would enable researchers to properly compare and assess the various approaches [2]. It is therefore essential to assess the effectiveness of machine learning algorithms categorically. This way, researchers can ensure the proper and systematic enhancement of the performance of machine learning algorithms in analyzing network traffic data. This study addresses that challenge.

A denial of service (DoS) attack sends a large number of service requests in a malicious way to negatively impact the functioning of a server in a network [3]. A DoS attack is primarily launched from a device targeting an intended server as the victim. A DDoS attack is a variation of a DoS attack that is more dangerous because it is initiated from several devices to a targeted server [3]. Modern cyber intrusions are complex in that distributed settings can go unidentified, in which case DDoS attacks can block the accessibility of resources [4]. This problem can lead to service unavailability, substantial income loss, reputation damage, and a bad user experience, to name but a few negative outcomes [5]. The evaluation of a large volume of network traffic data is challenging in anomaly-based DDoS attack detection [1]. This is particularly the case for clustering-based detection methods. Analyzing a large volume of data requires significant processing capabilities, and has a negative impact on the performance of the detection models [6]. Many studies have not paid attention to the significance of feature selection and focused mostly on the performance of modeling and classification [7]. According to Xu [6], there is an urgent need for effective modeling while maximizing efficiency and the security of network infrastructure. Feature selection should be carried out appropriately to assess all the necessary information for detection, thereby reducing redundant features [8].

The ramifications of this stream of research can have a broad impact. Statistical data reveal that threats from cyber-attacks hit 20% of small businesses, 33% of Small-Medium Enterprises (SMEs), and 41% of large businesses [9]. Given the nature of these threats posed by DDoS attacks, it becomes critical to investigate and propose effective DDoS attack detection methods in a systematic way, which supports further theory development. A total of 82% of organizations have suffered data thefts as a result of DDoS attacks [9]. Consequently, we performed a comparative analysis of feature selection methods by considering two categories of machine learning, focusing on the clustering-based detection methods. We compared clustering algorithms in the feature selection process that used supervised learning vs. unsupervised learning algorithms.

The feature selection phase of the process focuses on assessing large volumes of network traffic data to improve the classification accuracy through the identification of relevant features [10]. Filter and wrapper methods are two types of feature selection methods used to select suitable network traffic data for training machine learning algorithms. The filter method uses statistical measures to identify features, while the wrapper method uses a machine learning algorithm to determine features [11]. Two types of algorithms can be applied in the wrapper method for evaluating network traffic data: unsupervised and supervised learning algorithms. Clustering algorithms, as opposed to supervised learning algorithms, are unsupervised techniques that do not require data labels for training.

Clustering is a well-known technique in intrusion detection. Similarity-based and distance-based clustering are two types of techniques in which the algorithm tries to maximize the distances of data points between clusters and minimize distances within clusters, leading to the effective categorization of DDoS attacks [12]. Another type of clustering approach is density-based. This approach groups data points into clusters according to the density of a region, ensuring proper noise filtering [13]. ‘MakeDensityBasedClusterer’ is a clustering approach that is available in the Weka (Waikato Environment for Knowledge Analysis) tool. This clustering approach can incorporate similarity-based and distance-based clustering methods.

Weka comprises a collection of machine learning algorithms and is used to assess network traffic data for attack detection. It was used to build and assess a DDoS attack detection method in a study by Zeinalpour [12]. In this study, we consider clustering-based approaches comprising both density-based clustering and not-density-based clustering. Density-based clustering approaches utilize ‘MakeDensityBasedClusterer’, whereas not-density-based clustering approaches do not.

We conducted a comparative analysis of feature selection by considering the hybrid and wrapper approaches. The hybrid feature selection approach applies the filter method first, followed by the wrapper method. We used the Naïve Bayes, J48, and DecisionTable supervised learning algorithms, which are applied in the wrapper method when the clustering approach is incorporated. Performance is then compared using the false positive rate. Currently, there is a paucity of studies that look at this issue.

The findings of this study contribute to the improvement of DDoS attack detection methods by identifying the most effective combinations of feature selection and clustering approaches. Indeed, while numerous studies have improved the detection accuracy, as noted by Ali et al. [7], relatively few have investigated the comparative effectiveness of different feature selection paradigms for clustering-based detection. Our study addresses this gap by providing a systematic comparison of supervised versus unsupervised clustering approaches and feature selection within clustering-based detection frameworks, and by evaluating these approaches using the critical metric of false positive rates, which has been understudied in prior research. Through the use of effective combinations, organizations and agencies can implement DDoS attack detection methods that have a high performance value, reducing their exposure to this type of attack.

2. Materials and Methods

According to Najafimehr et al. [2], the absence of research that examines machine learning algorithms based on their classification capabilities prevents the suitable assessment of various methods in this field. Existing DDoS attack detection methods may not perform well in identifying novel attacks due to the sophistication and increasing complexity of DDoS attacks [2]. In recent years, identifying DDoS attacks has become difficult due to the diversity of techniques used to launch them [14]. Assessing the large volume of network traffic data generated through this type of attack is challenging [1]. This is even more true when multi-vector DDoS attacks are present. Hassan et al. [14] argue that the nature of network traffic data is dynamic, and that with the use of multiple attack protocols, introducing robust defense mechanisms becomes essential. Consequently, performing a comparative analysis of various feature selection methods is of paramount importance. With this purpose in mind, we constructed two hypotheses and tested them in two ex post facto experiments with an A-B single-group design. The base measures belong to the control group, while the experimental measures belong to the experimental group, which includes the interventions [1]. ‘A’ denotes the control group and ‘B’ denotes the experimental group.

The first experiment reflects the following research question: “Does incorporating a clustering-based-wrapper method differ in effectiveness as opposed to supervised-learning-wrapper method in clustering-based detection of DDoS attacks?” The corresponding null hypothesis is that there is no difference in effectiveness when incorporating a clustering-based wrapper method against a supervised learning wrapper method in clustering-based detection of DDoS attacks. The second experiment reflects the following question: “Does incorporating clustering-based-hybrid-feature-selection method differ in effectiveness as opposed to a supervised-learning-hybrid-feature-selection method in clustering-based detection of DDoS attacks?” The null hypothesis is that there is no difference in effectiveness when incorporating the clustering-based hybrid feature selection method against the supervised learning hybrid feature selection method in the clustering-based detection of DDoS attacks. The clustering-based wrapper method and the clustering-based hybrid feature selection method are the experimental groups, respectively. The corresponding control groups are the supervised learning wrapper and supervised learning hybrid feature selection methods. We used the entire CICIDS2017 dataset, which contains both DDoS and benign events. This dataset includes real-world data of harmless traffic and attack traffic in a CSV format [15]. To further confirm the statistical results of our two hypotheses, we also used ‘KDDTrain+.ARFF’, which is the full NSL-KDD training dataset. The NSL-KDD dataset is a well-established network traffic dataset [15]. We performed one-way ANOVA analyses to test our hypotheses, which allowed us to specify a factor variable reflecting the various groups that were considered in our research study, as well as a dependent variable, which was the false positive rate. The names of the independent variables with the corresponding values are presented in Table A1 under Appendix A.
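To make the test concrete, the following is a minimal sketch of a one-way ANOVA with the false positive rate as the dependent variable and the group (A or B) as the factor. It is written in plain Python rather than the statistical package used in the study, and the false positive rates below are hypothetical values chosen for illustration, not results from this paper. The sketch reports the F statistic and η²; obtaining a p-value additionally requires the F distribution, which a library routine such as `scipy.stats.f_oneway` provides.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, eta_squared, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation explained by group membership.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared, df_between, df_within

# Hypothetical false positive rates, NOT values from the study.
control_fpr = [0.40, 0.35, 0.45]        # group A: supervised learning wrapper
experimental_fpr = [0.15, 0.20, 0.10]   # group B: clustering-based wrapper

f_stat, eta_sq, df_b, df_w = one_way_anova([control_fpr, experimental_fpr])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, eta squared = {eta_sq:.3f}")
```

A large F with a high η² indicates that most of the variation in the false positive rate is associated with the choice of feature selection method rather than with differences inside each group.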

The first research question statistically compares the clustering-based wrapper method against the supervised machine learning wrapper method in the clustering-based detection of DDoS attacks. The clustering-based wrapper method is one in which the wrapper method incorporates a clustering technique as the machine learning algorithm to evaluate network traffic data. We used k-means and expectation-maximization (EM) to apply the clustering approach. We also considered ‘MakeDensityBasedClusterer’ as another clustering approach using k-means and EM. The k-means and EM algorithms use Euclidean distance to perform clustering, while ‘MakeDensityBasedClusterer’ ensures clustering analysis based on density. We used J48, NaïveBayes, and DecisionTable as the corresponding supervised learning algorithms. J48 is a decision tree learning algorithm that constructs a tree-based structure for assessing features. NaïveBayes performs the assessment of features based on Bayes’ theorem. DecisionTable is a rule-based learning algorithm that constructs decision tables to map and evaluate network traffic data. We used k-means, EM, and ‘MakeDensityBasedClusterer’ to construct the clustering-based DDoS attack detection methods.

The second research question assesses the clustering-based hybrid feature selection method against a supervised machine learning hybrid feature selection method for the clustering-based detection of DDoS attacks. The clustering-based hybrid feature selection method is a hybrid approach in which the filter method is incorporated prior to the wrapper method. We used ChiSquared and Information Gain as the corresponding algorithms for the filter method to evaluate the network traffic data. For the wrapper stage of the hybrid approach, we used the same machine learning algorithms as in the first research question. In this case, we also used k-means, EM, and ‘MakeDensityBasedClusterer’ to construct clustering-based DDoS attack detection methods.

In this study, we used Weka (Waikato Environment for Knowledge Analysis) Workbench to build the DDoS attack detection models based on their applied feature selection methods. As the name of this tool suggests, it is used to facilitate knowledge discovery from data in building effective machine learning models. This tool offers a series of capabilities in assessing data and constructing prediction models [16]. The filter and wrapper methods rely on a search method for identifying the optimal attributes. The Ranker search method with the threshold of ‘0.5’, proposed by Zeinalpour [12], was used in the filter method to select attributes, while the wrapper method used ‘BestFirst’, with the default settings provided by Weka.
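The filter stage can be illustrated with a minimal sketch that scores each attribute by information gain, ranks the attributes, and discards those below a threshold, in the spirit of the Ranker search method with a threshold of 0.5. This is plain Python rather than Weka, it assumes features that are already discrete, and the attribute names `flag_a` and `flag_b`, the toy data, and the choice to keep scores at or above the threshold are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Label entropy minus the expected entropy after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def rank_and_filter(features, labels, threshold):
    """Rank attributes by score and keep those at or above the threshold (assumed rule)."""
    scores = {name: information_gain(vals, labels) for name, vals in features.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]

# Toy data with hypothetical attribute names.
labels = ["ddos", "ddos", "benign", "benign"]
features = {
    "flag_a": [1, 1, 0, 0],  # separates the two classes perfectly
    "flag_b": [1, 0, 1, 0],  # carries no information about the class
}
kept = rank_and_filter(features, labels, threshold=0.5)
print(kept)
```

In the hybrid approach, only the attributes that survive this ranking step would be passed on to the wrapper method.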

In summary, the integration operates through an iterative feedback mechanism: initially, the clustering algorithm (k-means or EM) generates clusters without labels, which are then evaluated against ground truth using the silhouette score, measuring the quality of cluster cohesion and separation. For each candidate feature subset proposed by the wrapper method, the clustering algorithm recomputes clusters and evaluates the resulting silhouette score. Feature subsets that improve this score are prioritized, creating a selection mechanism that optimizes for cluster quality rather than classification accuracy directly. This approach preserves the unsupervised nature of clustering while leveraging labeled data for validation, offering advantages in detecting novel attack patterns that might be misclassified by purely supervised approaches. In this study, we collected the false positive rates corresponding to the categorization or classification of DDoS events.
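The feedback loop described above can be sketched as follows. This is an illustrative simplification, not the Weka configuration used in the study: it uses a tiny k-means with deterministic seeding, computes the silhouette score from the cluster assignment alone, and replaces the ‘BestFirst’ search with a plain greedy forward search. The function names and the toy data are assumptions made for the example.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Tiny k-means; the first k points seed the centroids (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment

def silhouette(points, assignment):
    """Mean silhouette coefficient; higher means tighter, better separated clusters."""
    clusters = {}
    for p, a in zip(points, assignment):
        clusters.setdefault(a, []).append(p)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for p, a in zip(points, assignment):
        own = clusters[a]
        if len(own) == 1:
            scores.append(0.0)
            continue
        cohesion = sum(dist(p, q) for q in own) / (len(own) - 1)
        separation = min(sum(dist(p, q) for q in other) / len(other)
                         for label, other in clusters.items() if label != a)
        denom = max(cohesion, separation)
        scores.append((separation - cohesion) / denom if denom else 0.0)
    return sum(scores) / len(scores)

def project(rows, columns):
    """Keep only the selected feature columns of every row."""
    return [tuple(row[c] for c in columns) for row in rows]

def forward_select(rows, k=2):
    """Greedy wrapper: add the feature that most improves the silhouette score."""
    selected, best_score = [], -1.0
    improved = True
    while improved:
        improved = False
        for f in range(len(rows[0])):
            if f in selected:
                continue
            pts = project(rows, selected + [f])
            score = silhouette(pts, kmeans(pts, k))
            if score > best_score:
                best_score, best_feature, improved = score, f, True
        if improved:
            selected.append(best_feature)
    return selected, best_score

# Toy rows: feature 0 separates two groups, feature 1 is noise.
rows = [(0.0, 0.9), (1.0, 0.1), (0.1, 0.2), (0.9, 0.8), (0.05, 0.55), (0.95, 0.45)]
selected, score = forward_select(rows)
print(selected, round(score, 3))
```

On this toy data the search keeps only the first feature, because adding the noisy second feature lowers the silhouette score.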

Machine learning algorithms are susceptible to overfitting when learning models are trained to perform better on one set of network traffic events than another. Data quality can also impact a learning model’s performance. We ensured that the problems of overfitting and data quality were addressed through the consideration of the following data preprocessing procedures. For the CICIDS2017 dataset, we first manually removed ‘Fwd_Header_Length’, which was a duplicate attribute. This enabled Weka to read the data. Afterward, we applied the Numeric Cleaner to enable min–max normalization in order to process the data. In Weka, the ‘Normalize’ procedure performs the min–max normalization. Normalization requires numeric cleaning on values that are outliers, and without normalization, machine learning algorithms cannot properly undertake learning from processing network traffic data [12]. Subsequently, we applied EM imputation to address the missing values of the ‘Flow Bytes/s’ attribute. Missing values make learning challenging, as attributes with missing values lead to improper modeling [12]. Then, we applied the SpreadSubSample procedure with the distribution spread value of ‘1.0’ to balance the dataset. Imbalanced data increase the complexity of obtaining accurate results [17]. This problem leads the machine learning algorithms to produce biased results from analyzing the network traffic data [12]. Finally, we applied the ‘Randomize’ procedure provided by Weka to ensure that all the data were randomized and that the data of the same network traffic event were not aligned together. With respect to the NSL-KDD dataset, since it did not have any duplicate, outlier, or missing value, we were able to apply the min–max normalization, SpreadSubSample, and Randomize approaches in addressing the overfitting and data quality problems. 
In addition to the data preprocessing procedures described above, we incorporated 10-fold cross-validation, a generalization approach that helps ensure accurate modeling and results.
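The main preprocessing steps can be sketched in plain Python as shown below. This is a simplified stand-in for the Weka procedures named above, not a reproduction of them: class balancing is approximated by undersampling every class to the size of the smallest one (the effect of a uniform distribution spread), imputation is omitted, and the toy rows, labels, and random seed are assumptions. The example uses two folds only because the toy dataset is tiny; the study used ten.

```python
import random
from collections import defaultdict

def min_max_normalize(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def balance_classes(rows, labels, rng):
    """Undersample each class to the size of the smallest class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            balanced.append((row, label))
    return balanced

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [(10.0, 200.0), (20.0, 400.0), (30.0, 600.0),
        (40.0, 800.0), (50.0, 1000.0), (60.0, 300.0)]
labels = ["benign"] * 4 + ["ddos"] * 2

normalized = min_max_normalize(rows)
balanced = balance_classes(normalized, labels, rng)
rng.shuffle(balanced)  # randomize so events of one class are not adjacent
folds = list(k_fold_indices(len(balanced), k=2))
```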

3. Literature Review

3.1. Concerns Surrounding DDoS Attacks

A DoS attack artificially creates a massive volume of traffic on a network, which overwhelms the computing power of the network [18]. Identifying attack patterns in a large volume of network traffic generated through this type of intrusion is difficult [1]. The prevalence of DDoS attacks has risen in recent years [14]. These attacks are the most common type of attack launched through networks [19]. The dynamic nature of this type of attack and the presence of multiple attack protocols requires the development of sophisticated defense mechanisms [14]. When a network is not configured properly, the corresponding network controller will be activated by a DDoS attack, enlarging the attack surface of the network [20]. Currently, this type of attack, e.g., focusing on the network controller, is the most common vector for a network intrusion [21]. The risk of bringing down servers that do not have protective layers in a short period of time is high, and therefore this type of attack makes it very difficult for organizations to provide uninterrupted services [5]. Since DDoS attacks are generated through various sources, locating the origin becomes especially difficult [19]. Hackers will try to keep the sessions open for the longest possible duration [1]. The estimated cost of dealing with each attack is around USD 3 million per organization [22]. Therefore, it becomes an economic imperative to design strong mechanisms to identify DDoS attacks [19].

Anomaly detection methods apply machine learning algorithms in identifying unusual activity on the network. Network intrusion detection methods that incorporate machine learning algorithms are effective in this regard. In anomaly-based intrusion detection methods, if network traffic patterns deviate from what are considered normal, the respective patterns are recognized as anomalies [22]. Anomaly-based methods are susceptible to producing high false alarm rates [22]. The modernization of network traffic data can increase this vulnerability. According to Prasad and Chandra [5], 54% of DDoS attacks that occurred between January 2020 and March 2021 were launched through these modern attack vectors. In the third quarter of 2024, the Cloudflare cybersecurity company dealt with 6 million DDoS attacks, and some of these attacks sent 2 billion packets per second [23]. One metric that can be used to measure the effectiveness of anomaly-based DDoS attack detection methods is the false positive rate. It is calculated as the number of false positives divided by the sum of the false positives and true negatives. False positives are normal network traffic patterns erroneously identified as DDoS attacks. Conversely, true negatives are normal network traffic patterns correctly recognized as normal.
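As a small worked example of this metric, the sketch below computes the false positive rate from actual and predicted labels. The label names and the toy predictions are assumptions made for illustration.

```python
def false_positive_rate(actual, predicted, positive="ddos"):
    """FPR = FP / (FP + TN), computed over the events that are actually normal."""
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0

actual = ["benign", "benign", "benign", "benign", "ddos", "ddos"]
predicted = ["benign", "ddos", "benign", "benign", "ddos", "benign"]

# One benign event is flagged as an attack (FP = 1) and three are not (TN = 3).
fpr = false_positive_rate(actual, predicted)
print(fpr)  # 0.25
```

The missed attack in the last position is a false negative and therefore does not affect the false positive rate.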

One major challenge of DDoS attack detection methods is dealing with high dimensional data. High dimensional data require huge computational power as well as longer training periods, and they increase the chance that anomaly-based methods will overfit the data [24]. Feature selection is an important process for efficient and effective learning in enhancing the operational efficiency of machine learning and reducing overfitting and improving the accuracy of the algorithms [11]. The focus of our study is on assessing the effectiveness of DDoS attack detection methods categorically.

3.2. Application of Clustering Algorithms in DDoS Attack Detection

Supervised and unsupervised machine learning approaches are used in detecting DDoS attacks, with clustering algorithms being the most common unsupervised approach [25]. Given the significance of clustering algorithms being considered in DDoS attack detection methods, it is important to develop systematic and categorical investigations of this approach. Cybercrime has become a big business, and stolen data are a significant problem for any business [9]. Clustering leads to low detection rates [26]. This approach is challenging in research, as it can be considered an independent tool in assessing data patterns and finding particular clustering analysis [27]. Clustering-based detection methods assess network traffic data through similarity-based and distance-based approaches via the distance of the data points. These two forms of clustering by themselves cannot perform a density-based analysis of network traffic data. However, they can be used together to perform density-based analysis. We have used ‘MakeDensityBasedClusterer’ provided by Weka for this purpose. Density-based cluster analysis can be considered one approach that applies a cut among data points based on the density level obtained from a probability function [28]. According to Mondragón et al. [29], the robustness of density-based clustering against noise, as well as its enhanced quality in clustering, have been demonstrated. The calculation is based on the analysis of data points with respect to clusters given a certain number of objects k (a predetermined threshold) considering the radius of a neighborhood [30].
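The neighborhood calculation mentioned at the end of this paragraph can be illustrated with a short sketch that marks a point as lying in a dense region when at least k points fall within a given radius of it. This shows only the general neighborhood-count idea; it is not the procedure implemented by ‘MakeDensityBasedClusterer’, and the function name, radius, and toy points are assumptions.

```python
from math import dist

def core_points(points, radius, k):
    """Indices of points with at least k points (itself included) within radius."""
    return [
        i for i, p in enumerate(points)
        if sum(1 for q in points if dist(p, q) <= radius) >= k
    ]

# Three points close together and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
dense = core_points(points, radius=0.2, k=3)
print(dense)  # [0, 1, 2]
```

The isolated point has too few neighbors to count as part of a dense region, which is how density-based approaches separate noise from clusters.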

For the proper categorization of data points, clustering algorithms, whether density-based or not density-based, need suitable feature selection methods. The curse of dimensionality is a problem in DDoS attack detection methods [12]. This problem is due to the large volume of network traffic features that have a negative impact on the performance of DDoS attack detection methods [31]. The aim of these attack detection methods is to have network traffic data between categories at their maximum distances, while the distances within clusters should be at their minimum [12]. This enables the methods to identify clusters of data points. The Zeinalpour study [12] investigated the addition of the filter and wrapper methods prior to the clustering algorithms. This study compared the performance of clustering-based DDoS attack detection methods when the filter method was applied in contrast to when the wrapper method that was incorporated after the filter method was used. The application of the wrapper method after the filter method made the feature selection process hybrid. The study [31] took the investigation further to perform one-way ANOVA statistical analyses, and found that the wrapper method had slightly better performance than the filter method.

The study [1] compared the hybrid approach to when only the wrapper method was incorporated in clustering-based DDoS attack detection methods. This study found that the “BestFirst” search method outperformed the metaheuristic search methods in searching the feature space for optimal solutions. This was in accordance with the one-way ANOVA statistical analyses when the search methods were incorporated into the wrapper method in the considered approaches.

Metaheuristics-driven DDoS detection has drawn increasing attention, with optimization algorithms like the Whale Optimization Algorithm (WOA) [27], Firefly Search Algorithm (FSA) [1], and ensemble methods [31] being applied to enhance clustering performance. For instance, Shakil et al. [32] employed WOA to dynamically adjust clustering centroids for Software-Defined Networking (SDN)-based DDoS detection. Likewise, Zeinalpour and McElroy [1] and Zeinalpour and Ahmed [31] explored feature selection-based optimizations combined with clustering, but these studies primarily focused on improving parameters or reducing false positive rates rather than developing a taxonomy of clustering approaches. Furthermore, limited evaluation with key metrics (e.g., silhouette scores) and the sparse inclusion of real-world datasets restrict the applicability of these works.

Clustering techniques, commonly employed for unsupervised anomaly detection, have demonstrated effectiveness in this domain when integrated with metaheuristics—optimization methods aimed at improving clustering adaptability and efficiency in large-scale datasets. However, the existing literature lacks a systematic taxonomy of clustering techniques specifically tailored to DDoS attack detection, evaluated comprehensively using internal and external validation metrics (e.g., silhouette scores, the F-measure, and the true positive rate (TPR)) across both simulated and real-world datasets.

Several studies explore specific clustering methods for DDoS detection, but fall short of generalization or taxonomy development. Many studies have overlooked the significant role that feature selection plays in the effectiveness of detection models, with most attention being paid to increasing the accuracy and performance of detection models [7]. For example, Bhaya and Manaa [33,34] proposed early clustering-based approaches using unsupervised methods such as k-means and CURE, providing high accuracy (>99%) and a high F-measure (97.98%) in DDoS detection using CAIDA datasets. Modified or hybrid clustering methods, such as entropy-enhanced approaches [35,36] and non-parametric clustering [37], address issues like dynamic detection thresholds and overlapping data characteristics, but remain focused on specific algorithmic improvements without a broader framework. Similarly, Gu et al. [38] introduced a semi-supervised weighted k-means method leveraging hybrid feature selection, tested extensively on simulated (DARPA, CAIDA, and CICIDS2017) and real-world datasets, demonstrating superior performance metrics. However, these studies do not generalize findings into comprehensive taxonomies. The argument of focusing only on algorithmic improvement has been made by Ali et al. [7] as well. For example, research studies [39,40,41] examined a three-stage deep learning model, deep learning in detecting ICMPv6 DDoS attacks, and a Pelican deep learning model in evaluating performance, respectively. As another example, Ali et al. [7] point to [42], in which ensemble feature selection was applied to enhance accuracy. Also, efforts to systematically adopt performance metrics, such as the F-measure and TPR, remain inconsistent. While studies like [33,34,36] incorporate the F-measure (up to 97.98%) or detection rates (~96–98%), other key metrics, such as silhouette scores, are rarely utilized. 
To stay consistent and build upon the studies [1,12,31], as it relates to clustering-based DDoS attack detection methods, we considered false positive rates as being a key and underutilized metric.

Additionally, few works test approaches across both simulated and real-world datasets; exceptions include Gu et al. [38], who integrate dataset diversity, and Feng et al. [43], who enhance adaptability through explainable clustering methods evaluated in both contexts. Despite these advancements, no study comprehensively investigates clustering techniques to address the analysis of large volumes of network traffic data more effectively. These gaps highlight the need for a structured taxonomy of clustering techniques for DDoS detection. This work seeks to consolidate the existing research and identify avenues for a systematic approach to clustering technique classification, emphasizing performance benchmarking and practical applicability.

The filter method, as opposed to the wrapper method, evaluates attributes individually without relying on predictive models [44]. Wrapper-based methods, which evaluate feature subsets iteratively using predictive models, are particularly promising for optimizing detection metrics such as accuracy, false positives (FPs), and false negatives (FNs). According to Bhattacharya and Selvakumar [44], because the wrapper method evaluates a subset of features as a group with respect to the class information, a feature within that group can be more informative. In the context of DDoS detection, two distinct approaches to wrapper-based feature selection have garnered attention: supervised learning wrapper methods, which leverage labeled data and classifiers (e.g., decision trees and Random Forest), and clustering-based wrapper methods, which rely on unsupervised clustering and cluster quality indices (e.g., the F-measure and Davies–Bouldin index) to evaluate feature subsets. However, a systematic comparative evaluation of these two paradigms concerning DDoS detection remains underexplored, especially with regard to specific metrics such as detection accuracy, FPs, and FNs.

Several studies have examined the role of clustering-based wrapper methods in DDoS detection. Bhattacharya and Selvakumar proposed LAWRA, a layered clustering-wrapper framework utilizing external cluster validity indices and cooperative game theory to optimize feature selection [45]. This approach demonstrated an improved detection accuracy and F-measure compared to classifier-driven methods, highlighting the potential of clustering-based wrappers in high-dimensional, unlabeled settings. Similarly, Bhattacharya and Selvakumar extended these principles through a multi-weight ranking approach, integrating clustering and filter methods to prioritize features, achieving higher detection accuracy in identifying DDoS and probe attacks [44]. Despite these contributions, limited attention was given to evaluating FNs or directly comparing clustering-based methods with supervised learning approaches in feature selection.

In contrast, supervised learning wrapper methods predominately use labeled datasets, where they efficiently optimize detection models for accuracy and precision. For example, wrapper-based feature selection using algorithms such as Random Forest, Genetic Algorithms, and KNN classifiers has achieved high detection accuracy, often exceeding 96%, in multiple DDoS detection contexts [46,47,48]. These approaches, however, face challenges in overfitting and reduced generalizability to novel attack types, as highlighted in works emphasizing supervised classifiers’ dependency on labeled data [47,48,49]. While supervised methods consistently outperform clustering-based approaches in accuracy, they often neglect metrics critical to DDoS contexts, such as FNs and FPs.

Emerging research suggests that hybrid approaches, combining clustering and supervised learning paradigms, offer a promising middle ground. Studies such as Zeinalpour and Ahmed [31] and Saha et al. [50] demonstrate that ensemble feature selection approaches leveraging insights from both clustering-based and supervised methods can improve detection generalizability and robustness. For instance, in the study [31], the use of the vote classifier with clustering and wrapper-derived features achieved significant reductions in false positives, though direct quantitative comparisons between clustering-based and supervised learning wrapper methods were absent. Similarly, Saha et al. [50] explored a hybrid ensemble framework to unify feature subsets across supervised and unsupervised methodologies, improving feature robustness in DDoS detection models. Despite these advancements, research remains sparse regarding explicit analyses of FPs and FNs in hybrid or comparative evaluations. For example, Saha et al. [50] reflect on the need for an evaluation of their proposed approach using NSL-KDD and CICIDS network traffic datasets. They state that the consideration of various datasets contributes to the effective combination of feature selection and corresponding detection models.

Overall, the current body of literature identifies key strengths and weaknesses in both clustering-based and supervised learning wrapper methods for DDoS detection. Clustering-based wrappers excel in generalizing to high-dimensional or unlabeled data [44,45], while supervised learning wrappers outperform in precision metrics for labeled datasets in the studies of Bouzoubaa et al. [46] and Polat et al. [48]. Hybrid frameworks show the potential to balance these strengths [31,50], yet comprehensive comparative evaluations addressing FPs and FNs across paradigms are critically lacking. This gap motivates further investigation into how these methods perform under diverse data and attack conditions to guide the optimization of DDoS detection systems. As outlined in our literature review, current studies mainly focus on improving the performance of learning algorithms rather than taking a systematic approach in introducing robust attack detection methods.

In this study, we considered the k-means and EM clustering algorithms, which represent the not density-based clustering approach. We also used ‘MakeDensityBasedClusterer’, provided by Weka, to wrap these algorithms, which yields a density-based clustering approach. Not density-based and density-based clustering are therefore the two types of clustering approaches used in this study.

The k-means algorithm is a distance-based cluster analysis [12]. The objective of the algorithm is to minimize the within-cluster sum of squares (WCSS) [51]. Based on the study [12], the algorithm initially selects random data points as the center points, whose values are then adjusted by calculation. According to Miniak-Górecka et al. [51], the k-means objective is presented below, where ‘k’ is the number of subsets, ‘x_j’ is a data point belonging to the set ‘X’ of ‘n’ data points, C_{i} is the i-th cluster, with the clusters together including all of the data points, and c_{i} is the center of C_{i}.

WCSS = \sum_{i = 1}^{k} \sum_{x_{j} \in C_{i}} {|x_{j} - c_{i}|}^{2}

(1)
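As a worked illustration of Equation (1), the sketch below computes the within-cluster sum of squares for a fixed assignment of points to clusters. It assumes that each cluster center is the mean of the cluster's points; the function name and the toy data are hypothetical.

```python
def within_cluster_sum_of_squares(clusters):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # Center c_i of cluster C_i, taken here as the coordinate-wise mean.
        center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum(
            sum((p[d] - center[d]) ** 2 for d in range(dim)) for p in points
        )
    return total

clusters = [
    [(0.0, 0.0), (0.0, 2.0)],  # center (0, 1): each point contributes 1
    [(4.0, 4.0), (6.0, 4.0)],  # center (5, 4): each point contributes 1
]
print(within_cluster_sum_of_squares(clusters))  # 4.0
```

The k-means iterations alternate between reassigning points to the nearest center and recomputing the centers, and neither step can increase this quantity.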

According to Yang et al. [52], the EM objective is presented below, where ‘α_{k}’ are the mixing proportions with the given restriction ‘\sum_{k = 1}^{c} α_{k} = 1’, ‘f (x_{i}; θ_{k})’ is the density of ‘x_{i}’ given the kth class with the corresponding parameters θ_{k}, and Z is the missing data indicating the class in ‘C’ to which each data point belongs, with z_{k i} indicating whether x_{i} belongs to the kth class. EM is a similarity-based cluster analysis [1]. As stated by Yang et al. [52], it aims to maximize the log likelihood.

L (α, θ) = \sum_{i = 1}^{n} \sum_{k = 1}^{c} z_{k i} \ln [α_{k} f (x_{i}; θ_{k})]

(2)
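Equation (2) can be evaluated directly once a density is chosen. The sketch below assumes one-dimensional Gaussian components, so that θ_k is a (mean, standard deviation) pair, and hard 0/1 membership indicators. These choices and all names are illustrative assumptions, not details taken from the cited work.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def complete_log_likelihood(xs, z, alphas, thetas):
    """L(alpha, theta) = sum_i sum_k z[k][i] * ln(alpha_k * f(x_i; theta_k))."""
    total = 0.0
    for i, x in enumerate(xs):
        for k, (alpha, (mean, std)) in enumerate(zip(alphas, thetas)):
            if z[k][i]:  # skip zero memberships so ln is never multiplied by 0
                total += z[k][i] * math.log(alpha * gaussian_pdf(x, mean, std))
    return total

xs = [0.0, 10.0]
alphas = [0.5, 0.5]                # mixing proportions, summing to 1
thetas = [(0.0, 1.0), (10.0, 1.0)]  # (mean, std) of each component

matching = [[1, 0], [0, 1]]  # each point assigned to the component centered on it
swapped = [[0, 1], [1, 0]]   # each point assigned to the distant component

good = complete_log_likelihood(xs, matching, alphas, thetas)
bad = complete_log_likelihood(xs, swapped, alphas, thetas)
print(good > bad)  # True
```

In the full EM algorithm the memberships are replaced by their expected values in the E-step, and the M-step updates α and θ to increase this quantity.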

4. Data Analysis and Experimentation

4.1. Statistical Analysis Using One-Way ANOVA Considering the CICIDS2017 Dataset

We used the same data preprocessing techniques that Zeinalpour [12] initially introduced and applied in his study. The same techniques were used in the studies [1,31] to ensure the proper analysis of the CICIDS2017 network traffic dataset using clustering-based DDoS detection methods. Figure 1 below represents how the clustering-based DDoS attack detection methods are constructed to test the two hypotheses.

For the NSL-KDD dataset, we used the same three data preprocessing techniques that Zeinalpour [12] initially introduced and applied in his study. As in their application to the CICIDS2017 dataset, these preprocessing techniques ensured that the dataset was normalized, balanced, and randomized for proper analysis by machine learning algorithms. We did not apply EM imputation on the NSL-KDD dataset. Figure 2 below represents the way in which the clustering-based DDoS attack detection methods are constructed to test and verify the methods considering the two hypotheses.

Experimental limitations are problematic in cybersecurity research studies and reflect factors or experimental circumstances that cannot be controlled. Limitations are issues for internal validity [53]. The curse of dimensionality is a limitation of this study. According to the study [31], assessing a large amount of network traffic data can reduce the performance of anomaly-based DDoS attack detection methods. To address this issue, we applied the wrapper and hybrid feature selection methods to select relevant attributes. According to Zeinalpour and Ahmed [31], implications reflect delimitations and assumptions. The delimitation of this study was supervised DDoS attack detection methods. Supervised learning algorithms, in contrast to unsupervised ones, can improve model performance [54]. In general, supervised DDoS attack detection methods are more robust because they are trained on labeled data. The assumption of this study was that the results from analyzing the CICIDS2017 dataset reflect real-world performance in identifying DDoS attacks. Frequent and known network protocols were used to generate the dataset [55]. We also used NSL-KDD to further confirm the statistical testing of the hypotheses. To ensure that our experimentation does not introduce any bias, we considered internal, predictive, conclusion, and external validity. We used the Weka workbench to guarantee internal validity. This tool has a modular architecture and supports the entire process of data mining experimentation [56]. To guarantee predictive and conclusion validity, we applied a ten-fold cross-validation method, and we used the entire CICIDS2017 dataset to ensure external validity. For further verification, we also used the full training dataset of NSL-KDD.

We conducted one-way ANOVA analyses to assess the effectiveness of DDoS attack detection methods. One-way ANOVA uses a factor variable that specifies the types of groups or levels, and a dependent variable that measures each level on a quantitative dimension [57]. This allowed us to reflect on the two research questions for testing the corresponding hypotheses. ‘FS’ denotes feature selection in the table names. The first research question was whether incorporating clustering-based wrapper methods differs in effectiveness from incorporating supervised learning wrapper methods in the clustering-based detection of DDoS attacks. We considered the one-way ANOVA F-test. The outcomes of the first experiment are shown in Table 1. The results show that the test was significant, with F(1, 26) = 10.55 and p = 0.003. The p-value, shown in the “Sig” column, was less than 0.05, leading us to reject the null hypothesis. The null hypothesis was that there is no difference in effectiveness when incorporating clustering-based wrapper methods against supervised learning wrapper methods in the clustering-based detection of DDoS attacks. The η², shown in the “Partial Eta Squared” column with a value of 0.29, indicates a strong relationship among the DDoS attack detection methods when incorporating clustering-based wrapper methods in contrast to supervised learning wrapper methods.

The second research question was whether incorporating clustering-based hybrid feature selection methods differs in effectiveness from incorporating supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. We considered the one-way ANOVA F-test. The outcomes are shown in Table 2. The results show that the test was significant, with F(1, 54) = 10.04 and p = 0.003. The p-value, shown in the “Sig” column, was less than 0.05, leading us to reject the null hypothesis. The null hypothesis was that there is no difference in effectiveness when incorporating clustering-based hybrid feature selection methods against supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. The η², shown in the “Partial Eta Squared” column with a value of 0.16, indicates a strong relationship among the DDoS attack detection methods when incorporating clustering-based hybrid feature selection methods against supervised learning hybrid feature selection methods.

4.2. Statistical Analysis Using One-Way ANOVA Considering NSL-KDD Dataset

To further verify our first hypothesis, we applied the one-way ANOVA F-test to the NSL-KDD dataset. The outcomes of the experiment are shown in Table 3. The results show that the test was significant, with F(1, 26) = 12.77 and p = 0.001. The p-value, shown in the “Sig” column, was less than 0.05, leading us to reject the null hypothesis using the NSL-KDD dataset. The null hypothesis was that there is no difference in effectiveness when incorporating clustering-based wrapper methods against supervised learning wrapper methods in the clustering-based detection of DDoS attacks. The η², shown in the “Partial Eta Squared” column with a value of 0.33, indicates a strong relationship among the DDoS attack detection methods when incorporating clustering-based wrapper methods in contrast to supervised learning wrapper methods.

To further verify the second hypothesis, we applied the one-way ANOVA F-test to the NSL-KDD dataset. The outcomes are shown in Table 4. The results show that the test was significant, with F(1, 54) = 15.51 and p = 0.001. The p-value, shown in the “Sig” column, was less than 0.05, leading us to reject the null hypothesis considering the NSL-KDD dataset. The null hypothesis was that there is no difference in effectiveness when incorporating clustering-based hybrid feature selection methods against supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. The η², shown in the “Partial Eta Squared” column with a value of 0.22, indicates a strong relationship among the DDoS attack detection methods when incorporating clustering-based hybrid feature selection methods against supervised learning hybrid feature selection methods.
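The reported effect sizes can be cross-checked against the reported F statistics. For a one-way design, partial η² equals F·df_effect / (F·df_effect + df_error). The short sketch below applies this identity to the four tests reported above; the function name and the dictionary keys are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (F, df_effect, df_error, reported partial eta squared)
reported_tests = {
    "Table 1, CICIDS2017, wrapper": (10.55, 1, 26, 0.29),
    "Table 2, CICIDS2017, hybrid": (10.04, 1, 54, 0.16),
    "Table 3, NSL-KDD, wrapper": (12.77, 1, 26, 0.33),
    "Table 4, NSL-KDD, hybrid": (15.51, 1, 54, 0.22),
}

for name, (f_value, df1, df2, reported) in reported_tests.items():
    computed = round(partial_eta_squared(f_value, df1, df2), 2)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

All four computed values agree with the reported ones after rounding to two decimals.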

4.3. Comparison Analysis Based on Descriptive Statistics Using CICIDS2017 Dataset

The mean results of the descriptive statistics presented in Table 5 correspond to the first experiment. The table shows that incorporating clustering-based wrapper methods was more effective than the supervised learning wrapper methods in the clustering-based detection of DDoS attacks. This is in accordance with the mean value of 0.17 for clustering-based wrapper methods against the mean value of 0.40 for supervised learning wrapper methods when constructing clustering-based DDoS attack detection methods.

The mean results of the descriptive statistics presented in Table 6 are related to the second experiment. The table shows that incorporating clustering-based hybrid feature selection methods was more effective than supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. This is in accordance with the mean value of 0.18 for clustering-based hybrid feature selection methods against the mean value of 0.35 for supervised learning hybrid feature selection methods when constructing clustering-based DDoS attack detection methods.
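The two clustering-based means quoted above can be reproduced from the false positive rates listed in Appendix B. The sketch below simply transcribes Table A2 and Table A3 and averages them; the variable names are hypothetical.

```python
from statistics import mean

# Table A2: clustering-based wrapper methods (16 configurations).
table_a2 = [
    0.002, 0.027, 0.216, 0.282,
    0.086, 0.121, 0.005, 0.083,
    0.004, 0.008, 0.299, 0.332,
    0.636, 0.636, 0.000, 0.011,
]

# Table A3: clustering-based hybrid feature selection methods (32 configurations).
table_a3 = [
    0.002, 0.027, 0.216, 0.282,
    0.006, 0.003, 0.102, 0.098,
    0.290, 0.290, 0.267, 0.260,
    0.000, 0.000, 0.006, 0.033,
    0.003, 0.008, 0.309, 0.331,
    0.636, 0.636, 0.000, 0.011,
    0.000, 0.009, 0.359, 0.343,
    0.626, 0.625, 0.000, 0.064,
]

wrapper_mean = round(mean(table_a2), 2)
hybrid_mean = round(mean(table_a3), 2)
zero_fpr_cases = sum(1 for rate in table_a2 + table_a3 if rate == 0.0)

print(wrapper_mean)    # 0.17
print(hybrid_mean)     # 0.18
print(zero_fpr_cases)  # 6
```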

4.4. Comparison Analysis Based on Descriptive Statistics Using NSL-KDD Dataset

The mean results of the descriptive statistics presented in Table 7 correspond to the first experiment. The table shows that incorporating clustering-based wrapper methods was more effective than supervised learning wrapper methods in the clustering-based detection of DDoS attacks. This is in accordance with the mean value of 0.05 for clustering-based wrapper methods against the mean value of 0.13 for supervised learning wrapper methods when constructing clustering-based DDoS attack detection methods.

The mean results of the descriptive statistics presented in Table 8 are with respect to the second experiment. The table shows that incorporating clustering-based hybrid feature selection methods was more effective than supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. This is in accordance with the mean value of 0.04 for clustering-based hybrid feature selection methods against the mean value of 0.09 for supervised learning hybrid feature selection methods when constructing clustering-based DDoS attack detection methods.

5. Discussion

This study conducted a comparative analysis of feature selection methods in clustering-based DDoS attack detection methods. Service availability is crucial for computer networks, and machine learning algorithms offer promising tactics to counter DDoS attacks [2]. Arango-López et al. [58] mention that modern DDoS attacks are launched through a combination of methods concurrently, making detection challenging. Identifying these attacks can be difficult before encountering them [59]. Network traffic data is dynamic, and with the presence of multiple attack protocols, there is a necessity for robust defense mechanisms [14]. Rishkhan et al. [4] claim that machine learning-based intrusion detection methods are one crucial strategy in preventing information security attacks. DDoS attacks pose major challenges to organizational networks. These challenges include service interruptions, the exposure of network vulnerabilities to hackers, and increased risk of data loss and data theft. Initially, Zeinalpour [12] examined the application of the wrapper method and the hybrid feature selection prior to clustering algorithms in DDoS attack detection. The hybrid feature selection approach used the filter method followed by the wrapper method. He could not verify which approach was more effective. Nevertheless, the study [12] found that the addition of the hybrid approach that incorporated ChiSquared and NaïveBayes was more effective. It had the lowest false positive rate of 0.013. The study [31] took the investigation further by conducting one-way ANOVA analyses, comparing the addition of the wrapper method and hybrid approach prior to clustering algorithms using the vote classifier method. The results of the descriptive analyses from one-way ANOVA showed that the addition of the wrapper method introduced more effectiveness than just applying the filter method. 
Likewise, the results of the study [31] showed that the addition of the filter method prior to the wrapper method that incorporated ChiSquared and J48 (a decision tree classifier) was more effective. This approach to selecting features produced the lowest false positive rate of 0.012. In a similar endeavor, the one-way ANOVA results in the study [1] showed that the BestFirst search method outperformed metaheuristic search techniques when using the wrapper method. The lowest false positive rate in the study [1] was obtained when Information Gain and the k-means clustering algorithm were applied in the filter and wrapper methods, respectively, prior to the clustering algorithms. This method obtained a false positive rate of 0.000.

Density-based clustering has shown promise in network intrusion detection models. Kaliyaperumal et al. [60] examined the performance of DBSCAN alone, a density-based clustering algorithm, using the CICIDS2018 network traffic dataset. The obtained specificity was 0.9752, which is equivalent to a false positive rate of 0.0248. When Kaliyaperumal et al. [60] proposed a novel way to use DBSCAN and assessed it using the CICIDS2017 dataset, the obtained specificity was 0.9806, which is equivalent to a false positive rate of 0.0194. The same proposed approach applied by Kaliyaperumal et al. [60] to CICIDS2018 had a specificity of 0.9814, which corresponds to a false positive rate of 0.0186. When Emadi and Mazinani [61] evaluated the performance of DBSCAN as a density-based approach, the highest accuracy they achieved was 95.5%. This was lower than the highest accuracy of 98.88% obtained by Kaliyaperumal et al. [60], with a specificity of 0.9814.
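The conversions above rely on the identity FPR = 1 − specificity, since specificity = TN / (TN + FP) and FPR = FP / (FP + TN). A small sketch with hypothetical function names reproduces the three conversions and checks the identity on made-up confusion counts.

```python
def fpr_from_specificity(specificity):
    """False positive rate implied by a specificity value."""
    return 1.0 - specificity

def rates_from_counts(true_negatives, false_positives):
    """Specificity and false positive rate from confusion-matrix counts."""
    negatives = true_negatives + false_positives
    return true_negatives / negatives, false_positives / negatives

for specificity in (0.9752, 0.9806, 0.9814):
    print(specificity, round(fpr_from_specificity(specificity), 4))
# 0.9752 0.0248
# 0.9806 0.0194
# 0.9814 0.0186

# Hypothetical counts, used only to illustrate that the two rates sum to 1.
spec, fpr = rates_from_counts(true_negatives=980, false_positives=20)
print(spec, fpr)  # 0.98 0.02
```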

However, feature selection is extremely important for the efficacy and effectiveness of network intrusion detection methods. In this research study, we analyzed two variations of the clustering-based detection method, i.e., not density-based clustering and density-based clustering. We applied the wrapper method and the hybrid approach (filter–wrapper) and compared the performance of DDoS attack detection methods. We examined whether incorporating supervised learning, as opposed to a clustering approach (not density-based or density-based), in the wrapper method would impact the performance of the detection methods. Given the results of the two experiments, we found that incorporating the clustering approach in the wrapper method had a greater impact on the performance of clustering-based DDoS attack detection methods in terms of lowering the false positive rates. The one-way ANOVA analyses show statistical significance in that regard.

We were also able to obtain a false positive rate of 0.000 in six cases. The first case was when we incorporated a density-clustering-based wrapper method using SimpleKMeans prior to not density-based clustering using SimpleKMeans to identify DDoS attacks. The second and third cases were when we applied a not density-clustering-based hybrid method using Information Gain and SimpleKMeans for feature selection; they occurred when we used density-based and not density-based clustering using EM in attack detection. The fourth case was when we applied a density-clustering-based hybrid method using ChiSquared and SimpleKMeans prior to not density-based clustering using SimpleKMeans in DDoS attack detection. The fifth case was when we incorporated a density-clustering-based hybrid method using Information Gain and EM prior to not density-based clustering using EM. The sixth and final case was when we applied a density-clustering-based hybrid method using Information Gain and SimpleKMeans prior to not density-based clustering using SimpleKMeans in DDoS attack detection. In none of the considered cases of applying supervised learning algorithms in the wrapper method were the DDoS attack detection methods able to obtain a false positive rate of 0.000. In general, clustering algorithms are effective techniques that categorize data points using the centroid, or mean, of each group. In this case, similar network traffic data points that are considered normal are in one group, while similar data points that belong to DDoS attack events are categorized in another group.
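One way to turn cluster assignments into a false positive rate is sketched below. It assumes that each cluster is mapped to the majority class of its members, which resembles a classes-to-clusters style of evaluation, and that a false positive is a normal event placed in a cluster mapped to the attack class. The mapping rule, the label strings, and the function name are assumptions made for illustration and are not confirmed as the exact procedure used in this study.

```python
from collections import Counter

def cluster_false_positive_rate(assignments, labels, attack="DDoS"):
    """FPR of the attack class after mapping each cluster to its majority label."""
    counts_by_cluster = {}
    for cluster, label in zip(assignments, labels):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1
    # Majority vote: every cluster is labeled with its most frequent class.
    cluster_to_class = {
        cluster: counts.most_common(1)[0][0]
        for cluster, counts in counts_by_cluster.items()
    }
    false_pos = sum(
        1 for c, y in zip(assignments, labels)
        if cluster_to_class[c] == attack and y != attack
    )
    true_neg = sum(
        1 for c, y in zip(assignments, labels)
        if cluster_to_class[c] != attack and y != attack
    )
    negatives = false_pos + true_neg
    return false_pos / negatives if negatives else 0.0

# Toy example: cluster 0 is mostly normal traffic, cluster 1 is mostly attack traffic.
assignments = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["BENIGN", "BENIGN", "BENIGN", "DDoS", "DDoS", "DDoS", "DDoS", "BENIGN"]
print(cluster_false_positive_rate(assignments, labels))  # 0.25
```

A false positive rate of 0.000 under this scheme means that no normal event falls into a cluster mapped to the attack class.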

Overfitting is a major problem for machine learning algorithms: it occurs when learning models are trained so closely on one set of network traffic events that they are not readily generalizable. Data quality is also a major concern. We used min–max normalization to facilitate the construction of an accurate learning model and the EM imputation method to fix missing values. We also applied SpreadSubSample and a randomization approach to prevent bias introduced by the datasets during predictive analysis. Consequently, the results that we achieved come from the generalization approach of ten-fold cross-validation, used to ensure the validity and accuracy of the results.
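The preprocessing steps named above can be sketched in plain Python. Min–max normalization rescales each attribute to the range [0, 1]. The balancing step below under-samples every class to the size of the smallest class and then shuffles the result, which is only an approximation of what a spread-based sub-sampling filter followed by randomization might do. The function names, the seed, and the equal-class-size target are assumptions, and EM imputation is not covered.

```python
import random
from collections import defaultdict

def min_max_normalize(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]

def balance_and_shuffle(rows, labels, seed=1):
    """Under-sample each class to the smallest class size, then shuffle the instances."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        rows_by_class[label].append(row)
    target = min(len(members) for members in rows_by_class.values())
    sample = [
        (row, label)
        for label, members in rows_by_class.items()
        for row in rng.sample(members, target)
    ]
    rng.shuffle(sample)
    return sample

normalized = min_max_normalize([2.0, 4.0, 6.0])
print(normalized)  # [0.0, 0.5, 1.0]

rows = [[0.1], [0.2], [0.3], [0.4], [0.9], [0.8]]
labels = ["BENIGN", "BENIGN", "BENIGN", "BENIGN", "DDoS", "DDoS"]
balanced = balance_and_shuffle(rows, labels)
print(len(balanced))  # 4
```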

Analysis of Mechanism Effectiveness

In this study, we gathered false positive rates in relation to the categorization or classification of only DDoS events. Our clustering-based wrapper method demonstrates superior performance primarily through better feature space representation. Clustering algorithms maintain a better representation of network traffic patterns by preserving cluster separability (measured using the silhouette coefficient), while supervised approaches focus narrowly on class discrimination.

This fundamental difference explains the significantly lower false positive rates achieved using clustering-based methods. Specifically, clustering methods excel at modeling the inherent structure of network traffic patterns rather than making binary classifications, which proves particularly effective when attack patterns form distinct clusters in the feature space but overlap with normal traffic in individual feature dimensions. The CICIDS2017 dataset initially contained 78 features (after removing duplicates), while NSL-KDD contained 41 features. In the NSL-KDD dataset, the best performance, an FPR of 0.003, was obtained using thirteen features. Our top-performing configurations for the CICIDS2017 dataset typically retained three, five, six, or fifteen features, with the best performance being an FPR of 0.000. These feature subsets provided sufficient discriminative power while avoiding the curse of dimensionality. This analysis demonstrates the effectiveness of our feature selection approaches, with respect to the highest performance, in balancing model complexity and accuracy, a critical consideration for real-time DDoS detection systems where computational efficiency is important. Appendix D presents the tables of the features selected by the clustering techniques applied in the wrapper method that led to the best false positive rate of 0.000 using the CICIDS2017 dataset and the best false positive rate of 0.003 using the NSL-KDD dataset.
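The silhouette coefficient mentioned above as a measure of cluster separability can be sketched as follows. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the point's score is (b − a) / max(a, b). The implementation is a minimal illustration with a hypothetical function name and toy data; it assumes Euclidean distance, at least two clusters, and a score of 0 for singleton clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires two or more clusters)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    clusters = sorted(set(labels))
    scores = []
    for i, (point, own) in enumerate(zip(points, labels)):
        same = [
            dist(point, other)
            for j, (other, label) in enumerate(zip(points, labels))
            if label == own and j != i
        ]
        if not same:
            scores.append(0.0)  # convention for clusters with a single member
            continue
        a = sum(same) / len(same)
        # Smallest mean distance from this point to any other cluster.
        b = min(
            sum(dist(point, q) for q, label in zip(points, labels) if label == c)
            / sum(1 for label in labels if label == c)
            for c in clusters
            if c != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
separated = silhouette_score(points, [0, 0, 1, 1])  # compact, distant clusters
mixed = silhouette_score(points, [0, 1, 0, 1])      # clusters that interleave
print(separated > mixed)  # True
```

Values close to 1 indicate well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own.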

6. Conclusions

This study contributes to improving DDoS attack detection methods by assessing the incorporation of the most effective combinations of feature selection that considered supervised learning and clustering algorithms. In this respect, various organizations can implement DDoS attack detection methods that are more likely to have a high performance in countering attacks. In today’s modernization of internet communication, it is essential to have the best countermeasures against this type of attack. With constant modernization, the complexity of network traffic data analysis increases. Therefore, feature selection is extremely important in intrusion detection methods.

In this study, we compared the results obtained from supervised learning with clustering approaches that included not density-based and density-based approaches in the wrapper method. The comparative analyses were based on the false positive rates obtained by DDoS attack detection methods. The outcomes of the one-way ANOVA analyses showed that the wrapper method performs more effectively using clustering algorithms for feature selection than using supervised learning. DDoS attack detection methods that apply clustering algorithms suffer from the curse of dimensionality due to the high dimensionality of network traffic data [12]. Therefore, proceeding with the appropriate feature selection processes is essential. Analyzing the large volume of data that these detection models must process to counter the attacks is problematic [1]. As a result, the categorical investigation of clustering-based detection models remains an important research stream. With respect to the outcomes of our research study, we found that clustering algorithms were effective in clustering-based DDoS attack detection methods. Given the importance of the categorical investigation of feature selection due to the need for proper analysis of network traffic data, future studies can take the findings of this research study further. For example, the considered feature selection methods can be further evaluated with other state-of-the-art detection methods, such as deep learning approaches. In this study, we found that clustering-based feature selection is more effective for clustering-based DDoS attack detection methods. A future study could examine whether the effectiveness of clustering-based feature selection, established through statistical analyses in this study, holds for other state-of-the-art detection methods. Ensuring the robustness of DDoS attack detection methods is important. 
The dynamic nature of attacks, along with the use of multiple attack protocols, necessitates robust defense mechanisms [14]. This can ensure that DDoS attack detection methods, through the consideration of different machine learning frameworks, are able to deal with the complexity of network traffic data throughout internet communication.

Author Contributions

Conceptualization, A.Z. and C.P.M.; methodology, A.Z.; validation, A.Z.; original draft, A.Z. and C.P.M.; writing-review and editing, C.P.M. All authors have read and agreed to the published version of the manuscript.

Funding

The Authors received no external funding.

Data Availability Statement

The authors of this study used the CICIDS2017 and NSL-KDD datasets. The CICIDS2017 dataset is publicly available at https://www.unb.ca/cic/datasets/ids-2017.html, accessed on 3 March 2024. The NSL-KDD dataset is publicly available at https://web.archive.org/web/20150205070216/http://nsl.cs.unb.ca/NSL-KDD/, accessed on 20 April 2025.

Conflicts of Interest

The authors of this study declare no conflicts of interest. The authors of this research guided the study with no sponsorship.

Appendix A. Independent Variables Table

Table A1. Independent Variables Table.

Independent Variables	Procedures
Clustering-Based DDoS Detection Method	Not-Density-Clustering of EM; Not-Density-Clustering of SimpleKMeans; MakeDensityBasedClusterer(EM); MakeDensityBasedClusterer(SimpleKMeans)
Clustering-Based-Wrapper Method	WrapperSubsetEval(Not-Density-Based-Clustering); WrapperSubsetEval(Density-Based-Clustering)
Supervised-Learning-Wrapper Method	WrapperSubsetEval(J48); WrapperSubsetEval(DecisionTable); WrapperSubsetEval(NaïveBayes)
Clustering-Based-Hybrid-Feature-Selection Method	InformationGainAttributeEval + WrapperSubsetEval(Not-Density-Based-Clustering); ChiSquaredAttributeEval + WrapperSubsetEval(Not-Density-Based-Clustering); InformationGainAttributeEval + WrapperSubsetEval(Density-Based-Clustering); ChiSquaredAttributeEval + WrapperSubsetEval(Density-Based-Clustering)
Supervised-Learning-Hybrid-Feature-Selection Method	InformationGainAttributeEval + WrapperSubsetEval(J48); InformationGainAttributeEval + WrapperSubsetEval(DecisionTable); InformationGainAttributeEval + WrapperSubsetEval(NaïveBayes); ChiSquaredAttributeEval + WrapperSubsetEval(J48); ChiSquaredAttributeEval + WrapperSubsetEval(DecisionTable); ChiSquaredAttributeEval + WrapperSubsetEval(NaïveBayes)

Appendix B. Experimental Results Using CICIDS2017 Dataset

Table A2. FPR Table using Clustering Feature Selection.

Applied Clustering-Based Feature Selection	Applied Clustering Methods in DDoS Attack Detection	False Positive Rates
Not-Density-Clustering-based-Wrapper method using EM	Not-Density-based Clustering using EM	0.002
Not-Density-Clustering-based-Wrapper method using EM	Density-based Clustering using EM	0.027
Not-Density-Clustering-based-Wrapper method using EM	Not-Density-based Clustering using SimpleKMeans	0.216
Not-Density-Clustering-based-Wrapper method using EM	Density-based Clustering using SimpleKMeans	0.282
Not-Density-Clustering-based-Wrapper method using SimpleKMeans	Not-Density-based Clustering using EM	0.086
Not-Density-Clustering-based-Wrapper method using SimpleKMeans	Density-based Clustering using EM	0.121
Not-Density-Clustering-based-Wrapper method using SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.005
Not-Density-Clustering-based-Wrapper method using SimpleKMeans	Density-based Clustering using SimpleKMeans	0.083
Density-Clustering-based-Wrapper method using EM	Not-Density-based Clustering using EM	0.004
Density-Clustering-based-Wrapper method using EM	Density-based Clustering using EM	0.008
Density-Clustering-based-Wrapper method using EM	Not-Density-based Clustering using SimpleKMeans	0.299
Density-Clustering-based-Wrapper method using EM	Density-based Clustering using SimpleKMeans	0.332
Density-Clustering-based-Wrapper method using SimpleKMeans	Not-Density-based Clustering using EM	0.636
Density-Clustering-based-Wrapper method using SimpleKMeans	Density-based Clustering using EM	0.636
Density-Clustering-based-Wrapper method using SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.000
Density-Clustering-based-Wrapper method using SimpleKMeans	Density-based Clustering using SimpleKMeans	0.011

Table A3. FPR Table using Clustering Method in Hybrid Feature Selection.

Applied Clustering-Based Hybrid Feature Selection	Applied Clustering Methods in DDoS Attack Detection	False Positive Rates
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM	Not-Density-based Clustering using EM	0.002
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM	Density-based Clustering using EM	0.027
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM	Not-Density-based Clustering using SimpleKMeans	0.216
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM	Density-based Clustering using SimpleKMeans	0.282
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Not-Density-based Clustering using EM	0.006
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Density-based Clustering using EM	0.003
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.102
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Density-based Clustering using SimpleKMeans	0.098
Not-Density-Clustering-based-Hybrid method using Information Gain and EM	Not-Density-based Clustering using EM	0.290
Not-Density-Clustering-based-Hybrid method using Information Gain and EM	Density-based Clustering using EM	0.290
Not-Density-Clustering-based-Hybrid method using Information Gain and EM	Not-Density-based Clustering using SimpleKMeans	0.267
Not-Density-Clustering-based-Hybrid method using Information Gain and EM	Density-based Clustering using SimpleKMeans	0.260
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Not-Density-based Clustering using EM	0.000
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Density-based Clustering using EM	0.000
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.006
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Density-based Clustering using SimpleKMeans	0.033
Density-Clustering-based-Hybrid method using ChiSquared and EM	Not-Density-based Clustering using EM	0.003
Density-Clustering-based-Hybrid method using ChiSquared and EM	Density-based Clustering using EM	0.008
Density-Clustering-based-Hybrid method using ChiSquared and EM	Not-Density-based Clustering using SimpleKMeans	0.309
Density-Clustering-based-Hybrid method using ChiSquared and EM	Density-based Clustering using SimpleKMeans	0.331
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Not-Density-based Clustering using EM	0.636
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Density-based Clustering using EM	0.636
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.000
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Density-based Clustering using SimpleKMeans	0.011
Density-Clustering-based-Hybrid method using Information Gain and EM	Not-Density-based Clustering using EM	0.000
Density-Clustering-based-Hybrid method using Information Gain and EM	Density-based Clustering using EM	0.009
Density-Clustering-based-Hybrid method using Information Gain and EM	Not-Density-based Clustering using SimpleKMeans	0.359
Density-Clustering-based-Hybrid method using Information Gain and EM	Density-based Clustering using SimpleKMeans	0.343
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Not-Density-based Clustering using EM	0.626
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Density-based Clustering using EM	0.625
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.000
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Density-based Clustering using SimpleKMeans	0.064

Table A4. FPR Table using Supervised Learning in Wrapper Feature Selection.

Applied Supervised-Learning Wrapper Feature Selection	Applied Clustering Methods in DDoS Attack Detection	False Positive Rates
Supervised-Learning-Wrapper method using NaïveBayes	Not-Density-based Clustering using EM	0.340
Supervised-Learning-Wrapper method using NaïveBayes	Density-based Clustering using EM	0.344
Supervised-Learning-Wrapper method using NaïveBayes	Not-Density-based Clustering using SimpleKMeans	0.200
Supervised-Learning-Wrapper method using NaïveBayes	Density-based Clustering using SimpleKMeans	0.209
Supervised-Learning-Wrapper method using J48	Not-Density-based Clustering using EM	0.381
Supervised-Learning-Wrapper method using J48	Density-based Clustering using EM	0.380
Supervised-Learning-Wrapper method using J48	Not-Density-based Clustering using SimpleKMeans	0.511
Supervised-Learning-Wrapper method using J48	Density-based Clustering using SimpleKMeans	0.490
Supervised-Learning-Wrapper method using DecisionTable	Not-Density-based Clustering using EM	0.356
Supervised-Learning-Wrapper method using DecisionTable	Density-based Clustering using EM	0.356
Supervised-Learning-Wrapper method using DecisionTable	Not-Density-based Clustering using SimpleKMeans	0.674
Supervised-Learning-Wrapper method using DecisionTable	Density-based Clustering using SimpleKMeans	0.638
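
The twelve rates in Table A4 are the raw observations behind the "Supervised Wrapper" row of Table 5. As a minimal sketch, assuming only the Python standard library (the variable names are chosen here for illustration and do not come from the study), the sample mean and sample standard deviation can be recomputed from the table:

```python
from statistics import mean, stdev

# False positive rates copied from Table A4 (CICIDS2017, supervised wrapper),
# in row order: four detection methods per supervised learner.
table_a4_fpr = [
    0.340, 0.344, 0.200, 0.209,   # NaiveBayes
    0.381, 0.380, 0.511, 0.490,   # J48
    0.356, 0.356, 0.674, 0.638,   # DecisionTable
]

n = len(table_a4_fpr)
fpr_mean = mean(table_a4_fpr)
fpr_std = stdev(table_a4_fpr)     # sample standard deviation (n - 1 denominator)

print(f"N = {n}, mean = {fpr_mean:.5f}, std = {fpr_std:.6f}")
```

With these inputs the results agree with the values reported in Table 5 (mean 0.40658, standard deviation 0.147547, N = 12) to the reported precision.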

Table A5. FPR Table using Supervised Learning in Hybrid Feature Selection.

Applied Supervised-Learning Hybrid Feature Selection	Applied Clustering Methods in DDoS Attack Detection	False Positive Rates
Hybrid Feature Selection using ChiSquared and NaïveBayes	Not-Density-based Clustering using EM	0.340
Hybrid Feature Selection using ChiSquared and NaïveBayes	Density-based Clustering using EM	0.344
Hybrid Feature Selection using ChiSquared and NaïveBayes	Not-Density-based Clustering using SimpleKMeans	0.200
Hybrid Feature Selection using ChiSquared and NaïveBayes	Density-based Clustering using SimpleKMeans	0.209
Hybrid Feature Selection using Information Gain and NaïveBayes	Not-Density-based Clustering using EM	0.001
Hybrid Feature Selection using Information Gain and NaïveBayes	Density-based Clustering using EM	0.001
Hybrid Feature Selection using Information Gain and NaïveBayes	Not-Density-based Clustering using SimpleKMeans	0.199
Hybrid Feature Selection using Information Gain and NaïveBayes	Density-based Clustering using SimpleKMeans	0.198
Hybrid Feature Selection using ChiSquared and J48	Not-Density-based Clustering using EM	0.392
Hybrid Feature Selection using ChiSquared and J48	Density-based Clustering using EM	0.391
Hybrid Feature Selection using ChiSquared and J48	Not-Density-based Clustering using SimpleKMeans	0.373
Hybrid Feature Selection using ChiSquared and J48	Density-based Clustering using SimpleKMeans	0.367
Hybrid Feature Selection using Information Gain and J48	Not-Density-based Clustering using EM	0.326
Hybrid Feature Selection using Information Gain and J48	Density-based Clustering using EM	0.326
Hybrid Feature Selection using Information Gain and J48	Not-Density-based Clustering using SimpleKMeans	0.372
Hybrid Feature Selection using Information Gain and J48	Density-based Clustering using SimpleKMeans	0.369
Hybrid Feature Selection using ChiSquared and DecisionTable	Not-Density-based Clustering using EM	0.362
Hybrid Feature Selection using ChiSquared and DecisionTable	Density-based Clustering using EM	0.362
Hybrid Feature Selection using ChiSquared and DecisionTable	Not-Density-based Clustering using SimpleKMeans	0.674
Hybrid Feature Selection using ChiSquared and DecisionTable	Density-based Clustering using SimpleKMeans	0.638
Hybrid Feature Selection using Information Gain and DecisionTable	Not-Density-based Clustering using EM	0.362
Hybrid Feature Selection using Information Gain and DecisionTable	Density-based Clustering using EM	0.362
Hybrid Feature Selection using Information Gain and DecisionTable	Not-Density-based Clustering using SimpleKMeans	0.674
Hybrid Feature Selection using Information Gain and DecisionTable	Density-based Clustering using SimpleKMeans	0.638

Appendix C. Experimental Results Using NSL-KDD Dataset

Table A6. FPR Table using Clustering Feature Selection.

Applied Clustering-Based Feature Selection	Applied Clustering Methods in DDoS Attack Detection	False Positive Rates
Not-Density-Clustering-based-Wrapper method using EM	Not-Density-based Clustering using EM	0.017
Not-Density-Clustering-based-Wrapper method using EM	Density-based Clustering using EM	0.031
Not-Density-Clustering-based-Wrapper method using EM	Not-Density-based Clustering using SimpleKMeans	0.090
Not-Density-Clustering-based-Wrapper method using EM	Density-based Clustering using SimpleKMeans	0.093
Not-Density-Clustering-based-Wrapper method using SimpleKMeans	Not-Density-based Clustering using EM	0.045
Not-Density-Clustering-based-Wrapper method using SimpleKMeans	Density-based Clustering using EM	0.046
Not-Density-Clustering-based-Wrapper method using SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.039
Not-Density-Clustering-based-Wrapper method using SimpleKMeans	Density-based Clustering using SimpleKMeans	0.042
Density-Clustering-based-Wrapper method using EM	Not-Density-based Clustering using EM	0.063
Density-Clustering-based-Wrapper method using EM	Density-based Clustering using EM	0.032
Density-Clustering-based-Wrapper method using EM	Not-Density-based Clustering using SimpleKMeans	0.045
Density-Clustering-based-Wrapper method using EM	Density-based Clustering using SimpleKMeans	0.053
Density-Clustering-based-Wrapper method using SimpleKMeans	Not-Density-based Clustering using EM	0.058
Density-Clustering-based-Wrapper method using SimpleKMeans	Density-based Clustering using EM	0.068
Density-Clustering-based-Wrapper method using SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.003
Density-Clustering-based-Wrapper method using SimpleKMeans	Density-based Clustering using SimpleKMeans	0.033

Table A7. FPR Table using Clustering Method in Hybrid Feature Selection.

Applied Clustering-Based Hybrid Feature Selection	Applied Clustering Methods in DDoS Attack Detection	False Positive Rates
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM	Not-Density-based Clustering using EM	0.021
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM	Density-based Clustering using EM	0.040
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM	Not-Density-based Clustering using SimpleKMeans	0.011
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM	Density-based Clustering using SimpleKMeans	0.028
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Not-Density-based Clustering using EM	0.045
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Density-based Clustering using EM	0.046
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.039
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Density-based Clustering using SimpleKMeans	0.042
Not-Density-Clustering-based-Hybrid method using Information Gain and EM	Not-Density-based Clustering using EM	0.059
Not-Density-Clustering-based-Hybrid method using Information Gain and EM	Density-based Clustering using EM	0.026
Not-Density-Clustering-based-Hybrid method using Information Gain and EM	Not-Density-based Clustering using SimpleKMeans	0.043
Not-Density-Clustering-based-Hybrid method using Information Gain and EM	Density-based Clustering using SimpleKMeans	0.046
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Not-Density-based Clustering using EM	0.068
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Density-based Clustering using EM	0.044
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.006
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Density-based Clustering using SimpleKMeans	0.040
Density-Clustering-based-Hybrid method using ChiSquared and EM	Not-Density-based Clustering using EM	0.063
Density-Clustering-based-Hybrid method using ChiSquared and EM	Density-based Clustering using EM	0.032
Density-Clustering-based-Hybrid method using ChiSquared and EM	Not-Density-based Clustering using SimpleKMeans	0.045
Density-Clustering-based-Hybrid method using ChiSquared and EM	Density-based Clustering using SimpleKMeans	0.053
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Not-Density-based Clustering using EM	0.058
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Density-based Clustering using EM	0.068
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.003
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Density-based Clustering using SimpleKMeans	0.033
Density-Clustering-based-Hybrid method using Information Gain and EM	Not-Density-based Clustering using EM	0.059
Density-Clustering-based-Hybrid method using Information Gain and EM	Density-based Clustering using EM	0.012
Density-Clustering-based-Hybrid method using Information Gain and EM	Not-Density-based Clustering using SimpleKMeans	0.006
Density-Clustering-based-Hybrid method using Information Gain and EM	Density-based Clustering using SimpleKMeans	0.016
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Not-Density-based Clustering using EM	0.059
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Density-based Clustering using EM	0.012
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Not-Density-based Clustering using SimpleKMeans	0.006
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Density-based Clustering using SimpleKMeans	0.016

Table A8. FPR Table using Supervised Learning in Wrapper Feature Selection.

Applied Supervised-Learning Wrapper Feature Selection	Applied Clustering Methods in DDoS Attack Detection	False Positive Rates
Supervised-Learning-Wrapper method using NaïveBayes	Not-Density-based Clustering using EM	0.237
Supervised-Learning-Wrapper method using NaïveBayes	Density-based Clustering using EM	0.255
Supervised-Learning-Wrapper method using NaïveBayes	Not-Density-based Clustering using SimpleKMeans	0.185
Supervised-Learning-Wrapper method using NaïveBayes	Density-based Clustering using SimpleKMeans	0.228
Supervised-Learning-Wrapper method using J48	Not-Density-based Clustering using EM	0.091
Supervised-Learning-Wrapper method using J48	Density-based Clustering using EM	0.091
Supervised-Learning-Wrapper method using J48	Not-Density-based Clustering using SimpleKMeans	0.007
Supervised-Learning-Wrapper method using J48	Density-based Clustering using SimpleKMeans	0.035
Supervised-Learning-Wrapper method using DecisionTable	Not-Density-based Clustering using EM	0.142
Supervised-Learning-Wrapper method using DecisionTable	Density-based Clustering using EM	0.155
Supervised-Learning-Wrapper method using DecisionTable	Not-Density-based Clustering using SimpleKMeans	0.008
Supervised-Learning-Wrapper method using DecisionTable	Density-based Clustering using SimpleKMeans	0.106

Table A9. FPR Table using Supervised Learning in Hybrid Feature Selection.

Applied Supervised-Learning Hybrid Feature Selection	Applied Clustering Methods in DDoS Attack Detection	False Positive Rates
Hybrid Feature Selection using ChiSquared and NaïveBayes	Not-Density-based Clustering using EM	0.237
Hybrid Feature Selection using ChiSquared and NaïveBayes	Density-based Clustering using EM	0.255
Hybrid Feature Selection using ChiSquared and NaïveBayes	Not-Density-based Clustering using SimpleKMeans	0.185
Hybrid Feature Selection using ChiSquared and NaïveBayes	Density-based Clustering using SimpleKMeans	0.228
Hybrid Feature Selection using Information Gain and NaïveBayes	Not-Density-based Clustering using EM	0.012
Hybrid Feature Selection using Information Gain and NaïveBayes	Density-based Clustering using EM	0.012
Hybrid Feature Selection using Information Gain and NaïveBayes	Not-Density-based Clustering using SimpleKMeans	0.170
Hybrid Feature Selection using Information Gain and NaïveBayes	Density-based Clustering using SimpleKMeans	0.172
Hybrid Feature Selection using ChiSquared and J48	Not-Density-based Clustering using EM	0.095
Hybrid Feature Selection using ChiSquared and J48	Density-based Clustering using EM	0.089
Hybrid Feature Selection using ChiSquared and J48	Not-Density-based Clustering using SimpleKMeans	0.007
Hybrid Feature Selection using ChiSquared and J48	Density-based Clustering using SimpleKMeans	0.036
Hybrid Feature Selection using Information Gain and J48	Not-Density-based Clustering using EM	0.059
Hybrid Feature Selection using Information Gain and J48	Density-based Clustering using EM	0.068
Hybrid Feature Selection using Information Gain and J48	Not-Density-based Clustering using SimpleKMeans	0.007
Hybrid Feature Selection using Information Gain and J48	Density-based Clustering using SimpleKMeans	0.044
Hybrid Feature Selection using ChiSquared and DecisionTable	Not-Density-based Clustering using EM	0.142
Hybrid Feature Selection using ChiSquared and DecisionTable	Density-based Clustering using EM	0.155
Hybrid Feature Selection using ChiSquared and DecisionTable	Not-Density-based Clustering using SimpleKMeans	0.008
Hybrid Feature Selection using ChiSquared and DecisionTable	Density-based Clustering using SimpleKMeans	0.106
Hybrid Feature Selection using Information Gain and DecisionTable	Not-Density-based Clustering using EM	0.068
Hybrid Feature Selection using Information Gain and DecisionTable	Density-based Clustering using EM	0.044
Hybrid Feature Selection using Information Gain and DecisionTable	Not-Density-based Clustering using SimpleKMeans	0.013
Hybrid Feature Selection using Information Gain and DecisionTable	Density-based Clustering using SimpleKMeans	0.036

Appendix D. Selected Features with Best Performance

Table A10. Selected Features with Best Performance using CICIDS2017.

Applied Feature Selection Methods	Selected Features
Density-Clustering-based-Wrapper method using SimpleKMeans	Total Length of Fwd Packets, Bwd Packet Length Std, Flow IAT Std, Fwd IAT Mean, act_data_pkt_fwd
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Total Length of Fwd Packets, Subflow Fwd Bytes, Avg Bwd Segment Size, Fwd IAT Mean, Fwd IAT Std, Bwd Packet Length Std
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	Subflow Fwd Bytes, Fwd IAT Mean, act_data_pkt_fwd, Bwd Packet Length Std, Flow IAT Std
Density-Clustering-based-Hybrid method using Information Gain and EM	Total Length of Fwd Packets, Subflow Fwd Bytes, Avg Bwd Segment Size, Destination Port, Bwd Packet Length Max, Avg Fwd Segment Size, Fwd Packet Length Mean, Init_Win_bytes_forward, Fwd IAT Max, Fwd IAT Mean, Init_Win_bytes_backward, Subflow Fwd Packets, Total Fwd Packets, Fwd IAT Std, Packet Length Variance
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans	Total Length of Fwd Packets, act_data_pkt_fwd, Bwd Packet Length Std

Table A11. Selected Features with Best Performance using NSL-KDD.

Applied Feature Selection Methods	Selected Features
Density-Clustering-based-Wrapper method using SimpleKMeans	duration, service, flag, hot, su_attempted, num_shells, count, srv_serror_rate, same_srv_rate, dst_host_count, dst_host_srv_count, dst_host_diff_srv_rate, dst_host_srv_rerror_rate
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans	service, flag, same_srv_rate, dst_host_srv_count, dst_host_diff_srv_rate, count, srv_serror_rate, dst_host_count, dst_host_srv_rerror_rate, duration, hot, su_attempted, num_shells
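
The two rows of Table A11 list their features in a different order, which makes them hard to compare by eye. A short check, sketched here in Python with illustrative variable names, treats each row as a set and compares the two:

```python
# Feature names taken from the two rows of Table A11 (NSL-KDD).
wrapper_features = (
    "duration service flag hot su_attempted num_shells count srv_serror_rate "
    "same_srv_rate dst_host_count dst_host_srv_count dst_host_diff_srv_rate "
    "dst_host_srv_rerror_rate"
).split()
hybrid_features = (
    "service flag same_srv_rate dst_host_srv_count dst_host_diff_srv_rate count "
    "srv_serror_rate dst_host_count dst_host_srv_rerror_rate duration hot "
    "su_attempted num_shells"
).split()

# Order is ignored; only membership matters for this comparison.
same_selection = set(wrapper_features) == set(hybrid_features)
only_in_wrapper = sorted(set(wrapper_features) - set(hybrid_features))
only_in_hybrid = sorted(set(hybrid_features) - set(wrapper_features))

print(len(wrapper_features), len(hybrid_features), same_selection)
print(only_in_wrapper, only_in_hybrid)
```

Both rows contain the same 13 features. This is consistent with the identical false positive rates reported for these two methods in Tables A6 and A7 (0.058, 0.068, 0.003, and 0.033).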


Figure 1. Clustering-based DDoS attack detection method construction using CICIDS2017.

Figure 2. Clustering-based DDoS attack detection method construction using NSL-KDD.

Table 1. Tests between subjects applying wrapper using CICIDS2017.

Source	Type III Sum of Squares	df	Mean Square	F	Sig.	Partial Eta Squared
Corrected Model	0.378 ^a	1	0.378	10.547	0.003	0.289
Intercept	2.294	1	2.294	63.969	<0.001	0.711
Method	0.378	1	0.378	10.547	0.003	0.289
Error	0.932	26	0.036
Total	3.388	28
Corrected Total	1.310	27

^a R Squared = 0.289 (Adjusted R Squared = 0.261).
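
For readers checking Table 1 by hand, the effect-size and F columns follow from the sums of squares. The worked sketch below uses the standard definitions of partial eta squared, the F ratio, and adjusted R squared. The symbols are notation introduced here rather than taken from the article, with N = 28 observations and k = 1 predictor:

```latex
\eta_p^2 = \frac{SS_{\text{Method}}}{SS_{\text{Method}} + SS_{\text{Error}}}
         = \frac{0.378}{0.378 + 0.932} \approx 0.289,
\qquad
F = \frac{SS_{\text{Method}} / df_{\text{Method}}}{SS_{\text{Error}} / df_{\text{Error}}}
  = \frac{0.378 / 1}{0.932 / 26} \approx 10.5,
\qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{N - 1}{N - k - 1}
                 = 1 - (1 - 0.289)\,\frac{27}{26} \approx 0.26
```

Because the model has a single factor, partial eta squared coincides with R squared. Small last-digit differences from Table 1 arise because the calculation starts from the rounded table entries.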

Table 2. Tests between subjects applying hybrid using CICIDS2017.

Source	Type III Sum of Squares	df	Mean Square	F	Sig.	Partial Eta Squared
Corrected Model	0.400 ^a	1	0.400	10.043	0.003	0.157
Intercept	3.939	1	3.939	98.897	<0.001	0.647
Method	0.400	1	0.400	10.043	0.003	0.157
Error	2.151	54	0.040
Total	6.213	56
Corrected Total	2.550	55

^a R Squared = 0.157 (Adjusted R Squared = 0.141).

Table 3. Tests between subjects applying wrapper using NSL-KDD.

Source	Type III Sum of Squares	df	Mean Square	F	Sig.	Partial Eta Squared
Corrected Model	0.045 ^a	1	0.045	12.768	0.001	0.329
Intercept	0.212	1	0.212	60.141	<0.001	0.698
Method	0.045	1	0.045	12.768	0.001	0.329
Error	0.092	26	0.004
Total	0.325	28
Corrected Total	0.136	27

^a R Squared = 0.329 (Adjusted R Squared = 0.304).
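
Table 3 can be reproduced from the raw rates in Tables A6 and A8. The following is a minimal sketch of a one-way ANOVA in plain Python. The function `one_way_anova` and the variable names are illustrative assumptions, not the SPSS procedure used in the study:

```python
# FPR observations for the two wrapper groups on NSL-KDD.
clustering_wrapper = [  # Table A6, 16 values
    0.017, 0.031, 0.090, 0.093, 0.045, 0.046, 0.039, 0.042,
    0.063, 0.032, 0.045, 0.053, 0.058, 0.068, 0.003, 0.033,
]
supervised_wrapper = [  # Table A8, 12 values
    0.237, 0.255, 0.185, 0.228, 0.091, 0.091,
    0.007, 0.035, 0.142, 0.155, 0.008, 0.106,
]


def one_way_anova(groups):
    """Return (SS_between, SS_within, F, eta_squared) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return ss_between, ss_within, f_stat, eta_sq


ss_b, ss_w, f_stat, eta_sq = one_way_anova([clustering_wrapper, supervised_wrapper])
print(f"SS_between={ss_b:.3f}  SS_within={ss_w:.3f}  F={f_stat:.3f}  eta^2={eta_sq:.3f}")
```

Up to rounding, the output matches the Method and Error rows of Table 3 (sums of squares 0.045 and 0.092, F of about 12.77, partial eta squared 0.329).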

Table 4. Tests between subjects applying hybrid using NSL-KDD.

Source	Type III Sum of Squares	df	Mean Square	F	Sig.	Partial Eta Squared
Corrected Model	0.046 ^a	1	0.046	15.511	<0.001	0.223
Intercept	0.230	1	0.230	77.572	<0.001	0.590
Method	0.046	1	0.046	15.511	<0.001	0.223
Error	0.160	54	0.003
Total	0.412	56
Corrected Total	0.206	55

^a R Squared = 0.223 (Adjusted R Squared = 0.209).

Table 5. Descriptive statistics when applying wrapper using CICIDS2017.

Method	Mean	Std. Deviation	N
Clustering-based wrapper	0.17175	0.214897	16
Supervised wrapper	0.40658	0.147547	12
Total	0.27239	0.220297	28
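
The group means, standard deviations, and sample sizes in Table 5 are enough to rebuild the one-way ANOVA for the same comparison reported in Table 1. The sketch below, under the assumption that the listed standard deviations are sample standard deviations (n − 1 denominator), computes the between-group and within-group sums of squares and the resulting F value. The function name and the returned keys are hypothetical and chosen only for illustration.

```python
# Rebuild a one-way ANOVA from per-group summary statistics (mean, sd, n).
# Assumes sample standard deviations (n - 1 denominator); names are illustrative.

def anova_from_summary(groups):
    """groups: list of (mean, sd, n) tuples, one per group."""
    n_total = sum(n for _, _, n in groups)
    # Grand mean is the sample-size-weighted average of the group means.
    grand_mean = sum(mean * n for mean, _, n in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    # Within-group sum of squares: recovered from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return {
        "grand_mean": grand_mean,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_value,
        "eta2": ss_between / (ss_between + ss_within),
    }


if __name__ == "__main__":
    # Table 5: clustering-based wrapper, supervised wrapper (CICIDS2017).
    table5 = [(0.17175, 0.214897, 16), (0.40658, 0.147547, 12)]
    for key, value in anova_from_summary(table5).items():
        print(f"{key}: {value:.5f}" if isinstance(value, float) else f"{key}: {value}")
```

Run on the Table 5 values, the sums of squares and F agree with the Method and Error rows of Table 1 to the precision shown there, and the grand mean matches the Total row of Table 5.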

Table 6. Descriptive statistics when applying hybrid using CICIDS2017.

Method	Mean	Std. Deviation	N
Clustering-based hybrid feature selection	0.18256	0.215233	32
Supervised learning hybrid feature selection	0.35333	0.176245	24
Total	0.25575	0.215342	56

Table 7. Descriptive statistics when applying wrapper using NSL-KDD.

Method	Mean	Std. Deviation	N
Clustering-based wrapper	0.04738	0.023703	16
Supervised wrapper	0.12833	0.086914	12
Total	0.08207	0.071094	28

Table 8. Descriptive statistics when applying hybrid using NSL-KDD.

Method	Mean	Std. Deviation	N
Clustering-based hybrid feature selection	0.03578	0.020054	32
Supervised learning hybrid feature selection	0.09367	0.080083	24
Total	0.06059	0.061189	56
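
The relative reductions in mean false positive rate quoted in the abstract can be checked against the group means in Tables 5–8. The sketch below computes (supervised − clustering) / supervised for each pair of means. The dictionary layout and function name are assumptions made for this illustration, and small differences from the quoted percentages can arise from rounding.

```python
# Relative reduction in mean false positive rate, clustering-based vs. supervised,
# using the group means from Tables 5-8. Names are illustrative only.

# label -> (clustering-based mean, supervised mean)
MEAN_FPR = {
    "wrapper, CICIDS2017 (Table 5)": (0.17175, 0.40658),
    "hybrid, CICIDS2017 (Table 6)": (0.18256, 0.35333),
    "wrapper, NSL-KDD (Table 7)": (0.04738, 0.12833),
    "hybrid, NSL-KDD (Table 8)": (0.03578, 0.09367),
}


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


def all_reductions(means=MEAN_FPR):
    """Map each comparison label to its relative reduction."""
    return {
        label: relative_reduction(supervised, clustering)
        for label, (clustering, supervised) in means.items()
    }


if __name__ == "__main__":
    for label, reduction in all_reductions().items():
        print(f"{label}: {reduction:.1%} lower mean FPR")
```

For the wrapper comparisons this gives roughly 58% on CICIDS2017 and 63% on NSL-KDD, in line with the figures stated in the abstract.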

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
