Article

Comparative Analysis of Feature Selection Methods in Clustering-Based Detection Methods

by Alireza Zeinalpour * and Charles P. McElroy *
Department of Information Systems, Monte Ahuja College of Business, Cleveland State University, Cleveland, OH 44115, USA
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(11), 2119; https://doi.org/10.3390/electronics14112119
Submission received: 2 April 2025 / Revised: 15 May 2025 / Accepted: 19 May 2025 / Published: 23 May 2025

Abstract
Feature selection plays a crucial role in the effectiveness of distributed denial of service (DDoS) attack detection methods, particularly as network traffic data becomes increasingly complex. This study conducts a categorical investigation of feature selection methods in clustering-based DDoS attack detection, comparing wrapper and hybrid approaches. Through two experiments using one-way ANOVA analyses, the research evaluated the effectiveness of different clustering approaches and supervised learning algorithms. The findings reveal that clustering-based wrapper methods performed more effectively than supervised learning approaches in feature selection for clustering-based DDoS attack detection methods. The results show strong statistical significance for clustering-based methods, with p-values of less than 0.05 and η² values indicating robust relationships between methods. Our clustering-based wrapper approach achieved a 57.7% reduction in false positive rates compared to supervised learning methods (mean FPR of 0.17 versus 0.40) on the CICIDS2017 dataset, with certain configurations reaching a false positive rate of 0.000. A similar pattern was observed with the NSL-KDD dataset, where clustering-based methods reduced false positive rates by 63.1% compared to supervised approaches (0.048 versus 0.128). This study provides empirical evidence on effective combinations with which organizations and agencies can implement high-performing DDoS attack detection methods.

1. Introduction

The frequency and complexity of Distributed Denial of Service (DDoS) attacks have increased over the last decade [1]. These attacks pose major security risks to computer networks [2]. An attacker launches a DDoS attack by overwhelming a victim system’s resources with a large number of network traffic requests [1]. Introducing effective methods for identifying these attacks is essential, yet challenging because of their complexity [2]. There is a dearth of categorical investigations of machine learning algorithms that would enable researchers to properly compare and assess the various approaches [2]. It is therefore essential to assess the effectiveness of machine learning algorithms categorically. This way, researchers can ensure the proper and systematic enhancement of the performance of machine learning algorithms in analyzing network traffic data. This study addresses that challenge.
A denial of service (DoS) attack sends a large number of service requests in a malicious way to negatively impact the functioning of a server in a network [3]. A DoS attack is primarily launched from a device targeting an intended server as the victim. A DDoS attack is a variation of a DoS attack that is more dangerous because it is initiated from several devices to a targeted server [3]. Modern cyber intrusions are complex in that distributed settings can go unidentified, in which case DDoS attacks can block the accessibility of resources [4]. This problem can lead to service unavailability, substantial income loss, reputation damage, and a bad user experience, to name but a few negative outcomes [5]. The evaluation of a large volume of network traffic data is challenging in anomaly-based DDoS attack detection [1]. This is particularly the case for clustering-based detection methods. Analyzing a large volume of data requires significant processing capabilities, and has a negative impact on the performance of the detection models [6]. Many studies have not paid attention to the significance of feature selection and focused mostly on the performance of modeling and classification [7]. According to Xu [6], there is an urgent need for effective modeling while maximizing efficiency and the security of network infrastructure. Feature selection should be carried out appropriately to assess all the necessary information for detection, thereby reducing redundant features [8].
The ramifications of this stream of research can have a broad impact. Statistical data reveal that threats from cyber-attacks hit 20% of small businesses, 33% of Small-Medium Enterprises (SMEs), and 41% of large businesses [9]. Given the nature of these threats posed by DDoS attacks, it becomes critical to investigate and propose effective DDoS attack detection methods in a systematic way, which supports further theory development. A total of 82% of organizations have suffered data thefts as a result of DDoS attacks [9]. Consequently, we performed a comparative analysis of feature selection methods by considering two categories of machine learning, focusing on the clustering-based detection methods. We compared clustering algorithms in the feature selection process that used supervised learning vs. unsupervised learning algorithms.
The feature selection phase of the process focuses on assessing large volumes of network traffic data to improve the classification accuracy through the identification of relevant features [10]. Filter and wrapper methods are two types of feature selection methods used in selecting proper network traffic data for training via machine learning algorithms. The filter method uses statistical measures to identify features, while the wrapper method uses a machine learning algorithm to determine features [11]. Two types of algorithms can be applied in the wrapper method for evaluating network traffic data—they are unsupervised and supervised learning algorithms. Clustering algorithms, as opposed to supervised learning algorithms, are unsupervised techniques that do not require data labels for training.
The clustering approach is a well-known technique for intrusion detection. Similarity-based and distance-based clustering are two types of techniques in which the algorithm tries to maximize the distances between data points in different clusters and minimize the distances within clusters, leading to the effective categorization of DDoS attacks [12]. Another type of clustering approach is density-based. This approach categorizes data points in clusters in accordance with the density of a region, ensuring proper noise filtering [13]. ‘MakeDensityBasedClusterer’ is a clustering approach that is available in the Weka (Waikato Environment for Knowledge Analysis) tool. This clustering approach can incorporate similarity-based and distance-based clustering methods.
Weka comprises a collection of machine learning algorithms that can be used to assess network traffic data for attack detection. It was used to build and assess a DDoS attack detection method in a study by Zeinalpour [12]. In this study, we consider clustering-based approaches comprising both density-based and non-density-based clustering. The density-based clustering approaches utilize ‘MakeDensityBasedClusterer’, whereas the non-density-based approaches do not.
We conducted a comparative analysis of feature selection by considering the hybrid and wrapper approaches. The hybrid feature selection approach applies the filter method followed by the wrapper method. We used the Naïve Bayes, J48, and DecisionTable supervised learning algorithms, which are applied in the wrapper method when the clustering approach is incorporated. The subsequent comparison of performance is based on the false positive rate. Currently, there is a paucity of studies that look at this issue.
The findings of this study contribute to the improvement of DDoS attack detection methods by reflecting on the most effective combinations of feature selection and clustering approaches. Indeed, while numerous studies have improved the detection accuracy (citing Ali et al. [7]), relatively few have investigated the comparative effectiveness of different feature selection paradigms for clustering-based detection. Our study addresses this gap by providing a systematic comparison of supervised versus unsupervised clustering approaches and feature selection within clustering-based detection frameworks, and by evaluating these approaches using the critical metric of false positive rates, which has been understudied in prior research. Through the use of effective combinations, organizations and agencies can implement DDoS attack detection methods that have a high performance value, reducing their exposure to this type of attack.

2. Materials and Methods

According to Najafimehr et al. [2], the absence of research that examines machine learning algorithms based on their classification capabilities prevents the suitable assessment of various methods in this field. Existing DDoS attack detection methods may not perform well in identifying novel attacks due to the sophistication and increasing complexity of DDoS attacks [2]. In recent years, identifying DDoS attacks has become difficult due to the diversity of techniques used to launch them [14]. Assessing the large volume of network traffic data generated through this type of attack is challenging [1]. This is even more true when multi-vector DDoS attacks are present. Hassan et al. [14] argue that the nature of network traffic data is dynamic, and that with the use of multiple attack protocols, introducing robust defense mechanisms becomes essential. Consequently, performing a comparative analysis of various feature selection methods is of paramount importance. Grounded in this purpose, we constructed two hypotheses and tested each in an ex post facto experiment with an A-B single-group design. The base measures are the control group, while the experimental measures are for the experimental group that includes interventions [1]. ‘A’ denotes the control group and ‘B’ denotes the experimental group.
The first experiment reflects the following research question: “Does incorporating a clustering-based-wrapper method differ in effectiveness as opposed to supervised-learning-wrapper method in clustering-based detection of DDoS attacks?” The corresponding null hypothesis is that there is no difference in effectiveness when incorporating a clustering-based wrapper method against a supervised learning wrapper method in clustering-based detection of DDoS attacks. The second experiment reflects the following question: “Does incorporating clustering-based-hybrid-feature-selection method differ in effectiveness as opposed to a supervised-learning-hybrid-feature-selection method in clustering-based detection of DDoS attacks?” The null hypothesis is that there is no difference in effectiveness when incorporating the clustering-based hybrid feature selection method against the supervised learning hybrid feature selection method in the clustering-based detection of DDoS attacks. The clustering-based wrapper method and the clustering-based hybrid feature selection method are the experimental groups, respectively. The corresponding control groups are the supervised learning wrapper and supervised learning hybrid feature selection methods. We used the entire CICIDS2017 dataset, which contains both DDoS and benign events. This dataset includes real-world data of harmless traffic and attack traffic in a CSV format [15]. To further confirm the statistical results of our two hypotheses, we also used ‘KDDTrain+.ARFF’, which is the full NSL-KDD training dataset. The NSL-KDD dataset is a well-established network traffic dataset [15]. We performed one-way ANOVA analyses to test our hypotheses, which allowed us to specify a factor variable reflecting the various groups that were considered in our research study, as well as a dependent variable, which was the false positive rate.
The names of the independent variables and their corresponding values are presented in Table A1 in Appendix A.
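The one-way ANOVA described above can be illustrated with a minimal sketch that computes the F statistic and the η² effect size from groups of false positive rates. The function name `one_way_anova` and the sample values are hypothetical and are not taken from the study's results; the p-value, which requires the F distribution, is left out here and would normally come from a statistics package.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, eta_squared) for a list of groups of observations.

    Assumes at least two groups and some within-group variation
    (otherwise the F ratio is undefined).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation explained by group membership
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left inside each group
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Hypothetical false positive rates, for illustration only
control_fpr = [0.40, 0.35, 0.45]        # group A: supervised learning wrapper
experimental_fpr = [0.10, 0.20, 0.15]   # group B: clustering-based wrapper

f_stat, eta_sq = one_way_anova([control_fpr, experimental_fpr])
print(f"F = {f_stat:.2f}, eta squared = {eta_sq:.3f}")
```

Here the factor variable is the group (A or B) and the dependent variable is the false positive rate, mirroring the design described above.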
The first research question compares the clustering-based wrapper method against the supervised machine learning wrapper method in the clustering-based detection of DDoS attacks. The clustering-based wrapper method is one in which the wrapper method incorporates a clustering technique as the machine learning algorithm to evaluate network traffic data. We used k-means and expectation–maximization (EM) to apply the clustering approach. We also considered ‘MakeDensityBasedClusterer’ as another clustering approach using k-means and EM. The k-means and EM algorithms use Euclidean distance to perform clustering, while ‘MakeDensityBasedClusterer’ ensures clustering analysis based on density. We used J48, NaïveBayes, and DecisionTable as the corresponding supervised learning algorithms. J48 is a decision tree learning algorithm that constructs a tree-based structure for assessing features. NaïveBayes performs the assessment of features based on the Bayesian formula. DecisionTable is a rule-based learning algorithm that constructs decision tables to map and evaluate network traffic data. We used k-means, EM, and ‘MakeDensityBasedClusterer’ to construct the clustering-based DDoS attack detection methods.
The second research question assesses the clustering-based hybrid feature selection method against a supervised machine learning hybrid feature selection method for the clustering-based detection of DDoS attacks. The clustering-based hybrid feature selection method is a hybrid approach in which the filter method is incorporated prior to the wrapper method. We used ChiSquared and Information Gain as the corresponding algorithms for the filter method to evaluate the network traffic data. We used the machine learning algorithms that we considered for the wrapper regarding the first research question for the hybrid approach in this research question. In this case, we also used k-means, EM, and ‘MakeDensityBasedClusterer’ to construct clustering-based DDoS attack detection methods.
In this study, we used Weka (Waikato Environment for Knowledge Analysis) Workbench to build the DDoS attack detection models based on their applied feature selection methods. As the name of this tool suggests, it is used to facilitate knowledge discovery from data in building effective machine learning models. This tool offers a series of capabilities in assessing data and constructing prediction models [16]. The filter and wrapper methods rely on a search method for identifying the optimal attributes. The Ranker search method with the threshold of ‘0.5’, proposed by Zeinalpour [12], was used in the filter method to select attributes, while the wrapper method used ‘BestFirst’, with the default settings provided by Weka.
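To make the filter step concrete, the following is a minimal sketch of ranking attributes by information gain and discarding those below a threshold. It is not Weka's implementation: the function names, the toy attribute names (`flag`, `protocol`), and the toy values are hypothetical, attributes are assumed to be already discrete, and keeping attributes whose score equals the threshold is an assumption.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the class minus its entropy conditioned on the feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def rank_and_filter(columns, labels, threshold):
    """Rank attributes by score, highest first, and drop those below threshold."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]


# Hypothetical toy data, for illustration only
labels = ["ddos", "ddos", "benign", "benign"]
columns = {
    "protocol": ["tcp", "udp", "tcp", "udp"],  # carries no class information
    "flag": ["S0", "S0", "SF", "SF"],          # separates the classes perfectly
}
kept = rank_and_filter(columns, labels, threshold=0.5)
print(kept)
```

In the hybrid approach, only the attributes that survive this ranking would then be passed on to the wrapper method.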
In summary, the integration operates through an iterative feedback mechanism: initially, the clustering algorithm (k-means or EM) generates clusters without labels, which are then evaluated against ground truth using the silhouette score, measuring the quality of cluster cohesion and separation. For each candidate feature subset proposed by the wrapper method, the clustering algorithm recomputes clusters and evaluates the resulting silhouette score. Feature subsets that improve this score are prioritized, creating a selection mechanism that optimizes for cluster quality rather than classification accuracy directly. This approach preserves the unsupervised nature of clustering while leveraging labeled data for validation, offering advantages in detecting novel attack patterns that might be misclassified by purely supervised approaches. In this study, we collected the false positive rates corresponding to the categorization or classification of DDoS events.
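The feedback loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the configuration used in the study: a greedy forward search stands in for Weka's ‘BestFirst’, a small hand-written k-means stands in for Weka's clusterers, the silhouette score is computed without any labels, and the toy rows are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means: returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette over all points; -1.0 if fewer than two clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster contributes 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def forward_select(rows, n_features, k=2):
    """Greedily add the feature that most improves the silhouette score."""
    selected, best = [], -1.0
    improved = True
    while improved:
        improved, best_feature = False, None
        for f in range(n_features):
            if f in selected:
                continue
            pts = [tuple(row[i] for i in selected + [f]) for row in rows]
            score = silhouette(pts, kmeans(pts, k))
            if score > best:
                best, best_feature, improved = score, f, True
        if improved:
            selected.append(best_feature)
    return selected, best


# Hypothetical toy rows: feature 0 separates two groups, feature 1 is noise
rows = [(0.0, 0.9), (0.1, 0.1), (0.05, 0.5), (1.0, 0.2), (0.9, 0.8), (0.95, 0.4)]
selected, best = forward_select(rows, n_features=2)
print(selected, round(best, 3))
```

The search stops as soon as no remaining feature raises the silhouette score, so features that blur the cluster structure are left out.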
Machine learning algorithms are susceptible to overfitting when learning models are trained to perform better on one set of network traffic events than another. Data quality can also impact a learning model’s performance. We ensured that the problems of overfitting and data quality were addressed through the consideration of the following data preprocessing procedures. For the CICIDS2017 dataset, we first manually removed ‘Fwd_Header_Length’, which was a duplicate attribute. This enabled Weka to read the data. Afterward, we applied the Numeric Cleaner to enable min–max normalization in order to process the data. In Weka, the ‘Normalize’ procedure performs the min–max normalization. Normalization requires numeric cleaning on values that are outliers, and without normalization, machine learning algorithms cannot properly undertake learning from processing network traffic data [12]. Subsequently, we applied EM imputation to address the missing values of the ‘Flow Bytes/s’ attribute. Missing values make learning challenging, as attributes with missing values lead to improper modeling [12]. Then, we applied the SpreadSubSample procedure with the distribution spread value of ‘1.0’ to balance the dataset. Imbalanced data increase the complexity of obtaining accurate results [17]. This problem leads the machine learning algorithms to produce biased results from analyzing the network traffic data [12]. Finally, we applied the ‘Randomize’ procedure provided by Weka to ensure that all the data were randomized and that the data of the same network traffic event were not aligned together. With respect to the NSL-KDD dataset, since it did not have any duplicate, outlier, or missing value, we were able to apply the min–max normalization, SpreadSubSample, and Randomize approaches in addressing the overfitting and data quality problems. 
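The normalization, balancing, and randomization steps above can be sketched in plain Python. This is an illustrative approximation rather than Weka's ‘Normalize’, ‘SpreadSubSample’, and ‘Randomize’ filters: the function names and toy rows are hypothetical, a 1:1 class spread is implemented as random undersampling to the size of the rarest class, and numeric cleaning and EM imputation are omitted.

```python
import random
from collections import defaultdict


def min_max_normalize(rows):
    """Scale each numeric column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def balance_by_undersampling(rows, labels, rng):
    """Keep as many randomly chosen rows per class as the rarest class has."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    smallest = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        balanced.extend((row, label) for row in rng.sample(members, smallest))
    return balanced


def preprocess(rows, labels, seed=42):
    """Normalize, balance the classes, then shuffle the (row, label) pairs."""
    rng = random.Random(seed)
    pairs = balance_by_undersampling(min_max_normalize(rows), labels, rng)
    rng.shuffle(pairs)
    return pairs


# Hypothetical toy traffic records, for illustration only
rows = [(10.0, 1.0), (20.0, 3.0), (30.0, 5.0), (40.0, 7.0), (50.0, 9.0)]
labels = ["BENIGN", "BENIGN", "BENIGN", "DDoS", "DDoS"]
for row, label in preprocess(rows, labels):
    print(label, row)
```

Shuffling last means that records of the same class no longer sit next to each other, which matters for any later step that splits the data by position.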
In addition to the mentioned data preprocessing procedures, we incorporated a 10-fold cross validation method, which is a generalization approach in ensuring accurate modeling and results.
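As a rough illustration of how 10-fold cross validation partitions the data, the sketch below yields train and test index lists so that every sample is tested exactly once. The function name `k_fold_indices` is hypothetical, the folds are plain contiguous slices rather than stratified ones, and the data are assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each of the 25 hypothetical samples lands in exactly one test fold
for fold, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
    print(f"fold {fold}: {len(train)} train, {len(test)} test")
```

A model is trained on each `train` split and evaluated on the matching `test` split, and the per-fold metrics are then averaged.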

3. Literature Review

3.1. Concerns Surrounding DDoS Attacks

A DoS attack artificially creates a massive volume of traffic on a network, which overwhelms the computing power of the network [18]. Identifying attack patterns in a large volume of network traffic generated through this type of intrusion is difficult [1]. The prevalence of DDoS attacks has risen in recent years [14]. These attacks are the most common type of attack launched through networks [19]. The dynamic nature of this type of attack and the presence of multiple attack protocols requires the development of sophisticated defense mechanisms [14]. When a network is not configured properly, the corresponding network controller will be activated by a DDoS attack, enlarging the attack surface of the network [20]. Currently, this type of attack, e.g., focusing on the network controller, is the most common vector for a network intrusion [21]. The risk of bringing down servers that do not have protective layers in a short period of time is high, and therefore this type of attack makes it very difficult for organizations to provide uninterrupted services [5]. Since DDoS attacks are generated through various sources, locating the origin becomes especially difficult [19]. Hackers will try to keep the sessions open for the longest possible duration [1]. The estimated cost of dealing with each attack is around USD 3 million per organization [22]. Therefore, it becomes an economic imperative to design strong mechanisms to identify DDoS attacks [19].
Anomaly detection methods apply machine learning algorithms to identify unusual activity on the network. Network intrusion detection methods that incorporate machine learning algorithms are effective in this regard. In anomaly-based intrusion detection methods, if network traffic patterns deviate from what is considered normal, the respective patterns are recognized as anomalies [22]. Anomaly-based methods are susceptible to producing high false alarm rates [22]. The modernization of network traffic data can increase this vulnerability. According to Prasad and Chandra [5], 54% of DDoS attacks that occurred between January 2020 and March 2021 were launched through these modern attack vectors. In the third quarter of 2024, the Cloudflare cybersecurity company dealt with 6 million DDoS attacks, and some of these attacks sent 2 billion packets per second [23]. One metric that can be used to measure the effectiveness of anomaly-based DDoS attack detection methods is the false positive rate. It is computed as the number of false positives divided by the sum of the false positives and true negatives. A false positive is a normal network traffic pattern erroneously identified as a DDoS attack. Conversely, a true negative is a normal network traffic pattern correctly recognized as normal.
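The false positive rate defined above, FP / (FP + TN), can be written as a short helper. The function name and the counts below are hypothetical and serve only to illustrate the formula.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Return FP / (FP + TN): the share of benign traffic flagged as an attack."""
    total_benign = false_positives + true_negatives
    if total_benign == 0:
        raise ValueError("FPR is undefined when there are no benign samples")
    return false_positives / total_benign


# Hypothetical counts: 17 benign flows flagged as DDoS, 83 correctly passed
fpr = false_positive_rate(false_positives=17, true_negatives=83)
print(f"FPR = {fpr:.2f}")
```

Because the denominator counts only benign traffic, the metric is unaffected by how many attack flows the dataset contains.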
One major challenge of DDoS attack detection methods is dealing with high dimensional data. High dimensional data require huge computational power as well as longer training periods, and they increase the chance that anomaly-based methods will overfit the data [24]. Feature selection is an important process for efficient and effective learning: it enhances the operational efficiency of machine learning, reduces overfitting, and improves the accuracy of the algorithms [11]. The focus of our study is on assessing the effectiveness of DDoS attack detection methods categorically.

3.2. Application of Clustering Algorithms in DDoS Attack Detection

Supervised and unsupervised machine learning approaches are used in detecting DDoS attacks, with clustering algorithms being the most common unsupervised approach [25]. Given the significance of clustering algorithms being considered in DDoS attack detection methods, it is important to develop systematic and categorical investigations of this approach. Cybercrime has become a big business, and stolen data are a significant problem for any business [9]. Clustering leads to low detection rates [26]. This approach is challenging in research, as it can be considered an independent tool in assessing data patterns and finding particular clustering analysis [27]. Clustering-based detection methods assess network traffic data through similarity-based and distance-based approaches via the distance of the data points. These two forms of clustering by themselves cannot perform a density-based analysis of network traffic data. However, they can be used together to perform density-based analysis. We have used ‘MakeDensityBasedClusterer’ provided by Weka for this purpose. Density-based cluster analysis can be considered one approach that applies a cut among data points based on the density level obtained from a probability function [28]. According to Mondragón et al. [29], the robustness of density-based clustering against noise, as well as its enhanced quality in clustering, have been demonstrated. The calculation is based on the analysis of data points with respect to clusters given a certain number of objects k (a predetermined threshold) considering the radius of a neighborhood [30].
For the proper categorization of data points, clustering algorithms, whether density-based or not density-based, need suitable feature selection methods. The curse of dimensionality is a problem in DDoS attack detection methods [12]. This problem is due to the large volume of network traffic features that have a negative impact on the performance of DDoS attack detection methods [31]. The aim of these attack detection methods is to have network traffic data between categories at their maximum distances, while the distances within clusters should be at their minimum [12]. This enables the methods to identify clusters of data points. The Zeinalpour study [12] investigated the addition of the filter and wrapper methods prior to the clustering algorithms. This study compared the performance of clustering-based DDoS attack detection methods when the filter method was applied in contrast to when the wrapper method that was incorporated after the filter method was used. The application of the wrapper method after the filter method made the feature selection process hybrid. The study [31] took the investigation further to perform one-way ANOVA statistical analyses, and found that the wrapper method had slightly better performance than the filter method.
The study [1] compared the hybrid approach to when only the wrapper method was incorporated in clustering-based DDoS attack detection methods. This study found that the “BestFirst” search method outperformed the metaheuristic search methods in searching the feature space for optimal solutions. This was in accordance with the one-way ANOVA statistical analyses when the search methods were incorporated into the wrapper method in the considered approaches.
Metaheuristics-driven DDoS detection has drawn increasing attention, with optimization algorithms like the Whale Optimization Algorithm (WOA) [27], Firefly Search Algorithm (FSA) [1], and ensemble methods [31] being applied to enhance clustering performance. For instance, Shakil et al. [32] employed WOA to dynamically adjust clustering centroids for Software-Defined Networking (SDN)-based DDoS detection. Likewise, Zeinalpour and McElroy [1] and Zeinalpour and Ahmed [31] explored feature selection-based optimizations combined with clustering, but these studies primarily focused on improving parameters or reducing false positive rates rather than developing a taxonomy of clustering approaches. Furthermore, limited evaluation with key metrics (e.g., silhouette scores) and the sparse inclusion of real-world datasets restrict the applicability of these works.
Clustering techniques, commonly employed for unsupervised anomaly detection, have demonstrated effectiveness in this domain when integrated with metaheuristics—optimization methods aimed at improving clustering adaptability and efficiency in large-scale datasets. However, the existing literature lacks a systematic taxonomy of clustering techniques specifically tailored to DDoS attack detection, evaluated comprehensively using internal and external validation metrics (e.g., silhouette scores, the F-measure, and the true positive rate (TPR)) across both simulated and real-world datasets.
Several studies explore specific clustering methods for DDoS detection, but fall short of generalization or taxonomy development. Many studies have overlooked the significant role that feature selection plays in the effectiveness of detection models, with most attention being paid to increasing the accuracy and performance of detection models [7]. For example, Bhaya and Manaa [33,34] proposed early clustering-based approaches using unsupervised methods such as k-means and CURE, providing high accuracy (>99%) and a high F-measure (97.98%) in DDoS detection using CAIDA datasets. Modified or hybrid clustering methods, such as entropy-enhanced approaches [35,36] and non-parametric clustering [37], address issues like dynamic detection thresholds and overlapping data characteristics, but remain focused on specific algorithmic improvements without a broader framework. Similarly, Gu et al. [38] introduced a semi-supervised weighted k-means method leveraging hybrid feature selection, tested extensively on simulated (DARPA, CAIDA, and CICIDS2017) and real-world datasets, demonstrating superior performance metrics. However, these studies do not generalize findings into comprehensive taxonomies. The argument that research has focused only on algorithmic improvement has been made by Ali et al. [7] as well. For example, research studies [39,40,41] examined a three-stage deep learning model, deep learning in detecting ICMPv6 DDoS attacks, and a Pelican deep learning model in evaluating performance, respectively. As another example, Ali et al. [7] point to [42], in which ensemble feature selection was applied to enhance accuracy. Also, efforts to systematically adopt performance metrics, such as the F-measure and TPR, remain inconsistent. While studies like [33,34,36] incorporate the F-measure (up to 97.98%) or detection rates (~96–98%), other key metrics, such as silhouette scores, are rarely utilized.
To stay consistent and build upon the studies [1,12,31], as it relates to clustering-based DDoS attack detection methods, we considered false positive rates as being a key and underutilized metric.
Additionally, few works test approaches across both simulated and real-world datasets; exceptions include Gu et al. [38], who integrate dataset diversity, and Feng et al. [43], who enhance adaptability through explainable clustering methods evaluated in both contexts. Despite these advancements, no study comprehensively investigates clustering techniques to address the analysis of large volumes of network traffic data more effectively. These gaps highlight the need for a structured taxonomy of clustering techniques for DDoS detection. This work seeks to consolidate the existing research and identify avenues for a systematic approach to clustering technique classification, emphasizing performance benchmarking and practical applicability.
The filter method, as opposed to the wrapper method, evaluates attributes individually without relying on predictive models [44]. Wrapper-based methods, which evaluate feature subsets iteratively using predictive models, are particularly promising for optimizing detection metrics such as accuracy, false positives (FPs), and false negatives (FNs). According to Bhattacharya and Selvakumar [44], because the wrapper method evaluates a subset of features as a group with respect to the given class information, a feature within that group can be more informative. In the context of DDoS detection, two distinct approaches to wrapper-based feature selection have garnered attention: supervised learning wrapper methods, which leverage labeled data and classifiers (e.g., decision trees and Random Forest), and clustering-based wrapper methods, which rely on unsupervised clustering and cluster quality indices (e.g., the F-measure and Davies–Bouldin index) to evaluate feature subsets. However, a systematic comparative evaluation of these two paradigms concerning DDoS detection remains underexplored, especially with regard to specific metrics such as detection accuracy, FPs, and FNs.
Several studies have examined the role of clustering-based wrapper methods in DDoS detection. Bhattacharya and Selvakumar proposed LAWRA, a layered clustering-wrapper framework utilizing external cluster validity indices and cooperative game theory to optimize feature selection [45]. This approach demonstrated an improved detection accuracy and F-measure compared to classifier-driven methods, highlighting the potential of clustering-based wrappers in high-dimensional, unlabeled settings. Similarly, Bhattacharya and Selvakumar extended these principles through a multi-weight ranking approach, integrating clustering and filter methods to prioritize features, achieving higher detection accuracy in identifying DDoS and probe attacks [44]. Despite these contributions, limited attention was given to evaluating FNs or directly comparing clustering-based methods with supervised learning approaches in feature selection.
In contrast, supervised learning wrapper methods predominantly use labeled datasets, where they efficiently optimize detection models for accuracy and precision. For example, wrapper-based feature selection using algorithms such as Random Forest, Genetic Algorithms, and KNN classifiers has achieved high detection accuracy, often exceeding 96%, in multiple DDoS detection contexts [46,47,48]. These approaches, however, face challenges in overfitting and reduced generalizability to novel attack types, as highlighted in works emphasizing supervised classifiers’ dependency on labeled data [47,48,49]. While supervised methods consistently outperform clustering-based approaches in accuracy, they often neglect metrics critical to DDoS contexts, such as FNs and FPs.
Emerging research suggests that hybrid approaches, combining clustering and supervised learning paradigms, offer a promising middle ground. Studies such as Zeinalpour and Ahmed [31] and Saha et al. [50] demonstrate that ensemble feature selection approaches leveraging insights from both clustering-based and supervised methods can improve detection generalizability and robustness. For instance, in the study [31], the use of the vote classifier with clustering and wrapper-derived features achieved significant reductions in false positives, though direct quantitative comparisons between clustering-based and supervised learning wrapper methods were absent. Similarly, Saha et al. [50] explored a hybrid ensemble framework to unify feature subsets across supervised and unsupervised methodologies, improving feature robustness in DDoS detection models. Despite these advancements, research remains sparse regarding explicit analyses of FPs and FNs in hybrid or comparative evaluations. For example, Saha et al. [50] reflect on the need for an evaluation of their proposed approach using NSL-KDD and CICIDS network traffic datasets. They state that the consideration of various datasets contributes to the effective combination of feature selection and corresponding detection models.
Overall, the current body of literature identifies key strengths and weaknesses in both clustering-based and supervised learning wrapper methods for DDoS detection. Clustering-based wrappers excel in generalizing to high-dimensional or unlabeled data [44,45], while supervised learning wrappers achieve better precision metrics on labeled datasets, as in the studies of Bouzoubaa et al. [46] and Polat et al. [48]. Hybrid frameworks show the potential to balance these strengths [31,50], yet comprehensive comparative evaluations addressing FPs and FNs across paradigms are critically lacking. This gap motivates further investigation into how these methods perform under diverse data and attack conditions to guide the optimization of DDoS detection systems. As outlined in our literature review, current studies mainly focus on improving the performance of learning algorithms rather than taking a systematic approach to introducing robust attack detection methods.
In this study, we considered the k-means and EM clustering algorithms, which represent the not density-based clustering approach. We also used ‘MakeDensityBasedClusterer’, provided by Weka, to wrap these algorithms and obtain a density-based clustering approach. The not density-based and density-based variants are the two types of clustering approaches used in this study.
The k-means algorithm is a distance-based cluster analysis [12]. The objective of the algorithm is to minimize the within-cluster sum of squares (WCSS) [51]. Based on the study [12], the algorithm initially selects random data points as the cluster centers, whose values are then adjusted through calculation. Following Miniak-Górecka et al. [51], the objective is presented below, where $k$ is the number of subsets, $x_j$ is a data point belonging to the set $X$ of $n$ data points, $C_i$ is the $i$-th cluster (the clusters together include all of the data points), and $c_i$ is the center of $C_i$.
$$\mathrm{WCSS} = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - c_i \rVert^{2}$$
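As a small numerical illustration of the within-cluster sum of squares, the sketch below evaluates the objective for a fixed assignment of points to clusters. It is not the k-means procedure itself (no reassignment or center update loop), and the four two-dimensional points are made up for the example.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of the points in one cluster."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def wcss(clusters: List[List[Point]]) -> float:
    """Sum over clusters of squared Euclidean distances to the cluster center."""
    total = 0.0
    for members in clusters:
        center = centroid(members)
        for x in members:
            total += sum((xd - cd) ** 2 for xd, cd in zip(x, center))
    return total


# Hypothetical data: two clusters of two points each.
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],  # center (0, 1)
    [(4.0, 4.0), (6.0, 4.0)],  # center (5, 4)
]
total = wcss(clusters)
print(total)
```

Every point in this toy example lies at distance 1 from its cluster center, so each contributes 1 to the sum.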
EM is a similarity-based cluster analysis [1]. As stated by Yang et al. [52], it aims to maximize the log likelihood. According to Yang et al. [52], the EM objective is presented below, where $\alpha_k$ are the mixing proportions subject to the restriction $\sum_{k=1}^{c} \alpha_k = 1$, $f(x_i; \theta_k)$ is the density of $x_i$ given the $k$th class with the corresponding parameters $\theta_k$, and $Z$ is the missing data, whose entries $z_{ki}$ describe the membership of $x_i$ in the $k$th of the $c$ classes.
$$L(\alpha, \theta) = \sum_{i=1}^{n} \sum_{k=1}^{c} z_{ki} \ln\left[\alpha_k f(x_i; \theta_k)\right]$$
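The log-likelihood above can be evaluated directly once a component density is chosen. The sketch below assumes univariate Gaussian components and hard 0/1 memberships, both of which are simplifications made only for this example; the data, means, and standard deviations are hypothetical.

```python
import math
from typing import List


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density f(x; theta) of a univariate Gaussian with theta = (mean, std)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def complete_log_likelihood(
    xs: List[float],
    z: List[List[float]],  # z[k][i]: membership of point i in class k
    alphas: List[float],   # mixing proportions, must sum to 1
    means: List[float],
    stds: List[float],
) -> float:
    """Evaluate sum_i sum_k z_ki * ln(alpha_k * f(x_i; theta_k))."""
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    total = 0.0
    for i, x in enumerate(xs):
        for k in range(len(alphas)):
            if z[k][i] > 0.0:  # skip zero-weight terms
                density = gaussian_pdf(x, means[k], stds[k])
                total += z[k][i] * math.log(alphas[k] * density)
    return total


# Hypothetical example: two points, each sitting exactly on its class mean.
xs = [0.0, 10.0]
z = [[1.0, 0.0], [0.0, 1.0]]
ll = complete_log_likelihood(xs, z, alphas=[0.5, 0.5], means=[0.0, 10.0], stds=[1.0, 1.0])
print(ll)
```

In a full EM procedure the memberships would be replaced by their expected values in the E-step and the parameters re-estimated in the M-step; this sketch only evaluates the objective for fixed values.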

4. Data Analysis and Experimentation

4.1. Statistical Analysis Using One-Way ANOVA Considering the CICIDS2017 Dataset

We used the same data preprocessing techniques that Zeinalpour [12] initially introduced and applied in his study. The same techniques were used in the studies [1,31] to ensure the proper analysis of the CICIDS2017 network traffic dataset using clustering-based DDoS detection methods. Figure 1 shows how the clustering-based DDoS attack detection methods are constructed to test the two hypotheses.
For NSL-KDD, we used the same three data preprocessing techniques that Zeinalpour [12] initially introduced and applied in his study. These preprocessing techniques, as in their application to the CICIDS2017 dataset, ensured that the dataset was normalized, balanced, and randomized for proper analysis by machine learning algorithms. We did not apply EM imputation to the NSL-KDD dataset. Figure 2 shows how the clustering-based DDoS attack detection methods are constructed to test and verify the methods with respect to the two hypotheses.
Experimental limitations are problematic in cybersecurity research studies and reflect factors or experimental circumstances that cannot be controlled. Limitations are issues for internal validity [53]. The curse of dimensionality is a limitation of this study. According to the study [31], the performance of anomaly-based DDoS attack detection methods can be reduced when a large amount of network traffic data must be assessed. To address this issue, we applied the wrapper and hybrid feature selection methods to select relevant attributes. According to Zeinalpour and Ahmed [31], implications reflect delimitations and assumptions. The delimitation of this study was supervised DDoS attack detection methods. Supervised learning algorithms, in contrast to unsupervised ones, can improve model performance [54]. In general, supervised DDoS attack detection methods are more robust because they are trained on labeled data. The assumption of this study was that the results from analyzing the CICIDS2017 dataset reflect real-world performance in identifying DDoS attacks. Frequent and known network protocols were used to generate the dataset [55]. We also used NSL-KDD to further confirm the statistical testing of the hypotheses. To ensure that our experimentation does not introduce any bias, we considered internal, predictive, conclusion, and external validities. We used the Weka workbench to support internal validity. This tool has a modular architecture and supports the entire process of data mining experimentation [56]. To support predictive and conclusion validities, we applied a ten-fold cross-validation method, and we used the entire CICIDS2017 dataset to support external validity. For further verification, we also used the full training dataset of NSL-KDD.
We conducted one-way ANOVA analyses to assess the effectiveness of DDoS attack detection methods. One-way ANOVA uses a factor variable that specifies the groups or levels, and a dependent variable that measures each level on a quantitative dimension [57]. This allowed us to address the two research questions and test the corresponding hypotheses. ‘FS’ denotes feature selection in the table names. The first research question was whether incorporating clustering-based wrapper methods differs in effectiveness from incorporating supervised learning wrapper methods in the clustering-based detection of DDoS attacks. We considered the one-way ANOVA F-test. The outcomes of the first experiment are shown in Table 1. The results show that the test was significant, with F(1, 26) = 10.55 and p = 0.003. The p-value is reported under the “Sig” column. The p-value was less than 0.05, leading us to reject the null hypothesis. The null hypothesis was that there is no difference in effectiveness when incorporating clustering-based wrapper methods as opposed to supervised learning wrapper methods in the clustering-based detection of DDoS attacks. The η², reported in the “Partial Eta Squared” column with a value of 0.29, shows a strong relationship among the DDoS attack detection methods when incorporating clustering-based wrapper methods in contrast to supervised learning wrapper methods.
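For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed from the between-group and within-group sums of squares. The following is a minimal sketch with made-up false positive rates for two groups; it does not reproduce the study's data, and it omits the p-value, which would require the F distribution from a statistics library.

```python
from typing import List, Tuple


def one_way_anova(groups: List[List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical false positive rates for two feature selection groups.
group_a = [0.1, 0.2, 0.3]
group_b = [0.4, 0.5, 0.6]
f_stat, df_between, df_within = one_way_anova([group_a, group_b])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

With two groups, as in each experiment here, the between-group degrees of freedom equal 1, which matches the F(1, ·) form of the reported statistics.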
The second research question was whether incorporating clustering-based hybrid feature selection methods differs in effectiveness from incorporating supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. We considered the one-way ANOVA F-test. The outcomes are shown in Table 2. The results show that the test was significant, with F(1, 54) = 10.04 and p = 0.003. The p-value is reported under the “Sig” column. The p-value was less than 0.05, leading us to reject the null hypothesis. The null hypothesis was that there is no difference in effectiveness when incorporating clustering-based hybrid feature selection methods as opposed to supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. The η², reported in the “Partial Eta Squared” column with a value of 0.16, shows a strong relationship among the DDoS attack detection methods when incorporating clustering-based hybrid feature selection methods as opposed to supervised learning hybrid feature selection methods.
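The reported effect sizes can be cross-checked against the F statistics. For a one-way design, η² equals F·df_effect / (F·df_effect + df_error); the short sketch below applies this identity to the values reported for Tables 1 and 2. The function name is an illustrative choice, not part of any statistics package used in the study.

```python
def partial_eta_squared(f_stat: float, df_effect: int, df_error: int) -> float:
    """Effect size recovered from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# Values reported in the text for the two CICIDS2017 experiments.
eta_wrapper = partial_eta_squared(10.55, 1, 26)  # first experiment, F(1, 26)
eta_hybrid = partial_eta_squared(10.04, 1, 54)   # second experiment, F(1, 54)
print(round(eta_wrapper, 2), round(eta_hybrid, 2))
```

Both recomputed values round to the effect sizes stated in the text (0.29 and 0.16).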

4.2. Statistical Analysis Using One-Way ANOVA Considering NSL-KDD Dataset

To further verify our first hypothesis, we applied the one-way ANOVA F-test using the NSL-KDD dataset. The outcomes of the experiment are shown in Table 3. The results show that the test was significant, with F(1, 26) = 12.77 and p = 0.001. The p-value is reported under the “Sig” column. The p-value was less than 0.05, leading us to reject the null hypothesis using the NSL-KDD dataset. The null hypothesis was that there is no difference in effectiveness when incorporating clustering-based wrapper methods as opposed to supervised learning wrapper methods in the clustering-based detection of DDoS attacks. The η², reported in the “Partial Eta Squared” column with a value of 0.33, shows a strong relationship among the DDoS attack detection methods when incorporating clustering-based wrapper methods in contrast to supervised learning wrapper methods.
To further verify the second hypothesis, we applied the one-way ANOVA F-test using the NSL-KDD dataset. The outcomes are shown in Table 4. The results show that the test was significant, with F(1, 54) = 15.51 and p = 0.001. The p-value is reported under the “Sig” column. The p-value was less than 0.05, leading us to reject the null hypothesis using the NSL-KDD dataset. The null hypothesis was that there is no difference in effectiveness when incorporating clustering-based hybrid feature selection methods as opposed to supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. The η², reported in the “Partial Eta Squared” column with a value of 0.22, shows a strong relationship among the DDoS attack detection methods when incorporating clustering-based hybrid feature selection methods as opposed to supervised learning hybrid feature selection methods.

4.3. Comparison Analysis Based on Descriptive Statistics Using CICIDS2017 Dataset

The mean results of the descriptive statistics presented in Table 5 correspond to the first experiment. The table shows that incorporating clustering-based wrapper methods was more effective than the supervised learning wrapper methods in the clustering-based detection of DDoS attacks. This is in accordance with the mean value of 0.17 for clustering-based wrapper methods against the mean value of 0.40 for supervised learning wrapper methods when constructing clustering-based DDoS attack detection methods.
The mean results of the descriptive statistics presented in Table 6 are related to the second experiment. The table shows that incorporating clustering-based hybrid feature selection methods was more effective than supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. This is in accordance with the mean value of 0.18 for clustering-based hybrid feature selection methods against the mean value of 0.35 for supervised learning hybrid feature selection methods when constructing clustering-based DDoS attack detection methods.
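The clustering-based means quoted above can be recomputed from the false positive rates listed in Appendix B. The sketch below copies the sixteen values of Table A2 (clustering-based wrapper) and the thirty-two values of Table A3 (clustering-based hybrid) and averages them; the variable names are illustrative.

```python
from statistics import mean

# False positive rates from Table A2 (clustering-based wrapper, CICIDS2017).
wrapper_fpr = [
    0.002, 0.027, 0.216, 0.282,
    0.086, 0.121, 0.005, 0.083,
    0.004, 0.008, 0.299, 0.332,
    0.636, 0.636, 0.000, 0.011,
]

# False positive rates from Table A3 (clustering-based hybrid, CICIDS2017).
hybrid_fpr = [
    0.002, 0.027, 0.216, 0.282,
    0.006, 0.003, 0.102, 0.098,
    0.290, 0.290, 0.267, 0.260,
    0.000, 0.000, 0.006, 0.033,
    0.003, 0.008, 0.309, 0.331,
    0.636, 0.636, 0.000, 0.011,
    0.000, 0.009, 0.359, 0.343,
    0.626, 0.625, 0.000, 0.064,
]

wrapper_mean = mean(wrapper_fpr)
hybrid_mean = mean(hybrid_fpr)
print(round(wrapper_mean, 2), round(hybrid_mean, 2))
```

The two averages round to 0.17 and 0.18, in agreement with the clustering-based means reported for the first and second experiments.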

4.4. Comparison Analysis Based on Descriptive Statistics Using NSL-KDD Dataset

The mean results of the descriptive statistics presented in Table 7 correspond to the first experiment using the NSL-KDD dataset. The table shows that incorporating clustering-based wrapper methods was more effective than supervised learning wrapper methods in the clustering-based detection of DDoS attacks. This is in accordance with the mean value of 0.05 for clustering-based wrapper methods against the mean value of 0.13 for supervised learning wrapper methods when constructing clustering-based DDoS attack detection methods.
The mean results of the descriptive statistics presented in Table 8 correspond to the second experiment. The table shows that incorporating clustering-based hybrid feature selection methods was more effective than supervised learning hybrid feature selection methods in the clustering-based detection of DDoS attacks. This is in accordance with the mean value of 0.04 for clustering-based hybrid feature selection methods against the mean value of 0.09 for supervised learning hybrid feature selection methods when constructing clustering-based DDoS attack detection methods.

5. Discussion

This study conducted a comparative analysis of feature selection methods in clustering-based DDoS attack detection methods. Service availability is crucial for computer networks, and machine learning algorithms offer promising tactics to counter DDoS attacks [2]. Arango-López et al. [58] mention that modern DDoS attacks are launched through a combination of methods concurrently, making detection challenging. Identifying these attacks can be difficult before encountering them [59]. The nature of network traffic data is dynamic, and with the presence of multiple attack protocols, there is a necessity for robust defense mechanisms [14]. Rishkhan et al. [4] claim that machine learning-based intrusion detection methods are one crucial strategy in preventing information security attacks. DDoS attacks pose major challenges to organizational networks, including service interruptions, the exposure of network vulnerabilities to hackers, and increased risk of data loss and data theft. Initially, Zeinalpour [12] examined the application of the wrapper method and the hybrid feature selection prior to clustering algorithms in DDoS attack detection. The hybrid feature selection approach used the filter method followed by the wrapper method. That study could not verify which approach was more effective. Nevertheless, the study [12] found that the addition of the hybrid approach that incorporated ChiSquared and NaïveBayes was more effective. It had the lowest false positive rate of 0.013. The study [31] took the investigation further by conducting one-way ANOVA analyses, comparing the addition of the wrapper method and hybrid approach prior to clustering algorithms using the vote classifier method. The results of the descriptive analyses from one-way ANOVA showed that the addition of the wrapper method introduced more effectiveness than just applying the filter method.
Likewise, the results of the study [31] showed that the addition of the filter method prior to the wrapper method that incorporated ChiSquared and J48 (a decision tree classifier) was more effective. The incorporation of such an approach in selecting features produced the lowest false positive rate of 0.012. In a similar endeavor, the one-way ANOVA results in the study [1] showed that the BestFirst search method outperformed metaheuristic search techniques when using the wrapper method. The lowest obtained false positive rate in the study [1] was when Information Gain and the k-means clustering algorithm were applied in the filter and wrapper methods accordingly prior to the clustering algorithms. This method was able to obtain a false positive rate of 0.000.
Density-based clustering has been shown to be promising in network intrusion detection models. Kaliyaperumal et al. [60] examined the performance of DBSCAN alone, a density-based clustering algorithm, using the CICIDS2018 network traffic dataset. The obtained specificity was 0.9752, which is equivalent to a false positive rate of 0.0248. When Kaliyaperumal et al. [60] proposed a novel way to use DBSCAN and assessed it using the CICIDS2017 dataset, the obtained specificity was 0.9806, which is equivalent to a false positive rate of 0.0194. The same proposed approach applied by Kaliyaperumal et al. [60] to CICIDS2018 had a specificity of 0.9814, which corresponds to a false positive rate of 0.0186. When Emadi and Mazinani [61] evaluated the performance of DBSCAN as a density-based approach, the highest accuracy they achieved was 95.5%. This was lower than the highest accuracy of 98.88% obtained by Kaliyaperumal et al. [60], which was accompanied by a specificity of 0.9814.
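The conversions between specificity and false positive rate used above follow from the identity FPR = 1 − specificity. A brief sketch, using the three specificity values quoted from Kaliyaperumal et al. [60]:

```python
def fpr_from_specificity(specificity: float) -> float:
    """False positive rate implied by a specificity (true negative rate)."""
    return 1.0 - specificity


# Specificity values quoted in the text.
reported = [0.9752, 0.9806, 0.9814]
converted = [round(fpr_from_specificity(s), 4) for s in reported]
print(converted)
```

Rounded to four decimals, the results are the false positive rates stated in the paragraph above.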
However, feature selection is extremely important to the efficacy and effectiveness of network intrusion detection methods. In this research study, we analyzed two variations of the clustering-based detection method, i.e., not density-based clustering and density-based clustering. We applied the wrapper method and the hybrid approach (filter–wrapper) and compared the performance of the resulting DDoS attack detection methods. We examined whether incorporating supervised learning, as opposed to a clustering approach (not density-based or density-based), in the wrapper method would impact the performance of the detection methods. Given the results of the two experiments, we found that incorporating the clustering approach in the wrapper method had a greater impact on the performance of clustering-based DDoS attack detection methods in terms of lowering the false positive rates. The one-way ANOVA analyses show statistical significance in that regard.
We were also able to obtain a false positive rate of 0.000 in six cases. The first case was when we incorporated a density-clustering-based wrapper method using SimpleKMeans prior to not density-based clustering using SimpleKMeans to identify DDoS attacks. The second and third cases were when we applied a not density-clustering-based hybrid method using Information Gain and SimpleKMeans for feature selection; they occurred when we used density-based and not density-based clustering using EM in attack detection. The fourth case was when we applied a density-clustering-based hybrid method using ChiSquared and SimpleKMeans prior to not density-based clustering using SimpleKMeans in DDoS attack detection. The fifth case was when we incorporated a density-clustering-based hybrid method using Information Gain and EM prior to not density-based clustering using EM. The sixth and final case was when we applied a density-clustering-based hybrid method using Information Gain and SimpleKMeans prior to not density-based clustering using SimpleKMeans in DDoS attack detection. In none of the considered cases of applying supervised learning algorithms in the wrapper method were the DDoS attack detection methods able to obtain a false positive rate of 0.000. In general, clustering algorithms are effective techniques that categorize data using the centroid, or mean, of a group of data points. In this case, similar network traffic data points that are considered normal are placed in one group, while similar data points that belong to DDoS attack events are placed in another group.
Overfitting is a major problem for machine learning algorithms: a model can be trained so closely on one set of network traffic events that it does not generalize readily to others. Data quality is also a major concern. We used min–max normalization to facilitate the construction of an accurate learning model and the EM imputation method to fix missing values. We also applied SpreadSubSample and a randomization approach to prevent the bias introduced by the datasets during predictive analysis. Consequently, the results that we report were obtained using ten-fold cross-validation as a generalization approach to ensure their validity and accuracy.
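As an illustration of the normalization step, min–max normalization rescales each attribute to the range [0, 1]. The sketch below is a plain Python version written for this example, not the preprocessing filter used in the study; the column values are hypothetical, and mapping a constant column to 0.0 is an assumption made here to avoid division by zero.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Rescale one attribute so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption for this sketch: a constant column carries no spread.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical attribute values, e.g. packet counts per flow.
packet_counts = [10.0, 20.0, 40.0]
scaled = min_max_normalize(packet_counts)
print(scaled)
```

Putting all attributes on the same scale matters for distance-based clustering such as k-means, since otherwise attributes with large numeric ranges dominate the distance computation.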

Analysis of Mechanism Effectiveness

In this study, we gathered false positive rates in relation to the categorization or classification of only DDoS events. Our clustering-based wrapper method demonstrates superior performance primarily through better feature space representation. Clustering algorithms maintain a better representation of network traffic patterns by preserving cluster separability (measured using the silhouette coefficient), while supervised approaches focus narrowly on class discrimination.
This fundamental difference explains the significantly lower false positive rates achieved using clustering-based methods. Specifically, clustering methods excel at modeling the inherent structure of network traffic patterns rather than making binary classifications, which proves particularly effective when attack patterns form distinct clusters in the feature space but overlap with normal traffic in individual feature dimensions. The CICIDS2017 dataset initially contained 78 features (after removing duplicates), while NSL-KDD contained 41 features. In the NSL-KDD dataset, the best performance, an FPR of 0.003, was obtained using thirteen features. Our top-performing configurations for the CICIDS2017 dataset retained three, five, six, or fifteen features and reached the best performance of 0.000 FPR. These feature subsets provided sufficient discriminative power while avoiding the curse of dimensionality. This analysis demonstrates the effectiveness of our feature selection approaches, with respect to the highest performance, in balancing model complexity and accuracy, a critical consideration for real-time DDoS detection systems where computational efficiency is important. Appendix D presents the tables of the features selected by the clustering techniques applied in the wrapper method that led to the best false positive rate of 0.000 using the CICIDS2017 dataset and the best false positive rate of 0.003 using the NSL-KDD dataset.

6. Conclusions

This study contributes to improving DDoS attack detection methods by assessing the incorporation of the most effective combinations of feature selection that considered supervised learning and clustering algorithms. In this respect, various organizations can implement DDoS attack detection methods that are more likely to have a high performance in countering attacks. In today’s modernization of internet communication, it is essential to have the best countermeasures against this type of attack. With constant modernization, the complexity of network traffic data analysis increases. Therefore, feature selection is extremely important in intrusion detection methods.
In this study, we compared the results obtained from supervised learning with those from clustering approaches, including not density-based and density-based approaches, in the wrapper method. The comparative analyses were based on the false positive rates obtained by the DDoS attack detection methods. The outcomes of the one-way ANOVA analyses showed that the wrapper method performs more effectively using clustering algorithms for feature selection than using supervised learning. DDoS attack detection methods that apply clustering algorithms suffer from the curse of dimensionality due to the high dimensionality of network traffic data [12]. Therefore, proceeding with the appropriate feature selection processes is essential. Analyzing the large volume of data that these detection models must process to counter the attacks is problematic [1]. As a result, the categorical investigation of clustering-based detection models remains an important research stream. With respect to the outcomes of our research study, we found that clustering algorithms were effective in clustering-based DDoS attack detection methods. Given the importance of the categorical investigation of feature selection due to the need for proper analysis of network traffic data, future studies can take the findings of this research study further. For example, the considered feature selection methods can be further evaluated with other state-of-the-art detection methods, such as deep learning approaches. In this study, we found that clustering-based feature selection is more effective for clustering-based DDoS attack detection methods. A future study could examine whether the effectiveness of clustering-based feature selection, established through statistical analyses in this study, also holds for other state-of-the-art detection methods. Ensuring the robustness of DDoS attack detection methods is important.
The dynamic nature of attacks, along with the use of multiple attack protocols, necessitates robust defense mechanisms [14]. Such robustness can ensure that DDoS attack detection methods, through the consideration of different machine learning frameworks, are able to deal with the complexity of network traffic data throughout internet communication.

Author Contributions

Conceptualization, A.Z. and C.P.M.; methodology, A.Z.; validation, A.Z.; original draft, A.Z. and C.P.M.; writing (review and editing), C.P.M. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no external funding.

Data Availability Statement

The authors of this study used the CICIDS2017 and NSL-KDD datasets. The CICIDS2017 dataset is publicly available at https://www.unb.ca/cic/datasets/ids-2017.html, accessed on 3 March 2024. The NSL-KDD dataset is publicly available at https://web.archive.org/web/20150205070216/http://nsl.cs.unb.ca/NSL-KDD/, accessed on 20 April 2025.

Conflicts of Interest

The authors of this study declare no conflicts of interest. The authors of this research guided the study with no sponsorship.

Appendix A. Independent Variables Table

Table A1. Independent Variables Table.
| Independent Variables | Procedures |
|---|---|
| Clustering Based DDoS Detection Method | Not-Density-Clustering of EM |
| | Not-Density-Clustering of SimpleKMeans |
| | MakeDensityBasedClusterer(EM) |
| | MakeDensityBasedClusterer(SimpleKMeans) |
| Clustering-Based-Wrapper Method | WrapperSubsetEval(Not-Density-Based-Clustering) |
| | WrapperSubsetEval(Density-Based-Clustering) |
| Supervised-Learning-Wrapper Method | WrapperSubsetEval(J48) |
| | WrapperSubsetEval(DecisionTable) |
| | WrapperSubsetEval(NaïveBayes) |
| Clustering-Based-Hybrid-Feature-Selection Method | InformationGainAttributeEval + WrapperSubsetEval(Not-Density-Based-Clustering) |
| | ChiSquaredAttributeEval + WrapperSubsetEval(Not-Density-Based-Clustering) |
| | InformationGainAttributeEval + WrapperSubsetEval(Density-Based-Clustering) |
| | ChiSquaredAttributeEval + WrapperSubsetEval(Density-Based-Clustering) |
| Supervised-Learning-Hybrid-Feature-Selection Method | InformationGainAttributeEval + WrapperSubsetEval(J48) |
| | InformationGainAttributeEval + WrapperSubsetEval(DecisionTable) |
| | InformationGainAttributeEval + WrapperSubsetEval(NaïveBayes) |
| | ChiSquaredAttributeEval + WrapperSubsetEval(J48) |
| | ChiSquaredAttributeEval + WrapperSubsetEval(DecisionTable) |
| | ChiSquaredAttributeEval + WrapperSubsetEval(NaïveBayes) |

Appendix B. Experimental Results Using CICIDS2017 Dataset

Table A2. FPR Table using Clustering Feature Selection.
| Applied Clustering-Based Feature Selection | Applied Clustering Methods in DDoS Attack Detection | False Positive Rates |
|---|---|---|
| Not-Density-Clustering-based-Wrapper method using EM | Not-Density-based Clustering using EM | 0.002 |
| Not-Density-Clustering-based-Wrapper method using EM | Density-based Clustering using EM | 0.027 |
| Not-Density-Clustering-based-Wrapper method using EM | Not-Density-based Clustering using SimpleKMeans | 0.216 |
| Not-Density-Clustering-based-Wrapper method using EM | Density-based Clustering using SimpleKMeans | 0.282 |
| Not-Density-Clustering-based-Wrapper method using SimpleKMeans | Not-Density-based Clustering using EM | 0.086 |
| Not-Density-Clustering-based-Wrapper method using SimpleKMeans | Density-based Clustering using EM | 0.121 |
| Not-Density-Clustering-based-Wrapper method using SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.005 |
| Not-Density-Clustering-based-Wrapper method using SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.083 |
| Density-Clustering-based-Wrapper method using EM | Not-Density-based Clustering using EM | 0.004 |
| Density-Clustering-based-Wrapper method using EM | Density-based Clustering using EM | 0.008 |
| Density-Clustering-based-Wrapper method using EM | Not-Density-based Clustering using SimpleKMeans | 0.299 |
| Density-Clustering-based-Wrapper method using EM | Density-based Clustering using SimpleKMeans | 0.332 |
| Density-Clustering-based-Wrapper method using SimpleKMeans | Not-Density-based Clustering using EM | 0.636 |
| Density-Clustering-based-Wrapper method using SimpleKMeans | Density-based Clustering using EM | 0.636 |
| Density-Clustering-based-Wrapper method using SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.000 |
| Density-Clustering-based-Wrapper method using SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.011 |
Table A3. FPR Table using Clustering Method in Hybrid Feature Selection.
| Applied Clustering-Based Hybrid Feature Selection | Applied Clustering Methods in DDoS Attack Detection | False Positive Rates |
|---|---|---|
| Not-Density-Clustering-based-Hybrid method using ChiSquared and EM | Not-Density-based Clustering using EM | 0.002 |
| Not-Density-Clustering-based-Hybrid method using ChiSquared and EM | Density-based Clustering using EM | 0.027 |
| Not-Density-Clustering-based-Hybrid method using ChiSquared and EM | Not-Density-based Clustering using SimpleKMeans | 0.216 |
| Not-Density-Clustering-based-Hybrid method using ChiSquared and EM | Density-based Clustering using SimpleKMeans | 0.282 |
| Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Not-Density-based Clustering using EM | 0.006 |
| Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Density-based Clustering using EM | 0.003 |
| Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.102 |
| Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.098 |
| Not-Density-Clustering-based-Hybrid method using Information Gain and EM | Not-Density-based Clustering using EM | 0.290 |
| Not-Density-Clustering-based-Hybrid method using Information Gain and EM | Density-based Clustering using EM | 0.290 |
| Not-Density-Clustering-based-Hybrid method using Information Gain and EM | Not-Density-based Clustering using SimpleKMeans | 0.267 |
| Not-Density-Clustering-based-Hybrid method using Information Gain and EM | Density-based Clustering using SimpleKMeans | 0.260 |
| Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Not-Density-based Clustering using EM | 0.000 |
| Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Density-based Clustering using EM | 0.000 |
| Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.006 |
| Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.033 |
| Density-Clustering-based-Hybrid method using ChiSquared and EM | Not-Density-based Clustering using EM | 0.003 |
| Density-Clustering-based-Hybrid method using ChiSquared and EM | Density-based Clustering using EM | 0.008 |
| Density-Clustering-based-Hybrid method using ChiSquared and EM | Not-Density-based Clustering using SimpleKMeans | 0.309 |
| Density-Clustering-based-Hybrid method using ChiSquared and EM | Density-based Clustering using SimpleKMeans | 0.331 |
| Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Not-Density-based Clustering using EM | 0.636 |
| Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Density-based Clustering using EM | 0.636 |
| Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.000 |
| Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.011 |
| Density-Clustering-based-Hybrid method using Information Gain and EM | Not-Density-based Clustering using EM | 0.000 |
| Density-Clustering-based-Hybrid method using Information Gain and EM | Density-based Clustering using EM | 0.009 |
| Density-Clustering-based-Hybrid method using Information Gain and EM | Not-Density-based Clustering using SimpleKMeans | 0.359 |
| Density-Clustering-based-Hybrid method using Information Gain and EM | Density-based Clustering using SimpleKMeans | 0.343 |
| Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Not-Density-based Clustering using EM | 0.626 |
| Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Density-based Clustering using EM | 0.625 |
| Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.000 |
| Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.064 |
Table A4. FPR Table using Supervised Learning in Wrapper Feature Selection.
Applied Clustering-Based Feature Selection | Applied Clustering Methods in DDoS Attack Detection | False Positive Rates
Supervised-Learning-Wrapper method using NaïveBayes | Not-Density-based Clustering using EM | 0.340
Supervised-Learning-Wrapper method using NaïveBayes | Density-based Clustering using EM | 0.344
Supervised-Learning-Wrapper method using NaïveBayes | Not-Density-based Clustering using SimpleKMeans | 0.200
Supervised-Learning-Wrapper method using NaïveBayes | Density-based Clustering using SimpleKMeans | 0.209
Supervised-Learning-Wrapper method using J48 | Not-Density-based Clustering using EM | 0.381
Supervised-Learning-Wrapper method using J48 | Density-based Clustering using EM | 0.380
Supervised-Learning-Wrapper method using J48 | Not-Density-based Clustering using SimpleKMeans | 0.511
Supervised-Learning-Wrapper method using J48 | Density-based Clustering using SimpleKMeans | 0.490
Supervised-Learning-Wrapper method using DecisionTable | Not-Density-based Clustering using EM | 0.356
Supervised-Learning-Wrapper method using DecisionTable | Density-based Clustering using EM | 0.356
Supervised-Learning-Wrapper method using DecisionTable | Not-Density-based Clustering using SimpleKMeans | 0.674
Supervised-Learning-Wrapper method using DecisionTable | Density-based Clustering using SimpleKMeans | 0.638
Table A5. FPR Table using Supervised Learning in Hybrid Feature Selection.
Applied Clustering-Based Hybrid Feature Selection | Applied Clustering Methods in DDoS Attack Detection | False Positive Rates
Hybrid Feature Selection using ChiSquared and NaïveBayes | Not-Density-based Clustering using EM | 0.340
Hybrid Feature Selection using ChiSquared and NaïveBayes | Density-based Clustering using EM | 0.344
Hybrid Feature Selection using ChiSquared and NaïveBayes | Not-Density-based Clustering using SimpleKMeans | 0.200
Hybrid Feature Selection using ChiSquared and NaïveBayes | Density-based Clustering using SimpleKMeans | 0.209
Hybrid Feature Selection using Information Gain and NaïveBayes | Not-Density-based Clustering using EM | 0.001
Hybrid Feature Selection using Information Gain and NaïveBayes | Density-based Clustering using EM | 0.001
Hybrid Feature Selection using Information Gain and NaïveBayes | Not-Density-based Clustering using SimpleKMeans | 0.199
Hybrid Feature Selection using Information Gain and NaïveBayes | Density-based Clustering using SimpleKMeans | 0.198
Hybrid Feature Selection using ChiSquared and J48 | Not-Density-based Clustering using EM | 0.392
Hybrid Feature Selection using ChiSquared and J48 | Density-based Clustering using EM | 0.391
Hybrid Feature Selection using ChiSquared and J48 | Not-Density-based Clustering using SimpleKMeans | 0.373
Hybrid Feature Selection using ChiSquared and J48 | Density-based Clustering using SimpleKMeans | 0.367
Hybrid Feature Selection using Information Gain and J48 | Not-Density-based Clustering using EM | 0.326
Hybrid Feature Selection using Information Gain and J48 | Density-based Clustering using EM | 0.326
Hybrid Feature Selection using Information Gain and J48 | Not-Density-based Clustering using SimpleKMeans | 0.372
Hybrid Feature Selection using Information Gain and J48 | Density-based Clustering using SimpleKMeans | 0.369
Hybrid Feature Selection using ChiSquared and DecisionTable | Not-Density-based Clustering using EM | 0.362
Hybrid Feature Selection using ChiSquared and DecisionTable | Density-based Clustering using EM | 0.362
Hybrid Feature Selection using ChiSquared and DecisionTable | Not-Density-based Clustering using SimpleKMeans | 0.674
Hybrid Feature Selection using ChiSquared and DecisionTable | Density-based Clustering using SimpleKMeans | 0.638
Hybrid Feature Selection using Information Gain and DecisionTable | Not-Density-based Clustering using EM | 0.362
Hybrid Feature Selection using Information Gain and DecisionTable | Density-based Clustering using EM | 0.362
Hybrid Feature Selection using Information Gain and DecisionTable | Not-Density-based Clustering using SimpleKMeans | 0.674
Hybrid Feature Selection using Information Gain and DecisionTable | Density-based Clustering using SimpleKMeans | 0.638

Appendix C. Experimental Results Using NSL-KDD Dataset

Table A6. FPR Table using Clustering Feature Selection.
Applied Clustering-Based Feature Selection | Applied Clustering Methods in DDoS Attack Detection | False Positive Rates
Not-Density-Clustering-based-Wrapper method using EM | Not-Density-based Clustering using EM | 0.017
Not-Density-Clustering-based-Wrapper method using EM | Density-based Clustering using EM | 0.031
Not-Density-Clustering-based-Wrapper method using EM | Not-Density-based Clustering using SimpleKMeans | 0.090
Not-Density-Clustering-based-Wrapper method using EM | Density-based Clustering using SimpleKMeans | 0.093
Not-Density-Clustering-based-Wrapper method using SimpleKMeans | Not-Density-based Clustering using EM | 0.045
Not-Density-Clustering-based-Wrapper method using SimpleKMeans | Density-based Clustering using EM | 0.046
Not-Density-Clustering-based-Wrapper method using SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.039
Not-Density-Clustering-based-Wrapper method using SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.042
Density-Clustering-based-Wrapper method using EM | Not-Density-based Clustering using EM | 0.063
Density-Clustering-based-Wrapper method using EM | Density-based Clustering using EM | 0.032
Density-Clustering-based-Wrapper method using EM | Not-Density-based Clustering using SimpleKMeans | 0.045
Density-Clustering-based-Wrapper method using EM | Density-based Clustering using SimpleKMeans | 0.053
Density-Clustering-based-Wrapper method using SimpleKMeans | Not-Density-based Clustering using EM | 0.058
Density-Clustering-based-Wrapper method using SimpleKMeans | Density-based Clustering using EM | 0.068
Density-Clustering-based-Wrapper method using SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.003
Density-Clustering-based-Wrapper method using SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.033
Table A7. FPR Table using Clustering Method in Hybrid Feature Selection.
Applied Clustering-Based Hybrid Feature Selection | Applied Clustering Methods in DDoS Attack Detection | False Positive Rates
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM | Not-Density-based Clustering using EM | 0.021
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM | Density-based Clustering using EM | 0.040
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM | Not-Density-based Clustering using SimpleKMeans | 0.011
Not-Density-Clustering-based-Hybrid method using ChiSquared and EM | Density-based Clustering using SimpleKMeans | 0.028
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Not-Density-based Clustering using EM | 0.045
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Density-based Clustering using EM | 0.046
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.039
Not-Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.042
Not-Density-Clustering-based-Hybrid method using Information Gain and EM | Not-Density-based Clustering using EM | 0.059
Not-Density-Clustering-based-Hybrid method using Information Gain and EM | Density-based Clustering using EM | 0.026
Not-Density-Clustering-based-Hybrid method using Information Gain and EM | Not-Density-based Clustering using SimpleKMeans | 0.043
Not-Density-Clustering-based-Hybrid method using Information Gain and EM | Density-based Clustering using SimpleKMeans | 0.046
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Not-Density-based Clustering using EM | 0.068
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Density-based Clustering using EM | 0.044
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.006
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.040
Density-Clustering-based-Hybrid method using ChiSquared and EM | Not-Density-based Clustering using EM | 0.063
Density-Clustering-based-Hybrid method using ChiSquared and EM | Density-based Clustering using EM | 0.032
Density-Clustering-based-Hybrid method using ChiSquared and EM | Not-Density-based Clustering using SimpleKMeans | 0.045
Density-Clustering-based-Hybrid method using ChiSquared and EM | Density-based Clustering using SimpleKMeans | 0.053
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Not-Density-based Clustering using EM | 0.058
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Density-based Clustering using EM | 0.068
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.003
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.033
Density-Clustering-based-Hybrid method using Information Gain and EM | Not-Density-based Clustering using EM | 0.059
Density-Clustering-based-Hybrid method using Information Gain and EM | Density-based Clustering using EM | 0.012
Density-Clustering-based-Hybrid method using Information Gain and EM | Not-Density-based Clustering using SimpleKMeans | 0.006
Density-Clustering-based-Hybrid method using Information Gain and EM | Density-based Clustering using SimpleKMeans | 0.016
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Not-Density-based Clustering using EM | 0.059
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Density-based Clustering using EM | 0.012
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Not-Density-based Clustering using SimpleKMeans | 0.006
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Density-based Clustering using SimpleKMeans | 0.016
Table A8. FPR Table using Supervised Learning in Wrapper Feature Selection.
Applied Clustering-Based Feature Selection | Applied Clustering Methods in DDoS Attack Detection | False Positive Rates
Supervised-Learning-Wrapper method using NaïveBayes | Not-Density-based Clustering using EM | 0.237
Supervised-Learning-Wrapper method using NaïveBayes | Density-based Clustering using EM | 0.255
Supervised-Learning-Wrapper method using NaïveBayes | Not-Density-based Clustering using SimpleKMeans | 0.185
Supervised-Learning-Wrapper method using NaïveBayes | Density-based Clustering using SimpleKMeans | 0.228
Supervised-Learning-Wrapper method using J48 | Not-Density-based Clustering using EM | 0.091
Supervised-Learning-Wrapper method using J48 | Density-based Clustering using EM | 0.091
Supervised-Learning-Wrapper method using J48 | Not-Density-based Clustering using SimpleKMeans | 0.007
Supervised-Learning-Wrapper method using J48 | Density-based Clustering using SimpleKMeans | 0.035
Supervised-Learning-Wrapper method using DecisionTable | Not-Density-based Clustering using EM | 0.142
Supervised-Learning-Wrapper method using DecisionTable | Density-based Clustering using EM | 0.155
Supervised-Learning-Wrapper method using DecisionTable | Not-Density-based Clustering using SimpleKMeans | 0.008
Supervised-Learning-Wrapper method using DecisionTable | Density-based Clustering using SimpleKMeans | 0.106
Table A9. FPR Table using Supervised Learning in Hybrid Feature Selection.
Applied Clustering-Based Hybrid Feature Selection | Applied Clustering Methods in DDoS Attack Detection | False Positive Rates
Hybrid Feature Selection using ChiSquared and NaïveBayes | Not-Density-based Clustering using EM | 0.237
Hybrid Feature Selection using ChiSquared and NaïveBayes | Density-based Clustering using EM | 0.255
Hybrid Feature Selection using ChiSquared and NaïveBayes | Not-Density-based Clustering using SimpleKMeans | 0.185
Hybrid Feature Selection using ChiSquared and NaïveBayes | Density-based Clustering using SimpleKMeans | 0.228
Hybrid Feature Selection using Information Gain and NaïveBayes | Not-Density-based Clustering using EM | 0.012
Hybrid Feature Selection using Information Gain and NaïveBayes | Density-based Clustering using EM | 0.012
Hybrid Feature Selection using Information Gain and NaïveBayes | Not-Density-based Clustering using SimpleKMeans | 0.170
Hybrid Feature Selection using Information Gain and NaïveBayes | Density-based Clustering using SimpleKMeans | 0.172
Hybrid Feature Selection using ChiSquared and J48 | Not-Density-based Clustering using EM | 0.095
Hybrid Feature Selection using ChiSquared and J48 | Density-based Clustering using EM | 0.089
Hybrid Feature Selection using ChiSquared and J48 | Not-Density-based Clustering using SimpleKMeans | 0.007
Hybrid Feature Selection using ChiSquared and J48 | Density-based Clustering using SimpleKMeans | 0.036
Hybrid Feature Selection using Information Gain and J48 | Not-Density-based Clustering using EM | 0.059
Hybrid Feature Selection using Information Gain and J48 | Density-based Clustering using EM | 0.068
Hybrid Feature Selection using Information Gain and J48 | Not-Density-based Clustering using SimpleKMeans | 0.007
Hybrid Feature Selection using Information Gain and J48 | Density-based Clustering using SimpleKMeans | 0.044
Hybrid Feature Selection using ChiSquared and DecisionTable | Not-Density-based Clustering using EM | 0.142
Hybrid Feature Selection using ChiSquared and DecisionTable | Density-based Clustering using EM | 0.155
Hybrid Feature Selection using ChiSquared and DecisionTable | Not-Density-based Clustering using SimpleKMeans | 0.008
Hybrid Feature Selection using ChiSquared and DecisionTable | Density-based Clustering using SimpleKMeans | 0.106
Hybrid Feature Selection using Information Gain and DecisionTable | Not-Density-based Clustering using EM | 0.068
Hybrid Feature Selection using Information Gain and DecisionTable | Density-based Clustering using EM | 0.044
Hybrid Feature Selection using Information Gain and DecisionTable | Not-Density-based Clustering using SimpleKMeans | 0.013
Hybrid Feature Selection using Information Gain and DecisionTable | Density-based Clustering using SimpleKMeans | 0.036

Appendix D. Selected Features with Best Performance

Table A10. Selected Features with Best Performance using CICIDS2017.
Applied Feature Selection Methods | Selected Features
Density-Clustering-based-Wrapper method using SimpleKMeans | Total Length of Fwd Packets, Bwd Packet Length Std, Flow IAT Std, Fwd IAT Mean, act_data_pkt_fwd
Not-Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Total Length of Fwd Packets, Subflow Fwd Bytes, Avg Bwd Segment Size, Fwd IAT Mean, Fwd IAT Std, Bwd Packet Length Std
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | Subflow Fwd Bytes, Fwd IAT Mean, act_data_pkt_fwd, Bwd Packet Length Std, Flow IAT Std
Density-Clustering-based-Hybrid method using Information Gain and EM | Total Length of Fwd Packets, Subflow Fwd Bytes, Avg Bwd Segment Size, Destination Port, Bwd Packet Length Max, Avg Fwd Segment Size, Fwd Packet Length Mean, Init_Win_bytes_forward, Fwd IAT Max, Fwd IAT Mean, Init_Win_bytes_backward, Subflow Fwd Packets, Total Fwd Packets, Fwd IAT Std, Packet Length Variance
Density-Clustering-based-Hybrid method using Information Gain and SimpleKMeans | Total Length of Fwd Packets, act_data_pkt_fwd, Bwd Packet Length Std
Table A11. Selected Features with Best Performance using NSL-KDD.
Applied Feature Selection Methods | Selected Features
Density-Clustering-based-Wrapper method using SimpleKMeans | duration, service, flag, hot, su_attempted, num_shells, count, srv_serror_rate, same_srv_rate, dst_host_count, dst_host_srv_count, dst_host_diff_srv_rate, dst_host_srv_rerror_rate
Density-Clustering-based-Hybrid method using ChiSquared and SimpleKMeans | service, flag, same_srv_rate, dst_host_srv_count, dst_host_diff_srv_rate, count, srv_serror_rate, dst_host_count, dst_host_srv_rerror_rate, duration, hot, su_attempted, num_shells

References

  1. Zeinalpour, A.; McElroy, C.P. Comparing metaheuristic search techniques in addressing the effectiveness of clustering-based DDoS attack detection methods. Electronics 2024, 13, 899. [Google Scholar] [CrossRef]
  2. Najafimehr, M.; Zarifzadeh, S.; Mostafavi, S. DDoS attacks and machine-learning-based detection methods: A survey and taxonomy. Eng. Rep. 2023, 5, e12697. [Google Scholar] [CrossRef]
  3. Das, S.; Ashrafuzzaman, M.; Sheldon, F.T.; Shiva, S. Ensembling supervised and unsupervised machine learning algorithms for detecting distributed denial of service attacks. Algorithms 2024, 17, 99. [Google Scholar] [CrossRef]
  4. Riskhan, B.; Safuan, H.A.J.; Hussain, K.; Elnour, A.A.H.; Abdelmaboud, A.; Khan, F.; Kundi, M. An adaptive distributed denial of service attack prevention technique in a distributed environment. Sensors 2023, 23, 6574. [Google Scholar] [CrossRef]
  5. Prasad, A.; Chandra, S. VMFCVD: An optimized framework to combat volumetric ddos attacks using machine learning. Arab. J. Sci. Eng. 2022, 47, 9965–9983. [Google Scholar] [CrossRef]
  6. Xu, K.; Li, Z.; Liang, N.; Kong, F.; Lei, S.; Wang, S.; Paul, A.; Wu, Z. Research on Multi-Layer Defense against DDoS Attacks in Intelligent Distribution Networks. Electronics 2024, 13, 3583. [Google Scholar] [CrossRef]
  7. Ali, T.E.; Yung-Wey, C.; Manickam, S.; Yusoff, M.N.; Kok-Lim, A.Y.; Zoltan, A.D. A stacking ensemble model with enhanced feature selection for Distributed Denial-of-Service detection in software-defined networks. Eng. Technol. Appl. Sci. Res. 2025, 15, 19232–19245. [Google Scholar] [CrossRef]
  8. Zou, H. Clustering Algorithm and Its Application in Data Mining. Wirel. Pers. Commun. 2020, 110, 21–30. [Google Scholar] [CrossRef]
  9. Ahmed, S.; Khan, Z.A.; Mohsin, S.M.; Latif, S.; Aslam, S.; Mujlid, H.; Adil, M.; Najam, Z. Effective and efficient DDoS attack detection using deep learning algorithm, multi-layer perceptron. Future Internet 2023, 15, 76. [Google Scholar] [CrossRef]
  10. Belouch, M.; Elhadaj, S.; Idhammad, M. A hybrid filter-wrapper feature selection method for DDoS detection in cloud computing. Intell. Data Anal. 2018, 22, 1209–1226. [Google Scholar] [CrossRef]
  11. Kim, Y.E.; Kim, Y.S.; Kim, H. Effective feature selection methods to detect IoT DDoS attack in 5G core network. Sensors 2022, 22, 3819. [Google Scholar] [CrossRef] [PubMed]
  12. Zeinalpour, A. Addressing High False Positive Rates of DDoS Attack Detection Methods. Ph.D. Thesis, Walden University, Minneapolis, MN, USA, 2021. [Google Scholar]
  13. Bhattacharjee, P.; Mitra, P. A survey of density based clustering algorithms. Front. Comput. Sci. 2021, 15, 151308. [Google Scholar] [CrossRef]
  14. Hassan, A.I.; Reheem, E.A.E.; Guirguis, S.K. An entropy and machine learning based approach for DDoS attacks detection in software defned networks. Sci. Rep. 2024, 14, 18159. [Google Scholar] [CrossRef]
  15. Alrayes, F.S.; Zakariah, M.; Amin, S.U.; Khan, Z.I.; Helal, M. Intrusion detection in IoT systems using denoising autoencoder. IEEE Access 2024, 12, 122401–122425. [Google Scholar] [CrossRef]
  16. Ahn, B.; Abbas, E.; Park, J.A.; Choi, H.J. Increasing splicing site prediction by training gene set based on species. KSII Trans. Internet Inf. Syst. 2012, 6, 2784–2799. [Google Scholar] [CrossRef]
  17. Altalhan, M.; Algarni, A.; Alouane, M.T.H. Imbalanced data problem in machine learning: A review. IEEE Access 2025, 13, 13686–13699. [Google Scholar] [CrossRef]
  18. Aamir, M.; Zaidi, S.M.A. DDoS attack detection with feature engineering and machine learning: The framework and performance evaluation. Int. J. Inf. Secur. 2019, 18, 761–785. [Google Scholar] [CrossRef]
  19. Dasari, S.; Kaluri, R. An effective classification of DDoS attacks in a distributed network by adopting hierarchical machine learning and hyperparameters optimization techniques. IEEE Access 2024, 12, 10834–10845. [Google Scholar] [CrossRef]
  20. Revathi, M.; Ramalingam, V.V.; Amutha, B. A machine learning based detection and mitigation of the DDoS attack by using SDN controller framework. Wirel. Pers. Commun. Int. J. 2022, 127, 2417–2441. [Google Scholar] [CrossRef]
  21. Adedeji, K.B.; Abu-Mahfouz, A.M.; Kurien, A.M. DDoS attack and detection methods in internet-enabled networks: Concept, research perspectives, and challenges. J. Sens. Actuator Netw. 2023, 12, 51. [Google Scholar] [CrossRef]
  22. Keserwani, P.K.; Govil, M.C.; Pilli, E.S. An effective NIDS framework based on a comprehensive survey of feature optimization and classification techniques. Neural Comput. Appl. 2023, 35, 4993–5013. [Google Scholar] [CrossRef]
  23. Yoachimik, O.; Pacheco, J. 4.2 Tbps of Bad Packets and a Whole Lot More: Cloudflare’s Q3 DDoS Report; Cloudflare, Inc.: San Francisco, CA, USA, 2024; Available online: https://blog.cloudflare.com/ddos-threat-report-for-2024-q3 (accessed on 30 October 2024).
  24. Alduailij, M.; Khan, Q.W.; Tahir, M.; Sardaraz, M.; Alduailij, M.; Malik, F. Machine-learning-based DDoS attack detection using mutual information and random forest feature importance method. Symmetry 2022, 14, 1095. [Google Scholar] [CrossRef]
  25. Abdullayeva, F.J. Distributed denial of service attack detection in E-government cloud via data clustering. Array 2022, 15, 100229. [Google Scholar] [CrossRef]
  26. Zong, Y.; Huang, G. Application of artificial fish swarm optimization semi-supervised kernel fuzzy clustering algorithm in network intrusion. J. Intell. Fuzzy Syst. 2020, 39, 1619–1626. [Google Scholar] [CrossRef]
  27. Panda, M.; Patra, M.R. Some clustering algorithms to enhance the performance of the network intrusion detection system. J. Theor. Appl. Inf. Technol. 2008, 26, 795–801. [Google Scholar]
  28. Kriegel, H.P.; Kröger, P.; Sander, J.; Zimek, A. Density-based clustering. WIREs Data Min. Knowl. Discov. 2011, 1, 231–240. [Google Scholar] [CrossRef]
  29. Mondragón, J.C.M.; Lara, E.R.; Eleuterio, R.A.; Gutirrez, E.E.G.; López, F.D.R. Density-based clustering to deal with highly imbalanced data in multi-class problems. Mathematics 2023, 11, 4008. [Google Scholar] [CrossRef]
  30. Koo, J.; Hwang, S. A unified defect pattern analysis of wafer maps using density-based clustering. IEEE Access 2021, 9, 78873–78882. [Google Scholar] [CrossRef]
  31. Zeinalpour, A.; Ahmed, H.A. Addressing the effectiveness of DDoS-attack detection methods based on the clustering method using an ensemble method. Electronics 2022, 11, 2736. [Google Scholar] [CrossRef]
  32. Shakil, M.; Fuad Yousif Mohammed, A.; Arul, R.; Bashir, A.K.; Choi, J.K. A novel dynamic framework to detect DDoS in SDN using metaheuristic clustering. Trans. Emerg. Telecommun. Technol. 2019, 33, e3622. [Google Scholar] [CrossRef]
  33. Bhaya, W.; Manaa, M.E. A proactive DDoS attack detection approach using data mining cluster analysis. J. Next Gener. Inf. Technol. 2014, 5, 36–47. [Google Scholar]
  34. Bhaya, W.; Manaa, M. DDoS attack detection approach using an efficient cluster analysis in large data scale. In Proceedings of the IEEE 2017 Annual Conference on New Trends in Information & Communications Technology Applications, Baghdad, Iraq, 7–9 March 2017; pp. 168–173. [Google Scholar]
  35. Qin, X.; Xu, T.; Wang, C. DDoS attack detection using flow entropy and clustering technique. In Proceedings of the IEEE 2015 11th International Conference on Computational Intelligence and Security, Shenzhen, China, 19–20 December 2015; pp. 412–415. [Google Scholar]
  36. Al-mamory, S.O.; Algelal, Z.M. A modified DBSCAN clustering algorithm for proactive detection of DDoS attacks. In Proceedings of the IEEE 2017 Annual Conference on New Trends in Information & Communications Technology Applications, Baghdad, Iraq, 7–9 March 2017; pp. 304–309. [Google Scholar]
  37. Ateş, Ç.; Özdel, S.; Anarım, E. Clustering based DDoS attack detection using the relationship between packet headers. In Proceedings of the IEEE 2019 Innovations in Intelligent Systems and Applications Conference, Izmir, Turkey, 31 October–2 November 2019; pp. 1–6. [Google Scholar]
  38. Gu, Y.; Li, K.; Guo, Z.; Wang, Y. Semi-supervised K-means DDoS detection method using hybrid feature selection algorithm. IEEE Access 2019, 7, 64351–64365. [Google Scholar] [CrossRef]
  39. Mansoor, A.; Anbar, M.; Bahashwan, A.A.; Alabsi, B.A.; Rihan, S.D.A. Deep Learning-Based Approach for Detecting DDoS Attack on Software-Defined Networking Controller. Systems 2023, 11, 296. [Google Scholar] [CrossRef]
  40. Elejla, O.E.; Anbar, M.; Hamouda, S.; Faisal, S.; Bahashwan, A.A.; Hasbullah, I.H. Deep-Learning-Based Approach to Detect ICMPv6 Flooding DDoS Attacks on IPv6 Networks. Appl. Sci. 2022, 12, 6150. [Google Scholar] [CrossRef]
  41. Wu, P.; Guo, H.; Moustafa, N. Pelican: A Deep Residual Network for Network Intrusion Detection. In Proceedings of the Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Valencia, Spain, 29 June–2 July 2020; pp. 55–62. [Google Scholar]
  42. Das, S.; Venugopal, D.; Shiva, S.; Sheldon, F.T. Empirical Evaluation of the Ensemble Framework for Feature Selection in DDoS Attack. In Proceedings of the IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), New York, NY, USA, 1–3 August 2020; pp. 56–61. [Google Scholar]
  43. Feng, Y.; Li, J.; Sisodia, D.; Reiher, P. On Explainable and Adaptable Detection of Distributed Denial-of-Service Traffic. IEEE Trans. Dependable Secur. Comput. 2023, 21, 2211–2226. [Google Scholar] [CrossRef]
  44. Bhattacharya, S.; Selvakumar, S. Multi-measure multi-weight ranking approach for the identification of the network features for the detection of DoS and Probe attacks. Comput. J. 2016, 59, 923–943. [Google Scholar] [CrossRef]
  45. Bhattacharya, S.; Selvakumar, S. LAWRA: A layered wrapper feature selection approach for network attack detection. Secur. Commun. Netw. 2015, 8, 3459–3468. [Google Scholar] [CrossRef]
  46. Bouzoubaa, K.; Taher, Y.; Nsiri, B. Predicting DOS-DDOS attacks: Review and evaluation study of feature selection methods based on wrapper process. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 132–145. [Google Scholar] [CrossRef]
  47. Bouzoubaa, K.; Taher, Y.; Nsiri, B. Dos attack forecasting: A comparative study on wrapper feature selection. In Proceedings of the IEEE 2020 International Conference on Intelligent Systems and Computer Vision, Fez, Morocco, 9–11 June 2020; pp. 1–7. [Google Scholar]
  48. Polat, H.; Polat, O.; Cetin, A. Detecting DDoS Attacks in Software-Defined Networks Through Feature Selection Methods and Machine Learning Models. Sustainability 2020, 12, 1035. [Google Scholar] [CrossRef]
  49. Budiman, A.; Hamidi, E.A.Z.; Ahdan, S.; Negara, R.M. Wrapper-Based Feature Selection to Improve The Accuracy of Intrusion Detection System (IDS). In Proceedings of the IEEE 2024 10th International Conference on Wireless and Telematics, Batam, Indonesia, 4–5 July 2024; pp. 1–5. [Google Scholar]
  50. Saha, S.; Priyoti, A.T.; Sharma, A.; Haque, A. Towards an optimized ensemble feature selection for DDoS detection using both supervised and unsupervised method. Sensors 2022, 22, 9144. [Google Scholar] [CrossRef]
  51. Miniak-Górecka, A.; Podlaski, K.; Gwizdałła, T. Using k-means clustering in python with periodic boundary conditions. Symmetry 2022, 14, 1237. [Google Scholar] [CrossRef]
  52. Yang, M.S.; Lai, C.Y.; Lin, C.Y. A robust EM clustering algorithm for Gaussian mixture models. Pattern Recognit. 2012, 45, 3950–3961. [Google Scholar] [CrossRef]
  53. Ellis, T.J.; Levy, Y. Towards a guide for novice researchers on research methodology: Review and proposed methods. J. Issues Inf. Sci. Inf. Technol. 2009, 6, 323–337. [Google Scholar]
  54. Sarker, I.H. Machine Learning for intelligent data analysis and automation in cybersecurity: Current and future prospects. Ann. Data Sci. 2023, 10, 1473–1498. [Google Scholar] [CrossRef]
  55. Chiba, Z.; Abghour, N.; Moussaid, K.; El omri, A.; Rida, M. Intelligent approach to build a Deep Neural Network based IDS for cloud environment using combination of machine learning algorithms. Comput. Secur. 2019, 86, 291–317. [Google Scholar] [CrossRef]
  56. Haskasa, E.; Kalemi, E.; Koci, L.; Shpk, C.C. The influence that WEKA workbench has in processing information. In Proceedings of the ISCIM, Langkawi, Malaysia, 7–9 April 2013; pp. 27–37. [Google Scholar]
  57. Green, S.B.; Salkind, N.J. Using SPSS for Windows and Macintosh: Analyzing and Understanding the Data, 8th ed.; Pearson: Upper Saddle River, NJ, USA, 2017; p. 131. [Google Scholar]
  58. Arango-López, J.; Isaza, G.; Ramirez, F.; Duque, N.; Montes, J. Cloud-based deep learning architecture for DDoS cyber attack prediction. Expert Syst. 2025, 42, e13552. [Google Scholar] [CrossRef]
  59. Najar, A.A.; Naik, S.M. DDoS attack detection using MLP and Random Forest algorithms. Int. J. Inf. Tecnol. 2022, 14, 2317–2327. [Google Scholar] [CrossRef]
  60. Kaliyaperumal, P.; Periyasamy, S.; Thirumalaisamy, M.; Balusamy, B.; Benedetto, F. A novel hybrid unsupervised learning approach for enhanced cybersecurity in the IoT. Future Internet 2024, 16, 253. [Google Scholar] [CrossRef]
  61. Emadi, H.S.; Mazinani, S.M. A Novel Anomaly Detection Algorithm Using DBSCAN and SVM in Wireless Sensor Networks. Wirel. Pers. Commun. 2018, 98, 2025–2035. [Google Scholar] [CrossRef]
Figure 1. Clustering-based DDoS attack detection method construction using CICIDS2017.
Figure 2. Clustering-based DDoS attack detection method construction using NSL-KDD.
Table 1. Tests between subjects applying wrapper using CICIDS2017.
Source | Type III Sum of Squares | df | Mean Square | F | Sig. | Partial Eta Squared
Corrected Model | 0.378 a | 1 | 0.378 | 10.547 | 0.003 | 0.289
Intercept | 2.294 | 1 | 2.294 | 63.969 | <0.001 | 0.711
Method | 0.378 | 1 | 0.378 | 10.547 | 0.003 | 0.289
Error | 0.932 | 26 | 0.036
Total | 3.388 | 28
Corrected Total | 1.310 | 27
a R Squared = 0.289 (Adjusted R Squared = 0.261).
Table 2. Tests between subjects applying hybrid using CICIDS2017.
Source | Type III Sum of Squares | df | Mean Square | F | Sig. | Partial Eta Squared
Corrected Model | 0.400 a | 1 | 0.400 | 10.043 | 0.003 | 0.157
Intercept | 3.939 | 1 | 3.939 | 98.897 | <0.001 | 0.647
Method | 0.400 | 1 | 0.400 | 10.043 | 0.003 | 0.157
Error | 2.151 | 54 | 0.040
Total | 6.213 | 56
Corrected Total | 2.550 | 55
a R Squared = 0.157 (Adjusted R Squared = 0.141).
Table 3. Tests between subjects applying wrapper using NSL-KDD.

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. | Partial Eta Squared |
|---|---|---|---|---|---|---|
| Corrected Model | 0.045 a | 1 | 0.045 | 12.768 | 0.001 | 0.329 |
| Intercept | 0.212 | 1 | 0.212 | 60.141 | <0.001 | 0.698 |
| Method | 0.045 | 1 | 0.045 | 12.768 | 0.001 | 0.329 |
| Error | 0.092 | 26 | 0.004 | | | |
| Total | 0.325 | 28 | | | | |
| Corrected Total | 0.136 | 27 | | | | |

a R Squared = 0.329 (Adjusted R Squared = 0.304).
Table 4. Tests between subjects applying hybrid using NSL-KDD.

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. | Partial Eta Squared |
|---|---|---|---|---|---|---|
| Corrected Model | 0.046 a | 1 | 0.046 | 15.511 | <0.001 | 0.223 |
| Intercept | 0.230 | 1 | 0.230 | 77.572 | <0.001 | 0.590 |
| Method | 0.046 | 1 | 0.046 | 15.511 | <0.001 | 0.223 |
| Error | 0.160 | 54 | 0.003 | | | |
| Total | 0.412 | 56 | | | | |
| Corrected Total | 0.206 | 55 | | | | |

a R Squared = 0.223 (Adjusted R Squared = 0.209).
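
The F, partial eta squared, and adjusted R squared entries in Tables 1 to 4 can be cross-checked from the Method and Error rows using the standard one-way ANOVA formulas. The following is a minimal sketch of that arithmetic, not the authors' SPSS procedure; the function name `anova_effect_summary` and the `tables` dictionary are hypothetical, and the inputs are the rounded values printed in the tables, so the results only approximate the published figures.

```python
def anova_effect_summary(ss_effect, df_effect, ss_error, df_error):
    """Recompute F, partial eta squared, and adjusted R squared for a
    single-factor between-subjects ANOVA from its sums of squares."""
    ms_effect = ss_effect / df_effect          # mean square for the factor
    ms_error = ss_error / df_error             # mean square for the error term
    f_stat = ms_effect / ms_error
    # With a single factor, partial eta squared equals R squared.
    partial_eta_sq = ss_effect / (ss_effect + ss_error)
    df_corrected_total = df_effect + df_error  # N - 1
    adj_r_sq = 1 - (1 - partial_eta_sq) * df_corrected_total / df_error
    return f_stat, partial_eta_sq, adj_r_sq


# (SS_method, df_method, SS_error, df_error), copied from Tables 1 to 4.
tables = {
    "Table 1": (0.378, 1, 0.932, 26),
    "Table 2": (0.400, 1, 2.151, 54),
    "Table 3": (0.045, 1, 0.092, 26),
    "Table 4": (0.046, 1, 0.160, 54),
}

for name, row in tables.items():
    f_stat, eta_sq, adj_r_sq = anova_effect_summary(*row)
    print(f"{name}: F={f_stat:.3f}, partial eta^2={eta_sq:.3f}, "
          f"adjusted R^2={adj_r_sq:.3f}")
```

Small differences from the published entries are expected because the sums of squares are reported to only three decimals.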
Table 5. Descriptive statistics when applying wrapper using CICIDS2017.

| Method | Mean | Std. Deviation | N |
|---|---|---|---|
| Clustering-based Wrapper | 0.17175 | 0.214897 | 16 |
| Supervised Wrapper | 0.40658 | 0.147547 | 12 |
| Total | 0.27239 | 0.220297 | 28 |
Table 6. Descriptive statistics when applying hybrid using CICIDS2017.

| Method | Mean | Std. Deviation | N |
|---|---|---|---|
| Clustering-based hybrid feature selection | 0.18256 | 0.215233 | 32 |
| Supervised learning hybrid feature selection | 0.35333 | 0.176245 | 24 |
| Total | 0.25575 | 0.215342 | 56 |
Table 7. Descriptive statistics when applying wrapper using NSL-KDD.

| Method | Mean | Std. Deviation | N |
|---|---|---|---|
| Clustering-based wrapper | 0.04738 | 0.023703 | 16 |
| Supervised wrapper | 0.12833 | 0.086914 | 12 |
| Total | 0.08207 | 0.071094 | 28 |
Table 8. Descriptive statistics when applying hybrid using NSL-KDD.

| Method | Mean | Std. Deviation | N |
|---|---|---|---|
| Clustering-based hybrid feature selection | 0.03578 | 0.020054 | 32 |
| Supervised learning hybrid feature selection | 0.09367 | 0.080083 | 24 |
| Total | 0.06059 | 0.061189 | 56 |
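
The group means in Tables 5 to 8 (mean false positive rates, per the abstract) can be turned into relative reductions, and the Total rows can be checked as sample-size-weighted means. The sketch below illustrates that arithmetic only; the helper names `relative_reduction` and `pooled_mean` and the `descriptives` dictionary are hypothetical and are not part of the original analysis.

```python
def relative_reduction(baseline_mean, new_mean):
    """Fractional reduction of new_mean relative to baseline_mean."""
    return (baseline_mean - new_mean) / baseline_mean


def pooled_mean(groups):
    """Sample-size-weighted mean of (mean, n) pairs."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


# ((clustering-based mean, N), (supervised mean, N)), copied from Tables 5 to 8.
descriptives = {
    "Table 5": ((0.17175, 16), (0.40658, 12)),
    "Table 6": ((0.18256, 32), (0.35333, 24)),
    "Table 7": ((0.04738, 16), (0.12833, 12)),
    "Table 8": ((0.03578, 32), (0.09367, 24)),
}

for name, (clustering, supervised) in descriptives.items():
    reduction = relative_reduction(supervised[0], clustering[0])
    total = pooled_mean([clustering, supervised])
    print(f"{name}: reduction={reduction:.1%}, pooled mean={total:.5f}")
```

For Table 7 this gives a reduction of roughly 63%, in line with the NSL-KDD figure quoted in the abstract, and the pooled means agree with the Total rows to the precision reported.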