Article

Feature-Optimized Machine Learning Approaches for Enhanced DDoS Attack Detection and Mitigation

by Ahmed Jamal Ibrahim 1,2,*, Sándor R. Répás 1 and Nurullah Bektaş 3,*

1 Department of Electrical Engineering and Infocommunications, Széchenyi István University, 9026 Győr, Hungary
2 Technical Engineering College of Al-Najaf, Al-Furat Al-Awsat Technical University (ATU), Najaf 540011, Iraq
3 Department of Structural Engineering and Geotechnics, Széchenyi István University, 9026 Győr, Hungary
* Authors to whom correspondence should be addressed.
Computers 2025, 14(11), 472; https://doi.org/10.3390/computers14110472
Submission received: 5 September 2025 / Revised: 22 October 2025 / Accepted: 27 October 2025 / Published: 1 November 2025

Abstract

Distributed denial of service (DDoS) attacks pose a serious risk to the operational stability of corporate networks, often leading to service disruptions, financial damage, and a loss of trust and credibility. The increasing sophistication and scale of these threats highlight the pressing need for advanced mitigation strategies. Despite the numerous existing studies on DDoS detection, many rely on large, redundant feature sets and lack validation for real-time applicability, leading to high computational complexity and limited generalization across diverse network conditions. This study addresses this gap by proposing a feature-optimized and computationally efficient ML framework for DDoS detection and mitigation using a benchmark dataset. The proposed approach serves as a foundational step toward developing a low-complexity model suitable for future real-time and hardware-based implementation. The dataset was systematically preprocessed to identify critical parameters, such as Packet Length Min, Total Backward Packets, Avg Fwd Segment Size, and others. Several ML algorithms, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and CatBoost, are applied to develop models for detecting and mitigating abnormal network traffic. The developed ML model demonstrates high performance, achieving 99.78% accuracy with Decision Tree and 99.85% with Random Forest, representing improvements of 1.53% and 0.74% compared to previous work, respectively. In addition, the Decision Tree algorithm achieved 99.85% accuracy for mitigation, with an inference time as low as 0.004 s, proving its suitability for identifying DDoS attacks in real time. Overall, this research presents an effective approach for DDoS detection, emphasizing the integration of ML models into existing security systems to enhance real-time threat mitigation.

1. Introduction

Network operations and security are critical areas of focus, ensuring data integrity, accuracy, and availability across computer networks. Distributed Denial of Service (DDoS) attacks have emerged as one of the most disruptive threats, targeting the operational stability of network systems. These attacks exploit the increasing number of interconnected devices and the Internet’s reliance on continuous service availability. DDoS attacks operate by flooding networks with excessive data packets from multiple compromised sources, creating bottlenecks and degrading service quality [1].
Classical DDoS detection methods, including signature-based, statistical, and threshold-based approaches, rely on predefined traffic patterns or behavior analysis. Although effective to some extent, these methods often suffer from high false-positive rates and limited adaptability to evolving attack patterns. This limitation has prompted the exploration of advanced techniques like Machine Learning (ML), which can effectively analyze complex traffic patterns to detect and mitigate DDoS attacks in real time. This study aims to enhance the detection and mitigation of sophisticated DDoS attacks by applying feature selection and optimizing state-of-the-art machine learning algorithms to achieve high accuracy, low inference time, and real-time applicability. The focus is on leveraging the CICDDoS2019 dataset, applying feature optimization, and optimizing ensemble learning algorithms such as Random Forest to achieve high detection accuracy. Furthermore, the proposed framework is a machine learning based detection and mitigation system intended for real-time identification and response to DDoS attacks, which is designed for efficient resource utilization and integration into existing network security systems [2,3].
Various studies [3,4] focus on detecting DDoS attacks with higher accuracy across different network environments. For instance, DDoS attacks remain a significant threat to various industries, including financial institutions, particularly during periods of increased reliance on online services. While other cyberattacks, such as ransomware and business email compromise incidents, also rose during the COVID-19 pandemic, this study focuses explicitly on DDoS detection and mitigation strategies due to their unique characteristics and impact on network performance [5]. As banks become more vulnerable to large-scale cyberattacks, their interdependence means that an attack on one bank could also undermine the economic stability of others.
Although signature-based firewalls effectively detect known attack signatures, they may not detect advanced threat vectors, such as zero-day attacks or advanced adversary techniques that look like regular traffic. These methods are associated with high false positives or negatives and perform poorly when handling the modern, high-dimensional, large-scale space of network traffic. The effectiveness of those models has been limited in dynamic environments, given their poor ability to adapt to evolving attack patterns. ML techniques are increasingly gaining prominence as they provide more accurate detection by learning intricate traffic patterns effectively, adjusting for new threats, and allowing for in-line detection. The advantages of ML over classical methods, such as its ability to handle large-scale, evolving attacks, open new possibilities for solutions targeting the challenges of modern-day network security [6,7].
Effectively trained ML models can accurately classify malicious activities in network traffic. ML-based techniques serve various purposes in this context, such as identifying patterns indicative of potential threats, isolating anomalous behaviors, and predicting the likelihood of malicious activity. These techniques are applicable for real-time surveillance, intrusion detection, and threat intelligence, enabling security systems to take proactive action against emerging threats [8]. Hyperparameters are defined before the model training process begins and significantly impact the performance of ML algorithms [9]. These are the parameters that the model developer sets, which also determine the model’s overall capacity, regularization, and training speed. Determining a good set of hyperparameters is among the most essential tasks in the development of reliable ML models. The best model balances minimal error against improved generalization performance on key evaluation metrics, with a focus on achieving high overall accuracy [10].
This study explores the growing challenge of detecting sophisticated DDoS attacks, which disrupt network services through evolving and multi-vector strategies that often bypass traditional signature-based detection systems. Using the CICDDoS2019 dataset, machine learning algorithms—Random Forest (RF), Decision Tree (DT), Naïve Bayes (NB), and K-Nearest Neighbors (KNN)—were evaluated to determine their effectiveness in identifying these attacks. After thorough data preprocessing, the models were assessed using metrics like accuracy, precision, recall, and F1-score, with Random Forest emerging as the best-performing algorithm, achieving 99.11% accuracy, 99% precision, and a 99.11% F1-score. Compared to earlier studies that employed datasets like CICIDS2017 and UNBS-NB15 and techniques like SVM and ANN, this work underscores the consistent performance and reliability of Random Forest for multi-class DDoS detection, further cementing its role as a reliable ensemble method in the rapidly evolving cybersecurity landscape [7].
Some features may appear unimportant in certain datasets but significant in others, depending on the attack vectors present. This study addresses the limitations and critical gaps in achieving accurate and real-time DDoS detection and mitigation through efficient feature selection and lightweight model design identified in previous research [7,11,12]. A comprehensive dataset analysis identified and eliminated unimportant features, a process that plays a pivotal role in achieving high accuracy. Irrelevant or redundant features can introduce noise, causing the model to overfit, while selecting only the most relevant features mitigates this risk. Beyond statistical relevance, the evaluation emphasized the engineering significance of each feature, ensuring a practical and efficient approach. A smaller, optimized feature set reduces training time and memory usage, providing an advantage for real-time applications and focusing the model on the most predictive features. This often improves interpretability by reducing redundancy and highlighting the most relevant features, potentially enhancing accuracy. Insignificant and redundant features were systematically excluded, streamlining the model for improved efficiency and clarity. By addressing these aspects, the proposed framework prioritizes the selection and evaluation of key features, resulting in a model that delivers superior accuracy. Using the Random Forest algorithm with this refined feature set, the model reduces assessment time, enhances accuracy, and improves upon previous work by approximately 0.75% during testing.
Figure 1 illustrates the conceptual framework of the proposed approach for enhancing the detection of DDoS attacks using machine learning. The illustration is organized into several components: (i) the architecture of DDoS attacks, showing the interaction between attacker, botnet, and victim; (ii) the machine learning pipeline, including data preprocessing, feature selection, and model training/validation; (iii) feature analysis, with dimensionality reduction (t-SNE) and feature importance visualization; and (iv) performance evaluation, highlighting confusion matrix results and comparative metrics (accuracy, precision, recall, and F1-score). By combining these elements in one figure, the framework demonstrates its capability to achieve real-time detection with a high accuracy of 99.85% using the Random Forest classifier.
The novelty of this study lies in a feature-optimized machine learning framework that simultaneously enhances DDoS detection and mitigation. The proposed two-phase hybrid feature selection, combining correlation analysis and Random Forest importance, reduces redundancy, model complexity, and overfitting. This enables high accuracy and low inference time for practical network defense applications. Section 2 presents a literature review on ML-based DDoS attacks. Section 3 introduces the proposed framework. Section 4 describes the methodology. Section 5 presents and discusses the results. Finally, Section 6 concludes the paper.

2. Literature Review

The frequency of attacks has surged, threatening businesses, governments, and critical infrastructure. DDoS attacks disrupt services and cause significant financial losses, targeting various OSI model layers. At Layer 3 (network layer), attackers flood networks with UDP or ICMP packets using spoofed IP addresses. At Layer 4 (transport layer), TCP-based SYN floods overwhelm systems with connection requests. Layer 7 (application layer) attacks mimic legitimate user traffic to exhaust resources, targeting specific applications like web servers. Volume-based attacks aim to overwhelm bandwidth, protocol attacks exploit protocol flaws, and application-layer attacks disrupt services directly. Understanding these layers is crucial for effective defense.
Figure 2 illustrates the standard structure of a DDoS attack [13]. The novelty of this work lies in the proposed framework, which integrates systematic feature selection with optimized machine learning classifiers to enable real-time detection and mitigation. By focusing on the most relevant traffic features, the framework achieves high accuracy and low inference time, ensuring applicability in real-world environments.
Dasari et al. (2024) [14] showed that ML provides an effective tool for classifying and predicting DDoS attacks by analyzing network traffic. Based on the CICIDS2017 dataset, the ML models can identify both familiar and new attack patterns in real time. Many ML techniques, including classification, anomaly detection, and deep learning, enable the system to differentiate between normal (legitimate) and abnormal (attack) network traffic. This approach strengthens security, minimizes false alarms, and automates detection, reducing the impact of DDoS attacks. The study achieved an accuracy of 99.77% using the LGBM model. The dataset used in this research was created by the University of New Brunswick for analyzing various intrusions and DDoS attacks.
The study by Salmi et al. (2023) [15] addresses flood attacks in wireless sensor networks (WSNs), which overwhelm the network with excessive traffic, depleting node resources and disrupting service. In these attacks, the attacker sends numerous unnecessary messages, draining energy from other nodes. Deep learning-based intrusion detection systems (IDS), such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), are effective at detecting these attacks. CNN models have shown high accuracy, reaching up to 98.79% in detecting flooding attacks, which helps improve network security and mitigate the impact of such threats. The University of New Brunswick created the dataset to analyze various intrusions and DDoS attacks.
Chovanec et al. (2023) [16] proposed detecting anomalies in cloud data center networks using a deep learning model. The dataset used in this article involves “Attack” and “Normal” traffic samples. The model used in this study is the LSTM RNN algorithm with six layers, and the stream and TensorFlow APIs are used to analyze parameters from the PCAP files. The results demonstrate the model’s capability to identify anomalies in cloud data center networks, achieving an accuracy of 96.54%. The dataset is custom-created and was collected in a lab environment.
Mishra et al. [17] present a strategy for preventing DDoS attacks in a cloud environment coupled with ML that features statistical feature extraction. The authors highlight cloud security as a major concern and the drawbacks of traditional methods, motivating the use of newer techniques that observe anomalies in network traffic and thus flag DDoS attacks. The authors developed this work by collecting data from virtual machine monitors and VMs, extracting the key statistical features, and then applying classifiers to identify suspected behavior. If an attack is detected, server services are disconnected instantly. The results were obtained using Naive Bayes, KNN, and Random Forest classifiers, of which Random Forest gave the best performance. The resulting accuracy was 99.68% for the supervised approach and 93.69% for the unsupervised approach, with low false positives and negatives. The researchers concluded that their system provides an appropriate and reliable solution for detecting DDoS attacks with high accuracy.
The study conducted by Tan et al. [18] proposes a new method for identifying and mitigating DDoS attacks in SDNs. The approach combines the strengths of ML-based and entropy-based detection, showing that the ML technique is better than entropy detection regarding accuracy and recall and thus reduces misjudgment rates. The proposed mechanism triggers the filtering of network traffic and enables the controller to prioritize suspicious flows. It distinguishes DDoS attacks from burst flows using the traffic’s asymmetric characteristics, ensuring higher classification accuracy. Another notable feature of this approach is that it is protocol-independent, easing the detection of multiple DDoS attack types. The importance of feature selection for efficient detection is conveyed very well in this study, where the proposed method is shown to surpass current ML-based techniques. The dataset was obtained from a controlled lab environment.
Sangodoyin et al. [19] thoroughly investigate the role that ML plays in enhancing system performance against DDoS flooding attacks in SDNs and in resolving the limitations of traditional security measures. The authors examine several ML models on a dataset of 3600 observations. The results show an accuracy of 98% for the proposed algorithm, which effectively detects and classifies DDoS attacks, demonstrating strong performance in validation accuracy, training time, and prediction speed.
Najar et al. address the detection of DDoS attacks using machine learning techniques, focusing on Random Forest (RF) and Multi-Layer Perceptron (MLP). The authors applied these models to the NSL-KDD dataset, achieving high accuracy. RF demonstrated an impressive accuracy of 99.13% on training and validation data, and 97% on the test data, while MLP achieved 97.96% on training, 98.53% on validation, and 74% on test data. The study highlights RF’s superiority in detecting DDoS attacks compared to other classifiers like SVM and KNN, offering valuable insights for effective network security solutions. The NSL-KDD and CIC-DDoS2019 datasets used in this study were created in controlled lab environments at the University of New Brunswick (UNB).
Recent research has explored various strategies for DDoS detection and mitigation, often combining multi-feature analysis with adaptive traffic policing. Peng et al. [20] proposed a two-layer SDN-based defense where a token bucket mechanism enforces rate-limiting on obvious attacks, while an Isolation Forest model identifies subtler anomalies using multiple flow features. Karpowicz [21] introduced an adaptive closed-loop controller that dynamically tunes token bucket parameters (CIR and Burst) to improve rate control and protect legitimate traffic. Swami et al. [22] applied a statistical IQR-based method to derive thresholds from normal traffic distributions and enforce mitigation via SDN meters, demonstrating quantile-aware adaptability. Similarly, Alashhab et al. [23] employed an ensemble online learning model with dynamic feature selection to improve detection rates in SDN environments.
While these works demonstrate the effectiveness of feature-based detection, adaptive rate-limiting, or statistical thresholding individually, none of them integrates these elements into a unified framework. Our approach advances the state of the art by combining (i) a ten-feature detection pipeline, (ii) Feature-based Risk Levels (FRL) that provide a graded mapping of flows into multiple suspicion tiers, and (iii) an adaptive token bucket enforcement mechanism where CIR and Burst are calibrated directly from benign traffic quantiles. This integration yields an interpretable, data-driven mitigation policy that balances attack suppression with benign traffic preservation, representing a novel contribution beyond prior literature.
While previous studies have contributed significantly to improving DDoS detection and mitigation, most focused on specific environments such as SDN or WSN and often relied on high-dimensional or static datasets. Few have addressed the need for feature-optimized, computationally efficient models that can maintain accuracy while reducing system complexity and enabling real-time adaptability. DDoS attacks continue to pose a significant challenge to the reliability and security of internet-based systems. Traditional methods often rely on analyzing large, high-dimensional datasets, making them computationally intensive and impractical for real-time applications. Previous studies highlight the limitations of these approaches, particularly their neglect of feature selection. For example, Hamarshe et al. [24] applied ML algorithms such as Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), and XGBoost in software-defined networking (SDN) environments, achieving a maximum accuracy of 68.9%, which highlights the limitations of methods that lack effective feature selection. Similarly, Elsayed et al. [25] developed DDoSNet, a deep learning model that achieved high accuracy on the CICDDoS2019 dataset, but did not emphasize feature selection. Recent studies have extensively reviewed deep learning methods such as DNN and CNN-LSTM models, which achieved impressive accuracies of 98.72% and 97.16%, respectively. However, many of these methods relied heavily on benchmark datasets, limiting their real-world applicability, and computational overhead and real-time deployment feasibility often remain overlooked. Another study addresses the challenge of high-dimensional datasets by introducing a hybrid optimization approach combining the Slime Mould Algorithm (SMA) and Binary Grey Wolf Optimization (BGWO) for feature selection, applied to the CICDDoS2019 dataset. The method reduces dimensionality while retaining essential features, achieving a high accuracy of 99.83% with the K-Nearest Neighbor (KNN) classifier. Although the approach is efficient and stable, its reliance on specific datasets and sensitivity to parameter tuning may constrain its utility in real-time applications [26].

3. Proposed Framework

The proposed framework is structured as a multi-stage pipeline for real-time DDoS detection and mitigation. As shown in Figure 1, the process begins with the CICDDoS2019 dataset as input, followed by preprocessing and systematic feature selection to remove redundant attributes and retain only the most relevant ones. The optimized feature set is then used to train and evaluate multiple machine learning models, including Random Forest, Decision Tree, Logistic Regression, CatBoost, and Gradient Boosting, to identify the most effective configuration for DDoS detection.
Feature selection plays a crucial role in minimizing computational overhead, reducing overfitting, and improving operational efficiency to ensure that the system performs effectively for real-time DDoS detection. The selected features were chosen based on their statistical importance and practical relevance to attack detection and mitigation, focusing on parameters that directly influence system responsiveness.
The outputs of the trained models are then analyzed to accurately classify malicious traffic with minimal inference time. Furthermore, the framework incorporates a mitigation component designed to block identified attack flows, ensuring system availability under heavy attack conditions. This methodology emphasizes low computational complexity and fast inference while maintaining reliability, thereby making it conceptually scalable and suitable for integration into real-world network security environments.
Our approach addresses these limitations by applying systematic feature selection techniques alongside state-of-the-art ML models such as Random Forest, Decision Tree, Logistic Regression, CatBoost, and Gradient Boosting. Feature selection is the process of identifying and retaining only the most relevant features while eliminating redundant or irrelevant ones. This step reduces computational overhead, mitigates overfitting and underfitting, and ensures that the model remains efficient and scalable for real-time DDoS detection. In our approach, the selected features are not only statistically significant but also practically relevant to attack patterns, emphasizing attributes that directly affect detection and mitigation effectiveness.
Moreover, this study aims to design a model that achieves high detection and mitigation accuracy with low complexity, ensuring it is suitable for future hardware implementation, where acceleration and real-time performance are critical. Thorough preprocessing and exploratory analysis identified the most critical features, reducing dimensionality while preserving essential attack patterns. The proposed approach achieved a high accuracy of 99.85%, demonstrating its effectiveness in building an efficient and adaptable DDoS detection framework. This work advances the field of DDoS detection by providing a practical, high-performance, and implementation-ready solution for real-world network environments. Finally, the proposed approach achieved high detection and mitigation accuracy while maintaining low complexity, demonstrating its efficiency and adaptability for practical DDoS defense. The framework serves as a foundational step toward future hardware-based implementation, where acceleration and real-time performance will be further validated.

4. Methodology

4.1. Detection Methodology

The process flowchart consists of multiple stages, illustrated in Figure 3. The first stage is implementing ML to classify DDoS attacks, beginning with the collection of both normal and abnormal traffic data. The next step is selecting a programming language and environment. In this study, we used the Python programming language in the Jupyter environment [27], with pandas, NumPy, and other libraries, to run the code. Before the preprocessing step, the data is cleaned to remove unnecessary information, handle missing values, and standardize the dataset to enhance its quality. Once cleaned, the data is preprocessed to achieve normalization and scaling, preparing it for analysis.

4.2. DDoS Attack Detection Approach

4.2.1. Data Collection

The CICDDoS2019 dataset [28] includes a mix of benign traffic and DDoS attack scenarios designed to reflect common attack types as of 2019, closely resembling real-world network traffic. It is provided in PCAP format, with network flows analyzed using the CICFlowMeter-V3 tool. The dataset is fully labeled, making it ideal for supervised learning tasks; however, the rich feature set and variability also make it suitable for unsupervised learning, where patterns and anomalies in unlabeled data are analyzed. To simulate realistic benign traffic, the dataset uses a B-Profile system that profiles the abstract behavior of 25 users across multiple protocols, such as HTTP, HTTPS, FTP, SSH, and email. While this ensures diverse traffic behaviors, the limited number of profiles may not fully represent the extensive variability of real-world network environments.
This section maps each attack type to its corresponding OSI layer to address the relationship between DDoS attack types and the OSI layers they target. For example, SYN flood attacks exploit vulnerabilities in the transport layer (Layer 4), while DNS amplification targets the application layer (Layer 7). This mapping highlights how the CICDDoS2019 dataset captures attack patterns across different layers, enhancing its utility for layered DDoS defense analysis.
  • Portmap and NetBIOS: Exploiting RPC services and legacy systems to reflect and amplify traffic.
  • LDAP and MSSQL: Leveraging database and directory services for large-scale amplification.
  • UDP and UDP-Lag: Exploiting connectionless transport layer protocols for high-throughput flooding.
  • SYN: Targeting TCP handshakes to exhaust server resources.
  • NTP, DNS, and SNMP: Exploiting common network management and time synchronization services for reflection and amplification.
The dataset’s representation of these attack types aligns with the need to understand the layers and methods involved in DDoS strategies, as each attack utilizes specific protocol features to achieve disruption. For example, the SYN attack focuses on exhausting resources at the transport layer, while DNS amplification targets the application layer by abusing domain name resolution services. These layers and types are reflected in the CICDDoS2019 dataset through specific features extracted from network flows, capturing both the benign and malicious aspects of traffic. Figure A1 in Appendix A provides the parameter distribution used in this analysis to train a machine learning model capable of predicting DDoS attacks. The dataset captures network traffic over two periods: a training phase on 12 January, with data recorded from 10:30 a.m. to 5:15 p.m., and a testing phase on March 11th, from 9:40 a.m. to 5:35 p.m. During the testing period, a series of DDoS attacks were executed to create a challenging and realistic scenario for evaluation [28].
Figure 4 visualizes the different relationships between variables in the dataset. The color gradient reflects these relationships, ranging from red (positive) to blue (negative). The diagonal extending from the upper left corner to the lower right corner shows a perfect correlation between each variable and itself, always equal to 1. The darker the red of an area, the stronger the positive relationship between the corresponding variables; similarly, the darker the blue, the stronger the negative relationship. One can also notice clusters of related variables that may hint toward interesting patterns worth exploring in later analysis or modeling.
The correlation matrix in Figure 4 depicts several relationships between the input features for DDoS attack detection. The high positive correlations among the Flow IAT measures (Max, Min, Mean, and Std) within both forward and backward directions imply that packets tend to have quite consistent timing behaviors within the same flow direction but differ between forward and backward flows, possibly reflecting asymmetric communication patterns in attacks. Similarly, the packet length features Min, Max, and Mean all have strong positive correlations, indicating that with larger packets, the average and minimum packet sizes within the same flow are often bigger; this pattern is relatively common in high-volume DDoS traffic. A positive correlation between Total Fwd Packets and Flow Packets/s may indicate the bursty nature of attack traffic, where an increase in total packets results in a higher packet transmission rate. Meanwhile, the negative correlation between Bwd Packets/s and Bwd IAT Mean indicates that with an increase in the backward packet rate, there is a reduction in the inter-arrival times because the traffic is heavier. Finally, the positive relationship between Avg Fwd Segment Size and Fwd Packets/s signifies larger segments accompanying faster packet transmission, which possibly signals aggressive traffic flows common in DDoS attacks. These insights bring out the essential traffic patterns that are crucial for DDoS detection.
As the nature of DDoS attacks involves overwhelming a target with malicious traffic and thus provides cogent threats to network infrastructures, timely detection of DDoS attacks is indispensable in maintaining security. In this research, we propose an improved detection of a DDoS attack by applying ML techniques to analyze several network traffic features. Feature selection has been considered necessary as it develops effective models that classify malicious and benign traffic.
We consider an extensive range of 82 parameters extracted from network flow data to optimize the developed ML-based detection systems. In our final model, we relied on a selected subset of ten key features: Packet Length Min, Total Backward Packets, Avg Fwd Segment Size, Flow Bytes/s, Avg Packet Size, Protocol, Flow IAT Std, Subflow Bwd Packets, Flow IAT Mean, and Packet Length Mean. These features were identified as the most significant contributors to DDoS detection accuracy.
This set includes classical packet size parameters, flow duration, inter-arrival times, flag counts, etc. Using engineering knowledge, we eliminated certain parameters that were unlikely to impact model performance. Additionally, some parameters were removed after examining the correlation matrix, as they showed no relationship with other parameters and had only a single, non-varying value. We found that the most contributing parameters, such as the number of forward packets, the flow data rate, and inter-arrival times, provide fundamental insight into normal and malicious traffic behavior and drive the model’s overall accuracy. On the other hand, we also identified a set of features that contribute minimally or not at all to classification capability; Table A1 in Appendix A details the important and unimportant parameters. Some less critical features, such as packet length statistics, flag counts, and bulk transfer rates, might be more important in other contexts. This approach enhances the model’s efficiency by concentrating on the most pertinent parameters, leading to faster and more precise DDoS monitoring.
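To make the retained feature set concrete, the following minimal pandas sketch shows how the ten selected columns plus the label could be isolated from a CICDDoS2019 CSV export. The file path and exact column spellings are assumptions here (CICFlowMeter exports sometimes pad column names with spaces), not part of the original pipeline.

```python
import pandas as pd

# Hypothetical path to one CICDDoS2019 CSV export (e.g., the DrDoS_DNS day file).
CSV_PATH = "DrDoS_DNS.csv"

# The ten features retained after correlation and importance analysis (assumed column names).
SELECTED_FEATURES = [
    "Packet Length Min", "Total Backward Packets", "Avg Fwd Segment Size",
    "Flow Bytes/s", "Avg Packet Size", "Protocol", "Flow IAT Std",
    "Subflow Bwd Packets", "Flow IAT Mean", "Packet Length Mean",
]
LABEL_COLUMN = "Label"  # "BENIGN" vs. attack classes such as "DrDoS_DNS"

df = pd.read_csv(CSV_PATH)
df.columns = df.columns.str.strip()          # CICFlowMeter exports often pad column names
df = df[SELECTED_FEATURES + [LABEL_COLUMN]]  # drop the remaining ~72 columns

# Binary target: 1 for any attack label, 0 for benign traffic.
y = (df[LABEL_COLUMN] != "BENIGN").astype(int)
X = df[SELECTED_FEATURES]
```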

4.2.2. Data Scaling

In machine learning, input variables, which can be both categorical and numerical, provide information on data that the model is trying to predict. In turn, output variables can be explained as variables that the ML model predicts. These variables are either quantitative or qualitative. Most ML models will have low performance and bad predictions since they cannot utilize the data without data transformation [29,30].
Normalization can be defined as rescaling data so that the data range falls between 0 and 1. The model can therefore operate on and compare all data features on the same scale. This improves the reliability of the ML model and helps it achieve superior results by reducing sensitivity to outliers or extreme values. Normalization becomes less critical if the data include many data points [30]. Therefore, the normalization process is usually a crucial step that impacts the accuracy of ML models.
Standardization [31] is scaling data into a standard normal distribution with a mean of 0 and a standard deviation of 1. This helps ensure all features are centered around the same value, making them easier to compare and process. The technique scales the data to unit variance after subtracting the mean. Standardization makes the data more amenable to ML algorithms, as it ensures that all features are on a normalized scale. These approaches are a primary step in preparing data for ML models, ensuring that the model makes accurate predictions.
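As a brief illustration of the two scaling strategies described above, the sketch below applies scikit-learn's MinMaxScaler (normalization to [0, 1]) and StandardScaler (zero mean, unit variance) to a toy flow-feature matrix; the numbers are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy flow-feature matrix (rows = flows; columns could be, e.g., Flow Bytes/s and Avg Packet Size).
X_demo = np.array([[1200.0, 60.0],
                   [850000.0, 1400.0],
                   [43000.0, 512.0]])

# Normalization: rescale every feature into the [0, 1] range.
X_norm = MinMaxScaler().fit_transform(X_demo)

# Standardization: centre each feature at 0 with unit standard deviation.
X_std = StandardScaler().fit_transform(X_demo)

print(X_norm.round(3))
print(X_std.round(3))
```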
To ensure that the feature selection process was both structured and reproducible, a two-phase hybrid method was adopted. In the first phase, correlation analysis was conducted to identify and remove redundant and highly correlated features, minimizing multicollinearity and improving model stability. In the second phase, Random Forest feature importance ranking was applied to evaluate the predictive contribution of the remaining features. Features with consistently low importance were excluded through iterative validation, ensuring that only the most influential attributes were retained. Cross-validation was used to confirm that model performance remained stable or improved after feature reduction. This approach guarantees that no essential features were omitted, while enhancing interpretability, reducing computational cost, and preserving the model’s generalization capability for DDoS detection.
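A minimal sketch of the two-phase hybrid selection is shown below, assuming a pandas feature matrix X and binary labels y. The correlation threshold and the number of retained features are illustrative placeholders, and the exact iterative validation protocol used in the study is not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def hybrid_feature_selection(X: pd.DataFrame, y, corr_threshold=0.95, keep_top=10):
    """Phase 1: drop one of every highly correlated pair; Phase 2: rank by RF importance."""
    # Phase 1 - correlation pruning on the upper triangle of the absolute correlation matrix.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    X_pruned = X.drop(columns=redundant)

    # Phase 2 - Random Forest importance ranking on the remaining features.
    rf = RandomForestClassifier(n_estimators=200, random_state=42, n_jobs=-1)
    rf.fit(X_pruned, y)
    ranking = pd.Series(rf.feature_importances_, index=X_pruned.columns)
    selected = ranking.sort_values(ascending=False).head(keep_top).index.tolist()

    # Sanity check: cross-validated accuracy on the reduced feature set.
    cv_acc = cross_val_score(rf, X_pruned[selected], y, cv=5, scoring="accuracy").mean()
    return selected, cv_acc
```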

4.2.3. Data Splitting

Data splitting is an essential step for developing an effective ML model [32]. The data should be divided into separate sets to avoid overfitting or underfitting, so the model generalizes well on new data. The data is randomly split into three sets, training, validation, and testing, to avoid possible biases, as shown in Figure 5. In this work, 60% of the data has been used for training, and the remaining data is equally split between the validation and test sets. During model development, the training data was used to teach the relationships between data points to the ML model, and the validation data set was used to fine-tune the hyperparameters of each ML algorithm. After hyperparameter optimization, the final version of each model was evaluated on the test data (unseen data). Finally, this data splitting and model development procedure ensures improved prediction performance of the developed ML models.
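The 60/20/20 split can be reproduced with two calls to scikit-learn's train_test_split, as sketched below; the fixed random seed and the optional stratification are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

# 60% training, then split the remaining 40% equally into validation and test (20%/20%).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.40, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp)
```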

4.2.4. Machine Learning Algorithms

This article evaluates the capability and effectiveness of five ML algorithms in classifying DDoS attacks: LR, RF, DT, CB, and GB. For model development, the CICDDoS2019 dataset [28] has been used.
  • Logistic regression
Logistic regression is a common ML technique for binary classification tasks, such as identifying DDoS attacks. It models the relationship between input features (e.g., network traffic patterns) and the probability of an outcome (attack or no attack). Known for its simplicity and effectiveness with linearly separable data, it is often used in the early stages of DDoS detection [33]. This study optimized a logistic regression model through hyperparameter tuning, evaluating solvers (newton-cg, lbfgs, liblinear, sag, saga), penalties (l2, none), regularization strength (C = 0.01, 0.1, 1, 10), maximum iterations (100, 500, 1000), and tolerance values (1 × 10−4, 1 × 10−3, 1 × 10−2). The impact of including an intercept and handling imbalanced data with class weighting (None, balanced) was also investigated [34].
  • Decision tree algorithm
Decision trees are widely used in ML for classification tasks, such as identifying DDoS attacks, by organizing data into a tree structure based on specific feature values. Each node represents a decision point, and branches define the rules leading to outcomes, making it effective for detecting attack patterns in network traffic. Their simplicity, interpretability, and ability to handle both numerical and categorical data make them valuable for network security. Hyperparameters like max_depth, min_samples_split, min_samples_leaf, max_features, split criteria (gini or entropy), and max_leaf_nodes are fine-tuned to optimize classification performance and balance accuracy with computational efficiency [35].
  • Random Forest algorithm
The Random Forest algorithm, introduced by Breiman [36] and applied to DDoS detection by Ismail et al. [12], builds a group of decision trees, each using a random set of features; hence the name Random Forest. For classification tasks, it predicts by taking the majority vote from all the trees, and for regression, it calculates the average. This method helps avoid the overfitting problem of individual decision trees. It also works better than single trees, especially with large datasets with many variables. Random Forest typically requires more training time than a single decision tree because it builds multiple trees to reduce overfitting, and making predictions with it can also be slower. The Random Forest algorithm is a versatile and reliable model known for its strong default performance and flexibility in tuning. Key hyperparameters tested to optimize its performance include the number of trees (n_estimators), ranging from 10 to 200, and tree depth (max_depth), set between None and 30. Complexity control was achieved by adjusting the minimum samples required for splits (min_samples_split) and leaf nodes (min_samples_leaf). Feature selection methods (max_features) were explored with options like auto, sqrt, and log2, along with bootstrap sampling and split criteria (gini and entropy). Additionally, constraints on leaf nodes (max_leaf_nodes) were applied to improve interpretability and efficiency. These adjustments ensure the model balances accuracy, speed, and complexity.
  • The Gradient Boosting algorithm
The Gradient Boosting algorithm [37] is a powerful ensemble learning technique that enhances prediction accuracy by sequentially refining weak classifiers, making it particularly effective for detecting subtle patterns in network traffic associated with DDoS attacks. By reducing bias and variance through iterative improvements, it excels in real-time cybersecurity threat detection. Key hyperparameters were optimized to boost performance, including the number of boosting stages (n_estimators at 50, 100, 200), learning rate (0.01, 0.1, 0.2), tree depth (max_depth from 3 to 7), and split criteria (min_samples_split and min_samples_leaf). Additionally, feature selection (max_features as None, ‘sqrt’, ‘log2’) and subsampling rates (subsample at 0.8, 1.0) were adjusted to prevent overfitting and enhance model reliability.
  • CatBoost algorithm
The CatBoost algorithm [38] is an open-source ML algorithm used for detecting malicious network traffic [39], aimed at efficient processing of categorical features with minimal preprocessing, for instance, one-hot encoding. It furnishes methods to compensate for overfitting, like ordered boosting, and directly handles missing values using reliable techniques. CatBoost has strong performance support with GPU and multi-threading [40]. It also provides model interpretability through feature importance scores and SHAP values. Because it ships with a user-friendly API in the most popular languages, CatBoost finds its applications in classification, regression, and ranking tasks, making it one of the strongest contestants in the ML landscape. Different hyperparameters were tuned to obtain the best detection performance from the CatBoost model. The number of boosting iterations (iterations) was set to 100, 200, and 500, while the tree depth (depth) varied between 4 and 10. The learning rate varied between 0.01, 0.05, and 0.1 to regulate the contribution of each tree. L2 regularization (l2_leaf_reg) values tested were 1, 3, and 5 to reduce overfitting. Numerical feature split counts (border_count) were set to 32, 64, and 128. Further, random strength (random_strength) was alternated between 0.5, 1, and 2 to introduce some randomness, while bagging temperature (bagging_temperature) was varied between 0.1, 1, and 10 to vary subsample variability.
To ensure reproducibility, all models were trained and tuned under consistent experimental conditions using Python 3.10, scikit-learn (v1.3.2), and CatBoost (v1.2.2) within the Google Colab environment. Hyperparameter optimization was conducted through grid search with 5-fold cross-validation on the training and validation sets, with random_state = 42 for consistency. The final configurations were as follows: Random Forest used 200 estimators, max_depth = 30, min_samples_split = 2, min_samples_leaf = 1, and max_features = ‘sqrt’. The Decision Tree model applied the gini criterion with max_depth = 25, min_samples_split = 2, and min_samples_leaf = 1. Logistic Regression employed the lbfgs solver with C = 1.0, penalty = ‘l2’, and max_iter = 1000. The Gradient Boosting model used 300 estimators, learning_rate = 0.1, max_depth = 5, and subsample = 0.8. CatBoost was configured with 500 iterations, depth = 6, learning_rate = 0.05, and l2_leaf_reg = 3. All unspecified parameters used library defaults. These configurations, along with the uploaded implementation notebooks, ensure full experimental reproducibility and transparency.
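The reported final configurations can be instantiated directly in scikit-learn and CatBoost, as in the sketch below; unspecified parameters keep library defaults, and the training/validation arrays are assumed to come from the earlier splitting step.

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from catboost import CatBoostClassifier

# Final tuned configurations reported above; all other parameters keep library defaults.
models = {
    "RF": RandomForestClassifier(n_estimators=200, max_depth=30, min_samples_split=2,
                                 min_samples_leaf=1, max_features="sqrt", random_state=42),
    "DT": DecisionTreeClassifier(criterion="gini", max_depth=25, min_samples_split=2,
                                 min_samples_leaf=1, random_state=42),
    "LR": LogisticRegression(solver="lbfgs", C=1.0, penalty="l2", max_iter=1000),
    "GB": GradientBoostingClassifier(n_estimators=300, learning_rate=0.1, max_depth=5,
                                     subsample=0.8, random_state=42),
    "CB": CatBoostClassifier(iterations=500, depth=6, learning_rate=0.05,
                             l2_leaf_reg=3, random_seed=42, verbose=False),
}

# X_train/y_train and X_val/y_val are assumed from the 60/20/20 split sketched earlier.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "validation accuracy:", model.score(X_val, y_val))
```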

4.2.5. Model Performance Evaluation Metrics

In this research, the developed ML models are evaluated using four metrics: accuracy, precision, recall, and F1-score. Accuracy is the proportion of correctly predicted instances, including both true positives and true negatives, to the overall number of predictions made. It serves as a general indicator of the model’s overall performance. The formula for accuracy is
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP (True Positive) refers to the positive class that was correctly predicted, TN (True Negative) refers to the negative class that was correctly predicted, FP (False Positive) indicates instances where the model mistakenly classifies the positive class (Type I error), and FN (False Negative) refers to cases where the model mistakenly predicts the negative class (Type II error). Precision, or positive predictive value, measures how many of the instances predicted as positive are actually positive. It emphasizes the quality of the model’s positive predictions. The formula for precision is
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall, or sensitivity, quantifies the number of positive instances correctly identified by the model. It focuses on capturing all relevant cases within the positive class.
$$\text{Recall} = \frac{TP}{TP + FN}$$
The F1-Score is the harmonic mean of Precision and Recall, offering an equilibrium between these metrics. It is particularly useful for imbalanced datasets, as it considers both false positives and false negatives. The F1-Score is ideal when Precision and Recall are equally important.
$$F1\text{-}Score = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
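The four metrics can be computed directly from the confusion-matrix counts, mirroring the formulas above. The sketch assumes the test split and the trained model dictionary from the earlier sketches and cross-checks the manual values against scikit-learn's implementations.

```python
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

# Predictions from the tuned Random Forest of the previous sketch (assumed available).
y_pred = models["RF"].predict(X_test)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Cross-check against scikit-learn's built-in implementations.
assert abs(accuracy - accuracy_score(y_test, y_pred)) < 1e-9
assert abs(f1 - f1_score(y_test, y_pred)) < 1e-9
print(f"Accuracy={accuracy:.4f}  Precision={precision:.4f}  Recall={recall:.4f}  F1={f1:.4f}")
```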

4.2.6. Dimensionality Reduction

This paper applied dimensionality reduction techniques to visualize the multi-dimensional data used to develop the ML-based DDoS attack detection model. The primary objective was to reduce the data dimensions and trace their potential to cluster into the different traffic classes. In this study, the data has been reduced to two dimensions using two of the most popular dimensionality reduction algorithms: Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). The output of these implementations is the two-dimensional scatter plots showing the reduced feature space with axes x1 and x2, as shown in Figure 6. The blue points represent DrDoS_DNS traffic, while the orange points correspond to benign traffic.
As can be observed, the algorithm successfully distinguishes between the two classes, forming distinct clusters in the feature space. The visualizations showed a good relationship inside the data, with several distinctive clusters present. This demonstrates the algorithm’s capability to uncover underlying structures in the dataset without needing prior knowledge of class labels. Despite the generally effective separation, there are regions where the clusters overlap, indicating potential misclassifications. This overlap is expected in unsupervised learning scenarios, where the model relies solely on the inherent similarities and differences within the data rather than being explicitly trained with labeled examples. Consequently, the clusters are not perfectly separated, leading to less than 100% classification accuracy. The results underscore both the strengths and limitations of unsupervised learning for anomaly detection. While the algorithm can form meaningful clusters, perfect separation is rarely achievable in practical applications due to the complexity and variability of real-world data distributions.
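A minimal sketch of the two projections is given below, assuming the ten-feature matrix X and binary labels y from the earlier steps. The subsample size, perplexity value, and plot styling are illustrative choices, not the study's exact settings.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Work on a random subsample: t-SNE is costly on the full flow table.
sample = X.sample(n=5000, random_state=42)
labels = y.loc[sample.index]
X_scaled = StandardScaler().fit_transform(sample)

# Two 2-D embeddings of the ten-feature space: PCA (linear) and t-SNE (non-linear).
X_pca = PCA(n_components=2, random_state=42).fit_transform(X_scaled)
X_tsne = TSNE(n_components=2, random_state=42, perplexity=30).fit_transform(X_scaled)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in zip(axes, (X_pca, X_tsne), ("PCA", "t-SNE")):
    ax.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="coolwarm", s=4)
    ax.set_title(title)
    ax.set_xlabel("x1")
    ax.set_ylabel("x2")
plt.tight_layout()
plt.show()
```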

4.3. Adaptive Mitigation Scenario

Building directly on our prior idea, we keep the same dataset, ten-feature selection, and preprocessing pipeline, and then extend it with an adaptive, stateful mitigation stage. Specifically, we continue to use CIC-DDoS2019 and the previously selected ten flow features (Packet Length Min, Total Backward Packets, Avg Fwd Segment Size, Flow Bytes/s, Avg Packet Size, Protocol, Flow IAT Std, Subflow Bwd Packets, Flow IAT Mean, Packet Length Mean). Preprocessing remains unchanged, including numeric casting, median imputation, 1st/99th percentile winsorization, and log(1 + x) stabilization for heavy-tailed counters. Additionally, when IP identifiers are unavailable, we synthesize a behavioral flow_id from discretized rate/timing and Protocol to preserve per-flow state. Stage-1 (detection, as in our previous work) trains the baseline classifier on the normalized 10-feature vectors (with t-SNE used only for visualization/diagnostics). Stage-2 (adaptive mitigation, our contribution) transforms detector suspicion, or an unsupervised surrogate derived from the same features, into a Flow Behavior Score (FBS) using a reliable, direction-aware z-aggregation method. The resulting score is then smoothed with an EWMA to obtain $\widehat{\text{FBS}}$, which is partitioned into four Flow Response Levels (FRL, from 0 to 3) based on three calibrated cut points. To avoid oscillation and make recovery explicit, a lightweight Trust-Recovery Points (TRP) gate allows de-escalation only after sustained normal behavior. The resulting FRL decisions are then mapped to rate-limit parameters for enforcement (Section 4.3), providing a principled control layer that converts model suspicion into actionable, data-aware mitigation, while reusing the same dataset, feature set, and preprocessing as our previous work.
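The smoothing and level-assignment logic can be sketched as follows; the EWMA weight, the three cut points, and the trust-recovery window are illustrative placeholders rather than the calibrated values used in the study, and the suspicion scores are assumed to be already normalized to [0, 1].

```python
import numpy as np

def ewma(scores, alpha=0.3):
    """Exponentially weighted moving average of the per-flow suspicion score (FBS)."""
    smoothed, s = [], scores[0]
    for x in scores:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return np.array(smoothed)

def to_frl(fbs_hat, cuts=(0.3, 0.6, 0.85), trp_window=5):
    """Map the smoothed score to FRL 0-3; de-escalate only after `trp_window`
    consecutive calmer observations (a simple trust-recovery gate)."""
    frl, level, calm = [], 0, 0
    for s in fbs_hat:
        proposed = int(np.digitize(s, cuts))   # 0..3 from the three cut points
        if proposed >= level:                  # escalate immediately
            level, calm = proposed, 0
        else:                                  # de-escalate only after sustained calm
            calm += 1
            if calm >= trp_window:
                level, calm = proposed, 0
        frl.append(level)
    return frl

# Example: a flow whose suspicion spikes and then subsides.
raw = np.array([0.1, 0.2, 0.9, 0.95, 0.92, 0.4, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1])
print(to_frl(ewma(raw)))
```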

Mapping FRL to Enforceable Policy (Token Bucket)

We translate the final flow-response level (FRL ∈ {0, 1, 2, 3}) into a token bucket policing rule (CIR, Burst) that is directly deployable with DiffServ meters (e.g., srTCM, trTCM) and production data paths (e.g., DPDK policer). To minimize collateral impact on benign traffic, the limits are data-aware: we compute packets/s quantiles from benign-only Flow Bytes/s and the mean packet size $\bar{L}$:
$$Q_p^{\mathrm{pps}} = \frac{Q_p^{\mathrm{Bps}}}{\bar{L}}$$
After that, FRL is mapped to (CIR, Burst) as
$$(\mathrm{CIR}, \mathrm{Burst}) =
\begin{cases}
(Q_{0.95}^{\mathrm{pps}},\ 2\,Q_{0.95}^{\mathrm{pps}}), & \mathrm{FRL} = 0 \ (\text{permissive}) \\
(Q_{0.80}^{\mathrm{pps}},\ 2\,Q_{0.80}^{\mathrm{pps}}), & \mathrm{FRL} = 1 \\
(Q_{0.50}^{\mathrm{pps}},\ 2\,Q_{0.50}^{\mathrm{pps}}), & \mathrm{FRL} = 2 \\
(0,\ 0), & \mathrm{FRL} = 3 \ (\text{temporary block})
\end{cases}$$
This yields a finite, reproducible policy table per behavioral flow key (flow_id). For the DNS test file, we exported flow.csv; a head excerpt is shown below (FRL = 3 rows naturally map to (0, 0)). Table 1 describes how the FRL bins are mapped to token bucket policing parameters (CIR and Burst) for each behavioral flow, with ranges in pps and their Mbit/s equivalents.
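A minimal sketch of building the per-flow policy table from benign-only quantiles is shown below. The sample Flow Bytes/s values and the mean packet size are illustrative; only the quantile levels (0.95, 0.80, 0.50) follow the mapping above.

```python
import numpy as np

def build_policy_table(benign_bytes_per_s, mean_packet_size):
    """Derive (CIR, Burst) in packets/s from benign-only Flow Bytes/s quantiles."""
    q = lambda p: np.quantile(benign_bytes_per_s, p) / mean_packet_size  # Q_p^pps = Q_p^Bps / L̄
    return {
        0: (q(0.95), 2 * q(0.95)),   # permissive
        1: (q(0.80), 2 * q(0.80)),   # gentle throttling
        2: (q(0.50), 2 * q(0.50)),   # strict
        3: (0.0, 0.0),               # temporary block
    }

# Example: illustrative benign Flow Bytes/s samples and a mean packet size of 500 bytes.
policy = build_policy_table(np.array([8e4, 1.2e5, 9.5e4, 2.1e5, 6e4]), mean_packet_size=500.0)
for frl, (cir, burst) in policy.items():
    print(f"FRL={frl}: CIR={cir:.1f} pps, Burst={burst:.1f} pps")
```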

5. Results and Discussion

5.1. Exclusion of Non-Contributing Features

The assessment of the parameters’ relevance in detecting DDoS attacks was grounded in a combination of domain-specific network engineering knowledge and data-driven analysis. Several parameters were deemed unimportant because they either lacked significant variation during DDoS attacks, did not correlate strongly with attack patterns, or contributed minimally to model performance. For example, Flow Duration was excluded because it reflects the total time of a flow but does not capture the abrupt and high-frequency characteristics of DDoS attacks. Parameters like Fwd Packets Length Total, Bwd Packets Length Total, Fwd Packet Length Max, Fwd Packet Length Min, and Fwd Packet Length Mean represent aggregate statistics for packet lengths but fail to distinguish between attack and benign traffic, as DDoS traffic typically overwhelms networks with packet volume rather than length variability. Similarly, Fwd PSH Flags and Bwd PSH Flags, which indicate the presence of the PSH (push) flag in forward and backward flows, and Fwd URG Flags and Bwd URG Flags, which mark urgent packets, are rarely utilized in typical communication protocols. These parameters exhibit negligible variation in both attack and benign scenarios, rendering them ineffective for DDoS detection.
Parameters such as Fwd Header Length and Bwd Header Length, which describe the size of packet headers, and flag-related metrics like FIN Flag Count, PSH Flag Count, ACK Flag Count, and ECE Flag Count contribute limited information about the anomalies present in DDoS attack traffic, as DDoS patterns are more evident in traffic rates and burst sizes than in specific header flags. Bulk rate-related metrics such as Fwd Avg Bytes/Bulk, Fwd Avg Packets/Bulk, Fwd Avg Bulk Rate, Bwd Avg Bytes/Bulk, Bwd Avg Packets/Bulk, and Bwd Avg Bulk Rate are aggregate measures designed to analyze bulk transfer performance, which is not a typical factor in DDoS attacks. These metrics fail to capture the rapid and distributed nature of DDoS traffic, making them ineffective for distinguishing between attack and normal flows.
Empirical validation further supported these decisions. Feature importance metrics from Random Forest models revealed that these parameters contributed minimally to the model’s accuracy and precision. Statistical measures, such as low correlation with the attack class, confirmed their irrelevance. Comparative experiments demonstrated that removing these parameters maintained the model’s high detection accuracy of 99.85% while reducing computational complexity. By focusing on parameters with strong ties to attack behaviors, this optimized feature set ensured efficient and accurate DDoS detection. The exclusion of these parameters is not arbitrary, but a deliberate optimization grounded in domain knowledge and empirical evidence. This ensures that the detection model is both efficient and practical for real-world applications, particularly in resource-constrained environments like IoT networks. Presenting these findings in a clear and structured manner underscores the rigorous approach to feature selection and clarifies the rationale behind these decisions.

5.2. Performance of Machine Learning Models

Table 2 provides the training, validation, and test performance of five different ML models: the LR, RF, DT, GB, and CB algorithms. The Random Forest model stands out with the best performance on both the training and validation sets, achieving 99.98% accuracy during training, 99.70% for validation, and 99.85% during testing. This result demonstrates that the Random Forest model is highly effective for detecting DDoS attacks. When comparing these results to previous studies [7,41,42,43], the Random Forest, Decision Tree, and CatBoost models developed in this work demonstrate improved performance and higher stability on the CICDDoS2019 dataset. While earlier research achieved strong results using similar algorithms, our models exhibit enhanced detection capability and better generalization, reflecting the effectiveness of the proposed feature optimization and model tuning approach. This improvement, although seemingly small, is significant in the context of cybersecurity applications like DDoS detection, where even marginal gains can contribute to reducing false negatives and ensuring a more reliable and resilient defense system. Additionally, the other models, such as Gradient Boosting and CatBoost, perform competitively, but they do not quite match Random Forest’s outstanding performance. CatBoost, for instance, delivers a solid accuracy of 99.63%, but it lags behind Random Forest’s ability to generalize well on the testing set. This demonstrates that our ML-based approach has made meaningful progress in enhancing the detection of DDoS attacks.
Figure 7 shows the confusion matrices for the training data, showing consistently strong performance across all cases, with high true positives and true negatives, and minimal misclassifications. Matrix 1 shows 12 false positives and 50 false negatives, indicating slightly more errors in identifying Class 1. Matrix 2 has near-perfect results with no false positives and only 1 false negative, showing high precision and recall. Matrix 3 introduces 15 false positives but only 2 false negatives, indicating a slight increase in misclassifying Class 0. Matrix 4 further improves with just 11 false positives and no false negatives, achieving perfect recall for Class 1. Lastly, Matrix 5 has 17 false positives and 2 false negatives, showing a small rise in misclassifying Class 0, but overall, all matrices reflect the model’s strong classification ability with minor variations in false positives and negatives.
Figure 8 presents the confusion matrices for the validation data, showing consistent and strong performance across all cases, with 605 true negatives and 710 or 731 true positives. Each matrix has only 2 false positives, indicating high precision for Class 0. The number of false negatives ranges from 24 in matrix 1 to just 3 in matrices 2 to 5, highlighting improved recall for Class 1 in the latter cases. Overall, the model generalizes well beyond the training data, maintaining high accuracy with very few misclassifications, particularly strong precision for Class 0 and improved recall for Class 1 across most matrices.
The overall results from the cross-validation show that the developed ML models classify "Benign" and "DrDoS_DNS" traffic with strong capability. All methods achieve precision, recall, and F1-scores near 1.00 for each class, indicating highly accurate detection with few misclassifications. The classification reports show a steady trend in the macro and weighted averages, which stand at 1.00, indicating balanced performance across classes. Figure 9 shows the confusion matrices, in which the models make only one or two misclassifications (false positives and false negatives).
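For reference, per-class scores and confusion matrices of this kind can be produced with scikit-learn as sketched below, assuming a fitted classifier clf and held-out arrays X_test and y_test (illustrative names, with Benign encoded as 0 and DrDoS_DNS as 1); this is a minimal sketch, not the released evaluation code.

```python
# Sketch: per-class precision/recall/F1 and the confusion matrix for one model.
from sklearn.metrics import classification_report, confusion_matrix

y_pred = clf.predict(X_test)  # `clf`, `X_test`, `y_test` assumed from training
print(classification_report(y_test, y_pred, target_names=["Benign", "DrDoS_DNS"]))
print(confusion_matrix(y_test, y_pred))  # rows = true class, columns = predicted class
```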
The findings support the reliability of the models in identifying both benign and attack traffic, making them suitable for real-world DDoS detection with minimal chance of misclassification. The small deviations in performance demonstrate that the Random Forest-based model can serve as an effective method, providing high accuracy and consistency across the test data.
The results demonstrate that the developed models provide noticeable improvements over previous studies. For Random Forest (RF), the model achieved an accuracy of 99.85%, compared to 99.11% in [7] and 97.23% in [44]. For Decision Tree (DT), the model reached an accuracy of 99.78%, surpassing the 98.25% reported in [7]. CatBoost (CB) achieved 99.63%, higher than the 99.1% reported in [11]. Relative to previous studies, the developed models improved the accuracy of Random Forest by 0.74%, Decision Tree by 1.53%, Logistic Regression by 4.7%, and CatBoost by 0.53%. These improvements highlight the effectiveness of the developed models in achieving higher accuracy for detecting DDoS attacks compared to prior works.
Because the datasets differ in their parameters, further work on feature importance for DDoS prediction is recommended so that a generally applicable feature set can be defined for dataset collection. For future studies, we plan to implement the ML model on the KR260 FPGA, which offers significant benefits for real-time detection and prevention of DDoS attacks. We also plan to measure the speed of the model on conventional systems and then implement the same model on the FPGA to compare the performance of the two platforms. Future studies could further investigate how FPGA hardware acceleration enhances the performance of ML algorithms, focusing on optimizing execution time and resource utilization.

5.3. Evaluation of the Proposed DDoS Mitigation Strategy

Evaluation protocol (no leakage). To prevent optimistic bias from duplicated flows, we adopt GroupShuffleSplit using our flow_id as the grouping key, ensuring that each flow appears in either train or test, but never both. We further calibrate classifier probabilities with CalibratedClassifierCV to obtain well-calibrated suspicion scores before mapping them to FRL cut points.
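A minimal sketch of this protocol is shown below, assuming a pandas DataFrame df that holds the ten selected features together with a binary Label column (0 = benign, 1 = attack) and a flow_id grouping column; the variable and column names are ours and illustrative, not from a released implementation.

```python
# Minimal sketch of the leakage-free evaluation protocol described above.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["Packet Length Min", "Total Backward Packets", "Avg Fwd Segment Size",
            "Flow Bytes/s", "Avg Packet Size", "Protocol", "Flow IAT Std",
            "Subflow Bwd Packets", "Flow IAT Mean", "Packet Length Mean"]

def split_and_calibrate(df: pd.DataFrame):
    X, y, groups = df[FEATURES], df["Label"], df["flow_id"]

    # Group-aware split: every flow_id lands entirely in train or in test,
    # so duplicated flows cannot inflate the test score.
    gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
    train_idx, test_idx = next(gss.split(X, y, groups=groups))

    # Probability calibration: predict_proba then yields well-behaved
    # suspicion scores that can be mapped to FRL cut points.
    base = RandomForestClassifier(n_estimators=100, random_state=42)
    clf = CalibratedClassifierCV(base, method="isotonic", cv=3)
    clf.fit(X.iloc[train_idx], y.iloc[train_idx])

    suspicion = clf.predict_proba(X.iloc[test_idx])[:, 1]  # calibrated P(attack)
    return clf, test_idx, suspicion
```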
Baseline detection (10 features). Under the group split on the DNS test file, detection metrics were strong across all models (Table 3). Briefly, Decision Tree achieved accuracy ≈ 0.9985 and F1 ≈ 0.9985; Random Forest delivered accuracy ≈ 0.9978 and F1 ≈ 0.9978; Gradient Boosting reached accuracy ≈ 0.9970 and F1 ≈ 0.9970; CatBoost obtained accuracy ≈ 0.9963 and F1 ≈ 0.9963; and Logistic Regression attained accuracy ≈ 0.9739 and F1 ≈ 0.9739. These baselines confirm that our ten-feature set remains highly discriminative under leakage-free evaluation and yields calibrated probabilities suitable for mitigation.
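For illustration, the following sketch reproduces such a ten-feature baseline comparison with scikit-learn estimators only (CatBoost is omitted for brevity); it reuses the DataFrame and column-name assumptions of the previous sketch and is not the exact experimental script.

```python
# Illustrative ten-feature baseline comparison under the same group-aware split.
from sklearn.model_selection import GroupShuffleSplit
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score

def compare_baselines(df, features, label_col="Label", group_col="flow_id"):
    gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
    tr, te = next(gss.split(df[features], df[label_col], groups=df[group_col]))
    X_tr, X_te = df[features].iloc[tr], df[features].iloc[te]
    y_tr, y_te = df[label_col].iloc[tr], df[label_col].iloc[te]

    models = {
        "LR": LogisticRegression(max_iter=1000),
        "DT": DecisionTreeClassifier(random_state=42),
        "RF": RandomForestClassifier(n_estimators=100, random_state=42),
        "GB": GradientBoostingClassifier(random_state=42),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)            # fit on grouped training flows
        pred = model.predict(X_te)       # score held-out flows only
        print(f"{name}: accuracy={accuracy_score(y_te, pred):.4f}, "
              f"F1={f1_score(y_te, pred):.4f}")
```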
Adaptive mitigation (our contribution). Mapping calibrated suspicion into FRL and then into data-aware token bucket limits yields effective mitigation with minimal collateral impact. With strict-only enforcement (FRL ≥ 2), we observe TP = 804, FP = 1, TN = 719, FN = 25, giving Accuracy = 98.32%, Precision = 99.88%, Recall = 96.98%, and F1 = 98.41%. When we also treat FRL = 1 as a lenient cap ("gentle throttling" rather than allow), we further improve coverage without increasing false positives: TP = 808, FP = 1, TN = 719, FN = 21, i.e., Accuracy = 98.58%, Precision = 99.88%, Recall = 97.47%, and F1 = 98.66%. This two-tier FRL-to-token-bucket policy, lenient at FRL = 1 and stringent at FRL ≥ 2, captures borderline attacks while preserving benign throughput, because the CIR/Burst limits at FRL = 1 are derived from benign quantiles. Operationally, the policy compiles to standard SR/TR-TCM meters and policers (e.g., DPDK MTR), so deployment on commodity NIC/DPDK pipelines is straightforward.
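The listing below gives a hypothetical sketch of this two-tier mapping. The suspicion cut points (0.5 and 0.9), the benign quantile levels, and the strict CIR/Burst values are illustrative placeholders rather than the tuned values used in this study.

```python
# Hypothetical sketch of the two-tier FRL-to-token-bucket policy.
# Cut points and quantile levels are placeholders for illustration only.
import numpy as np

def suspicion_to_frl(p_attack: float, lenient_cut: float = 0.5,
                     strict_cut: float = 0.9) -> int:
    """Map a calibrated suspicion score to a Flow Response Level (FRL)."""
    if p_attack >= strict_cut:
        return 2      # strict tier: aggressive policing
    if p_attack >= lenient_cut:
        return 1      # lenient tier: gentle throttling
    return 0          # allow

def frl_to_token_bucket(frl: int, benign_rates_bps: np.ndarray):
    """Translate an FRL into token-bucket (CIR, Burst) policing parameters."""
    if frl == 0:
        return None                           # no policer installed
    if frl == 1:
        # Lenient cap derived from benign traffic quantiles, so borderline
        # flows are throttled rather than dropped.
        cir = float(np.quantile(benign_rates_bps, 0.95))
        burst = float(np.quantile(benign_rates_bps, 0.99))
        return cir, burst
    # FRL >= 2: stringent limit for high-confidence attack flows.
    return 1_000.0, 2_000.0                    # placeholder strict CIR/Burst (bps)

# Example: frl = suspicion_to_frl(0.97); limits = frl_to_token_bucket(frl, benign_rates)
```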
In this study, we report both training time and inference time. Training refers to the process of building the model using the training dataset, which is typically the most computationally intensive phase. Inference, on the other hand, denotes the stage where the trained model is applied to new or unseen data to generate predictions. While training measures the cost of constructing the model, inference reflects its efficiency in real-time detection scenarios. Specifically, the measured training and inference times were as follows: Logistic Regression required 3.23 s and 0.017 s, Random Forest required 10.03 s and 0.24 s, Decision Tree required only 0.05 s and 0.004 s, Gradient Boosting required 4.89 s and 0.03 s, and CatBoost required 4.78 s and 0.20 s. These results confirm that the proposed framework maintains low computational overhead, supporting its suitability for real-time DDoS detection and future hardware deployment.
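A minimal sketch of how these two phases can be timed with Python's time.perf_counter is given below; the model and data names are illustrative and the measured values will of course depend on hardware.

```python
# Minimal sketch for measuring training and inference time of one model.
import time
from sklearn.tree import DecisionTreeClassifier

def time_model(model, X_train, y_train, X_test):
    t0 = time.perf_counter()
    model.fit(X_train, y_train)            # training: building the model
    train_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    model.predict(X_test)                  # inference: scoring unseen data
    infer_time = time.perf_counter() - t0
    return train_time, infer_time

# Example: t_train, t_infer = time_model(DecisionTreeClassifier(), X_tr, y_tr, X_te)
```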
Ablations and diagnostics. (i) Using detector probabilities directly for FRL cut points (tuned on validation) outperforms fixed FBS-only cut points; (ii) enabling TRP reduces oscillation and improves benign preservation under bursty benign traffic; (iii) byte-weighted metrics corroborate that benign throughput is largely preserved under the two-tier policy.
Limitations and scope. The results above are reported for the DNS split of CIC-DDoS2019; cross-scenario validation (e.g., NTP and LDAP) is left for future work. While CIC-DDoS2019 is widely used and well documented, real-time deployment should additionally account for path latency and token bucket placement in the data plane.
To further validate the effectiveness of our proposed approach, we provide a comparative analysis with previously reported studies that applied Random Forest, Logistic Regression, CatBoost, and Decision Tree algorithms to the CICDDoS2019 dataset. The results, summarized in Table 4, highlight the clear improvements achieved by our framework in both performance and computational complexity. For instance, the Random Forest (RF) model shows a performance increase of approximately 0.74% compared to the best-performing RF model with a known number of features from the literature. This improvement is achieved while also reducing the number of features used to develop the model from 24 to just 10, which boosts computational efficiency. The Decision Tree (DT) and CatBoost (CB) models also show notable performance gains. The Logistic Regression (LR) model exhibits a substantial performance increment of 4.7% while using only 10 features, down from 24. Overall, these results underline the effectiveness of the proposed approach.
The overall findings of this study demonstrate that the proposed feature-optimized machine learning framework achieves superior performance in both DDoS detection and mitigation. The Random Forest model attained an accuracy of 99.85%, outperforming previous studies that achieved accuracy between 86% and 99.34%, representing an improvement of up to 13.85%. Similarly, the Decision Tree achieved an accuracy of 99.78%, surpassing earlier works with a range from 77% to 98.25%, marking an enhancement of up to 22.53%. These results confirm the stability and consistency of the proposed framework. The systematic feature selection combining correlation analysis and Random Forest feature importance effectively reduces redundancy, minimizes overfitting, and enhances generalization. Moreover, the framework demonstrates practical real-time applicability, achieving an inference latency of approximately 0.004 s per instance, and providing a balanced trade-off between accuracy and efficiency, making it a promising candidate for real-world DDoS detection and mitigation.
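As a sketch of this two-stage selection idea (correlation filtering followed by Random Forest importance ranking), the snippet below keeps the top-k ranked features; the 0.95 correlation threshold and the top-10 cut are assumptions for illustration, not necessarily the exact settings used in this study.

```python
# Illustrative two-stage feature selection: correlation filter + RF importance.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def select_features(df: pd.DataFrame, label_col: str = "Label",
                    corr_thresh: float = 0.95, top_k: int = 10) -> list:
    X, y = df.drop(columns=[label_col]), df[label_col]

    # 1) Correlation filter: drop one member of every highly correlated pair
    #    to reduce redundancy among the flow features.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [c for c in upper.columns if (upper[c] > corr_thresh).any()]
    X = X.drop(columns=redundant)

    # 2) Random Forest importance: keep the top-k remaining features.
    rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
    importance = pd.Series(rf.feature_importances_, index=X.columns)
    return importance.sort_values(ascending=False).head(top_k).index.tolist()
```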
However, this study is limited to a controlled evaluation using the CICDDoS2019 dataset. Future work will focus on cross-dataset validation using benchmark datasets such as Bot-IoT, CICIDS2017, and UNSW-NB15 to assess generalization to diverse network environments and unseen attack patterns. Additionally, the framework will be implemented on FPGA hardware to evaluate real-time performance, throughput, and resource efficiency under realistic network conditions. Finally, integration with IDS/SIEM environments and programmable data planes will be explored to validate full end-to-end system performance and operational interoperability.

6. Conclusions

This research demonstrates the performance of ML techniques in detecting DDoS attacks, leveraging essential network traffic parameters identified within the CICDDoS2019 dataset, such as packet length, flow packet count, and inter-arrival time. Various ML algorithms, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and CatBoost, were employed in model development and exhibited strong performance in distinguishing normal from anomalous network traffic. Achieving an accuracy of 99.85% with Random Forest for detection and 99.85% with Decision Tree for mitigation, the developed models prove their suitability for real-time DDoS attack detection and their potential for integration into existing network security systems. The results underscore the significance of incorporating ML for automating and enhancing DDoS mitigation efforts. This study provides a comprehensive and reliable framework for real-time threat detection, offering a solution to address the evolving nature of DDoS attacks. Moreover, our adaptive two-tier mitigation converts calibrated detector scores into enforceable rate limits and, on the DNS split, achieves 98.58% accuracy and 98.66% F1 (TP = 808, FP = 1) without increasing false positives, demonstrating a deployable, low-collateral real-time defense.

Author Contributions

Conceptualization, A.J.I. and N.B.; methodology, A.J.I. and N.B.; software, N.B. and A.J.I.; validation, N.B. and A.J.I.; formal analysis, N.B. and A.J.I.; investigation, A.J.I. and N.B.; resources, A.J.I. and N.B.; data curation, A.J.I. and N.B.; writing—original draft preparation, A.J.I.; writing—review and editing, A.J.I., N.B. and S.R.R.; visualization, A.J.I. and N.B.; supervision, S.R.R. and N.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data supporting the findings of this study are publicly available. The CICDDoS2019 dataset used for training and evaluation can be accessed at https://www.unb.ca/cic/datasets/ddos-2019.html (accessed on 25 October 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. The Distribution of input parameters.
Table A1. Classification of parameters by importance in DDoS attack analysis.
Selected Features (Important Parameters): Packet Length Min, Total Backward Packets, Avg Fwd Segment Size, Flow Bytes/s, Avg Packet Size, Protocol, Flow IAT Std, Subflow Bwd Packets, Flow IAT Mean, Packet Length Mean.
Unimportant Parameters: Flow Duration, Fwd Packets Length Total, Bwd Packets Length Total, Fwd Packet Length Max, Fwd Packet Length Min, Fwd Packet Length Mean, Fwd Packet Length Std, Bwd Packet Length Max, Bwd Packet Length Min, Bwd Packet Length Mean, Bwd Packet Length Std, Fwd PSH Flags, Bwd PSH Flags, Fwd URG Flags, Bwd URG Flags, Fwd Header Length, Bwd Header Length, FIN Flag Count, PSH Flag Count, ACK Flag Count, ECE Flag Count, Fwd Avg Bytes/Bulk, Fwd Avg Packets/Bulk, Fwd Avg Bulk Rate, Bwd Avg Bytes/Bulk, Bwd Avg Packets/Bulk, Bwd Avg Bulk Rate, Flow IAT Max, Flow IAT Min, Fwd IAT Total, Fwd IAT Mean, Fwd IAT Std, Fwd IAT Max, Fwd IAT Min, Bwd IAT Total, Bwd IAT Mean, Bwd IAT Std, Bwd IAT Max, Bwd IAT Min, Fwd Packets/s, Bwd Packets/s, Packet Length Min, Packet Length Max, Packet Length Std, Packet Length Variance, FIN Flag Count, SYN Flag Count, RST Flag Count, PSH Flag Count, ACK Flag Count, URG Flag Count, CWE Flag Count, ECE Flag Count, Down/Up Ratio, Subflow Bwd Bytes, Init Fwd Win Bytes, Init Bwd Win Bytes, Fwd Act Data Packets, Fwd Seg Size Min, Active Mean, Active Std, Active Max, Active Min, Idle Mean, Idle Std, Idle Max, Idle Min, Label, Avg Bwd Segment Size, Subflow Fwd Packets, Subflow Fwd Bytes, Total Fwd Packets, Flow Packets/s.

References

  1. Amrish, R.; Bavapriyan, K.; Gopinaath, V.; Jawahar, A.; Kumar, C.V. DDoS Detection using Machine Learning Techniques. J. IoT Soc. Mob. Anal. Cloud 2022, 4, 24–32. [Google Scholar] [CrossRef]
  2. Karaca, O.; Sokullu, R.; Prasad, N.R.; Prasad, R. Application Oriented Multi Criteria Optimization in WSNs Using on AHP. Wirel. Pers. Commun. 2012, 65, 689–712. [Google Scholar] [CrossRef]
  3. Ahmad, Z.; Khan, A.S.; Shiang, C.W.; Abdullah, J.; Ahmad, F. Network intrusion detection system: A systematic study of machine learning and deep learning approaches. Trans. Emerg. Telecommun. Technol. 2021, 32, e4150. [Google Scholar] [CrossRef]
  4. Bhayo, J.; Shah, S.A.; Hameed, S.; Ahmed, A.; Nasir, J.; Draheim, D. Towards a machine learning-based framework for DDOS attack detection in software-defined IoT (SD-IoT) networks. Eng. Appl. Artif. Intell. 2023, 123, 106432. [Google Scholar] [CrossRef]
  5. Pranggono, B.; Arabo, A. COVID-19 pandemic cybersecurity issues. Internet Technol. Lett. 2021, 4, e247. [Google Scholar] [CrossRef]
  6. Mittal, M.; Kumar, K.; Behal, S. Deep learning approaches for detecting DDoS attacks: A systematic review. Soft Comput. 2023, 27, 13039–13075. [Google Scholar] [CrossRef] [PubMed]
  7. Saluja, K.; Bagchi, S.; Solanki, V.; Khan, M.N.A.; Dhamija, E.; Debnath, S.K. Exploring Robust DDoS Detection: A Machine Learning Analysis with the CICDDoS2019 Dataset. In Proceedings of the 2024 IEEE 5th India Council International Subsections Conference (INDISCON), Chandigarh, India, 22–24 August 2024; pp. 1–6. [Google Scholar] [CrossRef]
  8. Siddiqui, S.; Hameed, S.; Shah, S.A.; Ahmad, I.; Aneiba, A.; Draheim, D.; Dustdar, S. Toward Software-Defined Networking-Based IoT Frameworks: A Systematic Literature Review, Taxonomy, Open Challenges and Prospects. IEEE Access 2022, 10, 70850–70901. [Google Scholar] [CrossRef]
  9. Agrawal, S.; Sarkar, S.; Alazab, M.; Maddikunta, P.K.R.; Gadekallu, T.R.; Pham, Q.-V. Genetic CFL: Hyperparameter Optimization in Clustered Federated Learning. Comput. Intell. Neurosci. 2021, 2021, 7156420. [Google Scholar] [CrossRef]
  10. Prima, F.; Dylan, L.; Gunawan, A.A.S. Comparison of Machine Learning Models for Classification of DDoS Attacks. In Proceedings of the 2023 5th International Conference on Cybernetics and Intelligent System (ICORIS), Pangkapinang, Indonesia, 6–7 October 2023; pp. 1–6. [Google Scholar] [CrossRef]
  11. Parfenov, D.; Kuznetsova, L.; Yanishevskaya, N.; Bolodurina, I.; Zhigalov, A.; Legashev, L. Research Application of Ensemble Machine Learning Methods to the Problem of Multiclass Classification of DDoS Attacks Identification. In Proceedings of the 2020 International Conference Engineering and Telecommunication (En&T), Dolgoprudny, Russia, 25–26 November 2020; pp. 1–7. [Google Scholar] [CrossRef]
  12. Mohmand, M.I.; Hussain, H.; Khan, A.A.; Ullah, U.; Zakarya, M.; Ahmed, A.; Raza, M.; Rahman, I.U.; Haleem, M. A Machine Learning-Based Classification and Prediction Technique for DDoS Attacks. IEEE Access 2022, 10, 21443–21454. [Google Scholar] [CrossRef]
  13. Nagpal, B.; Sharma, P.; Chauhan, N.; Panesar, A. DDoS tools: Classification, analysis and comparison. In Proceedings of the 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 11–13 March 2015; pp. 342–346. Available online: https://ieeexplore.ieee.org/document/7100270 (accessed on 25 October 2024).
  14. Dasari, S.; Kaluri, R. An Effective Classification of DDoS Attacks in a Distributed Network by Adopting Hierarchical Machine Learning and Hyperparameters Optimization Techniques. IEEE Access 2024, 12, 10834–10845. [Google Scholar] [CrossRef]
  15. Salmi, S.; Oughdir, L. Performance evaluation of deep learning techniques for DoS attacks detection in wireless sensor network. J. Big Data 2023, 10, 17. [Google Scholar] [CrossRef]
  16. Chovanec, M.; Hasin, M.; Havrilla, M.; Chovancová, E. Detection of HTTP DDoS Attacks Using NFStream and TensorFlow. Appl. Sci. 2023, 13, 6671. [Google Scholar] [CrossRef]
  17. Mishra, A.; Gupta, B.B.; Perakovic, D.; Penalvo, F.J.G.; Hsu, C.-H. Classification Based Machine Learning for Detection of DDoS attack in Cloud Computing. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 10–12 January 2021; pp. 1–4. [Google Scholar] [CrossRef]
  18. Tan, L.; Pan, Y.; Wu, J.; Zhou, J.; Jiang, H.; Deng, Y. A New Framework for DDoS Attack Detection and Defense in SDN Environment. IEEE Access 2020, 8, 161908–161919. Available online: https://ieeexplore.ieee.org/abstract/document/9186014 (accessed on 25 October 2024). [CrossRef]
  19. Sangodoyin, A.O.; Akinsolu, M.O.; Pillai, P.; Grout, V. Detection and Classification of DDoS Flooding Attacks on Software-Defined Networks: A Case Study for the Application of Machine Learning. IEEE Access 2021, 9, 122495–122508. [Google Scholar] [CrossRef]
  20. Peng, S.; Tian, J.; Zheng, X.; Chen, S.; Shu, Z. DDoS Defense Strategy Based on Blockchain and Unsupervised Learning Techniques in SDN. Future Internet 2025, 17, 367. [Google Scholar] [CrossRef]
  21. Karpowicz, M.P. Adaptive tuning of network traffic policing mechanisms for DDoS attack mitigation systems. Eur. J. Control 2021, 61, 101–118. [Google Scholar] [CrossRef]
  22. Swami, R.; Dave, M.; Ranga, V. IQR-based approach for DDoS detection and mitigation in SDN. Def. Technol. 2023, 25, 76–87. [Google Scholar] [CrossRef]
  23. Alashhab, A.A.; Zahid, M.S.; Isyaku, B.; Elnour, A.A.; Nagmeldin, W.; Abdelmaboud, A.; Abdullah, T.A.A.; Maiwada, U.D. Enhancing DDoS Attack Detection and Mitigation in SDN Using an Ensemble Online Machine Learning Model. IEEE Access 2024, 12, 51630–51649. [Google Scholar] [CrossRef]
  24. Hamarshe, A.; Ashqar, H.I.; Hamarsheh, M. Detection of DDoS Attacks in Software Defined Networking Using Machine Learning Models. arXiv 2023, arXiv:2303.06513. [Google Scholar] [CrossRef]
  25. Elsayed, M.S.; Le-Khac, N.-A.; Dev, S.; Jurcut, A.D. DDoSNet: A Deep-Learning Model for Detecting Network Attacks. arXiv 2020, arXiv:2006.13981. [Google Scholar] [CrossRef]
  26. Khan, A.A.R.; Nisha, S.S. Efficient hybrid optimization based feature selection and classification on high dimensional dataset. Multimed. Tools Appl. 2023, 83, 58689–58727. [Google Scholar] [CrossRef]
  27. NSF NHERI DesignSafe|DesignSafe-CI. Available online: https://www.designsafe-ci.org/ (accessed on 26 October 2024).
  28. DDoS 2019|Datasets|Research|Canadian Institute for Cybersecurity|UNB. Available online: https://www.unb.ca/cic/datasets/ddos-2019.html (accessed on 25 October 2024).
  29. Kamaldeep; Malik, M.; Dutta, M. Feature Engineering and Machine Learning Framework for DDoS Attack Detection in the Standardized Internet of Things. IEEE Internet Things J. 2023, 10, 8658–8669. [Google Scholar] [CrossRef]
  30. Raju, V.N.G.; Lakshmi, K.P.; Jain, V.M.; Kalidindi, A.; Padma, V. Study the Influence of Normalization/Transformation process on the Accuracy of Supervised Classification. In Proceedings of the 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 20–22 August 2020; pp. 729–735. [Google Scholar] [CrossRef]
  31. Ullah, F.; Babar, M.A. On the scalability of Big Data Cyber Security Analytics systems. J. Netw. Comput. Appl. 2022, 198, 103294. [Google Scholar] [CrossRef]
  32. Medar, R.; Rajpurohit, V.S.; Rashmi, B. Impact of Training and Testing Data Splits on Accuracy of Time Series Forecasting in Machine Learning. In Proceedings of the 2017 International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 7–18 August 2017; pp. 1–6. [Google Scholar] [CrossRef]
  33. Zou, X.; Hu, Y.; Tian, Z.; Shen, K. Logistic Regression Model Optimization and Case Analysis. In Proceedings of the 2019 IEEE 7th International Conference on Computer Science and Network Technology (ICCSNT), Dalian, China, 19–20 October 2019; pp. 135–139. [Google Scholar] [CrossRef]
  34. Kim, M.; Song, Y.; Wang, S.; Xia, Y.; Jiang, X. Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation. JMIR Med. Inform. 2018, 6, e19. [Google Scholar] [CrossRef]
  35. Hai, T.; Zhou, J.; Adetiloye, O.A.; Zadeh, S.A.; Yin, Y.; Iwendi, C. DDoS Attack Prediction Using Decision Tree and Random Forest Algorithms. In Proceedings of ICACTCE’23—The International Conference on Advances in Communication Technology and Computer Engineering, Bolton, UK, 24–25 February 2023; Iwendi, C., Boulouard, Z., Kryvinska, N., Eds.; Springer Nature: Cham, Switzerland, 2023; pp. 37–46. [Google Scholar] [CrossRef]
  36. Breiman, L. Random Forest. January 2001. Available online: https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf (accessed on 1 January 2025).
  37. Sharma, P.; Singh, P.; Kumar, C.N.S.V. Web Guardian: Harnessing Web Mining to Combat Online Terrorism. In Proceedings of the 2024 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT), Karaikal, India, 4–5 July 2024; pp. 1–5. [Google Scholar] [CrossRef]
  38. Hajjouz, A.; Avksentieva, E. Evaluating the Effectiveness of the CatBoost Classifier in Distinguishing Benign Traffic, FTP BruteForce and SSH BruteForce Traffic. In Proceedings of the 2024 9th International Conference on Signal and Image Processing (ICSIP), Nanjing, China, 12–14 July 2024; pp. 351–358. [Google Scholar] [CrossRef]
  39. Saleem, M.; Azam, M.; Mubeen, Z.; Mumtaz, G. Machine Learning for Improved Threat Detection: LightGBM vs. CatBoost. J. Comput. Biomed. Inform. 2024, 7, 571–580. [Google Scholar]
  40. Samat, A.; Li, E.; Du, P.; Liu, S.; Xia, J. GPU-Accelerated CatBoost-Forest for Hyperspectral Image Classification Via Parallelized mRMR Ensemble Subspace Feature Selection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3200–3214. [Google Scholar] [CrossRef]
  41. Ramzan, M.; Shoaib, M.; Altaf, A.; Arshad, S.; Iqbal, F.; Castilla, Á.K.; Ashraf, I. Distributed Denial of Service Attack Detection in Network Traffic Using Deep Learning Algorithm. Sensors 2023, 23, 8642. [Google Scholar] [CrossRef] [PubMed]
  42. Chaira, M.; Belhenniche, A.; Chertovskih, R. Enhancing DDoS Attacks Mitigation Using Machine Learning and Blockchain-Based Mobile Edge Computing in IoT. Computation 2025, 13, 158. [Google Scholar] [CrossRef]
  43. Raza, M.S.; Sheikh, M.N.A.; Hwang, I.-S.; Ab-Rahman, M.S. Feature-Selection-Based DDoS Attack Detection Using AI Algorithms. Telecom 2024, 5, 333–346. [Google Scholar] [CrossRef]
  44. Chu, T.S.; Si, W.; Simoff, S.; Nguyen, Q.V. A Machine Learning Classification Model Using Random Forest for Detecting DDoS Attacks. In Proceedings of the 2022 International Symposium on Networks, Computers and Communications (ISNCC), Shenzhen, China, 19–22 July 2022; pp. 1–7. [Google Scholar] [CrossRef]
Figure 1. Conceptual framework of the proposed approach for enhancing the detection of DDoS attacks using machine learning.
Figure 2. Standard architecture of a Distributed Denial of Service (DDoS) attack.
Figure 3. Machine learning flowchart.
Figure 4. Correlation matrix of input parameters.
Figure 5. Data splitting for the developed machine learning model.
Figure 6. t-SNE-based dimensionality reduction of the data to two dimensions.
Figure 7. Train confusion matrices of the developed machine learning models.
Figure 8. Confusion matrices of the developed machine learning models for validation.
Figure 9. Confusion matrices of the developed machine learning models for the test data.
Table 1. Mapping of Flow Response Levels (FRL) to Token Bucket policing parameters (CIR and Burst, expressed in packets per second and Mbps) for enforcing adaptive DDoS mitigation.
Flow_ID (Behavioral Key) | FRL | CIR_pps | Burst | CIR_Mbps | Burst_Mbps
17 | (−0.001, 412.04] | (29,883.18, 5,353,271.0] | (38,203.44, 15,491,469.0] | 30.0
17 | (1.0824 × 10^9, 2.656 × 10^9] | (0.999, 2.0] | (−0.001, 59.507] | 30.0
17 | (1.350 × 10^7, 6.264 × 10^7] | (2.0, 7.0] | (−0.001, 59.507] | 30.0
17 | (2.552 × 10^4, 1.350 × 10^7] | (182.567, 6877.667] | (59.507, 11,895.958] | 30.0
Table 2. Train, validation, and test performance of the developed DDoS attack detection machine learning models.
Model | Accuracy (Train / Valid / Test) | Precision (Train / Valid / Test) | Recall (Train / Valid / Test) | F1-Score (Train / Valid / Test)
LR | 99.73 / 99.70 / 99.70 | 99.73 / 99.70 / 99.70 | 99.73 / 99.70 / 99.70 | 99.73 / 99.70 / 99.70
RF | 99.98 / 99.70 / 99.85 | 99.98 / 99.70 / 99.85 | 99.98 / 99.70 / 99.85 | 99.98 / 99.70 / 99.85
DT | 100 / 99.78 / 99.78 | 100 / 99.78 / 99.77 | 100 / 99.78 / 99.77 | 100 / 99.78 / 99.77
GB | 99.88 / 99.78 / 99.63 | 99.88 / 99.78 / 99.62 | 99.88 / 99.78 / 99.62 | 99.88 / 99.78 / 99.62
CB | 99.75 / 99.70 / 99.63 | 99.75 / 99.70 / 99.62 | 99.75 / 99.70 / 99.62 | 99.75 / 99.70 / 99.62
Table 3. Train, validation, and test performance of the developed DDoS attack mitigation machine learning models.
Model | Accuracy (Train / Valid / Test) | Precision (Train / Valid / Test) | Recall (Train / Valid / Test) | F1-Score (Train / Valid / Test)
LR | 97.65 / 97.93 / 97.39 | 97.71 / 97.97 / 97.48 | 97.65 / 97.93 / 97.39 | 97.65 / 97.93 / 97.39
RF | 100 / 99.89 / 99.78 | 100 / 99.89 / 99.78 | 100 / 99.89 / 99.78 | 100 / 99.89 / 99.78
DT | 100 / 99.72 / 99.85 | 100 / 99.72 / 99.85 | 100 / 99.72 / 99.85 | 100 / 99.72 / 99.85
GB | 100 / 99.89 / 99.70 | 100 / 99.89 / 99.70 | 100 / 99.89 / 99.70 | 100 / 99.89 / 99.70
CB | 99.97 / 99.94 / 99.63 | 99.97 / 99.94 / 99.63 | 99.97 / 99.94 / 99.63 | 99.97 / 99.94 / 99.63
Table 4. Comparison of the results with other articles that used the CICDDoS2019 dataset.
Algorithm | Accuracy | Precision | Recall | F1-Score | Notes
RF [7] | 99.11 | 99 | 99.23 | 99.11 | 24 features
RF [41] | 86 | 78 | 70 | 73 | -
RF [42] | 99.62 | 99.34 | 98.11 | 98.72 | 13 features
RF [43] | 99 | 99 | 97 | 99 | 16 features
Our proposed RF | 99.85 | 99.85 | 99.85 | 99.85 | 10 features
DT [7] | 98.25 | 97.55 | 98.25 | 97.89 | 24 features
DT [41] | 77 | 92 | 60 | 40 | -
DT [42] | 96.80 | 95.10 | 99.67 | 97.33 | 13 features
DT [43] | 91 | 91 | 91 | 94 | 16 features
Our proposed DT | 99.78 | 99.77 | 99.77 | 99.77 | 10 features
CB [11] | - | 96.7 | 97 | 96.8 | 24 features
Our proposed CB | 99.63 | 99.62 | 99.62 | 99.62 | 10 features
LR [41] | 95 | 86 | 11 | 19 | -
Our proposed LR | 99.70 | 99.70 | 99.70 | 99.70 | 10 features