Enhancing Machine Learning-Based DDoS Detection Through Hyperparameter Optimization

Chen, Shao-Rui; Chen, Shiang-Jiun; Hsieh, Wen-Bin

doi:10.3390/electronics14163319

by Shao-Rui Chen ¹, Shiang-Jiun Chen ¹,* and Wen-Bin Hsieh ²,*

¹ Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 10608, Taiwan

² Department of Green Energy and Information Technology, National Taitung University, Taitung 950017, Taiwan

* Authors to whom correspondence should be addressed.

Electronics 2025, 14(16), 3319; https://doi.org/10.3390/electronics14163319

Submission received: 1 July 2025 / Revised: 6 August 2025 / Accepted: 20 August 2025 / Published: 20 August 2025

(This article belongs to the Special Issue Advancements in AI-Driven Cybersecurity and Securing AI Systems)

Abstract

In recent years, the frequency and complexity of Distributed Denial of Service (DDoS) attacks have escalated significantly, threatening the availability, performance, and security of networked systems. With the rapid progression of Artificial Intelligence (AI) and Machine Learning (ML) technologies, attackers can leverage intelligent tools to automate and amplify DDoS attacks with minimal human intervention. The increasing sophistication of such attacks highlights the pressing need for more robust and precise detection methodologies. This research proposes a method that uses hyperparameter tuning to enhance the effectiveness of ML models in detecting DDoS attacks. By optimizing model parameters, the proposed approach improves the performance of ML models in identifying DDoS attacks. The CIC-DDoS2019 dataset is used in this study because it offers a comprehensive set of real-world DDoS attack scenarios across various protocols and services. The proposed methodology comprises data preprocessing, data splitting, and model training, validation, and testing. Three ML models are trained and tuned using an adaptive GridSearchCV (cross-validation) strategy to identify optimal parameter configurations. The results demonstrate that our method significantly improves performance and efficiency compared with the general GridSearchCV. The SVM model achieves 99.87% testing accuracy and requires approximately 28% less execution time than the general GridSearchCV. The LR model achieves 99.6830% testing accuracy with an execution time of 16.90 s, maintaining the same testing accuracy while reducing the execution time by about 22.8%. The KNN model achieves 99.8395% testing accuracy with an execution time of 2388.89 s, also preserving accuracy while decreasing the execution time by approximately 63%. These results indicate that our approach enhances DDoS detection performance and efficiency, offering novel insights into the practical application of hyperparameter tuning for improving ML model performance in real-world scenarios.

Keywords:

distributed denial of service attacks; machine learning; hyperparameter tuning; CIC DDoS2019 dataset; cybersecurity

1. Introduction

In recent years, the field of cybersecurity [1] has increasingly concentrated on the detection and mitigation of Distributed Denial of Service (DDoS) [2] attacks, which pose a significant threat to network availability and performance. This research is motivated by the growing prevalence of DDoS attacks, which can disrupt services and cause substantial financial losses for organizations. With the evolution of technology, the landscape of DDoS attacks has undergone significant changes. Historically, launching a DDoS attack often required manual intervention and considerable technical expertise. Attackers needed to orchestrate their efforts by coordinating multiple compromised systems, which limited the scale and frequency of such attacks. However, with the swift advancements in Artificial Intelligence (AI) [3] and Machine Learning (ML) [4], the nature of DDoS attacks has shifted dramatically. Today, AI-driven tools enable attackers to automate the process of launching DDoS attacks, making them more efficient and more challenging to detect. These sophisticated algorithms can analyze vulnerabilities in target systems and deploy large-scale attacks with minimal human involvement. As the frequency and complexity of DDoS attacks continue to grow, the need for more effective and efficient detection approaches has become more critical. The ability to execute complex attack strategies at scale has made it essential for cybersecurity professionals to develop more robust defenses against these evolving threats.

To advance the field of DDoS detection, numerous researchers have contributed significantly by leveraging machine learning (ML) techniques. For example, Peneti et al. [5] employed a combination of ensemble learning methods to detect DDoS attacks, using models such as Random Forest (RF), AdaBoost, XGBoost, and Multi-Layer Perceptron (MLP) to classify network requests as normal or abnormal. By combining feature selection and ML techniques, the authors developed a high-performing Intrusion Detection System (IDS) that enhances the capability to detect DDoS attacks. Similarly, Al-Eryani et al. [6] evaluated the performance of various ML algorithms on the CIC-DDoS2019 dataset, focusing particularly on Gradient Boosting (GB) and XGBoost. They found that these algorithms achieve high accuracy with low false positive rates: the accuracy of GB and XGBoost reached 99.99% and 99.98%, respectively. The study also emphasized the importance of continuously updating and improving detection technologies to mitigate the risk of novel DDoS attacks. In the domain of deep learning, Alfatemi et al. [7] proposed an advanced DDoS detection framework that integrates Deep Residual Neural Networks (ResNets) with SMOTE (Synthetic Minority Oversampling Technique) to address class imbalance. Their method achieved approximately 99.98% accuracy, demonstrating the effectiveness of combining deep learning with data balancing techniques for intrusion detection. Panggabean et al. [8] introduced a hybrid model combining Gated Recurrent Units (GRU) with a Neural Turing Machine (NTM) to enhance sequential modeling of network traffic. This approach captures long-term dependencies and context in packet flows, making it particularly effective for detecting stealthy or evolving DoS/DDoS patterns. From the perspective of algorithm benchmarking, Haque et al. [9] conducted a comparative analysis of traditional ML algorithms (Random Forest (RF), XGBoost, KNN, SVC, and MLP) for DDoS detection in Software-Defined Networking (SDN) environments. Their results showed that, when properly tuned, classical ML models can reach accuracies of up to 99.99%, underscoring their continued relevance in modern detection systems. Furthermore, de Melo et al. [10] proposed Anomaly-Flow, a federated learning framework using Generative Adversarial Networks (GANs) to detect DDoS attacks across multiple domains while preserving data privacy. This approach is particularly suited for distributed systems where centralizing traffic data is impractical or raises privacy concerns. Collectively, these studies represent a wide spectrum of methodologies, ranging from advanced deep learning approaches to optimized classical ML models and privacy-aware distributed frameworks. However, few have explicitly focused on the systematic and efficient hyperparameter optimization of lightweight ML models, a crucial factor for ensuring both high accuracy and computational efficiency in real-world, resource-constrained scenarios. To address this gap, this study proposes strengthening the capability of machine learning models for detecting DDoS attacks through hyperparameter tuning, aiming to enhance their detection performance.

Hyperparameters are configuration parameters external to the model that are not learned from the data but are instead set prior to the training process, such as the kernel type in Support Vector Machines (SVM) or the number of trees in a Random Forest. Hyperparameter tuning refers to the systematic search for the combination of hyperparameters that maximizes model performance, typically evaluated using metrics such as accuracy, F1-score, or AUC on a validation dataset. Existing studies on hyperparameter tuning face several challenges, including high computational cost, overfitting to the validation set, and biases arising from limited or poorly designed search spaces. This research aims to enhance the performance of DDoS detection by optimizing the hyperparameters of machine learning models. To this end, an adaptive GridSearchCV approach is proposed, which directly addresses these limitations and demonstrates improved detection effectiveness. The proposed approach is evaluated using the CIC-DDoS2019 dataset, which contains a diverse array of network traffic data. The results provide insights into the utility of our approach and its potential to bolster DDoS attack detection. The contributions of this paper are summarized as follows.

We propose an Adaptive GridSearchCV method for hyperparameter tuning that improves both detection accuracy and computational efficiency, addressing the limitations of traditional exhaustive grid search in DDoS detection tasks.
We empirically validate our approach using the CIC-DDoS2019 dataset, demonstrating that the Adaptive GridSearchCV achieves comparable or superior classification performance while significantly reducing execution time across multiple ML models.
Our results reveal that the adaptive hyperparameter tuning enables ML models to generalize robustly, with minimal performance drop between training, validation, and testing phases—highlighting their reliability for practical deployment in evolving DDoS detection environments.

The remainder of this paper is organized as follows: Section 2 reviews related literature on DDoS attacks, AI, the use of ML models, loss functions, and hyperparameter tuning for DDoS detection. Section 3 presents the methodology used in this research, including the dataset and our approach for optimizing the hyperparameters of ML models. The experiment environment is described in Section 4. Section 5 details the results and discusses the findings. Finally, Section 6 analyzes the implications of our results and proposes future research directions along with improvements to DDoS defense approaches.

2. Related Work

Faced with increasingly complicated and recurring DDoS attacks, traditional defense mechanisms often fall short in effectively addressing these threats. The combination of AI and ML techniques has emerged as a promising solution, offering a more adaptive and robust defense. This section reviews existing research on AI, ML, DDoS attacks, and hyperparameter tuning related to this study.

2.1. Artificial Intelligence (AI)-Driven Defense Mechanisms

The rising complexity of DDoS threats has driven the development of advanced defense strategies, particularly those leveraging AI and statistical techniques. Khalaf et al. [11] provided a comprehensive review of DDoS attack and defense methodologies, highlighting the limitations of traditional approaches and stressing the need for dynamic analysis of attack patterns. Their study serves as a valuable reference for AI- and statistics-based mitigation mechanisms. In [12], AI applications in intrusion detection are explored, focusing on enhancing system effectiveness through feature selection and data analysis. By filtering irrelevant data, applying clustering techniques, and selecting key features, the study demonstrates improved classification accuracy and faster, real-time intrusion detection performance. In addition, Ahmad et al. [13] introduced XGRU-IDS, an explainable deep learning-based intrusion detection framework for industrial IoT environments. By combining an Extra Trees Classifier (ETC) for feature selection, Gated Recurrent Units (GRU) for temporal modeling, and SHAP for model interpretability, the system achieves a balance of accuracy, efficiency, and transparency. XGRU-IDS reached a 97.56% accuracy on the CICIoT2023 dataset, effectively detecting 34 types of attacks while providing interpretable insights to support real-world deployment in critical infrastructure.

With the growing threat of DDoS attacks, AI and statistical methods have become essential tools for detection and mitigation. Suhag and Daniel [14] reviewed a wide range of AI- and statistics-based defense mechanisms, including techniques such as regression analysis, statistical modeling, and comparative analysis, demonstrating their effectiveness in identifying DDoS patterns. Antoni Jaszcz and Dawid Polap [15] proposed the Artificial Intelligence Merged Methods (AIMM) framework, which integrates KNN and ANN classifiers in a three-stage process—data preprocessing, classification, and decision-making. The AIMM framework achieved a high detection accuracy of 99.5% on the BOUN dataset, validating its potential for real-time DDoS detection.

Nachaat Mohamed [16] conducted a comprehensive review of AI-based cybersecurity methods, categorizing them into machine learning and deep learning approaches such as SVM and Random Forest. The study evaluated their effectiveness in DDoS detection and highlighted how AI integration with traditional security protocols enhances defense capabilities. Experimental results showed that AI-driven systems outperform traditional methods in both accuracy and response time, offering valuable insights for practical cybersecurity applications across various fields.

2.2. Machine Learning (ML)-Based Detection Mechanisms

Parvinder Singh Saini et al. [17] applied four classifiers (J48, MLP, RF, NB) using the WEKA tool on a 27-feature DDoS dataset, achieving the highest accuracy of 98.64% with J48. The dataset categorized traffic into five classes, including normal and various DDoS types. In [18], a machine learning-based approach was used to detect DDoS attacks in a simulated Software-Defined Networking (SDN) environment using Mininet and POX. The study tested SVM, MLP, Decision Tree, and RF classifiers to identify three attack types—flooding, controller, and bandwidth attacks—demonstrating effective DDoS detection in SDN contexts.

Mahmood A. Al-Shareeda et al. [19] reviewed various ML techniques, including Naive Bayes, SVM, Decision Tree, and Artificial Neural Networks (ANN). Each technique was evaluated based on its capability to classify and detect DDoS attacks effectively. Furthermore, the authors explored Deep Learning (DL) approaches that offer enhanced capabilities for feature extraction and classification in complex network environments. The authors emphasized that the choice of appropriate methodology varies according to the particular use case, the available resources, and the accuracy requirements. The results showed the effectiveness of different ML and DL techniques for DDoS detection.

Mona Alduailij et al. [20] proposed a machine learning-based method for DDoS detection in cloud environments, using Mutual Information and Random Forest Feature Importance for feature selection. They tested several ML algorithms (RF, GB, WVE, KNN, and LR) on a reduced feature set, achieving up to 99% accuracy, with RF misclassifying only one instance. In [21], Murk Marvi et al. introduced a generalized gradient boosting model using the CIC-DDoS2019 dataset, focusing on detecting unseen DDoS attacks in near real-time. They applied feature engineering techniques such as EDA and Data Type Optimization to improve performance and efficiency, achieving 99.9% accuracy. The model was also benchmarked against deep learning models using the CICIDS2017 dataset. Saravanan and Balasubramanian [22] proposed a scalable data pipeline for attack classification in large-scale IoT networks using Apache Kafka, Spark, and MongoDB. Their system adapts to concept drift by retraining with both recent and historical data, achieving 99.46% accuracy on the IoT23 dataset. While their focus is on stream processing and adaptive learning, our work emphasizes efficient hyperparameter tuning for lightweight ML models, making it better suited for resource-constrained environments.

2.3. Distributed Denial of Service Attack (DDoS Attack)

Jai Dalvi et al. [23] developed a DDoS detection framework using Artificial Neural Networks combined with a Mutual Information Classifier for feature selection. Their model preprocesses data from an open-source dataset, splitting it 70/30 for training and testing. The approach reduced time complexity by nearly half and slightly improved accuracy from 89.6% to 89.62%. Latha R. et al. [24] proposed a procedural framework using Naïve Bayes and Logistic Regression classifiers on the KDDCup dataset with 41 features. They divided the data into training, testing, and validation sets and evaluated model performance based on throughput and threshold metrics, comparing the two algorithms for DDoS detection effectiveness. Ahamed Aljuhani [25] presented a comprehensive analysis of ML approaches for combating DDoS attacks in modern networking environments. The study discussed various ML techniques employed in DDoS defense systems, particularly in Cloud Computing, SDN, and Internet of Things (IoT) environments. It also categorized recent research findings, assessed the effectiveness of different ML models, and identified challenges for future research in developing robust defense mechanisms against DDoS threats.

Firooz B. Saghezchi et al. [26] developed a DDoS detection system for Industry 4.0 Cyber Physical Production Systems using statistical network flow features extracted via NetMate. After removing irrelevant attributes, they found that supervised algorithms, especially Decision Trees, performed best, achieving 99.9% accuracy with a 1% false positive rate. Abdullah Emir Cil et al. [27] proposed a feed-forward Deep Neural Network (DNN) framework trained on the CIC-DDoS2019 dataset. Their multi-layer DNN architecture combined feature extraction and classification, optimized with AdaMax and binary cross-entropy loss, achieving 99.97% accuracy and 99.99% precision in DDoS detection and classification.

In a recent study, Ma and Su [28] proposed a decentralized DDoS defense framework specifically designed for Software-Defined AIoT (SD-AIoT) environments. Their approach combines federated learning with a semi-supervised Autoencoder-MLP model to detect attacks while preserving data privacy. Secure Multiparty Computation is used for privacy-preserving model aggregation, and programmable P4 switches at the network edge support real-time, distributed mitigation through rate-limiting and path pushback. Experimental evaluation on CICIDS2017 and InSDN datasets demonstrated high detection accuracy, minimal performance overhead, and rapid recovery from attacks, positioning the framework as a state-of-the-art solution for collaborative and privacy-aware DDoS defense.

2.4. Hyperparameter Tuning

Odnan Ref Sanchez et al. [29] investigated traditional machine learning methods for DDoS attack classification, emphasizing hyperparameter optimization using GridSearchCV. Focusing on lightweight algorithms such as Naïve Bayes, Logistic Regression, Decision Trees, Random Forests, KNN, SVM, and MLP, their study targeted resource-constrained environments like IoT. Using CIC datasets, the optimized Random Forest and Decision Tree models achieved over 98% accuracy, comparable to deep learning methods. The research highlights the importance of hyperparameter tuning in improving traditional ML performance for efficient DDoS detection.

In [30], Hijrah Nisya et al. investigated the application of a hyperparameter-tuned method for DDoS attack classification within SDN. Addressing the security vulnerabilities of SDN’s centralized control, they proposed an approach that optimizes the RF algorithm’s performance to enhance detection accuracy. The study employed the InSDN dataset and used SelectFromModel (SFM) to retain the most relevant traffic features. Hyperparameter tuning was conducted using Random Search to find optimal parameter configurations for the RF model. Hyperparameter tuning notably enhanced the Random Forest algorithm’s classification accuracy, achieving a detection rate of 99.99%.

Sandeep Dasari and Rajesh Kaluri [31] developed a hierarchical machine learning framework incorporating hyperparameter optimization for DDoS classification in distributed networks. Using the CICIDS 2017 dataset, they applied Min-Max scaling, SMOTE for data balancing, and LASSO for feature selection. Their tiered ensemble combined XGBoost, LightGBM, CatBoost, RF, and Decision Tree classifiers with optimized hyperparameters. The LightGBM classifier achieved a classification accuracy of 99.77%, demonstrating the effectiveness of integrating feature selection, hierarchical ML, and hyperparameter tuning to enhance intrusion detection in modern networks. In [32], Tan May May et al. investigated the optimization of ML and DL models for IDSs by focusing on hyperparameter tuning. Recognizing that high false rates and the increasing sophistication of cyberattacks pose significant challenges, the authors explored the use of RF, DNNs, and Deep Autoencoders (DAEs) with the goal of improving detection accuracy. The research used the CIC-IDS2017 dataset with Pearson correlation for feature selection, QuantileTransformerScaler for scaling, and grid search for hyperparameter optimization, achieving an average accuracy of 99.5%. Abdussalam Ahmed Alashhab et al. [33] proposed an Online Machine Learning framework for DDoS detection in SDN environments. Their system employs four classifiers (Bernoulli Naive Bayes, Passive-Aggressive, Stochastic Gradient Descent, and MLP), which adapt continuously to evolving attacks. Experiments on the CIC-DDoS2019 and InSDN datasets showed a 99.2% detection rate, outperforming similar methods.

3. Methodology

Detecting and mitigating DDoS attacks [34] is a critical challenge in the network security field. The CIC-DDoS2019 dataset [35] provides a thorough benchmark for assessing DDoS attack classification models. The general statistics and detailed information of the dataset are presented in Table A4 and Table A5 of Appendix A. To leverage the data effectively, we developed a structured framework for data processing, feature engineering, and model optimization. We present the approach used to execute, evaluate, and optimize the ML-based DDoS classification models. This section is divided into six subsections: Machine Learning Pipeline for DDoS Detection, Data Preprocessing, Data Splitting, Model Training and Mitigation, Model Validation and Evaluation, and Model Testing and Evaluation.

3.1. Machine Learning Pipeline for DDoS Detection

The proposed architecture comprises modules for data preprocessing, data splitting, model training, validation and testing, as well as model evaluation. The overall structure is illustrated in Appendix A Figure A1.

The data processing and modeling pipeline consists of four phases: data preprocessing, data splitting, model training/validation/testing, and model evaluation. First, the CIC-DDoS2019 dataset is loaded, labeled (flows_ok as 0 and flows_ddos as 1), balanced through downsampling, cleaned by replacing invalid values, and reduced via feature selection. Next, the data is split into features (the first 82 columns) and labels (the last column), then divided into training and validation sets with a 67:33 ratio. In the third phase, three machine learning models—SVM, Logistic Regression (LR), and K-Nearest Neighbors (KNN)—are trained using GridSearchCV for hyperparameter optimization and evaluated using the validation and test sets. Finally, the models are assessed using accuracy, precision, recall, F1-score, and a confusion matrix to determine their effectiveness in detecting DDoS attacks.

3.2. Data Preprocessing

This subsection describes the data preprocessing workflow. Figure 1 shows the data preprocessing framework for DDoS detection. At the beginning of the procedure, the raw data from the CIC-DDoS2019 dataset is loaded. The dataset consists of two folders, 01–12 and 03–11 (excluding the UDP and UDPLag files), which contain numerous .csv files with normal and malicious traffic.

Global temporary lists collect benign and attack flows from CSV chunks, with a multiplier parameter set to 5 to balance their quantities. The system reads each CSV file in chunks and identifies flows via the label field. If the number of attack flows exceeds five times the number of benign flows, the attack flows are downsampled to five times the number of benign flows. Otherwise, all attack flows are kept unchanged. This ensures a balanced dataset to support effective DDoS detection.
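The chunk-wise balancing step can be sketched as follows. This is a minimal illustration rather than the authors' script: it assumes that attack flows are the class being capped (the text keeps all attack flows in the "otherwise" case), and the column name "Label", the value "BENIGN", and the helper names balance_chunk and balance_chunks are hypothetical.

```python
import pandas as pd

MULTIPLIER = 5  # class-ratio cap stated in the text


def balance_chunk(chunk, label_col="Label", benign_value="BENIGN",
                  multiplier=MULTIPLIER, seed=0):
    """Split one chunk into benign and attack flows, capping the attack flows.

    label_col and benign_value are assumed names, not confirmed by the paper.
    """
    is_benign = chunk[label_col] == benign_value
    benign, attack = chunk[is_benign], chunk[~is_benign]
    cap = multiplier * len(benign)
    if len(attack) > cap:
        # Randomly downsample attack flows to multiplier x benign flows.
        attack = attack.sample(n=cap, random_state=seed)
    return benign, attack


def balance_chunks(chunks, **kwargs):
    """Collect flows from an iterable of DataFrame chunks into one DataFrame."""
    flows_ok, flows_ddos = [], []
    for chunk in chunks:
        benign, attack = balance_chunk(chunk, **kwargs)
        flows_ok.append(benign)
        flows_ddos.append(attack)
    return pd.concat(flows_ok + flows_ddos, ignore_index=True)


# Toy chunk: 10 benign flows and 100 attack flows.
demo = pd.DataFrame({
    "Label": ["BENIGN"] * 10 + ["Syn"] * 100,
    "Flow Duration": range(110),
})
balanced = balance_chunks([demo])
print(balanced["Label"].value_counts().to_dict())
```

With real data, the chunks would typically come from pandas.read_csv called with a chunksize argument, so that large files never have to fit in memory at once.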

Both attack and benign flows are appended to global lists, which are then concatenated into a single DataFrame. Subsequent steps include data cleaning, such as replacing NA/NaN and infinite values with 0, and type conversion—for example, converting the “Benign” label to 0 and all others to 1, and transforming timestamps into integer values using UTF-8 encoding followed by MD5 hashing. Finally, feature selection is performed by removing irrelevant attributes that do not contribute to DDoS detection, including “Source IP”, “Destination IP”, “Flow ID”, “Similar HTTP”, and “Unnamed: 0”.
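The cleaning and type-conversion steps above might look like the sketch below. The dropped column names are taken from the text, but their exact spelling in the raw CSV files, the "Label" and "Timestamp" column names, and the helper names timestamp_to_int and clean_flows are assumptions made for illustration.

```python
import hashlib

import numpy as np
import pandas as pd

# Names as written in the text; the spelling in the raw files is an assumption.
DROP_COLUMNS = ["Source IP", "Destination IP", "Flow ID", "Similar HTTP", "Unnamed: 0"]


def timestamp_to_int(ts):
    """Encode the timestamp as UTF-8, hash it with MD5, read the digest as an integer."""
    return int(hashlib.md5(str(ts).encode("utf-8")).hexdigest(), 16)


def clean_flows(df, label_col="Label", ts_col="Timestamp", benign_value="Benign"):
    """Apply the cleaning, type-conversion, and column-removal steps to one DataFrame."""
    # Replace infinite and NA/NaN values with 0.
    df = df.replace([np.inf, -np.inf], np.nan).fillna(0)
    # Map the benign label to 0 and every other label to 1 (case-insensitive).
    is_attack = df[label_col].astype(str).str.lower() != benign_value.lower()
    df[label_col] = is_attack.astype(int)
    # MD5 digests are 128-bit, so they are stored as Python integers (object dtype).
    hashed = [timestamp_to_int(ts) for ts in df[ts_col]]
    df[ts_col] = pd.Series(hashed, index=df.index, dtype=object)
    # Drop identifier-like columns; names that are absent are ignored.
    return df.drop(columns=DROP_COLUMNS, errors="ignore")


raw = pd.DataFrame({
    "Flow ID": ["a", "b"],
    "Source IP": ["10.0.0.1", "10.0.0.2"],
    "Timestamp": ["2018-12-01 10:51:39", "2018-12-01 10:51:40"],
    "Flow Bytes/s": [np.inf, 12.5],
    "Flow Packets/s": [np.nan, 3.0],
    "Label": ["Benign", "Syn"],
})
cleaned = clean_flows(raw)
print(cleaned.columns.tolist())
```

Because a hash carries no ordering information, the hashed timestamp acts as an opaque identifier rather than a time value; whether it helps a given model is left open here.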

3.3. Data Splitting

Figure 2 presents the data splitting framework used in this research. The processed dataset, consisting of 83 columns, was first loaded into the system. Feature extraction was then performed, where the first 82 columns served as input features and the last column represented the label: 1 for DDoS attacks and 0 for benign traffic. The data was split into a feature set X (input features) and a label set y (output labels).

After extraction, these feature and label sets were further separated into training and validation sets, with 67% training (X_train, y_train) and 33% validation data (X_val, y_val). This division ensured that the model was trained on enough data to learn the pattern of attacks. The training set was used to fit the ML models, while the validation set was utilized to assess their performance. This splitting strategy is crucial for evaluating the generalization capability of the models and preventing overfitting in the DDoS detection process.
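A minimal sketch of this split with scikit-learn's train_test_split is shown below. The synthetic data stands in for the processed dataset, and the fixed random_state and the stratify option are assumptions that the text does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed dataset: 82 feature columns plus one label column.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((600, 82)))
data["Label"] = rng.integers(0, 2, size=600)

X = data.iloc[:, :82]  # first 82 columns are the input features
y = data.iloc[:, -1]   # last column is the label (1 = DDoS, 0 = benign)

# 67% training, 33% validation; random_state and stratify are illustrative choices.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)
print(X_train.shape, X_val.shape)
```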

3.4. Model Training and Mitigation

Figure 3 illustrates the systematic process of ML model development, from the training phase to the evaluation and mitigation phase. The pipeline begins with the separation of training and testing datasets (X_train, y_train, X_test, y_test), followed by model fitting using multiple ML algorithms (SVM, Logistic Regression (LR), and KNN).

A critical component is the hyperparameter tuning cycle, implemented through GridSearch methodology with 5-fold cross-validation (with 4 folds used for training and 1 fold for validation).

After the model is trained, performance metrics are calculated to evaluate it. If the hyperparameter combination is not optimal, the process is repeated until the best-performing model is identified. Once the optimal hyperparameters are determined, the combination is saved to the best_params variable. The final model is then evaluated on the testing dataset, and the metrics are calculated. This iterative process ensures that all three models have the best hyperparameters and performance metrics for more effective DDoS attack detection.
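The tuning cycle with 5-fold cross-validation can be illustrated with scikit-learn's standard GridSearchCV. This sketch shows only the general exhaustive search, not the adaptive variant proposed in the paper; the parameter grids, the synthetic data, and the search_spaces name are assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Small synthetic binary classification problem standing in for the training set.
X_train, y_train = make_classification(n_samples=300, n_features=10, random_state=0)

# Illustrative grids only; the paper's actual search spaces are not reproduced here.
search_spaces = {
    "SVM": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "LR": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
}

best_params = {}
for name, (estimator, grid) in search_spaces.items():
    # cv=5: each candidate is trained on 4 folds and validated on the remaining fold.
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    best_params[name] = search.best_params_
    print(name, search.best_params_, round(search.best_score_, 4))
```

By default GridSearchCV refits the estimator on the full training data with the best combination, so search.best_estimator_ can be used directly for the later validation and testing steps.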

3.5. Model Validation and Evaluation

Figure 4 shows the process of model validation. After the data is split, the validation data is used as input to the models, which have been tuned by the Adaptive GridSearchCV method. The models predict labels for the validation set (y_val_pred), and the predicted labels are compared with the actual labels (y_val). The evaluation metrics are then computed to assess each model’s effectiveness in detecting DDoS attacks.

3.6. Model Testing and Evaluation

Figure 4 shows the process of model testing and evaluation. After the models are validated, the testing set is used as input to the models, which have been tuned by the Adaptive GridSearchCV method. The models generate predictions for the testing set (y_test_pred), and the predicted labels are compared with the actual labels (y_test) to estimate each model’s performance. The evaluation metrics are presented as a figure, which helps us assess the model’s effectiveness.

4. Experiment

In this section, we outline requirements for implementing the proposed framework, including hardware and software requirements, third-party software tools, and custom-developed scripts.

This research utilized a system powered by a 13th Gen Intel Core i7-13700 processor, 32 GB of RAM, and a SAMSUNG solid-state drive for efficient data handling. An NVIDIA GeForce RTX 4060 GPU was employed to accelerate model training and inference, especially for large-scale DDoS detection tasks. These hardware components were chosen to ensure optimal performance during the training and evaluation of the proposed detection models.

The third-party software used in our experimental implementation is described below. The development environment was built upon VSCode 1.97.2, which served as the primary IDE, complemented by Jupyter Notebook 7.4.2 for interactive code development and visualization. For data processing and analysis, our implementation leveraged NumPy for scientific computing operations and Pandas for efficient data manipulation and preprocessing tasks. Details regarding the hardware and third-party software requirements are provided in Appendix A Table A1 and Table A2, respectively.

The ML models were implemented utilizing multiple libraries from the scikit-learn ecosystem, including specialized modules for Support Vector Machine (sklearn.svm), Logistic Regression (sklearn.linear_model), and K-Nearest Neighbors (sklearn.neighbors) algorithms. Additional scikit-learn modules were employed for data preprocessing (sklearn.preprocessing), model selection (sklearn.model_selection), and performance evaluation (sklearn.metrics).

System resource monitoring was facilitated by Psutil, while Tqdm was used to track processing progress during computationally intensive operations. This third-party software configuration enabled efficient implementation of the proposed framework, providing the necessary tools for data processing, model development, performance evaluation, and model optimization.

The software developed for the implementation of the proposed framework is listed in Appendix A Table A3. Two Jupyter Notebooks were created to implement the DDoS detection models, covering everything from data preprocessing and cleaning to model training, evaluation, and mitigation. The first, Data Preprocessing and Cleaning.ipynb, focuses on the crucial initial step of data preprocessing and cleaning for the CIC-DDoS2019 dataset. Using the Pandas, NumPy, and Scikit-learn libraries, this notebook addresses the challenges posed by the dataset’s scale and class imbalance. The second, Model, Training, and mitigation.ipynb, concentrates on the model training, evaluation, and mitigation phases. This notebook leverages the Scikit-learn library to implement the SVM, LR, and KNN models for DDoS detection.

5. Results and Discussion

This section presents the results of our experimental study and discusses the performance of the Machine Learning (ML) models. It is organized as follows. First, we introduce the evaluation metrics used to assess the models. Then we present the results of the models on the training and validation sets, as well as on the testing set (with two different optimization methods). Finally, we compare the performance and efficiency of the models under the different optimization methods on the testing set.

5.1. Evaluation Metrics

We first present the evaluation metrics employed to rigorously assess the performance of the machine learning models. To establish a comprehensive evaluation framework, we utilize five key metrics—Accuracy, Precision, Recall, F1-score, and the Confusion Matrix—each quantifying distinct dimensions of classification performance. These metrics provide complementary perspectives on model behavior, including overall correctness, sensitivity to positive cases, and the trade-off between false positives and false negatives. When considered collectively, they enable a holistic and robust assessment of the models’ capability to accurately distinguish DDoS and normal network traffic, thereby ensuring the reliability and effectiveness of the proposed detection system.
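The four scalar metrics can all be derived from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative,
    and true-negative counts for the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the experiments.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Precision penalizes false positives and recall penalizes false negatives, so a model can score high on one and lower on the other; the F1-score, their harmonic mean, is high only when both are high.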

5.2. Result and Analysis

This subsection presents the experimental results and analyzes the performance of three ML models: SVM, LR, and KNN. We first report the training and validation performance of the models optimized with Adaptive GridSearchCV, and then their performance on the testing set. Finally, we compare the accuracy and efficiency of the two optimization methods on the testing set. The results are presented in figures and tables, each accompanied by an analysis of the findings.

5.2.1. The Performance on the Training and Validation Sets

The training and validation accuracy of the SVM model, optimized using Adaptive GridSearchCV, is shown in Table 1. As illustrated in the training output, the SVM model attains 0.999726 training accuracy while maintaining 0.999767 validation accuracy. The close alignment between these two metrics—where validation accuracy slightly exceeds training accuracy—is a positive indicator that the model generalizes well to unseen data rather than merely memorizing training examples. This indicates that it has successfully learned the underlying patterns in the training data without overfitting to noise or specific instances. Also presented in Table 1 are the results for the KNN model, which was similarly optimized using Adaptive GridSearchCV.

The optimized KNN model attains a training accuracy of 1.000000 and a validation accuracy of 0.999680. Although the model fits the training data perfectly, its validation accuracy is only 0.000320 lower, so there is no sign of substantial overfitting to the training set. The consistently high performance on both sets suggests that the model is robust and generalizes well to unseen data.

Finally, the table also includes accuracy results for the LR model, which was optimized using Adaptive GridSearchCV. The optimized LR model reaches a 0.990638 training accuracy and 0.986496 validation accuracy. It also demonstrates that the model performs well on the training data, and still generalizes effectively to unseen data like the validation set. The difference between training and validation accuracy is 0.004142, which is relatively small, suggesting that the model is not overfitting to the training data. This shows that the LR model has learned the underlying patterns in the training data and can maintain a good level of predictive accuracy when applied to new, unseen data (such as a testing set).

5.2.2. The Performance on the Testing Set

This subsection is divided into three parts. The first reports the testing-set performance of the models optimized with standard GridSearchCV, the second reports the testing-set performance of the models optimized with Adaptive GridSearchCV, and the third compares the accuracy and efficiency of the two optimization methods.

The results of the SVM model evaluated on the testing set using general GridSearchCV are summarized in Table 2. With the optimized hyperparameters, the classification report shows precision, recall, and F1-score values of 1.00 (rounded to two decimal places) for both classes (0 and 1), where class 0 contains 49,763 instances and class 1 contains 25,620 instances. The overall accuracy is 0.998647, and the macro and weighted averages are also 1.00 across all metrics. The corresponding confusion matrix, [49,666, 97] and [5, 25,615], shows that the model correctly classifies 49,666 instances of class 0 and 25,615 instances of class 1, with only 97 false positives and 5 false negatives out of 75,383 total instances.

Similarly, the Logistic Regression (LR) model optimized with general GridSearchCV demonstrates strong performance. The classification report indicates precision, recall, and F1-score values of 1.00 for class 0 (49,763 instances) and a precision of 0.99, recall of 1.00, and F1-score of 1.00 for class 1 (values rounded to two decimal places). The model correctly classifies 49,526 instances of class 0 and 25,618 instances of class 1, giving an overall accuracy of 0.996830, with macro and weighted averages again at 1.00. According to the confusion matrix, there are 237 false positives and 2 false negatives.

The KNN model also exhibits excellent classification capability on the same testing set. Both class 0 (49,763 instances) and class 1 (25,620 instances) reach precision, recall, and F1-score values of 1.00 when rounded to two decimal places. The overall accuracy of the KNN model is 0.998395, with macro and weighted averages also at 1.00 across all metrics. Misclassifications are minimal: 119 false positives and 2 false negatives out of 75,383 total instances.

Table 3 reports the performance of the three models on the testing set when optimized with Adaptive GridSearchCV. For the SVM model, the classification report shows a precision, recall, and F1-score of 1.00 for both classes (0 and 1), and the macro and weighted averages also reach 1.00 across all metrics. The overall accuracy of the SVM model is 0.998700, and the confusion matrix, [49,669, 94] and [4, 25,616], shows only 94 false positives and 4 false negatives out of 49,763 class 0 instances and 25,620 class 1 instances (75,383 instances in total).

For the LR model, the classification report indicates a precision, recall, and F1-score of 1.00 for class 0, and a precision of 0.99, recall of 1.00, and F1-score of 1.00 for class 1. The overall testing accuracy is 0.996830, with macro and weighted averages also reaching 1.00 across all metrics. The confusion matrix, [49,526, 237] and [2, 25,618], shows only 237 false positives and 2 false negatives out of 75,383 total instances.

For the KNN model, the classification report shows a precision, recall, and F1-score of 1.00 for both classes (0 and 1). The overall testing accuracy is 0.998395, with macro and weighted averages also reaching 1.00 across all metrics. The confusion matrix, [49,644, 119] and [2, 25,618], shows only 119 false positives and 2 false negatives out of 75,383 total instances.
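The accuracies quoted above follow directly from the reported confusion matrices. The short check below recomputes them, assuming each matrix is laid out as [[TN, FP], [FN, TP]] with class 1 as the positive class, which matches how the text counts false positives and false negatives; the helper name `summarize` is hypothetical.

```python
# Confusion matrices and accuracies as reported in Tables 2 and 3.
# Assumed layout: [[TN, FP], [FN, TP]] with class 1 as the positive class.
reported = {
    "SVM (GridSearchCV)": ([[49666, 97], [5, 25615]], 0.998647),
    "SVM (Adaptive GridSearchCV)": ([[49669, 94], [4, 25616]], 0.998700),
    "KNN (both methods)": ([[49644, 119], [2, 25618]], 0.998395),
    "LR (both methods)": ([[49526, 237], [2, 25618]], 0.996830),
}


def summarize(cm):
    """Return total count, accuracy, and class-1 precision/recall."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "total": total,
        "accuracy": (tn + tp) / total,
        "precision_1": tp / (tp + fp),
        "recall_1": tp / (tp + fn),
    }


results = {name: summarize(cm) for name, (cm, _) in reported.items()}
for name, (_, stated) in reported.items():
    r = results[name]
    print(f"{name}: accuracy={r['accuracy']:.6f} (stated {stated:.6f}), "
          f"class-1 precision={r['precision_1']:.4f}")
```

The same calculation makes clear that the 1.00 entries in the classification reports are rounded values: with false positives present, the class 1 precision is slightly below 1 for every model.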

Table 4 shows that the SVM model optimized by our method achieves a testing accuracy of 0.998700 with an execution time of 3949.99 s, whereas general GridSearchCV attains a testing accuracy of 0.998647 with an execution time of 5518.38 s. The GridSearchCV result from another paper [36] reports a testing accuracy of 0.95 (execution time not reported). Our method therefore reduces the execution time by 1568.39 s (about 28%) while slightly improving the testing accuracy, by 0.000053.

Table 5 compares the performance of the LR model optimized by general GridSearchCV and by Adaptive GridSearchCV. The LR model optimized by our method attains a testing accuracy of 0.996830 with an execution time of 16.90 s, while general GridSearchCV achieves the same testing accuracy of 0.996830 with an execution time of 21.88 s. The GridSearchCV result from another paper [36] reaches a testing accuracy of 0.94 (execution time not reported). Our method thus reduces the execution time by 4.98 s (about 22.8%) while maintaining the same testing accuracy as the general method. Compared with the result from the other paper, our method’s testing accuracy is higher by 0.056830 (about 6%).

Table 6 shows that the KNN model optimized by our method achieves a testing accuracy of 0.998395 with an execution time of 2388.89 s, while general GridSearchCV achieves the same testing accuracy of 0.998395 with an execution time of 6455.22 s. The GridSearchCV result from another paper [37] attains a testing accuracy of 0.9461 (execution time not reported). Our method therefore reduces the execution time by 4066.33 s (about 63%) while maintaining the same testing accuracy. Compared with the result from the other paper, our method’s testing accuracy is higher by 0.052295 (about 5.5%). This indicates that our method is more efficient than the general method while preserving testing accuracy.
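The time savings quoted for Tables 4–6 are simple ratios of the reported execution times. The following sketch reproduces that arithmetic; the helper name `time_reduction` is hypothetical, and the speed-up factor is an additional derived quantity not stated in the tables.

```python
# Execution times in seconds from Tables 4-6: (general GridSearchCV, adaptive).
timings = {
    "SVM": (5518.38, 3949.99),
    "LR": (21.88, 16.90),
    "KNN": (6455.22, 2388.89),
}


def time_reduction(general, adaptive):
    """Return seconds saved, percentage reduction, and speed-up factor."""
    saved = general - adaptive
    return saved, 100.0 * saved / general, general / adaptive


reductions = {model: time_reduction(*pair) for model, pair in timings.items()}
for model, (saved, percent, speedup) in reductions.items():
    print(f"{model}: saved {saved:.2f} s ({percent:.1f}% less, {speedup:.2f}x faster)")
```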

6. Conclusions and Future Work

Considering the increasing complexity and frequency of Distributed Denial-of-Service (DDoS) attacks, this research has presented an enhanced methodology for improving the detection capabilities of machine learning (ML) models through optimized hyperparameter tuning. Specifically, the study proposed an adaptive GridSearchCV technique that strategically refines the hyperparameter search space, offering a more efficient alternative to traditional exhaustive grid search methods. This adaptive approach was empirically validated using the CIC-DDoS2019 dataset, where it achieved comparable or superior classification accuracy while significantly reducing computational overhead. For instance, the SVM model achieved a 99.87% testing accuracy with a 28% reduction in execution time, the Logistic Regression model maintained 99.68% accuracy with a 22.8% improvement in efficiency, and the KNN model preserved 99.84% accuracy while reducing computation time by 63%. These results underscore the potential of adaptive hyperparameter tuning to simultaneously enhance both the effectiveness and efficiency of DDoS detection systems.
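To make the idea of refining the search space concrete, the sketch below shows one possible coarse-to-fine scheme built on scikit-learn’s GridSearchCV: each round searches a small log-spaced grid and then narrows the range around the best value found. This is an illustrative assumption, not the exact procedure used in this study; the helper `coarse_to_fine_search`, the initial gamma range, the halving rule, the fixed `C`, and the synthetic data are all hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the study itself uses CIC-DDoS2019.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)


def coarse_to_fine_search(X, y, rounds=2, points=3, cv=3):
    """Hypothetical helper: narrow a log-spaced gamma grid around the best value."""
    low, high = -3.0, 1.0  # log10 bounds, i.e. gamma in [1e-3, 1e1] (assumed)
    search, n_fits = None, 0
    for _ in range(rounds):
        grid = {"gamma": np.logspace(low, high, points)}
        search = GridSearchCV(SVC(kernel="rbf", C=1.0), grid, cv=cv)
        search.fit(X, y)
        n_fits += points * cv  # cross-validation fits only, excluding the refit
        center = np.log10(search.best_params_["gamma"])
        half_width = (high - low) / 4  # next window is half as wide
        low, high = center - half_width, center + half_width
    return search, n_fits


best_search, n_fits = coarse_to_fine_search(X, y)
print(best_search.best_params_, round(best_search.best_score_, 4), n_fits)
```

Whether a scheme of this kind saves time depends on the number of rounds and grid points compared with an exhaustive grid at the same final resolution, and on how many hyperparameters are tuned jointly.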

While the proposed method demonstrates high accuracy in detecting DDoS attacks under offline conditions, it is not yet suitable for real-time deployment due to its reliance on batch processing and static model training. The current models are trained on a fixed dataset and lack incremental learning capabilities, limiting their responsiveness to dynamic traffic patterns and sudden shifts in attack behavior. Furthermore, since the models are optimized based on known attack patterns present in the CIC-DDoS2019 dataset, their ability to detect novel or evolving DDoS variants remains constrained. To address these limitations, future work will focus on integrating online learning frameworks to enable real-time model updates and responsiveness. In addition, transfer learning techniques will be explored to enhance the adaptability and generalization of the models across diverse network environments and previously unseen attack types.

Author Contributions

Conceptualization, S.-R.C., S.-J.C. and W.-B.H.; methodology, S.-R.C., S.-J.C. and W.-B.H.; software, S.-R.C.; validation, S.-R.C., S.-J.C. and W.-B.H.; formal analysis, S.-R.C.; investigation, S.-R.C.; resources, S.-R.C.; data curation, S.-R.C.; writing—original draft preparation, S.-R.C., S.-J.C. and W.-B.H.; writing—review and editing, S.-R.C., S.-J.C. and W.-B.H.; visualization, S.-R.C., S.-J.C. and W.-B.H.; supervision, S.-J.C. and W.-B.H.; project administration, S.-J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Acknowledgments

The authors would like to thank GPT-4o for providing valuable assistance in identifying typographical and grammatical errors during the preparation of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AI	Artificial Intelligence
ML	Machine Learning
DDoS	Distributed Denial of Service
SVM	Support Vector Machine
LR	Logistic Regression
KNN	K-Nearest Neighbors
IDS	Intrusion Detection System
GB	Gradient Boosting
XGBoost	Extreme Gradient Boosting
MLP	Multi-Layer Perceptron
SMOTE	Synthetic Minority Oversampling Technique
GRU	Gated Recurrent Unit
NTM	Neural Turing Machine
SDN	Software-Defined Networking
GAN	Generative Adversarial Network
AIMM	Artificial Intelligence Merged Methods
ANN	Artificial Neural Network
RF	Random Forest
CV	Cross Validation (GridSearchCV)
UTF-8	Unicode Transformation Format—8-bit
MD5	Message Digest Algorithm 5
ETC	Extra Trees Classifier
SHAP	SHapley Additive exPlanations
SFM	SelectFromModel
SGD	Stochastic Gradient Descent
DNN	Deep Neural Network
DAES	Deep Autoencoders
InSDN	Intelligent SDN Dataset
CIC-DDoS2019	CIC DDoS 2019 Dataset
CICIDS2017	CIC IDS 2017 Dataset
AIoT	Artificial Intelligence of Things
SD-AIoT	Software-Defined AIoT

Appendix A

Figure A1. System Architecture of Enhancing ML-Based DDoS Detection.

Table A1. Hardware Requirements.

Name	Description
CPU	13th Gen Intel(R) Core(TM) i7-13700
Memory	32 GB
Disk	SAMSUNG MZVL2512HCJQ-00B00
GPU	NVIDIA GeForce RTX 4060

Table A2. Third-Party Software Requirements.

Name	Description	Version	License
Visual Studio Code	Integrated Development Environment	1.97.2	MIT License
Jupyter Notebook	Interactive Computing Environment	7.4.2	BSD License
Python	Programming Language	3.12.0	PSF License
NumPy	Scientific Computing Library	2.0.2	BSD License
Pandas	Data Manipulation Library	2.2.3	BSD License
Keras	Deep Learning Framework	3.7.0	MIT License
Psutil	Used to retrieve information on running processes and system utilization	6.1.1	BSD License
Scikit-Learn	Machine Learning Library	1.6.1	BSD License
Tqdm	Progress Bar Library	4.67.1	MIT License

Table A3. Software Requirements.

Name	Description
Data Preprocessing and Cleaning.ipynb	Loading and Merging Dataset, Data cleaning, resampling, converting, normalizing, and Splitting
Model Training, Evaluation, and Mitigation.ipynb	Executing model training, evaluating the performance on the validation set and testing set, and reinforcing the capability of ML models

Table A4. General Statistics of the CIC-DDoS2019 Dataset.

Parameter Name	Total #
Total number of records	12,794,627
Total number of features	82
Total number of labels	1
Total number of normal records	6,398,925
Total number of attack records	6,395,702
% of normal records	50.02%
% of attack records	49.98%

Table A5. Attack Class Summary in CIC-DDoS2019.

Category	Attack Type	Description
DDoS (Volumetric)	DDoS-UDP	UDP flood attack that overwhelms targets by sending large numbers of UDP packets.
	DDoS-ICMP	ICMP flood using pings to exhaust bandwidth and processing.
	DDoS-TCP	SYN floods or connection requests that exhaust resources.
DDoS (Application)	DDoS-HTTP	HTTP GET/POST floods targeting web servers.
	DDoS-HTTPS	SSL-encrypted HTTP floods, harder to detect due to encryption.
	DDoS-DNS	DNS amplification attacks.
	DDoS-SMTP	Attacks that exploit mail servers with spam floods or malformed SMTP requests.
DDoS (Reflection)	DrDoS-DNS	Amplified DNS responses from spoofed IPs to flood targets.
	DrDoS-NTP	Exploits NTP servers to amplify traffic.
	DrDoS-SSDP	SSDP (UPnP) reflection for amplification.
	DrDoS-SNMP	Uses SNMP devices for reflected flood.
	DrDoS-MSSQL	MSSQL reflection/amplification.
Benign	Benign Traffic	Normal user-generated traffic with no attacks.

References

1. Sarker, I.H.; Kayes, A.S.M.; Watters, P. Cybersecurity data science: An overview from machine learning perspective. J. Big Data 2020, 7, 41.
2. Abhishta, A.; van Eeten, M.; Nieuwenhuis, L.J.M. Why would we get attacked? An analysis of attacker’s aims behind DDoS attacks. J. Wirel. Mob. Netw. Ubiquitous Comput. Dependable Appl. 2020, 11, 3–22.
3. Abdullahi, M.; Arshad, S.Z.; Alzahrani, B.A.; Nam, Y. Detecting cybersecurity attacks in Internet of Things using artificial intelligence methods: A systematic literature review. Electronics 2022, 11, 198.
4. Mijwil, M.M.; Salem, I.E.; Ismaeel, M.M. The significance of machine learning and deep learning techniques in cybersecurity: A comprehensive review. Iraqi J. Comput. Sci. Math. 2023, 4, 10.
5. Peneti, S.; Hemalatha, E. DDoS attack identification using machine learning techniques. In Proceedings of the 2021 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 27–29 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
6. Al-Eryani, A.M.; Hossny, E.; Omara, F.A. Efficient machine learning algorithms for DDoS attack detection. In Proceedings of the 2024 6th International Conference on Computing and Informatics (ICCI), Cairo, Egypt, 25–27 February 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 174–181.
7. Alfatemi, A.; Rahouti, M.; Amin, R.; ALJamal, S.; Xiong, K.; Xin, Y. Advancing DDoS attack detection: A synergistic approach using deep residual neural networks and synthetic oversampling. arXiv 2024, arXiv:2401.03116.
8. Panggabean, C.; Venkatachalam, C.; Shah, P.; John, S.; Renuka Devi, P.; Venkatachalam, S. Intelligent DoS and DDoS Detection: A Hybrid GRU-NTM Approach to Network Security. In Proceedings of the 2024 5th International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 18–20 September 2024; pp. 658–665.
9. Haque, M.E.; Hossain, A.; Alam, M.S.; Siam, A.H.; Rabbi, S.M.F.; Rahman, M.M. Optimizing DDoS Detection in SDNs Through Machine Learning Models. In Proceedings of the 2024 IEEE 16th International Conference on Computational Intelligence and Communication Networks (CICN), Indore, India, 22–23 December 2024; pp. 426–431.
10. de Melo, L.H.; Bertoli, G.d.C.; Nogueira, M.; dos Santos, A.L.; Pereira Junior, L.A. Anomaly-Flow: A Multi-domain Federated Generative Adversarial Network for Distributed Denial-of-Service Detection. arXiv 2025, arXiv:2503.14618.
11. Khalaf, B.A.; Gao, S.; Abdulsahib, G.M.; Al-Jumeily, D.; Baker, T. Comprehensive review of artificial intelligence and statistical approaches in distributed denial of service attack and defense methods. IEEE Access 2019, 7, 51691–51713.
12. Frank, J. Artificial intelligence and intrusion detection: Current and future directions. In Proceedings of the 17th National Computer Security Conference, Baltimore, MD, USA, 11–14 October 1994; Volume 10, pp. 1–12.
13. Ahmad, J.; Latif, S.; Khan, I.U.; Alshehri, M.S.; Khan, M.S.; Alasbali, N.; Jiang, W. An Interpretable Deep Learning Framework for Intrusion Detection in Industrial Internet of Things. Internet Things 2025, 33, 101681.
14. Suhag, A.; Daniel, A. Study of statistical techniques and artificial intelligence methods in distributed denial of service (DDoS) assault and defense. J. Cyber Secur. Technol. 2023, 7, 21–51.
15. Jaszcz, A.; Połap, D. AIMM: Artificial intelligence merged methods for flood DDoS attacks detection. J. King Saud Univ.–Comput. Inf. Sci. 2022, 34, 8090–8101.
16. Mohamed, N. DDoS Attacks Mitigation: A Review of AI-Based Strategies and Techniques. In Proceedings of the 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 10–12 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
17. Saini, P.S.; Behal, S.; Bhatia, S. Detection of DDoS attacks using machine learning algorithms. In Proceedings of the 7th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 12–14 February 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 16–21.
18. Santos, R.; Souza, D.; Santo, W.; Ribeiro, A.; Moreno, E. Machine learning algorithms to detect DDoS attacks in SDN. Concurr. Computat. Pract. Exp. 2020, 32, e5402.
19. Al-Shareeda, M.A.; Manickam, S.; Saare, M.A. DDoS attacks detection using machine learning and deep learning techniques: Analysis and comparison. Bull. Electr. Eng. Inform. 2023, 12, 930–939.
20. Alduailij, M.; Khan, Q.W.; Tahir, M.; Sardaraz, M.; Alduailij, M.; Malik, F. Machine-learning-based DDoS attack detection using mutual information and random forest feature importance method. Symmetry 2022, 14, 1095.
21. Marvi, M.; Arfeen, A.; Uddin, R. A generalized machine learning-based model for the detection of DDoS attacks. Int. J. Netw. Manag. 2021, 31, e2152.
22. Saravanan, S.; Balasubramanian, U.M. An Adaptive Scalable Data Pipeline for Multiclass Attack Classification in Large-Scale IoT Networks. Big Data Min. Anal. 2024, 7, 500–511.
23. Dalvi, J.; Sharma, P.; Pardeshi, K. DDoS Attack Detection Using Artificial Neural Network. In Proceedings of the 2021 International Conference on Industrial Electronics Research and Applications (ICIERA), Nagpur, India, 5–6 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
24. Latha, R.; Thangaraj, S.J.J. Machine Learning Approaches for DDoS Attack Detection: Naive Bayes vs Logistic Regression. In Proceedings of the 2023 Second International Conference on Smart Technologies for Smart Nation (SmartTechCon), Bengaluru, India, 10–11 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1043–1048.
25. Aljuhani, A. Machine Learning Approaches for Combating Distributed Denial of Service Attacks in Modern Networking Environments. IEEE Access 2021, 9, 42236–42264.
26. Saghezchi, F.B.; Mantas, G.; Violas, M.A.; de Oliveira Duarte, A.M.; Rodriguez, J. Machine learning for DDoS attack detection in industry 4.0 CPPSs. Electronics 2022, 11, 602.
27. Cil, A.E.; Yildiz, K.; Buldu, A. Detection of DDoS Attacks with Feedforward-Based Deep Neural Network Model. Expert Syst. Appl. 2021, 169, 114520.
28. Ma, J.; Su, W. Collaborative DDoS Defense for SDN-Based AIoT with Autoencoder-Enhanced Federated Learning. Inf. Fusion 2025, 117, 102820.
29. Sanchez, O.R.; Gomez, J.A.; Serrat, J.; Muntés-Mulero, V. Evaluating ML-Based DDoS Detection with Grid Search Hyperparameter Optimization. In Proceedings of the 2021 IEEE 7th International Conference on Network Softwarization (NetSoft), Tokyo, Japan, 28 June–2 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 402–408.
30. Nisya, H.; Hertiana, S.N.; Purwanto, Y. Implementation of Hyperparameter Tuning Random Forest Algorithm in Machine Learning for SDN Security: An Innovative Exploration of DDoS Attack Detection. In Proceedings of the 2024 International Conference on Artificial Intelligence, Blockchain, Cloud Computing, and Data Analytics (ICoABCD), Jakarta, Indonesia, 8–9 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 321–326.
31. Dasari, S.; Kaluri, R. An Effective Classification of DDoS Attacks in a Distributed Network by Adopting Hierarchical Machine Learning and Hyperparameters Optimization Techniques. IEEE Access 2024, 12, 10834–10845.
32. May, T.M.; Zainudin, Z.; Muslim, N.; Jamil, N.S.; Jan, N.A.M.; Ibrahim, N.; Sabri, N.A.B. Intrusion Detection System (IDS) Classifications Using Hyperparameter Tuning for Machine Learning and Deep Learning. In Proceedings of the 2024 5th International Conference on Artificial Intelligence and Data Sciences (AiDAS), Bangkok, Thailand, 3–4 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 344–349.
33. Alashhab, A.A.; Zahid, M.S.; Isyaku, B.; Elnour, A.A.; Nagmeldin, W.; Abdelmaboud, A.; Abdullah, T.A.A.; Maiwada, U.D. Enhancing DDoS Attack Detection and Mitigation in SDN Using an Ensemble Online Machine Learning Model. IEEE Access 2024, 12, 51630–51649.
34. Gniewkowski, M. An Overview of DoS and DDoS Attack Detection Techniques. In Proceedings of the International Conference on Dependability and Complex Systems, Brunów, Poland, 29 June–3 July 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 233–241.
35. University of New Brunswick Cyber Defence Lab. CIC-DDoS2019 Dataset. Available online: https://www.unb.ca/cic/datasets/ddos-2019.html (accessed on 14 December 2024).
36. Sabbir, M.A. Enhanced Detection and Classification of DDoS Attacks Using Optimized Hybrid Machine Learning Models. 2025. Available online: https://www.researchgate.net/profile/Md-Ariful-Sabbir/publication/388646389_Enhanced_Detection_and_Classification_of_DDoS_Attacks_Using_Optimized_Hybrid_Machine_Learning_Models/links/67a0e1f0645ef274a4623249/Enhanced-Detection-and-Classification-of-DDoS-Attacks-Using-Optimized-Hybrid-Machine-Learning-Models.pdf (accessed on 6 June 2025).
37. Talaei Khoei, T.; Ismail, S.; Kaabouch, N. Boosting-based models with tree-structured Parzen estimator optimization to detect intrusion attacks on smart grid. In Proceedings of the 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 1–4 December 2021; pp. 165–170.

Figure 1. Data Preprocessing Framework for DDoS Detection.

Figure 2. Data Splitting Framework for DDoS Detection.

Figure 3. Iterative Hyperparameter Tuning Methodology for ML Model Optimization.

Figure 4. Framework for Model Validation and Evaluation.

Table 1. The training and validation results are shown for the optimized models: (a) SVM, (b) KNN, and (c) LR.

Metric	Class	SVM (Adaptive)	KNN (Adaptive)	LR (Adaptive)
Training Accuracy		0.999726	1.000000	0.990638
Validation Accuracy		0.999690	0.999680	0.986496
Precision	Class 0	1.00	1.00	0.96
Precision	Class 1	1.00	1.00	1.00
Recall	Class 0	1.00	1.00	1.00
Recall	Class 1	1.00	1.00	0.98
F1-score	Class 0	1.00	1.00	0.96
F1-score	Class 1	1.00	1.00	0.99
Support *	Class 0	18,807	18,807	18,807
Support *	Class 1	90,714	90,714	90,714

* Support is the number of actual occurrences of each class in the ground truth labels of the dataset.

Table 2. The performance on the testing set is shown for the models optimized with standard GridSearchCV: (a) SVM, (b) KNN, and (c) Logistic Regression (LR).

Metric	Class	SVM (GridSearchCV)	KNN (GridSearchCV)	LR (GridSearchCV)
Best Params		C = 1000, γ = 0.01, kernel = rbf	n_neighbors = 3, p = 1, weights = distance	C = 0.001, penalty = l2, solver = lbfgs
CV Accuracy		0.999693	0.999799	0.990380
Testing Accuracy		0.998647	0.998395	0.996830
Precision	Class 0	1.00	1.00	1.00
Precision	Class 1	1.00	1.00	0.99
Recall	Class 0	1.00	1.00	1.00
Recall	Class 1	1.00	1.00	1.00
F1-score	Class 0	1.00	1.00	1.00
F1-score	Class 1	1.00	1.00	1.00
Support	Class 0	49,763	49,763	49,763
Support	Class 1	25,620	25,620	25,620
Confusion matrix		$\begin{bmatrix} 49{,}666 & 97 \\ 5 & 25{,}615 \end{bmatrix}$	$\begin{bmatrix} 49{,}644 & 119 \\ 2 & 25{,}618 \end{bmatrix}$	$\begin{bmatrix} 49{,}526 & 237 \\ 2 & 25{,}618 \end{bmatrix}$

Table 3. The performance on the testing set is shown for the models optimized with Adaptive GridSearchCV: (a) SVM, (b) KNN, and (c) Logistic Regression (LR).

Metric	Class	SVM (Adaptive GridSearchCV)	KNN (Adaptive GridSearchCV)	LR (Adaptive GridSearchCV)
Best Params		C = 1000, γ = 0.03548137, kernel = rbf	n_neighbors = 3, p = 1, weights = distance	C = 0.001, penalty = l2, solver = lbfgs
CV Accuracy		0.999685	0.999799	0.990380
Testing Accuracy		0.998700	0.998395	0.996830
Precision	Class 0	1.00	1.00	1.00
Precision	Class 1	1.00	1.00	0.99
Recall	Class 0	1.00	1.00	1.00
Recall	Class 1	1.00	1.00	1.00
F1-score	Class 0	1.00	1.00	1.00
F1-score	Class 1	1.00	1.00	1.00
Support	Class 0	49,763	49,763	49,763
Support	Class 1	25,620	25,620	25,620
Confusion matrix		$\begin{bmatrix} 49{,}669 & 94 \\ 4 & 25{,}616 \end{bmatrix}$	$\begin{bmatrix} 49{,}644 & 119 \\ 2 & 25{,}618 \end{bmatrix}$	$\begin{bmatrix} 49{,}526 & 237 \\ 2 & 25{,}618 \end{bmatrix}$

Table 4. The Comparison of Performance of SVM Model Optimized by General GridSearchCV and Adaptive GridSearchCV.

Optimized Method	Execution Time	Testing Accuracy
SVM (Adaptive GridSearchCV)	3949.99 s	0.998700
SVM (GridSearchCV)	5518.38 s	0.998647
SVM (GridSearchCV from other paper) [36]	Not reported	0.95

Table 5. The Comparison of Performance of LR Model Optimized by General GridSearchCV and Adaptive GridSearchCV.

Optimized Method	Execution Time	Testing Accuracy
LR (Adaptive GridSearchCV)	16.90 s	0.996830
LR (GridSearchCV)	21.88 s	0.996830
LR (GridSearchCV from other paper) [36]	Not reported	0.94

Table 6. The Comparison of Performance of KNN Model Optimized by General GridSearchCV and Adaptive GridSearchCV.

Optimized Method	Execution Time	Testing Accuracy
KNN (Adaptive GridSearchCV)	2388.89 s	0.998395
KNN (GridSearchCV)	6455.22 s	0.998395
KNN (GridSearchCV from other paper) [37]	Not reported	0.9461

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
