Evaluating the Generalization Gaps of Intrusion Detection Systems Across DoS Attack Variants
Abstract
1. Introduction
- Evaluating two ML models, RF and DNN, on the DoS variants DoS Hulk, Goldeneye, Slowloris, and Slowhttptest, and analyzing their performance.
- Assessing the generalization capability of the two models when trained only on DoS Hulk and benign samples.
- Comparing feature importances using SHAP to understand model behavior.
- Applying UMAP visualizations to corroborate the observed divergence in behavior.
2. Literature Review
3. Methodology
- Data Acquisition: Collect the CICIDS2017 benchmark dataset and identify the relevant subsets: DoS Hulk, DoS Goldeneye, DoS Slowloris, and DoS Slowhttptest.
- Data Labeling: Categorical values are converted using label encoding.
- Missing Data Decision: The data is checked for missing or null values.
- Infinite Data Decision: The data is checked for infinite values.
- Data Cleaning: Rows containing missing or infinite values are deleted.
- Feature Scaling: A standard scaler is used to scale the features.
- Feature Selection: PCA is applied to extract relevant features and reduce the dimensionality of the dataset (a preprocessing sketch follows this list).
- Model Training: ML models are trained on 50k benign samples from Monday traffic and 50k DoS Hulk samples from Wednesday traffic. The algorithms used are RF and DNN, with 5-fold cross-validation (see the training sketch below).
- Performance Evaluation: Accuracy, precision, recall, F1-score, and the confusion matrix are used to evaluate the performance of the system.
- Cross-Variant Analysis: The trained models are tested on datasets containing unseen DoS variants: Goldeneye, Slowloris, and Slowhttptest. Threshold tuning is performed for each test, and the threshold yielding the best accuracy is chosen (a threshold-sweep sketch accompanies the results in Section 4).
- Model Generalizability Analysis: SHAP is used to quantify the importance of features for each of the applied models (see the SHAP sketch below).
- Feature Space Visualization: UMAP is applied to visually depict how the feature space of the different attacks is learned by the models (see the UMAP sketch below).
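
To make the pipeline concrete, the following minimal Python sketch illustrates the preprocessing steps above: label encoding, removal of missing and infinite values, standard scaling, and PCA. The file name, column name, and the retained-variance setting for PCA are illustrative assumptions, not values taken from the paper.

```python
# Minimal preprocessing sketch (file/column names are hypothetical;
# CICIDS2017 CSV layouts vary by release).
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv("cicids2017_wednesday.csv")  # hypothetical file name

# Label encoding of the categorical target (e.g., "BENIGN" / "DoS Hulk").
df["Label"] = LabelEncoder().fit_transform(df["Label"])

# Replace infinite values with NaN, then drop all rows with missing values.
df = df.replace([np.inf, -np.inf], np.nan).dropna()

X, y = df.drop(columns=["Label"]), df["Label"]

# Standardize features before PCA so no single feature dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# PCA for dimensionality reduction; the 95% retained-variance ratio is an assumption.
X_pca = PCA(n_components=0.95).fit_transform(X_scaled)
```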
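A corresponding sketch of the RF training stage uses the hyperparameters reported in Section 4 (100 trees, random state 42, Gini criterion) with 5-fold cross-validation; the exact scikit-learn calls are assumed, and `X_pca`/`y` come from the preprocessing sketch above.

```python
# RF training sketch with 5-fold cross-validation, reusing the hyperparameters
# from the results section (100 trees, random_state=42, Gini criterion).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

rf = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=42)
scores = cross_validate(rf, X_pca, y, cv=5,
                        scoring=["accuracy", "precision", "recall", "f1"])
for metric in ("accuracy", "precision", "recall", "f1"):
    print(metric, scores[f"test_{metric}"].mean())

rf.fit(X_pca, y)  # final fit on the full training set for later analysis
```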
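The SHAP step could then look like the following sketch; the choice of `TreeExplainer` for the RF and the sample size are assumptions, since the paper's exact explainer configuration is not specified here.

```python
# SHAP sketch for the fitted RF model. For binary classifiers, shap_values may
# return one array per class depending on the SHAP version; index the attack
# class if it does.
import shap

explainer = shap.TreeExplainer(rf)          # rf fitted as in the sketch above
sample = X_pca[:1000]                       # a manageable sample of flows
shap_values = explainer.shap_values(sample)
if isinstance(shap_values, list):           # older SHAP returns a per-class list
    shap_values = shap_values[1]            # keep the attack class
shap.summary_plot(shap_values, sample)      # global feature-importance view
```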
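Finally, the UMAP visualization step might be sketched as below; `n_neighbors` and `min_dist` are the library defaults, not values reported in the paper.

```python
# UMAP sketch projecting the scaled features to 2-D for visual comparison of
# benign traffic and the DoS variants; hyperparameters here are assumptions.
import umap

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1,
                      random_state=42).fit_transform(X_scaled)
# embedding has shape (n_samples, 2); when plotting, color points by traffic
# label to compare how benign flows and each DoS variant occupy the space.
```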
3.1. Dataset Overview and Preprocessing
3.2. Model Training and Evaluation
3.3. Cross-Variant Analysis
3.4. Model Generalizability Analysis
3.5. Feature Space Visualization
4. Implementation and Results
4.1. Model Training
4.2. Cross-Variant Evaluation
4.3. Generalizability Assessment
4.4. SHAP-Based Interpretability
4.5. UMAP-Based Feature Space Mapping
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Mandiant. M-Trends 2024: Our View from the Frontlines. 2024. Available online: https://services.google.com/fh/files/misc/m-trends-2024.pdf (accessed on 2 July 2025).
- Aijaz, I.; Idrees, S.M.; Agarwal, P. An Empirical Study on Analysing DDoS Attacks in Cloud Environment. In Advances in Intelligent Computing and Communication: Proceedings of ICAC 2020; Springer: Singapore, 2021; pp. 295–305.
- Bace, R.G.; Mell, P. Intrusion Detection Systems; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2001.
- Sommer, R.; Paxson, V. Outside the closed world: On using machine learning for network intrusion detection. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 16–19 May 2010; IEEE: New York, NY, USA, 2010; pp. 305–316.
- Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A survey of network-based intrusion detection data sets. Comput. Secur. 2019, 86, 147–167.
- Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep learning approach for intelligent intrusion detection system. IEEE Access 2019, 7, 41525–41550.
- Kayode-Ajala, O. Applying Machine Learning Algorithms for Detecting Phishing Websites: Applications of SVM, KNN, Decision Trees, and Random Forests. Int. J. Inf. Cybersecur. 2022, 6, 43–61.
- Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. CICIDS 2017 Dataset; Canadian Institute for Cybersecurity, University of New Brunswick: Fredericton, NB, Canada, 2017. Available online: https://www.unb.ca/cic/datasets/ids-2017.html (accessed on 2 July 2025).
- Garcia, S.; Grill, M.; Stiborek, J.; Zunino, A. An empirical comparison of botnet detection methods. Comput. Secur. 2014, 45, 100–123.
- McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426.
- Bhuyan, M.H.; Kashyap, H.J.; Bhattacharyya, D.K.; Kalita, J.K. Detecting Distributed Denial of Service Attacks: Methods, Tools and Future Directions. Comput. J. 2013, 57, 537–556.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
- Abbas, A.; Khan, M.A.; Latif, S.; Ajaz, M.; Shah, A.A.; Ahmad, J. A New Ensemble-Based Intrusion Detection System for Internet of Things. Arab. J. Sci. Eng. 2022, 47, 1805–1819.
- Berahmand, K.; Heydari, M.; Nabizadeh, A. A hybrid interpretable model for anomaly detection in high-dimensional cybersecurity data. Expert Syst. Appl. 2024, 235, 120215.
- Taher, K.A.; Jisan, B.M.Y.; Rahman, M.M. Network intrusion detection using supervised machine learning technique with feature selection. In Proceedings of the 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 10–12 January 2019; IEEE: New York, NY, USA, 2019; pp. 643–646.
- Dhanabal, L.; Shantharajah, S.P. A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 2015, 4, 446–452.
- Leevy, J.L.; Khoshgoftaar, T.M. A survey and analysis of intrusion detection models based on CSE-CIC-IDS2018 Big Data. J. Big Data 2020, 7, 104.
- Saabith, S.; Thangarajah, V.; Fareez, M. A survey of machine learning techniques for anomaly detection in cybersecurity. Int. J. Res. Eng. Sci. 2023, 11, 183–193.
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
- Javaid, A.; Niyaz, Q.; Sun, W.; Alam, M. A Deep Learning Approach for Network Intrusion Detection System. In Proceedings of the 9th EAI International Conference on Bio-Inspired Information and Communications Technologies (BICT), New York, NY, USA, 3–5 December 2015; pp. 21–26.
- Yin, C.; Zhu, Y.; Fei, J.; He, X. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 2017, 5, 21954–21961.
- Al-Turaiki, I.; Altwaijry, N. A convolutional neural network for improved anomaly- and attack-type classification in network intrusion detection systems. Big Data 2021, 9, 233–252.
- Apruzzese, G.; Pajola, L.; Conti, M. The cross-evaluation of machine learning-based network intrusion detection systems. IEEE Trans. Netw. Service Manag. 2022, 19, 5152–5169.
- Layeghy, S.; Portmann, M. Explainable Cross-domain Evaluation of ML-based Network Intrusion Detection Systems. Comput. Electr. Eng. 2022, 100, 108692.
- Alsaffar, A.M.; Nouri-Baygi, M.; Zolbanin, H.M. Shielding networks: Enhancing intrusion detection with hybrid feature selection and stack ensemble learning. J. Big Data 2024, 11, 133.
- Cantone, M.; Marrocco, C.; Bria, A. Machine learning in network intrusion detection: A cross-dataset generalization study. arXiv 2024, arXiv:2402.10974.
- Korniszuk, K.; Sawicki, B. Autoencoder-Based Anomaly Detection in Network Traffic. In Proceedings of the 25th International Conference on Computational Problems of Electrical Engineering (CPEE), Stronie Śląskie, Poland, 10–13 September 2024; pp. 1–4.
| Research Work (Author, Year) | IDS | Single Attack | Cross-Dataset | Cross-Variant | SHAP | UMAP |
|---|---|---|---|---|---|---|
| Javaid et al. (2016) [20] | Yes | No | No | No | No | No |
| Sharafaldin, Lashkari and Ghorbani (2017) [8] | Yes | No | No | No | No | No |
| Yin et al. (2017) [21] | Yes | No | No | No | No | No |
| Al-Turaiki et al. (2021) [22] | Yes | No | No | No | No | No |
| Apruzzese et al. (2022) [23] | Yes | No | Yes | Yes | No | No |
| Layeghy and Portmann (2022) [24] | Yes | No | Yes | No | Yes | No |
| Alsaffar, Nouri-Baygi and Zolbanin (2024) [25] | Yes | No | No | No | No | No |
| Cantone, Marrocco and Bria (2024) [26] | Yes | No | Yes | No | No | Yes (visualization) |
| Korniszuk and Sawicki (2024) [27] | Yes | No | No | No | No | No |
| This research work | Yes | Yes | No | Yes | Yes | Yes |
| Hyperparameter | Value |
|---|---|
| Number of trees | 100 |
| Random state | 42 |
| Criterion | Gini impurity |
| Layer | Neurons | Activation | Dropout |
|---|---|---|---|
| Input Layer | 64 | ReLU | 0.3 |
| Hidden Layer 1 | 32 | ReLU | 0.2 |
| Hidden Layer 2 | 16 | ReLU | - |
| Output Layer | 1 | Sigmoid | - |
| Hyperparameter | Value |
|---|---|
| Loss function | Binary cross-entropy |
| Optimizer | Adam (lr = 0.001) |
| Batch size | 256 |
| Epochs | 50 |
| Early stopping | Yes (patience = 5) |
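
The two tables above fully specify the network, so it can be written down directly. The following Keras sketch is one possible realization; the use of TensorFlow/Keras itself is an assumption, since the tables do not state which framework was used.

```python
# Keras sketch matching the layer and hyperparameter tables above; the
# framework choice (TensorFlow/Keras) is an assumption.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),    # input layer, 64 neurons
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
early = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
# Assuming X_train / y_train are prepared as in the preprocessing sketch:
# model.fit(X_train, y_train, batch_size=256, epochs=50,
#           validation_split=0.1, callbacks=[early])
```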
| | Predicted Benign | Predicted DoS Hulk |
|---|---|---|
| RF | | |
| Actual Benign | 49,877 | 86 |
| Actual DoS Hulk | 47 | 49,745 |
| DNN | | |
| Actual Benign | 48,793 | 1170 |
| Actual DoS Hulk | 79 | 49,713 |
| Threshold | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| 0.10 | 0.979 | 0.533 | 0.690 | 0.760 |
| 0.20 | 0.987 | 0.446 | 0.615 | 0.720 |
| 0.30 | 0.992 | 0.425 | 0.595 | 0.711 |
| 0.40 | 0.993 | 0.403 | 0.573 | 0.700 |
| 0.50 | 0.994 | 0.345 | 0.512 | 0.668 |
| 0.60 | 0.994 | 0.270 | 0.424 | 0.634 |
| 0.70 | 0.994 | 0.186 | 0.314 | 0.592 |
| 0.80 | 0.993 | 0.124 | 0.220 | 0.561 |
| 0.90 | 0.987 | 0.059 | 0.111 | 0.529 |
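
Tables of this form can be generated by sweeping the decision threshold over the model's predicted attack probabilities and recomputing the metrics at each step, then keeping the threshold with the best accuracy. A minimal sketch follows, assuming a fitted classifier `model` exposing `predict_proba` and a held-out cross-variant test set `X_test`, `y_test` (both names are assumptions).

```python
# Threshold-sweep sketch: evaluate metrics at thresholds 0.1-0.9; the threshold
# with the best accuracy is then selected. `model`, `X_test`, `y_test` assumed.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

probs = model.predict_proba(X_test)[:, 1]   # probability of the attack class
for t in np.arange(0.1, 1.0, 0.1):
    preds = (probs >= t).astype(int)        # binarize at the current threshold
    print(f"{t:.1f}",
          f"precision={precision_score(y_test, preds, zero_division=0):.3f}",
          f"recall={recall_score(y_test, preds, zero_division=0):.3f}",
          f"f1={f1_score(y_test, preds, zero_division=0):.3f}",
          f"accuracy={accuracy_score(y_test, preds):.3f}")
```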
| | Predicted Benign | Predicted DoS Goldeneye | Accuracy |
|---|---|---|---|
| RF with default threshold (0.5) | | | 66.89% |
| Actual Benign | 9971 | 22 | |
| Actual DoS Goldeneye | 6598 | 3402 | |
| RF with optimal threshold (0.1) | | | 76.07% |
| Actual Benign | 9881 | 112 | |
| Actual DoS Goldeneye | 4672 | 5328 | |
| Threshold | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| 0.10 | 0.947 | 0.448 | 0.608 | 0.711 |
| 0.20 | 0.947 | 0.442 | 0.602 | 0.708 |
| 0.30 | 0.949 | 0.434 | 0.595 | 0.705 |
| 0.40 | 0.953 | 0.419 | 0.582 | 0.699 |
| 0.50 | 0.953 | 0.412 | 0.575 | 0.694 |
| 0.60 | 0.952 | 0.404 | 0.568 | 0.692 |
| 0.70 | 0.952 | 0.397 | 0.560 | 0.688 |
| 0.80 | 0.952 | 0.385 | 0.549 | 0.683 |
| 0.90 | 0.987 | 0.364 | 0.532 | 0.679 |
| | Predicted Benign | Predicted DoS Goldeneye | Accuracy |
|---|---|---|---|
| DNN with default threshold (0.5) | | | 69.40% |
| Actual Benign | 9792 | 201 | |
| Actual DoS Goldeneye | 5917 | 4083 | |
| DNN with optimal threshold (0.1) | | | 71.11% |
| Actual Benign | 9742 | 251 | |
| Actual DoS Goldeneye | 5523 | 4477 | |
| Threshold | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| 0.10 | 0.960 | 0.215 | 0.351 | 0.603 |
| 0.20 | 0.971 | 0.186 | 0.312 | 0.590 |
| 0.30 | 0.053 | 0.000 | 0.000 | 0.498 |
| 0.40 | 0.056 | 0.000 | 0.000 | 0.498 |
| 0.50 | 0.077 | 0.000 | 0.000 | 0.498 |
| 0.60 | 0.091 | 0.000 | 0.000 | 0.498 |
| 0.70 | 0.091 | 0.000 | 0.000 | 0.498 |
| 0.80 | 0.000 | 0.000 | 0.000 | 0.498 |
| 0.90 | 0.000 | 0.000 | 0.000 | 0.498 |
| | Predicted Benign | Predicted DoS Slowloris | Accuracy |
|---|---|---|---|
| RF with default threshold (0.5) | | | 49.88% |
| Actual Benign | 4988 | 12 | |
| Actual DoS Slowloris | 4999 | 0 | |
| RF with optimal threshold (0.1) | | | 60.3% |
| Actual Benign | 4955 | 45 | |
| Actual DoS Slowloris | 3926 | 1074 | |
| Threshold | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| 0.10 | 0.038 | 0.001 | 0.002 | 0.488 |
| 0.20 | 0.039 | 0.001 | 0.002 | 0.488 |
| 0.30 | 0.025 | 0.001 | 0.001 | 0.488 |
| 0.40 | 0.030 | 0.001 | 0.001 | 0.491 |
| 0.50 | 0.030 | 0.001 | 0.001 | 0.491 |
| 0.60 | 0.030 | 0.001 | 0.001 | 0.491 |
| 0.70 | 0.031 | 0.001 | 0.001 | 0.491 |
| 0.80 | 0.021 | 0.000 | 0.001 | 0.491 |
| 0.90 | 0.065 | 0.000 | 0.001 | 0.497 |
| | Predicted Benign | Predicted DoS Slowloris | Accuracy |
|---|---|---|---|
| DNN with default threshold (0.5) | | | 49.1% |
| Actual Benign | 4903 | 97 | |
| Actual DoS Slowloris | 4997 | 3 | |
| DNN with optimal threshold (0.9) | | | 49.7% |
| Actual Benign | 4876 | 124 | |
| Actual DoS Slowloris | 4906 | 94 | |
| Threshold | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| 0.1 | 0.962 | 0.233 | 0.375 | 0.612 |
| 0.2 | 0.947 | 0.099 | 0.181 | 0.547 |
| 0.3 | 0.763 | 0.012 | 0.023 | 0.504 |
| 0.4 | 0.000 | 0.000 | 0.000 | 0.498 |
| 0.5 | 0.000 | 0.000 | 0.000 | 0.498 |
| 0.6 | 0.000 | 0.000 | 0.000 | 0.499 |
| 0.7 | 0.000 | 0.000 | 0.000 | 0.499 |
| 0.8 | 0.000 | 0.000 | 0.000 | 0.499 |
| 0.9 | 0.000 | 0.000 | 0.000 | 0.499 |
| | Predicted Benign | Predicted DoS Slowhttptest | Accuracy |
|---|---|---|---|
| RF with default threshold (0.5) | | | 49.88% |
| Actual Benign | 4988 | 12 | |
| Actual DoS Slowhttptest | 5000 | 0 | |
| RF with optimal threshold (0.1) | | | 61.20% |
| Actual Benign | 4955 | 45 | |
| Actual DoS Slowhttptest | 3835 | 1165 | |
| Threshold | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| 0.10 | 0.332 | 0.013 | 0.025 | 0.493 |
| 0.20 | 0.364 | 0.013 | 0.024 | 0.495 |
| 0.30 | 0.366 | 0.013 | 0.024 | 0.495 |
| 0.40 | 0.370 | 0.012 | 0.023 | 0.496 |
| 0.50 | 0.380 | 0.012 | 0.023 | 0.496 |
| 0.60 | 0.380 | 0.012 | 0.023 | 0.496 |
| 0.70 | 0.385 | 0.012 | 0.023 | 0.496 |
| 0.80 | 0.385 | 0.012 | 0.023 | 0.496 |
| 0.90 | 0.667 | 0.012 | 0.024 | 0.503 |
| | Predicted Benign | Predicted DoS Slowhttptest | Accuracy |
|---|---|---|---|
| DNN with default threshold (0.5) | | | 49.6% |
| Actual Benign | 4903 | 97 | |
| Actual DoS Slowhttptest | 4997 | 3 | |
| DNN with optimal threshold (0.9) | | | 50.3% |
| Actual Benign | 4874 | 126 | |
| Actual DoS Slowhttptest | 4844 | 156 | |
| Attack Variant | RF Best Accuracy | DNN Best Accuracy |
|---|---|---|
| DoS Hulk | 1.00 | 0.99 |
| Goldeneye | 0.76 | 0.71 |
| Slowloris | 0.60 | 0.50 |
| Slowhttptest | 0.61 | 0.50 |