Feature-Optimized Machine Learning Approaches for Enhanced DDoS Attack Detection and Mitigation
Round 1
Reviewer 1 Report (Previous Reviewer 1)
Comments and Suggestions for Authors
Several critical issues remain unaddressed:
No validation on real network traffic (live or replayed); claims of real-time suitability lack runtime/throughput evidence.
No analysis or measurements of computational complexity, latency, throughput, CPU/memory usage.
Limited generalization: evaluation is bound to a single scenario/dataset split; no cross-vector tests (e.g., beyond DNS).
Integration remains conceptual: no working prototype within an Intrusion Detection System (IDS) or Security Information and Event Management (SIEM) platform.
Contribution positioning is still ambiguous (no ablation to isolate the value of calibration, thresholds, or policy components).
Reproducibility gaps (missing code/configs/data artifacts) and lingering editorial issues (typos/inconsistent figure/table references).
Author Response
Comment 1: No validation on real network traffic (live or replayed); claims of real-time suitability lack runtime/throughput evidence.
Response:
We thank the reviewer for this valuable comment. The current study focuses primarily on the development and evaluation of a feature-optimized ML framework using the CICDDoS2019 benchmark dataset to ensure methodological rigor and reproducibility. While live network validation was beyond the present study’s scope, the framework was explicitly designed for real-time applicability through optimized feature selection, reduced model complexity, and minimal inference latency.
To address this point, we have added a clarification at the end of the Discussion section, emphasizing that the next stage of this research will involve deployment and runtime validation using real or replayed network traffic to further assess throughput and real-time performance.
Comment 2: No analysis or measurements of computational complexity, latency, throughput, CPU/memory usage.
Response: We appreciate this valuable observation. While explicit hardware-level profiling was not performed, we have added a detailed discussion analyzing computational efficiency in terms of model complexity and inference latency (~0.004 seconds). This section, placed before the conclusion, highlights the framework’s low overhead and suitability for real-time deployment, with future work focused on runtime and resource utilization validation.
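As an illustration of how a per-instance inference latency such as the reported ~0.004 s can be measured, the following sketch times repeated single-instance predictions of a trained classifier and averages the result. This is not the authors' code; the model, dataset, and trial count are illustrative assumptions.

```python
# Hypothetical sketch: measuring mean per-instance inference latency
# for a trained classifier. Model and data are placeholders, not the
# configuration used in the manuscript.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a stand-in model on synthetic traffic-like data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Time many single-instance predictions and average, smoothing out jitter.
n_trials = 200
start = time.perf_counter()
for i in range(n_trials):
    model.predict(X[i % len(X)].reshape(1, -1))
latency = (time.perf_counter() - start) / n_trials
print(f"mean per-instance inference latency: {latency:.6f} s")
```

Averaging over many trials matters because a single `predict` call is dominated by Python overhead and OS scheduling noise; reporting the mean (or a percentile) gives a defensible latency figure.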
Comment 3: Limited generalization: evaluation is bound to a single scenario/dataset split; no cross-vector tests (e.g., beyond DNS).
Response: The current evaluation focuses on the CICDDoS2019 dataset, particularly emphasizing DNS-based attack vectors to ensure controlled and consistent validation of the proposed framework. This focus allows for a detailed assessment of detection accuracy and computational efficiency within a defined scope. However, we acknowledge the importance of broader generalization. Therefore, future work will extend evaluation to additional attack vectors (e.g., LDAP, NTP, UDP-Lag) and multiple dataset splits to further validate cross-vector robustness and adaptability of the framework in diverse real-world scenarios.
Comment 4: Integration remains conceptual: no working prototype within an Intrusion Detection System (IDS) or Security Information and Event Management (SIEM) platform.
Response: The current study focuses on developing and validating the proposed framework through systematic feature optimization and extensive experimentation using the CICDDoS2019 dataset. While integration into an IDS or SIEM environment is not yet implemented, the framework’s architecture and real-time inference capability (0.004 s per instance) were explicitly designed to support seamless deployment in such systems. Future work will extend this research toward building a functional prototype integrated with existing IDS/SIEM platforms to evaluate performance in live network environments.
Comment 5: Contribution positioning is still ambiguous (no ablation to isolate the value of calibration, thresholds, or policy components).
Response: We appreciate the reviewer’s valuable comment. This study focuses on developing a feature-optimized, low-complexity machine learning framework capable of achieving high accuracy while maintaining real-time applicability. Although a detailed ablation study was not included, the effects of calibration, thresholding, and policy tuning are implicitly reflected in the optimized performance and model stability. Importantly, the framework was intentionally designed with low computational complexity to facilitate future hardware implementation on FPGA-based or edge devices for real-time DDoS detection and mitigation. Future work will include a comprehensive ablation analysis to further quantify the contribution of each component and validate the model in hardware-based environments.
Comment 6: Reproducibility gaps (missing code/configs/data artifacts) and lingering editorial issues (typos/inconsistent figure/table references).
Response: We thank the reviewer for this important observation. To enhance reproducibility, all code implementations, configuration details, and dataset references have now been clearly described and made available in the revised manuscript. Specifically, the complete model configurations and hyperparameter settings were added at the end of Section 4.1.4, ensuring full experimental transparency. In addition, editorial corrections have been made throughout the manuscript to fix typographical errors and ensure consistent figure and table references. These revisions address the reproducibility and presentation concerns raised.
Author Response File: Author Response.pdf
Reviewer 2 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
The authors aim to improve the accuracy when detecting DDoS attacks supported by a specific dataset.
Despite this, the motivation or contributions of this work are unclear.
The abstract should not include the dataset. Moreover, the dataset should be introduced with a reference.
Authors should look for a clear gap in the literature.
Minor edit issues should be fixed: "state-ment", "re-duction".
The proposed framework should not include a literature review.
I'd like to point out that the results require additional discussion.
Authors should clearly define the novelty of this work.
Some conclusions are not supported by results (e.g., robust, rigorous, exceptional, scalability, etc).
Author Response
Comment 1: The authors aim to improve the accuracy when detecting DDOS attacks supported by a specific dataset. Despite this, the motivation or contributions of this work are unclear.
Response: We sincerely thank the reviewer for this constructive observation. To clarify, we have strengthened the motivation and contributions section in the revised manuscript. The study is motivated by the persistent challenge of accurately detecting and mitigating sophisticated DDoS attacks in real-time, where traditional methods and existing ML approaches often suffer from high false-positive rates, poor scalability, and limited adaptability.
The main contributions of this work are now explicitly stated as follows:
Development of a feature-optimized machine learning framework that integrates systematic feature selection with advanced ML classifiers (RF, DT, CB, GB, and LR) for DDoS detection using the CICDDoS2019 dataset.
Implementation of a two-phase hybrid feature selection method combining correlation analysis and Random Forest feature importance ranking to enhance model interpretability and efficiency.
Achievement of high accuracy (99.85%) with reduced inference time, ensuring real-time applicability and scalability for deployment in practical network environments.
Comprehensive evaluation against prior studies on the same dataset (CICDDoS2019), demonstrating the robustness and superiority of the proposed approach.
These clarifications have been incorporated into the Introduction and Methodology sections, providing a clearer statement of purpose and emphasizing the novelty and significance of the proposed framework.
Comment 2: The abstract should not include the dataset. Moreover, the dataset should be introduced with a reference.
Response: We thank the reviewer for this helpful suggestion. In response, the dataset name has been removed from the abstract to maintain generality and focus on the study’s objectives and contributions. The CICDDoS2019 dataset is now properly introduced and referenced in the Methodology section (Section 4.1), where a citation to its original source has been added. This revision ensures clarity, academic consistency, and compliance with standard publication practices.
Comment 3: Authors should look for a clear gap in the literature.
Response: We thank the reviewer for the helpful remark. The research gap has been clearly defined and added to the middle of the Introduction, emphasizing that prior studies lacked systematic feature selection, efficient model optimization, and real-time mitigation. Our revised text highlights how the proposed framework directly addresses these limitations.
Comment 4: Minor edit issues should be fixed: "state-ment", "re-duction".
Response: We thank the reviewer for noting these typographical issues. All minor formatting inconsistencies, including words such as “state-ment” and “re-duction,” have been carefully reviewed and corrected throughout the manuscript to ensure linguistic accuracy and consistency.
Comment 5: The proposed framework should not include a literature review.
Response: We thank the reviewer for this comment. The literature-related content has been removed from the Proposed Framework section and retained only in the Literature Review section to maintain a clear structural focus.
Comment 6: I'd like to point out that the results require additional discussion.
Response: We thank the reviewer for the comment. Additional discussion has been added after Table 4 (before the conclusion), highlighting the impact of feature optimization, real-time applicability, and mitigation potential to strengthen result interpretation.
Comment 7: Authors should clearly define the novelty of this work.
Response: We thank the reviewer for the comment. The novelty has been clearly defined at the end of the Introduction (before Figure 1), emphasizing a feature-optimized ML framework that enhances both detection and mitigation of DDoS attacks while reducing model complexity, overfitting, and inference time for real-time scalability.
Comment 8: Some conclusions are not supported by results (e.g., robust, rigorous, exceptional, scalability, etc).
Response: We appreciate the comment. All subjective terms (e.g., robust, rigorous, exceptional, scalability) have been revised to ensure that conclusions are fully supported by the presented results.
Author Response File: Author Response.pdf
Reviewer 3 Report (Previous Reviewer 3)
Comments and Suggestions for Authors
The text of the article is written very carelessly. Some conclusions are insufficiently substantiated. The details are given below. The article needs significant revision.
1. Line 23: "previous work. and proving" -- You need a comma instead of a dot.
2. Line 38: "quality[1] ." -- Extra space before the dot.
3. Line 45: "This study investigates the growing challenge of detecting sophisticated DDoS attacks using ML techniques." -- Is that the purpose of this article? It would be good to set a specific goal here.
4. Line 48: "The framework proposed is designed for scalability" -- What kind of framework? What is it intended for? Nothing was said about it above.
5. Line 48: "The framework proposed is designed for scalability and integration into existing network security systems[2], [3]." -- Why should it be integrated into real systems? Is checking on the CICDDoS2019 dataset enough for this?
6. Line 39 "The domain of DDoS mitigation encompasses classical detection methods such as signature-based detection, statistical analysis, and threshold-based approaches. However, these methods often struggle with high false-positive rates and poor adaptability to evolving attack patterns", and Line 61 "Classical detection techniques for DDoS attacks, such as signature-based detection, threshold-based detection, and statistical analysis methods, depend on predefined patterns or traffic behavior analysis" are paraphrases.
7. Line 98: Comparison is made with methods trained on other datasets. It is ok. But it is unclear whether the results are compared on the same dataset or not. If not (and this is most likely the case), then such a comparison is incorrect. Why is there no comparison with other studies on the same CICDDoS2019 dataset, which can be found in the search bar https://scholar.google.com/scholar?hl=ru&as_sdt=0%2C5&q=CICDDoS2019&btnG=. For example, with the article by "Ashraf, Usman, et al. "A machine learning based approach for the detection of DDoS attacks on internet of things using CICDDoS2019 dataset-PortMap." Lahore Garrison University research journal of computer science and information technology 8.2 (2024)".
8. Line 104: "A comprehensive dataset analysis identified and reduced the unimportant features that play a pivotal role in achieving high accuracy." -- Why reduce them if they play an important role? Most likely, there is an incorrectly worded sentence here.
9. Line 112: "often improving interpretability" -- Why?
10. Line 114: "the proposed methodology prioritizes"" -- Is a methodology or framework proposed?
11. Line 118: "Figure 1 illustrates a comprehensive system for enhancing the detection of DDoS attacks using ML." -- Figure 1 shows a diagram or even a set of diagrams rather than a system.
12. Line 119: "comprehensive system for enhancing the detection of DDoS attacks using ML" -- Is a methodology, framework, or system proposed?
13. Line 125: "Section 2 Literature Review on ML based DDoS attack. Section 3 proposed statement. Section 4 methodology. Section 5 results and discussion. Finally, Section 6 con-cludes the paper." -- The copula verbs are missing.
14. Line 130: "2. Literature Review" begins with remnants of an earlier draft (two full paragraphs and part of a third).
15. Lines 146-165: Why is this well-known text in a literature review?
16. Line 173: "model. the dataset" -- The word must begin with a capital letter.
17. Line 258: The "Proposed Framework" is not properly described in this section.
18. Line 293: "Figure 2" instead of "Figure 3".
19. Line 294: "Machine learning flowchart" -- The letter "e" in the word "Machine" is missing. It is also missing in Figure 2.
20. The diagram in Figure 2 is standard. Most of the component descriptions for this diagram are also well-known. It would be better to focus on the specifics of the proposed solution.
21. Lines 323-331: How does this list relate to the previous text? We need an introductory sentence.
22. Line 338: "Figure 12 in the Appendix" -- This figure has not been found.
23. There is no Figure 3.
24. The values of the parameters of the machine learning algorithms used (section 4.1.4) are not given in full, which does not allow reproducing the experiment.
25. Line 637: "The exclusion of these parameters is not arbitrary, but a deliberate optimization grounded in domain knowledge and empirical evidence." -- This sentence is not enough to motivate the choice of features. We need specifics on the choice procedure. How can we guarantee that there are no more non-essential features left?
26. Line 651: "When comparing these results to previous studies [5], [41], the test accuracy of the Random Forest model surpasses the accuracy reported in prior studies." -- Was the comparison conducted on the same test dataset? If not, then such a conclusion cannot be drawn.
27. Line 708: "The results demonstrate that the developed model provides noticeable improvements over previous studies. For Random Forest (RF), the model achieved an accuracy of 99.85%, compared to 99.11% in [38] and 97.23% in [42], showing an improvement of 0.55% and 2.62%, respectively. In the case of Decision Tree (DT), the model reached an accuracy of 99.78%, surpassing the 98.25% reported in [13] by 1.53%. CatBoost (CB) achieved 99.63%, which is 0.53% higher than the 99.1% reported in [11]." -- The same question remains about the same test dataset.
28. In the list of references, the works numbered 9 and 13 are the same work.
Author Response
Comment 1: Line 23: "previous work. and proving" You need a comma instead of a dot.
Response: Thank you for your observation. We have revised the sentence to remove the fragment and ensure grammatical correctness. The updated text now reads:
“The developed ML model demonstrates exceptional performance, achieving 99.70% accuracy with Logistic Regression and 99.85% with Random Forest, representing improvements of 4.7% and 0.23% compared to previous work, respectively. In addition, the Decision Tree algorithm achieved 99.85% accuracy, with an inference time as low as 0.004 seconds, proving its suitability for identifying DDoS attacks in real time.”
Comment 2: Line 38: "quality[1] ." -- Extra space before the dot.
Response: Thank you for pointing this out. We have corrected the formatting issue by removing the extra space, and the sentence now reads correctly as "…creating bottlenecks and degrading service quality [1]."
Comment 3: Line 45: "This study investigates the growing challenge of detecting sophisticated DDoS attacks using ML techniques." -- Is that the purpose of this article? It would be good to set a specific goal here.
Response: We have revised the statement to explicitly set a clear goal for the study. The updated text (lines 47-50) now reads: "This study aims to enhance the detection and mitigation of sophisticated DDoS attacks by applying feature selection and optimizing state-of-the-art machine learning algorithms to achieve high accuracy, low inference time, and real-time applicability. The focus is on leveraging the CICDDoS2019 dataset, applying feature optimization, and optimizing ensemble learning algorithms such as Random Forest to achieve high detection accuracy. Furthermore, it is designed for scalability and seamless integration into existing network security systems [2], [3]."
Comment 4: Line 48: "The framework proposed is designed for scalability" -- What kind of framework? What is it intended for? Nothing was said about it above.
Response: Thank you for pointing this out. We have revised the text to clarify the type and purpose of the framework. The updated sentence (page 2) now reads: "… Furthermore, the proposed framework is a machine learning–based detection and mitigation system intended for real-time identification and response to DDoS attacks. It is designed for scalability and seamless integration into existing network security systems [2], [3]."
Comment 5: Line 48: "The framework proposed is designed for scalability and integration into existing network security systems[2], [3]." -- Why should it be integrated into real systems? Is checking on the CICDDoS2019 dataset enough for this?
Response: Thank you for this comment. Integration into real systems is essential to ensure practical applicability and real-time defense against DDoS attacks. While the CICDDoS2019 dataset offers a strong benchmark for validation, we acknowledge that further real-world evaluation is required and have clarified this in the manuscript as a future research direction.
Comment 6: Line 39 "The domain of DDoS mitigation encompasses classical detection methods such as signature-based detection, statistical analysis, and threshold-based approaches. However, these methods often struggle with high false-positive rates and poor adaptability to evolving attack patterns", and Line 61 "Classical detection techniques for DDoS attacks, such as signature-based detection, threshold-based detection, and statistical analysis methods, depend on predefined patterns or traffic behavior analysis" are paraphrases.
Response: We thank the reviewer for this observation. To avoid redundancy, we have revised and consolidated the explanation of classical methods into a single passage. The updated text (first four lines of page 2) now reads: "Classical DDoS detection methods, including signature-based, statistical, and threshold-based approaches, rely on predefined traffic patterns or behavior analysis. Although effective to some extent, these methods often suffer from high false-positive rates and limited adaptability to evolving attack patterns." This modification removes the paraphrasing issue and improves clarity while retaining the intended meaning.
Comment 7: Line 98: Comparison is made with methods trained on other datasets. It is ok. But it is unclear whether the results are compared on the same dataset or not. If not (and this is most likely the case), then such a comparison is incorrect. Why is there no comparison with other studies on the same CICDDoS2019 dataset, which can be found in the search bar https://scholar.google.com/scholar?hl=ru&as_sdt=0%2C5&q=CICDDoS2019&btnG=. For example, with the article by "Ashraf, Usman, et al. "A machine learning based approach for the detection of DDoS attacks on internet of things using CICDDoS2019 dataset-PortMap." Lahore Garrison University research journal of computer science and information technology 8.2 (2024)".
Response: We thank the reviewer for this important comment. We would like to clarify that the study by Ashraf et al. (2024) focused primarily on the PortMap attack from the CICDDoS2019 dataset, whereas our work specifically targets DNS-based DDoS attacks, which were not addressed in their study. This difference in scope explains the variation in results.
Comment 8: Line 104: "A comprehensive dataset analysis identified and reduced the unimportant features that play a pivotal role in achieving high accuracy." -- Why reduce them if they play an important role? Most likely, there is an incorrectly worded sentence here.
Response: Thank you for pointing this out. The sentence was incorrectly worded and has been revised for clarity. The intended meaning was that eliminating unimportant features plays a pivotal role in achieving high accuracy. The corrected version (line 109) now reads: "A comprehensive dataset analysis identified and eliminated unimportant features, a process that plays a pivotal role in achieving high accuracy."
Comment 9: Line 112: "often improving interpretability" -- Why?
Response: Thank you for pointing this out. We clarified the sentence to explain that reducing redundant or insignificant features not only improves efficiency but also makes the model's behavior easier to understand. By focusing on a smaller set of meaningful features, the relationship between inputs and outputs becomes clearer, thereby improving interpretability.
Comment 10: Line 114: "the proposed methodology prioritizes" -- Is a methodology or framework proposed?
Response: Thank you for this observation. We have clarified the wording to avoid ambiguity. The correct term is "framework", as our contribution is a structured system for feature optimization and machine learning–based DDoS detection and mitigation. The revised sentence now reads: "By addressing these aspects, the proposed framework prioritizes selecting and evaluating the key features, resulting in a model that delivers superior accuracy."
Comment 11: Line 118: "Figure 1 illustrates a comprehensive system for enhancing the detection of DDoS attacks using ML." -- Figure 1 shows a diagram or even a set of diagrams rather than a system.
Response: We agree and have revised the wording. The text and caption now describe Figure 1 as a conceptual framework of the proposed approach rather than a system.
Comment 12: Line 119: "comprehensive system for enhancing the detection of DDoS attacks using ML" -- Is a methodology, framework, or system proposed?
Response: Thank you for the comment. To avoid ambiguity, we have revised the wording to clarify that what we propose is a framework, not a full system.
Comment 13: Line 125: "Section 2 Literature Review on ML based DDoS attack. Section 3 proposed statement. Section 4 methodology. Section 5 results and discussion. Finally, Section 6 con-cludes the paper." -- The copula verbs are missing.
Response: Thank you for the comment. We have revised the sentence to include the missing verbs and improve readability. The updated version now reads: "Section 2 presents a literature review on ML-based DDoS attacks. Section 3 introduces the problem statement. Section 4 describes the methodology. Section 5 presents and discusses the results. Finally, Section 6 concludes the paper."
Comment 14: Line 130: "2. Literature Review" begins with remnants of an earlier draft (two full paragraphs and part of a third).
Response: We thank the reviewer for this valuable observation. The unintended template text that remained at the beginning of Section 2 has been removed. The Literature Review section has been revised to start directly with the relevant background on DDoS attack types and the structure of such attacks, ensuring clarity and consistency.
Comment 15: Lines 146-165: Why is this well-known text in a literature review?
Response: We appreciate the reviewer's observation. The section in question contained general background information on DDoS attack mechanisms, which is already well known in the field. To address this, we have revised the Literature Review so that it now focuses only on related studies and prior research contributions. The general background has been condensed and repositioned to the Introduction, where it provides context for the study without overlapping with the Literature Review.
Comment 16: Line 173: "model. the dataset" -- The word must begin with a capital letter.
Response: We thank the reviewer for noticing this. The sentence has been corrected to: "… model. The dataset …" with the proper capitalization.
Comment 17: Line 258: The "Proposed Framework" is not properly described in this section.
Response: We thank the reviewer for this valuable observation. In the revised manuscript, Section 3 has been expanded to begin with a clear and detailed description of the proposed framework. The new paragraph outlines the workflow, including dataset preprocessing, systematic feature selection, model training and evaluation, and the mitigation component. By adding this explanation and explicitly linking it to Figure 1, the framework is now described comprehensively and distinctly before presenting related works and comparisons.
Comment 18: Line 293: "Figure 2" instead of "Figure 3".
Response: We thank the reviewer for noting this error. The figure reference has been corrected to "Figure 2" in the revised manuscript.
Comment 19: Line 294: "Machine learning flowchart" -- The letter "e" in the word "Machine" is missing. It is also missing in Figure 2.
Response: We thank the reviewer for identifying this typographical error. The word has been corrected to "Machine learning flowchart" in both the main text and the caption of Figure 2 in the revised manuscript.
Comment 20: The diagram in Figure 2 is standard. Most of the component descriptions for this diagram are also well-known. It would be better to focus on the specifics of the proposed solution.
Response: We thank the reviewer for this valuable observation. In the revised manuscript, the lengthy generic description of Figure 2 has been removed. Instead, we added a concise paragraph before Figure 2 that highlights the novelty of our proposed framework, emphasizing systematic feature selection, optimized ML classifiers, and real-time detection and mitigation. This ensures that the focus is placed on the contribution of our work rather than on standard, well-known DDoS attack components.
Comment 21: Lines 323-331: How does this list relate to the previous text? We need an introductory sentence.
Response: We thank the reviewer for this helpful comment. An introductory sentence has been added before the list to explicitly link it to the preceding discussion of OSI layers. The revised text now reads: "… This mapping highlights how the CICDDoS2019 dataset captures attack patterns across different layers, enhancing its utility for layered DDoS defense analysis. The following examples illustrate key attack categories from the CICDDoS2019 dataset and their association with specific OSI layers:"
Comment 22: Line 338: "Figure 12 in the Appendix" -- This figure has not been found.
Response: We appreciate the reviewer's observation. Figure 12 has now been incorporated into the Appendix of the manuscript to ensure it is clearly accessible.
Comment 23: There is no Figure 3.
Response: We thank the reviewer for the observation. Figure 3 is already included in the manuscript, and we have verified that both the figure and its reference are correctly placed in the revised version.
Comment 24: The values of the parameters of the machine learning algorithms used (section 4.1.4) are not given in full, which does not allow reproducing the experiment.
Response: We appreciate the reviewer's comment. The full hyperparameter configurations and experimental setup details for all machine learning models have now been added at the end of Section 4.1.4 to ensure complete reproducibility and transparency.
Comment 25: Line 637: "The exclusion of these parameters is not arbitrary, but a deliberate optimization grounded in domain knowledge and empirical evidence." -- This sentence is not enough to motivate the choice of features. We need specifics on the choice procedure. How can we guarantee that there are no more non-essential features left?
Response: We thank the reviewer for this insightful comment. To clarify the feature selection process, we have expanded the methodology section to include a detailed explanation of the two-phase hybrid approach used. Specifically, correlation analysis was first employed to remove redundant or highly correlated features, followed by Random Forest feature importance ranking to evaluate the predictive contribution of the remaining attributes. Cross-validation was then applied to ensure that no essential features were excluded and that model performance remained stable after feature reduction. This addition, which appears at the end of Section 4.1.2, provides a transparent and reproducible justification for the selected features.
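The two-phase hybrid selection described in this response can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the correlation threshold, `top_k` cut-off, and forest size are assumptions, and the manuscript's actual settings are given in Section 4.1.4.

```python
# Hypothetical sketch of the two-phase hybrid feature selection:
# (1) drop one of every pair of highly correlated features,
# (2) rank the remainder by Random Forest importance and keep the top_k,
# (3) cross-validate on the reduced set to check performance stability.
# Thresholds are illustrative, not the manuscript's actual values.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def hybrid_feature_selection(X: pd.DataFrame, y, corr_threshold=0.95, top_k=20):
    # Phase 1: correlation filter. Keep only the strict upper triangle of
    # the absolute correlation matrix so each pair is inspected once.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    X_reduced = X.drop(columns=redundant)

    # Phase 2: Random Forest importance ranking of the surviving features.
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    rf.fit(X_reduced, y)
    ranked = sorted(zip(X_reduced.columns, rf.feature_importances_),
                    key=lambda t: t[1], reverse=True)
    selected = [name for name, _ in ranked[:top_k]]

    # Phase 3: cross-validate on the selected subset to confirm that
    # accuracy remains stable after the reduction.
    scores = cross_val_score(rf, X_reduced[selected], y, cv=5)
    return selected, scores.mean()
```

A drop in the cross-validated score relative to the full feature set would signal that phase 1 or 2 removed an essential feature, which is how the stability check answers the reviewer's guarantee question.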
Comment 26: Line 651: "When comparing these results to previous studies [5], [41], the test accuracy of the Random Forest model surpasses the accuracy reported in prior studies." -- Was the comparison conducted on the same test dataset? If not, then such a conclusion cannot be drawn.
Response: We thank the reviewer for this valuable comment. We agree that direct numerical comparison should only be made when the same dataset and experimental conditions are used. Accordingly, the sentence has been revised to clarify that the comparison with prior studies [41], [42], [44] is contextual. The updated text now emphasizes that both the Random Forest and Decision Tree models demonstrate improved performance and higher stability within the CICDDoS2019 dataset compared to trends reported in earlier research, without implying direct numerical equivalence. This revision, which appears in the first paragraph of Section 5.2, ensures methodological accuracy and proper interpretation of the results.
Comment 27 Line 708: "The results demonstrate that the developed model provides noticeable im-provements over previous studies. For Random Forest (RF), the model achieved an accuracy of 99.85%, compared to 99.11% in [38] and 97.23% in [42], showing an im-provement of 0.55% and 2.62%, respectively. In the case of Decision Tree (DT), the model reached an accuracy of 99.78%, surpassing the 98.25% reported in [13] by 1.53%. For CatBoost (CB) achieved 99.63%, which is 0.53% higher than the 99.1% reported in [11]." -- The same question remains about the same test dataset.
Response: We thank the reviewer for this important observation. We confirm that the comparative analysis was conducted using the same dataset, CICDDoS2019, as employed in the referenced studies. All reported accuracy values were obtained under the same dataset conditions, ensuring a valid and fair comparison. This clarification has been added to the revised manuscript to explicitly state that the performance comparison is based on the same dataset.
Comment 28 In the list of references, the works numbered 9 and 13 are the same work.
Response: We thank the reviewer for noting this duplication. The repeated reference has been carefully reviewed and corrected in the revised manuscript. The duplicate entry has been removed, and all subsequent references have been renumbered accordingly to ensure consistency throughout the paper.
Reviewer 4 Report (Previous Reviewer 4)
Comments and Suggestions for Authors
This resubmitted paper shows structural improvement and greater clarity. The paper is well-structured and the methodology description is thorough, covering the preprocessing, the feature selection, and the use of various ML models. The additional explanations of the framework and the mitigation approach further improve the study's clarity. However, several aspects require further clarification and improvement:
1- The inconsistent statement in Section 4.1.4 between “four” and “five” ML algorithms persists. This requires correction to eliminate confusion.
2 - The authors further explained the rationale for excluding features, which improves the paper. However, the claim that feature selection reduces computational complexity still lacks empirical backing. The contribution would be more robust had the authors shown real efficiency gains, e.g., reduced training time, lower memory usage, or faster inference.
3- The study relies solely on the CICDDoS2019 dataset. The omission of cross-dataset evaluation (e.g., CICIDS2017, UNSW-NB15, Bot-IoT) is a limitation: without it, it is hard to determine how well the proposed model generalizes to unseen traffic in other contexts. At a minimum, the authors should discuss this limitation more explicitly in the discussion and conclusion.
Author Response
Comment 1: The inconsistent statement in Section 4.1.4 between “four” and “five” ML algorithms persists. This requires correction to eliminate confusion.
Response: Thank you for noting this. The inconsistency has been corrected; Section 4.1.4 now consistently refers to five ML algorithms for clarity.
Comment 2: The authors further explained the rationale for excluding features, which improves the paper. However, the claim that feature selection reduces computational complexity still lacks empirical backing. The contribution would be more robust had the authors shown real efficiency gains, e.g., reduced training time, lower memory usage, or faster inference.
Response: We appreciate the reviewer’s constructive feedback. While explicit runtime and memory profiling were not performed, the efficiency gains are indirectly reflected in the minimal performance gap between the training, validation, and testing phases, as well as in the achieved inference latency of approximately 0.004 seconds per instance. These results indicate reduced computational overhead and faster convergence due to optimized feature selection. In future work, we plan to include detailed empirical measurements of training time, memory consumption, and inference speed to quantitatively demonstrate these efficiency improvements.
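A per-instance inference latency figure of the kind cited in this response (~0.004 s) can in principle be estimated as below; this is a hedged sketch assuming scikit-learn and synthetic data, not the authors' actual measurement setup, model, or hardware.

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train an illustrative classifier on a synthetic stand-in for the traffic data.
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Time repeated batch predictions and divide by the number of instances.
# Batching amortizes Python call overhead; repeats smooth out timer noise.
n_repeats, batch = 5, X[:500]
start = time.perf_counter()
for _ in range(n_repeats):
    model.predict(batch)
elapsed = time.perf_counter() - start
per_instance = elapsed / (n_repeats * len(batch))
print(f"~{per_instance:.6f} s per instance")
```

The absolute number depends heavily on the model, batch size, and hardware, which is precisely why the reviewers ask for measurements on the target deployment platform.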
Comment 3: The study relies solely on the CICDDoS2019 dataset. The omission of cross-dataset evaluation (e.g., CICIDS2017, UNSW-NB15, Bot-IoT) is a limitation: without it, it is hard to determine how well the proposed model generalizes to unseen traffic in other contexts. At a minimum, the authors should discuss this limitation more explicitly in the discussion and conclusion.
Response: We thank the reviewer for this important observation. We acknowledge that the current study is limited to the CICDDoS2019 dataset, which, while comprehensive, does not fully represent the variability of real-world traffic. To address this, we have explicitly discussed this limitation in both the Results and Discussion section and the Conclusion. The revised text now emphasizes that future work will focus on cross-dataset evaluation using CICIDS2017, UNSW-NB15, and Bot-IoT to assess the model’s generalization capability and robustness against diverse traffic behaviors and unseen attack patterns.
Round 2
Reviewer 1 Report (Previous Reviewer 1)
Comments and Suggestions for Authors
The authors have clearly put in substantial effort: adding clarifications, reporting per-model timing numbers, expanding the policy mapping, and improving the write-up. That said, several of the original concerns remain either unaddressed or only partially addressed. To reach publication quality, especially for a work that argues “real-time suitability” and practical deployment, these gaps should be closed.
- Live/replayed traffic validation:
No experiments on live or replayed network traffic; “real-time suitability” is not demonstrated under a realistic data path.
- Performance profiling under load:
No throughput (pps/Gbps), no end-to-end latency under sustained traffic, and no CPU/RAM usage on target hardware; only per-sample inference time is reported.
- Generalization beyond a single scenario:
No cross-vector evaluation (beyond DNS) and no cross-dataset tests; broader validation is deferred with no results provided.
- Working IDS/SIEM prototype:
No implemented integration with an IDS/SIEM or programmable data plane; no deployment architecture or end-to-end system measurements.
Author Response
Comment 1: No experiments on live or replayed network traffic; “real-time suitability” is not demonstrated under a realistic data path.
Response: We thank the reviewer for this valuable comment. We acknowledge that live or replayed traffic validation was not performed in this study. The current work focused on model development, optimization, and evaluation using the benchmark CICDDoS2019 dataset to ensure reproducibility and a controlled comparison with prior research. To address this limitation, we have explicitly added a statement in the Discussion and Conclusion sections highlighting that future work will involve deploying the model in a live or replayed network environment to validate its real-time performance, including throughput and latency measurements. This clarification ensures transparency regarding the scope of the present study and defines a clear path for future experimental validation.
Comment 2: No throughput (pps/Gbps), no end-to-end latency under sustained traffic, and no CPU/RAM usage on target hardware; only per-sample inference time is reported.
Response: We thank the reviewer for this valuable comment. The current study primarily focuses on developing a low-complexity and computationally efficient ML model to ensure suitability for future hardware deployment. Accordingly, runtime (inference time) was analyzed and reported in the Results and Discussion section (Section 5) to demonstrate the model’s processing efficiency. Hardware-level metrics such as throughput, latency, and resource utilization will be thoroughly evaluated in our future FPGA-based validation, where real-time performance under sustained traffic will be assessed.
Comment 3: No cross-vector evaluation (beyond DNS) and no cross-dataset tests; broader validation is deferred with no results provided.
Response: We appreciate the reviewer’s thoughtful observation. The current study intentionally focuses on establishing a feature-optimized and computationally efficient baseline framework, ensuring high detection accuracy with minimal complexity, a necessary foundation before broader deployment. Although cross-vector and cross-dataset evaluations were not conducted at this stage, the framework was explicitly designed for generalization and scalability. Future work will extend validation to diverse attack vectors and datasets to demonstrate full adaptability and real-world robustness, building directly upon the present model’s strong baseline performance.
Comment 4: No implemented integration with an IDS/SIEM or programmable data plane; no deployment architecture or end-to-end system measurements.
Response: We appreciate the reviewer’s valuable comment. The current work represents the design and validation phase of a feature-optimized, low-complexity ML framework intended for future real-time and hardware deployment. While full integration with an IDS/SIEM or programmable data plane was beyond the present study’s scope, the framework was architecturally designed for modular integration into such environments. Future work will focus on embedding the trained model within an operational IDS/SIEM prototype to evaluate end-to-end performance, throughput, and system interoperability under real network conditions.
Reviewer 2 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
The authors did not convincingly address the previous comments.
The abstract should be improved by presenting the gap in the literature.
The authors should look for related work to compare against, rather than their previous work.
Some terms, such as robust and robustness, remain in the manuscript.
The authors still keep literature review content in the Proposed Framework section.
Results from the literature in Table 4 are not correct (e.g., [42] reports 99.78% accuracy with 13 features).
Contributions such as scalability are not supported by results.
Author Response
Comment1: The abstract should be improved by presenting the gap in the literature.
Response: We appreciate the reviewer’s valuable comment. The abstract has been revised to explicitly highlight the research gap in the literature, emphasizing that many existing DDoS detection studies rely on large, redundant feature sets and lack validation for real-time applicability. The revised version now clarifies that our work addresses this gap by proposing a feature-optimized and computationally efficient ML framework as a foundational step toward low-complexity, real-time, and hardware-based implementation.
Comment 2: The authors should look for related work to compare against, rather than their previous work.
Response: We appreciate the reviewer’s observation. The comparison in Table 4 was conducted based on closely related studies that utilize the same dataset (CICDDoS2019) and apply similar machine learning algorithms, ensuring a consistent and fair performance evaluation. These works were selected not because they include our previous study, but because they provide the most relevant technical and methodological alignment for benchmarking, including reported accuracy, inference efficiency, and feature count when available. This approach ensures a meaningful and objective comparison within the same experimental context.
Comment 3: Some terms, such as robust and robustness, remain in the manuscript.
Response: We appreciate the reviewer’s observation. The term “robust” has been replaced with a more precise and objective description.
Comment 4: Author still keeping the literature review contents in the Proposed Framework section.
Response: We thank the reviewer for this helpful observation. The Proposed Framework section has been fully revised to remove all literature review content. It now focuses solely on the structure, process, and functionality of the proposed model, ensuring clear separation between background discussion and the methodological contribution.
Comment 5: Results from the literature in Table 4 are not correct (e.g., [42] reports 99.78% accuracy with 13 features).
Response: We appreciate the reviewer’s careful observation. Upon rechecking the cited reference [42], we confirm that the reported accuracies, 99.62% for Random Forest and 96.80% for Decision Tree, are correct and consistent with reference [42] (page 14) for the same algorithms used in Table 4. The number of features (13) reported in the cited study has been added to Table 4.
Comment 6: Contributions such as scalability are not supported by results.
Response: We thank the reviewer for this important point; we have removed the term "scalability" from the manuscript and revised the text to articulate the model's performance more precisely, focusing exclusively on quantifiable metrics such as high accuracy and low inference time.
Reviewer 3 Report (Previous Reviewer 3)
Comments and Suggestions for Authors
Thank you to the authors for addressing all the comments. The article is now suitable for publication in the journal.
Author Response
We thank the reviewer for their constructive feedback and thorough evaluation, and we are pleased that the revisions made have rendered the manuscript suitable for publication in this journal.
Reviewer 4 Report (Previous Reviewer 4)
Comments and Suggestions for Authors
The authors revised the manuscript according to my suggestions. I agree with the changes made.
Author Response
We thank the reviewer for their positive feedback and appreciation of the improvements made to the manuscript.
Round 3
Reviewer 2 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
Thank you for attending to our concerns.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article addresses one of the pressing issues in cybersecurity – DDoS attack detection using machine learning. The study has scientific value because:
It summarizes existing methods of DDoS detection and analyzes their limitations.
It proposes an improved methodology based on ML, achieving high accuracy (99.85%).
It applies feature selection optimization, enhancing classification quality.
However, the article has the following flaws:
- No real practical validation of the model on actual network traffic is presented.
- The article primarily focuses on applying existing ML algorithms rather than developing new approaches (the authors did not propose a new algorithm but merely adapted existing ones).
- The experimental methodology does not differ from other studies.
- There is no analysis of the computational complexity of the algorithms.
- The possibilities for integrating the model into real-world systems (e.g., IDS or SIEM) are not considered.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors aim to address the current challenges of DDoS attack detection by leveraging the CICDDoS2019 dataset and advanced machine learning (ML) techniques to improve detection.
The authors should introduce the key concepts in this work, such as feature selection techniques.
The authors should also extend and improve the review the state of the art.
It is not clear that applying systematic feature selection techniques goes beyond the state of the art.
Their statement "traditional methods often rely on analyzing large, high-dimensional datasets, making them computationally intensive and impractical for real-time applications" is also unclear.
The statement "many of these methods relied heavily on benchmark datasets, limiting their real-world applicability" is also unclear.
The authors propose a naive approach by introducing feature selection to improve the performance of ML algorithms.
Reviewer 3 Report
Comments and Suggestions for Authors
The article is devoted to the current topic of DDoS attack recognition. The results presented in the article undoubtedly contribute to this field; however, there are a number of methodological and experimental observations.
In the Methodology section, the introduction to the subject area and its methods (a very good introduction), which includes in particular the general schemes shown in Figures 1-5 and makes up 90% of the section, is mixed with the part describing the procedure for obtaining results (settings for the 5 algorithms used to recognize DDoS attacks, settings of other applied techniques such as PCA and t-SNE used for visualization, and the specifics of each stage of building and using the model). It would be good to separate these parts.
The article lacks tables comparing the results for different feature sets on the same dataset. Such tables would serve as an empirical justification for choosing a specific set of features.
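A comparison table of the kind this reviewer requests could be produced by evaluating the same model over several candidate feature subsets on the same dataset. The sketch below assumes scikit-learn, synthetic data in place of CICDDoS2019, and hypothetical index-based subsets rather than the paper's actual features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in dataset; subsets are illustrative column ranges only.
X, y = make_classification(n_samples=800, n_features=20, n_informative=6,
                           random_state=0)
subsets = {
    "all 20 features": list(range(20)),
    "first 10 (illustrative)": list(range(10)),
    "first 5 (illustrative)": list(range(5)),
}

# One row per feature subset: same model, same data, same CV protocol.
results = {}
for name, cols in subsets.items():
    acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                          X[:, cols], y, cv=5).mean()
    results[name] = acc
    print(f"{name:<26}{acc:.3f}")
```

Holding the model and cross-validation protocol fixed while varying only the feature set is what makes such a table a valid empirical justification for the chosen features.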
The article provides percentage figures (pages 14 and 16) to what extent the proposed algorithm is superior to existing analogues, but are they compared on the same dataset?
There are a huge number of typos
Line 106: the Figure 1 -> The Figure 1
Lines 113-114: Predicates are omitted in the following sentences: "Section 2 Literature Review on ML based DDoS attack. Section 3 proposed statement. Section 4 methodology. Section 5 results and discussion."
Line 145: the dataset -> The dataset
Line 154: threats, The University -> threats. The University
Line 155: DDoS attack -> DDoS attacks
Line 161: 96.54%, The dataset -> 96.54%. The dataset
Line 163: an strategy -> a strategy
and so on
Line 369-370: This article evaluates the capability and effectiveness of four ML algorithms in classifying DDoS attacks. The five ML algorithms used are LR, RF, DT, CB, and GB. - Four or five?
Line 371: CICDDOS 2019 dataset [reference to the data webpage] - No reference
Line 461: F1 - Score -> F1-score
Line 461: X -> x
Reviewer 4 Report
Comments and Suggestions for Authors
The manuscript provides a well-structured analysis of DDoS attack detection using machine learning (ML) techniques and the CICDDoS2019 dataset. The authors present a rigorous approach to data preprocessing and feature selection, followed by the application of multiple ML algorithms. The reported accuracy of 99.85% is impressive and suggests a strong model performance. However, there are several aspects that require further clarification and improvement.
1- In lines 369-370, the authors state:
"This article evaluates the capability and effectiveness of four ML algorithms in classifying DDoS attacks. The five ML algorithms used are LR, RF, DT, CB, and GB."
This sentence is contradictory. The first part mentions four ML algorithms, while the second part lists five. The authors should clarify this inconsistency to avoid confusion. If five algorithms were used, then the statement should be revised accordingly.
2- "Statistical measures, such as low correlation with the attack class, confirmed their irrelevance. Comparative experiments demonstrated that removing these parameters maintained the model's high detection accuracy of 99.85% while reducing computational complexity."
The idea that removing features kept accuracy high while simplifying the model is promising, but it doesn’t necessarily mean feature selection was the best approach for handling the dataset.
Did removing features affect how well the model can detect new, unseen attack types?
The paper states that computational complexity was reduced, but it would be helpful to see some concrete evidence—like shorter training times or lower memory usage—to back up this claim.
3- Another limitation of this study is that it relies on CICDDoS2019 dataset without comparing it to other well-known datasets like CICIDS2017, UNSW-NB15, or Bot-IoT. The authors should clarify why these datasets were not considered, as this raises several important concerns:
Can the model handle different network environments?
Why not test across multiple datasets?
Could the dataset introduce bias? While CICDDoS2019 is widely used, it may not fully capture modern or evolving attack strategies.
While the study presents promising results, clarifications on ML algorithm consistency, feature selection effectiveness, and dataset choice are necessary. Addressing these concerns would enhance the credibility and practical applicability of the proposed approach.
Reviewer 5 Report
Comments and Suggestions for Authors
This paper addresses the challenge of detecting Distributed Denial of Service (DDoS) attacks using machine learning techniques. The core idea is that by identifying and focusing on the most relevant network traffic parameters while eliminating insignificant ones, machine learning models can achieve higher accuracy in DDoS attack detection with reduced computational complexity. However, the novelty and contribution of this work is severely limited.
Strong aspects:
- This paper provides a detailed review of research on DDoS attack detection.
Weak aspects:
- The manuscript contains massive language issues that undermine its professionalism. Examples include incorrect verb forms ("for detection DDoS attack" instead of "for detecting DDoS attacks" on line 19), improper capitalization (line 106), incomplete sentences (lines 113-115), incorrect punctuation (lines 154 and 161), and a/an misuse ("an strategy" on line 163). The Figure 3 caption misspells "Machine learning" as "Machin learning," among many other errors throughout the text.
- Figure 1 presents an overwhelming amount of information without sufficient explanation. A more concise and focused visualization would better communicate the key concepts and findings.
- Section 4, which should represent the paper's primary contribution, largely describes standard machine learning procedures such as data preprocessing and splitting. The machine learning algorithms employed (logistic regression, random forest, etc.) are fundamental techniques without novel adaptations. This significantly restricts the paper's originality and contribution to the field.
- The reported 0.55% accuracy improvement is minimal considering baseline accuracy already exceeds 99%.
- While the paper emphasizes feature importance, the selection process described in Section 5.1 relies heavily on intuition rather than rigorous experimental validation. The classification of important versus unimportant features lacks sufficient empirical support.
- No comparison with other methods from the literature, which makes this work more of an empirical study rather than a substantive research contribution that advances the state of the art in DDoS attack detection.
Comments on the Quality of English Language
Please refer to the first weak aspect.

