IP Spoofing Detection Using Deep Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Please see the attachment.
Comments for author File: Comments.pdf
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below; the corresponding revisions/corrections are highlighted in the re-submitted files.
Comment 1: Clarify ground-truth mapping for “IP spoofing” in CIC-DDoS2019: You state that “reflection attacks were specifically identified as instances of IP spoofing” and reformulate the task as binary (spoof vs. normal). CIC-DDoS2019 contains multiple scenarios (e.g., DrDoS_DNS, DrDoS_NTP, etc.). Please:
- Explicitly list which attack classes you mapped to “spoofing.”
- Justify that all flows in those classes are indeed spoofed in the dataset context (reflection traces may include non-spoofed control/background flows).
- Provide per-attack-class performance and confusion matrices to rule out class-specific shortcuts.
Response 1: We would like to thank the reviewer for their valuable and insightful comments. We have revised the manuscript to address the concerns raised.
- To clarify the ground-truth mapping, we have explicitly listed the attack classes considered as IP spoofing. We re-examined the CIC-DDoS2019 dataset and confirmed that the following DrDoS (Distributed Reflection Denial of Service) attacks were the only ones used in our study, as they are inherently reflective and thus employ IP spoofing: TFTP, LDAP, MSSQL, NetBIOS, NTP, SNMP, SSDP, DNS, and Portmap (the labeling is illustrated in the sketch after this list).
- We have added a justification to the Experiment Setup section. The flows within these DrDoS classes are indeed spoofed in the context of the CIC-DDoS2019 dataset. The dataset's official documentation and research papers explicitly state that these attacks were generated using a reflective methodology. Consequently, the source IP addresses of the attack flows are spoofed to that of the victim.
- Per-attack-class confusion matrices for a representative model have been added to the end of the Results section. This addition addresses the concern about class-specific shortcuts by showing how the model performs on each individual attack type; a sketch of how these matrices are derived follows this list. A full set of confusion matrices for all models has been included as a supplementary file, providing comprehensive documentation while avoiding clutter in the main paper.
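For illustration, the binary labeling and the per-class evaluation can be sketched as follows. This is a minimal sketch, not the exact code from our pipeline: it assumes the flows are loaded into a pandas DataFrame with a Label column, and the label strings and helper names (to_binary_labels, per_class_confusion) are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# DrDoS reflection classes treated as IP spoofing; label strings are assumed
# to follow the CIC-DDoS2019 CSV convention.
SPOOFED_CLASSES = {
    "DrDoS_DNS", "DrDoS_LDAP", "DrDoS_MSSQL", "DrDoS_NetBIOS",
    "DrDoS_NTP", "DrDoS_SNMP", "DrDoS_SSDP", "TFTP", "Portmap",
}

def to_binary_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Map the multi-class labels to binary: spoofed (1) vs. normal (0)."""
    df = df.copy()
    df["spoofed"] = df["Label"].isin(SPOOFED_CLASSES).astype(int)
    return df

def per_class_confusion(df: pd.DataFrame, y_pred: np.ndarray) -> dict:
    """Confusion matrix restricted to one attack class plus benign flows,
    which exposes class-specific shortcuts a single global matrix would hide."""
    benign = (df["Label"] == "BENIGN").to_numpy()
    matrices = {}
    for cls in SPOOFED_CLASSES:
        mask = (df["Label"] == cls).to_numpy() | benign
        matrices[cls] = confusion_matrix(
            df["spoofed"].to_numpy()[mask], y_pred[mask], labels=[0, 1]
        )
    return matrices
```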
Comment 2: Data splitting and leakage prevention. With flow-based features, flows from the same attack episode appearing across train/test can inflate performance. Specify the split strategy (random vs. time-based vs. by scenario/day/source/destination). A time- or scenario-based split is preferred.
Confirm that:
- SMOTE, normalization, and feature selection were fit only on training data.
- No duplicate or near-duplicate flows exist across splits.
Response 2: We are grateful for the reviewer's detailed and constructive comments regarding data splitting and leakage prevention. We have carefully re-examined our methodology and made the necessary revisions to address these crucial points.
- We appreciate the feedback on the use of SMOTE for data balancing. Upon a closer re-examination of our methodology, we determined that because our down-sampled dataset is already balanced, the application of SMOTE is not necessary. We have since removed all references to SMOTE from the flow diagram and the corresponding sections in the paper. We have repeated all tests with this corrected methodology and have updated the results to reflect these changes. This clarification is now detailed in the Experiment Setup section. Based on our updated results, we also decided to remove MLP2 from the analysis, as it yielded results very similar to the standard MLP method, and its inclusion might cause confusion for readers.
- Furthermore, as the reviewer suggested, our data split was conducted using a time-based strategy, which ensures that no data from the same attack episode appears in both the training and testing sets. By splitting the data on the time of the attacks, we were able to guarantee that no duplicate or near-duplicate flows exist across the splits. These details are explicitly described in the Experiment Setup section of the manuscript; a minimal sketch of the strategy follows this list.
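As an illustration of the time-based split and the leakage check, consider the following minimal sketch. It is not the exact code from our pipeline: the Timestamp and Label column names and the 80/20 fraction are assumptions.

```python
import pandas as pd

def time_based_split(df: pd.DataFrame, train_frac: float = 0.8):
    """Chronological split: earlier flows train, later flows test, so flows
    from the same attack episode never straddle the boundary."""
    df = df.sort_values("Timestamp")
    cut = int(len(df) * train_frac)
    train, test = df.iloc[:cut], df.iloc[cut:]

    # Leakage check: no identical feature vectors on both sides of the split.
    feature_cols = [c for c in df.columns if c not in ("Timestamp", "Label")]
    overlap = pd.merge(
        train[feature_cols].drop_duplicates(),
        test[feature_cols].drop_duplicates(),
        how="inner",
    )
    assert overlap.empty, f"{len(overlap)} duplicate flows span the split"
    return train, test
```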
Comment 3: Minor:
- “Acurracy” → “Accuracy” (Section 6).
- Use consistent decimal separators (dots vs commas) across tables (e.g., Table III).
Response 3: We thank the reviewer for pointing out these errors.
- We have conducted a thorough spell check, and the manuscript has been proofread by a native speaker to correct all typographical errors.
- Additionally, we have ensured consistency in our decimal separators by updating all tables to exclusively use dots.
Reviewer 2 Report
Comments and Suggestions for Authors
The modification suggestions are presented in the attachment.
Comments for author File: Comments.pdf
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below; the corresponding revisions/corrections are highlighted in the re-submitted files.
Comment 1: Some values in Table 3 use commas as decimal points (e.g., 0,9989). It is recommended to standardize them to the English decimal point format (0.9989).
Response 1: We thank the reviewer for the careful review and for pointing out this inconsistency. We have ensured consistency in our decimal separators by updating all tables to exclusively use dots.
Comment 2: References: Some of the cited references are several years old. It would be beneficial to supplement them with the latest deep learning security detection research from 2023 to 2025.
Response 2: We are grateful for the reviewer's suggestion to include more recent literature. We conducted a thorough search of available academic catalogues but were unable to find recent papers from 2023 to 2025 that are directly focused on IP spoofing detection using deep learning methods. To address the reviewer's point and highlight the growing interest in this field, we have added a new motivational sentence to the Introduction section. This sentence cites a recent survey by Razzaq and Shah (2025), which found that the intersection of cybersecurity, machine learning, and deep learning has experienced significant growth and global collaboration from 2016 to 2025. This addition emphasizes the contemporary relevance of our study within the broader context of cybersecurity research.
Comment 3: The absence of ROC curves and AUC values limits a comprehensive assessment of the model's discriminative ability. It is recommended to include AUC-ROC and PR curves.
Response 3: We would like to thank the reviewer for this valuable suggestion. We agree that including AUC-ROC curves provides a more comprehensive assessment of the model's performance. For this reason, we have added AUC values to Tables III and IV. Furthermore, the ROC curves for all models have been included in the supplementary materials to provide a complete picture of their discriminative ability.
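For illustration, the per-model AUC values and ROC curves can be produced as in the following minimal sketch (scikit-learn and matplotlib assumed; the plot_roc helper and the file naming are hypothetical, not our exact evaluation code).

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

def plot_roc(y_true, y_score, model_name: str, out_path: str) -> float:
    """Compute the AUC and save a ROC curve for one model's spoof scores."""
    auc = roc_auc_score(y_true, y_score)
    fpr, tpr, _ = roc_curve(y_true, y_score)
    plt.figure()
    plt.plot(fpr, tpr, label=f"{model_name} (AUC = {auc:.4f})")
    plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.savefig(out_path)
    plt.close()
    return auc
```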
Reviewer 3 Report
Comments and Suggestions for Authors
This article compared nine deep learning models for their efficiency in detecting IP spoofing attacks. The results show that all tested models perform well, with CNN and ResNet1D performing best. After ONNX conversion, DNN and MLP-based models retained their performance while reducing inference time and model size. The Author's work is of significance for selecting IP spoofing detection strategies, facilitating network anomaly monitoring.
However, the article cannot be accepted for publication in Applied Sciences because, for a research paper, the Author merely used well-known, published deep learning models for comparison, making the article lack novelty. There are no new findings. When one proposes a new model, one should also describe its principle and implementation details, followed by experiments and a comparison with the state of the art.
Although the Author has conducted a large number of experiments and a considerable amount of work, these experimental results can be summarized and compiled into a work report, e.g., for class use, rather than a research paper. Novelty is the core requirement for a scientific paper.
Author Response
Comment:
This article compared nine deep learning models for their efficiency in detecting IP spoofing attacks. The results show that all tested models perform well, with CNN and ResNet1D performing best. After ONNX conversion, DNN and MLP-based models retained their performance while reducing inference time and model size. The Author's work is of significance for selecting IP spoofing detection strategies, facilitating network anomaly monitoring.
However, the article cannot be accepted for publication in Applied Sciences because, for a research paper, the Author merely used well-known, published deep learning models for comparison, making the article lack novelty. There are no new findings. When one proposes a new model, one should also describe its principle and implementation details, followed by experiments and a comparison with the state of the art.
Although the Author has conducted a large number of experiments and a considerable amount of work, these experimental results can be summarized and compiled into a work report, e.g., for class use, rather than a research paper. Novelty is the core requirement for a scientific paper.
Response:
We sincerely thank the reviewer for their time and valuable feedback.
We respectfully disagree with the assessment that our work lacks novelty. While we acknowledge that we have utilized established deep learning architectures, the primary contribution of this research lies in its comprehensive and systematic evaluation of these models for the specific and under-explored task of IP spoofing detection. To the best of our knowledge, there is a limited body of literature that rigorously compares the performance of a wide range of deep learning models on this specific problem. Our work fills this gap by demonstrating that these well-known architectures, when properly configured and trained with appropriate feature engineering, can achieve high performance in detecting IP spoofing attacks.
Furthermore, a significant novel contribution of this study is the integration of ONNX (Open Neural Network Exchange) conversion. By converting the models and evaluating their performance, size, and inference time, we have demonstrated the feasibility of deploying these complex deep learning models in operational systems. Our findings highlight a crucial aspect of real-world application that is rarely addressed in academic research. This part of our work provides valuable insights not only for network security practitioners but also for the ONNX development community, identifying areas where further optimization is needed.
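For illustration, the conversion and timing workflow can be sketched as follows. This is a minimal sketch under stated assumptions: it presumes a trained PyTorch model and the onnxruntime package, and the tensor names, batch size, and run count are illustrative rather than our exact benchmark code.

```python
import time
import numpy as np
import torch
import onnxruntime as ort

def export_and_time(model: torch.nn.Module, n_features: int,
                    path: str = "model.onnx", runs: int = 1000) -> float:
    """Export a trained model to ONNX, then measure mean inference latency."""
    model.eval()
    dummy = torch.randn(1, n_features)
    torch.onnx.export(model, dummy, path,
                      input_names=["flow_features"],
                      output_names=["spoof_score"])

    sess = ort.InferenceSession(path)
    x = np.random.randn(1, n_features).astype(np.float32)
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {"flow_features": x})
    return (time.perf_counter() - start) / runs  # seconds per inference
```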
The extensive experimentation, which included training eight different deep learning models and meticulously tuning their hyperparameters, constitutes a significant and non-trivial amount of work. The results of this effort provide a practical benchmark and a clear roadmap for researchers and developers working on network security. Our work presents a unique and necessary contribution by moving the field from theoretical application to a practical, comparative analysis. We believe this work provides findings that are essential for the advancement of network anomaly monitoring.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Accept in present form.
Reviewer 3 Report
Comments and Suggestions for Authors
Considering the following: (1) the integration of ONNX, (2) the possible real-world applications, and (3) the Author's careful revision of the previous version, I now agree to accept the submission in its current form.