Review Reports
- Gizem Karyağdı and
- İlker Özçelik
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Review Report
Summary
This paper proposes a novel application of heterogeneous Graph Neural Networks (GNNs) for botnet node detection using DNS query data from the TI-16 dataset. The authors design and compare four GNN architectures—HeteroGCN, HeteroSAGE, HeteroGAT, and HeteroGAE—within a bipartite heterogeneous graph composed of clients and domains. The experimental results demonstrate that HeteroSAGE and HeteroGAE achieve high accuracy (~95%) and exceptional recall (>98%), suggesting strong suitability for security applications where minimizing false negatives is critical. The study claims to be the first to apply heterogeneous GNNs to DNS-based botnet detection and provides a detailed ablation across model architectures, hyperparameters, and inference efficiency.
Strengths
1. Novelty and Relevance: The integration of heterogeneous GNNs with DNS traffic represents a timely and technically sound approach to botnet detection—an enduring challenge in cybersecurity. The focus on relational structure over isolated node features aligns well with the coordinated nature of botnet behavior.
2. Rigorous Experimental Design: The paper includes a clear pipeline for graph construction, feature engineering (including domain maliciousness scoring), train/validation/test splitting, and consistent hyperparameter settings across models—enabling a fair architectural comparison.
3. Comprehensive Evaluation Metrics: Beyond accuracy, the authors report precision, recall, F1-score, AUC, and inference time, offering a holistic view of model trade-offs. The emphasis on high recall is well-justified for security contexts.
4. Practical Insight on Model Complexity: The finding that HeteroGAT underperforms despite its attention mechanism is a valuable empirical observation that challenges assumptions about architectural superiority and provides pragmatic guidance for practitioners.
5. Reproducibility: Use of PyTorch Geometric, explicit hyperparameters (Table 3), and dataset details enhance reproducibility. The supplementary materials further support transparency.
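For readers less familiar with the metrics named in Strength 3, the following is a minimal sketch of how accuracy, precision, recall, and F1 follow from confusion-matrix counts. The function name and the example counts are illustrative assumptions; they are not taken from the manuscript or its results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the nodes flagged as bots, how many really are bots.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real bots, how many were flagged (false negatives hurt here).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only; these are NOT results from the paper.
example = classification_metrics(tp=98, fp=8, fn=2, tn=92)
```

With these made-up counts, recall is high because only 2 of 100 bots are missed, while precision is lower because 8 benign clients are flagged. This is the trade-off the review describes as acceptable in security settings.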
Weaknesses (To Be Enhanced)
1. Single-Dataset Evaluation: All experiments rely solely on the TI-16 DNS dataset, which, while realistic, limits generalizability. The conclusions would be significantly strengthened by testing on additional botnet datasets (e.g., CTU-13, Bot-IoT) or cross-dataset transfer.
2. Superficial Threat Model Discussion: The paper does not discuss evasion strategies (e.g., botnets mimicking benign DNS patterns) or adversarial robustness of the GNN models—an important gap given the adaptive nature of modern botnets.
3. Limited Analysis of Graph Construction Choices: Key decisions—such as using Second-Level Domains (SLDs) instead of FQDNs or the specific maliciousness scoring scheme (Table 2)—are not ablated. It remains unclear how sensitive performance is to these preprocessing steps.
Final Recommendation
Accept with Minor Revisions
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below; the corresponding revisions and corrections are highlighted in the resubmitted files.
Comment 1: Single-Dataset Evaluation: All experiments rely solely on the TI-16 DNS dataset, which, while realistic, limits generalizability. The conclusions would be significantly strengthened by testing on additional botnet datasets (e.g., CTU-13, Bot-IoT) or cross-dataset transfer.
Response 1: We thank the Reviewer for this valuable and constructive feedback. We agree that evaluating the proposed methodology on a diverse range of botnet datasets is critical for establishing generalizability and robustness, and we support the principle of cross-dataset validation.
We thoroughly investigated the feasibility of utilizing additional publicly available datasets, including the CTU-13 and Bot-IoT corpora referenced by the reviewer, to address this concern. Our analysis revealed a persistent methodological challenge: the structural requirements of our graph-based approach necessitate the extraction of specific features linked to distinct node types (clients and domains), a granularity often absent in datasets designed for traditional, aggregated deep learning models.
Specifically, while the CTU-13 dataset provides packet capture (pcap) files, a significant portion of this data is subject to truncation for privacy considerations. This limitation makes it impossible to derive the necessary flow information and domain node-level metrics required for constructing our relational graph structures.
Therefore, while acknowledging the reviewer's concern, we maintain that the TI-16 dataset currently represents the only publicly available benchmark that offers the requisite granular data and relational information necessary to conduct the study in its current form. We have added an explicit discussion in the Limitations in Section 9 of the revised manuscript to transparently address this limitation and articulate the specialized data demands of our methodology. Furthermore, we wish to assure the reviewer that we are actively engaged in collaborative efforts to deploy and test these approaches within an operational environment provided by our sponsor, which will offer a vital, real-world validation of the findings.
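To make the data requirement in this response concrete, the sketch below shows the kind of per-query records a client-domain bipartite graph needs: each record must identify both a client and the domain it queried. The record layout `(client_ip, domain)` and the function name are assumptions for illustration only; they do not describe the TI-16 file format or the authors' pipeline.

```python
from collections import defaultdict


def build_bipartite_graph(queries):
    """Build client/domain index maps and a deduplicated, counted edge list.

    `queries` is an iterable of (client_ip, domain) pairs. This layout is a
    hypothetical stand-in, not the actual dataset schema.
    """
    client_ids, domain_ids = {}, {}
    edge_counts = defaultdict(int)
    for client, domain in queries:
        # Assign the next free integer index the first time a node is seen.
        c = client_ids.setdefault(client, len(client_ids))
        d = domain_ids.setdefault(domain.lower(), len(domain_ids))
        # The number of queries per (client, domain) pair could act as an edge weight.
        edge_counts[(c, d)] += 1
    return client_ids, domain_ids, dict(edge_counts)


# Made-up records for illustration.
sample_queries = [
    ("10.0.0.1", "example.com"),
    ("10.0.0.2", "example.com"),
    ("10.0.0.1", "Example.com"),
    ("10.0.0.2", "bad-domain.net"),
]
clients, domains, edges = build_bipartite_graph(sample_queries)
```

A dataset that only exposes aggregated flow statistics, or whose payloads are truncated so that queried names cannot be recovered, cannot supply the `domain` field above, which is the obstacle the response describes.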
Comment 2: Superficial Threat Model Discussion: The paper does not discuss evasion strategies (e.g., botnets mimicking benign DNS patterns) or adversarial robustness of the GNN models, an important gap given the adaptive nature of modern botnets.
Response 2: We thank the Reviewer for this valuable feedback. We appreciate the Reviewer's insightful observation regarding the necessity of a more comprehensive discussion of the threat model, specifically concerning evasion strategies employed by adaptive adversaries, such as botnets mimicking benign DNS traffic, and the resultant adversarial robustness of our Graph Neural Network (GNN) approach.
The Reviewer is correct to highlight the potential for obfuscation and mimicry techniques as a significant challenge in network defense. We assert that our GNN approach, because it processes relational data rather than isolated node features, offers a strong natural defense against simple pattern-mimicry attacks.
To address this crucial point, we have incorporated explicit statements regarding the advanced nature of the threat model and the GNN’s resultant resilience against topological evasion techniques. These clarifications have been added to the Introduction (Section 1) and Conclusion (Section 9).
Comment 3: Limited Analysis of Graph Construction Choices: Key decisions—such as using Second-Level Domains (SLDs) instead of FQDNs or the specific maliciousness scoring scheme (Table 2)—are not ablated. It remains unclear how sensitive performance is to these preprocessing steps.
Response 3: We thank the Reviewer for this valuable feedback. We appreciate the observation regarding the need for comprehensive justification concerning the selection of domain name representation. The Reviewer correctly identifies a point of ambiguity in our original manuscript.
We acknowledge that the initial description concerning the adoption of Second-Level Domains (SLDs) may have suggested an arbitrary or unablated design choice. To rectify this, we have thoroughly revised the relevant section of the manuscript to clarify that the adoption of the SLD + Top-Level Domain (TLD) format was a necessary constraint dictated by the structure of the input data. Specifically, the domain names provided by the dataset (located in the designated 'domains' folder) exclusively contain the SLD and TLD components, which mandates the use of this specific domain representation.
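As an illustration of the SLD + TLD representation discussed in this response, the following sketch reduces a fully qualified domain name to its last two labels. The function name is hypothetical, and the last-two-labels rule is a simplifying assumption: it mishandles multi-label public suffixes such as `co.uk`, for which a public suffix list would be needed.

```python
def to_sld_tld(fqdn):
    """Naively reduce a domain name to its last two labels (SLD + TLD)."""
    # Normalize case, drop the trailing root dot, and ignore empty labels.
    labels = [p for p in fqdn.strip().rstrip(".").lower().split(".") if p]
    if len(labels) < 2:
        # Single-label names (e.g. "localhost") are returned unchanged.
        return ".".join(labels)
    return ".".join(labels[-2:])
```

For example, `to_sld_tld("mail.google.com")` gives `"google.com"`, whereas `to_sld_tld("a.b.co.uk")` gives `"co.uk"`, which shows why the naive rule is only a sketch.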
Reviewer 2 Report
Comments and Suggestions for Authors
The paper addresses an important cybersecurity topic; however, I have the following concerns:
- The related work section is significantly lacking. Beyond describing different botnet topologies, their associated challenges and strengths, and the botnet lifecycle, the authors do not review or discuss existing GNN-based botnet detection studies, despite the numerous relevant works that have been published in the last two years.
- Almost half of the referenced papers are more than five years old. The authors need to incorporate more recent literature and update or extend their work accordingly.
- In Chapter 3, the authors repeatedly cite blog posts (e.g., Medium posts [16] and [19]) instead of peer-reviewed papers that originally proposed the GNN techniques used in the study. This needs to be corrected.
- In Section 4.2, the authors state that they identified node features relevant to the classification task but do not provide details on the methodology used to determine these features, nor do they list the selected features. This may limit reproducibility.
- In the same section (4.2), the authors mention performing feature extraction but do not describe how this process was conducted. More methodological detail is required.
- The rationale behind the chosen hyperparameters and configurations for model training and testing in Section 6 is unclear. It is not specified whether these settings represent baseline values, results of fine-tuning, or arbitrary choices. The authors should elaborate on the basis for these decisions.
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below; the corresponding revisions and corrections are highlighted in the resubmitted files.
Comment 1: The related work section is significantly lacking. Beyond describing different botnet topologies, their associated challenges and strengths, and the botnet lifecycle, the authors do not review or discuss existing GNN-based botnet detection studies, despite the numerous relevant works that have been published in the last two years.
Comment 2: Almost half of the referenced papers are more than five years old. The authors need to incorporate more recent literature and update or extend their work accordingly.
Response 1 & 2: We thank the Reviewer for this valuable and critical assessment of the Related Work section. The observation concerning the inadequate discussion of existing Graph Neural Network (GNN)-based botnet detection literature is entirely accurate and represents a significant omission in the previous manuscript.
We have addressed this deficiency comprehensively by conducting a focused and updated literature review on GNN-based botnet detection methodologies. Specifically, we concentrated our efforts on identifying and incorporating the most relevant and impactful scholarly contributions published within the last two years, as suggested by the reviewer.
The Related Work section (Section 2) has been substantially revised to include a new, dedicated subsection (now Section 2.1, titled "Graph Neural Networks for Botnet Detection"). This new subsection reviews the current state-of-the-art in this domain.
Comment 3: In Chapter 3, the authors repeatedly cite blog posts (e.g., Medium posts [16] and [19]) instead of peer-reviewed papers that originally proposed the GNN techniques used in the study. This needs to be corrected.
Response 3: We agree with the Reviewer’s assessment regarding the necessity of citing primary literature to maintain the academic rigor and credibility of the study. Relying on informal sources for established theoretical frameworks implies a lack of depth which we aim to avoid.
We have addressed this point by conducting a comprehensive audit of the bibliography, with specific attention to Chapter 3. We have systematically removed all citations to non-peer-reviewed sources, such as the previously noted Medium posts ([16] and [19]). These have been replaced by citations to the seminal peer-reviewed conference proceedings and journal articles in which the Graph Neural Network (GNN) architectures we use were originally proposed.
Comment 4: In Section 4.2, the authors state that they identified node features relevant to the classification task but do not provide details on the methodology used to determine these features, nor do they list the selected features. This may limit reproducibility.
Response 4: We appreciate the Reviewer's observation regarding the necessity of explicitly detailing the feature selection methodology and ensuring full transparency, which is essential for the reproducibility of our findings. We acknowledge that the lack of clear cross-referencing and methodological rationale previously compromised the clarity of this section. To address these concerns, we have implemented two specific revisions to the manuscript:
- The features utilized in the study were already defined and explained comprehensively in the introductory subsection of Section 4. We have corrected the oversight in Section 4.2 by incorporating a direct and explicit reference to the location within Section 4 where the complete list and detailed explanation of the node features are presented.
- Recognizing the importance of methodological rigor, we have augmented the beginning of Section 4 with an additional, dedicated paragraph.
Comment 5: In the same section (4.2), the authors mention performing feature extraction but do not describe how this process was conducted. More methodological detail is required.
Response 5: We thank the Reviewer for this valuable feedback. We appreciate the observation regarding the need for greater methodological transparency in Section 4.2, specifically concerning the provenance and processing of the extracted features.
We acknowledge that while the feature extraction process regarding domain features derived from the dataset's supplementary files was detailed in the general introduction of Section 4, the necessary explicit connection was inadvertently omitted within the context of Section 4.2.
We have addressed this point by revising Section 4.2 to strictly delineate the source of these features. We have added a clear cross-reference linking the discussion in Section 4.2 back to the comprehensive extraction methodology outlined in the opening of Section 4.
Comment 6: The rationale behind the chosen hyperparameters and configurations for model training and testing in Section 6 is unclear. It is not specified whether these settings represent baseline values, results of fine-tuning, or arbitrary choices. The authors should elaborate on the basis for these decisions.
Response 6: We thank the Reviewer for this valuable feedback. We appreciate the Reviewer’s observation regarding the justification of the hyperparameter settings presented in Section 6. We agree that distinguishing between arbitrary choices, baseline defaults, and empirically derived values is critical for ensuring the reproducibility and interpretability of our experimental results.
To address this concern, we have revised the methodology description in Section 6 to explicitly articulate the rationale behind our configuration choices. Specifically, we have clarified that the reported hyperparameters were selected following a comprehensive experimental optimization process rather than being arbitrary or default values. We have included new text detailing this tuning procedure.
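The response does not specify the form of the optimization process. As one common possibility, a validation-based grid search could look like the sketch below. The parameter names, the candidate values, and the `toy_evaluate` scoring function are all hypothetical placeholders; they are not the authors' settings or procedure.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Return the parameter combination with the highest evaluation score."""
    best_score, best_params = float("-inf"), None
    names = sorted(param_grid)
    # Enumerate the Cartesian product of all candidate values.
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


def toy_evaluate(params):
    # Placeholder for "train on the training split, score on the validation
    # split". It simply peaks at hidden_dim=64, lr=0.01.
    return -abs(params["hidden_dim"] - 64) - 1000 * abs(params["lr"] - 0.01)


# Hypothetical search space, chosen only for illustration.
grid = {"hidden_dim": [32, 64, 128], "lr": [0.1, 0.01, 0.001]}
best_params, best_score = grid_search(grid, toy_evaluate)
```

In a real study, `evaluate` would train each model and return a validation metric, and the search space and selection metric would be reported alongside the chosen values so that readers can tell tuned settings from defaults.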