Article
Peer-Review Record

A Transformer-Based Framework for DDoS Attack Detection via Temporal Dependency and Behavioral Pattern Modeling

Algorithms 2025, 18(10), 628; https://doi.org/10.3390/a18100628
by Yi Li 1, Xingzhou Deng 2, Ang Yang 2,* and Jing Gao 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 24 July 2025 / Revised: 30 August 2025 / Accepted: 2 October 2025 / Published: 4 October 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a Transformer-based framework for DDoS attack detection, focusing on temporal dependencies and behavioral patterns using the CIC-DDoS2019 dataset.

Here are some comments to improve the paper:

- Emphasize the specific contributions of this study in the abstract.

- Shorten the repetitive sentences in the conclusion (e.g., lines 17-24 and 25-30 overlap on attack evolution).

- In the related works section, add a comparison table with columns for Method, Dataset, Accuracy, Novelty, and Limitations.

- Cite a more diverse set of works; for example, search for 2025 papers on Transformer variants for DDoS detection.

- Excellent feature categorization (Tables 2-10), but the selection of 78 features (eliminating 9) lacks justification: why drop 'Protocol' or 'Timestamp' when temporal modeling is key? The sliding window (T=10) is mentioned only briefly (line 251), with no explanation of how it handles imbalanced classes or multiple attack types (see the sketch after this list). Eq. (1) for normalization is standard but not tailored to DDoS outliers. (Section 3)

- In Section 4, a solid Transformer description is provided, but the architecture is essentially vanilla (encoder-only, no decoder), with no innovations such as masked attention for anomalies or custom positional encoding for traffic bursts. The hyperparameters (D=128, number of heads unspecified, layers=2) lack tuning justification. The equations contain inconsistencies (e.g., the "DDoS" prefix is redundant, and H(l) is not clearly defined).

- In Section 5, training details (e.g., the train/test split ratio, ~70% training?) are missing. Add a confusion matrix and loss curves.

- The limitations (computational complexity) and future directions (sparse attention, GNNs) are good but superficial: there are no quantitative future directions and no ethical considerations (e.g., privacy in traffic data).

- There are many typos (e.g., "DDoSh" in Eq. 14) and instances of awkward phrasing (e.g., "mutant traffic patterns", line 263). The figures should also be improved.
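To illustrate the sliding-window concern above, the following minimal sketch (not taken from the manuscript; the window length T=10, the stride, and the last-step labeling convention are assumptions) shows how per-window sequences could be built from chronologically ordered flow features while reporting per-class window counts, making class imbalance across attack types visible before training.

```python
import numpy as np
from collections import Counter

def make_windows(features, labels, T=10, stride=1):
    """Slice a chronologically ordered flow-feature matrix into overlapping
    windows of length T; the label of a window is taken from its last time
    step (one common convention, assumed here)."""
    X, y = [], []
    for start in range(0, len(features) - T + 1, stride):
        X.append(features[start:start + T])
        y.append(labels[start + T - 1])
    return np.asarray(X), np.asarray(y)

# Toy stand-in for preprocessed CIC-DDoS2019 flow features (78 columns).
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 78))
labs = rng.choice(["BENIGN", "SYN", "UDP"], size=1000, p=[0.7, 0.2, 0.1])

X, y = make_windows(feats, labs, T=10)
print(X.shape)      # (991, 10, 78): windows x time steps x features
print(Counter(y))   # per-class window counts -> reveals imbalance
```

Reporting these counts (and the resampling or weighting applied to them, if any) would directly answer the imbalance question.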

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The paper deals with the problem of detecting DDoS attacks. The authors present their method of using a Transformer-based machine-learning system to analyze network data, which achieved a very high detection rate. The procedure requires real-time analysis of network-traffic features and, owing to the use of a Transformer, can extract patterns of events that are separated by longer time intervals. A comparison with a Naive Bayes model is also presented. The paper is well written, the problem is stated clearly, and the presentation of the method and results is sound.

 

The authors may want to address the following point before publishing their results. The whole study was performed in a simulated environment and required, among other things, preprocessing of the dataset. The model uses more than 70 features of the current network state. In a real-life scenario, these features must be extracted and processed in real time before being sent to the detection system for analysis. Have you assessed the time needed for a realistic system to complete each time step? Will the delay introduced by data acquisition and preparation not reduce the efficiency of your solution? Moreover, a DDoS attack is always mixed with normal network traffic, so your model should be able to distinguish attack patterns from valid packets. Ideally, tests on real networks should be performed.
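One simple way to quantify the acquisition-plus-inference delay raised here is to time each step end to end. The sketch below is only illustrative: extract_features and model_predict are hypothetical placeholders for the real-time feature extraction and the trained detector, not functions from the paper.

```python
import time
import numpy as np

def extract_features(raw_window):
    """Placeholder for real-time flow-feature extraction (hypothetical)."""
    return np.random.rand(10, 78).astype(np.float32)

def model_predict(window_features):
    """Placeholder for the trained detector's forward pass (hypothetical)."""
    time.sleep(0.001)  # stand-in for inference cost
    return 0.99

latencies = []
for step in range(100):
    t0 = time.perf_counter()
    feats = extract_features(raw_window=None)   # acquisition + preprocessing
    _ = model_predict(feats)                    # detection
    latencies.append(time.perf_counter() - t0)

print(f"median per-step latency: {np.median(latencies) * 1e3:.2f} ms, "
      f"95th percentile: {np.percentile(latencies, 95) * 1e3:.2f} ms")
```

Reporting such per-step latencies would show whether the preprocessing pipeline keeps pace with the traffic rate assumed by the detection system.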

 

Once the authors have commented on the problems raised above, I recommend the paper for publication.

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents a strong application of the Transformer architecture for the detection of DDoS attacks. While the work demonstrates a good implementation and good performance, the novelty of the contributions is not fully clear, as the paper appears to be primarily implementation- and application-focused.

The introduction and related work are thorough and provide a good overview, including a detailed discussion of prior works and their limitations. Table 1 is very difficult to read and should be made larger.

The data section is also very detailed, but it and many other areas could be shortened. The data processing strategy of removing all samples with any missing fields seems wasteful and raises concerns about applicability in real-world scenarios where complete data may not be available.

The authors state that 80% of samples were randomly selected for training and 20% for testing. This raises the possibility of data leakage, since temporally adjacent windows from the same attack event can end up in both sets; using the last 20% of the data for testing may be better. Further clarification of the data-splitting procedure is required. The reported results of 100% precision, recall, and F1 for attack traffic, with 99.99% overall accuracy, are extremely high and should be carefully justified and reviewed.
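A chronological hold-out instead of a random shuffle is one way to address this concern. The sketch below assumes the windows are ordered by capture time and simply reserves the last 20% for testing; it is illustrative only, not the authors' procedure.

```python
import numpy as np

def temporal_split(X, y, test_fraction=0.2):
    """Hold out the chronologically last fraction of windows for testing,
    instead of shuffling, so training never sees traffic recorded after
    the test period (a common guard against temporal leakage)."""
    cut = int(len(X) * (1.0 - test_fraction))
    return X[:cut], X[cut:], y[:cut], y[cut:]

# X, y assumed ordered by capture time, e.g. from the windowing step.
X = np.arange(1000).reshape(-1, 1)
y = np.zeros(1000)
X_train, X_test, y_train, y_test = temporal_split(X, y)
print(len(X_train), len(X_test))  # 800 200
```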

Several aspects of the mathematical presentation could be improved. Equation 11 does not include a bias term, which needs justification. In Equation 15, it should be made clear that DDoS_h is separate from W0, and using a subscript h might avoid visual clutter. In Equation 16, P(attack) suggests a scalar output, but the use of softmax implies a vector output. Clarification is needed on whether the output is a scalar probability for binary classification, a 2-dimensional probability vector (which would be redundant), or a probability vector over the different attack types and non-attack traffic. More generally, the dimensions should be stated clearly throughout.
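The ambiguity around Equation 16 can be made concrete with a small sketch of the two possible output heads (PyTorch, purely illustrative; d_model=128 and the number of attack classes are assumptions, not values from the paper). A sigmoid over a single logit yields a scalar P(attack) for binary detection, whereas a softmax yields a probability vector over benign traffic plus each attack type.

```python
import torch
import torch.nn as nn

d_model, n_attack_types = 128, 12    # illustrative values only

h = torch.randn(1, d_model)          # pooled encoder representation

# Option A: binary detection -> a single logit and a scalar P(attack).
binary_head = nn.Linear(d_model, 1)
p_attack = torch.sigmoid(binary_head(h))          # shape (1, 1)

# Option B: multi-class -> softmax over {benign} + attack types (a vector).
multi_head = nn.Linear(d_model, n_attack_types + 1)
p_classes = torch.softmax(multi_head(h), dim=-1)  # shape (1, 13), sums to 1

print(p_attack.shape, p_classes.shape)
```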

The text states "The pseudo-code of the training procedure is as follows:", but the pseudo-code is located later in the manuscript. Algorithm 1 itself is broad and generic, and not specific to this work.

The discussion of evaluation criteria is strong, and the comparison with Naive Bayes is helpful, but additional baselines should be considered to strengthen the experimental evaluation. The authors note that the data are available upon request; it is not clear whether the code will also be made available. Code availability would enable reproducibility.
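As an illustration of the kind of additional baselines meant here, the sketch below compares Naive Bayes with two other standard classifiers (scikit-learn). The synthetic data, feature dimensionality, and class weights are stand-in assumptions; in practice the paper's own train/test split and features would be reused.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for flattened window features (78 columns).
X, y = make_classification(n_samples=5000, n_features=78,
                           weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

baselines = {
    "NaiveBayes": GaussianNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, clf in baselines.items():
    clf.fit(X_tr, y_tr)
    print(f"{name}: F1 = {f1_score(y_te, clf.predict(X_te)):.4f}")
```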

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

 Accept in present form
