Article
Peer-Review Record

Few-Shot Network Intrusion Detection Using Online Triplet Mining

Appl. Sci. 2026, 16(10), 4589; https://doi.org/10.3390/app16104589
by Jack Wilkie 1,*, Hanan Hindy 2, Christos Tachtatzis 1, Miroslav Bures 3 and Robert Atkinson 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 1 February 2026 / Revised: 30 April 2026 / Accepted: 3 May 2026 / Published: 7 May 2026
(This article belongs to the Special Issue New Advances in Cybersecurity Technology and Cybersecurity Management)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper proposes a novel few-shot learning approach for network intrusion detection systems (NIDS) using a triplet network with online triplet mining and KNN classification. The method addresses the challenge of training effective classifiers with limited malicious samples, reducing reliance on large labeled datasets. The paper presents a well-motivated, innovative solution to a pressing problem in NIDS, supported by rigorous experiments and clear contributions. However, it has reproducibility concerns, limited baseline comparisons, and a lack of theoretical depth. Addressing the following points in a revision would significantly strengthen the paper:
(1) Key implementation details are missing, such as the exact architecture of the neural network used in the triplet network, hyperparameter settings for the KNN classifier, and computational resources required for online triplet mining.
(2) While several baselines are included, state-of-the-art few-shot or meta-learning methods from outside the NIDS domain (e.g., Prototypical Networks, MAML) are not compared, leaving the broader context unclear.
(3) Evaluation relies solely on two benchmark datasets (CICIDS2017, Lycos2017), which may not fully represent emerging attack patterns or real-time network environments.
(4) The paper lacks a formal analysis of why triplet networks outperform Siamese networks in this context, beyond empirical results.
Questions to Authors and Suggestions:
(1) Could you provide the detailed architecture of the neural network used in the triplet model (e.g., layer sizes, activation functions, regularization techniques)?
(2) How does the computational cost of online triplet mining scale with batch size and number of classes, and have you considered strategies to mitigate this for real-time NIDS?
(3) Have you evaluated the proposed method on more recent or diverse intrusion datasets (e.g., IoT-specific or encrypted traffic datasets) to assess generalizability?
(4) Can you discuss potential adaptations of this approach for incremental learning or streaming data scenarios where new attack classes emerge continuously?
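
As a rough illustration of the scaling raised in question (2), the sketch below counts the triplets that batch-all mining would enumerate in a class-balanced minibatch. It is not taken from the manuscript: the function name and the assumption that every class contributes the same number of samples per batch are hypothetical.

```python
def batch_all_triplet_count(num_classes: int, samples_per_class: int) -> int:
    """Count (anchor, positive, negative) triplets in a class-balanced batch.

    Assumption: each class contributes exactly `samples_per_class` samples.
    """
    batch_size = num_classes * samples_per_class
    # Positives: other samples that share the anchor's class.
    positives_per_anchor = samples_per_class - 1
    # Negatives: every sample from any other class.
    negatives_per_anchor = batch_size - samples_per_class
    return batch_size * positives_per_anchor * negatives_per_anchor


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, batch_all_triplet_count(num_classes=5, samples_per_class=n))
```

Under this assumption the count grows roughly cubically in the batch size, so doubling the samples per class multiplies the number of triplets by about eight.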

Comments on the Quality of English Language

The overall quality of the English language in this manuscript is good. The writing is generally clear, formal, and appropriate for an academic journal. The structure is logical, and technical terms are used correctly. Areas for improvement (minor revisions recommended):
(1) While the language is strong, some minor grammatical inconsistencies and typographical errors are present. A thorough proofreading pass would enhance polish. Examples include:
(2) Occasional missing or misplaced commas, particularly in complex sentences.
(3) Rare omissions or incorrect use of articles ("a", "an", "the"), e.g., "rendering them impractical" vs. "rendering them impractical" (though the original is acceptable, some instances feel slightly off).
Suggestions:
(1) Perform a careful line-by-line review to correct hyphenation, spacing, and punctuation.
(2) Consider reading the manuscript aloud to catch slightly awkward phrasings that are grammatically correct but could be smoothed for better flow (e.g., in the Introduction: "The predictable result is an escalating number of network intrusions. With the average data breach..." could be connected as "The predictable result is an escalating number of network intrusions, with the average data breach...").
These issues are minor and do not impede understanding. Correcting them will further improve the professionalism and readability of the manuscript.

Author Response

Please see attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript proposes a triplet network that uses online triplet mining and a KNN classifier to perform few-shot classification, enabling effective intrusion detection after being trained on a limited number of malicious examples. Overall, the manuscript is technically sound and the experimental workflow is clear. However, several low-cost but high-impact revisions are recommended to improve language quality, formatting professionalism, and readability. Most issues are typographical/grammatical or related to pseudocode typesetting.

(1) The phrase “generations of supervised work intrusion detection systems” appears to contain an unintended word (“work”). It is recommended to revise it to a clearer and technically correct expression, e.g., “generations of supervised network intrusion detection systems.”

(2) Algorithm 1 contains visible line-break/encoding artifacts (e.g., split words such as “malicious”), which reduces the perceived quality of the paper. Please remove the encoding artifacts and ensure that words are not broken (e.g., that “malicious” appears correctly everywhere). Also standardize formatting such as “5 fold” → “5-fold” and keep notation consistent.

(3) Improve coherence by minor sentence restructuring

Some sentences combine two concepts in a single long structure (e.g., Random Forest and SVM descriptions connected with a loose “and”). Consider splitting them into two sentences to improve clarity and flow.

(4) The evaluation procedure uses a 50/50 train-test split, which may appear unconventional compared to typical 80/20 settings. A single explanatory sentence would strengthen the justification, e.g., “A 50/50 split was adopted to maintain balanced class counts between training and test sets under the controlled sampling regime used for few-shot evaluation.”

(5) Replace slightly awkward phrasing such as “used to contrast …” with “used to compare …” where appropriate, to align with standard academic style.

(6) The fonts used in the figure legends should be standardized for consistency.

Author Response

Please see attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors combine online triplet mining with a KNN classifier to
address unbalanced training sets in intrusion detection.
The presentation of this approach is largely clear and understandable,
but I am still not convinced by the evaluation.
See the following remarks and questions:

- In Section 3.2 the balanced sampling is explained. In the evaluation
  section, the number of benign samples used is far larger than the
  number of malicious samples.
  Do the mini batches contain an equal
  number of benign and malicious samples of each class, and is the larger
  benign sample size only used to improve the accuracy of the KNN
  classifier? If so, please clarify this in the paper.

- Two datasets were used for the evaluation. Both datasets stem from
  the same network traffic (pcap files). The CIC-IDS2017 CSV files include
  multiple mistakes. The reasoning in line 411: "These classes were
  selected to capture a broad range of intrusion behaviors, while
  ensuring adequate sample sizes for training and evaluation." contradicts
  your own motivation: The classes selected only contain DoS and brute-force
  attacks which are easily distinguishable from benign traffic using
  flow features! Why were classes with small sample sizes like SQL
  Injections excluded from both datasets? According to your
  Introduction and Conclusion small sample sizes are the primary
  motivation for your method!?

- Regarding the datasets: If the class and feature selection of the
  CIC-IDS2017 dataset was used to ensure comparability with your
  previous work (Hanan Hindy, Tachtatzis, Atkinson are also authors in
  [8,21,22]), then this should be explicitly mentioned in the paper.

- In Section 4.3 the proposed triplet network was compared to
  autoencoders [34] and one-class SVMs [8] developed by the authors,
  and RENOIR [25], whose source code is available online. I
  appreciate this. But in your related work section you also mention
  DAE-LR and DUAD as AEs that aim to improve detection performance of
  AEs: "Variations of the autocoder, such as DAE-LR [13] and DUAD
  [14], aim to improve anomaly detection performance through the use
  of various techniques such as additional regularisation techniques
  and iterative data filtering." To show the superiority of your
  method, you should also compare your approach against
  these methods or at least improve the related work section to give hints
  why your method may be superior.

- Why is the Lycos2017 dataset more challenging (Line 421) than
  CICIDS2017? (see the results of the Lycos2017 paper, Table 2)
  While there are more classes and features,
  errors and inconsistencies are removed from the CSV files. So, this should be
  a better data set.
  Overall, please explain the feature selection you have done for both datasets
  in more detail.

- How were the results in Table 4 for 'Siamese Networks' obtained? I could
  not find these results in the paper or PhD thesis. If they were
  obtained by re-implementing the method of Hanan Hindy, why are they
  worse than the results from the PhD thesis section 5.5.2.2?


- Line 494: Where is overfitting defined like this?

- Figures 4 & 5: Is this really overfitting when training on such low
  sample sizes, or an effect of the small sample size? Please clarify
  this in the paper.

 

While a Related Work section is included, many related works are
not discussed. Examples include the following:

1) Xu, C., Zhang, F., Yang, Z. et al. A few-shot network intrusion
detection method based on mutual centralized learning. Sci Rep 15,
9848 (2025). https://doi.org/10.1038/s41598-025-93185-0,
https://www.nature.com/articles/s41598-025-93185-0

2) Handi Sun et al.: Few-Shot network intrusion detection based on
prototypical capsule network with attention mechanism
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0284632

3) Bo, J.; Chen, K.; Li, S.; Gao, P. Boosting Few-Shot Network
Intrusion Detection with Adaptive Feature Fusion
Mechanism. Electronics 2024, 13,
4560. https://doi.org/10.3390/electronics13224560
https://www.mdpi.com/2079-9292/13/22/4560

4) Z. Shi, M. Xing, J. Zhang and B. Hao Wu, "Few-Shot Network Intrusion
Detection Based on Model-Agnostic Meta-Learning with L2F Method," 2023
IEEE Wireless Communications and Networking Conference (WCNC),
Glasgow, United Kingdom, 2023, pp. 1-6, doi:
10.1109/WCNC55385.2023.10118898.
https://ieeexplore.ieee.org/document/10118898

5) Cao et al.: A study on Few-shot Learning approach for Intrusion
Detection System with Class Incremental Learning, Proceedings of the
2025 10th International Conference on Intelligent Information
Technology
https://dl.acm.org/doi/epdf/10.1145/3731763.3731795


Further hints for better readability and reproducibility:

- Page 1: Abstract: "The final model was compared AGAINST other
  state-of-the-art approaches in few-shot binary and multiclass
  classification,"

- Page 6: Better wording: "In this section, the proposed few-shot
  learning system is proposed."

- Page 9: Increase the readability by listing all variables in equations. For
  example Equation 3: m is not introduced. Hence add "by a predefined margin
  m" in the sentence in line 320.

- In Table 3 abbreviations like AE for Autoencoder are used inconsistently.

- To increase reproducibility please specify the hyperparameters
  (margin m, k in k-nearest neighbor, ...) that were used for the
  evaluation in the paper.

Author Response

Please see attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The paper addresses the limited-data “vulnerability window” in network intrusion detection by proposing a few-shot classifier built from a triplet network trained with online triplet mining and used with a KNN classifier at inference, aiming to achieve strong detection with very few labeled malicious samples while keeping false positives low. It motivates the approach by contrasting supervised NIDS (data-hungry and weak for new/rare attacks) with anomaly detection (often impractically high false positives), and then designs a pipeline that uses class-balanced batch sampling to make online triplet formation feasible under extreme imbalance, batch-all triplet mining to maximize relational supervision per minibatch, and KNN over learned embeddings to restore realistic class priors at inference. Experimentally, it evaluates on CICIDS2017 and the corrected Lycos2017 under a bespoke repeated sub-sampling and cross-validated model selection procedure across multiple few-shot regimes (10–160 malicious samples per class), compares to anomaly-based and contrastive baselines in binary and multiclass settings, and reports consistent improvements in macro-F1 and especially false positive rates, alongside ablations over mining strategy, distance metrics, and inference choices to justify design decisions.
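
The batch-all mining step summarized above can be sketched as follows. This is a minimal, unoptimized illustration and not the authors' implementation: the function name, the Euclidean distance, the default margin of 1.0, and averaging over only the triplets with non-zero loss are all assumptions.

```python
import math


def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Mean hinge loss over every valid (anchor, positive, negative) triplet.

    Assumptions: Euclidean distance, and the mean is taken over "active"
    triplets only (those whose loss is greater than zero).
    """
    total, active = 0.0, 0
    n = len(labels)
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's class but is a different sample.
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = math.dist(embeddings[a], embeddings[p])
            for neg in range(n):
                # A negative comes from any other class.
                if labels[neg] == labels[a]:
                    continue
                d_an = math.dist(embeddings[a], embeddings[neg])
                loss = max(0.0, d_ap - d_an + margin)
                if loss > 0.0:
                    total += loss
                    active += 1
    return total / active if active else 0.0


if __name__ == "__main__":
    emb = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)]
    print(batch_all_triplet_loss(emb, labels=[0, 0, 1], margin=1.0))
```

A practical implementation would vectorize the triple loop over a pairwise distance matrix; the loop form is used here only to make the triplet validity rules explicit.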

 

(1) Please state the main research question more explicitly and operationally: is the paper’s core aim “few-shot intrusion detection with low false positives,” “few-shot multiclass attribution,” or “representation learning robust to imbalance,” and what are the primary deployment constraints you optimize for (e.g., FP rate bounds acceptable to a CSOC)? Right now the motivation spans several goals, but the study would benefit from a sharper, testable primary hypothesis.

 

(2) The evaluation procedure (Algorithm 1) uses a 50/50 train-test split, then forms 10 training subsets with NB=10,000 benign and NM∈{10,20,40,80,160} per malicious class plus 5-fold CV and random search; please provide enough implementation detail to replicate exactly: how the 50/50 split is stratified (per class, per day/session, per flow source), whether any leakage-prevention is applied (e.g., time-based splits for network flows), random seeds, and whether the same test set is shared across subsets or resampled each time.
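
One way to make the requested details explicit is sketched below: a per-class stratified 50/50 split followed by seeded subset draws from the training half only. The function names, the benign label 0, and the seeding scheme are hypothetical, and the manuscript's actual Algorithm 1 may differ (for instance, this sketch ignores time- or session-based splitting).

```python
import random
from collections import defaultdict


def stratified_half_split(labels, seed):
    """Split sample indices 50/50 within each class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        half = len(idxs) // 2
        train += idxs[:half]
        test += idxs[half:]
    return train, test


def draw_few_shot_subset(train_idx, labels, n_benign, n_malicious, seed,
                         benign_label=0):
    """Draw n_benign benign and n_malicious samples per malicious class.

    Only training indices are sampled, so the test half is never touched.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    subset = []
    for y in sorted(by_class):
        k = n_benign if y == benign_label else n_malicious
        subset += rng.sample(by_class[y], min(k, len(by_class[y])))
    return subset


if __name__ == "__main__":
    toy_labels = [0] * 100 + [1] * 40 + [2] * 40
    train, test = stratified_half_split(toy_labels, seed=0)
    subset = draw_few_shot_subset(train, toy_labels, n_benign=20,
                                  n_malicious=10, seed=1)
    print(len(train), len(test), len(subset))
```

With NB = 10,000 and NM in {10, 20, 40, 80, 160}, calling `draw_few_shot_subset` with ten different seeds would give ten training subsets that all share the same held-out test half.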

 

(3) Please justify the choice of NB=10,000 benign samples per subset and discuss sensitivity: does performance/FP rate change materially if NB is smaller/larger (since benign prevalence dominates real traffic)? Without this, it’s hard to tell whether results reflect a particular benign sampling regime rather than an intrinsic advantage of the method.

 

(4) The paper claims KNN “reintroduces skew” and helps maintain low FP rates, but this is presented mainly as an intuition (Eq. 7). Please add empirical validation: e.g., compare inference using (a) KNN on unbalanced reference set (your default), (b) KNN on balanced reference set, and (c) a calibrated classifier head trained on embeddings, reporting FP/precision/recall tradeoffs to substantiate the claim.
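
The intuition under test in comparison (a) versus (b) can be illustrated with a toy one-dimensional example. The embeddings, labels, and helper below are invented for illustration and say nothing about the manuscript's results; they only show how thinning the benign reference points can flip a borderline KNN vote.

```python
from collections import Counter


def knn_predict(query, reference, k=3):
    """Majority vote over the k reference points nearest to the query."""
    nearest = sorted(reference, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 1-D "embeddings": benign points are dense, malicious points are sparse.
benign = [(x, "benign") for x in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4)]
malicious = [(x, "malicious") for x in (1.5, 1.7)]

unbalanced_reference = benign + malicious     # keeps the benign-heavy skew
balanced_reference = benign[::4] + malicious  # benign thinned to two points

query = 1.3  # a borderline point near the class boundary

if __name__ == "__main__":
    print(knn_predict(query, unbalanced_reference))
    print(knn_predict(query, balanced_reference))
```

In this toy setup the same query is labelled benign against the unbalanced reference set and malicious against the balanced one, which is the kind of shift in false positives that the requested comparison would quantify on real data.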

 

(5) In binary classification, you compare to one-class SVM and autoencoder anomaly detectors and to RENOIR; however, it is unclear whether all baselines receive equally careful hyperparameter tuning under the same few-shot constraints, especially since anomaly detectors train only on benign and use malicious for validation. Please detail the search spaces, budgets, and selected hyperparameters for each baseline and ensure the tuning protocol is fair and consistent across methods.

 

(6) The references cited in the introduction are dated and insufficient, and the background description needs to cite more papers. The following papers should be cited: Few-Shot Image Classification Algorithm Based on Global–Local Feature Fusion; From Sample Poverty to Rich Feature Learning: A New Metric Learning Method for Few-Shot Classification

Author Response

Please see attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The revised manuscript demonstrates substantial improvements over the initial submission. The authors have made commendable efforts to address potential concerns regarding clarity, experimental rigor, and completeness. The additions and modifications significantly strengthen the paper’s contribution and its positioning within the field.

1. The phrase “K-Neared Neighbours” appears twice (e.g., lines 65-66, 373 in V2). It should be “K-Nearest Neighbours”.

2. The caption “Class No. … ClassName” is slightly ambiguous. Consider rephrasing, for example, to “Class No.: 0 = Benign; 1-4 = Malicious Attacks”.

3. In Section 4.3, the text states the triplet network was “marginally surpassed by RENOIR… with only 10 samples per class”. This aligns with Table 3 (RENOIR F1: 0.9876 vs. Triplet F1: 0.9870). The phrasing is acceptable, but be aware that this specific case exists when summarizing performance elsewhere.

4. In the new limitations paragraph (Section 6), the decision not to compare with meta-learning methods is noted as “future work.” It would be slightly more convincing to add a sentence explaining *why* it was not included in the current study, even briefly (e.g., “as their typical episodic training paradigm differs fundamentally from the proposed approach,” or “due to different training data assumptions”).

5. Ensure consistent hyphenation in compound adjectives (e.g., “class-balanced” is used, but verify “few-shot” is consistent throughout).

Comments on the Quality of English Language

The overall quality of the English language in the revised manuscript is good. The text is generally clear, logically structured, and appropriate for a scientific audience. The explanations in the newly added sections (e.g., Section 4.1 and 4.5) are particularly well-articulated.

However, a final careful proofreading pass is strongly recommended before publication to address several typographical errors and minor grammatical slips that were overlooked during the revision process. Specific examples include:

1. In the Introduction (line 66), “K-Neared Neighbours” should be corrected to “K-Nearest Neighbours”.

2. In Section 3.4 (line 384), “inference can be preformed” should be “inference can be performed”.

3. In Section 4.2 (line 486), “Feature enigneering” should be “Feature engineering”.

4. In Section 4.2 (line 521), “class-balaned” should be “class-balanced”.

5. In the list of contributions (point 3, line 80), “analysis is conduct to demonstrate” should be “analysis is conducted to demonstrate”.

6. In the Introduction (line 56), the comma in “The primary objective of this work, focuses on…” is unnecessary and disrupts the flow; it should be removed.

Author Response

Please see attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

accept

Author Response

We thank the reviewer for reviewing our manuscript and believe that their feedback has greatly improved its quality.
