Peer-Review Record

A Novel Framework for Generating Personalized Network Datasets for NIDS Based on Traffic Aggregation

Sensors 2022, 22(5), 1847; https://doi.org/10.3390/s22051847
by Pablo Velarde-Alvarado 1,†, Hugo Gonzalez 2,†, Rafael Martínez-Peláez 3, Luis J. Mena 4,*, Alberto Ochoa-Brust 5, Efraín Moreno-García 6, Vanessa G. Félix 4 and Rodolfo Ostos 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 30 December 2021 / Revised: 27 January 2022 / Accepted: 6 February 2022 / Published: 26 February 2022
(This article belongs to the Collection Cyber Situational Awareness in Computer Networks)

Round 1

Reviewer 1 Report

This paper presents a method for generating personalized network datasets and for traffic classification.

Major issues:

I read the source code of the paper and found several major issues.

use "esc-01-noIPs-paper.ipynb" as an example

(1) Cell [33], feature selection

This cell uses both the training and the test data for feature selection, which is wrong: test data should never be used for feature or model selection (a leakage-free pattern is sketched after this list).

(2) Grid search with logistic regression

Again, this cell uses both the training and the test data, this time for model selection, which is wrong: test data should never be used for feature or model selection.

(3) To compare different models, there should be a single held-out test set shared by all of them.
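
The fix for issues (1) and (2) is to fit any feature selector on the training split only and merely apply the fitted transform to the test split. A minimal sketch of this pattern, assuming scikit-learn; the synthetic data and the SelectKBest/k=10 choices are illustrative placeholders, not the paper's actual pipeline:

# Hold out the test set first; fit feature selection on the training split only.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

selector = SelectKBest(score_func=f_classif, k=10)
selector.fit(X_train, y_train)           # selection scores computed on training data only
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)  # test data is transformed, never fitted on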

Two standard approaches for a machine learning study (both sketched in code after this list):

(1) Cross-validation based

Split the whole dataset into a train set and a test set.

Run the grid search only on the training set, using cross-validation.

Once the best model is found, evaluate it on the test set.

(2) Train-validation based

Split the whole dataset into a train set and a test set.

Split the train set into a "pure" train set and a validation set.

Run the grid search on the "pure" training set and measure performance on the validation set; the best model is the one with the best validation performance.

Once the best model is found, evaluate it on the test set.
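
A minimal sketch of both workflows, assuming scikit-learn; the synthetic data, the parameter grid, and the use of SelectKBest inside the pipeline are illustrative placeholders:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=1000, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Approach (1): cross-validation based. Feature selection lives inside the
# pipeline, so each CV fold refits it on that fold's training portion only.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(
    pipe,
    param_grid={"select__k": [5, 10, 20], "clf__C": [0.01, 0.1, 1, 10]},
    cv=5,
)
grid.fit(X_train, y_train)            # grid search touches training data only
print("test accuracy:", grid.score(X_test, y_test))

# Approach (2): train-validation based. Candidates are compared on the
# validation set; the test set is reserved for the final evaluation.
X_pure, X_val, y_pure, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0, stratify=y_train)
best_score, best_model = -1.0, None
for C in [0.01, 0.1, 1, 10]:
    model = Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=10)),
        ("clf", LogisticRegression(C=C, max_iter=1000)),
    ]).fit(X_pure, y_pure)            # fit on the "pure" training set
    score = model.score(X_val, y_val)
    if score > best_score:
        best_score, best_model = score, model
print("test accuracy:", best_model.score(X_test, y_test))

In both workflows the test set is touched exactly once, for the final evaluation, which also gives the single shared test set required by issue (3).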

Minor issue:

line 42 "Recently, new dynamic and synthetic datasets based on specific profiles have been developed"  please cite the papers of these datasets


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

I have the following recommendations for improving the paper.

1. Add the motivation for the proposed approach.
2. Literature related to the proposed scheme and published in the last three years should be added (at least three more references).
3. State the research gap, i.e., the limitations of the existing literature and how the authors overcome them.
4. A problem statement needs to be written that clearly identifies which problem is the focus of the study.
5. Research contributions that clearly describe the novelty of the proposed solution should be added as bullets or in numbered form (1, 2, 3, ...).
6. Mention some future directions and limitations of the proposed scheme in the conclusions section.
7. List all acronyms in a table at the end of the manuscript.
8. The paper should be checked for English spelling and minor grammatical mistakes.
9. Improve the figures with different fills/colors to clearly distinguish one scheme from the others.

Author Response

Please see the attachment.


Author Response File: Author Response.pdf

Reviewer 3 Report

The authors have performed a very nice study of a network intrusion detection problem in which the available data for analysis is scarce. This is a quite relevant issue, and the work should be published. However, I recommend that the authors consider the points raised below and carefully revise their manuscript.

Minor points:

(1) Table 4 should appear before Algorithm 3, because it is cited in that order. However, this can be done at the editorial office.

(2) Table 6 is cited in the text long after it is shown. Please either move Table 6 down to where it is cited, or cite it before it appears; it is difficult to read this way.

(3) Line 63: Maybe use the notation 100 MB if you are referring to megabytes.

(4) Line 91: 'The rest of the paper is...'

(5) Line 168: 'the' instead of 'teh'.

(6) Line 302: '...to a target function,...'

(7) Line 307: a closing parenthesis is missing for 'H={...}'

(8) Line 312: insert a comma after e: '...or error, e, between...'

(9) Line 319: You can join line 319 to line 318 (i.e., no break), which looks nicer and keeps the sense of the sentence.

(10) Line 420: Are p_c and r_c the harmonic means of p_i and r_i? If yes, then write: '...to the harmonic means, p_c and r_c, respectively.' Please define them explicitly to avoid confusion.

(11) Caption to Fig. 7: Add the parameter r to the caption, for instance: '...Pearson Correlation Coefficient, r'.

(12) Line 471: change 'obervate' into 'observe'.

(13) A list of abbreviations used should be provided at the end of the paper for convenience of the reader.


Reference to be added to Section 2:

D.Q. Le, T. Jeong, H.E. Roman, and J.W.K. Hong. Traffic dispersion graph based anomaly detection. In Proceedings of the Second Symposium on Information and Communication Technology (SoICT '11); Association for Computing Machinery: New York, NY, USA, 2011; pp. 36–41. DOI: https://doi.org/10.1145/2069216.2069227

This reference is related to DoS attacks and how to detect them; it would complement the discussion in Section 2.


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Good revision.
