Peer-Review Record

IoT Dataset Validation Using Machine Learning Techniques for Traffic Anomaly Detection

Electronics 2021, 10(22), 2857; https://doi.org/10.3390/electronics10222857

by Laura Vigoya *, Diego Fernandez, Victor Carneiro and Francisco J. Nóvoa

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 29 June 2021 / Revised: 9 November 2021 / Accepted: 16 November 2021 / Published: 19 November 2021

(This article belongs to the Special Issue Sensor Network Technologies and Applications with Wireless Sensor Devices)

Round 1

Reviewer 1 Report

It is vital to detect anomalous workloads in IoT networks. This paper presents an optimized Random Forest classifier that detects traffic anomalies using techniques such as SMOTE, RFE, and the ERDE metric. I have some concerns about the proposed work.

1) The proposed features of this work do not seem novel for machine learning applications. For example, ERDE has been widely used for early anomaly detection in network IDSs, as the authors describe in Line 422. This paper merely adopts ERDE for early anomaly detection in IoT network applications. Moreover, the use of SMOTE, RFE, and the MQTT-IoT modeling is not sufficiently novel.

2) More experiments comparing the approach with other related work are required to support the authors' points. In addition, classifiers built with different machine learning algorithms are required to demonstrate the effectiveness and high precision of the proposed Random Forest classifier.
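For readers unfamiliar with the ERDE metric mentioned in point 1, a minimal sketch of its usual definition follows. It assumes the standard formulation of the early risk detection error, in which a correct alarm raised after k observations is penalized by a sigmoid latency factor centered at o. The function names and the cost values c_fp, c_fn and c_tp are illustrative assumptions; they are not taken from the manuscript under review.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """Latency factor lc_o(k) = 1 - 1 / (1 + exp(k - o)).

    Written in a numerically stable form so large |k - o| does not overflow.
    """
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def erde(decision: bool, truth: bool, k: int, o: int,
         c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Error for one decision made after seeing k observations.

    decision: True if the system raised an alarm (predicted anomaly).
    truth:    True if the sample really is an anomaly.
    The cost values are assumptions chosen by the evaluator.
    """
    if decision and not truth:      # false positive
        return c_fp
    if not decision and truth:      # false negative
        return c_fn
    if decision and truth:          # true positive, penalized by delay
        return latency_cost(k, o) * c_tp
    return 0.0                      # true negative


if __name__ == "__main__":
    # A correct alarm raised exactly at k == o costs half of c_tp.
    print(erde(True, True, k=5, o=5, c_fp=0.1))
```

The overall ERDE score is then the mean of these per-decision errors, so a lower value indicates earlier and more accurate detection.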

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

First of all, congratulations on submitting the paper. Comments that could improve the paper are given below:

1) At the end of the abstract, the obtained results, such as accuracy and other important measures (values), could be presented. This would help readers see at first glance what was obtained and how it relates to what they are looking for. The abbreviation IoT should be written out in full at least once, for readers who are less familiar with it.
2) The keywords need to be re-checked: sometimes a comma is used to separate them, sometimes a semicolon. Capitalizing the term machine learning, or other terms, makes no sense.
3) Lines 52-57 would be better placed elsewhere (for example, before the experiments), and in my opinion the argument for using Random Forest is too weak. In this field, there are many deep learning solutions that have proven efficient.
4) In my opinion, Sections 1.1, 1.2 and 1.3 could simply be part of the Introduction; subsections are not needed.
5) I have to commend the authors for a really thorough analysis of the literature, well done.
6) I don't think it is enough to write that details about the dataset are given in a reference (line 231). It is hard to understand the data just from reading this paper, so at least a short description should be given: the size of the dataset, the classes, the imbalance in the class distribution, etc.
7) I do not see why the abbreviations in lines 243-244 are in bold.
8) There is no information about the dataset before and after SMOTE. Also, how was the number of neighbors in the SMOTE method (one of the SMOTE parameters) selected?
9) The Kappa and accuracy measures are used, so at least the main formulas, showing what these measures represent and how they are calculated, could be given.
10) In lines 464 and 475, I don't understand what (a) and (b) refer to, because the Figures have no such labels, just one picture. Something is wrong.
11) I understood that the authors mostly used Random Forest with its default settings, so it would still be interesting to see how effective other algorithms are under the same conditions. From personal experience, I can guarantee that other tree-based methods would give almost the same accuracy. Reaching 100% is impossible, but getting worse results with methods such as K-NN is very easy. In my opinion, the authors need to explain very clearly why Random Forest is the best choice in this situation, because no comparative analysis with other methods has been made.
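As an illustration of point 9, the two measures can be computed from a confusion matrix as sketched below. This is a minimal sketch under the usual definitions: accuracy is the observed agreement p_o (the fraction of samples on the diagonal), and Cohen's Kappa is (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from the row and column totals. The function name and the example matrix are hypothetical and do not come from the manuscript.

```python
from typing import List, Tuple


def accuracy_and_kappa(confusion: List[List[int]]) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement p_o: share of samples on the diagonal.
    p_o = sum(confusion[i][i] for i in range(size)) / total

    # Chance agreement p_e: sum over classes of
    # (row total / N) * (column total / N).
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(size))
                  for j in range(size)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (total ** 2)

    if p_e == 1.0:
        # All samples fall in one class for both truth and prediction;
        # kappa is undefined in that case.
        return p_o, float("nan")

    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


if __name__ == "__main__":
    # Hypothetical two-class result: rows are true classes, columns predictions.
    acc, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
    print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Reporting Kappa alongside accuracy matters for imbalanced traffic data, because a classifier that always predicts the majority class can reach high accuracy while its Kappa stays at zero.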

Good luck with submitting the manuscript.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have made lots of modifications during the revision. Most of the concerns have been addressed. The writing of the paper still needs to be carefully checked.

E.g., in Table 6, 1.1226e-03 and 8.3510e-3 should use the same format.
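One way to avoid this kind of inconsistency is to produce every table value with a single format specification instead of typing the numbers by hand. The sketch below uses Python's built-in format mini-language; the helper name format_sci and the choice of four decimal digits are assumptions made to match the first value quoted above.

```python
def format_sci(value: float, digits: int = 4) -> str:
    """Format a number in scientific notation with a fixed number of
    mantissa decimals. Python's 'e' presentation type always writes
    at least two exponent digits."""
    return f"{value:.{digits}e}"


if __name__ == "__main__":
    # The two values quoted from Table 6, rendered with the same format.
    for raw in (1.1226e-03, 8.3510e-3):
        print(format_sci(raw))
```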

Author Response

Response to Reviewer 1 Comments

Point 1: The authors have made lots of modifications during the revision. Most of the concerns have been addressed. The writing of the paper still needs to be carefully checked.

E.g., in Table 6, 1.1226e-03 and 8.3510e-3 should use the same format.

Response 1: We have tried to check the formatting of all tables and figures and to make it as consistent as possible. See lines 539, 544, 550, 562, 576, 583, 586, 603 and 612.

Reviewer 2 Report

Thank you for taking the suggestion into account. There are some minor mistakes left, for example, the references are not formatted in the same style. But I hope it will be fixed. Good luck.

Author Response

Response to Reviewer 2 Comments

Point 1: Thank you for taking the suggestion into account. There are some minor mistakes left, for example, the references are not formatted in the same style. But I hope it will be fixed. Good luck. 

Response 1: We have tried to check all the references and to put them in the same format as far as possible. Thank you.
