Peer-Review Record

On the Improvement of the Isolation Forest Algorithm for Outlier Detection with Streaming Data

Electronics 2021, 10(13), 1534; https://doi.org/10.3390/electronics10131534
by Michael Heigl 1,2,*, Kumar Ashutosh Anand 2, Andreas Urmann 2, Dalibor Fiala 1, Martin Schramm 2 and Robert Hable 2
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 18 May 2021 / Revised: 16 June 2021 / Accepted: 21 June 2021 / Published: 24 June 2021
(This article belongs to the Special Issue Design of Intelligent Intrusion Detection Systems)

Round 1

Reviewer 1 Report

This work introduces a new framework, PCB-iForest.

Overall, the given work looks good.

- The authors should add more details about drift detection, probably in the introduction.

- Mention the limitations of the PCB-iForest framework.

- Ref. [61] and Figure 5 do not appear to be referenced in the manuscript.

- The conclusion is too long; try to shorten it.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This manuscript proposes an anomaly detection approach and describes it in detail. It provides a comprehensive description of existing methods, design considerations, datasets, algorithm details and experimental configurations. However, the analysis of the experimental results contains some unconvincing arguments and unclear statements. Some modifications are required to make the discussion more convincing.

1. Line 705 states that "the higher the parameters n_sample, r and n_dim, the more false detections take place". However, the number of false detections is 15 (Figure 2a) vs. 19 (Figure 2b); given the nature of such experiments, this difference may simply reflect random fluctuations. Since some of the following discussion is based on this statement, it needs to be strengthened.

2. Line 773 states that "PCB-iForestIBFS only marginally performs worse compared to PCB-iForestEIF in most of the cases and outperforms it on dataset". However, this conclusion is not easy to verify from Table 5, because some other algorithms, such as iMForest, also perform decently. Adding a row with an overall evaluation across all 15 datasets (e.g., a weighted sum of the metric or the average rank of each algorithm) would better guide the reader; a sketch of such a rank-based aggregation is given after this list.

3. The captions "(PCB-iForestEIF - PCBEIF, PCB-iForestIBFS - PCBIBFS)" in Figures 3 and 4 are ambiguous and should be rephrased.

4. In Figures 3 and 4, in addition to the names of the datasets, it would be better to include their IDs so that the reader can compare them more easily with Table 5.

5. When comparing avg_t and F1/avg_t, the manuscript provides exemplary results (Figures 3 and 4). However, these results are based on four DIFFERENT datasets, which makes the comparison confusing. It would be better to provide a complete table like Table 5, so that the reader can make a fair comparison and see whether the algorithms behave consistently across datasets.

6. Line 798 states: "Except for the datasets with ID 10-13 and 15, an applied feature subset yielded the best F1/avg_t on all other datasets referring to Table 6." It is not fair to exclude datasets 10-13 and 15, since they make up a third of all datasets and there are four cases (columns). In addition, the data in Table 6 appear to contain some random variation. The argument would be more convincing if the authors used a statistical method to rule out the possibility that feature subsets perform worse than the full dimensionality; a paired-test sketch along these lines is given after this list.

7. The discussion in Lines 801-805 does not fully explain why "having a large amount of features, feature selection significantly supports achieving better results in terms of a tradeoff between classification and computational performance."
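
Regarding comment 2, a minimal sketch of the suggested overall-evaluation row is given below. It computes the average rank of each algorithm across datasets; the algorithm names are taken from the review, but the F1 values and dataset labels are purely hypothetical placeholders, not results from the manuscript.

import pandas as pd

# Hypothetical F1 scores per dataset (rows) and algorithm (columns);
# the values are placeholders, not results taken from the manuscript.
f1_scores = pd.DataFrame(
    {
        "PCB-iForestEIF": [0.91, 0.84, 0.77],
        "PCB-iForestIBFS": [0.89, 0.85, 0.79],
        "iMForest": [0.88, 0.80, 0.81],
    },
    index=["dataset_1", "dataset_2", "dataset_3"],
)

# Rank the algorithms per dataset (1 = best F1), then average the ranks
# over all datasets to obtain a single overall-evaluation value.
ranks = f1_scores.rank(axis=1, ascending=False)
overall = ranks.mean(axis=0).sort_values()
print(overall)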
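
Regarding comment 6, one possible statistical check is a one-sided Wilcoxon signed-rank test on the paired per-dataset F1/avg_t values with and without feature selection. The sketch below assumes SciPy is available and uses invented placeholder numbers only.

from scipy.stats import wilcoxon

# Hypothetical paired F1/avg_t values per dataset: full feature space vs.
# an applied feature subset. The numbers are placeholders only.
full_dims = [0.42, 0.37, 0.55, 0.61, 0.48, 0.39, 0.52, 0.44]
subset = [0.47, 0.36, 0.60, 0.66, 0.50, 0.41, 0.58, 0.43]

# One-sided test: does the feature subset yield a significantly higher
# F1/avg_t than the full dimensionality across datasets?
stat, p_value = wilcoxon(subset, full_dims, alternative="greater")
print(f"statistic={stat:.2f}, p-value={p_value:.4f}")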

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors have addressed the comments well.
