Article
Peer-Review Record

Features Dimensionality Reduction Approaches for Machine Learning Based Network Intrusion Detection

Electronics 2019, 8(3), 322; https://doi.org/10.3390/electronics8030322
by Razan Abdulhammed 1, Hassan Musafer 1, Ali Alessa 1, Miad Faezipour 1,2,* and Abdelshakour Abuzneid 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 11 February 2019 / Revised: 4 March 2019 / Accepted: 11 March 2019 / Published: 14 March 2019
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report

The authors analyze two feature dimensionality reduction approaches (Auto-Encoder and PCA) for supporting Intrusion Detection Systems based on machine learning techniques.

The paper is well written and presented in general but there are some points to be clarified.

In particular, the related work section should include a table comparing the analyzed approaches and highlighting their pros and cons. In the fourth section, the authors should provide more details about parameter settings (i.e., lambda, rho, etc.). In the evaluation phase, the authors should provide more details about the kind of features (from 81 to 59, as described at line 322) involved in the reduction process.

I also suggest citing the following papers to underline the relevance of the topic:

1) Recognizing unexplained behavior in network traffic. In Network Science and Cybersecurity (pp. 39-62). Springer, New York, NY.

2) Machine learning based network intrusion detection. In Computational Intelligence and Applications (ICCIA), 2017 2nd IEEE International Conference on (pp. 79-83). IEEE.

Finally, a linguistic revision is necessary.

Author Response

We received valuable feedback from the respected reviewers, which significantly helped us improve the quality of our manuscript. For that, we would like to thank the editor, the associate/assistant editor, and the reviewers. In summary, the changes applied to the manuscript based on the reviewers’ comments are as follows:

· All the reviewers’ comments and suggestions have been carefully considered and addressed.

· A few references were added for better clarity.

· Explanations have been added/modified according to the reviewers’ comments and/or the authors’ observations.

· All the changes applied have been highlighted in the revised manuscript.

 

Now we explain the changes in detail per item. The items are categorized per reviewer. Our response to each item starts with R and is shown in boldface. 

 

We are truly grateful to have been given the chance to revise the paper according to the reviewers’ comments. In what follows, we have provided the reviewers’ comments and our responses regarding how we have addressed each of the comments. We have made our best efforts to address all comments. Thanks so much.

 

Reviewer #1:

Comments to the Author:

Q. The authors analyze two feature dimensionality reduction approaches (Auto-Encoder and PCA) for supporting Intrusion Detection Systems based on machine learning techniques. The paper is well written and presented in general but there are some points to be clarified.

R: Thank you very much. We are glad and grateful that the reviewer found our paper satisfactory.

 

Q. In particular, the related work section should include a table comparing the analyzed approaches and highlighting their pros and cons.

R: We thank the reviewer for this useful suggestion.

· One common dimensionality reduction approach is the Missing Value Ratio (MVR) approach. MVR is effective when the number of missing values is high. For the CICIDS2017 dataset, the number of missing values is near zero, so we excluded it. Other common approaches include Forward Feature Construction (FFC) and Backward Feature Elimination (BFE). Both FFC and BFE are prohibitively slow on high dimensional datasets, which is the case for CICIDS2017 (>2,500,500 instances). As a result, we did not discuss these approaches. The PCA technique, on the other hand, is relatively computationally efficient, can deal with large datasets, and is widely used as a linear dimensionality reduction approach. The auto-encoder dimensionality reduction approach is an instance of deep learning, which is also suitable for large datasets with high dimensional features and complex data representations.

 

·        Three additional references regarding dimensionality reduction techniques were added and cited in the revised manuscript:

·        Van Der Maaten, Laurens, Eric Postma, and Jaap Van den Herik. "Dimensionality reduction: a comparative review." J Mach Learn Res 10 (2009): 66-71.

·        Bertens, Paul. "Rank Ordered Autoencoders." arXiv preprint arXiv:1605.01749 (2016).

·        Silipo, Rosaria, et al. "Seven techniques for dimensionality reduction." KNIME (2014).

 

·        We also added Table 1 in the revised manuscript, which shows the properties of AE and PCA for dimensionality reduction along with an explanation.
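The Missing Value Ratio idea mentioned in this response can be illustrated with a minimal sketch. It assumes NumPy, NaN-coded missing values, and an illustrative threshold of 0.5; the function names and the toy data are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def missing_value_ratio(X):
    """Fraction of missing (NaN) entries in each column of X."""
    return np.isnan(X).mean(axis=0)

def drop_sparse_columns(X, threshold=0.5):
    """Keep only columns whose missing-value ratio is at or below threshold."""
    keep = missing_value_ratio(X) <= threshold
    return X[:, keep], keep

# Toy data (hypothetical): column 1 is mostly missing, the others are complete.
X = np.array([
    [1.0, np.nan, 3.0],
    [2.0, np.nan, 4.0],
    [3.0, 5.0,    5.0],
    [4.0, np.nan, 6.0],
])
ratios = missing_value_ratio(X)
X_reduced, kept = drop_sparse_columns(X, threshold=0.5)
```

When every ratio is close to zero, as the authors report for CICIDS2017, no column crosses the threshold and the method removes nothing, which is consistent with their decision to exclude it.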

Q. In the fourth section, the authors should provide more details about parameters setting (i.e. lambda ,rho etc.)

R: We thank the reviewer for the valuable comment.

In the computations, the weights are multiplied by λ to prevent the weights from growing too large.

 

The sparsity parameters and penalty are designed to restrict the activation of the hidden units, which reduces the dependency between the features.

 

We added a Design Principle Table in the revised manuscript for the parameters.
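The roles of λ and the sparsity parameter ρ can be sketched with a commonly used sparse auto-encoder cost. This is an assumption about the general form only: the record does not reproduce the manuscript's exact cost function, and the names `sparse_ae_cost`, `lam`, `rho`, and `beta`, the default values, and the toy data below are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_ae_cost(X, W1, b1, W2, b2, lam=1e-4, rho=0.05, beta=3.0):
    """Cost of a one-hidden-layer sparse auto-encoder on X (rows = samples)."""
    m = X.shape[0]
    H = sigmoid(X @ W1 + b1)        # hidden activations, shape (m, n_hidden)
    X_hat = sigmoid(H @ W2 + b2)    # reconstruction, shape (m, n_features)

    # Average squared reconstruction error.
    recon = 0.5 * np.sum((X_hat - X) ** 2) / m
    # Weight decay: lambda scales the squared weights to keep them small.
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # Sparsity penalty: KL divergence between the target activation rho
    # and the mean activation of each hidden unit.
    rho_hat = H.mean(axis=0)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + decay + beta * kl, (recon, decay, kl)

rng = np.random.default_rng(0)
X = rng.random((20, 8))             # toy data scaled to [0, 1]
W1 = rng.normal(scale=0.1, size=(8, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(4, 8))
b2 = np.zeros(8)
total, (recon, decay, kl) = sparse_ae_cost(X, W1, b1, W2, b2)
```

In this form, a larger `lam` penalizes large weights more strongly, while `rho` sets the average activation that each hidden unit is pushed toward.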


Q. In the evaluation phase, the authors should provide more details about the kind of features (from 81 to 59, as described at line 322) involved in the reduction process.

R: We thank the reviewer for the time spent in reviewing our work.

Here, the AE constructed a new, reduced feature representation that reflects the original data with minimum reconstruction error. In feature selection techniques, the resulting set is a subset of the original features, so each retained feature can be identified precisely. The AE, by contrast, generates new features with reduced dimensions, so the reduced features cannot be mapped one-to-one onto the original ones. These explanations were added to the revised manuscript.
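The difference between selecting features and generating new ones can be seen in a small sketch. It assumes NumPy, random toy data with the 81 and 59 dimensions mentioned by the reviewer, and an untrained encoder layer with hypothetical weights; it illustrates the shape of the idea and is not the authors' trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_original, n_reduced = 100, 81, 59
X = rng.normal(size=(n_samples, n_original))   # toy stand-in for traffic features

# Feature selection: keep a subset of the original columns.
selected_idx = np.arange(n_reduced)            # arbitrary illustrative choice
X_selected = X[:, selected_idx]

# Feature extraction (encoder-style): every new feature mixes all original ones.
W = rng.normal(scale=0.1, size=(n_original, n_reduced))  # untrained, hypothetical
b = np.zeros(n_reduced)
X_encoded = np.tanh(X @ W + b)

# A selected column is an original column; an encoded column is not.
is_original_column = np.array_equal(X_selected[:, 0], X[:, 0])
encoded_matches_any = any(
    np.allclose(X_encoded[:, j], X[:, i])
    for j in range(n_reduced)
    for i in range(n_original)
)
```

Both outputs have 59 columns, but only the selected columns can be named after original features, which is why a list of "which features were kept" does not exist for the encoded representation.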

 

Q. I also suggest citing the following papers to underline the relevance of the topic:

1) Recognizing unexplained behavior in network traffic. In Network Science and Cybersecurity (pp. 39-62). Springer, New York, NY.

2) Machine learning based network intrusion detection. In Computational Intelligence and Applications (ICCIA), 2017 2nd IEEE International Conference on (pp. 79-83). IEEE.

R: We thank the reviewer for this useful suggestion. The suggested references were included and cited in the revised manuscript as reference numbers [1] and [3].

Q. Finally, a linguistic revision is necessary.

R: The manuscript has been checked by a native English speaker as well as by the academic resource center, which is a professional English editing service. Minor corrections were made. Thanks.


Reviewer 2 Report

The submission describes a study of the impact of various feature dimensionality reduction approaches applied to machine-learning-based intrusion detection. In general terms, the proposal makes a strong contribution, but the authors must address the following suggestions prior to publication:

Overall, the paper seems more like a technical report than a scientific paper. Note that the submission describes the results of applying well-known feature dimensionality reduction methods to a well-known dataset (CICIDS2017). The paper is highly descriptive, but lacks a proper scientific analytical background: design principles, null/alternative hypotheses, algorithm selection criteria, preliminary assumptions/requirements, limitations of the research, etc.

The experimentation focuses on comparing results based on a pair of quality indicators: accuracy and performance. They are undoubtedly relevant, but other significant features (especially bearing in mind the heterogeneity inherent in emerging communication environments) were not considered. Adding some analytical study of strengthening against adversarial attacks, adaptation to non-stationary/non-linear traffic, ease of dataset acquisition/model building, Quality of Experience, fault tolerance, resilience, etc. would enhance the impact of the publication.


Author Response


Reviewer #2:

Comments to the Author:

Q. The submission describes a study of the impact of various feature dimensionality reduction approaches applied to machine-learning-based intrusion detection. In general terms, the proposal makes a strong contribution, but the authors must address the following suggestions prior to publication:

R: Thank you very much. We are glad and grateful that the reviewer believes our paper displays a strong contribution.

Q. Overall, the paper seems more like a technical report than a scientific paper. Note that the submission describes the results of applying well-known feature dimensionality reduction methods to a well-known dataset (CICIDS2017). The paper is highly descriptive, but lacks a proper scientific analytical background: design principles, null/alternative hypotheses, algorithm selection criteria, preliminary assumptions/requirements, limitations of the research, etc.

R: We thank the reviewer for this useful suggestion. We understand the reviewer’s concern. The key contributions and novelty appear in the Introduction. We have also carefully revised the manuscript to reflect the reviewer’s points. We have provided a brief analysis of the resilience capabilities of the proposed system against variant attacks as well.

Design Principles

We have added Table 5 in the revised manuscript. This Table explains the design principles for the auto-encoder. More explanations have also been added in the revised manuscript.

Additional references included in the revised manuscript for the above explanations are as follows.

[2] Abdulhammed, R.; Faezipour, M.; Elleithy, K. Intrusion Detection in Self-organizing Network: A Survey. In Intrusion Detection and Prevention for Mobile Ecosystems; Kambourakis, G.; Shabtai, A.; Kolias, C.; Damopoulos, D., Eds.; CRC Press, Taylor & Francis Group: New York, 2017; Chapter 13, pp. 393–449.

[7] Silipo, R.; Adae, I.; Hart, A.; Berthold, M. Seven Techniques for Dimensionality Reduction. KNIME, 2014.

[8] Van Der Maaten, L.; Postma, E.; Van den Herik, J. Dimensionality reduction: a comparative review. J Mach Learn Res 2009, 10, 66–71.

[9] Bertens, P. Rank Ordered Autoencoders. arXiv preprint arXiv:1605.01749, 2016.

 

Preliminary Assumptions/Requirements and Null/Alternative Hypothesis

We have added a new subsection in the revised manuscript titled “Preliminary Assumptions and Requirements”. In this subsection, we provided the assumptions, requirements and our hypotheses.

Our main hypothesis is that a reduced-dimension feature representation in a machine learning based IDS will reduce time and memory complexity compared to the original feature dimensions, while still maintaining high performance (that is, without negatively impacting the achieved accuracy).

Our second hypothesis is that the proposed balancing technique improves the data representation of imbalanced classes and thus improves classification performance compared to the original class distributions.

The results and discussion support both hypotheses.

Algorithm Selection Criteria

We have added/retitled a new section in the revised manuscript titled “Dimensionality Reduction Approaches Selection Criteria and Related work”.

We added the selection process of the related work as follows:

This section reviews related work published in recent years that used feature dimensionality reduction approaches to design an intrusion detection system. The selection process was based on the following criteria:

1. Being relevant to the CICIDS2017 dataset.

2. Being relevant to dimensionality reduction approaches; precisely, the auto-encoder and the PCA.

3. Being relevant to machine learning based intrusion detection.

Furthermore, we added the following explanation regarding the algorithm selection criteria in the revised manuscript:

For decades, researchers have used dimensionality reduction approaches [5,6] for different reasons, such as reducing computational processing overhead, reducing noise in the data, and improving data visualization and interpretation. One common dimensionality reduction approach is the Missing Value Ratio (MVR) approach [7]. MVR is effective when the number of missing values is high. For the CICIDS2017 dataset, the number of missing values is near zero, so we excluded it. Other common approaches include Forward Feature Construction (FFC) and Backward Feature Elimination (BFE) [7]. Both FFC and BFE are prohibitively slow on high dimensional datasets, which is the case for CICIDS2017 (>2,500,500 instances). As a result, we did not discuss these approaches. The PCA technique, on the other hand, is relatively computationally efficient, can deal with large datasets, and is widely used as a linear dimensionality reduction approach [5,8]. The auto-encoder dimensionality reduction approach is an instance of deep learning, which is also suitable for large datasets with high dimensional features and complex data representations [9].

We also added Table 1 along with an explanation in the revised manuscript, showing the properties of AE and PCA for dimensionality reduction.
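As a rough illustration of why PCA is described here as a linear and comparatively cheap reduction, the following sketch computes it with a single SVD. It assumes NumPy and synthetic data; the helper names `pca_fit` and `pca_transform` are hypothetical, and the manuscript's actual PCA configuration is not given in this record.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean, principal axes, and explained-variance ratio of X."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (X.shape[0] - 1)
    ratio = var / var.sum()
    return mean, Vt[:n_components], ratio[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the retained principal axes (a purely linear map)."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
# Toy data: 200 samples, 10 features driven by 3 latent factors plus small noise.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.01 * rng.normal(size=(200, 10))
mean, components, ratio = pca_fit(X, n_components=3)
Z = pca_transform(X, mean, components)
```

Because the projection is a matrix product, transforming new traffic records costs one multiplication per record, whereas an auto-encoder applies non-linear layers and can capture structure that a linear map cannot.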

Limitations of the research

We have added a new section titled “Challenges and Limitations” in the revised manuscript.

Q. The experimentation focuses on comparing results based on a pair of quality indicators: accuracy and performance. They are undoubtedly relevant, but other significant features (especially bearing in mind the heterogeneity inherent in emerging communication environments) were not considered. Adding some analytical study of strengthening against adversarial attacks, adaptation to non-stationary/non-linear traffic, ease of dataset acquisition/model building, Quality of Experience, fault tolerance, resilience, etc. would enhance the impact of the publication.

R: We thank the reviewer for this valuable comment. At the moment, a full-fledged implementation of these suggestions is not possible because the current study was not specifically designed to evaluate factors related to fault tolerance and Quality of Experience. In the future, we intend to extend our work with improved designs and tactics that take these suggestions into account.

For the time being, we have added some details regarding the above points in the new “Challenges and Limitations” section of the revised manuscript.

Challenges and Limitations

Although this study has successfully demonstrated the significance of feature dimensionality reduction techniques, which led to better results in terms of several performance metrics as well as classification speed for an IDS, it has certain limitations and challenges, which are summarized as follows:

Fault Tolerance

Fault tolerance enables a system to continue operating properly in the event of failures or faults within any of its components. Fault tolerance can be achieved through several techniques. One aspect of fault tolerance in our system is the ability of the designed approach to detect a large set of well-known attacks. Our models have been trained to detect 14 up-to-date, well-known types of attacks. Furthermore, fault tolerance can be achieved by adopting the majority voting technique [52]. The trained Random Forest, Bayesian Network, and LDA models can be combined in a majority-voting-based intrusion detection system that provides fault tolerance. Moreover, deploying distributed intrusion detection systems in the network can enable fault tolerance.
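The majority voting idea can be sketched as follows. This is a minimal illustration assuming NumPy and integer class labels; the three prediction vectors are hypothetical stand-ins for the outputs of the Random Forest, Bayesian Network, and LDA models and are not results from the manuscript.

```python
import numpy as np

def majority_vote(predictions):
    """Combine per-model label predictions (n_models x n_samples) by majority."""
    predictions = np.asarray(predictions)
    voted = []
    for column in predictions.T:
        labels, counts = np.unique(column, return_counts=True)
        # np.unique sorts labels, so ties resolve to the smallest label.
        voted.append(labels[np.argmax(counts)])
    return np.array(voted)

# Hypothetical outputs of three trained classifiers (0 = benign, 1 = attack).
rf_pred = [0, 1, 1, 0, 1]
bn_pred = [0, 1, 0, 0, 1]
lda_pred = [1, 1, 1, 0, 0]
ensemble = majority_vote([rf_pred, bn_pred, lda_pred])
```

With three voters, a single model producing a wrong label on a given record is outvoted by the other two, which is the sense in which the ensemble tolerates the failure of one component.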

 

Adaptation to Non-stationary Traffic

The AE has the ability to represent both linear and non-linear models. Moreover, once the model is trained, it can be utilized for non-stationary traffic. We intend to further extend our work in the future with an online anomaly-based intrusion detection system.

Model Resilience

As presented in Tables 12 and 13, the achieved FP rates are 0.010 and 0.001, respectively, which may reflect a built-in attack resiliency. Moreover, our models were trained in an offline manner. This ensures that an adversary cannot inject misclassified instances during the training phase. By contrast, such a case could occur with online-trained models. Therefore, it is essential for the machine learning system employed in intrusion detection to be resilient to adversarial attacks [53]. An approach to quantify the resilience of machine learning classifiers was introduced in [53]. The association of these factors will be investigated in future studies.
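For reference, the FP rate quoted above is conventionally computed as FP / (FP + TN). The sketch below uses invented label counts, chosen only so that the result has the same magnitude as the first reported value; it is not the data behind Tables 12 and 13, and the function name is hypothetical.

```python
def false_positive_rate(y_true, y_pred, positive=1):
    """FP / (FP + TN): share of benign records wrongly flagged as attacks."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0

# Hypothetical labels: 1000 benign records (0) followed by 50 attacks (1).
y_true = [0] * 1000 + [1] * 50
# The classifier wrongly flags 10 benign records and catches all attacks.
y_pred = [1] * 10 + [0] * 990 + [1] * 50
fpr = false_positive_rate(y_true, y_pred)
```

Here 10 of 1000 benign records are flagged, giving a rate of 0.010; the metric says nothing about missed attacks, so it is usually read alongside recall or the detection rate.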

Ease of Dataset Acquisition/Model Building

The data used for our IDS model was acquired from the CICIDS2017 dataset which is a publicly available dataset provided by the Canadian Institute for Cybersecurity [35,36]. The dataset is open source and available for download and sharing.

 

Quality of Experience

According to [54], the Quality of Experience is used to measure and express, preferably as numerical values, the experience and perception of the users with a service or application software. The current research was not specifically designed to evaluate factors related to Quality of Experience. Future directions of this research may include such investigations.

 

Additional references included in the revised manuscript for the above explanations are as follows.

[52] Kaur, Perminder, Dhavleesh Rattan, and Amit Kumar Bhardwaj. "An analysis of mechanisms for making ids fault tolerant." International Journal of Computer Applications 1.24 (2010): 22-25.

[53] E. Viegas, A. Santin, N. Neves, A. Bessani and V. Abreu, "A Resilient Stream Learning Intrusion Detection Mechanism for Real-Time Analysis of Network Traffic," GLOBECOM 2017 - 2017 IEEE Global Communications Conference, Singapore, 2017, pp. 1-6. doi: 10.1109/GLOCOM.2017.8254495.

[54] Al-Shehri, Salman M., et al. "Common Metrics for Analyzing, Developing and Managing Telecommunication Networks." arXiv preprint arXiv:1707.03290 (2017).


Round 2

Reviewer 2 Report

All the suggestions of the reviewer have been successfully addressed, so this reviewer considers that the submission is ready to be accepted.
