Features Dimensionality Reduction Approaches for Machine Learning Based Network Intrusion Detection

The security of networked systems has become a critical universal issue that influences individuals, enterprises and governments. The rate of attacks against networked systems has increased dramatically, and the tactics used by the attackers are continuing to evolve. Intrusion detection is one of the solutions against these attacks. A common and effective approach for designing Intrusion Detection Systems (IDS) is Machine Learning. The performance of an IDS is significantly improved when the features are more discriminative and representative. This study uses two feature dimensionality reduction approaches: (i) Auto-Encoder (AE), an instance of deep learning, and (ii) Principal Component Analysis (PCA). The resulting low-dimensional features from both techniques are then used to build various classifiers such as Random Forest (RF), Bayesian Network, Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) for designing an IDS. The experimental findings with low-dimensional features in binary and multi-class classification show better performance in terms of Detection Rate (DR), F-Measure, False Alarm Rate (FAR), and Accuracy. This research effort is able to reduce the CICIDS2017 dataset’s feature dimensions from 81 to 10, while maintaining a high accuracy of 99.6% in multi-class and binary classification. Furthermore, in this paper, we propose a Multi-Class Combined performance metric CombinedMc with respect to class distribution to compare various multi-class and binary classification systems through incorporating FAR, DR, Accuracy, and class distribution parameters. In addition, we developed a uniform distribution based balancing approach to handle the imbalanced distribution of the minority class instances in the CICIDS2017 network intrusion dataset.

Q. Overall, the paper seems more like a technical report than a scientific paper. Note that the submission describes the results of applying well-known feature dimensionality reduction methods to a well-known dataset (CICIDS2017). The paper is highly descriptive but lacks a proper scientific analytical background: design principles, null/alternative hypotheses, algorithm selection criteria, preliminary assumptions/requirements, limitations of the research, etc.

R: We thank the reviewer for this useful suggestion and understand the reviewer's concern. The key contributions and novelty appear in the Introduction. We have also carefully revised the manuscript to reflect the reviewer's points. We have provided a brief analysis of the resilience capabilities of the proposed system against variant attacks as well.

Design Principles
We have added a new subsection in the revised manuscript titled "Preliminary Assumptions and Requirements". In this subsection, we provide the assumptions, requirements, and our hypotheses.
Our main hypothesis is that a reduced-dimension feature representation in a machine learning based IDS will reduce time and memory complexity compared to the original feature dimensions, while still maintaining high performance (that is, without negatively impacting the achieved accuracy).
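To make the memory side of this hypothesis concrete, the following is a minimal sketch of PCA-based reduction from 81 to 10 features. It is an illustration only, not the code used in the manuscript: the data are synthetic random values standing in for CICIDS2017 flows, the helper names `pca_fit` and `pca_transform` are hypothetical, and NumPy is assumed to be available. Real features would normally be scaled before PCA.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return (mean, components); components has shape (n_components, n_features)."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project samples onto the retained principal directions."""
    return (X - mean) @ components.T

# Synthetic stand-in for the 81 original features (assumption: NOT real CICIDS2017 data).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 81))

mean, components = pca_fit(X, n_components=10)
Z = pca_transform(X, mean, components)

# Storage needed for the feature matrix before and after reduction.
reduction_factor = X.nbytes / Z.nbytes
print(Z.shape, reduction_factor)
```

With the same numeric type, the reduced matrix needs 10/81 of the original storage, and any classifier whose cost grows with the number of features is trained and evaluated on 10 columns instead of 81. Whether accuracy is preserved is an empirical question that this sketch does not address.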

A second hypothesis is that the proposed balancing technique improves the data representation of imbalanced classes and thus improves classification performance compared to the original class distribution.
The results and discussion support both hypotheses.
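This letter does not spell out the balancing procedure itself, so the following is only a generic sketch of resampling toward a uniform class distribution, not the algorithm proposed in the manuscript. The function name `balance_uniform`, the target count `per_class`, and the toy labels are assumptions made for illustration; only the Python standard library is used.

```python
import random
from collections import Counter, defaultdict

def balance_uniform(samples, labels, per_class, seed=0):
    """Resample so that every class ends up with exactly `per_class` instances.

    Majority classes are undersampled without replacement; minority classes
    keep all their instances and are topped up by sampling with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, items in by_class.items():
        if len(items) >= per_class:
            chosen = rng.sample(items, per_class)
        else:
            chosen = items + rng.choices(items, k=per_class - len(items))
        out_x.extend(chosen)
        out_y.extend([y] * per_class)
    return out_x, out_y

# Toy, heavily imbalanced label set (illustrative values only).
labels = ["BENIGN"] * 90 + ["Bot"] * 8 + ["Infiltration"] * 2
samples = list(range(len(labels)))

balanced_x, balanced_y = balance_uniform(samples, labels, per_class=20)
class_counts = Counter(balanced_y)
print(class_counts)
```

Resampling of this kind should be applied to the training split only, so that duplicated minority instances do not leak into the test data.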

Algorithm Selection Criteria
We have added/retitled a new section in the revised manuscript titled "Dimensionality Reduction Approaches Selection Criteria and Related work".

We added the selection process of the related work as follows:
This section aims to review related work published in recent years that used feature dimensionality reduction approaches to design an intrusion detection system. The selection process was based on certain criteria, such as:
1. Being relevant to the CICIDS2017 dataset.
2. Being relevant to dimensionality reduction approaches; precisely, the auto-encoder and the PCA.
3. Being relevant to machine learning based intrusion detection.
Furthermore, we added an explanation regarding the algorithm selection criteria in the revised manuscript. We also added Table 1, along with an explanation, showing the properties of AE and PCA for dimensionality reduction.

Limitations of the research
We have added a new section titled "Challenges and Limitations" in the revised manuscript.

Q. The experimentation focuses on comparing results based on a pair of quality indicators: accuracy and performance. They are undoubtedly relevant, but other significant features (especially bearing in mind the heterogeneity inherent in emerging communication environments) were not considered. Appending some analytical study in terms of strengthening against adversarial attacks, adaptation to non-stationary/non-linear traffic, ease of dataset acquisition/model building, Quality of Experience, fault tolerance, resilience, etc. would increase the impact of the publication.

R:
We thank the reviewer for this valuable comment. At the moment, a full-fledged implementation of these suggestions is not possible because the current study was not specifically designed to evaluate factors related to fault tolerance and quality of experience. In the future, we intend to extend our work with improved designs and tactics that consider these suggestions.
For the time being, we have added some details regarding the above points in the new "Challenges and Limitations" section of the revised manuscript.

Challenges and Limitations
Although this study has successfully demonstrated the significance of feature dimensionality reduction techniques, which led to better results in terms of several performance metrics as well as classification speed for an IDS, it has certain limitations and challenges, which are summarized as follows:

Fault Tolerance
Fault tolerance enables a system to continue operating properly in the event of failure or faults within any of its components. Fault tolerance can be achieved through several techniques. One aspect of fault tolerance in our system is the ability of the designed approach to detect a large set of well-known attacks. Our models have been trained to detect 14 up-to-date and well-known types of attacks. Furthermore, fault tolerance can be achieved by adopting the majority voting technique [52]. The trained Random Forest, Bayesian Network, and LDA models can be utilized in a majority voting based intrusion detection system that provides fault tolerance. Moreover, the deployment of distributed intrusion detection systems in the network can enable fault tolerance.
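The majority voting idea mentioned above can be sketched as follows. This is a hedged illustration rather than an implemented component of our system: the per-model predictions are hard-coded placeholders for the outputs of the trained Random Forest, Bayesian Network, and LDA models, the labels are illustrative, and treating `None` as "this model failed to answer" is an assumption made to show how the ensemble can tolerate a faulty member.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most common label among the models that produced a prediction.

    A vote of None means the corresponding model was unavailable and is ignored.
    Ties are broken in favour of the earliest model in the list.
    """
    valid = [v for v in votes if v is not None]
    if not valid:
        raise ValueError("no classifier produced a prediction")
    counts = Counter(valid)
    best = max(counts.values())
    return next(v for v in valid if counts[v] == best)

def vote_batch(per_model_predictions):
    """Combine one prediction list per model into one voted label per flow."""
    return [majority_vote(list(votes)) for votes in zip(*per_model_predictions)]

# Placeholder predictions for four network flows (illustrative labels only).
rf_pred = ["BENIGN", "DoS", "PortScan", "BENIGN"]
bn_pred = ["BENIGN", "DoS", "BENIGN", "DoS"]
lda_pred = ["DoS", "DoS", "PortScan", "BENIGN"]

combined = vote_batch([rf_pred, bn_pred, lda_pred])
print(combined)

# If one model is down (None), the remaining two still yield a decision.
degraded = majority_vote(["DoS", None, "DoS"])
print(degraded)
```

With three voters, a single misbehaving or unavailable model cannot by itself change a decision on which the other two agree, which is the sense in which voting contributes to fault tolerance.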

Adaptation to Non-stationary Traffic
The AE can represent both linear and non-linear models. Moreover, once the model is trained, it can be utilized for non-stationary traffic. We intend to extend our work in the future with an online anomaly-based intrusion detection system.

Model Resilience
As presented in Tables 12 and 13, the achieved FP rates are 0.010 and 0.001, respectively, which may reflect a built-in attack resiliency. Moreover, our models were trained in an offline manner. This ensures that an adversary cannot inject misclassified instances during the training phase. By contrast, such a case could occur with online-trained models. Therefore, it is essential for the machine learning system employed in intrusion detection to be resilient to adversarial attacks [53]. An approach to quantify the resilience of machine learning classifiers was introduced in [53]. The association of these factors will be investigated in future studies.

Ease of Dataset Acquisition/Model Building
The data used for our IDS model was acquired from the CICIDS2017 dataset, which is a publicly available dataset provided by the Canadian Institute for Cybersecurity [35,36]. The dataset is open source and available for download and sharing.

Quality of Experience
According to [54], the Quality of Experience is used to measure and express, preferably as numerical values, the experience and perception of the users with a service or application software. The current research was not specifically designed to evaluate factors related to Quality of Experience. Future directions of this research may include such investigations.

Additional references included in the revised manuscript for the above explanations are as follows.
- Shehri, Salman M., et al. "Common Metrics for Analyzing, Developing and Managing Telecommunication Networks." arXiv preprint arXiv:1707.03290 (2017).