Central to tackling the exponential rise in cyber-attacks [1
], is Intrusion Detection Systems (IDS) systems that are capable of detecting zero-day cyber-attacks. Machine Learning (ML) techniques have been extensively utilised for designing and building robust IDS [3
]. However, while current IDS can achieve high detection accuracy for known attacks, they often fail to detect new, zero-day attacks. This is due to the limitations of current IDS, which rely on pre-defined patterns and signatures. Moreover, current IDS suffer from high false-positive rates, thus limiting the performance and their practical use in real-life deployments. As a result, large numbers of zero-day attacks remain undetected, which escalate their consequences (denial of service, stolen customer details, etc.).
According to Chapman [5
], a zero-day attack is defined as “a traffic pattern of interest that, in general, has no matching patterns in malware or attack detection elements in the network” [5
]. The implications of zero-day attacks in real-world are discussed by Bilge and Dumitras [6
]. Their research focuses on studying their impact and prevalence. The authors highlight that zero-day attacks are significantly more prevalent than suspected, demonstrating that, out of their 18 analysed attacks, 11 (61%) were previously unknown [6
]. Their findings showed that a zero-day attack can exist for a substantial period of time (average of 10 months [6
]) before they are detected and can compromise systems during that period. Additionally, Nguyen and Reddi [7
] refer to a statistical study that shows that 62% of the attacks are identified after compromising systems. Moreover, the number of zero-day attacks in 2019 exceeds the previous three years [8
]. All of these considerations highlight the clear and urgent need for more effective attack detection models.
One of the main research directions to detect zero-day attacks relies on detecting outliers (i.e., instances/occurrences that vary from benign traffic). However, the main drawback of the available outlier-based detection techniques is their relatively low accuracy rates as a result of both high false-positive rates and high false-negative rates. As discussed, the high false-negative rates leave the system vulnerable to attack, while the high false-positive rates needlessly consume the time of cyber security operation centres; indeed, only 28% of investigated intrusions are real [9
]. Ficke et al. [10
] emphasise the limitations that false-negative could bring to IDS development, for example, it reduces IDS effectiveness.
Sharma et al. [11
] proposed a framework to detect zero-day attacks in Internet of Things (IoT) networks. They rely on a distributed diagnosis system for detection. Sun et al. [12
] proposed a Bayesian probabilistic model to detect zero-day attack paths. The authors visualised attacks in a graph-like structure and introduced a prototype to identify attacks. Zhou and Pezaros [13
] evaluated six different supervised ML techniques; using the CIC-AWS-2018 dataset. The authors use decision tree, random forest, k-nearest neighbour, multilayer perceptron, quadratic discriminant analysis, and gaussian naïve bayes classifiers. The authors do not fully detail how these supervised ML techniques are trained on benign traffic solely to be utilised for unknown attacks detection or how zero-day (previously unseen) attacks are simulated and detected. Moreover, transfer learning is used to detect zero-day attacks. Zhao et al. [14
] used transfer learning to map the connection between known and zero-day attacks [14
]. Sameera and Shashi [15
] used deep transudative transfer learning to detect zero-day attacks.
Furthermore, ML is used to address Zero-day malware detection. For example, Abri et al. evaluated the effectiveness of using different ML techniques (Support Vector Machine (SVM), Naïve Bayes, Multi-Layer Perceptron, Decision trees, k-Nearest Neighbour, and Random Forests) to detect zero-day malware [16
], while Kim et al. [17
] proposed the use of Deep-Convolutional Generative Adversarial Network (DCGAN).
In this paper, we propose utilising the capabilities of Deep Learning (DL) to serve as outlier detection for zero-day attacks with high recall. The main goal is to build a lightweight intrusion detection model that can detect new (unknown) intrusions and zero-day attacks, with a high recall (true-positive rate) and low fallout (false-positive rate). Accordingly, having a high detection capability of zero-day attacks will help to reduce the complications and issues that are associated with new attacks.
The contributions of this work are threefold;
Proposing and implementing an original and effective autoencoders model for zero-day detection IDS.
Building an outlier detection One-Class SVM model.
Comparing the performance of the One-Class SVM model as a baseline outlier-based detector to the proposed Autoencoder model.
The rest of the paper is organised as follows; the background is presented in Section 2
, Section 3
discusses the related work showing the results and approaches of recent IDS research. Section 4
describes the datasets that are used and how zero-day attacks are simulated. In Section 5
, the proposed models are explained. Section 6
presents the experimental results and findings. Finally, the paper is concluded in Section 7
3. Related Work
IDS is defined as “a system or software that monitors a network or systems for malicious activity”. Generally, IDSs can either be Network Intrusion Detection System (NIDS) or Host Intrusion Detection System (HIDS). NIDS monitors the network and communication while HIDS monitors the internal operation and log files [34
]. Based on their detection techniques, IDSs are classified into Signature-based IDS, which relies on known signatures of prior and known attacks, and Anomaly-based IDS, which relies on patterns [35
]. When compared to signature-based IDS, anomaly-based IDS perform better with complex attacks and unknown attacks.
In the past decade, researchers developed multiple techniques in order to enhance the robustness of anomaly-based IDS. Subsequent to a long period of using statistical methods to detect cyber anomalies and attacks, the need for ML emerged. Because of the sophistication of cyber-attacks, statistical methods were rendered inadequate to handle their complexity. Therefore, with the advancement of ML and DL in other domains (i.e., image and video processing, natural language processing, etc.), the researchers adopted these techniques for cyber use. Nguyen and Reddi [7
] discuss the importance and benefit ML can provide to cybersecurity by granting a ‘robust resistance’ against attacks.
Based on the analysis of recent IDS research [36
], ML has dominated the IDS research in the past decade. The analysis shows that Artificial Neural Networks (ANN), SVM, and k-means are the prevailing algorithms. Buczak and Guven [37
] analyse the trends and complexity of different ML and DL techniques used for IDS. Moreover, recent research is directed towards the use of DL to analyse network traffic, due to the DL capabilities of handling complex patterns. Due to the complexity and challenges that are associated with encrypted traffic, building robust and reliable DL-based IDSs is crucial. Aceto et al. [38
] describe this advancement in traffic encryption, as it ‘defeats traditional techniques’ that relies on packet-based and port-based data. At the beginning of 2019, 87% of traffic was encrypted [39
], which emphasises on the growth and, thus the need for corresponding IDs. Research has utilised flow-based features as the building block for training and analysing IDSs in order to handle encrypted and non-encrypted traffic. The benefit of flow-based features, when compared to packet-based ones, relies on the fact that they can be used with both encrypted and unencrypted traffic and also they characterise high-level patterns of network communications. New DL approaches have recently been used to build robust and reliable IDS. One of these techniques is autoencoders.
In the cyber security domain, autoencoders are used for feature engineering and learning. Kunang et al. [40
] used autoencoders for feature extraction, features are then passed into an SVM for classification. KDD Cup’99 and NSL-KDD datasets are both used for evaluation. The evaluation of the model, using autoencoder for feature extraction and SVM for multi-class classification, has an overall accuracy of 86.96% and precision of 88.65%. The different classes accuracies show a poor performance, as follows: 97.91%, 88.07%, 12.78%, 8.12%, and 97.47% for DoS, probe, R2L, U2R, and normal, respectively; a precision of 99.45%, 78.12%, 97.57%, 50%, and 81.59% for DoS, probe, R2L, U2R, and normal, respectively.
Kherlenchimeg and Nakaya [41
] use a sparse autoencoder to extract features. The latent representation (the bottleneck layer of the autoencoder) is fed into a Recurrent Neural Network (RNN) for classification. The accuracy of the IDS model while using the NSL-KDD dataset is 80%. Similarity, Shaikh and Shashikala [42
] focus on the detection of DoS attacks. They utilise a Stacked Autoencoder with an LSTM network for the classification. Using the NSL-KDD dataset, the overall detection accuracy is 94.3% and a false positive rate of 5.7%.
] used autoencoders for dimensionality reduction and the extraction of bottleneck feature. The experiments were evaluated while using the NSL-KDD dataset. In a similar fashion, Niyaz et al. [44
] used autoencoders for unsupervised feature learning. They used the NSL-KDD dataset. Additionally, AL-Hawawreh et al. [45
] used deep autoencoders and trained them on benign traffic in order to deduce the most important feature representation to be used in their deep feed-forward ANN.
Shone et al. [46
] use a Stacked Non-Symmetric Deep Autoencoder to refine and learn the complex relationships between features that are then used in the classification using random forest technique. The authors used both the KDD Cup’99 and NSL-KDD datasets.
Farahnakian and Heikkonen [47
] used a deep autoencoder to classify attacks. The deep autoencoder is fed into a single supervised layer for classification. Using the KDD Cup’99 dataset, the highest accuracies are 96.53% and 94.71% for binary and multi-class classification respectively.
In agreement with our manuscript, Meidan et al. [48
] utilised the encoding-decoding capabilities of autoencoders to learn normal behaviour in IoT networks setup. Subsequently, IoT botnet and malicious behaviour are detected when the autoencoder reconstruction fails. Bovenzi et al. [49
] emphasised the need for adaptive ML models to cope with the heterogeneity and unpredictability of IoT networks. The authors propose a two-stage IDS model, where they leverage the autoencoder capabilities in the first stage of their IDS.
Two mainstream IDS datasets are chosen in order to evaluate the proposed models. The first is the CICIDS2017 dataset [50
], which was developed by the Canadian Institute for Cybersecurity (CIC). The CICIDS2017 dataset covers a wide range of recent insider and outsider attacks. It comprises a diverse coverage of protocols and attacks variations and, finally, it is provided in a raw format which enables researchers the flexibility of processing the dataset. Therefore, the CICIDS2017 dataset is well-suited for evaluating the proposed models.
The CICIDS2017 dataset is a recording of a five-day benign, insider and outsider attacks traffic. The recorded PCAPs are made available. Table 1
summarises the traffic recorded per day. The raw files of the CICIDS2017 dataset are pre-processed, as described in Section 5.1
. The full CICIDS2017 description and analysis is available in [51
The second dataset is the NSL-KDD [53
]. NSL-KDD was released by the CIC in order to overcome the problems of the KDD Cup’99 dataset [54
]. The KDD Cup’99 dataset was the dataset of choice for evaluating more than 50% of the past decade IDS [36
], followed by the NSL-KDD dataset, which was used for evaluating over 17% of IDS. However, the KDD Cup’99 has multiple drawbacks, as discussed thoroughly in [55
]. These drawbacks include class imbalance and redundant records. Additionally, Siddique et al. [56
] discussed the warnings provided to the UCI lab advising not to use KDD Cup’99 dataset in further IDS research. Consequently, NSL-KDD fits for the evaluation purpose of this manuscript, as well as the comparison with relevant research.
The NSL-KDD dataset covers normal/benign traffic and four cyber-attack classes, namely, Denial of Service (DoS), probing, Remote to Local (R2L), and User to Root (U2R). The NSL-KDD dataset is available in two files ‘KDDTrain+.csv’ and test file ‘KDDTest+.csv’. Similar to the KDD Cup’99, the NSL-KDD dataset is provided in comma separated value (csv) feature files. Each instance is represented with its feature values alongside the class label. The feature files undergo categorical features encoding to be appropriate for ML usage. The KDD Cup’99 and NSL-KDD datasets are analysed in [54
]; furthermore, NSL-KDD is studied in [57
7. Conclusions and Future Work
The work that is presented in this manuscript proposes a new outlier-based zero-day cyber-attacks detection. The main goal was to develop an intelligent IDS model that is capable of detecting zero-day cyber-attacks with a high detection accuracy while overcoming the limitations of currently available IDS. This manuscript purposes and evaluates an autoencoder model to detect zero-day attacks. The idea is inspired by the encoding-decoding capability of autoencoders.
The results show high detection accuracy for the autoencoder model for both the CICIDS2017 and the NSL-KDD. The CICIDS2017 zero-day detection accuracy reaches 90.01%, 98.43%, 98.47%, and 99.67% for DoS (GoldenEye), DoS (Hulk), Port scanning, and DDoS attacks. Moreover, the NSL-KDD detection accuracy reached 92.96%, which outperforms the only available zero-day autoencoder-based detection manuscript [63
Furthermore, the autoencoder model is compared to an unsupervised outlier-based ML technique; One-Class SVM. One-Class SVM is a prominent unsupervised ML technique that detects outliers. The one-class SVM mode presents its effectiveness in detecting zero-day attacks for NSL-KDD datasets and the distinctive ones from the CICIDS2017 dataset. When compared to One-Class SVM, autoencoder demonstrates its surpassing detection accuracy. Furthermore, both of the models demonstrate low miss rate (false-positives). Future work involves evaluating the proposed models with datasets that cover special purpose network IDS (e.g., IoT and Critical Infrastructure networks), which will comprise insights into adapting the proposed models, as well as proposing and adapting other ML techniques to use for zero-day attack detection. The source code for building and evaluating the proposed models will be made available through an open-source GitHub repository.