1. Introduction
With the increasing extent, diversity and added value of information and communication technology (ICT) usage, especially in today’s networked world, the number and types of security attacks are increasing as well (see, e.g., [1]). Companies, as well as public institutions, invest constantly growing parts of their ICT budgets in network and computer security. To cope with this situation efficiently, research in the ICT security domain is very much needed.
We can observe two main approaches to intrusion detection systems (IDS) in the research literature. One focuses on machine learning, with the aim of achieving the best possible prediction of attacks, or even of specific attack types. The other (the knowledge-based approach) is more user-centric and tries to model the necessary background knowledge in the network security domain. This article contributes to these challenges in terms of knowledge-based IDS design, implementation and experimental evaluation. The main motivation is to get the best of both worlds, i.e., very good performance in terms of prediction accuracy and false alarm rates, on one hand, and proper coverage of the relevant context as well as the provision of useful additional information to the network operators, on the other. We build on the current state of the art in the two prominent areas of IDS research that contribute to ICT security-threat prevention, i.e., machine learning and knowledge-based approaches. We are looking for symmetry in using both of these principal approaches, which means that both of them are equally important and mutually contribute to addressing intrusion detection and prevention challenges.
The rest of this paper is organized as follows. Related work is presented in Section 2. The main building blocks of this symmetry-following strategy are outlined in Section 3.1, followed by a detailed description of the two main building blocks, namely the knowledge model (Section 3.2) and the machine learning approaches (Section 3.3). The cooperation of these two building blocks is presented in Section 3.4. Section 4 presents the experimental evaluation of the proposed symmetry-following system for intrusion detection. We start by introducing the performance metrics (Section 4.1) used in the subsequent experiments. The experimental evaluation is divided into three parts. We first present the performance of the two main prediction models of our hierarchical IDS in Section 4.2.1, continue with the interplay between our knowledge model and the prediction models in Section 4.2.2, and then report the performance of a specific prediction model targeted at rare new attack types in Section 4.2.3. Finally, Section 5 summarizes the main findings from our experiments and highlights the advantages of the proposed solution and its symmetry-following strategy.
2. Related Work
In recent years, the use of machine learning methods in the network intrusion detection domain has become a widely studied area of research. With several publicly available datasets, numerous approaches have emerged to tackle the detection of network attacks. Neural network models are currently popular in this domain because they usually achieve better performance than traditional machine learning models, although their explanatory capacity is often very limited. The approaches used include both shallow and deep learning methods [2], or a combination of both [3].
Standalone data analytics and machine learning methods are often combined into more advanced hybrid approaches, including hierarchical and layered models [4,5,6,7], anomaly detection models [8,9], and machine learning models enhanced with knowledge-based approaches [10,11]. The motivation of those approaches is usually to enable the detection of rare attacks without sacrificing the detection accuracy for frequent ones while, at the same time, minimizing the false alarm rate. Ahmin et al. [12] proposed a hierarchical model combining tree and rule-based concepts. It uses three classifiers (REP Tree, JRip, Forest PA), where the outputs of the first two are used as inputs to the third, and predicts whether a connection is normal or an attack as well as the attack type. The approach was evaluated on the CICIDS 2017 dataset and achieved an accuracy of 96.65% with a detection rate of 94.47%. Compared to other models on the same dataset, this approach achieved superior accuracy while keeping reasonable training and evaluation times. In Reference [13], a layered approach with multiple classifiers is presented. To improve the classification of minority attacks, the authors used a combination of Naive Bayes and NBTree models on the NSL-KDD dataset. The IDS uses four layers, the top two for detection of major attacks (e.g., DoS, Probe) and the other two for minority attack classes (U2R, R2L). The model uses a different reduced feature set for each attack prediction; the authors kept only a subset of relevant features for each attack class, chosen using domain knowledge and repeated experiments with the models. The evaluation showed improved classification of minority classes compared to standard approaches, with an overall accuracy of 99.05%. In Reference [14], Gain Ratio was used to select the features for each layer of a layered model consisting of trees, Naive Bayes and Perceptron models. In Reference [15], a similar goal was achieved using a Conditional Random Fields approach. Recently, Zhou et al. [16] presented an ensemble model consisting of C4.5, Random Forest and Forest PA, where decisions are made by the average-of-probabilities voting rule. Features are selected with a hybrid approach that combines correlation-based feature selection with the biologically inspired bat algorithm. The model was tested on the KDD 99, CICIDS 2017 and NSL-KDD datasets, achieving superior performance compared to other similar models (accuracy of 99.9% on KDD 99, 96.8% on CICIDS 2017 and 99.1% on NSL-KDD).
In the area of knowledge models for intrusion detection systems, ontologies have been designed explicitly for the description of network attacks [10,17,18], for more specific issues in the domain, e.g., vulnerabilities [19], or, on the other hand, for a broader scope, e.g., cybersecurity as a whole [20]. Such ontologies have been used to develop ontology-based IDS. In Reference [21], the authors present a DAML+OIL attack ontology and a corresponding detection system applied to the KDD 99 data. The ontology includes high-level domain concepts (e.g., Attack, Host, Component, etc.) and defines the basic relations between the protocols (Components) and attacks, organized in a taxonomy. Abdoli et al. [22] use an ontology to capture the semantic relations between attacks in a distributed IDS and to improve detection using this knowledge. The authors studied attack scenarios and their effects and identified the attack relationships, which were incorporated into the ontology. The ontology was evaluated on the KDD 99 dataset, focusing on the DoS attack classes, and recognized 99.9% of DoS attacks in the testing data. In distributed IDS, ontologies were also used to support cooperative detection [23]. In Reference [24], an ontology-based, situation-aware intrusion detection system is presented. The approach integrates various heterogeneous data sources to create a semantically rich knowledge base. A reasoner processes the streams from the sources, extracts the knowledge and uses the ontology to infer the possibility of an attack. More recently, an OWL-S attack ontology was used in an IDS to detect protocol-specific attacks [25] and was evaluated on the KDD 99 dataset. In Reference [26], the authors describe the use of an ontology in a semantic intrusion detection system for malware detection.
The approach presented in this paper combines both studied areas: hybrid hierarchical classifiers and knowledge-based approaches. Hierarchical classification is based on the taxonomy of the attack classes and uses various classification models to handle the prediction on a specific level of the target attribute. To predict the particular class of attack, we used an ensemble model with weighted voting for particular classes to achieve more robust classification even in minority classes. The entire process is guided by the ontology, which is used to navigate through the attack taxonomy, to retrieve the model that performs the prediction on a given level, and to retrieve domain-specific knowledge related to the severity of the attacks, which can complement the predictions.
3. Hierarchical Intrusion Detection
3.1. The Overall Architecture of the Proposed System
The main objective of the proposed approach is to combine a multi-layered hierarchical model, which uses various predictive models to detect intrusions on different levels of the attack type taxonomy, with the domain-specific knowledge contained in the ontology. The scheme depicted in Figure 1 outlines how the knowledge model is used in the prediction.
Hierarchical classification can be divided into two separate phases.
Normal/Attack separation: the first phase is a binary classification task. The classifier used in this phase distinguishes normal traffic from attacks. If a connection is labelled as normal, no alarm is raised. Otherwise, the suspicious connection is processed by a set of models to determine the class of attack during phase 2.
Attack class and type prediction: this phase is guided by the taxonomy of attacks from the knowledge model. The system processes the taxonomy hierarchically and selects the appropriate model to classify the instance on a particular level of the class hierarchy.
When a class of attack is predicted, the ontology is queried for all relevant sub-types of that attack class and for the model suitable for predicting the particular sub-type. The knowledge model can also be used to extract specific domain-related information as a new attribute, which can either improve the classifier’s performance or provide contextual, domain-specific information that complements the predictive model.
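The control flow of the two phases can be illustrated with a minimal sketch. It assumes that every model is simply a callable mapping a feature record to a label; the function name `hierarchical_predict`, the toy models and the threshold are hypothetical stand-ins, not the trained classifiers described in this paper.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumption: a "model" is any callable mapping a feature record to a label.
Model = Callable[[dict], str]


def hierarchical_predict(
    record: dict,
    detector: Model,
    class_model: Model,
    type_models: Dict[str, Model],
) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (normal/attack decision, attack class, concrete attack type)."""
    # Phase 1: binary normal/attack separation.
    if detector(record) == "normal":
        return "normal", None, None  # no alarm is raised

    # Phase 2, step 1: predict the attack class (DoS, Probe, R2L, U2R).
    attack_class = class_model(record)

    # Phase 2, step 2: use the model registered for that class, if there is one.
    type_model = type_models.get(attack_class)
    attack_type = type_model(record) if type_model is not None else None
    return "attack", attack_class, attack_type


# Toy stand-ins for trained classifiers (hypothetical rule and threshold).
def toy_detector(record: dict) -> str:
    return "attack" if record["serror_rate"] > 0.5 else "normal"


def toy_class_model(record: dict) -> str:
    return "DoS"


def toy_dos_model(record: dict) -> str:
    return "land"


toy_type_models = {"DoS": toy_dos_model}

suspicious = hierarchical_predict(
    {"serror_rate": 0.9}, toy_detector, toy_class_model, toy_type_models
)
benign = hierarchical_predict(
    {"serror_rate": 0.0}, toy_detector, toy_class_model, toy_type_models
)
print(suspicious)
print(benign)
```

In the described system, the lookup in `type_models` would be replaced by a query to the ontology, so the dictionary here only stands in for that step.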
A more detailed description of the predictive models and their evaluation is provided in the following sub-sections.
3.2. Network Intrusion Knowledge Model
The central part of the knowledge model is the attack taxonomy. Figure 2 illustrates the overall taxonomy of the network attacks included in the KDD 99 dataset, which is still one of the most frequently used data collections in the selected domain [27,28]. Attacks are divided into four groups (DoS, R2L, U2R, Probe), each of them including specific attack types.
Besides the taxonomy, the knowledge model also covers domain-specific knowledge from the network intrusion field. Basic concepts and relations were extracted from existing security domain models and standards [29]. As the objective of the knowledge model was to use it in data analytical tasks, the concepts and properties had to be mappable to the data used in the process. Moreover, concepts related to the classification models had to be included to relate a particular classifier to the level of the target attribute hierarchy on which it can be used. Figure 3 visualizes the top-level concepts and the relations between them.
Connection class represents particular connections, whether normal ones or attacks. The class forms a hierarchy in which the sub-class Attacks represents attacks. Attack sub-classes (TypeOfAttack) represent the classes of the attacks (e.g., DoS, r2l, etc.); concrete attack types (e.g., back, land, etc.) are modelled as sub-classes of the ConcreteAttack type classes.
Effect class covers all possible effects of an attack (e.g., slowing down of the server response, gaining root access for the user, service outage, etc.).
Mechanism class and its sub-classes describe all possible mechanisms of particular attacks (e.g., poor environment maintenance, incorrect configuration of the components, etc.).
Flag characterizes the normal or error state of a specific connection (e.g., service not responding, connection denied, etc.).
Protocol represents the protocols used in the connection (e.g., TCP, UDP, etc.).
Service concept describes the service types related to the connection (e.g., http, telnet, etc.).
Severity describes how severe the possible attack type effects could be (low, medium and high).
Targets define the possible targets of the particular attack type (e.g., user, network, data, etc.).
Models concept covers the classification models used to predict the given target attribute.
The ontology was implemented in OWL (Web Ontology Language). Ontology instances represent particular connections (e.g., connection records from the data). Trained and serialized classification models are instantiated as instances of the Model class. The models can be accessed using their URI property, which contains the URI of the serialized model stored in the system.
3.3. Machine Learning Models for Detection of the Network Attacks on KDD 99 Dataset
To evaluate the proposed approach, we used the KDD Cup 1999 competition dataset [30], a commonly used network intrusion detection data collection. The dataset consists of device logs in a LAN collected over nine weeks. We used the 10% sample subset of the dataset, which consists of 494,021 records. Each record represents a connection and is labelled either as normal or as an attack (with exactly one attack type assigned). The target attribute (attack type) can be organized in a concept hierarchy, which is specified in the knowledge model (see Figure 2). There are 24 different attack types present in the dataset, and each attack type belongs to one of the four attack classes. The target attribute is heavily imbalanced; Table 1 summarizes the number of records of attack types and attack classes.
Each connection is described using a set of features: basic features of the connection, content features and traffic features (overall 32 features). The first group describes the protocol type, duration of the connection, service on the destination of the attack and other features relevant to the standard TCP connection description. Content features are attributes that could be linked to the domain-specific knowledge. Traffic features describe attributes computed over a two-second time window, e.g., the number of connections to the same host in that window. During pre-processing, we mostly focused on feature selection. Following the work of Zhou and Cheng [16], we selected only the most relevant attributes: service, src_bytes, dst_bytes, logged_in, num_file_creations, srv_count, serror_rate, rerror_rate, srv_diff_host_rate, dst_host_count, dst_host_diff_srv_rate, dst_host_srv_diff_host_rate.
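As a small illustration of this pre-processing step, the sketch below projects a connection record onto the twelve selected attributes. It assumes that records are available as dictionaries keyed by attribute name; the helper `select_features` and the example values are hypothetical.

```python
# The twelve attributes listed above.
SELECTED_FEATURES = [
    "service", "src_bytes", "dst_bytes", "logged_in", "num_file_creations",
    "srv_count", "serror_rate", "rerror_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_diff_srv_rate", "dst_host_srv_diff_host_rate",
]


def select_features(record: dict) -> dict:
    """Keep only the selected attributes of a full connection record."""
    missing = [name for name in SELECTED_FEATURES if name not in record]
    if missing:
        raise KeyError(f"record lacks selected features: {missing}")
    return {name: record[name] for name in SELECTED_FEATURES}


# Hypothetical record: the selected attributes plus two that get dropped.
full_record = {name: 0 for name in SELECTED_FEATURES}
full_record.update({"service": "http", "duration": 12, "protocol_type": "tcp"})

reduced = select_features(full_record)
print(sorted(reduced))
```

Note that service is a categorical attribute, so it would still need to be encoded numerically before most classifiers can use it.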
To illustrate the proposed approach on the KDD 99 network intrusion dataset, Figure 4 depicts the structure of the prediction models used on different levels of the target attribute class hierarchy. During the first phase, an attack detection model is used for prediction on the top level of the class hierarchy, i.e., to distinguish attack connections from normal ones. The classifier was trained on all instances of the dataset, with the target attribute transformed to a binary one reflecting the class hierarchy. The main goal of the classifier is to reliably predict the normal connections and separate them from the attack ones.
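The transformation of the target attribute along the class hierarchy can be sketched as follows. The mapping covers only a few attack types and is meant as an illustration, not as the full taxonomy of Figure 2; the helper names are hypothetical, and the code assumes that raw labels may carry a trailing period, as they do in the distributed KDD 99 files.

```python
from typing import Optional

# Partial, illustrative mapping of attack types to attack classes.
ATTACK_CLASS = {
    "back": "DoS", "land": "DoS", "neptune": "DoS", "smurf": "DoS",
    "ipsweep": "Probe", "portsweep": "Probe",
    "guess_passwd": "R2L", "warezclient": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
}


def clean(label: str) -> str:
    """Strip whitespace and the trailing period of a raw label, if present."""
    return label.strip().rstrip(".")


def to_binary(label: str) -> str:
    """Top level of the hierarchy: normal versus attack."""
    return "normal" if clean(label) == "normal" else "attack"


def to_attack_class(label: str) -> Optional[str]:
    """Second level: attack class, or None for normal or unmapped labels."""
    return ATTACK_CLASS.get(clean(label))


labels = ["normal.", "smurf.", "rootkit.", "ipsweep."]
binary_targets = [to_binary(label) for label in labels]
class_targets = [to_attack_class(label) for label in labels if to_binary(label) == "attack"]
print(binary_targets)
print(class_targets)
```

The binary targets would be used to train the attack detection model on all records, while the class targets are defined only for the attack records.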
If the model detects an attack connection, the ensemble model is used to predict one of the four attack classes. In this case, we used an ensemble of classifiers with a weighted voting scheme, trained on all attack instances in the dataset. The main reason for this approach was the effect of class imbalance on this level. Standard machine learning models achieved good accuracy, but mostly thanks to good performance on the dominant class (in this case DoS attacks), and they struggled to predict minority classes such as U2R.
Some standard classifiers performed very well when predicting specific classes but failed (or produced significant errors) on the other ones. For example, a decision tree model trained on the attack connections performed very well when classifying the DoS and R2L classes but missed a significant number of the Probe attacks and was not able to detect the U2R attacks at all. That led us to the idea of complementing such a classifier with a set of other models that are able to reliably predict the other classes. To produce the final prediction, weighted voting is applied, with weights based on each model's classification performance on a particular class. The weighting scheme is given in Table 2.
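One simple way to realize such class-specific weighted voting is sketched below. The weights are hypothetical placeholders and do not reproduce the values of Table 2; the model names and the function `weighted_vote` are likewise assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict


def weighted_vote(
    predictions: Dict[str, str],
    weights: Dict[str, Dict[str, float]],
) -> str:
    """Combine per-model predictions into one class.

    predictions maps a model name to the class it predicts;
    weights maps a model name to its per-class voting weights.
    """
    scores: Dict[str, float] = defaultdict(float)
    for model_name, predicted in predictions.items():
        # A vote counts with the weight the model holds for the class it predicts.
        scores[predicted] += weights[model_name].get(predicted, 0.0)
    # On a tie, the class that received a vote first is returned.
    return max(scores, key=scores.get)


# Hypothetical weights, e.g., derived from per-class validation performance.
example_weights = {
    "tree": {"DoS": 1.0, "R2L": 0.9, "Probe": 0.4, "U2R": 0.0},
    "bayes": {"DoS": 0.6, "R2L": 0.5, "Probe": 0.8, "U2R": 0.9},
    "knn": {"DoS": 0.8, "R2L": 0.6, "Probe": 0.7, "U2R": 0.6},
}

minority_case = weighted_vote(
    {"tree": "DoS", "bayes": "U2R", "knn": "U2R"}, example_weights
)
majority_case = weighted_vote(
    {"tree": "DoS", "bayes": "Probe", "knn": "DoS"}, example_weights
)
print(minority_case, majority_case)
```

In the first call the two models that are trusted on U2R outvote the tree, which carries no weight for that class. This is the intended effect of weighting votes per class instead of per model.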
After the prediction of the attack class, dedicated models are employed to predict the concrete attack type within that class (e.g., the DoS model predicts the concrete type of DoS attack when such an attack is detected by the IDS). Four models were trained, one per attack class, each using only the records of its attack class from the dataset. This proved difficult for the minority class (U2R), as the dataset contains very few records of that type.
All models were trained in Python using standard packages such as scikit-learn. The predictive models were then serialized using native or supporting tools (e.g., Pickle in Python). The URI (Uniform Resource Identifier) of each serialized model was then passed to the knowledge model.
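The serialization step can be sketched with the standard library alone. The example assumes that models are stored as files and referenced by `file://` URIs; the helpers `save_model` and `load_model` are hypothetical, and a plain dictionary stands in for a trained estimator, which would be pickled in the same way.

```python
import pickle
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def save_model(model, directory: Path, name: str) -> str:
    """Pickle a model to disk and return a file URI that identifies it."""
    path = directory / f"{name}.pkl"
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path.resolve().as_uri()


def load_model(uri: str):
    """Load a pickled model from a file URI (trusted files only)."""
    path = Path(url2pathname(urlparse(uri).path))
    with path.open("rb") as handle:
        return pickle.load(handle)


# Stand-in for a trained estimator (hypothetical content).
stand_in_model = {"kind": "threshold", "feature": "serror_rate", "threshold": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    model_uri = save_model(stand_in_model, Path(tmp), "attack_detection")
    restored_model = load_model(model_uri)

print(model_uri.startswith("file://"), restored_model == stand_in_model)
```

Unpickling executes code contained in the file, so stored models should only be loaded from locations the operator controls.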
3.4. Use of Knowledge Model in Multi-Stage Intrusion Detection
The implemented semantic model is used in the hierarchical classification mostly to navigate through the target class taxonomy and to select the appropriate model to perform a prediction on the chosen level. The detection system is implemented in Python and communicates with the ontology using the RDFLib package, which makes it possible to specify SPARQL queries and retrieve the results as Python objects. When classifying an unknown connection, the system starts from the top level of the ontology taxonomy. Using a SPARQL query, it checks the top level of the class hierarchy and retrieves the corresponding model for the given prediction type (e.g., attack detection) using the hasTargetAttribute property. As the classification models are serialized and stored, they can be identified and retrieved using their URIs. Once a classifier is retrieved, it is deserialized in the system and used to classify the given unknown example. The system then checks the prediction and queries the ontology to see whether there is a classifier able to process the record further (e.g., to predict the type of the attack) and, if so, retrieves it.
As the prediction of the concrete sub-type of an attack can be difficult in some cases (minority classes), we considered retrieving information from the ontology that is useful even without a correct attack type prediction. Moreover, in the network intrusion domain, new attacks (attacks that were not present in the training data) can appear once the models are deployed. In such a case, the system is able to predict the attack class and enhance the prediction with domain-specific knowledge from the knowledge model, e.g., the severity of the detected attack. Besides using the class hierarchy, we therefore experimented with using domain-specific knowledge from the ontology to improve the detection system, leveraging expert information about the possible effects of the given attacks and their severity. If the model is not reliable enough to predict the concrete attack sub-type, the system can at least provide information on how severe the attack would be. Using the Effect class and the hasSeverity property, we retrieved the severity values of the particular attacks and enhanced the dataset with this newly acquired attribute. We then experimented with training classifiers to predict the severity of attacks, which can serve as a supporting source of information that complements the attack type prediction.
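The enrichment of the dataset with the severity attribute can be sketched as a simple lookup. The severity assignments below are hypothetical; in the described system they would be retrieved from the ontology through the Effect class and the hasSeverity property. The values "none" and "unknown" are additional assumptions of this sketch, used for normal connections and for attack types without a severity entry.

```python
from typing import Dict, List

# Hypothetical severity values; the real ones come from the knowledge model.
SEVERITY_BY_ATTACK = {"smurf": "medium", "rootkit": "high", "ipsweep": "low"}


def add_severity(records: List[Dict]) -> List[Dict]:
    """Return copies of the records extended with a severity attribute."""
    enriched = []
    for record in records:
        label = record["label"]
        if label == "normal":
            severity = "none"  # assumption: normal traffic has no severity
        else:
            severity = SEVERITY_BY_ATTACK.get(label, "unknown")
        enriched.append({**record, "severity": severity})
    return enriched


sample = [
    {"src_bytes": 1032, "label": "smurf"},
    {"src_bytes": 215, "label": "normal"},
    {"src_bytes": 0, "label": "rootkit"},
]
enriched_sample = add_severity(sample)
print([row["severity"] for row in enriched_sample])
```

The new attribute can then serve as the target of a separate severity classifier, which remains applicable to attack types that never occurred in the training data.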
5. Discussion and Future Work
The presented approach describes an IDS based on a combination of a hierarchical ensemble model and a knowledge model in the form of an ontology. Both hierarchical and ensemble approaches are relatively popular models used to tackle network intrusion detection; the IDS presented in this paper combines both. The hierarchical aspect comes into play when the system performs the prediction on different levels of the target attribute generalization. An ensemble model is then employed to detect the attack class in order to maintain the prediction performance in both majority and minority classes. The ontology serves to automatically navigate through the target attribute taxonomy and to select the proper model on the specific level of target generalization. The results showed that the achieved performance is comparable to the current state-of-the-art approaches when evaluated using standard metrics. Integration with the knowledge model is beneficial, as it makes it possible to automate the navigation through the target attribute taxonomy and the selection of the appropriate predictive model.
On the other hand, the complexity of the hierarchical and ensemble approach is higher when it comes to training. As each of the models is trained separately, the requirements on training resources and training time are significantly higher than for the other compared models. This limitation could be significant when such an approach is expected to be deployed in a real-world environment on dynamic, ever-changing data streams. From this perspective, a very interesting aspect of network intrusion detection is the ability to detect new attacks that are not present in the training set. In real-world scenarios, this corresponds to a specific type of concept drift (when a new class value appears). To remove this limitation of the standard approaches, either concept drift detection methods or periodic re-training of the models should be employed. Especially in the case of ensemble models, early detection of such phenomena is crucial to update the predictive models as soon as possible, so as not to miss any newly appearing attacks. In the case of complex ensemble models, the re-training of the partial models could be non-trivial. Moreover, when a domain knowledge model is involved (as in the presented IDS), the structure of that model has to be updated as well (e.g., by adding the new attack type to the taxonomy).
Another interesting area of research in this field is the deeper integration of domain-specific knowledge with prediction. The main issue of standard predictive models used in intrusion detection is that the predictions depend on the context of particular events and, commonly, such context considers only previous events and their properties. In real-world applications (such as IDS), this context should be expanded with additional domain knowledge. Predictions should then be the result of information describing the particular event, information about the previous events and information obtained from the domain knowledge model. In this case, the expanded context could be represented by new, derived features or by specific expert rules. Such domain expert rules could be used to cover the detection of very rare events (e.g., rare attacks), which could be difficult to extract from the historical data due to a very low number of examples. To support such context representation, the knowledge model has to be expanded with more domain concepts and properties, expanding the range of knowledge covered by the ontology.
6. Conclusions
In this paper, we proposed an original symmetrical combination of the knowledge-based approach and machine learning techniques to build a hierarchical intrusion detection system able to detect existing types of network attacks and to estimate the severity of new ones. The implemented knowledge model is used in the hierarchical classification mostly to navigate through the target class taxonomy and to select the appropriate model to perform a prediction on the selected level. In the case of rare attack types, where the amount of available training data is too low, a severity prediction model is applied, and the network operator is provided with information available in the knowledge model about the most probable class of attack.
The performance of the proposed knowledge-based hierarchical IDS is 0.998 in terms of precision as well as recall and 0.001 in terms of FAR, which outperforms other state-of-the-art approaches presented in Section 2. Moreover, the proposed system is also able to partially cover emerging types of attacks in terms of their predicted severity and additional information stored in the knowledge model. As a result, our knowledge-based approach leads to efficient and information-rich decision support for network operators. In future work, we plan to extend our approach to make it more dynamic in terms of learning new types of attacks and deciding on the right moments when the available prediction models need to be retrained.