RDTIDS: Rules and Decision Tree-Based Intrusion Detection System for Internet-of-Things Networks

Abstract: This paper proposes a novel intrusion detection system (IDS), named RDTIDS, for Internet-of-Things (IoT) networks. RDTIDS combines different classifier approaches based on decision tree and rule-based concepts, namely REP Tree, the JRip algorithm, and Forest PA. Specifically, the first and second methods take the features of the data set as inputs and classify the network traffic as Attack/Benign. The third classifier uses the features of the initial data set together with the outputs of the first and second classifiers as inputs. The experimental results obtained by analyzing the proposed IDS on the CICIDS2017 and BoT-IoT datasets attest to its superiority in terms of accuracy, detection rate, false alarm rate, and time overhead compared with state-of-the-art schemes.


Introduction
In recent years, cyberattacks, especially those targeting systems that keep or process sensitive information, have become more sophisticated. Critical National Infrastructures are the main targets of cyberattacks, since essential information and services depend on their systems, and their protection has become a significant concern for both organizations and nations [1][2][3]. Attacks on such critical systems include penetration of their networks and installation of malicious tools or programs that can reveal sensitive data or alter the behavior of specific physical equipment. To tackle this growing trend, academics and industry professionals are joining forces to develop novel systems and mechanisms that can defend these systems. Along with other preventive security mechanisms, such as access control and authentication, intrusion detection systems (IDSs) are deployed as a second line of defense. Based on specific rules or patterns of the normal behavior of the system, IDSs can distinguish between normal and malicious actions [4,5].
Many different taxonomies for IDSs have been proposed. Based on the classification model they use, IDSs can be classified as rule-based, misuse detection, and mixed systems. IDSs can also be classified as real-time, if they monitor the system continuously, or as periodic or offline, if detection happens at specific time instances or on data that are collected and stored during a certain period of time. Moreover, novel taxonomies were recently proposed for Industrial Control Systems (ICSs), which have specific requirements and characteristics. The authors in [6] proposed a classification of IDSs for ICSs that divides them into three new categories: protocol analysis-based, traffic mining-based, and control process analysis-based.
Countermeasures are taken according to the information that the detection systems provide about the detected attacks. The better the classification of the attack type, the more efficient the chosen countermeasures will be and the less they will affect the proper operation of the system or network. Moreover, if the exact type of attack is not detected, the countermeasures can in some cases have more serious consequences than the attack itself. For this reason, our goal is to create an intrusion detection model that correctly classifies each type of attack. In addition, our model must provide a low false alarm rate and a high detection rate for both frequent and infrequent attacks, while at the same time requiring little computation to perform classification. The latter characteristic is very important when IDSs are deployed in industrial control systems that operate critical infrastructures, where correct and fast notification about cyberattacks is crucial [7]. This paper extends our work in [8] in several ways. First, we adapt the proposed system to Internet-of-Things networks. Second, we present and critically analyze related works that were published recently. Third, we have redesigned the proposed system, presented the integration of RDTIDS into a three-tier fog computing architecture for IoT networks, and evaluated its performance on the Bot-IoT dataset.
The remainder of the paper is organized as follows. Section 2 discusses related work and places the research within that of the wider community. Section 3 introduces the key concepts and the overall architecture of the proposed system. Section 4 presents the experimentation setup, gives the simulation parameters and describes the evaluation of the method. Section 5 concludes the paper. Table 1 lists representative related works on hybrid intrusion detection systems, including machine learning and data mining methods. It also presents the security issue that each one of these methods tries to address, along with the dataset that was used in order to evaluate their performance.

Relevant Work
Aydin et al. [9] proposed a hybrid IDS by combining two approaches, namely, (1) packet header anomaly detection (PHAD), and (2) network traffic anomaly detection (NETAD), with the signature-based IDS Snort. Both PHAD and NETAD are anomaly-based IDSs. Aydin et al.'s system was tested on IDEVAL data, which shows that the number of attacks detected increases significantly with the proposed hybrid IDS as compared to signature-based systems. Wang et al. [10] proposed an intrusion detection approach, named FC-ANN, based on artificial neural networks (ANN) and fuzzy clustering. The FC-ANN approach uses three main modules, i.e., a fuzzy clustering module, an ANN module, and a fuzzy aggregation module. The fuzzy clustering module is used to partition a given set of data into clusters. The ANN module is used to learn the pattern of every subset. The fuzzy aggregation module is used to aggregate the results of the different ANNs and reduce detection errors. The FC-ANN approach was tested on the KDD CUP 1999 dataset and was shown to be efficient against low-frequency attacks, i.e., R2L and U2R attacks.
Govindarajan and Chandrasekaran [11] proposed a neural-based hybrid IDS architecture using two methods, namely, 1) multilayer perceptron neural network (MLP), and 2) radial basis function neural network (RBF). The procedures of hybrid modeling using bagging classifiers are employed in order to increase robustness, accuracy, and better overall generalization. Moreover, the UNM Send-Mail Data is used in this study, which is based on an immune system developed at the University of New Mexico. The performance of the proposed IDS in terms of accuracy was 98.88% and 94.31% for normal as well as abnormal traffic respectively, which is better compared to the performance of the classifiers that compose it. Chung and Wahid [12] approach the problem of decision rules generation by employing intelligent dynamic swarm-based rough set (IDS-RS) for feature selection and simplified swarm optimization with weighted local search (SSO-WLS) strategy for data classification. The study provides a full system solution for improving the searching process in SSO rule mining by weighing three predetermined constants. The experimental results on the KDD CUP 1999 dataset show that the proposed hybrid network intrusion detection system using an intelligent dynamic swarm-based rough set shows a good overall performance with 93.3% accuracy in an average of 20 runs.
Elbasiony et al. [13] presented a combination of misuse and anomaly detection into a hybrid framework, which is based on two methods, namely, (1) random forests algorithm, and (2) K-means clustering algorithm. Specifically, this framework employs the random forests algorithm in misuse intrusion detection as well as the k-means clustering algorithm in anomaly detection. Due to correlated variables in random forests, this framework has low model interpretability and performance loss. Kim et al. [14] integrate a misuse detection model and an anomaly detection model in a decomposition structure. This study uses the C4.5 decision tree algorithm and multiple one-class SVM models. The experimental results on the NSL-KDD dataset show that the proposed hybrid intrusion detection method is better than the conventional methods in terms of detection performance, training time, and testing time.
Lin et al. [15] proposed a feature representation approach, named CANN, which combines cluster centers and nearest neighbors. The CANN approach uses three steps, namely, (1) Extraction of cluster centers and nearest neighbors, (2) Measurement and summing of the distance between all data, and (3) Classifier training and testing, which is based on the k-NN algorithm. The experimental results on the KDD CUP 1999 dataset show that CANN approach performs better than the k-NN and SVM classifiers.
Recently, a number of researchers have proposed the combination of classifiers in order to improve the overall performance on the NSL-KDD dataset [17][18][19][20][24]. In [17], Kevric et al. proposed a combined classifier approach using tree algorithms, namely, random tree, C4.5 decision tree algorithm, and NBTree. In [18], Al-Yaseen et al. proposed a hybrid IDS that uses support vector machine, extreme learning machine, and K-means clustering algorithm. The experimental results on the KDD CUP 1999 dataset show that the proposed hybrid network intrusion detection system can improve the overall performance and achieve an overall 95.75% accuracy.
In order to evaluate the performance of an IDS system, both KDD'99 and NSL-KDD datasets are commonly used. Ahmim et al. [19] proposed an IDS system, named HCPTC-IDS, which is based on combining probability predictions of a tree of classifiers. The HCPTC-IDS system is composed of two layers, namely, 1) the first layer, which is a tree of classifiers, and 2) the second layer, which is a final classifier that combines the different probability predictions of the first layer. The experiments on KDD'99 and NSL-KDD show that the HCPTC-IDS system is more precise than other recently proposed intrusion detection systems having accuracy equal to 96.27% for KDD'99 and 89.75% for NSL-KDD. The authors in [16] presented an anomaly detection technique based on support vector machine (SVM) and genetic algorithm (GA), in order to improve the performance of classification for SVMs. The experimental results on the KDD CUP 1999 dataset show an outstanding true-positive value of 0.973 that comes with a 0.017 false-positive value.
Blockchain technology [25] has been combined with machine learning and data mining methods for cyber security intrusion detection. Derhab et al. [22] proposed an intrusion detection system, named RSL-KNN, against forged command attacks, which is based on the random subspace learning approach and the K-Nearest Neighbor (KNN) classifier. The RSL-KNN system is combined with blockchain technology to detect, in a short time, any tampering with the OpenFlow rules. Ferrag et al. [21] proposed a novel deep learning and blockchain-based energy framework, called DeepCoin, for smart grids. The DeepCoin framework uses two schemes, namely, a deep learning-based scheme and a blockchain-based scheme. The deep learning-based scheme is used for detecting network attacks based on recurrent neural networks, while the blockchain-based scheme is used for detecting fraudulent transactions. Similarly to the DeepCoin framework, Ferrag and Leandros [23] proposed an intrusion detection system combined with a blockchain-based delivery framework, called DeliveryCoin, for drone-delivered services. Both the DeepCoin and DeliveryCoin frameworks demonstrated the efficiency of deep learning techniques in cyber security intrusion detection. For more detail about deep learning techniques in cyber security intrusion detection, we refer the reader to our recent studies in [26,27].
Most of the relevant works use the KDD and NSL-KDD datasets, which are outdated and of very limited practical value for a modern IDS. Both benign and malicious network traffic have changed significantly since 1999, when these datasets were produced, so the results obtained with them are in most cases of limited value.
In order to overcome several shortcomings of previously proposed methods, such as low detection of rare attacks, misclassification of attacks, and time overhead, we propose a novel hybrid IDS, which combines different classifier models, namely, REP Tree, the JRip algorithm, and Forest PA. In addition, we use the CICIDS2017 dataset [24], which we split into training and testing datasets, in order to evaluate the performance of the proposed IDS in detecting network intrusions, and we compare it with other machine learning methods proposed by previous researchers, including WISARD [28], ForestPA [29], J48 Consolidated [30], LIBSVM [31], FURIA [32], RandomForest, REPTree, MLP, NaiveBayes, JRip, and J48.

The Proposed Model
According to [33], data imbalance is one of the causes that degrades the performance of machine learning algorithms in classifications. This is due to two major causes. First, the simple accuracy as an objective function used in most classification tasks is inadequate when the classifier faces data imbalance and the second comes from the distribution of the classes where the majority class is likely to invade the territory of the minority class. In general, for the same data set, if the number of classes increases, the sub-data set of each class decreases, leading to the decrease of the generalization capability and the increase of classification errors and vice versa. The mis-classification is usually due to the classification of attacks as normal behavior, or as another attack in the same or different category.
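The inadequacy of simple accuracy under data imbalance can be seen in a small worked example. The sketch below uses made-up counts (990 benign rows, 10 attack rows) and a degenerate classifier that always answers "Benign"; none of these numbers come from the datasets used in this paper.

```python
# Toy illustration with made-up counts: 990 benign rows and 10 attack rows.
y_true = ["Benign"] * 990 + ["Attack"] * 10

# A degenerate classifier that labels every row as "Benign".
y_pred = ["Benign"] * len(y_true)

# Simple accuracy: share of rows whose predicted label matches the true one.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Detection rate: share of true attack rows that were predicted as attacks.
attack_predictions = [p for t, p in zip(y_true, y_pred) if t == "Attack"]
detection_rate = sum(p == "Attack" for p in attack_predictions) / len(attack_predictions)

print(f"accuracy       = {accuracy:.3f}")
print(f"detection rate = {detection_rate:.3f}")
```

The classifier reaches 99% accuracy while detecting no attack at all, which is why per-class detection rates are reported alongside accuracy.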
In order to minimize these classification errors and to increase the performance of the intrusion detection mechanism, we propose the hierarchical intrusion detection model illustrated in Figure 1. The main feature of RDTIDS is that it uses a binary and a multi-class classifier in parallel, which feed a third classifier, thus minimizing the effect of data imbalance on the final outcome. Moreover, ensemble systems of classifiers are widely used for intrusion detection in networks (maglaras2016combining). These aim to include mutually complementary individual classifiers, which are characterized by high diversity in terms of classifier structure [34], internal parameters, or classifier inputs. As stated in [35], ensemble and meta-ensemble methods show a number of advantages over a single model, i.e., they reduce the variance of the error, the bias, and the dependence on a single dataset, and they work well in the case of unbalanced classes.
Our hierarchical model aims to correctly classify each attack, providing a low false alarm rate and a high detection rate. It is composed of three classifiers. The first one uses the different features of the data set as inputs, in order to classify each row as benign or malicious. The second one uses the different features of the data set as inputs to classify each row as benign or as one of the different categories of attacks. The third classifier uses all the features of the initial data set in addition to the outputs of the first and the second classifiers as inputs, in order to classify each row of the data set as benign or as a specific type of attack. For the integration of the RDTIDS system into IoT networks, we propose a three-tier fog computing architecture, in which the RDTIDS system is located in the fog computing layer, as presented in Figure 2. The classifiers that are used are REP Tree [36], JRip [37], and Forest PA [29]. Based on the values of the features, each classifier builds rules that are used to classify the data set correctly.

Operation Mode
The operation of the RDTIDS system consists of two steps: training and testing. During the first step, we train the three classifiers that compose our hierarchical model. We start by training the first and the second classifiers, and then use their results to train the third. Initially, each row of the training data set is labelled as benign or as a specific type of attack. To train the first classifier, we modify the training data set by labelling each row as "Attack" or "Benign". Then, we normalize the different features of the data set. After that, we train this classifier and, as a result of this sub-step, obtain model 1. To train the second classifier, we modify the training data set by labelling the rows with specific attacks. Then, we normalize the different features of the data set. After that, we train this classifier and, as a result of this sub-step, obtain model 2.
In order to train the third classifier, we modify the training data set by adding two columns: the first contains the classification results of model 1 for the rows of the training data set, and the second contains the classification results of model 2 for the same rows. Then, we normalize the different features of the data set. After that, we train this classifier and, as a result of this sub-step, obtain model 3.
As illustrated in Figure 1, in order to test our hierarchical model, we process each row of the test data set with model 1 and model 2, then we add the outputs of model 1 and model 2 to the features of each row, and finally we process the resulting rows with model 3, which classifies each row as benign or as a specific type of attack.
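The training and testing flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Weka implementation used in this work: `NearestNeighbour` is a hypothetical stand-in for REP Tree, JRip, and Forest PA, the class and function names are invented for the example, the normalization sub-steps are omitted, and the integer encoding of the two added columns is one possible choice.

```python
class NearestNeighbour:
    """Stand-in 1-nearest-neighbour classifier; NOT one of the paper's algorithms."""

    def fit(self, X, y):
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict(self, X):
        return [self._closest(row) for row in X]

    def _closest(self, row):
        # Squared Euclidean distance to every stored training row.
        dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in self.X]
        return self.y[dists.index(min(dists))]


def to_binary(label):
    return "Benign" if label == "Benign" else "Attack"


class HierarchicalIDS:
    """Two classifiers run in parallel; their outputs become extra features for a third."""

    def __init__(self, make_classifier, to_category):
        self.m1, self.m2, self.m3 = make_classifier(), make_classifier(), make_classifier()
        self.to_category = to_category

    def _augment(self, X):
        out1, out2 = self.m1.predict(X), self.m2.predict(X)
        # Append the outputs of model 1 and model 2 as two numeric columns.
        return [list(r) + [self.codes[a], self.codes[b]] for r, a, b in zip(X, out1, out2)]

    def fit(self, X, y):
        y1 = [to_binary(label) for label in y]          # Attack / Benign
        y2 = [self.to_category(label) for label in y]   # attack categories
        # Assumed encoding: one integer code per label produced by model 1 or model 2.
        self.codes = {lab: float(i) for i, lab in enumerate(sorted(set(y1) | set(y2)))}
        self.m1.fit(X, y1)
        self.m2.fit(X, y2)
        self.m3.fit(self._augment(X), y)
        return self

    def predict(self, X):
        return self.m3.predict(self._augment(X))


# Toy data (made up): two numeric features per row.
X_train = [[0.0, 0.1], [0.1, 0.0], [0.9, 0.8], [1.0, 0.9], [0.5, 1.0]]
y_train = ["Benign", "Benign", "DoS Hulk", "DDoS", "FTP-Patator"]
category = {"Benign": "Benign", "DoS Hulk": "DoS", "DDoS": "DoS", "FTP-Patator": "Brute-Force"}

ids = HierarchicalIDS(NearestNeighbour, category.get).fit(X_train, y_train)
print(ids.predict([[0.05, 0.05]]))
```

Any object exposing `fit` and `predict` can be passed as `make_classifier`, so the three stages need not share the same algorithm; the sketch reuses one stand-in only to stay short.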
The whole procedure of transforming the dataset into information that can be read and interpreted by our classifiers is called pre-processing and consists of two steps: the first maps symbolic-valued attributes to numeric-valued attributes, and the second applies scaling. The main advantage of scaling is that it prevents attributes in greater numeric ranges from dominating those in smaller numeric ranges. Another advantage is that it avoids numerical difficulties during the calculations.
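The first pre-processing step can be illustrated as follows. The text does not state which mapping is used, so the integer coding by order of first appearance, the function name `encode_symbolic`, and the toy rows are all assumptions made for this sketch.

```python
def encode_symbolic(rows, column):
    """Replace the symbolic values in `column` by integer codes.

    Codes are assigned in order of first appearance (an assumed scheme).
    Returns the encoded rows and the value-to-code mapping.
    """
    codes = {}
    encoded = []
    for row in rows:
        new_row = list(row)
        value = new_row[column]
        codes.setdefault(value, len(codes))  # new symbol gets the next free code
        new_row[column] = codes[value]
        encoded.append(new_row)
    return encoded, codes


# Toy rows (made up): a symbolic protocol field followed by a numeric field.
rows = [["tcp", 120.0], ["udp", 8.0], ["tcp", 64.0], ["icmp", 1.0]]
encoded, codes = encode_symbolic(rows, column=0)
print(codes)
print(encoded)
```

The returned mapping would have to be reused unchanged on the test subset so that the same symbol always receives the same code.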

Experimentation
In this section, we present in detail the Data Set used along with the Data pre-processing procedure. We also give the performance metrics used in our experiments. Moreover, we present the structure of our model. Finally, we provide a comparative study between our model and that of different classifiers. The experiments are done on a Windows 10, 64 bits PC with 8 GB RAM and CPU Intel(R) I5 2.7 GHz. Weka Data Mining Tools and MySQL RDBMS are used to implement our model.

Data Set and Data Pre-Processing
We use two new real traffic datasets for the experiments, namely the CICIDS 2017 dataset [24] and the Bot-IoT dataset [38]. Tables 2 and 3 summarize the statistics of attacks in the training and test subsets of both datasets. Both datasets satisfy the eleven indispensable characteristics of a valid IDS dataset, namely Anonymity, Attack Diversity, Complete Capture, Complete Interaction, Complete Network Configuration, Available Protocols, Complete Traffic, Feature Set, Metadata, Heterogeneity, and Labelling [39].
The BoT-IoT dataset contains more than 72,000,000 records divided into 74 files, each row having 46 features. We use the version proposed by Koroniotis et al. [40], which provides training and testing subsets covering 5% of the entire dataset.
The CSV version of CICIDS 2017 contains 2,830,743 rows divided in 8 files, each row having 79 features. Each row of CICIDS 2017 is labelled as Benign or one of fourteen types of attack.
In order to create the training and test subsets, we concatenate the 8 files into one file containing a single table with all benign and attack rows. Then, we remove all rows in which the feature "Flow Packets/s" is equal to 'Infinity' or 'NaN'. After that, we remove the following features: Bwd PSH Flags, Bwd URG Flags, Fwd Avg Bytes/Bulk, Fwd Avg Packets/Bulk, Fwd Avg Bulk/Rate, Bwd Avg Bytes/Bulk, Bwd Avg Packets/Bulk, and Bwd Avg Bulk/Rate.
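The cleaning steps above can be sketched with the Python standard library. This is an illustration only: the experiments in this paper rely on Weka and MySQL, the function name `clean` and the three-column toy table are hypothetical, and the column headers are assumed to match the feature names exactly as written in the text (headers in the actual CSV files may differ, for example in whitespace).

```python
import csv
import io

# Feature names as listed in the text; assumed to match the CSV header exactly.
DROPPED_FEATURES = {
    "Bwd PSH Flags", "Bwd URG Flags",
    "Fwd Avg Bytes/Bulk", "Fwd Avg Packets/Bulk", "Fwd Avg Bulk/Rate",
    "Bwd Avg Bytes/Bulk", "Bwd Avg Packets/Bulk", "Bwd Avg Bulk/Rate",
}


def clean(reader):
    """Yield rows without invalid 'Flow Packets/s' values and without dropped features."""
    kept = [name for name in reader.fieldnames if name not in DROPPED_FEATURES]
    for row in reader:
        if row["Flow Packets/s"] in ("Infinity", "NaN"):
            continue  # discard rows with an undefined packet rate
        yield {name: row[name] for name in kept}


# Toy table (made up) standing in for the concatenated CSV files.
sample = io.StringIO(
    "Flow Packets/s,Bwd PSH Flags,Label\n"
    "12.5,0,BENIGN\n"
    "Infinity,0,DDoS\n"
    "NaN,0,BENIGN\n"
    "3.0,0,PortScan\n"
)
cleaned = list(clean(csv.DictReader(sample)))
print(cleaned)
```

Because `clean` is a generator, the same function can be applied to a file object of any size without loading the whole table into memory.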
After the elimination of these features, we extract the training and test subsets based on the distribution described in Table 2. In each subset we tried to include rows that contain all the attacks, but the same row cannot appear in both subsets. For the training subset, we select the first rows of each type. Then, for the test subset, we randomly select some rows after the suppression of the training subset rows. Finally, each value x_i of the feature j is normalized based on the following equation:
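The normalization equation is not reproduced in this copy of the text. A common choice, assumed here and not confirmed by the source, is min-max scaling, which maps each value of a feature to (x - min) / (max - min) so that the feature lies in [0, 1]. The function name, the handling of constant features, and the sample values below are illustrative.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] with min-max scaling (assumed formula)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: every value maps to 0.0 to avoid division by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Toy values (made up) for a single feature j.
feature_j = [10.0, 20.0, 40.0]
normalized = min_max_normalize(feature_j)
print(normalized)
```

Under this assumption, the minimum and maximum would be computed on the training subset and reused for the test subset, so that both are scaled consistently.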

Performance Metrics
IDS performance is evaluated based on its capability of classifying network traffic into a correct type. Table 4, also known as confusion matrix, shows all the possible cases of classification.
To evaluate RDTIDS system, we used two groups of metrics. The first group includes specific metrics: the detection rate (DR) of each type of attack and the true negative rate. The second one includes global metrics: global detection rate, false alarm rate (FAR) and accuracy. The following equations summarize how to calculate these metrics.
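The equations themselves are not reproduced in this copy of the text. The sketch below uses the standard confusion-matrix definitions, which are assumed to correspond to the metrics named above: DR = TP / (TP + FN), TNR = TN / (TN + FP), FAR = FP / (TN + FP), and accuracy = (TP + TN) / (TP + TN + FP + FN). The function name and the counts are made up for illustration.

```python
def ids_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics (assumed definitions, see lead-in)."""
    return {
        "DR": tp / (tp + fn),                         # detection rate (recall on attacks)
        "TNR": tn / (tn + fp),                        # true negative rate
        "FAR": fp / (tn + fp),                        # false alarm rate
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Made-up counts: 100 attack rows (90 detected) and 900 benign rows (20 false alarms).
metrics = ids_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these definitions TNR and FAR always sum to one, and the detection rate of a single attack type is obtained by restricting TP and FN to the rows of that type.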

Practical Structure of RDTIDS System
The RDTIDS system requires pre-processing of the data, a different labelling of rows according to the model that is used, and the creation of artificial features that are the outputs of two of the classifiers used. For training the first classifier, we label the training data set as "Attack" or "Benign", based on the classification of attacks provided in Table 2. For the second classifier, we label each attack as follows: DDoS, DoS slowloris, DoS Slowhttptest, DoS Hulk, DoS GoldenEye, and Heartbleed as "DoS"; FTP-Patator and SSH-Patator as "Brute-Force"; Web Attack-Brute Force, Web Attack-XSS, and Web Attack-SQL Injection as "Web Attack". Finally, in order to train the third classifier, we use the labelling used for the second classifier. We also add the outputs of the trained classifier 1 and classifier 2 as new features.
The choice of the three classifiers that compose our hierarchical model is the most important and critical step. To choose the composition that gives the best performance, we tested several combinations of different classifiers. The results for all those combinations are quite lengthy and are not reported in this article. Following this demanding procedure, we opted for the following configuration: classifier 1 is REP Tree [36]; classifier 2 is JRip [37]; classifier 3 is Forest PA [29].
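The two relabelling schemes can be written as a lookup table. The category assignments below are taken from the text; the function names, and the choice to leave labels that the text does not mention unchanged, are assumptions of this sketch.

```python
# Attack-to-category mapping as listed in the text.
ATTACK_CATEGORY = {
    "DDoS": "DoS",
    "DoS slowloris": "DoS",
    "DoS Slowhttptest": "DoS",
    "DoS Hulk": "DoS",
    "DoS GoldenEye": "DoS",
    "Heartbleed": "DoS",
    "FTP-Patator": "Brute-Force",
    "SSH-Patator": "Brute-Force",
    "Web Attack-Brute Force": "Web Attack",
    "Web Attack-XSS": "Web Attack",
    "Web Attack-SQL Injection": "Web Attack",
}


def binary_label(label):
    """Labelling for classifier 1: every non-benign row becomes "Attack"."""
    return "Benign" if label == "Benign" else "Attack"


def category_label(label):
    """Labelling for classifier 2.

    Assumption: labels not listed in the text keep their own name.
    """
    return ATTACK_CATEGORY.get(label, label)


for original in ["Benign", "Heartbleed", "SSH-Patator", "Web Attack-XSS"]:
    print(original, "->", binary_label(original), "/", category_label(original))
```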

Comparative Study
To evaluate RDTIDS system, we compare it with some well known classifiers and some recent ones namely J48, Jrip, Naive Bayes, MLP, REP Tree, Random Forest, FURIA [32], LIBSVM [31], J48 Consolidated [30], Forest PA [29], WISARD [28]. In this comparative study we use the different metrics detailed in Section 4.2, in addition to training and testing time.

Conclusions
In this paper, we proposed a hierarchical intrusion detection system based on the combination of three different classifiers, namely REP Tree, the JRip algorithm, and Forest PA. The proposed model consists of three classifiers, two of which operate in parallel and feed the third one. The evaluation using the real traffic data sets 'CICIDS2017' and 'BoT-IoT' showed that our hierarchical model outperformed different well-known and recent machine learning models, giving the highest TNR and the highest DR for seven types of attacks. Overall, RDTIDS gives the highest DR with 94.475% and 95.175%, the highest accuracy with 96.665% and 96.995%, and the lowest FAR with 1.145% and 1.120%.