Article

An Experimental Analysis of Attack Classification Using Machine Learning in IoT Networks

1 School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK
2 School of Electronics, Electrical Engineering and Computer Science, Queen’s University, Belfast BT9 5BN, UK
3 Department of Computer Science, Namal Institute, Mianwali 42250, Pakistan
4 College of Information Engineering, Yangzhou University, Yangzhou 225127, China
5 Department of Computer Science, King Fahad Naval Academy, Al Jubail 35512, Saudi Arabia
6 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
* Author to whom correspondence should be addressed.
Sensors 2021, 21(2), 446; https://doi.org/10.3390/s21020446
Received: 12 December 2020 / Revised: 6 January 2021 / Accepted: 7 January 2021 / Published: 10 January 2021
(This article belongs to the Special Issue AI for IoT)

Abstract

In recent years, there has been a massive increase in the number of Internet of Things (IoT) devices as well as in the data generated by such devices. The devices participating in IoT networks can be problematic due to their resource-constrained nature, and integrating security on these devices is often overlooked. This has given attackers an increased incentive to target IoT devices. As the number of attacks possible on a network increases, it becomes more difficult for traditional intrusion detection systems (IDS) to cope with these attacks efficiently. In this paper, we highlight several machine learning (ML) methods that can be used in IDS: k-nearest neighbour (KNN), support vector machine (SVM), decision tree (DT), naive Bayes (NB), random forest (RF), artificial neural network (ANN), and logistic regression (LR). These ML algorithms are compared experimentally for both binary and multi-class classification on the Bot-IoT dataset, using accuracy, precision, recall, F1 score, and log loss as metrics. In the case of the HTTP distributed denial-of-service (DDoS) attack, the accuracy of RF is 99%. Furthermore, the precision, recall, F1 score, and log loss results show that RF outperforms the other algorithms on all types of attacks in binary classification. However, in multi-class classification, KNN outperforms the other ML algorithms with an accuracy of 99%, which is 4% higher than RF.

1. Introduction

The Internet of Things (IoT) offers a vision in which devices use sensors to understand their context and networking functions to connect with each other [1]. The devices in an IoT network can be employed to collect information according to the use case. Examples include the retail, healthcare, and manufacturing industries, which use IoT devices for tasks such as tracking purchased items, remote patient monitoring, and fully autonomous warehouses. The number of IoT devices has been growing every year and is predicted to reach 75.44 billion by 2025 [2]. Such a massive surge in IoT devices ultimately leads more attackers to target IoT networks. Reports state that most of the attack traffic generated on IoT networks is automated through various means such as scripts and malware [3]. The increase in attacks, combined with their automated nature, is a problem for IoT networks because the devices are mostly used in a fire-and-forget fashion for years without any human interaction. Together with the limitations of IoT devices, including limited processing power and bandwidth, this means that providing adequate security can be difficult, which leaves networks exposed to network layer attacks such as denial of service (DoS). Therefore, it is important to research ways to identify this kind of traffic on networks, which can then be used in intrusion detection and prevention systems.
Machine learning (ML) methods can be exploited to detect malicious traffic in intrusion detection and prevention systems. ML is a subset of artificial intelligence (AI) that involves using algorithms to learn from data and make predictions based on the data provided [4]. ML has many applications including in retail, healthcare, and finance where AI algorithms may be applied for predicting customer spending habits, predicting medical problems in patients, and detecting bank fraud, respectively [5].
Because the number of cyberattacks increases substantially every year, ML methods are being incorporated to help tackle the growing threat. ML has several uses within the field of cybersecurity, such as network threat analysis, which can be defined as the act of analyzing threats to the network [6]. ML can be beneficial in this task as it is able to monitor incoming and outgoing traffic to identify potentially suspicious traffic [7]. This area of research is known as intrusion detection and is widely studied. ML can be applied to intrusion detection systems (IDS) to improve the system's ability to run autonomously and to increase its accuracy when raising the alarm on a suspected attack [8]. To this end, our primary goal is to identify the best ML methods for detecting attacks on IoT networks, using a state-of-the-art dataset and both binary and multi-class classification testing.
The main contributions of this paper can be summarized as follows:
  • We conduct an in-depth and comprehensive survey on the role of various ML methods in attack detection, specifically with regard to IoT networks.
  • We evaluate and compare the state-of-the-art ML algorithms in terms of various performance metrics such as confusion matrix, accuracy, precision, recall, F1 score, log loss, ROC AUC, and Cohen’s kappa coefficient (CKC).
  • We evaluate the results comparing binary class testing as well as examining the results of the multi-class testing.
The rest of the paper is organized as follows: Table 1 lists all the abbreviations used in the paper. Section 2 is devoted to a literature review that investigates IoT intrusion detection techniques as well as ML methods and how they are being used to aid intrusion detection efforts, specifically with regard to IoT networks. Details of various attacks that can occur in IoT networks are also presented, with an explanation of how the various ML methods and performance metrics work. Section 3 explains the performance evaluation, which also includes an in-depth examination of the data used in the datasets. The models are compared against each other for both binary and multi-class classification, and an overall best model is selected. Finally, Section 4 draws a conclusion.

2. Background and Related Work

This section presents the background and examines the current literature to clarify the design of the experiments conducted in this paper. Firstly, we discuss IDS, including the use of ML in attack detection, together with the related work that informed the selection of algorithms and the identification of datasets suitable for testing the models. Each algorithm is then explored, with further research into its suitability for use in an IDS. Finally, the IoT is described, including the attacks that are present in the selected dataset.

2.1. Intrusion Detection System

An IDS is a tool that allows a network to be monitored for potentially harmful traffic. An IDS can be implemented using one of two distinct approaches: signature-based detection and anomaly-based detection. A signature-based IDS uses a database of existing attack signatures and compares the incoming traffic with the database, meaning that an attack can be detected only if its signature is already available in the database. An anomaly-based IDS monitors network traffic and attempts to identify any traffic that is abnormal relative to the normal network traffic.
The signature-based detection approach has a major flaw: a signature-based IDS will always be susceptible to a zero-day attack or to an attacker who modifies the attack to evade the signature database. Anomaly-based IDS are much better suited to ML, as the IDS can be trained to detect the difference between normal traffic and attack traffic. However, integrating ML with IDS is not a silver bullet and may introduce problems of its own. Research conducted by Sommer and Paxson [9] identified several such problems, an important one being that models can produce false positives, which can render the IDS unusable because normal data cause it to raise alerts. Although that research is now dated, this is still a major problem when using ML with IDS. As a result, it is of paramount importance to identify models that produce the lowest number of false positives.

2.2. IoT Intrusion Detection Using Machine Learning

ML is a subset of AI that involves giving an algorithm or in this case a model a dataset which will be used to identify patterns that can be used to make predictions with future data. There has been limited research devoted to IDS using ML on IoT networks. To this end, recently a study used the Defense Advanced Research Projects Agency (DARPA) ML datasets to test the models such as support vector machine (SVM), Naive Bayes (NB), random forest (RF), and multi-layer perceptron [10]. The results of this research were presented in terms of root mean squared error, mean absolute percentage error, receiver operating characteristic curve, and accuracy, yielding good results with RF being one of the top models. However, this research has two main limitations: Firstly, it used the DARPA datasets, which were over 20 years old at the time of writing. Secondly, it was not performed for multi-class testing using the datasets.
Another study used the Bot-IoT dataset with the models k-nearest neighbour (KNN), quadratic discriminant analysis, iterative dichotomiser 3, RF, adaptive boosting, multi-layer perceptron, and NB [11]. The research yielded very good results in terms of accuracy, precision, recall, F1 score, and time. This study used an up-to-date dataset as well as a wide variety of ML models. However, it did not include multi-class testing for any of the models.
In regards to multi-class classification, the authors of [12] used several ML methods. This research compared the algorithms such as logistic regression (LR), decision tree (DT), RF, and artificial neural network (ANN) using a dataset created by the researchers which was not available for public use. It was concluded in the study that RF was the best model for multi-class classification. This research shows that with multi-class classification it is possible to achieve high results. Testing with additional algorithms could help bolster the results of the research.
Overall, there is currently a lack of research into intrusion detection within the area of IoT networks. This could be due to the lack of datasets as well as lack of real hardware with all datasets being comprised of simulated IoT devices on regular computers. There is also a lack of research into multi-class classification, which could be due to the lack of a dedicated multi-class dataset. With all available datasets being created with binary classification in mind, performing multi-class testing requires the datasets to be merged into one with proper labelling for each class.
Various ML models can be utilized to perform ML tasks, each with its own mathematical formulation powering the analysis of the data presented. In the next subsections, we discuss the ML algorithms used in our analysis: (i) KNN; (ii) SVM; (iii) DT; (iv) RF; (v) NB; (vi) ANN; and (vii) LR.

2.2.1. K-Nearest Neighbor

KNN is a supervised learning model that is considered to be one of the simplest ML models available [13]. KNN is referred to as a lazy learner because no training is performed; instead, the training data are used directly when making predictions to classify the data [13]. KNN operates under the assumption that similar data points lie close together, and it classifies a new data point according to its K closest training points, where K can be set to any number [14]. KNN is a suitable model for intrusion detection, as shown by several pieces of research. The authors of [15] examined the effectiveness of KNN at distinguishing between attack and normal data. The results of this research show that KNN was an effective model for detecting attack data and had a low false-positive rate. Moreover, recent research also examined the effectiveness of KNN [16] and reached a similar conclusion, showing that KNN was an effective model that outperformed SVM and DT.
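The lazy-learning behaviour described above can be illustrated with a minimal sketch. It assumes Euclidean distance and a simple majority vote among the K nearest points; the function name `knn_predict`, the two-feature data points, and the labels are hypothetical and are not taken from the Bot-IoT dataset.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: nothing is fitted in advance; all distances are
    # computed only when a prediction is requested.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature traffic records, made up for illustration only.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (8.0, 9.0), (8.5, 9.5), (9.0, 8.7)]
train_labels = ["normal", "normal", "normal",
                "attack", "attack", "attack"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (8.7, 9.1)))
```

Because every prediction scans the whole training set, the cost of this approach grows with the amount of stored training data, which is worth keeping in mind on resource-constrained devices.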

2.2.2. Support Vector Machine

Support vector machine (SVM) is a supervised learning algorithm that uses a hyperplane to separate the training data and classify future observations. A hyperplane is a decision boundary that divides the dataset into two classes; it is a line in two dimensions and a plane or hyperplane in higher-dimensional spaces. SVM chooses the hyperplane that maximizes the margin between the classes, which is determined by the support vectors, i.e., the training points closest to the boundary. SVM is a suitable model for intrusion detection, as is evident from the large amount of research conducted over the years. One older piece of research created an enhanced SVM model for intrusion detection [17]. The research was successful at creating the model, but it proved to be only a slight improvement over regular SVM, showing that the model is capable of accurately classifying attack data even without enhancements. Other, more recent research compared the ability of SVM and ANN to classify attack data [18]. As previously mentioned, SVM relies on placing a hyperplane to separate the data, which can be expressed as follows:
$$\mathbf{a} \cdot \mathbf{x} - b = 0$$
where $\mathbf{a}$ is a vector of the same dimension as the input feature vector $\mathbf{x}$ and $b$ is the bias. In this case, $\mathbf{a} \cdot \mathbf{x}$ can be written as $a_1 x_1 + a_2 x_2 + \ldots + a_n x_n$, where $n$ is the number of dimensions of the feature vector $\mathbf{x}$. When making predictions, the following expression is used:
$$y = \operatorname{sign}(\mathbf{a} \cdot \mathbf{x} - b)$$
where $\operatorname{sign}$ is a function that returns either $+1$ or $-1$ depending on whether the input is positive or negative, respectively. This value determines the predicted class of the feature vector. During training, $\mathbf{x}_i$ is the $i$-th feature vector and $y_i$ is its label, which can be either $+1$ or $-1$, and the hyperplane must satisfy the following constraints:
$$\mathbf{a} \cdot \mathbf{x}_i - b \geq +1 \quad \text{if } y_i = +1$$
$$\mathbf{a} \cdot \mathbf{x}_i - b \leq -1 \quad \text{if } y_i = -1$$
SVMs also use kernels, which are mathematical functions that take the data as input and transform them into the form required for processing. Commonly used kernels include the linear, polynomial, Gaussian or radial basis function (RBF), and sigmoid kernels.
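As a rough illustration of the linear decision rule in this subsection, the sketch below evaluates sign(a·x − b) and checks the two margin constraints for a given hyperplane. It assumes the hyperplane parameters are already known (no training is performed), treats an input of exactly zero as the positive class, and uses hypothetical values for `a`, `b`, and the sample points.

```python
def dot(a, x):
    """Dot product a_1*x_1 + a_2*x_2 + ... + a_n*x_n."""
    return sum(a_j * x_j for a_j, x_j in zip(a, x))

def svm_predict(a, b, x):
    """Return +1 or -1 according to sign(a.x - b).

    Assumption: a value of exactly zero is mapped to +1.
    """
    return 1 if dot(a, x) - b >= 0 else -1

def satisfies_margin(a, b, x_i, y_i):
    """Check the training constraint, written compactly as y_i*(a.x_i - b) >= 1."""
    return y_i * (dot(a, x_i) - b) >= 1

# Hypothetical hyperplane and points, chosen only for illustration.
a = [1.0, 1.0]
b = 3.0

print(svm_predict(a, b, [3.0, 2.0]))          # point on the positive side
print(svm_predict(a, b, [0.5, 1.0]))          # point on the negative side
print(satisfies_margin(a, b, [3.0, 2.0], 1))  # outside the margin
print(satisfies_margin(a, b, [2.0, 1.5], 1))  # inside the margin
```

The compact form `y_i * (a.x_i - b) >= 1` covers both constraints at once, since multiplying by a label of −1 flips the inequality.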

2.2.3. Decision Tree

DT is a supervised learning algorithm that is useful for presenting a visual representation of the model. A DT uses a hierarchical model that resembles a flow chart with several connected nodes. These nodes represent tests on an attribute in the dataset, with a branch that leads either to another node or to a decision on the data being classified [19]. The training data are used to build the tree, and the prediction data are run through the nodes until they can be classified. DT is a suitable model for intrusion detection based on the research conducted. One fairly recent piece of research compared DT with several other models including NB and KNN [20]. The results show that DT was one of the better models, along with NB, when compared to ANNs, which dominate IDS research. Other research created an IDS for connected vehicles in smart cities [21]. This research showed that the model that used DT was the best model, with high accuracy and a low false positive rate. As previously mentioned, DT creates a hierarchical model using the training data to create nodes that act as tests for making predictions. When building a DT, the root node and the subsequent nodes need to be selected. There are many ways to do this, with entropy being used in this case. Entropy measures the impurity, or uncertainty, of the class labels in a set of data points and is expressed as follows:
$$E = -\sum_{i=1}^{c} p_i \log_2(p_i)$$
where $p_i$ is the probability of the data being classified to a given class $i$ and $c$ is the number of classes. The attribute whose split yields the lowest entropy would be used for the root node.
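The entropy formula can be checked with a short sketch. It assumes that the class probabilities are estimated as label frequencies and that a candidate split is scored by the size-weighted average entropy of the groups it produces; the helper names `entropy` and `split_entropy` and the label lists are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy E = -sum(p_i * log2(p_i)) over the classes present in `labels`."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def split_entropy(groups):
    """Size-weighted average entropy of the label groups produced by a split."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * entropy(group) for group in groups)

# Hypothetical labels, made up for illustration.
mixed = ["attack", "attack", "normal", "normal"]

# A split that separates the classes perfectly versus one that does not.
clean_split = [["attack", "attack"], ["normal", "normal"]]
poor_split = [["attack", "normal"], ["attack", "normal"]]

print(entropy(mixed))
print(split_entropy(clean_split))
print(split_entropy(poor_split))
```

Under these assumptions, the attribute producing the clean split would be preferred for the root node because its resulting entropy is lower.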

2.2.4. Random Forest

RF is a supervised learning algorithm that is seen as an improvement on the DT model. The random aspect of the model comes from two key concepts. The first is that, when training the model, each tree is given a random sample of the data drawn with replacement, which can result in some trees using the same data points multiple times. The reason behind this is to lower the variance of the model, which reduces the spread of the predicted results [22]. The second concept involves using only a small subset of the features when splitting the nodes in the trees [23]. This is done to prevent overfitting, which occurs when the model fits the training data so closely that its predictions on new data suffer [13]. When making predictions with RF, the predictions of the individual trees are aggregated to determine the overall class of the data; training on random samples and aggregating the results in this way is called bootstrap aggregating [13]. The reason RF is seen as an improvement on DT is that, instead of relying on one tree to make the classification, multiple trees with different training data and different selections of features are used for giving predictions. This allows for a fairer analysis of the data when making predictions. RF has been shown to be a suitable model for intrusion detection. To this end, the authors of [24] compared RF to other frameworks used in intrusion detection. They found that the RF model outperformed the other frameworks with increased accuracy, precision, recall, and F1 score.
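The two sources of randomness and the aggregation step can be sketched in isolation. This is not a full random forest: no trees are built, and the per-tree predictions are hypothetical placeholders. The sketch assumes sampling with replacement for the rows, sampling without replacement for the feature subset, and a majority vote for classification.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so some rows may repeat."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, subset_size, rng):
    """Pick a small subset of feature indices to consider at a node split."""
    return sorted(rng.sample(range(n_features), subset_size))

def majority_vote(tree_predictions):
    """Aggregate the class predicted by each tree into one overall class."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

rows = list(range(10))  # stand-ins for ten training records
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(n_features=8, subset_size=3, rng=rng)

# Hypothetical predictions from five trees for a single traffic record.
vote = majority_vote(["attack", "normal", "attack", "attack", "normal"])

print(sample)
print(features)
print(vote)
```

Each tree in a forest would be trained on its own `sample` and would consult a fresh feature subset at each split, which is what makes the trees differ from one another.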

2.2.5. Naive Bayes

NB is a probabilistic algorithm that works by estimating the probabilities of the feature values together with their outcomes. The algorithm determines the probability of an event occurring given that other events have occurred, which is called the posterior probability and is expressed as follows:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
where $P(A \mid B)$ is the posterior probability, $P(A)$ is known as the prior probability, $P(B)$ is the marginal likelihood (evidence), and $P(B \mid A)$ is referred to as the likelihood. This formula can be applied to datasets in the following way:
$$P(y \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid y)\, P(y)}{P(\mathbf{x})}$$
where $y$ is the class variable and $\mathbf{x}$ is the feature vector of size $n$, written as follows:
$$\mathbf{x} = (x_1, x_2, x_3, \ldots, x_n)$$
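A small worked example may help to make the posterior formula concrete. It assumes a single binary feature (whether a hypothetical traffic flag is set) and two classes; all probabilities below are invented for illustration and do not come from the experiments in this paper.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(y | x) = P(x | y) * P(y) / P(x)."""
    return likelihood * prior / evidence

# Hypothetical probabilities, made up for illustration.
p_attack = 0.2             # prior P(y = attack)
p_flag_given_attack = 0.9  # likelihood P(x = flag | y = attack)
p_flag_given_normal = 0.1  # likelihood P(x = flag | y = normal)

# Evidence P(x = flag), obtained by summing over both classes.
p_flag = (p_flag_given_attack * p_attack
          + p_flag_given_normal * (1 - p_attack))

p_attack_given_flag = posterior(p_flag_given_attack, p_attack, p_flag)
p_normal_given_flag = posterior(p_flag_given_normal, 1 - p_attack, p_flag)

print(p_attack_given_flag)
print(p_normal_given_flag)
```

Even though attacks are the minority class in this made-up setting, observing the flag raises the posterior probability of an attack well above the prior, and the two posteriors sum to one.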

2.2.6. Artificial Neural Network

An ANN is an ML model that is based on how the human brain operates and can be used to perform supervised learning. An ANN consists of neurons, or nodes, that make up the layers of the network [25]. The three types of layers in an ANN are input, hidden, and output layers, where the input layer takes the information provided and passes it on to the hidden layer. The hidden layer performs computations and transfers the data to the output layer. The output layer also performs computations and presents the output of the ANN [26]. When performing supervised learning, the network is given the inputs and expected outputs for training. The connections between the nodes in the network have numbers assigned to them called weights. When the network makes an error, the error is propagated back through the network and the weights are adjusted. This process occurs repeatedly until the error is minimized, and then the test data can be fed through the network [27]. Training an ANN is described as follows:
The first step in training the ANN involves multiplying the input values $x_i$ by the weights $w_i$ and then summing the products, expressed as follows:
$$\sum_{i=1}^{n} x_i \cdot w_i = (x_1 \cdot w_1) + (x_2 \cdot w_2) + \ldots + (x_n \cdot w_n)$$
The second step involves adding the bias $b$ of the hidden layer node to the summed values, expressed as follows:
$$z = \sum_{i=1}^{n} x_i \cdot w_i + b$$
The third step is to pass the value $z$ through an activation function such as ReLU or Softmax. ReLU $R(z)$ can be defined as follows:
$$\hat{y} = R(z) = \max(0, z),$$
where $z$ is the input to a neuron. When $z$ is smaller than zero, the function outputs zero, and, when $z$ is greater than or equal to zero, the output is simply the input. Softmax can be defined as follows:
$$\hat{y} = s(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$$
where $e$ is the base of the natural logarithm, $\mathbf{z}$ is a vector of the inputs, $i$ indexes the output unit being computed, and $j$ runs over all $n$ units.
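The three steps above (weighted sum, bias, activation) can be sketched for a single neuron and for a small output layer. The input, weight, and bias values are hypothetical, and subtracting the maximum inside `softmax` is a common numerical-stability measure added here as an assumption; it does not change the result.

```python
import math

def weighted_sum(x, w, b):
    """Steps one and two: z = sum(x_i * w_i) + b."""
    return sum(x_i * w_i for x_i, w_i in zip(x, w)) + b

def relu(z):
    """ReLU: zero for negative inputs, the input itself otherwise."""
    return max(0.0, z)

def softmax(z):
    """Softmax over a list of values; the outputs are positive and sum to 1."""
    shift = max(z)  # stability measure; cancels out in the ratio
    exps = [math.exp(z_i - shift) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical inputs, weights, and bias for one hidden node.
x = [1.0, 2.0]
w = [0.5, -1.0]
b = 0.25

z = weighted_sum(x, w, b)
print(z, relu(z))

# Hypothetical raw scores of a three-class output layer.
print(softmax([2.0, 1.0, 0.1]))
```

ReLU is typically applied in hidden layers, whereas Softmax is typically applied at the output layer of a multi-class classifier because its outputs can be read as class probabilities.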
To train the ANN, the loss needs to be calculated so that the network can effectively evaluate its performance and make the appropriate changes. Once the loss has been calculated, the next step is to minimize this loss by changing the weights and the biases. The cost function $C$ is a measure of how well a neural network performed with respect to a given training sample and the expected output, and the way $C$ changes in relation to the weights $w_i$ can be determined using gradients. Using the following chain rule, the gradient of the cost function in relation to the weights can be calculated:
$$\frac{\partial C}{\partial w_i} = \frac{\partial C}{\partial \hat{y}} \times \frac{\partial \hat{y}}{\partial z} \times \frac{\partial z}{\partial w_i}$$
where $\frac{\partial C}{\partial \hat{y}}$ is the gradient of the cost function with respect to the predicted value, $\frac{\partial \hat{y}}{\partial z}$ is the gradient of the predicted value with respect to $z$, and $\frac{\partial z}{\partial w_i}$ is the gradient of $z$ with respect to $w_i$.
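The chain rule can be traced numerically for a single neuron. The sketch below assumes a ReLU activation and a squared-error cost, C = (ŷ − y)², which is chosen only for illustration and is not stated in the text; the inputs, weights, target, and learning rate are likewise hypothetical.

```python
def forward(x, w, b):
    """Return (z, y_hat) for one neuron with a ReLU activation."""
    z = sum(x_i * w_i for x_i, w_i in zip(x, w)) + b
    return z, max(0.0, z)

def cost(y_hat, y):
    """Squared-error cost (an assumption made for this sketch)."""
    return (y_hat - y) ** 2

def weight_gradients(x, w, b, y):
    """dC/dw_i = dC/dy_hat * dy_hat/dz * dz/dw_i for every weight."""
    z, y_hat = forward(x, w, b)
    dC_dyhat = 2.0 * (y_hat - y)      # derivative of the squared error
    dyhat_dz = 1.0 if z > 0 else 0.0  # derivative of ReLU
    return [dC_dyhat * dyhat_dz * x_i for x_i in x]  # dz/dw_i = x_i

# Hypothetical training sample and initial parameters.
x, w, b, y = [1.0, 2.0], [0.5, 0.5], 0.0, 1.0

grads = weight_gradients(x, w, b, y)
cost_before = cost(forward(x, w, b)[1], y)

# One gradient-descent update with a hypothetical learning rate.
learning_rate = 0.1
new_w = [w_i - learning_rate * g_i for w_i, g_i in zip(w, grads)]
cost_after = cost(forward(x, new_w, b)[1], y)

print(grads)
print(cost_before, cost_after)
```

Moving each weight a small step against its gradient lowers the cost for this sample, which is the adjustment that is repeated during training until the error is minimized.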
ANNs are well suited to IoT attack detection and have been implemented many times. Recently, the authors of [28] implemented an ANN-based model for detecting IoT-based attacks. The model was successful and can be used on IoT networks to perform intrusion detection. The authors of [29] also implemented intrusion detection using ANNs. This research had very good results, with the model having near-perfect accuracy and a very low false positive rate.

2.2.7. Logistic Regression

LR is a supervised learning algorithm that uses the logistic function, also known as the Sigmoid function. Logistic regression is similar to linear regression except that, instead of predicting continuous values, it is used for classifying data as either true or false. Linear regression can output any value, whereas LR outputs values between 0 and 1 [30]. Logistic regression is less well represented in intrusion detection than other models, and its suitability for intrusion detection is not as well established as that of the previous models. However, some research has examined a logistic regression based intrusion detection model [31]. This model was tested using multi-class classification and was able to outperform the other models.
As previously mentioned, logistic regression can be thought of as linear regression for classification problems. Logistic regression is used because, with linear regression, the hypothesis $h_o(x)$ can be greater than one or less than zero. With logistic regression, the hypothesis lies between zero and one, i.e., $0 \leq h_o(x) \leq 1$, where $h_o$ is a single hypothesis that maps inputs to outputs and can be evaluated and used to make predictions.
To get a value between zero and one, the Sigmoid function is used, which is represented as follows:
$$S(x) = \frac{1}{1 + e^{-x}}$$
This function returns a number between 0 and 1, which can be mapped to a particular class of data by using a decision boundary on the likelihood $p$ that the data belong to a certain class, which can be expressed as follows:
$$p \geq 0.5 \Rightarrow \text{class} = 1$$
$$p < 0.5 \Rightarrow \text{class} = 0$$
Once the threshold is set, predictions can be made using the Sigmoid function to determine the likelihood that the data belong to class 1 as follows:
$$S(\text{class} = 1) = \frac{1}{1 + e^{-x}}$$
This function returns a number that represents the probability that the data should be classified as class 1. With the previously defined threshold, if the number is 0.5 or above, the data are classified as class 1, and anything less than 0.5 is classified as class 0.
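The Sigmoid function and the 0.5 decision boundary can be expressed in a few lines. In this sketch, `x` stands for the score passed to the Sigmoid function, and the sample scores are hypothetical; how that score is produced from the features is outside the scope of the example.

```python
import math

def sigmoid(x):
    """S(x) = 1 / (1 + e^(-x)); the result lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def classify(x, threshold=0.5):
    """Return class 1 if the probability is at least the threshold, else class 0."""
    return 1 if sigmoid(x) >= threshold else 0

# Hypothetical scores, made up for illustration.
for score in (-2.0, 0.0, 3.0):
    print(score, sigmoid(score), classify(score))
```

A score of zero maps to a probability of exactly 0.5 and is therefore assigned to class 1 under the threshold rule described above.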
The following subsection provides some details on IoT including the attacks that are used in the dataset for this paper.

2.3. Internet of Things Attacks

As previously discussed, the IoT is considered to be a network of devices or objects communicating through wired or wireless communication technologies [32]. The protocols used by IoT devices are designed for devices with limited computation, storage, and communication capabilities that need to conserve as much battery power as possible. Such protocols include ZigBee, radio-frequency identification (RFID), and smart Bluetooth. The relatively quick increase in the number of IoT devices in use has outpaced standardization activities, leading to a massive influx of unsecured devices being connected to networks [33]. This in turn creates a large attack surface, leaving a vast number of vulnerable devices open to exploitation by attackers. In the following subsections, we describe relevant threats and attacks faced by the IoT.

2.3.1. Data Exfiltration

A data exfiltration attack involves attackers gaining access to a private network and stealing data stored on the network [34]. This type of attack can result in the theft of data such as credit card information and personal data. Several studies have been conducted in the field of detecting data exfiltration attacks using methods such as partially observable Markov decision process [35] and a method that involves capturing metadata at the file system level [36].

2.3.2. DoS and DDoS

Denial of service (DoS) and distributed denial of service (DDoS) attacks are very similar in execution. The primary difference involves the scale of the attack. A DoS attack involves a single system and Internet connection being used to attack the victim, whereas a DDoS attack involves multiple systems and Internet connections on a global scale being used to attack the victim, which are typically referred to as botnets [37].
There are many different ways to perform either of these attacks depending on which protocol is used in the attack. These methods include HTTP flood, TCP SYN, and UDP flood attacks, as identified by Mahjabin et al. [38]. An HTTP flood attack abuses the GET or POST requests sent via HTTP. A GET request is used when a client wishes to receive information from the server, whereas a POST request is used to send information to the server, such as uploading a file. Sending thousands of these requests to a server or cluster of servers at once dramatically increases the workload on the server side, slowing the entire system down or preventing legitimate users from accessing the server(s).
A TCP SYN attack exploits the three-way handshake that occurs when a TCP connection is established, in which the client sends a SYN packet that elicits a SYN and ACK response from the server. During the attack, the source address given in the SYN packet is false. As a result, the server sends out SYN and ACK messages repeatedly without receiving a reply. Each pending connection stores an entry in the server’s connection table, which eventually becomes full and prevents legitimate users from accessing the server. A UDP flood attack involves sending UDP packets with a port number and sometimes a spoofed IP address as well. Once the server receives such a packet, it checks for any application using the port named in the packet and, if none is found, sends back a “Destination Unreachable” packet. As more and more packets are received, the system becomes unresponsive to other clients.
Moreover, attackers are able to take control of devices such as webcams and digital video recorders (DVRs). One such example was the Mirai botnet in 2016, which was able to make use of up to 400,000 devices and take down large websites such as Twitter and GitHub [39]. Due to the lack of security on IoT devices, considerable research has been conducted into detecting DoS and DDoS traffic [40,41]. However, none of these approaches uses ML techniques.

2.3.3. Keylogging

The basic function of a keylogger is to store the keystrokes made by a user on their keyboard. Keyloggers can be either hardware or software based [42,43]. Software keylogging is typically done by installing malware on the victim machine that saves the keystrokes and relays them to the attacker. Some research has been devoted to keylogging detection methods (see, e.g., [44,45]).

2.3.4. OS Scan and Service Scan

Operating system (OS) and service scans are similar in nature and can be grouped into the attack category of probing. Probing can be done either passively, in which the attacker gathers packets from the network, or actively, in which the attacker sends traffic and records the responses. Since passive scanning generates no traffic, only active scanning produces traffic that can be used for testing. OS scans allow the attacker to discover the OS being used by the victim machine. This information can help an attacker identify the type of device, e.g., server, computer, or IoT device. It can also help the attacker identify the version of the OS being used, and therefore find vulnerabilities related to that OS.
There has been a plethora of research into using OS scans to identify whether a device is an IoT device. One study used neural networks to identify whether the scanned device was an IoT device [46]. Another study used deep learning techniques to identify Raspberry Pi devices that were acting as IoT devices [47]. Both studies show that it is possible to identify IoT devices using OS scanning techniques.
Service scans, more commonly referred to as port scans, involve the attacker probing a network in order to identify open ports on the network [48]. This is commonly used by an attacker to gain a better insight into the types of activity on the network as well as showcasing any open ports that are vulnerable to being exploited. A port scan works by having the software used send a request to a port on another network to set up a connection. The software will then wait for a response from the network.
Due to the fact that IoT devices can range from printers to heating controllers, the ports that can be used by devices can vary. To this end, the authors of [49] conducted a study performing a scan on printers to identify vulnerable ports. The results showcase that port 9100 was a commonly opened port on printers. The port is used to carry data to and from printers over TCP. It was also noted that gaining access to the network using this port was a simple process.
Port scanning can also be used to identify if a device is an IoT device. An analysis by Sivanathan et al. [50] showed that by scanning for a small number of TCP ports it could be determined whether a device was an IoT device including information on the device itself, such as identifying a device as an HP printer. Since IoT devices are generally more vulnerable than other devices, this could be used to identify an entry point to a network. A study using an approach based on Dempster–Shafer evidence theory produced a solid groundwork for detecting port scan traffic [51]. Another study proposed a new evaluation metric for IDS, which was reported to take less time to identify port scan data than previous metrics [52]. Neither of these studies included IoT devices, and there is currently a lack of research into OS scans in regards to IoT devices.
Recently, several efforts have been devoted to ML in IoT networks [32,53,54,55]. However, in most of the existing works, the performance is checked only for specific types of ML algorithms, such as ANN, J48 DT, and NB, without detailed performance evaluation. Although some work is based on various ML algorithms such as LR, SVM, DT, RF, ANN, and KNN, most of these are used to mitigate IoT cybersecurity threats in special environments such as a smart city. Contrary to existing works, our study provides a comprehensive evaluation on both real attack and simulated attack data that were created by simulating a realistic network at the University of New South Wales, where real attacks on IoT networks were recorded.

3. Performance Evaluation

3.1. Benchmark Data

Our evaluation involves using several datasets with several ML models to identify the best model for correctly classifying IoT attack data. When selecting the datasets, the two most important factors were the variety of the attack data and how up-to-date the datasets are. The Bot-IoT datasets [56] were chosen because they met both criteria.

3.2. Performance Evaluation Metrics

For evaluation, we consider the following metrics.

3.2.1. Confusion Matrix

A confusion matrix shows the predictions made by the model. It is designed to show where the model has correctly and incorrectly classified the data.
The confusion matrix for binary and multi-class classification is different. With binary classification, the matrix shows the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) results, as shown in Table 2. The columns represent the correct classification of the data and the rows represent the available classifications.
TP and TN are when the data are correctly classified as either attack or no attack. FP and FN are when data are incorrectly predicted as the other class. When using a confusion matrix for multi-class problems, the same principles apply. However, the matrix shows all the classes which allow for observing where the mis-classification is occurring in the classes, as shown in Table 3.
In Table 3, C represents where the correct classifications are located and W represents incorrect classifications. It is to be noted that correct classifications create a diagonal path through the table from the top left corner to the bottom right corner.
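The counting behind both kinds of confusion matrix can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names and the label lists are hypothetical, and the matrix is indexed as `matrix[predicted][actual]` to follow the row/column layout described above.

```python
from collections import Counter


def binary_confusion(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary problem (positive = attack)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


def multiclass_confusion(y_true, y_pred, n_classes):
    """Build matrix[predicted][actual]; correct predictions fall on the diagonal."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for truth, pred in zip(y_true, y_pred):
        matrix[pred][truth] += 1
    return matrix


# Hypothetical labels, for illustration only (1 = attack, 0 = no attack).
binary = binary_confusion([1, 1, 1, 0, 0, 1, 0, 1], [1, 1, 0, 0, 1, 1, 0, 1])
print(binary)  # {'TP': 4, 'TN': 2, 'FP': 1, 'FN': 1}

# Hypothetical three-class labels.
matrix = multiclass_confusion([0, 1, 2, 2, 1, 0], [0, 2, 2, 2, 1, 0], n_classes=3)
print(matrix)  # [[2, 0, 0], [0, 1, 0], [0, 1, 2]]
```

Note that `sklearn.metrics.confusion_matrix` uses the transposed orientation (rows are the true classes, columns the predicted ones), so its output may need to be transposed before comparing it with a table laid out as described here.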

3.2.2. Accuracy

Accuracy is a metric that can be used to identify the percentage of predictions that were classified correctly and is expressed as follows:
$$\mathrm{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
This can be expanded upon by utilizing the results of a confusion matrix including TP, TN, FP, and FN and can be defined as follows:
$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}$$

3.2.3. Precision

Precision is used to determine the ratio of correctly predicted positive outcomes against the total number of predicted positive outcomes and can be defined as follows:
$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$

3.2.4. Recall

Recall is used to determine the ratio of correctly predicted positive outcomes to all the outcomes in the given class and can be defined as follows:
$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$

3.2.5. F1 Score

F1 score is the harmonic mean of precision and recall, which produces a number between 0 and 1. F1 score is often seen as a better performance metric than accuracy and can be defined as follows:
$$\mathrm{F1\ score} = \frac{2 \times (\mathrm{recall} \times \mathrm{precision})}{\mathrm{recall} + \mathrm{precision}}$$
It is to be noted that the choice between F1 score and accuracy depends on how the data are distributed. The F1 score is the better performance metric when the classes are highly unbalanced, because it takes into account how the data are distributed. Since imbalanced class distributions exist in most real-life classification problems, F1 score is usually the better metric to use. Accuracy is appropriate when the class distribution is similar; it does not take into account how the data are distributed, which may lead to a wrong conclusion on imbalanced data.
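The difference between accuracy and F1 score on imbalanced data can be made concrete with a small sketch. The functions below simply transcribe the formulas above; the counts are hypothetical (990 normal rows and 10 attack rows) and do not come from the Bot-IoT results.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    # Return 0.0 when nothing was predicted positive, to avoid dividing by zero.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * (r * p) / (r + p) if (r + p) else 0.0


# Hypothetical model A labels every row as "no attack".
acc_a = accuracy(tp=0, tn=990, fp=0, fn=10)
f1_a = f1_score(tp=0, fp=0, fn=10)
print(acc_a, f1_a)  # 0.99 0.0

# Hypothetical model B finds 8 of the 10 attacks and raises 5 false alarms.
acc_b = accuracy(tp=8, tn=985, fp=5, fn=2)
f1_b = f1_score(tp=8, fp=5, fn=2)
print(round(acc_b, 3), round(f1_b, 3))  # 0.993 0.696
```

Both models have almost the same accuracy, yet only the F1 score reveals that model A never detects an attack.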

3.2.6. Log Loss

Log loss is used to measure the performance of a model by using the probability it assigns to the expected outcome. The higher the probability assigned to the actual class, the lower the log loss will be. A lower score indicates that the model has performed better.
For binary classification, where the number of possible classes (M) = 2, log loss can be expressed as follows:
$$-\left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)$$
For multi-class classification, where M > 2, a separate loss for each class label is calculated, and the results are summed, which is expressed as follows:
$$-\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})$$
where M is the number of possible classes (with labels such as 0, 1, 2), log is the natural logarithm, $y_i$ is a binary indicator of whether class label i is the correct classification for the observation, and $p_i$ is the model's predicted probability. In the multi-class form, $y_{o,c}$ and $p_{o,c}$ play the same roles for observation o and class c.
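As an illustration of the two expressions above, the following sketch evaluates them in plain Python. Two details are assumptions rather than statements from the text: the per-observation losses are averaged over the dataset, and probabilities are clipped by a small `eps` so that `log(0)` is never evaluated. The example probabilities are hypothetical.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean of -(y*log(p) + (1-y)*log(1-p)); p is the predicted P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def multiclass_log_loss(y_true, p_pred, eps=1e-15):
    """Mean of -sum_c y_{o,c}*log(p_{o,c}); p_pred holds one probability list per row."""
    total = 0.0
    for label, probs in zip(y_true, p_pred):
        # y_{o,c} is 1 only for the correct class, so the sum over c
        # reduces to the log of the probability given to that class.
        total += -math.log(max(probs[label], eps))
    return total / len(y_true)


# Hypothetical predictions: both models pick the right class every time,
# but the first one is more confident, so its log loss is lower.
confident = binary_log_loss([1, 0, 1], [0.9, 0.1, 0.8])
hesitant = binary_log_loss([1, 0, 1], [0.6, 0.4, 0.55])
print(confident < hesitant)  # True

# With M = 2 the multi-class form gives the same value as the binary form.
two_class = multiclass_log_loss([1, 0], [[0.1, 0.9], [0.8, 0.2]])
same_binary = binary_log_loss([1, 0], [0.9, 0.2])
```

This also shows why two models with identical accuracy, precision, and recall can still differ in log loss: the metric rewards confident correct predictions and penalizes confident wrong ones.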

3.2.7. ROC AUC

ROC is a graph used to plot the results of the model at various thresholds when making predictions. The graph uses the true positive rate (TPR) and the false positive rate (FPR), which are expressed as follows:
$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$
$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$$
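To show how the TPR and FPR definitions turn into a curve and a single ROC AUC value, here is a minimal sketch that sweeps the decision threshold over the predicted scores and integrates the resulting points with the trapezoidal rule. The function names, labels, and scores are hypothetical, and the sketch assumes both classes are present in the labels.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical attack scores for four rows (1 = attack, 0 = no attack).
points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

A value of 1.0 means some threshold separates the classes perfectly, while 0.5 corresponds to a model that ranks attack and non-attack rows no better than chance.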

3.2.8. Cohen’s Kappa Coefficient

Cohen’s kappa coefficient (CKC), also referred to as the kappa statistic, is used to test the inter-rater reliability of prediction and can be expressed as follows:
$$k = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}$$
where Pr(a) is the observed agreement and Pr(e) is the expected agreement. This metric is useful as it compares the model against a model that guesses based on the frequency of the classes. This allows for the disparity in a dataset to be evaluated particularly with multi-class testing as the dataset has varying numbers of data points per attack.
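The kappa formula can be evaluated directly from a confusion matrix, as in the sketch below. The function name and both matrices are hypothetical; Pr(e) is computed in the usual way from the row and column totals, and the sketch does not handle the degenerate case Pr(e) = 1.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix given as a list of lists."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Pr(a): observed agreement, the share of samples on the diagonal.
    pr_a = sum(matrix[i][i] for i in range(n)) / total
    # Pr(e): agreement expected if predictions merely followed class frequencies.
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    pr_e = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    return (pr_a - pr_e) / (1 - pr_e)


# Hypothetical model that always predicts the majority class on a 990:10 split.
# Rows are predicted classes, columns are actual classes.
majority_only = cohen_kappa([[990, 10], [0, 0]])
print(majority_only)  # 0.0, despite 99% accuracy

# Hypothetical balanced problem with 90% of rows classified correctly.
balanced = cohen_kappa([[45, 5], [5, 45]])
print(round(balanced, 3))  # 0.8
```

The first case shows why the metric is informative on imbalanced data: a model that only exploits the class frequencies scores zero, however high its accuracy.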

3.3. Dataset Description

The dataset named Bot-IoT was submitted to the IEEE website on 16 October 2019 and was created by the University of New South Wales (UNSW). The dataset consists of ten CSV files containing records for the following attacks on IoT networks: (i) Data exfiltration; (ii) DoS HTTP; (iii) DoS TCP; (iv) DoS UDP; (v) DDoS HTTP; (vi) DDoS TCP; (vii) DDoS UDP; (viii) Keylogging; (ix) OS Scan; and (x) Service Scan. The dataset comprises both real attack and simulated attack data and was created by simulating a realistic network at the UNSW [56].
Table 4 shows the features used in the experiments. There are 35 columns in the dataset. However, only the ones in Table 4 were used. When deciding what features to use, the contents of the columns are examined and any columns that have no values are removed as well as columns that contain text and columns that are deemed to be irrelevant to the overall classification of the data.
One important part of examining the dataset involves checking the representation of the classes in the dataset, i.e., whether one class is over- or under-represented, as this can have a detrimental effect on the experiments. Table 5 shows the amount of attack data and no attack data for each dataset used in the experiments.
To conduct multi-class testing, a new CSV file is created using the binary classification datasets. The datasets were collected and then randomized and put into a new file. Due to the large size of the dataset, only a selected percentage of the data is used to prevent excessive run times. Table 6 shows the class representation of the training and test data in the multi-class dataset. It is observable in both the binary and multi-class datasets that not all classes have equal representation. Testing with weighted classes can be done to see the effects of having equal representation among the classes. The models SVM, DT, RF, ANN, and LR are able to use the balanced weighted classes option, which computes the weight of each class as follows:
$$W = \frac{\mathrm{Samples}}{\mathrm{Classes} \times Y}$$
where $\mathrm{Samples}$ is the number of rows in the dataset, $\mathrm{Classes}$ is the number of classes in the dataset, and $Y$ is the number of rows carrying the label being weighted.
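A short sketch of the weight computation follows, assuming Y is read as the number of rows carrying a given label. That reading matches the heuristic scikit-learn documents for `class_weight="balanced"`, namely `n_samples / (n_classes * np.bincount(y))`. The function name is hypothetical, and the 3:1319 split is used purely as an illustrative ratio.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: Samples / (Classes * rows carrying that label)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Illustrative split: 3 rows of normal traffic (0) and 1319 attack rows (1).
labels = [0] * 3 + [1] * 1319
weights = balanced_class_weights(labels)
print(round(weights[0], 3), round(weights[1], 3))  # 220.333 0.501
```

Each class ends up with the same total weight (here 661), so the minority class is no longer outvoted during training. The price is that mistakes on the majority class become comparatively cheap, which is consistent with an overall drop in some metrics when weighting is enabled.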

3.4. Implementation

3.4.1. Tools Used

We use the Python programming language (version 3.7.4) for the implementation of the ML algorithms. The two main modules used for the implementation of the models are sklearn (also referred to as scikit-learn) and Keras. Keras is used to implement the ANN, while sklearn is used to implement the other models. It is to be noted that, for comparison purposes, we used the default values of the hyperparameters for each classifier. Table 7 contains the names of the modules used and a brief description of each module.

3.4.2. Feature Extraction

The dataset contains features that either contain no information or have information that is irrelevant in helping the model classify the data. The unwanted features can be removed during the preprocessing stage using the pandas module. Several features, such as flgs, proto, dir, state, saddr, daddr, srcid, smac, dmac, soui, doui, sco, record, category, and subcategory, were removed from the dataset.
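The removal step can be sketched without any third-party dependency by treating each row as a dictionary. This is only an illustration: the text states that pandas was used (where `DataFrame.drop(columns=...)` would serve the same purpose), and the function name, the sample rows, and the `empty_col` column are hypothetical.

```python
# Columns the text lists as removed during preprocessing.
DROPPED = {"flgs", "proto", "dir", "state", "saddr", "daddr", "srcid", "smac",
           "dmac", "soui", "doui", "sco", "record", "category", "subcategory"}


def drop_unwanted(rows, dropped=DROPPED):
    """Remove the named columns plus any column that is empty in every row."""
    if not rows:
        return []
    empty = {col for col in rows[0]
             if all(row.get(col) in (None, "") for row in rows)}
    remove = set(dropped) | empty
    return [{k: v for k, v in row.items() if k not in remove} for row in rows]


# Hypothetical two-row sample; the real dataset has 35 columns.
rows = [
    {"proto": "tcp", "saddr": "192.168.100.3", "pkts": 8, "bytes": 1980,
     "empty_col": "", "attack": 1},
    {"proto": "udp", "saddr": "192.168.100.7", "pkts": 2, "bytes": 120,
     "empty_col": "", "attack": 0},
]
cleaned = drop_unwanted(rows)
print(cleaned[0])  # {'pkts': 8, 'bytes': 1980, 'attack': 1}
```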

3.4.3. Feature Scaling

The features in the dataset contain large numbers that vary in size. Therefore, it is important to normalize the data in the features. This is done by re-scaling the values of the features to within a defined scale such as −1 to 1 and can be defined as follows:
$$x' = a + \frac{(x - \min(x))(b - a)}{\max(x) - \min(x)}$$
where $x'$ is the normalized value, $x$ is the original value, and $a$ and $b$ are the minimum and maximum values of the target scale. With a = −1 and b = 1, every normalized value lies between −1 and 1. This can be done in Python using the MinMaxScaler in the sklearn preprocessing module.
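The re-scaling formula can be checked on a single feature column with a few lines of plain Python. The function name and the byte counts are hypothetical; in scikit-learn the equivalent configuration would be `MinMaxScaler(feature_range=(-1, 1))`.

```python
def min_max_scale(values, a=-1.0, b=1.0):
    """Re-scale one feature column to the range [a, b]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to the lower bound.
        return [a for _ in values]
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]


# Hypothetical byte counts for three flows.
scaled = min_max_scale([0, 50, 100])
print(scaled)  # [-1.0, 0.0, 1.0]
```

To avoid leaking information from the test set, the minimum and maximum are normally taken from the training data only and then reused to transform the test data.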

3.4.4. Multi-Class Dataset

The multi-class dataset is created by collecting all the rows of all the datasets and then randomizing the rows using the random Python module. The random module contains the shuffle method, which allows an array, in this case the rows of the dataset, to be randomized. Due to the large size of the dataset when using it for testing, only roughly 25% of the dataset is used, which is 1,500,000 rows.
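The shuffle-then-subsample step might look like the sketch below. The function name is hypothetical, and seeding the shuffle is an assumption added here for reproducibility; the text does not say whether the shuffle was seeded.

```python
import random


def shuffle_and_sample(rows, fraction=0.25, seed=121):
    """Shuffle a copy of the rows and keep roughly the requested fraction."""
    rng = random.Random(seed)  # a dedicated generator keeps the run repeatable
    shuffled = list(rows)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    keep = int(len(shuffled) * fraction)
    return shuffled[:keep]


# Hypothetical stand-in for the combined rows of all the binary datasets.
all_rows = list(range(100))
sample = shuffle_and_sample(all_rows)
print(len(sample))  # 25
```

Because the rows are shuffled before truncation, the retained subset keeps approximately the class proportions of the full dataset, including its imbalance.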

3.4.5. Training Data

The data used by the model to learn are called the training data. Data can be split into training and test data in various ratios. For this study, a split of 80:20 was used, with 80% being used for training the models. This choice is guided by the Pareto principle, which states that 80% of results come from 20% of the effort.

3.4.6. Test Data

Twenty percent of the data is used for testing, which is typically a good amount of data. However, if the dataset is small, this can result in a low amount of test data and in the illusion that the model has done extremely well when in fact it has not had enough data to be properly tested. To split the dataset into training and test data, the train_test_split function from the sklearn model_selection module can be used. This function accepts a random_state parameter that sets the seed of the pseudo-random number generator; in this case, the number 121 was used.
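The idea of a seeded 80:20 split can be sketched in plain Python as follows. This is not the scikit-learn implementation, and the function name and data are hypothetical; with scikit-learn the call would be `train_test_split(X, y, test_size=0.2, random_state=121)`, which would select different rows than this sketch even with the same seed.

```python
import random


def split_train_test(rows, test_size=0.2, seed=121):
    """Randomly hold out a fraction of the rows; the seed makes it repeatable."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Hypothetical dataset of 100 rows.
data = list(range(100))
train, test = split_train_test(data)
print(len(train), len(test))  # 80 20
```

On a small dataset the held-out 20% contains very few rows, and even fewer from a minority class, which is why perfect scores on such a test set should be read with caution.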

3.5. Results and Discussion

To test several ML algorithms and to identify which are the best and worst for classifying attack data on IoT networks, this section provides all the results and analysis based on several performance metrics including binary and multi-class testing.

3.5.1. Binary Classification

Data Exfiltration: Table 8 shows the results for data exfiltration data, where RF has the best scores for all the performance metrics including log loss. Although DT also has perfect scores, it has a high log loss, indicating that the RF model is more confident in making predictions.
Table 9 shows the confusion matrix for RF and reveals two noteworthy pieces of information: the amount of data tested is very low, and the classes do not have equal representation. It is possible that the low amount of test data is having an impact on the results. However, the other models apart from DT have relatively poor scores compared to RF.
Table 10 shows that increasing the test data to 30% decreases the log loss, indicating that the model performs better with more data, although only marginally. Once the test data reaches 40% and beyond, the results begin to get worse, although the model is able to maintain perfect recall with up to a 50% split in the training and test data.
Due to the class representation being imbalanced, the weighted classes parameter can be used. This allows the disparity of the classes to be rectified, the results of which are shown in Table 11. This option is not available when using the KNN and NB models. It is observable in Table 11 that the performance of SVM has improved with weighted classes, with all metrics increasing and log loss decreasing. ANN is unaffected by weighted classes, and LR is marginally affected, with the model having perfect precision but a lower recall. DT loses its perfect scores, while RF is able to keep perfect scores but slightly increases its log loss.
Without using weighted classes, RF is the best model due to its low log loss when compared to DT. When weighted classes are applied, RF is still the best model with perfect scores and a low log loss, indicating that the model is confident in making predictions.
DDoS HTTP: Table 12 shows the results for DDoS HTTP data. DT has perfect performance scores but a high log loss of 7.25. This dataset does not suffer from a lack of data; rather, it suffers from a large imbalance, since attack data are far more prevalent in the dataset, as shown in Table 13.
This confusion matrix shows a large disparity in the data with a ratio of 3:1319 in favor of attack data. A large disparity in the dataset can cause the log loss to be affected, as log loss is based on probability, and, because the data are more likely to be attack data, this can result in a skewed log loss.
Table 14 shows the results of weighted classes on the DDoS HTTP data. With weighted classes, both SVM and LR have a sizeable decrease in performance across all metrics except log loss, which has decreased for both, and ROC AUC, which has increased for both. ANN is unaffected by the weighted classes and retains its perfect recall, whereas RF loses its perfect recall. DT loses its perfect scores but has a large decrease in its log loss.
Without using weighted classes, DT is the best model due to the perfect scores, although the high log loss is a factor to consider. RF would be the second best as it has perfect recall as well as the lowest log loss and the highest ROC AUC. When weighted classes are applied, ANN is the best model as it has perfect recall and a low log loss.
DDoS TCP: Table 15 shows the results for the DDoS TCP data. The models DT and RF both have perfect scores except for log loss, which is high for both. Table 16 shows the confusion matrix for RF, and once again the matrix shows a very large disparity in the data represented.
Table 17 shows the results for DDoS TCP data with weighted classes enabled. With weighted classes enabled, SVM has lost its perfect precision but lowered its log loss significantly. DT and ANN are unaffected by the weighted classes, while RF retains its perfect scores and lowers its log loss slightly. LR has lost its perfect recall and increased its log loss and ROC AUC.
Both with and without weighted classes, RF is the best model as it has perfect scores. With weighted classes, the log loss is lowered but is still quite high when compared to LR, which has a very low log loss.
DDoS UDP: Table 18 shows the results for the DDoS UDP data, where both KNN and DT have perfect scores, but KNN is the better model as it has a lower log loss. Although the log loss is still high, this is the case for all the models apart from NB. Table 19 shows the confusion matrix for RF, which shows the disparity in the class representation.
Table 20 shows the results for DDoS UDP data with weighted classes enabled. The table shows that SVM has gained perfect scores and lowered its log loss, while DT has lost its perfect scores and lowered its log loss substantially. RF has gained perfect scores and lowered its log loss, while ANN is unaffected. LR has lost perfect recall but gained perfect precision, lowered its log loss, and increased its ROC AUC.
Without weighted classes, KNN is the best model as it has perfect scores but the log loss is high. NB would be second best as it has perfect precision and a low log loss. With weighted classes, RF is the best model as it has perfect scores and a low log loss.
Key logging: Table 21 shows the results for key logging data. DT is the best model as it has the best log loss and ROC AUC scores combined with perfect precision and high scores on the remaining metrics.
Table 22 shows the confusion matrix for DT where it is observable that the dataset has a low amount of data and the data are imbalanced.
Just as with data exfiltration, the amount of test data can be increased to observe the effect on the scores of the DT model. Table 23 shows the results of increasing the test data for key logging data. Increasing the test data to 30% gives the model perfect recall instead of perfect accuracy. Once the data are increased to 50%, the model no longer has perfect recall or precision. Based on the changes in the results, it is observable that the low amount of data has a significant impact on the results of the model.
Table 24 shows the results for key logging data with weighted classes enabled. SVM shows an overall decrease in performance, with the model no longer having perfect recall. DT and RF also show a drop in performance, with the models losing their perfect precision and recall, respectively. ANN is unaffected, while LR has a large decrease in recall, leading to the worst performance of all the models.
Without weighted classes, DT is the best model with the lowest log loss and highest ROC AUC as well as perfect precision. With weighted classes, all the models tested had a decrease in performance except for ANN, which was unchanged. Apart from its perfect recall, ANN still has comparatively worse scores than DT and RF. Unless perfect recall is required, DT should be used as it will correctly classify more data than ANN.
OS Scan: Table 25 shows the results for OS Scan data. All of the models have good scores, with RF, ANN, and LR having perfect recall, indicating that the models made no false negatives. RF has a higher precision than LR and ANN as well as a lower log loss and higher ROC AUC. This would suggest that RF is the best model. However, inspection of the confusion matrix shows a large imbalance of data in the dataset, as shown in Table 26.
Table 27 shows the results for OS scan data with weighted classes enabled. SVM shows a decrease in accuracy, recall, F1 score, log loss, and ROC AUC. The table also shows that performance decreased overall across the models. DT shows a decrease in log loss and ROC AUC, marking a slight increase in the model's confidence but a lower ability to perform well at different thresholds. RF has lost its perfect recall and has an increased log loss and ROC AUC. ANN has seen no change to its results, whereas LR has a large performance decrease, with only ROC AUC having improved.
Without weighted classes, RF is the best model as it has perfect recall, the lowest log loss, and the highest ROC AUC. With weighted classes, ANN is the only model with perfect recall, but DT and RF both have better accuracy, precision, log loss, and ROC AUC. If having no false negatives is required, then ANN is the best, but DT is better at classifying data in general.
Service Scan: Table 28 shows the results for service scan data. The models SVM, RF, and ANN have perfect recall but have poor ROC AUC scores. DT has the highest ROC AUC and the lowest log loss, but RF could be considered the best due to its perfect recall.
Table 29 shows the confusion matrix for RF as well as the imbalanced data.
Table 30 shows the results of service scan data with weighted classes enabled. SVM was not tested due to excessive running times. DT, RF, and LR have increased their ROC AUC but all other metrics have been negatively affected. ANN is unaffected, being the only model to keep its perfect recall.
Without weighted classes, RF is the best of the models with perfect recall, as it has the lowest log loss and highest ROC AUC among them. However, DT has the best log loss overall but does not have perfect recall. With weighted classes, ANN is the best as it is the only model to retain perfect recall, but its ROC AUC is the poorest of all the models.
DoS HTTP: Table 31 shows the results for DoS HTTP data; DT and RF both have perfect scores and a low log loss, with DT narrowly beating RF.
Table 32 shows the confusion matrix for RF, which showcases the disparity in the dataset.
Table 33 shows the results for DoS HTTP data with weighted classes enabled. SVM shows decreased performance in all metrics except for ROC AUC. DT and RF have lost their perfect scores and have an increased log loss. ANN is unaffected, whereas LR has seen a decrease in all performance metrics apart from ROC AUC, which has increased.
Without weighted classes, DT is the best model as it has perfect scores and the lowest log loss. With weighted classes, ANN is the best model as it has perfect recall. With regard to the models' ability to classify data, ANN comes out on top due to having perfect recall.
DoS TCP: Table 34 shows the results for DoS TCP data, where all the models apart from NB have perfect recall. DT and RF have the best ROC AUC scores, but both have high log losses when compared to the other models. KNN has the lowest log loss and a ROC AUC almost as good as RF.
Table 35 shows the confusion matrix for DT and shows the imbalance of the data in the dataset.
Table 36 shows the results of DoS TCP data with weighted classes enabled. SVM was not recorded due to excessively long running times. With weighted classes, both DT and RF have lost their perfect recall, but DT has gained perfect precision. Both models have also seen an improvement in log loss and ROC AUC. ANN is affected and LR has had a performance decrease in almost all metrics.
Without weighted classes, KNN could be considered the best model as it has the lowest log loss and a reasonably high ROC AUC. DT and RF have a higher ROC AUC but also have a considerably higher log loss than KNN. With weighted classes, both DT and ANN could be considered the best with DT having perfect precision and ANN having perfect recall. Both models also have a low log loss, but ANN has a poorer ROC AUC score.
DoS UDP: Table 37 shows the results for DoS UDP data. NB is the best model with perfect precision, low log loss, and high ROC AUC, as well as having high metrics across all categories. All the other models have perfect recall but have either a high log loss or a low ROC AUC, or both.
Table 38 shows the confusion matrix for NB which shows the disparity between the data in the dataset.
Table 39 shows the results of DoS UDP data with weighted classes enabled. ANN is unaffected and maintains poor log loss and ROC AUC scores. SVM has gained perfect precision but lost perfect recall with an increase in log loss and ROC AUC. DT has also swapped its precision and recall scores with an increase in both log loss and ROC AUC scores. RF has lost its perfect recall and increased its log loss and ROC AUC. LR has improved its log loss, ROC AUC, and gained perfect precision while losing perfect recall.
Without weighted classes, NB is the best model having perfect precision with a low log loss and high ROC AUC. With weighted classes, both SVM and LR perform very well but SVM is the better model as it has the lower log loss of the two models.

3.5.2. Model Comparison

Table 40 shows the best models for each of the datasets including both (with and without weighted classes). DT and RF are the models that appear the most in the table with ANN appearing frequently in the weighted classes column. Without the use of weighted classes, RF achieves the best performance. With weighted classes, ANN achieves the best performances. However, using weighted classes generally decreases the overall performance of the model.

3.5.3. Multiclass Classification

Table 41 shows the results for multi-class classification. KNN has the best performance metrics, including the lowest log loss and the highest Cohen’s kappa score (CKS). LR is the worst model with the lowest metrics, including the lowest CKS and a high log loss, beaten only by SVM.
Table 42 shows the results with weighted classes. KNN and NB cannot use weighted classes and SVM was not tested because of its excessively long running time. Weighted classes have reduced the performance metrics for all models apart from ANN, which has had a small decrease in log loss, making it the best model with weighted classes.
Table 43 shows that KNN performs very well with the multi-class dataset with all the classes having low amounts of incorrectly classified data.
Table 44 shows that SVM performs poorly with the multi-class dataset with data exfiltration (1), DDoS HTTP (2), and key logging (5) data all being incorrectly classified. These classes are ones featuring low amounts of data, which could be the reason for the low accuracy.
Table 45 shows the confusion matrix for DT multi-class classification. It can be observed that the model performs very well; however, the model appears to have difficulty in correctly classifying the classes that are under-represented in the dataset. This is evident in Table 45, with data exfiltration (1), DDoS HTTP (2), and key logging (5) being incorrectly classified.
Table 46 shows the confusion matrix for DT with weighted classes enabled. Using weighted classes has resulted in an overall decrease in the model's performance, but has improved the correct classification of data for normal traffic (0), data exfiltration (1), and key logging (5). This has also resulted in DoS HTTP having all its data incorrectly classified.
Table 47 shows the confusion matrix for NB multi-classification, which performs quite well with no classes having all the data incorrectly classified. The model is also able to handle the data disparity in the classes with the low data classes having good classification results.
Table 48 shows the results for RF multi-class classification, which has good classification accuracy for the classes that have lots of data. The classes with low data have no correctly classified data.
Table 49 shows the results of having weighted classes. It is shown that, despite fewer correct classifications overall, the model is better at correctly classifying the low-data classes.
Table 50 shows the results for ANN multi-class classification. The model performs well except for exfiltration (1) and key logging (5), which have incorrectly classified data.
Table 51 shows the results with weighted classes enabled. It is observable that the model is much better at classifying most classes, with OS scan (6) and service scan (7) having the most incorrectly classified data. The model is also unable to correctly classify any data for normal data (0) and data exfiltration (1).
Table 52 shows the results for LR multi-class classification, which has poor performance overall, with the low-data classes having no correctly classified data.
Table 53 shows the results of having weighted classes. It is evident that the accuracy of overall classification has decreased; however, the model shows improvement in classifying the low data classes.

4. Conclusions

In this paper, state-of-the-art ML algorithms are compared in terms of accuracy, precision, recall, F1 score, and log loss on both weighted and non-weighted Bot-IoT datasets. It is shown that the performance of RF in terms of accuracy and precision is the best with the non-weighted dataset. However, with a weighted dataset, ANN has higher accuracy for binary classification. In multi-class classification, KNN and ANN are highly accurate for the non-weighted and weighted datasets, respectively. From the results, it is evident that, when all types of attack have weighted datasets, ANN predicts the type of attack with higher accuracy.
In the future, we intend to adopt the models explored in this research into an IDS prototype for testing using diverse data including a mix of attacks to validate the multi-class functionality of models.

Author Contributions

Conceptualization, A.C.; Methodology, A.C.; Validation, A.C., J.A., and R.U.; Formal Analysis, A.C.; Investigation, A.C., J.A., R.U., and B.N.; resources, A.C.; writing—original draft preparation, A.C. and J.A.; writing—review and editing, A.C., J.A., R.U., B.N., M.G., F.M., S.u.R., F.A., and W.J.B.; supervision, J.A. and W.J.B.; and funding acquisition, F.A. and J.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dorsemaine, B.; Gaulier, J.P.; Wary, J.P.; Kheir, N.; Urien, P. Internet of Things: A Definition & Taxonomy. In Proceedings of the 2015 9th International Conference on Next Generation Mobile Applications, Services and Technologies, Cambridge, UK, 9–11 September 2015. [Google Scholar] [CrossRef]
  2. Statista. IoT: Number of Connected Devices Worldwide 2012–2025; Statista: Hamburg, Germany, 2019. [Google Scholar]
  3. Doffman, Z. Cyberattacks On IOT Devices Surge 300% In 2019, ‘Measured in Billions’. Available online: https://www.forbes.com/sites/zakdoffman/2019/09/14/dangerous-cyberattacks-on-iot-devices-up-300-in-2019-now-rampant-report-claims/?sh=24e245575892 (accessed on 10 November 2020).
  4. Furbush, J. Machine Learning: A Quick and Simple Definition. Available online: https://www.oreilly.com/content/machine-learning-a-quick-and-simple-definition/ (accessed on 10 November 2020).
  5. Jmj, A. 5 Industries That Heavily Rely on Artificial Intelligence and Machine Learning. Available online: https://medium.com/datadriveninvestor/5-industries-that-heavily-rely-on-artificial-intelligence-and-machine-learning-53610b6c1525 (accessed on 10 November 2020).
  6. Dosal, E. 3 Advantages of a Network Threat Analysis. Compuquip, 4 September 2018.
  7. Groopman, J. Understand the Top 4 Use Cases for AI in Cybersecurity. Available online: https://searchsecurity.techtarget.com/tip/Understand-the-top-4-use-cases-for-AI-in-cybersecurity (accessed on 10 November 2020).
  8. Mohammad, A.; Maen, A.; Szilveszter, K.; Mouhammd, A. Evaluation of machine learning algorithms for intrusion detection system. In Proceedings of the IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia, 14–16 September 2017; pp. 000277–000282.
  9. Sommer, R.; Paxson, V. Outside the Closed World: On Using Machine Learning for Network Intrusion Detection. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 16–19 May 2010; pp. 305–316.
  10. Foley, J.; Moradpoor, N.; Ochenyi, H. Employing a Machine Learning Approach to Detect Combined Internet of Things Attacks against Two Objective Functions Using a Novel Dataset. Secur. Commun. Netw. 2020, 2020.
  11. Alsamiri, J.; Alsubhi, K. Internet of Things Cyber Attacks Detection using Machine Learning. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 628–634.
  12. Hasan, M.; Islam, M.M.; Zarif, M.I.I.; Hashem, M. Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches. Internet Things 2019, 7, 100059.
  13. Ali, N.; Neagu, D.; Trundle, P. Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets. SN Appl. Sci. 2019, 1, 1559.
  14. Harrison, O. Machine Learning Basics with the K-Nearest Neighbors Algorithm. Towards Data Science, 10 September 2019.
  15. Liao, Y.; Vemuri, V. Use of K-Nearest Neighbor classifier for intrusion detection. Comput. Secur. 2002, 21, 439–448.
  16. Nikhitha, M.; Jabbar, M. K Nearest Neighbor Based Model for Intrusion Detection System. Int. J. Recent Technol. Eng. 2019, 8, 2258–2262.
  17. Yao, J.; Zhao, S.; Fan, L. An Enhanced Support Vector Machine Model for Intrusion Detection. Rough Sets Knowl. Technol. Lect. Notes Comput. Sci. 2006, 538–543.
  18. Cahyo, A.N.; Hidayat, R.; Adhipta, D. Performance comparison of intrusion detection system based anomaly detection using artificial neural network and support vector machine. AIP Conf. Proc. 2016.
  19. Sharma, H.; Kumar, S. A Survey on Decision Tree Algorithms of Classification in Data Mining. Int. J. Sci. Res. (IJSR) 2016, 5, 2094–2097.
  20. Stampar, M.; Fertalj, K. Artificial intelligence in network intrusion detection. In Proceedings of the 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 25–29 May 2015; pp. 1318–1323.
  21. Aloqaily, M.; Otoum, S.; Al Ridhawi, I.; Jararweh, Y. An Intrusion Detection System for Connected Vehicles in Smart Cities. Ad Hoc Netw. 2019.
  22. Koehrsen, W. An Implementation and Explanation of the Random Forest in Python. Available online: https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76 (accessed on 10 November 2020).
  23. Dubey, A. Feature Selection Using Random forest. Available online: https://towardsdatascience.com/feature-selection-using-random-forest-26d7b747597f (accessed on 10 November 2020).
  24. Farnaaz, N.; Jabbar, M. Random Forest Modeling for Network Intrusion Detection System. Procedia Comput. Sci. 2016, 89, 213–217.
  25. Saritas, M.M.; Yasar, A. Performance analysis of ANN and Naive Bayes classification algorithm for data classification. Int. J. Intell. Syst. Appl. Eng. 2019, 7, 88–91.
  26. Ujjwalkarn. A Quick Introduction to Neural Networks. The Data Science Blog, 9 August 2016.
  27. Maind, S.B.; Wankar, P. Research paper on basic of artificial neural network. Int. J. Recent Innov. Trends Comput. Commun. 2014, 2, 96–100.
  28. Anitha, A.A.; Arockiam, L. ANNIDS: Artificial Neural Network based Intrusion Detection System for Internet of Things. Int. J. Innov. Technol. Explor. Eng. Regul. Issue 2019, 8, 2583–2588.
  29. Shenfield, A.; Day, D.; Ayesh, A. Intelligent intrusion detection systems using artificial neural networks. ICT Express 2018, 4, 95–99.
  30. Rajput, H. MachineX: Simplifying Logistic Regression. Knoldus Blogs, 28 March 2018.
  31. Ghosh, P.; Mitra, R. Proposed GA-BFSS and logistic regression based intrusion detection system. In Proceedings of the 2015 Third International Conference on Computer, Communication, Control and Information Technology (C3IT), Hooghly, India, 7–8 February 2015; pp. 1–6.
  32. Hussain, F.; Hussain, R.; Hassan, S.A.; Hossain, E. Machine Learning in IoT Security: Current Solutions and Future Challenges. IEEE Commun. Surv. Tutor. 2020, 22, 1686–1721.
  33. Saleem, J.; Hammoudeh, M.; Raza, U.; Adebisi, B.; Ande, R. IoT standardisation. In Proceedings of the 2nd International Conference on Future Networks and Distributed Systems (ICFNDS 18), Amman, Jordan, 26–27 June 2018.
  34. Ullah, F.; Edwards, M.; Ramdhany, R.; Chitchyan, R.; Babar, M.A.; Rashid, A. Data exfiltration: A review of external attack vectors and countermeasures. J. Netw. Comput. Appl. 2017, 101, 18–54.
  35. Carthy, S.M.M.; Sinha, A.; Tambe, M.; Manadhata, P. Data Exfiltration Detection and Prevention: Virtually Distributed POMDPs for Practically Safer Networks. Lect. Notes Comput. Sci. Decis. Game Theory Secur. 2016, 39–61.
  36. Fadolalkarim, D.; Bertino, E. A-PANDDE: Advanced Provenance-based ANomaly Detection of Data Exfiltration. Comput. Secur. 2019, 84, 276–287.
  37. Malik, M.; Singh, Y. A Review: DoS and DDoS Attacks. Int. J. Comput. Sci. Mob. Comput. 2015, 4, 260–265.
  38. Mahjabin, T.; Xiao, Y.; Sun, G.; Jiang, W. A survey of distributed denial-of-service attack, prevention, and mitigation techniques. Int. J. Distrib. Sens. Netw. 2017, 13, 2–33.
  39. Kolias, C.; Kambourakis, G.; Stavrou, A.; Voas, J. DDoS in the IoT: Mirai and Other Botnets. Computer 2017, 50, 80–84.
  40. Galeano-Brajones, J.; Carmona-Murillo, J.; Valenzuela-Valdés, J.F.; Luna-Valero, F. Detection and Mitigation of DoS and DDoS Attacks in IoT-Based Stateful SDN: An Experimental Approach. Sensors 2020, 20, 816.
  41. Ul, I.; Bin, M.; Asif, M.; Ullah, R. DoS/DDoS Detection for E-Healthcare in Internet of Things. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 297–300.
  42. Olzak, T. Keystroke Logging (Keylogging). Available online: https://www.researchgate.net/publication/228797653_Keystroke_logging_keylogging (accessed on 12 November 2020).
  43. Abukar, Y.; Maarof, M.; Hassan, F.; Abshir, M. Survey of Keylogger Technologies. Int. J. Comput. Sci. Telecommun. 2014, 5, 25–31.
  44. Ortolani, S.; Giuffrida, C.; Crispo, B. Bait Your Hook: A Novel Detection Technique for Keyloggers. Lect. Notes Comput. Sci. Recent Adv. Intrusion Detect. 2010, 198–217.
  45. Wajahat, A.; Imran, A.; Latif, J.; Nazir, A.; Bilal, A. A Novel Approach of Unprivileged Keylogger Detection. In Proceedings of the 2019 2nd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 30–31 January 2019.
  46. Yang, K.; Li, Q.; Sun, L. Towards automatic fingerprinting of IoT devices in the cyberspace. Comput. Netw. 2019, 148, 318–327.
  47. Aneja, S.; Aneja, N.; Islam, M.S. IoT Device Fingerprint using Deep Learning. In Proceedings of the 2018 IEEE International Conference on Internet of Things and Intelligence System (IOTAIS), Bali, Indonesia, 1–3 November 2018.
  48. Bhuyan, M.H.; Bhattacharyya, D.K.; Kalita, J.K. Surveying Port Scans and Their Detection Methodologies. Comput. J. 2011, 54, 1565–1581.
  49. Markowsky, L.; Markowsky, G. Scanning for vulnerable devices in the Internet of Things. In Proceedings of the 2015 IEEE 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Warsaw, Poland, 24–26 September 2015.
  50. Sivanathan, A.; Gharakheili, H.H.; Sivaraman, V. Can We Classify an IoT Device using TCP Port Scan? In Proceedings of the 2018 IEEE International Conference on Information and Automation for Sustainability (ICIAfS), Colombo, Sri Lanka, 21–22 December 2018.
  51. Shao, G.L.; Chen, X.S.; Yin, X.Y.; Ye, X.M. A fuzzy detection approach toward different speed port scan attacks based on Dempster-Shafer evidence theory. Secur. Commun. Netw. 2016, 9, 2627–2640.
  52. Lopez-Vizcaino, M.; Novoa, F.J.; Fernandez, D.; Carneiro, V.; Cacheda, F. Early Intrusion Detection for OS Scan Attacks. In Proceedings of the 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA), Cambridge, MA, USA, 26–28 September 2019.
  53. Rashid, M.M.; Kamruzzaman, J.; Hassan, M.M.; Imam, T.; Gordon, S. Cyberattacks Detection in IoT-Based Smart City Applications Using Machine Learning Techniques. Int. J. Environ. Res. Public Health 2020, 17, 9347.
  54. Soe, Y.N.; Feng, Y.; Santosa, P.I.; Hartanto, R.; Sakurai, K. Machine Learning-Based IoT-Botnet Attack Detection with Sequential Architecture. Sensors 2020, 20, 4372.
  55. Ioannou, C.; Vassiliou, V. Classifying Security Attacks in IoT Networks Using Supervised Learning. In Proceedings of the 2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS), Santorini Island, Greece, 29–31 May 2019; pp. 652–658.
  56. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst. 2019, 100, 779–796.
  57. Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567.
  58. Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, 4 August 2001; Volume 3, pp. 41–46.
Table 1. Abbreviations and their explanations.
| Acronym | Explanation | Acronym | Explanation |
|---|---|---|---|
| IDS | Intrusion Detection Systems | ANN | Artificial Neural Network |
| ML | Machine Learning | KNN | K-nearest Neighbour |
| SVM | Support Vector Machine | DT | Decision Tree |
| NB | Naive Bayes | RF | Random Forest |
| LR | Logistic Regression | DDoS | Distributed Denial-of-Service |
| IoT | Internet of Things | CKC | Cohen’s Kappa Coefficient |
| TP | True Positive | TN | True Negative |
| FP | False Positive | FN | False Negative |
| TPR | True Positive Rate | FPR | False Positive Rate |
Table 2. Confusion matrix example.
| Predicted label | Actual: No attack | Actual: Attack |
|---|---|---|
| No attack | True negative | False negative |
| Attack | False positive | True positive |
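As a minimal sketch of how the four cells of a binary confusion matrix such as Table 2 turn into the accuracy, precision, recall, and F1 score reported in the results tables, the function below applies the standard definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only (not results from the paper).
example = binary_metrics(tp=90, tn=5, fp=3, fn=2)
```

Undefined ratios (for example, precision when nothing is predicted as an attack) are returned as 0.0 here, which is one common convention and an assumption of this sketch.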
Table 3. Multi-class confusion matrix example.
| Predicted label | Actual: Class 1 | Actual: Class 2 | Actual: Class 3 |
|---|---|---|---|
| Class 1 | C | W | W |
| Class 2 | W | C | W |
| Class 3 | W | W | C |
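The multi-class results are also scored with Cohen's kappa, which can be derived directly from a confusion matrix laid out like Table 3. The following is a sketch under stated assumptions: the function name `cohen_kappa` and the 3×3 example matrix are hypothetical, and the standard formula kappa = (p_o − p_e) / (1 − p_e) is used, where p_o is the observed agreement (the diagonal share) and p_e is the agreement expected by chance from the row and column totals.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix given as a list of rows."""
    n = sum(sum(row) for row in matrix)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    k = len(matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: sum over classes of (row share * column share).
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        # Degenerate single-class case; treated as perfect agreement in this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical 3-class matrix, for illustration only.
kappa = cohen_kappa([[50, 2, 3], [4, 40, 1], [1, 2, 47]])
```

Because the formula is symmetric in rows and columns, the result does not depend on whether rows hold the predicted or the actual labels.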
Table 4. Dataset features and description.
| Features | Description |
|---|---|
| Stime | Record start time |
| Sport | Port that data is being sent from |
| Dport | Port that data is being received from |
| Pkts | Total number of packets transferred |
| Bytes | Total number of bytes transferred |
| Ltime | Record last time |
| Seq | Sequence number |
| Dur | Record total duration |
| Mean | Average duration of aggregated records |
| Sum | Total duration of aggregated records |
| Min | Minimum duration of aggregated records |
| Max | Maximum duration of aggregated records |
| Spkts | Source to destination packet count |
| Dpkts | Destination to source packet count |
| Sbytes | Source to destination byte count |
| Dbytes | Destination to source byte count |
| Rate | Total packets per second in transaction |
| Srate | Source to destination packets per second |
| Drate | Destination to source packets per second |
Table 5. Dataset label distribution.
| Dataset | No Attack Data | Attack Data | Total |
|---|---|---|---|
| Data exfiltration | 24 | 118 | 142 |
| DDoS HTTP | 55 | 19771 | 19826 |
| DDoS TCP | 32 | 1048543 | 1048575 |
| DDoS UDP | 36 | 1048539 | 1048575 |
| Key logging | 164 | 1469 | 1633 |
| OS Scan | 3949 | 358275 | 362224 |
| Service scan | 1993 | 1046582 | 1048575 |
| DoS HTTP | 56 | 29706 | 29762 |
| DoS TCP | 106 | 1048469 | 1048575 |
| DoS UDP | 37 | 1048538 | 1048575 |
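Table 5 shows how heavily the labels are skewed towards attack traffic, which is the motivation for the weighted-class experiments. As a hedged illustration, the sketch below computes inverse-frequency weights with the heuristic n_samples / (n_classes × class_count), the same rule scikit-learn applies for `class_weight="balanced"`. Whether the authors used exactly this rule is an assumption, and `balanced_class_weights` is a hypothetical helper name.

```python
def balanced_class_weights(counts):
    """Map each label to n_samples / (n_classes * n_samples_in_class).

    Assumes every class count is greater than zero.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Label counts for the data exfiltration subset, as listed in Table 5.
weights = balanced_class_weights({"no_attack": 24, "attack": 118})
```

With these counts the minority "no attack" class receives a weight of roughly 2.96 and the attack class roughly 0.60, so errors on the rare class cost about five times more during training.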
Table 6. Multi-class data representation.
| Classes | Training Data | Test Data | Total |
|---|---|---|---|
| No attack | 1398 | 335 | 1733 |
| Data exfiltration | 22 | 7 | 29 |
| DDoS HTTP | 4209 | 1015 | 5224 |
| DDoS TCP | 221638 | 56377 | 278015 |
| DDoS UDP | 222728 | 55302 | 278030 |
| Key logging | 314 | 81 | 395 |
| OS Scan | 75877 | 18907 | 94784 |
| Service scan | 221745 | 55768 | 277509 |
| DoS HTTP | 6343 | 1475 | 7818 |
| DoS TCP | 223555 | 55236 | 278791 |
| DoS UDP | 222171 | 55501 | 277672 |
Table 7. Modules used and description.
| Module Name | Description |
|---|---|
| numpy | Used to store the dataset in an array |
| pandas | Used to read the dataset CSV file |
| preprocessing | Used to normalize feature data |
| model_selection | Used for splitting the training and test data |
| random | Used to randomize the multi-class dataset |
| metrics | Contains the performance metrics used in testing the model |
| neighbors | Contains the KNN model |
| SVM [57] | Contains the SVM model |
| tree | Contains the DT model |
| naive_bayes | Contains the NB model |
| ensemble | Contains the RF model |
| linear_model | Contains the LR model |
| models | Contains the ANN model |
| layers | Contains ANN layers |
| utils | Contains class weight for ANN |
Table 8. Data exfiltration results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 0.86 | 0.95 | 0.87 | 0.91 | 0.19 | 0.83 |
| SVM [57] | 0.89 | 0.95 | 0.91 | 0.93 | 0.27 | 0.85 |
| DT [19] | 1.0 | 1.0 | 1.0 | 1.0 | 9.99 | 1.0 |
| NB [58] | 0.89 | 1.0 | 0.87 | 0.93 | 3.57 | 0.93 |
| RF [24] | 1.0 | 1.0 | 1.0 | 1.0 | 0.059 | 1.0 |
| ANN [25] | 0.82 | 0.82 | 1.0 | 0.90 | 2.57 | 0.5 |
| LR [31] | 0.89 | 0.95 | 0.91 | 0.93 | 0.22 | 0.85 |
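Log loss, reported alongside accuracy in Table 8, scores the predicted probabilities rather than the hard labels, so two models with identical accuracy can differ on it. The sketch below implements the standard binary log loss with probability clipping. The function name `binary_log_loss`, the clipping constant `eps`, and the example predictions are illustrative assumptions, not values from the paper.

```python
import math


def binary_log_loss(y_true, p_attack, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    if len(y_true) != len(p_attack) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_attack):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical predictions: both sets classify every sample correctly
# at a 0.5 threshold, but with different confidence.
labels = [1, 1, 0, 1]
confident = binary_log_loss(labels, [0.99, 0.98, 0.01, 0.97])
hesitant = binary_log_loss(labels, [0.60, 0.55, 0.45, 0.65])
```

Both prediction sets give 100% accuracy, yet the hesitant one has the higher log loss because its probabilities sit closer to 0.5.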
Table 9. Data exfiltration RF confusion matrix.
| Predicted label | Actual: No Attack | Actual: Attack |
|---|---|---|
| No Attack | 5 | 0 |
| Attack | 0 | 24 |
Table 10. Data exfiltration RF test data amounts.
| Test Amount | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| 20 | 1.0 | 1.0 | 1.0 | 1.0 | 0.059 | 1.0 |
| 30 | 1.0 | 1.0 | 1.0 | 1.0 | 0.043 | 1.0 |
| 40 | 0.98 | 0.97 | 1.0 | 0.98 | 0.042 | 0.94 |
| 50 | 0.97 | 0.96 | 1.0 | 0.98 | 0.083 | 0.9 |
| 60 | 0.94 | 0.97 | 0.95 | 0.96 | 0.089 | 0.89 |
Table 11. Data exfiltration weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | 0.93 | 1.0 | 0.91 | 0.95 | 0.25 | 0.95 |
| DT [19] | 0.93 | 1.0 | 0.91 | 0.95 | 0.12 | 0.95 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 1.0 | 1.0 | 1.0 | 1.0 | 0.074 | 1.0 |
| ANN [25] | 0.82 | 0.82 | 1.0 | 0.90 | 2.57 | 0.5 |
| LR [31] | 0.89 | 1.0 | 0.87 | 0.93 | 0.38 | 0.93 |
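Table 12. DDoS HTTP results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0095 | 0.83 |
| SVM [57] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0093 | 0.77 |
| DT [19] | 1.0 | 1.0 | 1.0 | 1.0 | 7.25 | 1.0 |
| NB [58] | 0.99 | 0.99 | 0.99 | 0.99 | 0.063 | 0.66 |
| RF [24] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0021 | 0.88 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 0.044 | 0.5 |
| LR [31] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0069 | 0.77 |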
Table 13. DDoS HTTP DT confusion matrix.
| Predicted label | Actual: No Attack | Actual: Attack |
|---|---|---|
| No Attack | 9 | 0 |
| Attack | 0 | 3950 |
Table 14. DDoS HTTP weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | 0.89 | 0.99 | 0.89 | 0.94 | 0.013 | 0.83 |
| DT [19] | 0.99 | 0.99 | 0.99 | 0.99 | 0.018 | 0.88 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 0.99 | 0.99 | 0.99 | 0.99 | 0.0047 | 0.88 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 0.044 | 0.5 |
| LR [31] | 0.91 | 0.99 | 0.91 | 0.95 | 0.15 | 0.90 |
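Table 15. DDoS TCP results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 0.99 | 0.99 | 1.0 | 0.99 | 1.76 | 0.83 |
| SVM [57] | 0.99 | 1.0 | 0.99 | 0.99 | 5.82 | 0.83 |
| DT [19] | 1.0 | 1.0 | 1.0 | 1.0 | 9.99 | 1.0 |
| NB [58] | 0.99 | 1.0 | 0.99 | 0.99 | 0.029 | 0.99 |
| RF [24] | 1.0 | 1.0 | 1.0 | 1.0 | 2.55 | 1.0 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 4.75 | 0.5 |
| LR [31] | 0.99 | 0.99 | 1.0 | 0.99 | 0.00010 | 0.58 |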
Table 16. DDoS TCP RF confusion matrix.
| Predicted label | Actual: No Attack | Actual: Attack |
|---|---|---|
| No Attack | 6 | 0 |
| Attack | 0 | 209709 |
Table 17. DDoS TCP weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | 0.99 | 0.99 | 0.99 | 0.99 | 0.00040 | 0.83 |
| DT [19] | 1.0 | 1.0 | 1.0 | 1.0 | 9.99 | 1.0 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 1.0 | 1.0 | 1.0 | 1.0 | 1.33 | 1.0 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 4.75 | 0.5 |
| LR [31] | 0.99 | 0.99 | 0.99 | 0.99 | 0.025 | 0.91 |
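Table 18. DDoS UDP results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 1.0 | 1.0 | 1.0 | 1.0 | 4.56 | 1.0 |
| SVM [57] | 0.99 | 0.99 | 1.0 | 0.99 | 8.93 | 0.92 |
| DT [19] | 1.0 | 1.0 | 1.0 | 1.0 | 9.99 | 1.0 |
| NB [58] | 0.99 | 1.0 | 0.99 | 0.99 | 0.00098 | 0.99 |
| RF [24] | 0.99 | 0.99 | 1.0 | 0.99 | 5.71 | 0.92 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 5.30 | 0.5 |
| LR [31] | 0.99 | 0.99 | 1.0 | 0.99 | 7.77 | 0.78 |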
Table 19. DDoS UDP KNN confusion matrix.
| Predicted label | Actual: No Attack | Actual: Attack |
|---|---|---|
| No Attack | 7 | 0 |
| Attack | 0 | 209708 |
Table 20. DDoS UDP weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | 1.0 | 1.0 | 1.0 | 1.0 | 2.84 | 1.0 |
| DT [19] | 0.99 | 1.0 | 0.99 | 0.99 | 0.000011 | 0.99 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 1.0 | 1.0 | 1.0 | 1.0 | 0.0020 | 1.0 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 5.30 | 0.5 |
| LR [31] | 0.99 | 1.0 | 0.99 | 0.99 | 0.00028 | 0.99 |
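Table 21. Key logging results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 0.98 | 0.98 | 1.0 | 0.99 | 0.33 | 0.93 |
| SVM [57] | 0.96 | 0.96 | 1.0 | 0.98 | 0.16 | 0.81 |
| DT [19] | 0.99 | 1.0 | 0.99 | 0.99 | 0.0085 | 0.99 |
| NB [58] | 0.91 | 0.92 | 0.98 | 0.95 | 2.64 | 0.58 |
| RF [24] | 0.99 | 0.99 | 1.0 | 0.99 | 0.022 | 0.96 |
| ANN [25] | 0.91 | 0.91 | 1.0 | 0.95 | 1.58 | 0.5 |
| LR [31] | 0.96 | 0.96 | 1.0 | 0.98 | 0.16 | 0.79 |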
Table 22. Key logging DT confusion matrix.
| Predicted label | Actual: No Attack | Actual: Attack |
|---|---|---|
| No Attack | 29 | 0 |
| Attack | 2 | 296 |
Table 23. Key logging DT test data amounts.
| Test Amount | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| 20 | 0.99 | 1.0 | 0.99 | 0.99 | 0.0085 | 0.99 |
| 30 | 0.99 | 0.99 | 1.0 | 0.99 | 0.080 | 0.96 |
| 40 | 0.99 | 0.98 | 1.0 | 0.99 | 0.11 | 0.95 |
| 50 | 0.99 | 0.99 | 0.99 | 0.99 | 0.13 | 0.95 |
Table 24. Key logging weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | 0.87 | 0.98 | 0.87 | 0.92 | 0.17 | 0.88 |
| DT [19] | 0.98 | 0.99 | 0.98 | 0.99 | 0.038 | 0.97 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 0.98 | 0.99 | 0.98 | 0.99 | 0.051 | 0.97 |
| ANN [25] | 0.91 | 0.91 | 1.0 | 0.95 | 1.58 | 0.5 |
| LR [31] | 0.77 | 0.98 | 0.76 | 0.85 | 0.46 | 0.82 |
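Table 25. OS Scan results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 0.99 | 0.99 | 0.99 | 0.99 | 0.063 | 0.80 |
| SVM [57] | 0.94 | 0.99 | 0.94 | 0.97 | 0.024 | 0.96 |
| DT [19] | 0.99 | 0.99 | 0.99 | 0.99 | 0.0038 | 0.98 |
| NB [58] | 0.98 | 0.98 | 0.99 | 0.99 | 0.54 | 0.51 |
| RF [24] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0061 | 0.83 |
| ANN [25] | 0.98 | 0.98 | 1.0 | 0.99 | 0.16 | 0.5 |
| LR [31] | 0.98 | 0.98 | 1.0 | 0.99 | 0.036 | 0.50 |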
Table 26. OS Scan RF confusion matrix.
| Predicted label | Actual: No Attack | Actual: Attack |
|---|---|---|
| No Attack | 608 | 161 |
| Attack | 0 | 71673 |
Table 27. OS scan weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | 0.89 | 0.99 | 0.89 | 0.94 | 0.013 | 0.83 |
| DT [19] | 0.99 | 0.99 | 0.99 | 0.99 | 0.025 | 0.88 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 0.99 | 0.99 | 0.99 | 0.99 | 0.030 | 0.99 |
| ANN [25] | 0.98 | 0.98 | 1.0 | 0.99 | 0.16 | 0.5 |
| LR [31] | 0.90 | 0.99 | 0.90 | 0.95 | 0.19 | 0.94 |
Table 28. Service Scan results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 0.99 | 0.99 | 0.99 | 0.99 | 0.013 | 0.79 |
| SVM [57] | 0.99 | 0.99 | 1.0 | 0.99 | 0.012 | 0.54 |
| DT [19] | 0.99 | 0.99 | 0.99 | 0.99 | 0.0028 | 0.84 |
| NB [58] | 0.99 | 0.99 | 0.99 | 0.99 | 0.26 | 0.58 |
| RF [24] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0039 | 0.54 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 0.029 | 0.5 |
| LR [31] | 0.99 | 0.99 | 0.99 | 0.99 | 0.0087 | 0.54 |
Table 29. Service Scan RF confusion matrix.
| Predicted label | Actual: No attack | Actual: Attack |
|---|---|---|
| No attack | 31 | 350 |
| Attack | 0 | 209334 |
Table 30. Service scan weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | n/a | n/a | n/a | n/a | n/a | n/a |
| DT [19] | 0.97 | 0.99 | 0.97 | 0.98 | 0.079 | 0.97 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 0.94 | 0.99 | 0.94 | 0.97 | 0.13 | 0.96 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 0.029 | 0.5 |
| LR [31] | 0.85 | 0.99 | 0.85 | 0.92 | 0.29 | 0.90 |
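Table 31. DoS HTTP results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0063 | 0.90 |
| SVM [57] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0065 | 0.86 |
| DT [19] | 1.0 | 1.0 | 1.0 | 1.0 | 0.00013 | 1.0 |
| NB [58] | 0.99 | 0.99 | 0.99 | 0.99 | 0.034 | 0.77 |
| RF [24] | 1.0 | 1.0 | 1.0 | 1.0 | 0.00094 | 1.0 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 0.029 | 0.5 |
| LR [31] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0044 | 0.81 |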
Table 32. DoS HTTP RF confusion matrix.
| Predicted label | Actual: No attack | Actual: Attack |
|---|---|---|
| No attack | 11 | 0 |
| Attack | 0 | 5942 |
Table 33. DoS HTTP weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | 0.90 | 0.99 | 0.90 | 0.95 | 0.0067 | 0.90 |
| DT [19] | 0.99 | 0.99 | 0.99 | 0.99 | 0.0098 | 0.95 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 0.99 | 0.99 | 0.99 | 0.99 | 0.0097 | 0.95 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 0.029 | 0.5 |
| LR [31] | 0.88 | 0.99 | 0.88 | 0.94 | 0.21 | 0.89 |
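Table 34. DoS TCP results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 0.99 | 0.99 | 1.0 | 0.99 | 0.00035 | 0.90 |
| SVM [57] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0011 | 0.61 |
| DT [19] | 0.99 | 0.99 | 1.0 | 0.99 | 1.62 | 0.95 |
| NB [58] | 0.99 | 0.99 | 0.99 | 0.99 | 0.026 | 0.69 |
| RF [24] | 0.99 | 0.99 | 1.0 | 0.99 | 2.25 | 0.92 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0016 | 0.5 |
| LR [31] | 0.99 | 0.99 | 1.0 | 0.99 | 0.00066 | 0.61 |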
Table 35. DoS TCP DT confusion matrix.
| Predicted label | Actual: No attack | Actual: Attack |
|---|---|---|
| No attack | 19 | 2 |
| Attack | 0 | 209694 |
Table 36. DoS TCP weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | n/a | n/a | n/a | n/a | n/a | n/a |
| DT [19] | 0.99 | 1.0 | 0.99 | 0.99 | 0.018 | 0.99 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 0.99 | 0.99 | 0.99 | 0.99 | 0.022 | 0.97 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 0.0016 | 0.5 |
| LR [31] | 0.96 | 0.99 | 0.96 | 0.98 | 0.078 | 0.91 |
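Table 37. DoS UDP results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | 0.99 | 0.99 | 1.0 | 0.99 | 3.28 | 0.75 |
| SVM [57] | 0.99 | 0.99 | 1.0 | 0.99 | 0.00039 | 0.68 |
| DT [19] | 0.99 | 0.99 | 1.0 | 0.99 | 3.41 | 0.87 |
| NB [58] | 0.99 | 1.0 | 0.99 | 0.99 | 0.00065 | 0.99 |
| RF [24] | 0.99 | 0.99 | 1.0 | 0.99 | 1.61 | 0.87 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 5.30 | 0.5 |
| LR [31] | 0.99 | 0.99 | 1.0 | 0.99 | 0.00030 | 0.56 |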
Table 38. DoS UDP NB confusion matrix.
| Predicted label | Actual: No attack | Actual: Attack |
|---|---|---|
| No attack | 8 | 0 |
| Attack | 4 | 209703 |
Table 39. DoS UDP weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | ROC AUC |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | 0.99 | 1.0 | 0.99 | 0.99 | 0.00053 | 0.99 |
| DT [19] | 0.99 | 1.0 | 0.99 | 0.99 | 5.24 | 0.99 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 0.99 | 0.99 | 0.99 | 0.99 | 1.34 | 0.93 |
| ANN [25] | 0.99 | 0.99 | 1.0 | 0.99 | 5.30 | 0.5 |
| LR [31] | 0.99 | 1.0 | 0.99 | 0.99 | 0.00079 | 0.99 |
Table 40. Model comparison.
| Dataset | Best Model (No Weighted Classes) | Best Model (Weighted Classes) |
|---|---|---|
| Data Exfiltration | RF | RF |
| DDoS HTTP | DT | ANN |
| DDoS TCP | RF | RF |
| DDoS UDP | KNN | RF |
| Keylogging | DT | DT |
| OS Scan | RF | ANN |
| Service scan | RF | ANN |
| DoS HTTP | DT | ANN |
| DoS TCP | KNN | DT |
| DoS UDP | NB | SVM |
| Most Occurrences | RF | ANN |
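Table 41. Multi-class results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | CKS |
|---|---|---|---|---|---|---|
| KNN [14] | 0.99 | 0.99 | 0.99 | 0.99 | 0.042 | 0.99 |
| SVM [57] | 0.79 | 0.79 | 0.79 | 0.79 | 0.65 | 0.75 |
| DT [19] | 0.96 | 0.96 | 0.96 | 0.96 | 0.11 | 0.95 |
| NB [58] | 0.94 | 0.94 | 0.94 | 0.94 | 0.30 | 0.93 |
| RF [24] | 0.95 | 0.95 | 0.95 | 0.95 | 0.30 | 0.94 |
| ANN [25] | 0.97 | 0.97 | 0.97 | 0.97 | 0.066 | 0.97 |
| LR [31] | 0.74 | 0.74 | 0.74 | 0.74 | 0.63 | 0.68 |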
Table 42. Multi-class weighted classes results.
| Algorithms Used | Accuracy | Precision | Recall | F1 Score | Log Loss | CKS |
|---|---|---|---|---|---|---|
| KNN [14] | n/a | n/a | n/a | n/a | n/a | n/a |
| SVM [57] | n/a | n/a | n/a | n/a | n/a | n/a |
| DT [19] | 0.92 | 0.92 | 0.92 | 0.92 | 0.46 | 0.90 |
| NB [58] | n/a | n/a | n/a | n/a | n/a | n/a |
| RF [24] | 0.86 | 0.86 | 0.86 | 0.86 | 0.79 | 0.83 |
| ANN [25] | 0.97 | 0.97 | 0.97 | 0.97 | 0.063 | 0.97 |
| LR [31] | 0.69 | 0.69 | 0.69 | 0.69 | 0.75 | 0.63 |
Table 43. KNN confusion matrix.
Predicted
True012345678910
01720020110750021
114000200000
200 965420004310
3001563683000023
4000455296000002
5010008000000
6480000018294565000
7120000039555357000
8003863000142710
9002910004552182
10000410000155496
Table 44. SVM confusion matrix.
Predicted
True012345678910
010002301831116155
100004000210
20002966000796304
3000196261756100055177781357
40004295450600002365
5000070007220
600010013056577914129
7000100309752658080
800051217000568855
9000180444200048529339
100004021513900002246319
Table 45. DT confusion matrix.
Predicted
True012345678910
0300120098227014
100003000000
200001084000000
3000556480000000
4000055460000000
5000084 000000
600000096209474000
7000000055504000
800000000 159700
9100000000557840
10000000000055387
Table 46. DT weighted classes confusion matrix.
Predicted
True012345678910
02972013100210103
106003000000
200001055000000
3000557160000000
4000055324000000
500000 7000000
6103700000179650000
7144200000054099000
800000000 001520
9000000000557450
10000000000055674
Table 47. NB confusion matrix.
Predicted
True012345678910
0331010922462150
107000000000
220100800007000
3290056296000292300
4200055298002000
5000008100000
6670000018199641000
7362000001575439648000
800000000147500
91000000550551800
10100000010055499
Table 48. RF confusion matrix.
Predicted
True012345678910
0000162031100017214
100052000000
200095520000000
3000563770000000
40009555207000000
5000774000000
60001009238951401513
70000000557160471
8000142200000053
9000000000552297
100001000000055491
Table 49. RF weighted classes confusion matrix.
Predicted
True012345678910
025011011183210129
107000000000
200101300200000
3002313754418602000000
40007655226000000
5041007600000
618011000017748514443110
7333000001628138936131830
800000020147300
93758000000025504481005
10100000000055500
Table 50. ANN confusion matrix.
Predicted
True012345678910
0401140021986461
100070000000
200981340000000
30027563491000000
4000455298000000
5000810000000
6000900153123586000
7000000270153063000
8000590000141501
900055001901551610
10000200000055499
Table 51. ANN weighted classes confusion matrix.
Predicted
True012345678910
0001101222881462
100007000000
200101005000000
3000563770000000
4100255299000000
5000008100000
6000000169501956010
7000000419551569000
800000000147302
9000000004552320
10000000000155500
Table 52. LR confusion matrix.
Predicted
True012345678910
0 00011202001010912
100070000000
200 09360003086017
3000 16470690100044244908472
4000145 491110000715947
5000810000000
6200000 1469041860029
7000000371352049002
8000139180904828198
90002744530204355403537
1000010658919500001535633
Table 53. LR weighted classes confusion matrix.
Predicted
True012345678910
0 291400271014061
107000000000
209 6920200030624
30105432 1401075160001004236369674
404450186 5242300034712494
5030007800000
6357000000 137531582200
7435300000965241758001
80157100800073642
90095442445301102479509078
1001771971000898500002161635037
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Churcher, A.; Ullah, R.; Ahmad, J.; ur Rehman, S.; Masood, F.; Gogate, M.; Alqahtani, F.; Nour, B.; Buchanan, W.J. An Experimental Analysis of Attack Classification Using Machine Learning in IoT Networks. Sensors 2021, 21, 446. https://doi.org/10.3390/s21020446
