Hybrid Deep Learning-Based Intrusion Detection System for RPL IoT Networks

Abstract: The Internet of Things (IoT) is an emerging technology that makes everyday physical objects smarter by using underlying technologies such as sensor networks. The routing protocol for low-power and lossy networks (RPL) is considered one of the promising protocols designed for IoT networks. However, because IoT devices are constrained in memory, processing power, and network capabilities, they are exposed to many security attacks. Unfortunately, existing machine learning-based Intrusion Detection System (IDS) approaches proposed to detect and mitigate security attacks in internet networks are not suitable for analyzing IoT traffic. This paper proposes an IDS that hybridizes supervised and semi-supervised deep learning to classify network traffic for known and unknown abnormal behaviors in the IoT environment. In addition, we have developed a new IoT-specialized dataset named IoTR-DS, using the RPL protocol. IoTR-DS is used as a use case to classify three known security attacks (DIS, Rank, and Wormhole). The proposed Hybrid DL-Based IDS is evaluated and compared to some existing ones, and the results are promising. The evaluation results show an accuracy of 98% and an F1-score of 92% for multi-class attacks when using pre-trained attacks (known traffic), and an average accuracy of 95% and an F1-score of 87% when predicting untrained attacks for two attack behaviors (unknown traffic).


Introduction
IoT technology has recently become the engine behind many smart applications designed to improve quality of life, such as smart cities, transportation, healthcare, energy management, agriculture, environmental monitoring, and others [1]. IoT transforms physical objects from traditional to smart by taking advantage of underlying technologies such as sensor networks, embedded devices, pervasive computing, ubiquitous communication protocols, and applications. The majority of IoT systems use resource-constrained devices (low processing power and storage) and a low-power and lossy network (LLN) to communicate in an ad hoc way. Therefore, these systems are vulnerable to many cyber-attacks, and traditional security measures cannot be directly applied [1].
Each protocol layer in the IoT stack is susceptible to various security threats; however, as in traditional networks, most attacks target the network layer. These attacks usually affect routing protocols by disrupting data flow or exhausting network resources. Routing protocols play a vital role in IoT network architecture. A routing protocol's primary role is to discover and establish a route from a source node to a destination node and to maintain the availability of such routes for subsequent transmissions. Several routing protocols have been used for Wireless Sensor Networks (WSN) and IoT. However, the Routing Protocol for Low-Power and Lossy Networks (RPL) is a very promising routing protocol because of its ability to provide efficient routing among resource-constrained, IP-based IoT devices. RPL was standardized by the IETF ROLL (Routing Over Low-power and Lossy networks) working group in 2012 [2]. However, the RPL protocol is vulnerable to a wide range of attacks that are difficult to detect and mitigate [3]. The attacks can target different network components: some exhaust network resources (energy, memory, and storage) to shorten the devices' lifetime, which in turn shortens the network's lifetime. Other attacks target the RPL topology, aiming to disturb the normal data flow. Moreover, some attacks target the confidentiality of data traffic, for example through eavesdropping [4].
Among the different security controls used to mitigate security risks in IoT environments, Intrusion Detection Systems (IDS) are widely used for detecting suspicious events. The main goal of an IDS is to monitor, analyze, and detect abnormal behaviors (attacks) in the network traffic [5]. There are three general types of IDS: signature-based, anomaly-based, and specification-based. In the signature-based approach, also known as rule-based, the system stores the signatures of known attacks in a database and compares them to the network traffic pattern [6]. Whenever the pattern matches an existing attack signature, the system triggers an alert. Although this approach is very fast in detecting known attacks, it is not useful in identifying new attacks. In anomaly-based detection, the system models the network's legitimate activities and treats deviations from them as anomalous activities (unknown attacks). Statistical methods and machine learning techniques are the most used in this type of IDS. However, many anomaly-based systems suffer from a higher rate of false-positives (regular traffic classified as malicious) if they are not designed and trained carefully. In specification-based approaches, the expected legitimate behaviors of network components, such as the routing protocols and the nodes, are defined such that any deviation from this behavior is considered an attack. Although this ensures lower false-positive rates, security experts need to define the elements' specifications, which is more time-consuming [6].
IDS systems have been studied and developed using machine learning (ML) techniques to detect anomalous network behaviors. However, with the rapid increase in connected network devices and in traffic that produces large-scale data, identifying various types of attacks with simple or shallow machine learning approaches becomes more challenging [7]. Deep learning (DL) is an advanced branch of machine learning that has shown its success in classification and dimensionality reduction tasks [8,9]. In deep learning, features can be learned from a large number of training samples, and the complexity of the network traffic is automatically reduced to find the correlations among the features [10]. This makes DL more powerful in detecting complex attack patterns and zero-day attacks. However, DL-based IDS have not been studied enough in the IoT network context and, more specifically, in RPL-based networks.
Moreover, to accurately train and evaluate an ML/DL-based IDS, a relevant dataset is required. The majority of the existing methods use benchmark datasets such as KDD99 [11], NSL-KDD [12], and UNSW-NB15 [13]. However, these datasets are obsolete, or they were created from computer system traffic that is not suitable for an IoT IDS. On the other hand, the existing datasets designed for IoT IDS systems, such as the WSN-DS [14] dataset, do not collect network traffic in a realistic way. They are based on sniffing or monitoring mechanisms that need to be distributed across the IoT network, which is difficult to apply in practice. Moreover, the Bot-IoT [8] dataset, which is meant for IoT networks, does not consider common types of ad hoc networks in which the devices form a network using routing protocols to communicate with each other and route the data to a central location.
In this paper, we propose a new dataset named IoTR-DS based on RPL (which is considered the de facto routing protocol for IoT networks). IoTR-DS is created by simulating the traffic of three common attacks (DIS, Rank, and Wormhole) alongside normal traffic. Unlike the existing datasets, the data collection does not involve traffic sniffing or monitoring; instead, it reuses the data packets that the nodes send to the root by embedding the necessary information in them. The idea is to shift IDS control to the more powerful root node and, at the same time, to avoid network sniffing, which is impractical and costly.
In addition, the paper proposes a hybrid DL-based IDS that combines a supervised Deep Artificial Neural Network (DANN) model and a semi-supervised Deep Autoencoder (DAE) model to classify attacks using the IoTR-DS dataset. The supervised DANN model is trained using labeled attacks and is used to identify known attacks. The semi-supervised DAE model is trained using normal traffic samples only and is used to predict the traffic that the DANN was not trained for. The idea is to compare the average reconstruction error obtained while training on normal traffic with the reconstruction error of the traffic being predicted. Another goal of the proposed approach is to validate whether the IoTR-DS dataset contains enough features to efficiently classify different attacks. In summary, the contribution of this paper is twofold:

• Develop a new specialized dataset named IoTR-DS (https://github.com/alsawafi/IoT-DL-IDS (accessed on 20 October 2020)) by simulating three types of RPL attacks along with normal traffic, and characterize the attack features that will be used as the primary inputs to the IDS.
• Propose a new DL-based IDS using the hybridization of supervised DANN and semi-supervised DAE and evaluate it on the IoTR-DS dataset.
The remainder of this paper is organized as follows: Section 2 presents background on RPL and its attacks and reviews related works on machine and deep learning-based IDS and datasets. Section 3 describes the IoTR-DS dataset creation and the modeling of the RPL attacks. The design details of the proposed hybrid DL-IDS using the IoTR-DS dataset are discussed in Section 4. The methodology of the evaluation experiments and the analysis of the results for the proposed approach are presented and discussed in Section 5. Finally, Section 6 concludes the work and outlines future research directions.
The abbreviations used in this paper are summarized in Table 1.

Literature Review
This section first introduces the RPL protocol components and operation. Then, it describes the common attacks that target RPL. Finally, it reviews some related works on ML/DL-based IDS and datasets used to detect attacks related to the IoT network.

Routing Protocol for Low-Power and Lossy Networks (RPL)
RPL is an IPv6 routing protocol that has been used in IoT and was standardized by the IETF in 2012. It is mainly designed to operate on energy-constrained devices that use low-power, low-cost communication technologies and have limited memory. Figure 1 illustrates the main components and terminology used in RPL. The RPL protocol arranges the nodes (sensors) into a tree topology called a destination-oriented directed acyclic graph (DODAG). The tree root (sometimes called a border router) is a non-constrained node that initiates and orchestrates the tree construction. All nodes in a DODAG are rooted at a single root. An objective function determines how a DODAG is structured based on specific application needs. It uses metrics and constraints to compute parameters such as the rank of the node (based on the distance to the root) and the preferred parent of the node. For example, the hop count metric can be used to compute the rank.

In order to exchange messages between the nodes themselves and with the root (in an ad hoc manner), RPL uses ICMPv6-based control messages. The main control message is the DODAG Information Object (DIO). It is used by the root to initiate and maintain a DODAG tree and by other nodes to join this tree and to keep track of their RANK (position in relation to the DODAG root). For DODAG consistency purposes, each node inspects subsequently received DIOs for a DODAG version or RANK number that differs from the previous ones. To search for a new DODAG or maintain an existing one, a node that does not receive any DIO message within a specific time multicasts a DODAG Information Solicitation (DIS) message, which solicits a DIO message from an RPL node. RPL also supports downward traffic routes by using the Destination Advertisement Object (DAO) control message, a unicast message that is used to propagate destination information upward along the DODAG.
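To make the hop-count example concrete, the following minimal sketch assigns ranks and preferred parents by a breadth-first search from the root. The function name `build_dodag`, the five-node topology, and the convention that rank equals the hop count are assumptions made for illustration; actual RPL objective functions scale rank increments and may combine other metrics.

```python
from collections import deque

def build_dodag(neighbors, root):
    """Assign each reachable node a rank (hop count to root) and a preferred parent."""
    rank = {root: 0}
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in neighbors.get(node, []):
            if nb not in rank:          # first visit gives the smallest hop count
                rank[nb] = rank[node] + 1
                parent[nb] = node
                queue.append(nb)
    return rank, parent

# Hypothetical 5-node topology; "R" is the root.
topology = {
    "R": ["A", "B"],
    "A": ["R", "C"],
    "B": ["R", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
rank, parent = build_dodag(topology, "R")
```

In this toy topology, node C can reach the root through A or B at the same cost; the sketch simply keeps the first candidate it encounters, whereas a real objective function would break such ties using link metrics.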

RPL Attacks
In this section, we describe three different attacks against the RPL protocol that are used in this study. We also describe the relevant features that might help in identifying such attacks and could be used for the dataset.
Flooding (DIS) attack: In flooding attacks, the attacker nodes usually target the availability of the network by sending a large amount of traffic, which exhausts the resources of the neighboring nodes and makes them unavailable. It is a Denial of Service (DoS) attack. In RPL, the malicious node may flood the neighboring nodes with a large number of DIS control messages to solicit DIO messages. In response, the nodes receiving DIS messages generate more traffic in the network by broadcasting DIO messages [15]. The important features that can be observed to identify such an attack are: the number of DIS messages received; the number of DIO messages sent by each node; the end-to-end delay; and the average packet delivery ratio.
Rank attacks: The node rank plays an important role in RPL. It is used to construct the optimal network topology and prevent the formation of loops in the network. Attacking the rank by deliberately changing its value can lead to two different types of known attacks: increased rank attacks and decreased rank attacks. In the increased rank attack, the malicious node advertises a higher rank value than it is supposed to have. In this case, its new preferred parent might be its previous child in the prior sub-DODAG, which forms a routing loop between neighbor nodes. Although a loop avoidance mechanism is designed to fix such loops, it requires many DIOs to be exchanged between the nodes, which exhausts node resources. In the decreased rank attack, the malicious node advertises a lower rank value, aiming to attract other nodes to connect to it. Rank attacks lead to the selection of non-optimized routes and may lead to poor network performance [3]. The important features that can be observed to identify such attacks are: the number of DIOs sent; the frequency of parent and RANK changes; the end-to-end delay; and the average packet delivery ratio.
Wormhole attack: The wormhole attack requires at least two nodes to create a dedicated communication tunnel between them. They usually have an alternative network interface and use either wired or wireless links to make the out-of-band connection. Once the tunnel is made, one attacker can replay all the messages received from the normal path to the second attacker node using the dedicated link [16]. The attacker nodes might be far away from each other, since they might use links with a longer communication range or wired connections. In RPL, each attacker node first replays the DIO messages received from other nodes to the second attacker using its second interface. The receiving attacker node processes this DIO (on its second interface) and adds the sender (attacker node) as its preferred parent. Now, the route is set up between the two nodes. All normal data traffic received by the first attacker on its first interface is replayed on the second interface.
Many consequences can result from the wormhole attack, depending on the attacker's intention. It can be used to disturb the normal operation of the routing by selecting non-optimal paths. It can also be used to eavesdrop on all traffic passing through the tunnel (confidentiality attack). To make it more effective, this attack is sometimes combined with other attacks such as a decreased rank attack, which attracts more neighbors to connect to the malicious node. In this case, the aim is to route more traffic through the wormhole tunnel. The important features that can be extracted during the wormhole attack are: the number of DIOs sent by each node; the end-to-end delay; and the average packet delivery ratio. Although the same features might be present in different attack types, their values differ from one attack to another, and the IDS is expected to capture the different value ranges between them.

Related Works
Several IDS systems based on deep learning have been proposed, aiming to secure IoT systems. In [17], the authors introduce a DL-based IDS that uses the spider monkey optimization (SMO) algorithm to extract the most relevant features from the dataset and a stacked-deep polynomial network (SDPN) to classify the data as normal or abnormal. The model is evaluated using the NSL-KDD dataset and shows a high detection rate for different attack categories (DoS, U2R, R2L, and probe). The authors in [18] implemented a DL-based IDS for the IoT at the fog level rather than at a centralized cloud. They demonstrate that distributed attack detection at the fog level is scalable, and that DL models outperform shallow ML models when used to detect attacks in the NSL-KDD dataset. Similarly, at the fog level, the authors in [19] proposed an IDS using a deep multi-layered recurrent neural network. The system is composed of a cascaded filtering stage in which each filter is tuned with different hyperparameters to enhance the detection of specific attack types. The model is evaluated using the NSL-KDD dataset to detect particular types of attacks. A cloud-based distributed deep learning framework is proposed in [20] to identify and mitigate Botnet, DDoS, and phishing attacks. The framework consists of two components that work cooperatively: a Long Short-Term Memory (LSTM) network model at the back end for detecting Botnet attacks, and a Distributed Convolutional Neural Network (DCNN) model hosted on the IoT devices for detecting DDoS and phishing attacks. In [21], the authors propose using a Deep Auto-Encoder (DAE) and a Deep Feedforward Neural Network (DFFNN) to detect anomalous behaviors in Internet Industrial Control Systems (IICSs). The DAE algorithm is used to learn normal network behaviors and tune the parameters (i.e., weights and biases), which then gives the DFFNN a better starting point for its parameters when classifying normal and abnormal network behaviors.
Among the works more closely related to ours, which use ML-based IDS to detect attacks on RPL, the author in [22] introduces an IDS to detect wormhole attacks in RPL using three ML-based approaches, namely, a K-means-based approach, a decision tree (DT), and a hybrid approach that combines both methods. The K-means algorithm is used to cluster routers into groups of safe zones within which a router can communicate with other nodes in the same zone. If a router tries to add new neighbors outside the safe zone, this is considered a wormhole attack. The DT is used to learn the safe distance between any two neighboring routers; nodes that communicate beyond this distance are considered victims of a wormhole attack. The hybrid approach is used to enhance accuracy by filtering out some of the false-positives. This approach is limited to one type of attack, the wormhole attack; it also uses additional control messages to send the mapping requests to other nodes. The router nodes' locations are required as input data to the IDS, which is impractical in real deployments. In [23], the author proposed a hybrid IDS framework based on specification-based and anomaly-based intrusion detection models for detecting selective-forwarding and sinkhole attacks in an RPL-based network. The specification-based intrusion detection uses agents located in the router nodes to monitor the behavior and send the results to the root. The anomaly-based intrusion detection works as a global detection approach. It is located in the root, and it uses an unsupervised optimum-path forest algorithm to analyze incoming data and detect anomalies. The hybrid method can achieve reasonable true-positive and false-positive rates for detecting both attacks. This approach separates the network nodes into router and leaf nodes. The local IDS agents are located only on the router nodes, which do not generate data. However, this is not the case in the usual RPL-based IoT network, where a node generates and routes data simultaneously. In addition, the approach does not consider the energy consumption of the constrained nodes that host agents, which might be high; without measuring this important overhead, it is difficult to assess the method.
In practice, most recent ML- and DL-based IDS approaches use existing benchmark datasets to evaluate their works. The KDD99 [11] dataset, released in 1999, is considered the most popular one. It consists of 4.9 million labeled samples of regular traffic and 22 attack types that are grouped into four categories, namely, probing (probe) attacks, remote to local (R2L) attacks, user to root (U2R) attacks, and denial of service (DoS) attacks. Another common dataset is NSL-KDD [12], an improved version of KDD99 that eliminates the redundant records in both the training and testing sets while keeping a reasonable number of records. The UNSW-NB15 [13] dataset was introduced to reflect real network traffic and modern low-footprint attacks. The dataset contains a total of 2.5 million records, 49 features, and 9 types of modern attacks. The more recent WSN-DS dataset [14] was developed for Wireless Sensor Networks (WSN). It consists of 23 features and 4 labeled attacks (Blackhole, Grayhole, Flooding, and Scheduling). The more recent Bot-IoT dataset [8] was developed to cover both normal IoT-related and other network traffic, along with different types of attack traffic generally used by botnets. It contains a total of 72 million records, 35 features, and 3 attack categories (probing, DDoS, and information theft).
However, the KDD99, NSL-KDD, and UNSW-NB15 datasets are not sufficient to be used as a benchmark for IoT IDS. The network topologies used to create these datasets are based on client-server wired (Ethernet) communication, which is different from the one used in IoT, where wireless LLNs are mostly used. Hence, the data traffic types, protocols, and attack types in those datasets differ from the IoT case. For example, in IoT networks that use RPL routing over 6LoWPAN, specific control messages (DIO, DIS, DAO, and DAO-ACK) are responsible for network creation and management. Therefore, these networks are sensitive to different types of attacks. Although the WSN-DS dataset was designed for WSN (an IoT-type network), it requires a set of monitoring nodes to watch the neighbors and report them to the base station. This process consumes more energy and is difficult to manage and apply in reality. In the Bot-IoT dataset, the IoT services are simulated in a direct communication fashion, where the devices are connected directly to the server. However, this does not replicate other scenarios in which the IoT devices form an ad hoc network that allows them to communicate with each other and route the data to a central location using a routing protocol. In addition, the majority of normal and attack traffic in the dataset is related to traditional computer network systems, and very little traffic is associated with the IoT services.
Therefore, our proposed IoTR-DS dataset is more specialized for IoT-type networks in which the RPL protocol is used over the 6LoWPAN adaptation layer. It adds features related to normal and attack IoT traffic that are not considered in the other datasets. In addition, no monitoring or sniffing tools are required to collect the nodes' data. The important IDS parameters are carried in the data packets sent by the nodes to the root, so no additional packets are needed. A full description of the IoTR-DS dataset is given in the next section.

IoTR-DS Dataset Creation and Attacks Modeling
In this section, we will describe the proposed IoTR-DS dataset attributes (features) and the modeling of the envisaged attacks that produce traffic sample subsets.

IoTR-DS Attributes
As mentioned, instead of using a monitoring and sniffing tool to capture the nodes' traffic, the IoTR-DS is created by utilizing the sensing data packets sent by all nodes to the root. Some routing attributes, such as numSentDIO, numSentDIS, numRankChange, and others (see Table 2), are appended to the UDP (data) packet when it reaches the network layer. The root also appends some attributes related to the UDP (data) packet, such as recTime and numUdpRec. According to the literature, these attributes are useful for the IDS to analyze in order to detect abnormal behaviors. Their values might change due to normal network behaviors such as nodes joining the tree or changing their rank or parent. On the other hand, a change can be due to anomalous behaviors in the network, such as attacks. Moreover, a specific attack can change one or more parameters to a certain degree. The IDS's job is to differentiate between normal and abnormal traffic and to classify the attack types.
The packets sent by the nodes are stored in the root as log files, which represent the raw datasets. Table 2 lists the IoTR-DS dataset attributes, where the last one represents the traffic label, indicating whether it is normal traffic or an attack. When the malicious nodes attack, the assumption is that the whole network is affected during the attack time. Therefore, all traffic is labeled with an attack number during the attack time. The number zero is reserved for labeling normal traffic (i.e., when there are no attacks).
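The labeling rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `nodeId` column, the `label_traffic` helper, the attack window, and the attack number 1 are hypothetical; only `recTime` and the use of zero for normal traffic come from the text.

```python
import pandas as pd

NORMAL = 0  # label reserved for normal traffic, as stated in the text

def label_traffic(df, attack_windows):
    """Label rows with an attack number when recTime falls in [start, end); else NORMAL."""
    out = df.copy()
    out["label"] = NORMAL
    for start, end, attack_id in attack_windows:
        in_window = (out["recTime"] >= start) & (out["recTime"] < end)
        out.loc[in_window, "label"] = attack_id
    return out

# Hypothetical log excerpt and a single attack active between 10 s and 20 s.
log = pd.DataFrame({"nodeId": [1, 2, 1, 2], "recTime": [5.0, 12.0, 18.0, 31.0]})
labeled = label_traffic(log, attack_windows=[(10.0, 20.0, 1)])
```

Because every packet received during the attack window is labeled, packets from nodes far from the attackers are also marked as attack traffic, which matches the whole-network assumption stated above.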

Attack Implementation
For evaluation purposes, four subsets are created using four different scenarios, and the traffic in each subset is labeled according to its scenario. The first dataset represents the normal traffic behavior when there is no attack within the network. The other three datasets result from simulating three types of attacks: DIS, rank, and wormhole attacks. In all scenarios, the network consists of 100 nodes, whereas the number of malicious (attacker) nodes varies between attack scenarios. The following gives more details about these attack implementation scenarios.
Algorithm 1 shows the pseudocode for implementing a flooding attack (the DIS attack in RPL). During a preset time interval, three selected nodes at different locations periodically (every 0.2 s in our implementation) broadcast DIS messages. This process is repeated with another three malicious nodes at a different location and during a different time interval.
The pseudocode in Algorithm 2 shows the rank attack implementation. At a preset point in a time interval, three malicious nodes decrease their rank by two and broadcast DIO messages, including the new ranks, to their neighbors. A node receiving this DIO sees a better preferred-parent candidate than its current one and, thus, selects it as its preferred parent, recalculates its rank, and broadcasts a DIO with the new parameters. The nodes receiving this new DIO might also consider changing their rank and preferred parent. This process is repeated with another three malicious nodes at a different location and during a different time interval.
In a wormhole attack, the malicious node has two network interfaces. On the 802.15.4 interface, the node communicates with the normal RPL tree, whereas it uses the 802.11 (WiFi) interface to create a tunnel with another malicious node. Although the two malicious nodes are far apart, they are considered neighbors to other nodes. Algorithm 3 shows the pseudocode for implementing this attack. After the tree creation, and at a particular time, a DIO message multicast by one malicious node on its 802.11 interface (wider coverage) is picked up by another malicious node that has a similar interface. The second malicious node connects to the sender (creating a tunnel) and broadcasts the DIO message on 802.15.4, advertising a better link. A node receiving the DIO from this malicious node will probably change its preferred parent to the sender.
The RPL protocol and attacks are implemented using the OMNeT++ simulation tool. OMNeT++ is an extensible, modular, component-based C++ simulation library and framework, primarily for building network simulators [24]. It is open-source and runs on different operating systems such as Linux, Windows, and macOS. It is capable of implementing and simulating RPL and other routing protocols at a larger scale. We applied the attack algorithms described above. Table 3 shows the configuration parameters for the implementation of the attack scenarios. Table 4 describes the five sub-datasets resulting from the simulation scenarios. The normal dataset contains only normal traffic samples without any attack behaviors. The DIS, Rank, and Wormhole datasets contain a mixture of both normal and abnormal (attack) traffic samples. The combined dataset is the concatenation of the four datasets into one larger dataset. Notice that the number of samples varies from one dataset to another to provide sample diversity.
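The effect of the decreased-rank scenario on a neighbor's parent selection can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the OMNeT++ implementation: the `choose_parent` helper, the node names, and the rule that a node picks the neighbor advertising the lowest rank and sets its own rank to that value plus one are all hypothetical.

```python
def choose_parent(advertised_ranks):
    """Pick the neighbor advertising the lowest rank; own rank is parent rank + 1."""
    parent = min(advertised_ranks, key=advertised_ranks.get)
    return parent, advertised_ranks[parent] + 1

# Ranks advertised in DIOs by two neighbors: honest node A and malicious node M.
honest = {"A": 2, "M": 3}
parent_before, rank_before = choose_parent(honest)

# M lowers its advertised rank by two, as in the rank attack scenario.
attacked = dict(honest, M=honest["M"] - 2)
parent_after, rank_after = choose_parent(attacked)
```

After the attack, the victim switches its preferred parent from A to M and recomputes its own rank, which it then advertises in new DIOs; this is the cascade of parent and rank changes that the numRankChange and numSentDIO features are meant to capture.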

Proposed Hybrid Deep Learning-Based IDS (DL-IDS)
The proposed IDS model applies semi-supervised and supervised deep-learning models to detect normal and abnormal (attack) behaviors in the IoT network. The concept is to use the semi-supervised DAE to verify whether a given traffic behavior has been seen before by comparing its reconstruction error with the known normal and attack reconstruction errors. If the reconstruction error is within predefined boundaries, the traffic is passed to the supervised DANN, which classifies it more accurately. Otherwise, the traffic is considered a new attack type, and, therefore, a new class and reconstruction error are added to the system. The proposed model's overall architecture is presented in Figure 2, and the following subsections describe it in detail.
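The routing logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `toy_dae` and `toy_dann` are hypothetical stand-ins for the trained models, the reference errors and the `tolerance` value are made up, and the boundary test (error within `tolerance` of a stored reference error) is only one possible reading of "predefined boundaries".

```python
import numpy as np

def reconstruction_error(x, reconstruct):
    """Mean squared error between a sample and its reconstruction."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - reconstruct(x)) ** 2))

def hybrid_classify(x, reconstruct, classify, known_errors, tolerance):
    """Send familiar-looking traffic to the classifier; flag the rest as unknown."""
    err = reconstruction_error(x, reconstruct)
    # Assumed boundary test: the error must lie within `tolerance` of a stored reference.
    if any(abs(err - ref) <= tolerance for ref in known_errors.values()):
        return classify(x)
    return "unknown"

# Toy stand-ins for the trained models (hypothetical, for illustration only).
def toy_dae(x):
    return np.clip(x, 0.0, 0.5)      # reproduces only feature values up to 0.5

def toy_dann(x):
    return "normal" if np.mean(x) < 0.3 else "DIS"

known = {"normal": 0.0, "DIS": 0.01}  # reference reconstruction errors per known class
seen = hybrid_classify([0.1, 0.2, 0.3], toy_dae, toy_dann, known, tolerance=0.005)
unseen = hybrid_classify([0.9, 1.0, 0.8], toy_dae, toy_dann, known, tolerance=0.005)
```

In a fuller version, the `"unknown"` branch would also register a new class together with its reconstruction error, as the text describes, so that later samples of the same behavior fall within the known boundaries.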

Data Preprocessing
The log files created by the root, which contain data collected from all network nodes, are considered raw data and cannot be used directly as inputs to the DL-IDS model. Therefore, the first step is to preprocess those files. Python with the pandas [25] and NumPy [26] libraries is used to process the raw datasets. Some of the features, such as the DIO packet count, DIS packet count, rank change count, and others, cannot be used directly because their values are cumulative. Therefore, the first stage is to calculate their values at the sending time by taking the difference between the previous value and the current value at the receiving time. The average packet delivery ratio (PDR) and delay features are also calculated at this stage, as shown in Equations (1) and (2), respectively.

$$\mathrm{PDR} = \frac{\text{number of packets received}}{\text{number of packets sent}} \tag{1}$$

where, in Equation (2), $i$ is the node's packet sequence number.

Subsequently, since an individual traffic sample might not give the DL model enough information to process, it is useful to group samples and compute the feature averages. Therefore, the traffic instances are grouped according to the receiving time window (in seconds), and the average feature value is taken over this window. This also reduces the total number of instances fed to the DL model, which reduces the learning time. In our implementation, we found that averaging the samples every 2 s gives the best result.
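As a rough sketch of the windowing step, the following groups samples by receiving time into fixed 2 s windows and averages each feature. The tuple layout `(receive_time, feature_vector)` and the function name are assumptions made for illustration, not the authors' data format.

```python
from collections import defaultdict


def average_by_window(samples, window_seconds=2.0):
    """Group (receive_time, features) samples into fixed windows and average each feature."""
    buckets = defaultdict(list)
    for receive_time, features in samples:
        buckets[int(receive_time // window_seconds)].append(features)

    averaged = []
    for index in sorted(buckets):
        rows = buckets[index]
        # zip(*rows) iterates column by column, so each mean is per feature
        means = [sum(column) / len(rows) for column in zip(*rows)]
        averaged.append((index * window_seconds, means))
    return averaged


# Hypothetical samples: two fall in the window starting at 0 s, one in the window starting at 2 s
samples = [(0.4, [2.0, 10.0]), (1.6, [4.0, 30.0]), (2.1, [6.0, 50.0])]
print(average_by_window(samples))
# [(0.0, [3.0, 20.0]), (2.0, [6.0, 50.0])]
```

Three input rows become two, which illustrates how windowing shrinks the dataset before training.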
After that, we normalize the data features so that they are on a similar scale and fit the DL-IDS model; Equation (3) is applied to each feature column X to produce normalized values between 0 and 1. The dataset is then ready to be fed to the DL model.
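Equation (3) itself is not reproduced in this text. A common way to map a feature column into the range 0 to 1 is min-max scaling, and the sketch below assumes that form; the function name and the handling of constant columns are illustrative choices.

```python
def min_max_normalize(column):
    """Scale a feature column to [0, 1] with min-max scaling (assumed form of Equation (3))."""
    low, high = min(column), max(column)
    if high == low:
        # A constant column carries no information; map it to zeros to avoid dividing by zero
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


rank_changes = [10.0, 20.0, 40.0]  # hypothetical feature values
print(min_max_normalize(rank_changes))
```

The smallest value maps to 0, the largest to 1, and everything else falls proportionally in between.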

Hybridization of DAE and DANN Models (DAE-DANN)
This section describes the IDS training and classification phase using the DAE-DANN approach. Algorithm 4 shows the main steps of the approach. Let us first define some terms. Let $\mathbb{R}$ be the set of real numbers and $X \in \mathbb{R}^{N \times d}$ be a matrix of $N$ training samples (network traffic) of $d$ features each; $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ stands for the $i$th training traffic sample, and $y \in \{y_0, y_1, \ldots, y_i\}^{N \times 1}$ is a vector of $N$ multi-labels of the training samples. $Z \in \mathbb{R}^{M \times d}$ is a matrix of $M$ testing samples (network traffic) of $d$ features each; $z_i = (z_{i1}, z_{i2}, \ldots, z_{id})$ stands for the $i$th test traffic sample, and $l \in \{l_0, l_1, \ldots, l_i\}^{M \times 1}$ is a vector of $M$ multi-labels of the testing samples.
In the first phase, the DAE model is trained using normal traffic samples. During training, the model reconstructs the initial input by minimizing the mean squared error between the input and the output. The average loss value (reconstruction mean squared error) resulting from the training process is stored as a threshold named $\mathit{normalLoss}_{\mathit{threshold}}$. During the test phase, the model calculates the average reconstruction error for a given test sample $z_i$ using the function shown in Equation (4), where $d$ is the number of features.

This function computes the difference between the average sum of the original input features ($z_i^{\mathit{test}}$) of a given sample $z_i$ and the reconstructed inputs ($z_i^{\mathit{predict}}$) obtained by applying the DAE to the same sample, using the weights and biases tuned on the normal training samples $X$. The resulting reconstruction loss value ($z_i^{\mathit{lossdiff}}$) is then compared with the boundary of the stored normal loss threshold and with the boundaries of the attack thresholds ($\mathit{attack}^{l}_{\mathit{threshold}}$). The normal boundary is the range from 0 to $2 \times \mathit{normalLoss}_{\mathit{threshold}}$. The boundary of attack $l$ is the range from $\mathit{attack}^{l}_{\mathit{threshold}} / 2$ to $2 \times \mathit{attack}^{l}_{\mathit{threshold}}$. If $z_i^{\mathit{lossdiff}}$ falls outside all of these boundaries, the traffic is considered a new attack label.
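The boundary test can be sketched as follows. Because Equation (4) is not reproduced in this text, the reconstruction error here is assumed to be the mean squared difference over the $d$ features; the function names, the label strings, and the attack threshold of 0.6 are hypothetical, while 0.085 is the normal threshold reported later in the results.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared difference over the d features (an assumed form of Equation (4))."""
    d = len(original)
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / d


def match_boundary(loss, normal_threshold, attack_thresholds):
    """Return the label whose boundary contains the loss, or 'new_attack' if none does."""
    # Normal boundary: [0, 2 * normalLoss_threshold]
    if 0.0 <= loss <= 2.0 * normal_threshold:
        return "normal"
    # Attack l boundary: [attack_l_threshold / 2, 2 * attack_l_threshold]
    for label, threshold in attack_thresholds.items():
        if threshold / 2.0 <= loss <= 2.0 * threshold:
            return label
    return "new_attack"


known_attacks = {"attack_1": 0.6}  # hypothetical stored attack threshold
for loss in (0.10, 0.50, 2.00):
    print(loss, match_boundary(loss, 0.085, known_attacks))
```

In this sketch the normal boundary is checked first, so it takes precedence if boundaries overlap; the source does not state how overlapping boundaries are resolved. Samples labeled "normal" or as a known attack would go on to the DANN, while "new_attack" samples would contribute to a new stored threshold.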
To simplify, let us consider the following example. Suppose that we start with a training set $X$ containing normal traffic samples and one known attack, labeled $(y_0, y_1)$, where $y_0$ is the normal traffic label and $y_1$ is the label of attack number 1. Therefore, $\mathit{normalLoss}_{\mathit{threshold}}$ and $\mathit{attack}^{1}_{\mathit{threshold}}$ are known. Let us assume that another, unknown attack is added to the test samples $Z$ in the range $z_1, \ldots, z_m$ and occurs at a specific time sequence. These test samples are passed to the DAE, which computes $z_i^{\mathit{lossdiff}}$ for each sample $i$ and compares it to the existing $\mathit{normalLoss}_{\mathit{threshold}}$ and $\mathit{attack}^{1}_{\mathit{threshold}}$. If the $z_i^{\mathit{lossdiff}}$ value is within the $\mathit{normalLoss}_{\mathit{threshold}}$ boundary, the sample can be considered normal traffic, whereas if it is within the $\mathit{attack}^{1}_{\mathit{threshold}}$ boundary, the traffic can be considered attack 1. In both cases, the samples are passed to the supervised DANN model. However, since this is a new attack type, the $z_i^{\mathit{lossdiff}}$ of the attack samples should be different and fall outside both the $\mathit{normalLoss}_{\mathit{threshold}}$ and $\mathit{attack}^{1}_{\mathit{threshold}}$ boundaries. The average out-of-boundary $z_i^{\mathit{lossdiff}}$ of the samples within a time window $w$ is calculated, the resulting value is stored as the new attack threshold ($\mathit{attack}^{2}_{\mathit{threshold}}$), and $y$ is updated by adding a new label for the newly detected attack. The DANN is then retrained on the updated dataset. The DANN is used to classify known behaviors, for which it trains much faster and produces a higher classification accuracy.

Algorithm 4:

    Train DAE on (X, y), where y = 0  // 0: normal traffic label
    Compute the average normalLoss_threshold
    Return normalLoss_threshold

    Function DANN_Train(X, y):
        Train DANN on (X, y) to obtain optimal values of weights and biases and reduce loss
        Return optimal weights and biases

    Function DAE_Classifier(Z):
        z_i_predict = DAE.predict(z_i)  // Feed the DAE with new traffic instances
        loss_z_i = LossDiff(z_i_predict, z_i)  // Compute loss_diff by Equation (4)

The following is a description of the architecture and configuration of the DAE and DANN used in the IoT-IDS system.

Deep Autoencoder (DAE)
The AE is a type of ANN that is usually used in unsupervised machine learning. Figure 3 shows the overall architecture of the AE implemented in the proposed model. It consists of an input layer of $d$ neurons, where $d$ is the number of features. The input is then encoded (compressed) by passing it to a hidden layer $l_1$ with a size of $d/2$ neurons. Another layer with a size of $l_1/2$ neurons is added, which also holds a compressed representation of the original input. The AE is then trained to reconstruct the inputs from the compressed layer (bottleneck) through the hidden decoding layers. All layers are followed by dropout layers (not shown in the figure) to prevent overfitting. The encoder maps the input vector $x_i$ to the hidden representation $h$, which represents the latent space of the bottleneck layers, as shown in Equation (5):

$$h = g(W x_i + b) \tag{5}$$

where $W$ is the weight matrix, $b$ is the bias vector, and $g$ is the activation function. Two well-known activation functions, the Rectified Linear Unit (ReLU) and the hyperbolic tangent (tanh), are used interchangeably in the model. Our experiments found that combining these functions produces better results. ReLU (Equation (6)) is a non-linear activation function that is well suited to MLPs with many hidden layers because it is fast and helps to mitigate the vanishing gradient problem [27]. The tanh function (Equation (7)) resembles the sigmoid function but has an output range of (−1, 1), whereas the sigmoid range is (0, 1).

$$\mathrm{ReLU}(x) = \max(0, x) \tag{6}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{7}$$

where $x$ is the input to the function.
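To make the encoder concrete, the sketch below computes the layer sizes described above and applies one fully connected layer with either activation. The helper names, the use of integer division for odd sizes, and the toy weights are assumptions for illustration only; the actual model was built with Keras.

```python
import math


def relu(x):
    """Rectified Linear Unit: passes positive inputs, zeroes out negative ones."""
    return max(0.0, x)


def encoder_sizes(d):
    """Input layer of d neurons, hidden layer of d/2, bottleneck of (d/2)/2."""
    return [d, d // 2, d // 4]  # assumption: integer division when d is not divisible


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one weight row per output neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


print(encoder_sizes(16))                                         # [16, 8, 4]
print(dense([1.0, -2.0], [[0.5, 0.5]], [0.0], relu))              # [0.0]
print(dense([1.0, -2.0], [[0.5, 0.5]], [0.0], math.tanh))
```

The same pre-activation value (−0.5) is clipped to 0 by ReLU but kept as a negative value by tanh, which is one reason mixing the two can change what the bottleneck preserves.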
The decoder map $l_n$ reconstructs an output $x'$ of the same size as the input $x$ (Equation (8)). By backpropagating the error during training, the autoencoder tries to minimize the average reconstruction error (or loss) between the input and the reconstructed output (Equation (9)):

$$L = \frac{1}{m} \sum_{i=1}^{m} \left\lVert x_i - x'_i \right\rVert^{2} \tag{9}$$

where $m$ is the number of samples.

Deep Artificial Neural Network (DANN)
A multilayer perceptron (MLP) model, which represents a deep neural network, is adopted in the proposed architecture. The architecture is a typical deep neural network with an input layer, multiple hidden layers, and an output layer, as presented in Figure 4. The neurons in each layer are fully connected to the neurons of the neighboring layers, and the data are transformed from layer to layer in a forward direction. Each neuron in a hidden layer is activated by calculating an output value based on the input data from the previous layer, along with the weight ($w$) values and bias ($b$). Let $l \in \{1, \ldots, L\}$ index the $L$ hidden layers. From Figure 4, the output of activating $a_i^{l}$ is shown in Equation (10):

$$a_i^{l} = f\left(\sum_{j=1}^{n} w_{ij}^{l}\, a_j^{l-1} + b_i^{l}\right) \tag{10}$$

where $i$ is any hidden neuron of layer $l$, $n$ is the number of neurons in the previous layer, $a_j^{l-1}$ is the $j$th neuron of the previous layer, $w_{ij}^{l}$ is the weight of the connection between the $a_i^{l}$ and $a_j^{l-1}$ neurons, and $b_i^{l}$ is the bias of layer $l$. $f$ is the activation function, and ReLU is used for the hidden layers.
Since the network outputs belong to multiple classes (labeled attacks), the softmax function is used, which calculates the probability distribution across the classes. The label input is converted into a one-hot encoded matrix. Equation (11) calculates the probability of class $z_i$ for traffic input $x_i$, where $k$ runs over the $C$ classes, and the input values $z_i$ and $z_k$ are calculated using Equation (10). Each class's output is in the range [0, 1], and the outputs of all classes add up to 1.

$$\sigma(z_i) = \frac{e^{z_i}}{\sum_{k=1}^{C} e^{z_k}} \tag{11}$$

To measure the difference between the predicted class value and the true class value, the categorical cross-entropy loss function is used. This function is designed for multi-class classification tasks in which an input instance belongs to exactly one class. Equation (12) shows the categorical cross-entropy loss function (CL):

$$CL = -\sum_{i=1}^{C} z_i \log(\hat{z}_i) \tag{12}$$

where $C$ denotes the number of classes, $z_i$ is the true class value, and $\hat{z}_i$ is the predicted class value calculated using the softmax function (Equation (11)). To minimize the loss value during training, the Adam optimization algorithm is used, with backpropagation calculating the gradients.
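A small numeric sketch of the softmax output and the categorical cross-entropy loss may help. It uses plain Python rather than the Keras implementation; the function names, the logits, and the class order (normal, DIS, rank, wormhole) are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow without changing the result
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probabilities, eps=1e-12):
    """Negative log-probability assigned to the true class of a one-hot label."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probabilities))


logits = [2.0, 0.5, 0.1, -1.0]   # hypothetical scores for normal, DIS, rank, wormhole
probs = softmax(logits)
true_label = [1, 0, 0, 0]        # one-hot encoding of the "normal" class
print(probs)
print(categorical_cross_entropy(true_label, probs))
```

Because the label is one-hot, only the term for the true class contributes, so the loss shrinks toward 0 as the probability of the true class approaches 1.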

Experiment Setup and Evaluation
The experiments were conducted using the TensorFlow platform [28] with Keras [29] as a higher-level framework. The hardware used for the implementation is an Intel(R) Xeon(R) CPU E3-1535M v6 @ 3.10 GHz with 64 GB of RAM and a 1 TB SSD. The same hardware could be used at the root node of a real WSN architecture.

Experiment Scenarios
Different experiments were designed for various possible scenarios to evaluate the performance of the IoT IDS-DL model using the IoTR-DS dataset. The first experiment was designed to evaluate the model's performance in predicting a single unlabeled attack after training with normal traffic only. The model was first trained using the normal traffic dataset, and then it was tested with a single attack type using the dataset containing the normal and attack traffic. The second experiment used combined datasets to evaluate the detection of two unseen and untrained attacks among other trained attacks. The third experiment was set up to evaluate the supervised binary classification when only one known (labeled) attack is available. Each time, an individual attack dataset was used to train the model to distinguish normal from malicious (attack) traffic. The fourth experiment was conducted to evaluate the system in detecting multiple trained (labeled) attacks using multi-class classification. The combined dataset was used to train the model to distinguish between four classes: normal, DIS, rank, and wormhole attacks. Finally, the system was evaluated against four different ML and DL models: J48 [30], KNN [31], SVM [32], and LSTM [33].

In all of the supervised experiment scenarios, the datasets were split into 70% for training and 30% for evaluation, with a further 30% of the training portion used for testing. For predicting unseen attacks, which relies mainly on the semi-supervised model, only the normal dataset is used for training and evaluation (70% train, 30% evaluate), whereas the full attack datasets are used for testing. This ensures that the attack behavior was not seen (trained on) by the model.

The performance of the IoT-DL models depends on the hyperparameter configuration, such as the number of hidden layers, the number of neurons per layer, the learning rate, the activation function, and others. Tables 5 and 6 show the near-optimal hyperparameter configurations for the DANN and DAE, respectively, based on the best results obtained over different experiment trials.

Evaluation Metrics
To determine the detection accuracy of the proposed model on IoTR-DS, it is evaluated using well-known machine learning performance metrics: accuracy, recall, precision, and F1 score. These metrics are calculated from four quantities: True Positive ($T_P$), True Negative ($T_N$), False Positive ($F_P$), and False Negative ($F_N$). $T_P$ is the amount of malicious traffic that the model correctly classifies as malicious, and $T_N$ is the amount of normal traffic that the model correctly classifies as normal. On the other hand, $F_P$ is the amount of normal traffic that the model incorrectly classifies as malicious, and $F_N$ is the amount of malicious traffic that the model incorrectly classifies as normal.

The accuracy metric measures the ratio of correct predictions by dividing the total number of correct predictions ($T_P$ and $T_N$) by the sum of all predictions, as illustrated in Equation (13).

$$\mathrm{Accuracy} = \frac{T_P + T_N}{T_P + T_N + F_P + F_N} \tag{13}$$

Precision (Equation (14)) is the ratio of correctly classified attack traffic to all of the predicted attack traffic, whereas recall (Equation (15)) is the ratio of correctly classified attack traffic to the total attack traffic. The F1 score (Equation (16)) is the harmonic mean of precision and recall.

$$\mathrm{Precision} = \frac{T_P}{T_P + F_P} \tag{14}$$

$$\mathrm{Recall} = \frac{T_P}{T_P + F_N} \tag{15}$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{16}$$
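The four metrics follow directly from the confusion counts, as the sketch below shows. The function name and the example counts are hypothetical and are not taken from the paper's tables.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 100 attack samples and 900 normal samples
metrics = classification_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced traffic like this, accuracy stays high even though precision is noticeably lower, which is why the results report F1 alongside accuracy.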

Semi-Supervised Classification Results
In this evaluation experiment, the model was first evaluated in predicting a single unlabeled attack at a time, without having seen any attack behavior during the training phase. Then, the model was evaluated in predicting two attacks that it had not seen during training, using a mixture of DIS and rank attack test traffic in addition to the normal traffic. The model was trained using the normal traffic samples only, and the resulting loss threshold was stored and used to detect abnormal behaviors.
Figure 5 shows the loss values over the number of epochs on the normal training dataset. The loss reaches its lowest value near 300 epochs, which indicates that this number of epochs is sufficient for the model. The average reconstruction error (loss) recorded by the model during the training phase is 0.085, which is stored as the threshold value. As mentioned earlier, when predicting new traffic, the model computes the reconstruction error using the parameters (weights and biases) learned from the normal samples. If the given traffic is not within the normal traffic range, the reconstruction error should be noticeably higher than the threshold. Figures 6-8 show the average reconstruction error when the DIS, rank, and wormhole attacks are predicted, respectively. These figures clearly show the difference between the normal loss threshold and the average attack loss values. In addition, they demonstrate that the loss values (clusters) of each attack do not overlap with those of the other attacks, so the attack domains can be separated from each other. The greater the distance between the attack loss value and the normal threshold, the higher the detection rate of the attack.

Table 7 presents the performance metrics of the binary classification using the DIS testing instances. The overall DIS attack detection accuracy is 99%, with a precision, recall, and f1-score of 99%, 98%, and 98%, respectively. For the rank attack classification, the accuracy is 96%, and the precision, recall, and f1-score are 85%, 98%, and 91%, respectively, as shown in Table 8. For the wormhole DAE classification, the overall detection accuracy is 92%, with a precision, recall, and f1-score of 70%, 93%, and 80%, respectively, as shown in Table 9. These values are much lower than those of the other two attack cases because the reconstruction error for the wormhole attack traffic is lower than the other attack errors. In other words, the difference in feature values between wormhole attack traffic and normal traffic is smaller than the corresponding difference for the other attacks. Finally, when predicting a combination of two untrained attacks (DIS and rank), the overall accuracy of classifying the traffic is 95%. The precision, recall, and f1-score reported for DIS are 99%, 95%, and 97%, respectively, which are much higher than those for the rank attack, which are 93%, 53%, and 68%, respectively, as listed in Table 10. Again, this is because the reconstruction error for the rank attack traffic is much lower than that of the DIS attack, making it more challenging to detect.
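As a quick consistency check, the reported f1-scores can be recomputed from the reported precision ($P$) and recall ($R$) values using $\mathrm{F1} = 2PR/(P+R)$. The inputs are the rounded percentages quoted above, so small rounding differences are expected.

```latex
\begin{align*}
\text{DIS (single):} \quad & \frac{2(0.99)(0.98)}{0.99 + 0.98} \approx 0.98 \\
\text{Rank (single):} \quad & \frac{2(0.85)(0.98)}{0.85 + 0.98} \approx 0.91 \\
\text{Wormhole (single):} \quad & \frac{2(0.70)(0.93)}{0.70 + 0.93} \approx 0.80 \\
\text{DIS (combined):} \quad & \frac{2(0.99)(0.95)}{0.99 + 0.95} \approx 0.97 \\
\text{Rank (combined):} \quad & \frac{2(0.93)(0.53)}{0.93 + 0.53} \approx 0.68
\end{align*}
```

All five values agree with the reported f1-scores. The last line also shows how the low recall of the rank attack (53%) pulls its f1-score down despite a precision of 93%.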

Supervised Binary Classification
For the binary classification, the loss and Area Under the Curve (AUC) performance are plotted over the training process. Although the number of training iterations (epochs) was initially set to 1000, training stopped much earlier, once the loss and AUC had converged to very good values. Dropout and callback techniques were used to speed up the training process and avoid overfitting. Figure 9 shows that the loss and AUC converge to the best performance within 70 epochs on the DIS dataset. For the rank dataset, 140 training epochs were needed to reach a high performance rate (Figure 10), and 200 training epochs were needed in the wormhole case (Figure 11).

The detailed performance results for the binary classification of the DIS, rank, and wormhole attacks are reported in Tables 11-13, respectively. The model classification accuracy rates are 99%, 98%, and 97% for the DIS, rank, and wormhole attacks, respectively. The other performance metrics (precision, recall, and f1-score) also reached high rates, between 91% and 99%.

Supervised Multi-Class Classification
Since the DANN model's main purpose is to conduct multi-class classification, the model is trained using the combined attack dataset to distinguish among four classes: normal, DIS, rank, and wormhole. Different numbers of hidden layers (1, 2, 3, and 4) were implemented to find the optimal depth. Figure 12 shows the loss and AUC using a single DANN hidden layer. It took almost the full training budget (1000 epochs) to reach its best performance. Table 14 presents the performance metrics of the single-layer multi-class classification. The model achieves an overall accuracy of 96%, with the highest precision, recall, and F1-score for the normal class and the lowest for the wormhole attack class.

[…] percentage are achieved in classifying the normal class (99%). The overall average performance metrics are very good (92%) in classifying the three attacks in addition to the normal traffic.

Adding more hidden layers does not always improve model performance. Figure 15 shows that, with four hidden layers, the model requires more than 600 training epochs (more than the three-layer model) to reach the best loss and AUC performance. Table 17 also shows that the overall detection accuracy is 97%, which is lower than when using three layers. Moreover, the average precision, recall, and f1-score of the four classes are lower than those of the three-layer model. Considering these results, it was decided to use three hidden layers in the DNN-IDS multi-class model.

As mentioned earlier, the DL-IDS using the IoTR-DS dataset is also evaluated against other classical and deep learning models to see how it performs in comparison. Since it is difficult to evaluate the hybrid model as a whole, we compare the f1-score and accuracy of the supervised part (DANN) against the J48, KNN, SVM, and LSTM models, and the results are presented in Table 18. As shown, the DANN outperforms the other models in terms of both f1-score and accuracy.

Figure 5. Loss Performance Over a Number of Epochs Using Normal Traffic (No Attacks).

Figure 6. Average Predicted DIS Attack Loss Compared to the Normal Loss Threshold.

Figure 7. Average Predicted Rank Attack Loss Compared to the Normal Loss Threshold.

Figure 8. Average Predicted Wormhole Attack Loss Compared to the Normal Loss Threshold.

Figure 9. DANN Model Loss and AUC for DIS Attack Binary Classification.

Figure 10. DANN Model Loss and AUC for Rank Attack Binary Classification.

J. Sens. Actuator Netw. 2023, 12, x FOR PEER REVIEW

Figure 11. DANN Model Loss and AUC for Wormhole Attack Binary Classification.

Figure 14. Loss and AUC with Three DANN Hidden Layers Used in Multi-Class Classification.

Figure 15. Loss and AUC with Four DNN Hidden Layers Used in Multi-Class Classification.

.
List of abbreviations.
    If Malicious node and Start Attack < time < End Attack then
        Rank-(one time), Trickle reset DIO timer, Send DIO next interval
    If a node receiving DIO and Sender Rank less than Preferred_Parent (default route) Rank then
        Recalculate Preferred_Parent and this Node Rank
        Trickle reset DIO timer, Send DIO to all reachable nodes on next interval
    End

    If Malicious node and Start Attack < time < End Attack then
        Replay and Multicast DIO on 802.11 interface every t time
    If Malicious node receiving DIO on 802.11 interface then
        Recalculate Preferred_Parent (to be the Malicious node) and this Node Rank
        Trickle reset DIO timer, Send DIO (with new rank) on 802.11 to all reachable nodes on next interval
    If a node receiving DIO and Sender Rank less than Preferred_Parent (default route) Rank then
        Recalculate Preferred_Parent and this Node Rank
        Trickle reset DIO timer, Send DIO to all reachable nodes on next interval
    End

Table 3. Simulation Parameters for Normal and Attack Traffic Scenarios.

Table 4. Number of Samples in IoTR Subsets.

Table 7. Performance Metrics in DAE DIS Attack Classification.

Table 8. Performance Metrics in DAE Rank Attack Classification.

Table 9. Performance Metrics in DAE Wormhole Attack Classification.

Table 10. Performance Metrics in the DAE of Both DIS and Rank Attacks Classification.

Table 11. Performance of DNN IDS Binary Classification (DIS Attack).

Table 12. Performance of DNN IDS Binary Classification (Rank Attack).

Table 16. Performance Metrics with Three-Layer Multi-Class Classification.