A Review of Intrusion Detection Systems Using Machine and Deep Learning in Internet of Things: Challenges, Solutions and Future Directions

The Internet of Things (IoT) is poised to impact several aspects of our lives with its fast proliferation in many areas such as wearable devices, smart sensors and home appliances. IoT devices are characterized by their connectivity, pervasiveness and limited processing capability. The number of IoT devices in the world is increasing rapidly and it is expected that there will be 50 billion devices connected to the Internet by the end of the year 2020. This explosion of IoT devices, whose numbers can grow far more easily than those of desktop computers, has led to a spike in IoT-based cyber-attack incidents. To alleviate this challenge, new techniques are required for detecting attacks initiated from compromised IoT devices. In this context, machine and deep learning techniques are the most appropriate detective control approach against attacks generated from IoT devices. This study presents a comprehensive review of IoT system technologies, protocols, architectures and threats emerging from compromised IoT devices, along with an overview of intrusion detection models. It also analyzes various machine learning and deep learning-based techniques suitable for detecting cyber-attacks related to IoT systems.


Introduction
Recent developments in communications and information technologies, such as the Internet of Things (IoT), have extended sensing far beyond the traditional sensing of nearby environments. IoT technologies have facilitated the development of systems that can improve quality of life. IoT is one of the fastest-growing technologies in computing, with an estimated 50 billion devices by the end of 2020 [1]. It has been estimated that, by the year 2025, the IoT and related applications will have a potential economic impact of $3.9 trillion to $11.1 trillion per year [2]. IoT devices can become smart objects by taking advantage of core IoT technologies such as communication technologies, pervasive and ubiquitous computing, embedded devices, Internet protocols, sensor networks, and Artificial Intelligence (AI)-based applications [3].
The ubiquitous interconnection of physically distributed IoT devices extends the computation and communication to other IoT devices with different specifications [4]. Multiple types of sensors, embedded in these devices, enable them to gather real-time data from the physical devices remotely.

Main Contribution
In this paper, a detailed review of network threats from IoT networks and their devices, with corresponding ML and DL based attack detection techniques, is presented. Table 1 summarizes a comparison of our survey with the other surveys conducted on IDSs in IoT networks. As described in the table, this survey covers all important aspects on the subject of ML and DL based techniques used for IDS in IoT networks and their systems. The table also shows that other surveys partially cover some of the aspects and there is no single paper that explains all the aspects. The key contributions of this survey are described as follows:
• Discussion of IoT architectures and IoT protocols, covering their technologies, frequency bands, and data rates.
The organization of the paper is as follows. In Section 2, recent studies related to anomaly and intrusion detection in IoT networks are discussed. In Section 3, an overview of IoT systems is presented, covering IoT architecture, reference models and IoT protocols. Section 4 describes various attacks and threats against IoT systems. Following this, Section 5 discusses IDS architecture, its design choices and various detection methods, with the corresponding ML and DL techniques described in Sections 6 and 7, respectively. Section 8 briefly describes the datasets that are available and used for testing IDSs. Finally, the future challenges and the paper's conclusion are provided in Sections 9 and 10, respectively.

Current Reviews
Various survey studies have been carried out in the field of IoT security by describing vulnerabilities in IoT systems. However, most of the existing studies on IoT security have not mainly focused on the applications of ML/DL techniques for IoT security. Table 1 summarizes a comparison of our survey with the other surveys conducted on IDSs in IoT networks. The comparison discusses the contributions of each survey related to the design of IoT-based IDSs. In [32], the authors studied the challenges of IoT security at the communication layer. A study in [33] focused on reviewing IDSs for IoT networks. The work in [34] covered a brief discussion of the ML technique's relevance in the context of IoT security and privacy. Moreover, they identified limited bandwidth, computation power and lack of adequate storage as bottlenecks in any implementation of ML-based security solutions for IoT networks. There are other studies [35,36], which discussed the feasibility of both ML and data mining techniques to detect intrusions in IoT networks by implementing these techniques in IDSs either through detecting anomalies or classification of traffic. In [21], the authors highlighted differentials between IDSs running over wired networks and those running over wireless infrastructure, especially IoT networks. Due to fundamental architectural variations, the application of ML techniques in IoT IDSs needs specific treatment related to the type of attacks, underlying protocols (both in communications and networks), and application layer.
Another study [22] discussed the implementation of IDSs in the context of MANETs. The authors described three different types of IDS architecture that are feasible in MANETs. The first is a layered architecture organized in multiple hierarchical layers. The second is a flat architecture for deployment in a distributed and cooperative environment. The third is a hybrid of both, using mobile agents. Another study [23] discussed various intrusion detection algorithms related to IDS implementation in MANETs. According to the authors, these IDS algorithms can be grouped into categories based on the underlying principle used to detect an attack. This principle can be a rule, statistics, heuristics, a signature, a state, a reputation score, or the route used. These techniques were further classified as anomaly detection, misuse, signature-based, or hybrid techniques. The authors [23] also proposed other classification criteria, such as real-time/offline operation, attack types and effectiveness of detection (scalability, reliability, timeliness, etc.).
In another survey [30], the authors presented a classification of IDSs for Wireless Sensor Networks (WSNs) based on the deployment model of the IDS agent. The deployment model can be distributed, centralized, or hybrid, with the hybrid mode suggested as the best-suited model for WSNs. A similar study [31] classified IDSs for WSNs according to the type of detection used by the IDS. The classes identified included anomaly detection, misuse detection and specification-based detection. Another aspect, the cloud-based IoT environment, was discussed in [16], where the authors studied and classified various cloud-based IDSs affecting Confidentiality, Integrity, and Availability (CIA) of cloud computing-based IoT networks. They explained Hypervisor-based IDS, Host-based IDS (HIDS), Network-based IDS (NIDS) and Distributed IDS. In [30], the authors presented a survey on IoT IDSs with a focus on IDS architecture. The survey covered existing IoT protocols, standards and technologies, IoT security threats and detection types, and concluded by proposing an IoT IDS architecture.
The authors in [39] proposed a novel multi-stage anomaly detection technique based on Boruta Firefly Aided Partitioning Density-Based Spatial Clustering of Applications with Noise (BFA-PDBSCAN). The authors claimed that their proposed technique produced better results than the related techniques of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). In [40], the authors proposed a hybrid data processing model for network anomaly detection that utilizes Grey Wolf Optimization (GWO) and Convolutional Neural Network (CNN) techniques. The authors stated that their model achieved better accuracy and detection rate than other state-of-the-art IDSs. In [41], an anomaly detection method based on a deep autoencoder was used to detect IoT botnet attacks. The method extracts statistical features from behavioral snapshots of normal IoT device traffic sequences and trains a DL based autoencoder on the extracted features. The reconstruction error for each traffic observation is then compared with a threshold to classify the observation as normal or anomalous. The authors evaluated the proposed detection method on a dataset of BASHLITE and Mirai botnet traffic generated using commercial IoT devices. A recent survey [37] presented an overview of learning-based NIDSs for IoT systems.
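The thresholding step described for the autoencoder-based method in [41] can be illustrated with a minimal sketch. It is not the authors' implementation: the trained autoencoder is replaced by a hypothetical stand-in (`toy_reconstruct`) that always returns a fixed benign profile, the feature values are made up, and the threshold rule (mean plus `k` standard deviations of the errors on held-out benign traffic) is an assumption chosen for illustration.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence

Vector = Sequence[float]


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between an observation and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(benign_errors: Sequence[float], k: float = 1.0) -> float:
    """Assumed rule: mean + k * std of the errors seen on benign traffic."""
    return mean(benign_errors) + k * pstdev(benign_errors)


def classify(x: Vector, reconstruct: Callable[[Vector], Vector],
             threshold: float) -> str:
    """Label an observation by comparing its error with the threshold."""
    error = reconstruction_error(x, reconstruct(x))
    return "anomalous" if error > threshold else "normal"


# Hypothetical stand-in for a trained autoencoder: it "reconstructs" every
# input as a fixed benign traffic profile (made-up feature values).
BENIGN_PROFILE = (10.0, 0.5, 200.0)


def toy_reconstruct(x: Vector) -> Vector:
    return BENIGN_PROFILE


# Held-out benign observations used only to calibrate the threshold.
benign_validation = [(10.2, 0.5, 198.0), (9.8, 0.6, 203.0), (10.1, 0.4, 201.0)]
errors = [reconstruction_error(x, toy_reconstruct(x)) for x in benign_validation]
threshold = fit_threshold(errors)

print(classify((10.0, 0.5, 201.0), toy_reconstruct, threshold))
print(classify((55.0, 0.9, 900.0), toy_reconstruct, threshold))
```

In practice the choice of `k` trades false alarms against missed attacks, so it would be tuned on traffic that was not used for training.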

IoT System Environment
The adoption of IoT throughout real-world applications, such as home automation, industrial automation and city automation, has resulted in a plethora of micro computation devices and energy-efficient communication technologies, specifications and protocols. IoT systems have been widely employed in military, agriculture, power system, education and commerce applications. These diverse application areas have led to a wide variety of devices, communication standards and protocols. Figure 1 illustrates the IoT system environment and shows a loose clustering of IoT communication technologies and protocols by their corresponding access network.
Figure 1. IoT system environment: applications and related access networks and protocols.

IoT Architecture
The IoT architecture consists of physical objects integrated into a communication network and supported by computational equipment to deliver smart services to users. An IoT system should be capable of connecting billions of heterogeneous devices through the Internet, so there is a need for a layered and flexible architecture. Numerous architectures and reference models have been proposed by various authors and organizations, but they have not yet converged to a formally recognized reference model [3][4][5][6][7][42][43][44]. The most common architectures and reference models (the authors use the terms "architecture" and "reference model" interchangeably) are explained as follows:
• A 3-layer architecture. The most common and basic model is a 3-layer architecture comprising the perception, network and application layers [3,4,43], as depicted in Figure 2. The perception layer, also called 'the device layer', includes physical devices and sensors. The network layer, also named 'the transmission layer', should securely transmit the telemetry data of sensors to processing and data analytics systems. The application layer offers global management of applications using the systems at the network layer.

• International Telecommunication Union (ITU) recommended Reference Model for IoT. ITU recommends a reference model for IoT that comprises four layers, along with security and management capabilities linked to the layers [45]. The layers are as follows: device layer, network layer, service support and application support layer, and application layer, as shown in Figure 3.

• IoT-A Architectural Reference Model proposed by the European Commission (FP7). The European Commission within the Seventh Framework Program (FP7) supported the project IoT-A proposed by Martin Bauer et al. [6]. The IoT-A model attempts to design an architecture that could meet the requirements of the industry and researchers. It offers high-level architectural perspectives and views for building IoT systems. The architecture comprehensively describes the structuring and modeling of IoT business process management, IoT services, cross-service organization and virtual entities, information and functional viewpoints, in an abstract way [46]. Amongst these various views, a functional view of IoT architecture is depicted in Figure 4.
• WSO2, an open-source technology provider, has proposed an Architectural Reference Model based on its expertise in IoT solutions development. Figure 5 depicts the WSO2 recommended architecture. It consists of five layers: (1) Client/external communications: Web/Portal, Dashboard, Application Programming Interfaces (APIs), (2) Event processing and analytics (including data storage), (3) Aggregation/bus layer: Enterprise Service Bus (ESB) and message broker, (4) Relevant transports: XMPP/CoAP/AMQP/HTTP/MQTT, etc. and (5) Devices [47]. The model includes cross-cutting layers that have (1) a device manager, and (2) an identity and access management system.
• An IoT Reference Architecture suggested by Cisco. Cisco introduced a seven-layered IoT reference model [48]. The model and its levels are illustrated in Figure 6. The authors described that control information flows from level 7 to level 1 in a control pattern. The flow of information is reversed in a monitoring pattern, and it is bidirectional in most systems.
Figure 6. Cisco IoT reference model [48].

IoT Protocols
While several protocols and specifications are inherited from the TCP/IP model, some technologies have been developed specifically for IoT systems. IEEE 802.15.4 (a transmission and communication specification standard) is not alone in the paradigm of IoT-specific technologies and standards. Table 2 describes the IoT technologies with their respective frequency bands, supported data rates and area coverage.

IoT-Based Threats and Attacks
IoT systems suffer from various security risks compared to conventional computing systems, for several reasons [15,47]. First, IoT systems are highly diverse with regard to devices, platforms, communication means and protocols. Second, IoT systems comprise "things" that were not planned to be connected to the Internet, where control devices are used to link physical systems. Third, there are no well-defined boundaries in IoT systems, which regularly change due to the mobility of users and devices. Fourth, IoT systems, or parts of them, may be physically insecure. Last but not least, due to the limited energy of IoT devices, it is usually very hard to deploy advanced security techniques and tools on them.
An IoT network often contains hundreds of nodes with assigned functions ranging from sensing light, temperature and noise to associated control systems that regulate lighting and heating, ventilation, and air conditioning (HVAC) systems, etc. All these sensors and control systems communicate through different network protocols such as Bluetooth, WiFi and ZigBee. An IoT gateway is used to connect these devices to the Internet. Being composed of layers of standards, services and technologies, the IoT environment has privacy and security concerns at each of these layers. While the IoT environment appears to have security concerns similar to those of the Internet, cloud and mobile communication networks, distinct characteristics set IoT environments apart with regard to the application of contemporary security controls [10]. These include data sharing, limited computing capacity and the large number of networked IoT devices.
One instance of the susceptibility of IoT devices to attacks was demonstrated in September 2016, when an IoT botnet built from the Mirai malware (possibly the largest botnet on record) was responsible for a 620 Gbps attack directed towards Brian Krebs's security blog [11]. Mirai followed a simple strategy: it tried a list of 62 common user credentials to gain access to digital video recorders, home routers and network-enabled cameras, which generally had fewer defenses than other IoT devices. Later in the same month, the French webhost OVH (On Vous Héberge) was hit by a Mirai-based attack, which broke the record for the largest recorded distributed denial of service (DDoS) attack, peaking at 1.1 Tbps [12]. The attack was made possible by default and weak security configurations. Similarly, in [49], the authors described the relative ease of compromising various IoT devices due to flaws in protocol implementations.
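Because this strategy depends entirely on devices still using well-known credentials, a defender can audit a device inventory for the same weakness. The following is a minimal sketch under stated assumptions: the `Device` record, the inventory and the entries in `DEFAULT_CREDENTIALS` are illustrative placeholders, and the set is not the actual list used by Mirai.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Device(NamedTuple):
    """Hypothetical inventory record for an IoT device."""
    name: str
    username: str
    password: str


# Illustrative placeholder pairs only; NOT the credential list used by Mirai.
DEFAULT_CREDENTIALS: Set[Tuple[str, str]] = {
    ("admin", "admin"),
    ("root", "root"),
    ("user", "user"),
}


def devices_with_default_credentials(
    devices: Iterable[Device],
    defaults: Set[Tuple[str, str]] = DEFAULT_CREDENTIALS,
) -> List[str]:
    """Return the names of devices whose login pair is a known default."""
    return [d.name for d in devices if (d.username, d.password) in defaults]


inventory = [
    Device("camera-1", "admin", "admin"),
    Device("router-1", "netops", "k3!vQz9#"),
    Device("dvr-1", "root", "root"),
]
flagged = devices_with_default_credentials(inventory)
print(flagged)  # ['camera-1', 'dvr-1']
```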
The rapid proliferation of IoT-based devices is likely to make such networks susceptible to attacks against privacy and security. In [13], the authors identified various security issues in IoT networks built with commercially available IoT devices such as sensors. One example is a smart watering system capable of measuring environmental variables such as temperature and humidity. An actuator module was employed to implement the functionality, with a web-based user interface, and the system was built on an Arduino Uno. The authors described the exposure of such a network to spoofing attacks through a software-enabled access point (SoftAP), in which an attacker managed to shut down all IoT devices in the network for a while as the SoftAP broadcast de-authentication packets.
Due to the limited processing capabilities of IoT devices, the attacker was able to make all vulnerable IoT devices in the network connect to the SoftAP, as it appeared to have a stronger signal than the actual access point (AP) with the same service set identifier (SSID). This exposed all network communications to eavesdropping and man-in-the-middle (MiTM) attacks. Such attack scenarios build a case for the deployment of IDSs in IoT networks to discover vulnerabilities of IoT devices. The idea of IoT revolves around the intelligent integration of a real physical environment with the Internet to enable interactivity. For this reason, IoT environments have interconnections and dependencies with multiple heterogeneous environments. This exposes each IoT system to cyber threats from each connected environment [50,51]. IoT environments face threats from multiple dimensions, in both the physical and virtual domains. Figure 7 illustrates the multiple threat dimensions of an IoT environment that could be exploited.
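The SoftAP scenario from [13] leaves two observable traces that an IDS could look for: a known SSID advertised by an unknown access point, and a burst of de-authentication frames. The sketch below illustrates both checks on already-parsed frames. The `Frame` record, the MAC addresses, the allowlist and the `limit` value are all hypothetical, and capturing real 802.11 traffic is outside its scope.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Frame(NamedTuple):
    """Hypothetical, already-parsed 802.11 management frame."""
    kind: str   # "beacon" or "deauth"
    ssid: str   # network name carried by a beacon ("" when not applicable)
    bssid: str  # MAC address of the transmitting access point


def rogue_access_points(frames: Iterable[Frame],
                        trusted: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (ssid, bssid) pairs where a known SSID comes from an unknown AP."""
    rogue = {
        (f.ssid, f.bssid)
        for f in frames
        if f.kind == "beacon"
        and f.ssid in trusted
        and f.bssid not in trusted[f.ssid]
    }
    return sorted(rogue)


def deauth_burst(frames: Iterable[Frame], limit: int) -> bool:
    """True when the capture holds more de-authentication frames than `limit`."""
    return sum(1 for f in frames if f.kind == "deauth") > limit


# Assumed allowlist: the only legitimate AP for the "home-iot" network.
TRUSTED = {"home-iot": {"aa:bb:cc:00:00:01"}}

capture = [
    Frame("beacon", "home-iot", "aa:bb:cc:00:00:01"),  # legitimate AP
    Frame("beacon", "home-iot", "de:ad:be:ef:00:01"),  # same SSID, unknown AP
] + [Frame("deauth", "", "de:ad:be:ef:00:01")] * 5

print(rogue_access_points(capture, TRUSTED))
print(deauth_burst(capture, limit=3))
```

An allowlist is the simplest design choice here; it assumes the set of legitimate access points is small and known, which is plausible for a home or small industrial deployment.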

IoT Environment
Though IoT security threats can be broadly divided into cyber and physical domains, our survey is mainly concerned with cyber threats, which can take the form of either active or passive attacks. Passive attacks are characterized by a lack of any alteration to information or its flow, thereby only compromising the confidentiality and privacy of communications. In some cases, a passive attack can enable location tracking of IoT devices [52][53][54]. Active attacks involve alteration and modification of information and its flow, including, but not limited to, device settings, control messages and software components.
One example of an active attack is the use of IoT systems as a vector to launch massive DDoS attacks against Internet systems. IoT systems are a suitable vector for these attacks because of their large numbers and the comparative ease of compromising them, due to poor security practices and weak defense mechanisms. Mirai is an example of a botnet built by compromising IoT systems [11,55,56]. IoT systems face threats from multiple directions, including the user interface, cloud services, other interconnected IoT systems, associated sensors and network services [12], as shown in Figure 7. A discussion of these dimensions is presented in the following subsections.

User Interface
Most use cases of IoT systems involve the provision of services to users through some sort of user interface (a mobile, desktop or web application). For example, smart home appliances can be controlled by users through mobile applications. The rapid proliferation of smartphones has given malicious actors the opportunity to disguise malicious applications and malware as benign utility mobile applications and publish them through application stores without being detected [57,58]. Smartphones can also sometimes be hacked through platform vulnerabilities, such as Android vulnerabilities. This exposes all information stored on the phone, with the possibility of malware compromise. Eavesdropping, location tracking, Denial of Service (DoS)/DDoS, bluejacking and bluesnarfing are attacks enabled through user interface platforms [59][60][61].

Cloud Services
Though Cloud services and IoT systems lie at two ends of the resource availability spectrum, the two can complement each other to produce an excellent blend of technologies. Cloud services are characterized by ubiquitous access to computing power and storage, etc., which can offset the resource limitations of IoT systems [62]. The potential of IoT systems can be maximized through integrated use with cloud services to conserve energy and provide all types of services without being constrained by storage and processing power limitations [63]. Likewise, cloud services can benefit from large deployments of IoT systems through integrated applications [64]. Such a distributed architecture opens up vulnerable points for many attacks at multiple layers, as explained below.
• Authorization Attacks. Through the exploitation of vulnerabilities in data security mechanisms, an attacker may be able to gain unauthorized access to information on both cloud and IoT systems.
• Integrity Attacks. Such attacks enable an attacker to compromise the integrity of data through spoofing and to bypass the authorization controls to gain direct access to databases.
• Compromise of Virtualization Platform. A vulnerability in the virtualization platform can be exploited by an attacker to bypass security and isolation controls between the host and the guest operating system (OS), resulting in privilege escalation and pivoting attacks [65].
• Confidentiality Attacks. IoT systems, like wearable devices, are used to monitor health-related data of a highly confidential nature. Similarly, smart home devices capture sensitive private data of the users. Privacy and confidentiality concerns overshadow the advantages of cloud services. Moreover, the multi-tenancy and geographical location of cloud services pose a serious threat to the confidentiality of data through privilege escalation and hacking [66].

Connections of Multiple IoT Systems
Various IoT systems are designed to work autonomously and interact with other IoT systems, such as the sensors and actuators of smart cars and smart homes, without requiring human involvement. Such interaction is aimed at achieving autonomous and collaborative functionality. Smart cars and smart homes can communicate with each other and provide interdependent services and functions. For instance, [67] described a scenario in which the windows of a room are opened automatically when a temperature sensor detects an increased temperature and a smart plug is detected as unplugged. An attacker could reach the window-opening actuator by manipulating the temperature sensing device through its interface, which in turn compromises the actuator [67]. This example highlights the fact that the weakest part of interdependent IoT systems can compromise other parts as well.
A large number of interconnected devices in IoT systems increases the vulnerability and also the impact of any attack, where one compromised device can lead to the compromise of billions of devices. Such a scenario can impact any externally connected networks and systems also. One study [68] demonstrated that an experimental malware attack against Philips Hue smart lamp was so successful that it compromised all such lamps in the network, despite the presence of reliable cryptographic authentication mechanisms against malicious firmware updates. Similar attacks could provide the control of lights of an entire city or their use in DDoS against outside targets [68].
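The way a single compromised device can expose the rest of a network can be illustrated with a breadth-first search over the device connectivity graph. This is a worst-case sketch: it assumes the compromise spreads across every link, and the device names and links are hypothetical rather than taken from [68].

```python
from collections import deque
from typing import Dict, List, Set


def reachable_devices(links: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search: every device reachable from a compromised one."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical topology: three lamps chained by radio links, one isolated device.
links = {
    "lamp-1": ["lamp-2"],
    "lamp-2": ["lamp-1", "lamp-3"],
    "lamp-3": ["lamp-2"],
    "thermostat": [],
}
exposed = reachable_devices(links, "lamp-1")
print(sorted(exposed))  # ['lamp-1', 'lamp-2', 'lamp-3']
```

Segmenting the network removes edges from this graph, which is why isolating device groups limits the impact of a single compromise.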
Various types of sensors are an essential part of IoT systems, such as GPS, Radio-Frequency Identification (RFID), temperature gauges and IP cameras. This also includes sensors and actuators embedded in autonomous vehicles and the Internet of Vehicles (IoV). These physical devices are vulnerable to physical attacks and manipulation by malicious actors. Another component of IoT systems susceptible to such physical attacks is the actuator, which performs some function based on the readings of sensor devices. Both actuators and sensors can be subjected to DoS attacks through flooding, as well as eavesdropping, location tracking, cloning and spoofing attacks [69][70][71].
An IoT system consists of several interconnected devices using either wireless or wired networks. A large network of linked devices may have a weak security profile, in which sensors and actuators are vulnerable to a multitude of attacks. WSNs provide information to external entities without any restriction. When they are integrated with conventional network services, they cause a regression in the security of conventional networks [72,73].

Protocols Level Attacks
IoT systems differ from traditional Internet systems in that they require lightweight protocols to address the issues of limited energy, data rate and computing power. A detailed description of attacks based on IoT protocols can be found in [74]. Attacks on IoT technologies are presented with their threat types in Table 3.

Radio-Frequency Identification (RFID)
Because the communication between the reader and RFID tags takes place over an unprotected wireless channel, the transmitted data is exposed to unauthorized readers. RFID systems face different security threats from those encountered by traditional wireless systems [75]. Various hacking techniques against RFID are discussed as follows:
• Tag Disable. An attacker may remove the tag, delete the tag memory by sending a kill command, remove the antenna, apply a high-energy wave to the tag, or use a Faraday cage to block electromagnetic waves.
• Tag Modification. An attacker modifies or deletes valuable data from the memory of the tag.
• Cloning Tags. An attacker imitates or clones the tags after skimming the tag's information.
• Reverse Engineering. Using reverse engineering, an attacker can make a copy of a tag, and using tag examination, the attacker may get confidential data stored within a tag.
• Eavesdropping. RFID systems working in ultra high frequency (UHF) are more vulnerable to this threat. An attacker gathers the information shared between a valid tag and a valid reader.
• Snooping. An attacker introduces an unauthorized reader to interact with the tag.
• Skimming. An attacker snoops on data shared between a legitimate reader and a legitimate tag.
• Replay Attack. An attacker spies to collect information about the IoT device or node and replays the eavesdropped information to achieve deception.
• Relay Attacks. An attacker places an illegitimate device between the tag and the reader to intercept, modify and forward information directly to other systems.
• Electromagnetic (EM) Interference. An attacker creates a signal in the same range as the reader to preclude tags from communicating with readers.
• Fake RFID Tag Query. An attacker sends queries and gets the same response from a tag at various locations to determine the location of a specific tag.
• Cryptograph Decipher Attack. An attacker breaks encryption algorithms by launching brute-force attacks and obtains the plain text by deciphering the intercepted ciphertext.
• Blocker Tag Attack. Using a blocker tag, an attacker attempts to restrict the reader from reading tags.

Zigbee Protocol
The Zigbee protocol is one of the most popular IoT protocols used for communication in IoT devices because of its low cost, low power consumption and scalability. While the importance of security was considered during the design of Zigbee, some trade-offs have been kept to bring the cost of devices down and make them scalable at a low cost. Some of the standard security measures could not be implemented which ultimately resulted in security vulnerabilities. The major security threats against Zigbee networks are enumerated below.

• Sniffing. Zigbee networks are exposed to sniffing attacks since they do not implement encryption techniques. An attacker can capture packets to carry out malicious activities using software tools such as KillerBee's zbdump tool [76].
• Replay Attack. An attacker who is able to intercept traffic can sniff raw packets of a network and re-send the captured data as normal traffic [76].
• Attaining the Link or Network Key. Since keys need to be reinstalled over the air when devices require reflashing, an attacker can obtain the ZigBee network or link keys. Physical attacks can also be used to obtain the keys, which can be extracted from a ZigBee device's flash memory when the device is physically accessed [77,78].
• Eavesdropping. An attacker can eavesdrop on a ZigBee network and redirect its packets using an MiTM attack.
• ZED Sabotage Attack. The authors in [79] proposed an attack against the ZigBee protocol called ZigBee End-Device (ZED) sabotage. The purpose of the attack is to make the ZED unavailable by periodically transmitting a particular signal that wakes up the device and drains its battery.
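One generic way to recognize the re-sent traffic described in the replay attack above is to require a strictly increasing frame counter per sender and reject anything that does not advance it. The sketch below illustrates that idea only. The `ReplayDetector` class and the sender names are hypothetical, and it does not model how any particular Zigbee stack stores or verifies its counters.

```python
from typing import Dict


class ReplayDetector:
    """Rejects frames whose counter does not advance for their sender."""

    def __init__(self) -> None:
        # Highest frame counter accepted so far, per sender.
        self._last_seen: Dict[str, int] = {}

    def accept(self, sender: str, frame_counter: int) -> bool:
        """Return True for a fresh frame, False for a suspected replay."""
        last = self._last_seen.get(sender)
        if last is not None and frame_counter <= last:
            return False  # counter repeated or went backwards
        self._last_seen[sender] = frame_counter
        return True


detector = ReplayDetector()
print(detector.accept("sensor-a", 1))  # True: first frame from this sender
print(detector.accept("sensor-a", 2))  # True: counter advanced
print(detector.accept("sensor-a", 2))  # False: same counter seen again
print(detector.accept("sensor-b", 1))  # True: counters are tracked per sender
```

The check is only meaningful when the counter is covered by message authentication; otherwise an attacker could simply rewrite it before re-sending.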

Wireless Fidelity (WiFi)
A detailed review of attacks against various versions of the 802.11 security mechanism (i.e., WPA, WPA2, WEP) is explained in [80]. The most common WiFi attacks are described below.
• Attacks Related to Retrieving Key. An attacker monitors specific packets and then cracks the key offline.

Bluetooth
Most of the issues found in Bluetooth are related to the pairing process. Attacks can be launched during the pairing process stages, like before the completion of the pairing process and after the pairing of devices is completed [83]. For instance, based on information collected after pairing, attackers can launch man-in-the-middle attacks. A review of Bluetooth security issues is explained in [83][84][85]. The common attacks against Bluetooth are discussed below.
• PIN Cracking Attack. This type of attack is performed during device pairing and authentication. An attacker collects the random number (RAND) and the Bluetooth Device Address (BD_ADDR) of the targeted device using a frequency sniffer tool. Then, a brute-force search (for example, using the E22 algorithm) is applied to test all possible combinations of the PIN against the data collected earlier until the correct PIN is determined [84].
• MAC Spoofing Attack. This attack is launched during link key generation, before encryption is established. Devices authenticate each other using the generated link keys. In this attack, attackers can imitate another user. Attackers can also terminate connections or even alter data [84].
• Man-in-the-Middle Attack [86]. After the attack is launched, devices share messages unknowingly [58]. During this time, authentication is performed without the shared secret keys [58]. When the attack is successful, the two devices are paired to the attacker [57,58], while they believe the pairing was successful.
• Bluebugging. An attacker exploits vulnerabilities in the firmware of old devices to spy on phone calls, send and receive messages, and connect to the Internet without the legitimate users' knowledge.
• Bluesnarfing. An attacker gets unauthorized access to devices to retrieve information and redirect incoming calls.
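The PIN cracking attack listed above is feasible largely because short numeric PINs give a very small search space. The following arithmetic sketch makes that concrete. The guess rate is a hypothetical figure chosen for illustration, not a measurement of any real tool or device.

```python
def pin_keyspace(digits: int, alphabet_size: int = 10) -> int:
    """Number of candidate PINs of a given length over a given alphabet."""
    return alphabet_size ** digits


def worst_case_seconds(digits: int, guesses_per_second: float) -> float:
    """Time to try every candidate PIN at an assumed guess rate."""
    return pin_keyspace(digits) / guesses_per_second


# Hypothetical rate, for illustration only.
ASSUMED_RATE = 1000.0

for length in (4, 6, 8):
    print(length, pin_keyspace(length), worst_case_seconds(length, ASSUMED_RATE))
```

Each additional decimal digit multiplies the search space by ten, so longer PINs raise the cost of exhaustive search by an order of magnitude per digit.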

Near Field Communication (NFC)
Although the communication range of NFC is restricted to a few centimeters, the International Organization for Standardization (ISO) standard does not guarantee secure communication. The common attacks against NFC technologies are briefly mentioned below [87].

• Eavesdropping. By using antennas that are bigger and more powerful than those of mobile devices, an attacker in the vicinity of the devices can receive or intercept NFC communications. This allows an attacker to eavesdrop on an NFC communication across larger distances.

The RPL protocol has been designed to allow point-to-point, multipoint-to-point, and point-to-multipoint communication. It is a distance-vector routing protocol based on IPv6. RPL devices work on a specific topology that combines tree and mesh topologies, called a Destination Oriented Directed Acyclic Graph (DODAG) [74,92]. Attacks against the routing protocol can cause communication failures within IoT systems [93]. The interconnection of IoT systems to the Internet multiplies the vulnerabilities exponentially through exposure to innumerable attack vectors. The main attacks against RPL are discussed as follows:
• Sinkhole Attack. An attacker may announce a favorable route or falsified path to entice many nodes to redirect their packets through it.
• Sybil Attack. An attacker may use different identities in the same network to overcome the redundancy techniques in scattered data storage. This can also be used to attack routing algorithms.
• Wormhole Attack. An attacker disturbs both traffic and network topology. This attack can be launched by creating a private channel between two attackers in the network and transmitting the selected packets through it.
• Blackhole Attack. An attacker maliciously advertises itself as the shortest path to the destination during the path-discovery mechanism and silently drops the data packets.
• Selective Forward Attack. This is a variant of the Blackhole attack, in which an attacker drops only a specific subset of the network traffic and forwards all RPL control packets. This attack mainly aims to disturb routing paths; however, it can also be used to filter any protocol [74].
• Hello Flooding Attack. An attacker can announce itself as a neighbor to many nodes, or even to the complete network, by broadcasting a "HELLO" message with a high-powered antenna and a favorable routing metric. The attacker does this to deceive other objects into sending their packets through it [94].

Internet Protocol (IPv6) and Low-Power Wireless Personal Area Networks (6LoWPAN) Based Attacks
6LoWPAN was designed to meet the communication requirements of connecting resource-constrained, low-powered objects and IPv6 networks. To achieve this, 6LoWPAN uses fragmentation at the adaptation layer. The main attacks against 6LoWPAN are explained as follows:
• Fragmentation Attack. An IoT object communicating over IEEE 802.15.4 has a Maximum Transmission Unit (MTU) of 127 bytes, whereas IPv6 requires a minimum MTU of 1280 bytes. This gap is bridged using a fragmentation mechanism. Since fragmentation is performed without any type of authentication, an attacker can inject fragments into a fragmentation chain [95].
• Authentication Attack. In the absence of an authentication mechanism in 6LoWPAN, any malicious object can join the network and get legitimate access [92].
• Confidentiality Attack. In the absence of an encryption technique in 6LoWPAN, attacks affecting confidentiality, such as eavesdropping, spoofing and man-in-the-middle, can be launched.
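The MTU mismatch behind the fragmentation attack can be made concrete with a little arithmetic. The sketch below is a simplified estimate, not a protocol implementation: it assumes the 4-byte first-fragment and 5-byte subsequent-fragment headers and the 8-byte offset granularity of RFC 4944, ignores header compression, and takes the room left in each frame after the MAC header (`frame_payload`, a hypothetical parameter set to 102 bytes here) as an input, since that value depends on addressing mode and link-layer security.

```python
import math

FRAG1_HEADER = 4  # bytes, first-fragment header (assumed from RFC 4944)
FRAGN_HEADER = 5  # bytes, subsequent-fragment header (assumed from RFC 4944)


def count_fragments(datagram_len: int, frame_payload: int) -> int:
    """Estimate how many link-layer frames carry one IPv6 datagram."""
    if datagram_len <= frame_payload:
        return 1  # fits in a single frame, no fragmentation needed
    # Fragment offsets are counted in 8-byte units, so every fragment
    # except the last carries a multiple of 8 bytes of the datagram.
    first = (frame_payload - FRAG1_HEADER) // 8 * 8
    rest = (frame_payload - FRAGN_HEADER) // 8 * 8
    return 1 + math.ceil((datagram_len - first) / rest)


# A minimum-size IPv6 MTU datagram over frames with 102 usable bytes.
frames = count_fragments(1280, 102)
```

Each of these frames is accepted by the receiver without authentication, which is why a single forged fragment can corrupt the reassembly of the whole datagram.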

Intrusion Detection System (IDS)
Most IDSs have a common structure that includes: (1) a data gathering module that collects data, which possibly contains evidence of an attack, (2) an analysis module that detects attacks after processing that data, and (3) a mechanism for reporting an attack. In the data gathering module, the input data of each part of an IoT system can be gathered and examined to establish the normal behavior of interactions, thereby detecting malicious behavior at an early stage. The analysis module can be implemented using various techniques and methods; however, ML and DL based methods are more suitable and dominant for data examination because they learn benign and anomalous behavior from how IoT devices and systems interact with one another in IoT environments. Furthermore, ML/DL methods can detect new attacks, which often differ from previous attacks, because they learn from existing legitimate samples and can flag behavior that departs from them [12]. Figure 8 shows the components of a typical IDS based on ML/DL methods.

Design Choices of ML/DL Based IDS
As depicted in Figure 9, the main differences in the design choices for IDSs depend on the following factors:
• Detection methods. These could be signature-based, anomaly-based, specification-based or hybrid-based detection.

Detection Methods of IDSs
The detection methods used for IDSs can be divided into four methodological types [33], as shown in Figure 9 and explained below.

Signature-Based Detection Techniques
Signature-based detection techniques maintain a repository of attack signatures and compare the network traffic or system actions against this repository. As soon as a match is found, a detection alert is raised. Though sufficiently accurate against known attacks for which signatures exist in the repository, this technique cannot detect zero-day (new) attacks. It is not even effective against mutations of an existing attack [54,96,97].
Some research, like [98], proposed means to overcome this deficiency of signature-based techniques through the use of an Artificial Immune System (AIS). This technique designed detectors relying on signatures/patterns of attacks using the model of immune cells, which can detect if a packet is normal or malicious through its classification as self or non-self element. The system has the capacity for the adoption of new patterns from continuous monitoring of the system. However, the feasibility of such a detection technique in a resource-constrained IoT environment is questionable.
The authors in [99] addressed this predicament of resource constraints in signature-based IDS by using a separate Linux machine running an adapted version of the Suricata signature-based IDS. However, the authors did not indicate how attack signatures would be updated. The authors in [100] extended the work published in [99] by proposing modifications to the signature matching techniques. Another study [101] tackled the processing power constraints of IoT systems through the use of auxiliary shift values with a multiple pattern detection algorithm, which reduces the number of matching operations required between attack signatures and network traffic packets. The system used the signature repositories of the open-source IDS Snort and the open-source antivirus ClamAV.
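The core idea of signature matching, and its weakness against mutated attacks, can be illustrated with a minimal sketch. The repository below is entirely hypothetical (the signature names and byte patterns are invented for illustration and are not taken from Snort, Suricata or ClamAV), and real engines use far more efficient multi-pattern algorithms than this naive substring search.

```python
from typing import Dict, List

# Hypothetical signature repository: signature name -> byte pattern.
SIGNATURES: Dict[str, bytes] = {
    "telnet-default-login": b"admin:admin",
    "shell-injection": b"; /bin/sh",
}


def match_signatures(payload: bytes, signatures: Dict[str, bytes]) -> List[str]:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return sorted(name for name, pattern in signatures.items() if pattern in payload)


# A known attack is detected because its pattern is in the repository.
alerts = match_signatures(b"GET /cgi?cmd=ls; /bin/sh HTTP/1.1", SIGNATURES)

# A trivial mutation (the space is removed) no longer matches any signature.
mutated_alerts = match_signatures(b"GET /cgi?cmd=ls;/bin/sh HTTP/1.1", SIGNATURES)
```

Here `alerts` contains the matching signature name while `mutated_alerts` is empty, which mirrors the limitation noted above for zero-day attacks and mutations of existing attacks.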

Anomaly-Based Detection Techniques
Anomaly-based detection techniques rely on a baseline normal behavior profile for the monitored environment [97,102]. This baseline is then compared against system actions at any given moment. Any deviation beyond the allowed threshold is reported by raising an alert, without providing any classification of the type of attack detected. There have also been attempts to use ML models that learn both normal and attack events as behavioral detection models; however, building normal profiles is preferable, because a model trained on known normal and attack events cannot cover attack events that newly appear in real-world networks. In comparison with signature-based detection techniques, anomaly-based detection techniques are more effective in discovering new attacks. One drawback of this technique is the difficulty of building the normal behavior baseline profile, which gives rise to increased false positive rates [20,103,104]. Anomaly-based detection techniques rely on ML algorithms to build the baseline normal profile of monitored systems. The use of such ML techniques in resource- and energy-constrained IoT environments is still a challenge, due to the high computational resources needed to train and validate them.
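As a rough illustration of the baseline-and-threshold idea, the following sketch profiles a single traffic feature from benign observations and flags values that deviate too far from it. The feature (packets per second), the sample values and the three-standard-deviation threshold are all assumptions chosen for the example; practical systems profile many features and learn the baseline with ML models.

```python
import statistics
from typing import List, Tuple


def fit_baseline(benign_values: List[float]) -> Tuple[float, float]:
    """Summarize normal behavior as the mean and standard deviation."""
    return statistics.mean(benign_values), statistics.stdev(benign_values)


def is_anomalous(value: float, baseline: Tuple[float, float], k: float = 3.0) -> bool:
    """Flag a value that lies more than k standard deviations from the mean."""
    mean, std = baseline
    return abs(value - mean) > k * std


# Hypothetical packets-per-second readings observed during normal operation.
benign_rates = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 11.0, 12.0]
baseline = fit_baseline(benign_rates)

flood_detected = is_anomalous(50.0, baseline)   # far outside the profile
normal_flagged = is_anomalous(12.5, baseline)   # within the allowed range
```

Note that the alert carries no attack label: the detector only knows that the observation is out of bounds, as described above. Choosing `k` too small is one simple way in which a poorly built baseline inflates the false positive rate.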

Specification-Based Detection Techniques
The basic principle of anomaly-based and specification-based detection techniques is the same: the normal behavior of a system is profiled through some means and compared against current system actions to detect out-of-range deviations. However, in anomaly-based techniques normal behavior is learned through ML, whereas in specification-based techniques it must be manually specified by a human expert through a repository of rules and associated ranges of deviations [105]. This lowers the false-positive rates compared with anomaly-based detection techniques [20]. Although these techniques have the advantage of not requiring any learning phase after a rule set is specified [105], they suffer from a lack of adaptability to varied environments and are liable to errors in specifications [19].

Hybrid-Based Detection Techniques
Hybrid-based detection techniques employ a mix of the earlier mentioned techniques to offset their shortcomings and combine their advantages in detecting existing and new attacks. The authors in [106] proposed SVELTE, an IDS for IP-connected IoT systems that use RPL as a routing protocol in 6LoWPAN networks. This IDS was designed as a hybrid of anomaly-based and signature-based detection techniques in order to balance the storage cost of signature-based detection against the computing cost of anomaly-based detection.

Machine Learning (ML) Techniques for IDS
As discussed in the previous section, apart from specification-based detection, all types of detection techniques rely on some sort of ML algorithm for the training phase of the IDS. In this section, an overview of different ML techniques used in IoT environment based IDSs is presented. Table 4 gives a brief overview of ML methods, their advantages and limitations along with reference to related research work conducted. In the end, Table 5 summarizes research works conducted to propose IDSs using various ML methods, as detailed below. Figure 10 illustrates the most common ML techniques used for designing IDSs in IoT networks.

Naive Bayes (NB) Classifier
This algorithm employs Bayes' theorem to predict the probability of occurrence of an event based on previous observations of similar events [107]. In ML scenarios, this can be used in supervised learning mode to classify normal and abnormal behaviors based on previous observations. The NB classifier is a commonly used supervised classifier known for its simplicity. NB calculates the posterior probability and, based on it, makes a labeling decision to classify unlabeled traffic as normal or anomalous. A set of features of the observed traffic, such as status flags, protocol and latency, which are assumed to be independent of one another, is used to estimate the probability of traffic being normal or otherwise. Because the algorithm is simple and easy to implement, various IDSs have employed an NB classifier to identify anomalous traffic [108][109][110][111]. It requires very few samples for training [112] and can perform both binary and multi-label classification. However, it fails to take into account interdependencies between features, which affects its accuracy [113].
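The posterior computation described above can be sketched for categorical traffic features as follows. This is a minimal illustration under stated assumptions: the two features (protocol and status flag), the seven training records and the use of Laplace (add-one) smoothing are choices made for the example and are not taken from any of the cited IDSs.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature index
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            feature_counts[y][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict_nb(model, x):
    """Return the class with the highest log posterior for sample x."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, v in enumerate(x):
            # Laplace-smoothed likelihood; features are treated as independent.
            likelihood = (feature_counts[label][i][v] + 1) / (count + len(values[i]))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled records: (protocol, status flag).
samples = [("tcp", "ACK"), ("tcp", "ACK"), ("udp", "ACK"), ("tcp", "SYN"),
           ("tcp", "SYN"), ("tcp", "SYN"), ("udp", "SYN")]
labels = ["normal", "normal", "normal", "normal", "attack", "attack", "attack"]
model = train_nb(samples, labels)
```

Because each feature contributes its own independent factor to the score, any correlation between protocol and flag is ignored, which is the limitation noted at the end of the paragraph above.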

K-Nearest Neighbor (KNN)
KNN is a non-parametric algorithm: it makes no assumption about the distribution of the data. Euclidean distance is used to measure the distance between neighbors [114]. Figure 11 shows the basic principle behind the KNN classification algorithm, which classifies a new data instance into one of the already observed classes based on its distance to the members of each class. The green squares depict the normal behavior class and the red triangles show the abnormal behavior class. Any newly observed unknown instance (the hexagon) is assigned to the class that holds the majority among its nearest neighbors. Here, k is the number of nearest neighbors used for classification.
The classification will change with the value of k. For k = 1, the unknown instance (the hexagon) will be classified as an abnormal class, but for k = 2 and k = 3, it will be classified as a normal class. Hence, obtaining the optimal value of k through testing is vital for the accuracy of this algorithm [115]. Several studies [116][117][118] have used KNN-based classification for anomaly and intrusion detection in general, and for IoT-based network intrusion detection in particular [119,120], with reasonable accuracy in detecting User to Root (U2R) and Remote to Local (R2L) attacks. While KNN is simple to use, determining the optimal value of k and identifying missing nodes are time-consuming and costly in terms of accuracy.
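The dependence of the result on k can be reproduced with a few lines of code. In this sketch the two-dimensional points and labels are hypothetical and do not correspond to the coordinates in Figure 11; only odd values of k are shown, to sidestep the question of how ties between classes are broken.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Label the query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled observations in a two-feature space.
train = [
    ((1.0, 1.0), "abnormal"),
    ((3.0, 3.0), "abnormal"),
    ((1.5, 1.0), "normal"),
    ((1.4, 1.4), "normal"),
    ((0.0, 2.5), "normal"),
]
query = (1.1, 1.1)

label_k1 = knn_classify(train, query, k=1)  # decided by the single closest point
label_k3 = knn_classify(train, query, k=3)  # decided by a vote of three
```

The single nearest point is abnormal, but two of the three nearest points are normal, so the predicted class flips between k = 1 and k = 3.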

Decision Trees (DTs)
Decision Trees (DTs) work by extracting features of the samples in a dataset and then organizing an ordered tree based on the value of a feature. Every feature is represented by a node of the tree and its corresponding values are represented by the branches originating from that node. The feature node that optimally divides the tree in two is considered the origin node of the tree [121]. Various metrics, such as the Gini index [122] and Information Gain [123], are used to identify the origin node that optimally divides the training dataset. Figure 12 illustrates decision tree nodes. DT algorithms involve two processes, namely induction and inference, aimed at building the model and then carrying out the classification [124]. During the induction process, construction of a DT starts with adding nodes and branches. Initially, these nodes are unoccupied; then, through feature selection using information gain and other measures, the feature deemed to best split the training dataset samples is selected. This feature is then assigned as the origin vertex of the DT.
Figure 11. K-Nearest Neighbor (KNN) classification principle (legend: Anomaly, Unknown, Normal).
The process continues by selecting feature nodes for the subtrees so as to minimize the overlap between different classes found in the training dataset. As a result, the accuracy of the classifier in identifying distinct instances of a class increases. In the end, the leaves of each sub-DT are identified and classified according to their corresponding classes. After the construction of the DT, the inference process can start, in which any unknown instance can be classified through iterative comparison of its features with the constructed DT. Once a matching leaf node is reached, the classification of the new sample is complete [124]. In the context of intrusion detection, DTs have potential for use as classifiers [125,126]. However, their larger storage requirements and computational complexity must be considered [124]. In the IoT environment, the research published in [127] used a DT to detect DDoS attacks by analyzing network traffic to identify malicious sources.
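The two split metrics named above can be written down directly. The sketch below computes the Gini index and the information gain of a candidate binary split; the label lists are hypothetical, and the code only scores a split that has already been chosen, rather than searching over features as a full induction procedure would.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 0 for a pure node, larger for more mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Reduction in entropy obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical node holding four normal and four attack samples.
parent = ["normal"] * 4 + ["attack"] * 4

# A feature that separates the classes perfectly versus one that does not.
good_gain = information_gain(parent, ["normal"] * 4, ["attack"] * 4)
poor_gain = information_gain(parent, ["normal", "normal", "attack", "attack"],
                             ["normal", "normal", "attack", "attack"])
```

During induction, the feature whose split yields the highest gain (or the lowest weighted Gini index) would be placed at the origin node, and the same scoring is then repeated within each branch.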

Support Vector Machines (SVMs)
SVM is another type of classifier that works by creating a hyperplane in the feature space of two or more classes. The splitting hyperplane is chosen to maximize its distance from the nearest data point of each class [128], as shown in Figure 13. SVMs are most appropriate where classes with large feature sets must be classified based on a small number of data samples [35,129,130]. Based on statistical learning [128], SVMs are well suited to anomaly detection, where classification between normal and abnormal classes is required. SVMs are highly scalable due to their simplicity and are capable of performing tasks like anomaly-based intrusion detection in real time, including online learning [131][132][133]. In [134], the authors use an optimized version of SVM to propose "Sec-IoV", a multi-stage model for detecting anomalous traffic in vehicle-to-vehicle (V2V) communications in Internet of Vehicles (IoV) networks. Another advantage of SVM is its lower storage and memory use. The use of SVM-based IDSs in an IoT system has been evaluated in various research studies [135][136][137], where SVM showed more accurate results than other ML algorithms including DTs, NB and Random Forest. However, selecting the optimal kernel function, which is used to separate the data when it is not linearly separable, remains a challenge in achieving the desired classification speed.

Ensemble Learning (EL)
EL builds on the strengths of various classifiers by combining their results and then generating a majority vote for classification, as shown in Figure 14. This improves classification accuracy by combining the outputs of various homogeneous or heterogeneous classifiers [138,139]. EL is based on the study [140], which found that the accuracy of every ML classification algorithm depends on the application and the associated data. Hence, no ML algorithm can be described as a "one size fits all" solution, and for generalized applications, EL-like combinations may be best suited to maximizing accuracy by reducing variance and avoiding overfitting [141].
The accuracy of EL comes at the cost of increased time complexity, due to the use of multiple classifiers in parallel [142,143]. The efficacy of EL for intrusion detection has been examined in various studies [144][145][146]. The feasibility of EL in resource-limited environments like IoT has been studied in [147], which proposed a general-purpose lightweight EL framework for online anomaly detection in IoT networks. This study showed that such an EL algorithm produced more accurate results than each member classifier individually [147].
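The majority-vote combination can be sketched as below. The three member classifiers are deliberately simple hand-written rules, and the flow fields and thresholds (`pkts_per_s`, `dst_port`, `avg_pkt_bytes`) are hypothetical; in practice the members would be trained models such as those discussed in the preceding subsections.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by most of the member classifiers."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Three hypothetical weak classifiers, each looking at one flow attribute.
def by_rate(flow):
    return "attack" if flow["pkts_per_s"] > 1000 else "normal"


def by_port(flow):
    return "attack" if flow["dst_port"] == 23 else "normal"


def by_size(flow):
    return "attack" if flow["avg_pkt_bytes"] < 64 else "normal"


ensemble = [by_rate, by_port, by_size]

flood = {"pkts_per_s": 5000, "dst_port": 80, "avg_pkt_bytes": 40}
telnet_session = {"pkts_per_s": 10, "dst_port": 23, "avg_pkt_bytes": 500}

flood_label = majority_vote(ensemble, flood)
telnet_label = majority_vote(ensemble, telnet_session)
```

In both cases one member disagrees with the final label, which shows how the vote dampens the error of an individual classifier. Every sample is evaluated by all members, which is the source of the added time cost mentioned above.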

Random Forest (RF)
RFs can be categorized as a supervised ML algorithm. An RF is built using multiple DTs to produce more accurate and error-resistant classification results [148,149]. Randomly constructed DTs are trained, and the classification result is output based on majority voting [148]. Though DTs can be considered components of an RF, they are two distinct classification algorithms: whereas a DT builds a rule set during training for the subsequent classification of new samples, an RF builds a rule subset using all member DTs. This results in a more robust and accurate output, which is resistant to overfitting, requires substantially fewer inputs and does not require the process of feature selection [35]. As proposed by some studies [150,151], RF is suitable for anomaly and intrusion detection in IoT networks. Moreover, another study [152] has shown RF to be better than KNN, artificial neural network (ANN) and SVM at DDoS detection in IoT networks, because it requires fewer input features and can bypass the heavy computations required for feature selection in real-time IDS [153].

k-Means Clustering
k-means clustering is an unsupervised algorithm based on the discovery of k clusters in the data samples. Each instance of sample data is assigned to a particular cluster based on its features: the samples are distributed over the k clusters by assigning each one to the nearest centroid according to squared Euclidean distance. The centroid of each cluster is then recalculated by taking the mean of the data points allocated to that cluster, as shown in Figure 15. The process continues iteratively until no modifications to the clusters can be made [154,155]. The need to select an appropriate value of k and the assumption that the sample dataset will be equally distributed over the k clusters are limitations of the k-means clustering algorithm. Previous studies presented in [156,157] suggest the suitability of k-means clustering for anomaly detection through calculating feature similarity. The authors in [158] suggested combining DT with k-means clustering to improve the performance of anomaly detection in IoT networks.
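The assign-then-recompute loop can be sketched in a few lines. This is an illustrative version only: the two-dimensional points are hypothetical, and the centroids are initialized from the first k points to keep the example deterministic, whereas practical implementations usually use a randomized initialization.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (centroids, assignment)."""
    centroids = list(points[:k])  # simplistic deterministic start (assumption)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment


# Hypothetical feature vectors forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.8, 1.2), (9.2, 8.8), (8.8, 9.2)]
centroids, assignment = kmeans(points, k=2)
```

For anomaly detection, one common use of such a model is to treat points that lie far from every centroid, or that fall into a very small cluster, as suspicious; that step is not shown here.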

Principal Component Analysis (PCA)
PCA is not an anomaly detection technique; rather, it is commonly used for feature selection or feature reduction on a large dataset. The selected feature sets can then be used along with other ML classifiers to detect anomalies in an IoT network. The PCA technique transforms a large set of variables into a reduced set of features without losing much of the information. Various research works [159][160][161][162] used a combination of PCA with various classifiers to detect anomalies in IoT networks.

Table 4. Overview of ML methods for IDS: advantages and limitations.

NB
Advantages:
- It requires very few samples for training [112].
- It can classify in both binary and multi-label classification.
- It shows robustness to irrelevant features.
Limitations:
- It fails to take into account interdependencies between features for classification purposes, which affects its accuracy [113].

KNN
Limitations:
- Determining the optimal value of k and identifying missing nodes are challenging.

DT
Limitations:
- It requires bigger storage.
- It is computationally complex.
- It is easy to use only if few DTs are used.

SVM [131-133]
Attacks: Scan, DDoS (TCP, UDP flood), smurf, portsweep.
Advantages:
- SVMs are highly scalable due to simplicity and are capable of performing tasks like anomaly-based intrusion detection in real time, including online learning.
- SVMs are considered suitable for data containing a large number of feature attributes.
- SVMs use less storage and memory.
Limitations:
- The use of an optimal kernel function in SVM, which is used to separate the data when it is not linearly separable, remains a challenge in achieving the desired classification speed.
- SVM-based models are difficult to understand and interpret.

RF
Advantages:
- It produces a more robust and accurate output, which is resistant to overfitting.
- It requires substantially fewer inputs and does not require the process of feature selection.
Limitations:
- Since RF constructs several DTs, its use may be impractical in real-time applications requiring a large dataset.

k-means clustering
Limitations:
- It is less effective than supervised learning techniques, in particular in detecting known attacks.

PCA [159-162]
Attacks: Used in combination with other ML methods.
Advantages:
- PCA is suitable where the dataset involves a large set of variables, as PCA transforms it into a reduced set of features without losing much of the information.
- It can reduce the complexity of the data.
Limitations:
- It is not an anomaly detection method; it must be used with other ML methods to design a security model.

Deep Learning (DL) Techniques for IDSs
DL algorithms outperform ML algorithms in applications involving large datasets. DL becomes most relevant in IoT security applications as IoT environments are characterized by the production of vast amounts and a variety of data [171]. Furthermore, DL is capable of the automatic modeling of complex feature sets from the sample data [171]. Another advantage of DL algorithms is their ability to allow deep linking in IoT networks [172]. This enables automatic interactions between IoT-based systems in the absence of human intervention [171] to perform assigned collaborative functions.
DL can be classified as a branch of ML that uses multiple non-linear layers of processing to extract hierarchical feature representations in a complex deep architecture. These feature sets are then used for abstraction and pattern detection after the necessary transformations [173]. As shown in Figure 16, DL can be used in a generative mode with unsupervised learning, in a discriminative mode using supervised learning, or in a hybrid approach combining both modes. In this section, the major DL-based techniques used for designing an IDS are discussed. Table 5 below summarizes research studies conducted to propose IDSs using various DL-based methods. Details about each research work, along with the DL technique used, are explained in the respective sub-sections below.

Recurrent Neural Networks (RNNs)
RNN is a discriminative DL algorithm, which is best suited to environments where data is to be processed sequentially. Unlike feed-forward neural networks, an RNN has feedback connections, so its output depends on previously processed inputs as well as on the current one [173][174][175]. A temporal layer is incorporated in an RNN for analyzing data sequentially, followed by learning about multi-dimensional differences in the hidden units of the recurrent components [165]. These hidden units are then modified according to the data encountered by the neural network, so that they are continuously updated and reflect the current state of the neural network.
An RNN processes the current hidden state by estimating each succeeding hidden state as an activation of the previous hidden state. A simple explanation of RNN functioning is given in Figure 17. Here, outputs from neurons are sent back as feedback to the neurons of the previous layer. Because IoT environments are characterized by the generation of large amounts of sequential data, such as network traffic flows, RNNs become relevant in IoT security applications, especially network intrusion detection. Previous research [176] has proposed the use of an RNN for network intrusion detection through analysis of network traffic behavior and reported useful results, particularly for time series-based threats. Another recent study [177] proposes an IDS that uses cascaded filtering stages in which a deep multi-layered RNN is applied for each filter. The RNNs are then trained to detect common attacks launched in IoT environments, such as R2L, DoS, U2R and Probe.
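The way a hidden state carries information from one time step to the next can be shown with a deliberately tiny example. The sketch below runs the forward pass of a single-unit recurrent cell using the common update h_t = tanh(w_x * x_t + w_h * h_(t-1) + b); the scalar weights are arbitrary, untrained values chosen for illustration, and no training procedure is shown.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit recurrent cell over a sequence and return all hidden states."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# A single burst followed by silence: the state decays but does not vanish.
states = rnn_forward([1.0, 0.0, 0.0])

# With no input at all, the state stays at zero.
idle_states = rnn_forward([0.0, 0.0, 0.0])
```

After the burst at the first step, the later states remain non-zero even though the inputs are zero, which is the memory effect that makes recurrent models suitable for sequential traffic data.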
Long short-term memory (LSTM) network architectures, which are a specialized form of RNN, have also been used in the designing of IDS. The main attribute of LSTM based RNNs is to persist information or cell state for later use in the network. This feature makes them appropriate for performing analysis of temporal data that changes over time. Thus, LSTM networks are preferred to solve problems related to anomaly detection in time-series sequence data. Various forms of RNN, including LSTM based RNNs, have been used for anomaly and intrusion detection in IoT networks by researchers in [178][179][180][181][182][183]. While RNNs have demonstrated promising results in predicting time series data, the detection of anomalous traffic using these predictions is still challenging.

Convolutional Neural Network (CNN)
CNN is also a discriminative DL algorithm, which was designed to minimize the number of data inputs required for a conventional artificial neural network (ANN) through the use of equivariant representation, sparse interaction and sharing of parameters [184]. Thus, a CNN becomes more scalable and requires less time for training. There are three layer types in a CNN, namely the convolutional layer, the pooling layer and the activation unit, as shown in Figure 18. The convolutional layers use various kernels to convolve the data inputs [185]. The pooling layers downsample their input, thus minimizing the sizes of succeeding layers. Pooling involves two techniques, max pooling and average pooling, where the former distributes the input among distinct clusters and chooses the maximum value of every cluster in the previous layer [186,187].
Average pooling, on the other hand, calculates the average value of every cluster in the previous layer. The activation unit applies a non-linear activation function to every feature in the feature set [187]. CNN is best suited to highly efficient and fast feature extraction from raw data, but at the same time it requires high computational power [188]. Hence, using CNNs on resource-constrained IoT devices for their security is highly challenging. This challenge is somewhat addressed through a distributed architecture, in which a lighter version of a deep NN is trained and implemented on-board with only a subset of vital output classes, whereas the high computational power of the cloud is used to perform the complete training of the algorithm [166]. The use of CNNs in IoT environment security was discussed in previous research published in [189,190] for malware detection. In [40], the authors propose a hybrid data processing model for network anomaly detection that utilizes Grey Wolf Optimization (GWO) and CNN techniques. The authors claim to have achieved better accuracy and detection rate in comparison to other state-of-the-art IDSs.
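The two pooling techniques described in the CNN discussion above can be illustrated on a one-dimensional feature map. The sketch assumes non-overlapping clusters of a fixed size and uses made-up activation values; real CNNs typically pool over two-dimensional windows, but the operation per cluster is the same.

```python
def pool1d(values, size, mode="max"):
    """Downsample a 1-D feature map over non-overlapping clusters of `size`."""
    clusters = [values[i:i + size] for i in range(0, len(values), size)]
    if mode == "max":
        return [max(cluster) for cluster in clusters]
    if mode == "average":
        return [sum(cluster) / len(cluster) for cluster in clusters]
    raise ValueError("mode must be 'max' or 'average'")


# Hypothetical activations produced by a convolutional layer.
feature_map = [1, 3, 2, 8, 5, 5]

max_pooled = pool1d(feature_map, size=2, mode="max")
avg_pooled = pool1d(feature_map, size=2, mode="average")
```

Both variants halve the length of the feature map here, which is how pooling reduces the size of the succeeding layers.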

Figure 18. Illustration of convolutional neural network working (stages: convolution, pooling, filtering; input layer, hidden layer, output layer).

Deep Autoencoders (AEs)
A deep AE is an unsupervised algorithm designed to reproduce its input at its output through the use of a decoder function and a hidden layer containing a code that represents the input [184]. The other function in an AE neural network is called the encoder function and is responsible for converting the acquired input into the code. During training, the reconstruction error must be minimized [191]. One use case for AEs is feature extraction from datasets. However, AEs require high computational power. Deep AEs have been used for the detection of network-based malware in previous research, with better accuracy than SVM and KNN [167]. Kitsune [41] is one such study, in which an ensemble of deep autoencoders was used to implement an online lightweight IDS for IoT environments based on unsupervised learning and anomaly detection; the authors demonstrate better accuracy compared with other ML and DL techniques.

Restricted Boltzmann Machine (RBM)
An RBM is an unsupervised learning-based algorithm that builds a deep generative and undirected model [168]. No two nodes in the same layer of an RBM are connected to each other. Visible and hidden layers are the two types of layers making up an RBM. Known input parameters are contained in the visible layer, while the unknown potential variables are included in several layers forming the hidden layer. Working hierarchically, features extracted from a dataset are passed on to the next layer as latent variables. RBMs were used in various research works [192,193] for network/IoT intrusion detection systems. The challenge of RBMs is that they need high computational resources, which makes them difficult to implement on low-powered IoT devices. Furthermore, a single RBM lacks the capability of feature representation. However, this limitation can be overcome by stacking two or more RBMs to form a Deep Belief Network (DBN).

Deep Belief Network (DBN)
Being formed by stacking two or more RBMs, DBNs can be considered unsupervised learning-based generative algorithms [194]. They perform robustly through unsupervised training of each layer separately [165]. Initial features are extracted in the pre-training phase for each layer, followed by a fine-tuning phase in which a softmax layer is applied on the top layer [170]. A DBN is mainly composed of two layers, i.e., a visible layer and a hidden layer, as shown in Figure 19. Though the studies in [188,195] discussed malicious attack detection using DBNs with comparatively better results than ML algorithms, no evidence of applicability in the IoT environment was reported in the literature.

Figure 19. Illustration of deep belief network working (input layer, hidden cells, output cells, back reference).

Generative Adversarial Network (GAN)
A GAN is a hybrid DL method that uses both generative and discriminative models at the same time for training [196]. The distribution of the dataset and its samples is learned by the generative model, while predictions about whether a given sample genuinely originates from the training dataset are made by the discriminative model [196]. As shown in Figure 20, the generative and discriminative models work as adversaries: the generative model attempts deception by generating a sample from random noise, while the discriminative model attempts to distinguish real training data samples from the deceptive samples generated by the generative model. Here, D(x) represents a binary classification giving output as real or fake (generated). The measure of correct/incorrect classification determines the accuracy and performance of both models in an inversely proportional fashion. This results in the models being updated in each iteration [191]. The study published in [169] discussed the utility of GANs for detecting anomalous behavior in IoT environments, with promising results due to their ability to counter zero-day attacks through the generation of samples mimicking zero-day attacks, thereby causing the discriminator to learn different attack scenarios. However, the challenge with using a GAN is that its training is difficult and it produces unstable results [196,197].

Ensemble of DL Networks (EDLNs)
As discussed earlier, an ensemble of ML classifiers proves more effective than an individual ML classifier. Similarly, multiple DL algorithms can be organized in an ensemble and used in parallel to produce better results than each component DL algorithm. EDLNs can have any combination of discriminative, generative, or hybrid DL algorithms. Best suited for solving complex issues, EDLNs perform better in uncertain environments with a high number of features. A heterogeneous EDLN has classifiers from different genres, whereas a homogeneous EDLN has classifiers from the same genre. Both compositions are aimed at increasing efficiency and producing accurate results [198]. The application of EDLNs to IoT security requires further study and research to evaluate the possibility of improving the performance and accuracy of the IoT security system [12]. Table 6 illustrates common attack types handled by corresponding DL methods along with references to related research. Table 6 also describes the advantages and limitations of each suggested DL method. Table 5 below covers the comparison of work conducted on ML and DL techniques for IoT security.

RNNs [179][180][181][182][183]
- Best suited in environments where data is to be processed sequentially.
- In some cases, the IoT system environment produces sequential data; hence, RNNs are suitable in IoT security.
- The major challenge in the use of RNNs is handling the issue of vanishing or exploding gradients, which hinders learning of long data sequences.

CNNs [189,190]
Malware attacks
- CNN is best suited for highly efficient and fast feature extraction from raw data.
- Since CNNs can automatically learn behavior from raw network security data, they have potential application in IoT security.
- CNN requires high computational power; thus, using CNN on resource-constrained IoT devices for their security is highly challenging.
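The ensemble idea behind EDLNs can be sketched with soft voting, one simple way of combining models in parallel. This is an illustrative assumption rather than the combination rule of any surveyed work; `soft_vote` and the probability matrices, which stand in for the outputs of three already trained networks, are hypothetical.

```python
import numpy as np

def soft_vote(prob_matrices, weights=None):
    """Average the class probabilities of several models and pick the top class."""
    stacked = np.stack(prob_matrices)   # shape: (n_models, n_samples, n_classes)
    averaged = np.average(stacked, axis=0, weights=weights)
    return averaged.argmax(axis=1), averaged

# Hypothetical outputs for 3 samples and 2 classes (0 = benign, 1 = attack).
model_a = np.array([[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]])
model_b = np.array([[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
model_c = np.array([[0.7, 0.3], [0.6, 0.4], [0.3, 0.7]])

labels, averaged = soft_vote([model_a, model_b, model_c])
# labels is [0, 1, 1]: the third sample is flagged as an attack
# even though model_a alone would have classified it as benign.
```

In a heterogeneous ensemble the three matrices would come from different kinds of networks, and in a homogeneous one from several instances of the same kind; the optional `weights` argument lets more reliable members count more.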

Datasets Available for IoT Security
Evaluating the effectiveness of any IDS requires a reliable and up-to-date dataset that contains current benign and anomalous activities. Most of the earlier research in IDS relied on the KDD99 [199] dataset due to the absence of other datasets for about two decades. However, the analyses in [199,200] and [201] suggest that the KDD99 dataset negatively affects IDS results. Numerous research efforts have been undertaken to address the weaknesses of KDD99 and of the other datasets that appeared after it. A brief description of the most common datasets for evaluating IDS is presented below.

•
KDD99. This is a modification of the DARPA-funded DARPA98 dataset, which originated from an IDS program conducted at MIT's (Massachusetts Institute of Technology) Lincoln Laboratory for evaluating IDSs that differentiate between inbound normal and attack connections. Later on, after some filtering, this dataset was used in the International Knowledge Discovery and Data Mining Tools Competition [202], resulting in what is known as the KDD CUP 99 dataset [199]. Most researchers have used this dataset over the last two decades. The absence of alternatives has resulted in several works relying on the KDD CUP 99 dataset [199] as a widespread benchmark for classifier accuracy. However, KDD-99 possesses numerous weaknesses that discourage its use in the current context, including its age, highly skewed targets, non-stationarity between training and test datasets, pattern redundancy, and irrelevant features.

•
NSL-KDD. NSL-KDD is an effort by the researchers who published their work in [199] to overcome the weaknesses of KDD-99. It is a more balanced resampling of KDD-99 in which emphasis is laid on examples that are expected to be missed by classifiers trained on the basic KDD-99. However, as its authors themselves acknowledge, the dataset still has weaknesses, such as its non-representation of low-footprint attacks [200].

•
The DEFCON dataset. The DEFCON-8 dataset, generated in 2000, comprises port scanning and buffer overflow attacks. Another version, the DEFCON-10 dataset, generated in 2002, uses bad packets, FTP by telnet protocol, administrative privilege, port scan and sweep attacks [203]. The traffic produced during the Capture the Flag (CTF) competition is dissimilar from real-world network traffic because it primarily consists of attack traffic as opposed to usual background traffic; therefore, its applicability for evaluating IDS is limited. The dataset is mostly used to assess alert correlation techniques [204,205].
[207]. These datasets are specific to certain events or attacks and are anonymized with their protocol information, payload, and destination. These are not effective benchmarking datasets because of several shortcomings, as discussed in [207], like the unavailability of ground truth about the attack instances.

•
The LBNL dataset contains anonymized traffic comprising only header data. The dataset was generated at the Lawrence Berkeley National Laboratory by gathering real outbound, inbound and routing traffic from two edge routers [206]. It was not labeled, and no extra features were created [206].

•
The UNSW-NB15 is a dataset developed at UNSW Canberra by the researchers of [208] for the evaluation of IDS. The researchers used the IXIA PerfectStorm tool to generate a mixture of attack and benign traffic, at the Australian Center of Cyber Security (ACCS) over two days, in sessions of 16 and 15 h. They generated a dataset of size 100 GB in the form of pcap files with a substantial number of novel features. NB15 was planned as a step-up from the KDD99 dataset discussed above. It covers 10 targets: one benign, and nine anomalous, namely: DoS, Exploits, Analysis, Fuzzers, Worms, Reconnaissance, Generic, Shell Code and Backdoors [208]. However, the dataset was designed based on a synthetic environment for producing attack activities.

•
The ISCX datasets [207]. The Canadian Institute for Cybersecurity has been working on the generation of numerous datasets that are used by independent researchers, universities and private industry around the world. A few datasets relevant to our work are the IPS/IDS dataset on AWS (CSE-CIC-IDS2018), IPS/IDS dataset (CICIDS2017), CIC DoS dataset (application-layer), ISCX Botnet dataset, ISCX IDS 2012 dataset, ISCX Android Botnet dataset, and ISCX NSL-KDD dataset. Their latest dataset related to our work is CICIDS2017. This dataset covers benign traffic and the most up-to-date common attacks, and is comparable to real-world data [209]. CICIDS2017 consists of multiple attack scenarios, with realistic user-related background traffic produced by using the B-Profile system. For this dataset, they built the abstract behavior of 25 users based on the FTP, SSH, HTTP, HTTPS and email protocols. However, the ground truth of the datasets, which would improve the reliability of the labeling process, was not shared. Moreover, applying the idea of profiling, which was used to produce these datasets, in real networks could be problematic due to their intrinsic complexity [209].

•
The Tezpur University IDS (TUIDS) dataset [206]. This dataset was generated by professors from Tezpur University, India. It features DoS, Probing, Scan, U2R and DDoS attack scenarios, performed in a testbed. However, the flow-level data does not contain any new features other than those produced by the flow-capturing process [209].

•
BoT-IoT [209]. The BoT-IoT dataset was created by designing a realistic network environment in the Cyber Range Lab of the center of UNSW Canberra Cyber. The environment incorporates a combination of normal and botnet traffic. The researchers also present a testbed setting that addresses the shortcomings of existing datasets in capturing complete network information, correct labeling, and the diversity of recent and complex attacks. In their work, the authors also evaluate the reliability of the BoT-IoT dataset using different ML and statistical techniques for forensics purposes, in comparison to the other datasets discussed above. The dataset's source files are provided in different formats, including the original pcap files, the generated argus files and CSV files. The files were separated, based on attack category and subcategory, to better assist in the labeling process.
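Two of the KDD-99 weaknesses named in the list above, pattern redundancy and highly skewed targets, can be quantified before a dataset is used for training. The sketch below is a minimal illustration using only the Python standard library; `dataset_quality_report` and the toy records are hypothetical and do not reproduce any real dataset.

```python
from collections import Counter

def dataset_quality_report(records, labels):
    """Report the share of duplicate records and the class distribution."""
    total = len(records)
    unique = len({tuple(record) for record in records})
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return {
        "duplicate_ratio": (total - unique) / total,
        "class_share": {label: n / total for label, n in counts.items()},
        "majority_label": majority_label,
        "majority_share": majority_count / total,
    }

# Toy flow records (duration, protocol, service) with labels; values are made up.
toy_records = [
    (0, "tcp", "http"),
    (0, "tcp", "http"),
    (0, "tcp", "http"),
    (2, "udp", "dns"),
    (5, "tcp", "ftp"),
]
toy_labels = ["dos", "dos", "dos", "normal", "probe"]

report = dataset_quality_report(toy_records, toy_labels)
# Two of the five records repeat an earlier one, so duplicate_ratio is 0.4,
# and the "dos" class accounts for 0.6 of all labels.
```

A high duplicate ratio or a dominant majority class suggests that reported accuracy may be inflated, which is one reason resampled variants such as NSL-KDD were proposed.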

Challenges and Future Research Directions
A large number of studies and research works have been published on IDSs for IoT. However, there are still many open research challenges and issues, particularly in the use of ML and DL techniques for anomaly and intrusion detection in IoT. A central challenge is that no standard mechanism exists to validate the proposed systems or methods. Research works mostly evaluate their proposed systems on synthesized datasets and address one specific problem, so they may not work in the real world on real data and in the presence of other problems. As is evident from this and other similar studies on the state of the art in IDS for IoT, it is very difficult to design an IDS that covers at least the most important aspects of an effective IDS, namely that it is deployable, online, scalable, works effectively on real data and satisfies all stakeholders' requirements. Instead, most of the published work shares evaluation results tested on contrived datasets, covers a single part or only some parts of the system, and shows results using biased parameters.
Furthermore, a proof of completeness and accuracy of any proposed IDS is very hard to define or accomplish. Thus, one of the conclusions from this study is that it is very hard to design a comprehensive IDS that can offer good accuracy, scalability, robustness and protection against all types of threats. Below, some of the major issues and challenges that researchers face today and will face in the future are described. Since IoT security measures are not yet mature, there is enormous scope for future research in this area, particularly in anomaly and intrusion detection using ML and DL techniques.
The most recent challenges related to anomaly and intrusion detection in IoT networks are discussed in the following:

•
To test and validate a proposed NIDS, a good-quality dataset related to IoT IDS is essential. Such a dataset should possess a reasonable size of network flow data covering both attack and normal behavior with the corresponding labels. Furthermore, in order to capture normal behavior, normal traffic data from each type of IoT device is required, in addition to the attack data for testing the NIDS. However, as discussed in the previous section, most of the publicly available datasets fall short of these requirements: they have missing labels, incomplete network features, missing raw pcap files, or CSV files that are incomplete or difficult to comprehend. Moreover, the available datasets only capture the normal behavior of specific types of IoT devices, which restricts training of an IDS to those devices only. Creating a dataset that can address these issues in a real environment will be a challenge and a potential area of research.

•
Developing an online, real-time, anomaly-based IDS for IoT networks is very challenging. This is because such an IDS would first need to learn normal behavior in order to detect abnormal or malicious behavior. The learning phase assumes that there is no noise or attack traffic during this period, which cannot be guaranteed. Such an IDS may generate false alarms if these issues are not addressed.

•
As also described in this paper, most anomaly-based NIDSs try to construct a model that captures the profile of all possible behaviors or patterns of normal traffic. This, however, is extremely challenging because it has been proven that such models tend to be biased towards the dominant class, that is, the normal class, resulting in high false-positive rates. Furthermore, it is not possible to capture all possible normal observations that may be generated in a network, particularly in the heterogeneous environment of IoT networks, which increases false-negative rates. Completely avoiding or minimizing false-positive and false-negative rates in NIDS is another research challenge.

•
It would be interesting to develop models trained on specific types of devices. These models can be applied to IDSs in other organizations using a similar type of device. This will assist other organizations, which can deploy these models and thus save the time that would have been required to collect the data and train the IDSs. It will also help in detecting malicious IoT devices that are already compromised, because their behavior would differ from the normal behavior captured by the trained models. Developing such models is a challenging task and a potential area for future research.

•
The different stages involved in the design and implementation of NIDS, like data pre-processing and feature reduction, model training and deployment, increase computational complexity, in particular for ML and DL based NIDS. Thus, designing an efficient NIDS that is light on computational requirements is another challenge and area for future research.

•
Feature selection and dimensionality reduction methods used for proposed IDSs are suited to a specific type of normal traffic and to detecting a particular type of attack. They may not work once the normal or attack sequences change even slightly, especially in the fast-changing environment of IoT devices and networks. Thus, a dynamic and computationally efficient mechanism for feature selection that can work under all types of normal and attack traffic is a potential research challenge.

•
DL and ML-based techniques and algorithms are being widely used for training a model on a large dataset. This has facilitated effective handling of cyber-attacks. However, with regard to the use of DL and ML algorithms for attack detection in IoT networks, some challenges need the attention of researchers; for example, the resource constraints of IoT devices limit the use of DL/ML algorithms [163] for the protection of IoT networks.
Another challenge with the use of ML/DL techniques in large and distributed networks, such as IoT networks, is that they face scalability issues, for example in terms of various scenarios and choices of IDS deployment. One possible solution to the limitations of individual DL or ML algorithms, suggested by some authors [211], is the use of an ensemble of ML/DL algorithms, which performed better than an individual ML algorithm; however, such algorithms were computationally expensive and thus resulted in network latency issues, which cannot be afforded in critical systems involving risks to human lives, like health and autonomous or Internet of Vehicles (IoV) systems.

•
The techniques of semi-supervised learning, transfer learning and reinforcement learning (RL) have not yet been well explored or tested for designing an IDS for IoT security that achieves important objectives like real-time operation, fast training and unified models for anomaly detection in IoT; they are thus potential areas of future research. Moreover, it would be an interesting research area to use RL in combination with DL, because their combined use can be beneficial in IoT network scenarios involving large data dimensionality and non-stationary environments.
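Several of the challenges listed above are stated in terms of false-positive and false-negative rates. The following sketch shows how these rates are computed from predictions, under the assumption that the attack class is treated as the positive class; the function `error_rates` and the label vectors are hypothetical.

```python
def error_rates(y_true, y_pred, positive=1):
    """Compute false-positive rate, false-negative rate and accuracy."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1   # benign traffic raised an alarm
        elif truth == positive and pred != positive:
            fn += 1   # attack traffic went undetected
        else:
            tn += 1
    return {
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }

# Hypothetical ground truth and detector output (0 = normal, 1 = attack).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

rates = error_rates(y_true, y_pred)
# 2 of 6 normal flows raise alarms (FPR = 1/3), 1 of 4 attacks is missed
# (FNR = 0.25), and 7 of 10 predictions are correct (accuracy = 0.7).
```

Reporting both rates alongside accuracy matters because, on traffic dominated by one class, accuracy alone can hide a detector that performs poorly on the minority class.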

Conclusions
During the last decade, the use of IoT devices has increased exponentially in all walks of life due to their capacity for converting objects from different application areas into Internet hosts. At the same time, users' privacy and security are threatened by IoT security vulnerabilities. Therefore, there is a requirement to develop more robust security solutions for IoT. Machine and deep learning-based IDS is one of the key techniques for IoT security. In this work, a survey of ML and DL based intrusion detection techniques used in IDS for IoT networks and systems is presented. The IoT architecture, protocols, IoT systems vulnerabilities, and IoT protocol-level attacks have been discussed in detail. Then, this paper surveyed research works in the literature that suggested IDS methodologies for IoT or proposed attack detection techniques for IoT that could be part of an IDS, focusing on the ML and DL techniques available for IDS in IoT and their use by researchers. A review of the datasets available for IoT security-related research is also provided. This work attempts to provide researchers with a summarized but comprehensive and useful insight into the security challenges currently faced by IoT systems and networks and into possible solutions, with a focus on intrusion detection based on ML and DL methods.